Regrettably SQL Server does not come with a built in Correlation calculation function, but any data analyst with self respect has written a correlation function (alternative we can move the data to R or similar tools).
Below, we are sharing the correlation function that NilssonHedge is using. The function below is free to reuse, if you find it useful. This function is one of the tools we use to identify diverging or converging programs. We use a number of other quality checking tools, but correlation is a quick way to identify highly correlated return streams.
Create FUNCTION [dbo].[ProgramCorrelation] ( @programId1 int, @programId2 int ) RETURNS decimal(8,4) AS BEGIN DECLARE @count int DECLARE @sumxy float DECLARE @sumx float DECLARE @sumx2 float DECLARE @sumy float DECLARE @sumy2 float DECLARE @temp TABLE ( date date, ror1 numeric(8,6), ror2 numeric(8,6) ) -- Create temp table with ror data to speed up calculations INSERT INTO @temp SELECT A.date, A.ror, B.ror FROM dbo.ManagerRoRAUM A INNER JOIN dbo.ManagerRoRAUM B on A.date=b.date WHERE A.id= @programId1 AND B.id=@programId2 -- temp variables SELECT @count = COUNT(*) FROM @temp SELECT @sumxy = sum(ror1*ror2) FROM @temp SELECT @sumx = sum(ror1) FROM @temp SELECT @sumy = sum(ror2) FROM @temp SELECT @sumx2 = sum(ror1*ror1) FROM @temp SELECT @sumy2 = sum(ror2*ror2) FROM @temp -- Pearson correlation formula if @count <12 RETURN -98 IF (SQRT( (@count*@sumx2 -@sumx*@sumx)*(@count*@sumy2 -@sumy*@sumy)) <> 0) RETURN (@count*@sumxy-@sumx*@sumy)/(SQRT( (@count*@sumx2 -@sumx*@sumx)*(@count*@sumy2 -@sumy*@sumy))) RETURN -99 -- Catch divide by zero END
The function is pretty straightforward. It fetches synced data from a table and then it applies the Pearson correlation formula and produces a correlation number. Note that if there are fewer than twelve months of data, the function returns an error.
It uses a temporary ror table to select the returns from the database. This works better than to select two return series from a large database table.