Skip to content

SQL – Correlation Calculation

Regrettably SQL Server does not come with a built in Correlation calculation function, but any data analyst with self respect has written a correlation function (alternative we can move the data to R or similar tools).

Below, we are sharing the correlation function that NilssonHedge is using. The function below is free to reuse, if you find it useful. This function is one of the tools we use to identify diverging or converging programs. We use a number of other quality checking tools, but correlation is a quick way to identify highly correlated return streams.

Create FUNCTION [dbo].[ProgramCorrelation]
	@programId1 int,
	@programId2 int
RETURNS decimal(8,4)
	DECLARE @count int
	DECLARE @sumxy float
	DECLARE @sumx float
	DECLARE @sumx2 float
	DECLARE @sumy float
	DECLARE @sumy2 float

		date	date,
		ror1	numeric(8,6),
		ror2	numeric(8,6)
	-- Create temp table with ror data to speed up calculations	
		SELECT, A.ror, B.ror
		FROM dbo.ManagerRoRAUM A
		INNER JOIN dbo.ManagerRoRAUM B on
		WHERE @programId1 AND
	-- temp variables
	SELECT @count =  COUNT(*) FROM @temp
	SELECT @sumxy = sum(ror1*ror2) FROM @temp
	SELECT @sumx = sum(ror1) FROM @temp
	SELECT @sumy = sum(ror2) FROM @temp
	SELECT @sumx2 = sum(ror1*ror1) FROM @temp
	SELECT @sumy2 = sum(ror2*ror2) FROM @temp
	-- Pearson correlation formula
	if @count <12
		RETURN -98
	IF (SQRT( (@count*@sumx2 -@sumx*@sumx)*(@count*@sumy2 -@sumy*@sumy)) <> 0) 
			RETURN (@count*@sumxy-@sumx*@sumy)/(SQRT( (@count*@sumx2 -@sumx*@sumx)*(@count*@sumy2 -@sumy*@sumy)))
	RETURN -99 -- Catch divide by zero

The function is pretty straightforward. It fetches synced data from a table and then it applies the Pearson correlation formula and produces a correlation number. Note that if there are fewer than twelve months of data, the function returns an error.

It uses a temporary ror table to select the returns from the database. This works better than to select two return series from a large database table.