SQL – Correlation Calculation

Regrettably SQL Server does not come with a built in Correlation calculation function, but any data analyst with self respect has written a correlation function (alternative we can move the data to R or similar tools).

Below, we are sharing the correlation function that NilssonHedge is using.

Create FUNCTION [dbo].[ProgramCorrelation]
(
	@programId1 int,
	@programId2 int
)
RETURNS decimal(8,4)
AS
BEGIN
	
	DECLARE @count int
	DECLARE @sumxy float
	DECLARE @sumx float
	DECLARE @sumx2 float
	
	DECLARE @sumy float
	DECLARE @sumy2 float
	

	DECLARE  @temp TABLE (
		date	date,
		ror1	numeric(8,6),
		ror2	numeric(8,6)
	)	
	
	-- Create temp table with ror data to speed up calculations	
	INSERT INTO @temp 
		SELECT A.date, A.ror, B.ror
		FROM dbo.ManagerRoRAUM A
		INNER JOIN dbo.ManagerRoRAUM B on A.date=b.date
		WHERE A.id= @programId1 AND B.id=@programId2
	-- temp variables
	SELECT @count =  COUNT(*) FROM @temp
	SELECT @sumxy = sum(ror1*ror2) FROM @temp
	SELECT @sumx = sum(ror1) FROM @temp
	SELECT @sumy = sum(ror2) FROM @temp
	SELECT @sumx2 = sum(ror1*ror1) FROM @temp
	SELECT @sumy2 = sum(ror2*ror2) FROM @temp
	-- Pearson correlation formula
	if @count <12
		RETURN -98
	IF (SQRT( (@count*@sumx2 -@sumx*@sumx)*(@count*@sumy2 -@sumy*@sumy)) <> 0) 
			RETURN (@count*@sumxy-@sumx*@sumy)/(SQRT( (@count*@sumx2 -@sumx*@sumx)*(@count*@sumy2 -@sumy*@sumy)))
	RETURN -99 -- Catch divide by zero
END

The function is pretty straightforward. It fetches synced data from a table and then it applies the Pearson correlation formula.

%d bloggers like this: