I have a matrix S[N1][N2] and I need to compute a new matriz C[N2][N2] so with elements:
C[i][j] = 1/(N1-1)*sum( (S[k][i]-P[i]) * (S[k][j] – P[j]) ) where sum runs from k=0 .. N1
and P[i] = 1/N1*sum(k,i) where k: 0 ..N1
I wrote the following function to:
void CoVar(float**C, float ** S, int N2, int N1)
float* Y = new float[N2];
for (i=0; i<N2; i++)
float** S, **C; //Matrices already initializedinitialized
for (i=0; i < N2; i++)
for (j=0; j<N2; j++)
double dSum = 0.0;
for (k=0; k<N1; k++)
double d1 = S[k][i] - Y[i];
double d2 = S[k][j] - Y[j];
dSum += d1 * d2;
dSum = dSum/(N1 - 1.0);
C[i][j] = dSum;
This works fine, however when N2 and N1 are large (e.g. 2048), the algorithm is quite slow. The computational complexity in terms of the number of ploating point operations scales with N2*N2*N1, yet I read somewhere that it should be N2*N2*N1/2.
My question is the following: Is there a way to speed this algorithm up?
I would really appreciate any optimization advice to make this algorithm to work faster.