Hi all,
I have a matrix S[N1][N2] and I need to compute a new matriz C[N2][N2] so with elements:
C[i][j] = 1/(N1-1)*sum( (S[k][i]-P[i]) * (S[k][j] – P[j]) ) where sum runs from k=0 .. N1
and P[i] = 1/N1*sum(k,i) where k: 0 ..N1
I wrote the following function to:
void CoVar(float**C, float ** S, int N2, int N1)
{
float* Y = new float[N2];
for (i=0; i<N2; i++)
Y[i] =Y[i]/N1;
float** S, **C; //Matrices already initializedinitialized
for (i=0; i < N2; i++)
{
for (j=0; j<N2; j++)
{
double dSum = 0.0;
for (k=0; k<N1; k++)
{
double d1 = S[k][i] - Y[i];
double d2 = S[k][j] - Y[j];
dSum += d1 * d2;
}
dSum = dSum/(N1 - 1.0);
C[i][j] = dSum;
}
}
delete[] Y;
}
This works fine, however when N2 and N1 are large (e.g. 2048), the algorithm is quite slow. The computational complexity in terms of the number of ploating point operations scales with N2*N2*N1, yet I read somewhere that it should be N2*N2*N1/2.
My question is the following: Is there a way to speed this algorithm up?
I would really appreciate any optimization advice to make this algorithm to work faster.
Thanks,
elito
for (i=0; i < N2; i++)
{
for (j=i; j<N2; j++) // ->Changed this
{
double dSum = 0.0;
for (k=0; k<N1; k++)
{
double d1 = S[k][i] - Y[i];
double d2 = S[k][j] - Y[j];
dSum += d1 * d2;
}
dSum = dSum/(N1 - 1.0);
C[i][j] = C[j][i] = dSum; //Changed this
}
}
Now the number of operation is proportional to N2*N2*N1/2. this makes the program 50% faster.
More ideas?