elito
asked on
Help to optimize a simple algorithm
Hi all,
I have a matrix S[N1][N2] and I need to compute a new matriz C[N2][N2] so with elements:
C[i][j] = 1/(N1-1)*sum( (S[k][i]-P[i]) * (S[k][j] – P[j]) ) where sum runs from k=0 .. N1
and P[i] = 1/N1*sum(k,i) where k: 0 ..N1
I wrote the following function to:
void CoVar(float**C, float ** S, int N2, int N1)
{
float* Y = new float[N2];
for (i=0; i<N2; i++)
Y[i] =Y[i]/N1;
float** S, **C; //Matrices already initializedinitialized
for (i=0; i < N2; i++)
{
for (j=0; j<N2; j++)
{
double dSum = 0.0;
for (k=0; k<N1; k++)
{
double d1 = S[k][i] - Y[i];
double d2 = S[k][j] - Y[j];
dSum += d1 * d2;
}
dSum = dSum/(N1 - 1.0);
C[i][j] = dSum;
}
}
delete[] Y;
}
This works fine, however when N2 and N1 are large (e.g. 2048), the algorithm is quite slow. The computational complexity in terms of the number of ploating point operations scales with N2*N2*N1, yet I read somewhere that it should be N2*N2*N1/2.
My question is the following: Is there a way to speed this algorithm up?
I would really appreciate any optimization advice to make this algorithm to work faster.
Thanks,
elito
I have a matrix S[N1][N2] and I need to compute a new matriz C[N2][N2] so with elements:
C[i][j] = 1/(N1-1)*sum( (S[k][i]-P[i]) * (S[k][j] – P[j]) ) where sum runs from k=0 .. N1
and P[i] = 1/N1*sum(k,i) where k: 0 ..N1
I wrote the following function to:
void CoVar(float**C, float ** S, int N2, int N1)
{
float* Y = new float[N2];
for (i=0; i<N2; i++)
Y[i] =Y[i]/N1;
float** S, **C; //Matrices already initializedinitialized
for (i=0; i < N2; i++)
{
for (j=0; j<N2; j++)
{
double dSum = 0.0;
for (k=0; k<N1; k++)
{
double d1 = S[k][i] - Y[i];
double d2 = S[k][j] - Y[j];
dSum += d1 * d2;
}
dSum = dSum/(N1 - 1.0);
C[i][j] = dSum;
}
}
delete[] Y;
}
This works fine, however when N2 and N1 are large (e.g. 2048), the algorithm is quite slow. The computational complexity in terms of the number of ploating point operations scales with N2*N2*N1, yet I read somewhere that it should be N2*N2*N1/2.
My question is the following: Is there a way to speed this algorithm up?
I would really appreciate any optimization advice to make this algorithm to work faster.
Thanks,
elito
Try with:
double dSum;
for (i=0; i < N2; i++)
{
for (j=i; j<N2; j++) {
dSum = 0.0;
for (k=0; k<N1; k++)
dSum += (S[k][i] - Y[i]) * (d2 = S[k][j] - Y[j]);
C[i][j] = C[j][i] = dSum/(N1 - 1.0);
}
}
Also if you use float instead of double you will get faster.
double dSum;
for (i=0; i < N2; i++)
{
for (j=i; j<N2; j++) {
dSum = 0.0;
for (k=0; k<N1; k++)
dSum += (S[k][i] - Y[i]) * (d2 = S[k][j] - Y[j]);
C[i][j] = C[j][i] = dSum/(N1 - 1.0);
}
}
Also if you use float instead of double you will get faster.
Do a time analysis on all three of those, and you'll find they are all in fact going to be done in n^3 time. As far as the algorithm goes, all three of your answers will be equally fast.
Hi nagraves,
Their order of magnitude is the same, not their runtime.
Cheers!
Stefan
Their order of magnitude is the same, not their runtime.
Cheers!
Stefan
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
A simple speedup would be to create an SP[N1][N2] matrix first, each element containing S[k][i]-Y[i].
Y[i] and Y[j] are constant throughout the innermost loop, anyway.
Y[i] and Y[j] are constant throughout the innermost loop, anyway.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
If your S and/or P matrices are sparse, then you might yield significant time savings by testing for a zero value before multiplying
ASKER
for (i=0; i < N2; i++)
{
for (j=i; j<N2; j++) // ->Changed this
{
double dSum = 0.0;
for (k=0; k<N1; k++)
{
double d1 = S[k][i] - Y[i];
double d2 = S[k][j] - Y[j];
dSum += d1 * d2;
}
dSum = dSum/(N1 - 1.0);
C[i][j] = C[j][i] = dSum; //Changed this
}
}
Now the number of operation is proportional to N2*N2*N1/2. this makes the program 50% faster.
More ideas?