Sign up to receive Decoded, a new monthly digest with product updates, feature release info, continuing education opportunities, and more.

Hi all,

I have a matrix S[N1][N2] and I need to compute a new matriz C[N2][N2] so with elements:

C[i][j] = 1/(N1-1)*sum( (S[k][i]-P[i]) * (S[k][j] – P[j]) ) where sum runs from k=0 .. N1

and P[i] = 1/N1*sum(k,i) where k: 0 ..N1

I wrote the following function to:

void CoVar(float**C, float ** S, int N2, int N1)

{

float* Y = new float[N2];

for (i=0; i<N2; i++)

Y[i] =Y[i]/N1;

float** S, **C; //Matrices already initializedinitialized

for (i=0; i < N2; i++)

{

for (j=0; j<N2; j++)

{

double dSum = 0.0;

for (k=0; k<N1; k++)

{

double d1 = S[k][i] - Y[i];

double d2 = S[k][j] - Y[j];

dSum += d1 * d2;

}

dSum = dSum/(N1 - 1.0);

C[i][j] = dSum;

}

}

delete[] Y;

}

This works fine, however when N2 and N1 are large (e.g. 2048), the algorithm is quite slow. The computational complexity in terms of the number of ploating point operations scales with N2*N2*N1, yet I read somewhere that it should be N2*N2*N1/2.

My question is the following: Is there a way to speed this algorithm up?

I would really appreciate any optimization advice to make this algorithm to work faster.

Thanks,

elito

I have a matrix S[N1][N2] and I need to compute a new matriz C[N2][N2] so with elements:

C[i][j] = 1/(N1-1)*sum( (S[k][i]-P[i]) * (S[k][j] – P[j]) ) where sum runs from k=0 .. N1

and P[i] = 1/N1*sum(k,i) where k: 0 ..N1

I wrote the following function to:

void CoVar(float**C, float ** S, int N2, int N1)

{

float* Y = new float[N2];

for (i=0; i<N2; i++)

Y[i] =Y[i]/N1;

float** S, **C; //Matrices already initializedinitialized

for (i=0; i < N2; i++)

{

for (j=0; j<N2; j++)

{

double dSum = 0.0;

for (k=0; k<N1; k++)

{

double d1 = S[k][i] - Y[i];

double d2 = S[k][j] - Y[j];

dSum += d1 * d2;

}

dSum = dSum/(N1 - 1.0);

C[i][j] = dSum;

}

}

delete[] Y;

}

This works fine, however when N2 and N1 are large (e.g. 2048), the algorithm is quite slow. The computational complexity in terms of the number of ploating point operations scales with N2*N2*N1, yet I read somewhere that it should be N2*N2*N1/2.

My question is the following: Is there a way to speed this algorithm up?

I would really appreciate any optimization advice to make this algorithm to work faster.

Thanks,

elito

Experts Exchange Solution brought to you by

Enjoy your complimentary solution view.

Get every solution instantly with premium.
Start your 7-day free trial.

I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

double dSum;

for (i=0; i < N2; i++)

{

for (j=i; j<N2; j++) {

dSum = 0.0;

for (k=0; k<N1; k++)

dSum += (S[k][i] - Y[i]) * (d2 = S[k][j] - Y[j]);

C[i][j] = C[j][i] = dSum/(N1 - 1.0);

}

}

Also if you use float instead of double you will get faster.

for (i=0; i < N2; i++)

{

for (j=i; j<N2; j++) {

dSum = 0.0;

for (k=0; k<N1; k++)

dSum += (S[k][i] - Y[i]) * (S[k][j] - Y[j]);

C[i][j] = C[j][i] = dSum/(N1 - 1.0);

}

}

My version doesn't reduce iteration times, but reduces computation time by sparing some variables.

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trialY[i] and Y[j] are constant throughout the innermost loop, anyway.

for (i=0; i < N2; i++){

for (k=0; k<N1; k++){

SP[k][i] = S[k][i] - Y[i];

}

}

for (i=0; i < N2; i++)

{

for (j=i; j<N2; j++) {

dSum = 0.0;

for (k=0; k<N1; k++)

dSum += SP[k][i] * SP[k][j];

C[i][j] = C[j][i] = dSum/(N1 - 1.0);

}

}

An iterative algorithm (say a for loop from 0 to 'n') will always be in 'n' time. If you have one nested loop, that will be n*n, or n^2 time. If you have a third nested loop, it in turn will be n^3 power . This holds true for n^x where 0 <= x.

If you have 100 versions of the same algorithm, and they are all n^3 time, they are all equally efficient. It does not matter if you "sparing some variables." or not. n^3 is n^3 is n^3. :)

About making your algorithm any more efficient, until you either drop the iterations (which isn't going to happen unless). It could happen if you rethink the way of solving your problem and finding a way to do what you need without 3 for loops.

You could do a few little things that maybe the compiler didnt optimize already:

Instead of dividing by N1, multiply by a (precomputed) 1/N1.

Factor some of the invariants out of the inner loops, like Y[i] and Y[j].

Step through the arrays with a pointer instead of the double subscripts, something like:

double * Ski; double * skj;

Ski = &S[k][i]; Skj = &S[k][j];

dSum = 0.0;

for (k=0; k<N1; k++, Ski += N1, Skj += N1)

dSum += (* Ski) * (* Skj);

C++

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Experts Exchange Solution brought to you by

Enjoy your complimentary solution view.

Get every solution instantly with premium.
Start your 7-day free trial.

for (i=0; i < N2; i++)

{

for (j=i; j<N2; j++) // ->Changed this

{

double dSum = 0.0;

for (k=0; k<N1; k++)

{

double d1 = S[k][i] - Y[i];

double d2 = S[k][j] - Y[j];

dSum += d1 * d2;

}

dSum = dSum/(N1 - 1.0);

C[i][j] = C[j][i] = dSum; //Changed this

}

}

Now the number of operation is proportional to N2*N2*N1/2. this makes the program 50% faster.

More ideas?