• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 199
  • Last Modified:

Help to optimize a simple algorithm

Hi all,

I have a matrix S[N1][N2] and I need to compute a new matriz C[N2][N2] so with elements:
                     

C[i][j] = 1/(N1-1)*sum( (S[k][i]-P[i]) * (S[k][j] – P[j]) ) where sum runs from k=0 .. N1

and P[i] = 1/N1*sum(k,i)  where k: 0 ..N1

I wrote the following function to:

void CoVar(float**C, float ** S, int N2, int N1)
{
      float* Y = new float[N2];
      for (i=0; i<N2; i++)
            Y[i] =Y[i]/N1;
      
      float** S, **C; //Matrices already initializedinitialized
      for (i=0; i < N2; i++)
      {
            for (j=0; j<N2; j++)
            {
                  double dSum = 0.0;
                  for (k=0; k<N1; k++)
                  {
                        double d1 = S[k][i] - Y[i];
                        double d2 = S[k][j] - Y[j];
                        dSum += d1 * d2;
                  }
                  dSum = dSum/(N1 - 1.0);
                  C[i][j] = dSum;
            }      
      }
      
      delete[] Y;
}


This works fine, however when N2 and N1 are large (e.g. 2048), the algorithm is quite slow. The computational complexity in terms of the number of ploating point operations scales with N2*N2*N1, yet I read somewhere that it should be N2*N2*N1/2.
My question is the following: Is there a way to speed this algorithm up?
I would really appreciate any optimization advice to make this algorithm to work faster.
Thanks,
elito
0
elito
Asked:
elito
  • 3
  • 2
  • 2
  • +3
4 Solutions
 
elitoAuthor Commented:
OK, I found that as C[][] will be a symmetric matrix, I can introduce the following modifications:

for (i=0; i < N2; i++)
     {
          for (j=i; j<N2; j++)   // ->Changed this
          {
               double dSum = 0.0;
               for (k=0; k<N1; k++)
               {
                    double d1 = S[k][i] - Y[i];
                    double d2 = S[k][j] - Y[j];
                    dSum += d1 * d2;
               }
               dSum = dSum/(N1 - 1.0);
               C[i][j] = C[j][i] = dSum;  //Changed this
          }    
     }

Now the number of operation is proportional to N2*N2*N1/2. this makes the program 50% faster.
More ideas?
0
 
Jaime OlivaresSoftware ArchitectCommented:
Try with:

double dSum;

for (i=0; i < N2; i++)
     {
          for (j=i; j<N2; j++) {
               dSum = 0.0;
               for (k=0; k<N1; k++)
                    dSum += (S[k][i] - Y[i]) * (d2 = S[k][j] - Y[j]);
               C[i][j] = C[j][i] = dSum/(N1 - 1.0);
          }    
     }

Also if you use float instead of double you will get faster.
0
 
nagravesCommented:
Do a time analysis on all three of those, and you'll find they are all in fact going to be done in n^3 time. As far as the algorithm goes, all three of your answers will be equally fast.
0
Never miss a deadline with monday.com

The revolutionary project management tool is here!   Plan visually with a single glance and make sure your projects get done.

 
stefan73Commented:
Hi nagraves,
Their order of magnitude is the same, not their runtime.

Cheers!

Stefan
0
 
Jaime OlivaresSoftware ArchitectCommented:
Sorry, there is a mistake in my code, must be:

for (i=0; i < N2; i++)
     {
          for (j=i; j<N2; j++) {
               dSum = 0.0;
               for (k=0; k<N1; k++)
                    dSum += (S[k][i] - Y[i]) * (S[k][j] - Y[j]);
               C[i][j] = C[j][i] = dSum/(N1 - 1.0);
          }    
     }

My version doesn't reduce iteration times, but reduces computation time by sparing some variables.
0
 
stefan73Commented:
A simple speedup would be to create an SP[N1][N2] matrix first, each element containing S[k][i]-Y[i].
Y[i] and Y[j] are constant throughout the innermost loop, anyway.

0
 
stefan73Commented:
Like:

for (i=0; i < N2; i++){
   for (k=0; k<N1; k++){
        SP[k][i] = S[k][i] - Y[i];
   }
}
   

for (i=0; i < N2; i++)
     {
          for (j=i; j<N2; j++) {
               dSum = 0.0;
               for (k=0; k<N1; k++)
                    dSum += SP[k][i] * SP[k][j];
               C[i][j] = C[j][i] = dSum/(N1 - 1.0);
          }    
     }
0
 
nagravesCommented:
To measure the efficiency of the algorithm, you take the basic operation and then determine how many times it will occur.

An iterative algorithm (say a for loop from 0 to 'n') will always be in 'n' time. If you have one nested loop, that will be n*n, or n^2 time.  If you have a third nested loop, it in turn will be n^3 power . This holds true for n^x where 0 <= x.

If you have 100 versions of the same algorithm, and they are all n^3 time, they are all equally efficient. It does not matter if you "sparing some variables." or not. n^3 is n^3 is n^3. :)

About making your algorithm any more efficient, until you either drop the iterations (which isn't going to happen unless). It could happen if you rethink the way of solving your problem and finding a way to do what you need without 3 for loops.
0
 
grg99Commented:
You have triply-nested loops, so you're kinda stuck with N^3 behavior.

You could  do a few little things that maybe the compiler didnt optimize already:

Instead of dividing by N1, multiply by a (precomputed) 1/N1.

Factor some of the invariants out of the inner loops, like Y[i] and Y[j].

Step through the arrays with a pointer instead of the double subscripts, something like:

double * Ski;  double * skj;

Ski = &S[k][i];  Skj = &S[k][j];
dSum = 0.0;
               for (k=0; k<N1; k++, Ski += N1, Skj += N1)
                    dSum +=  (* Ski)  *   (* Skj);





0
 
MikeAThonCommented:
If your S and/or P matrices are sparse, then you might yield significant time savings by testing for a zero value before multiplying
0

Featured Post

The new generation of project management tools

With monday.com’s project management tool, you can see what everyone on your team is working in a single glance. Its intuitive dashboards are customizable, so you can create systems that work for you.

  • 3
  • 2
  • 2
  • +3
Tackle projects and never again get stuck behind a technical roadblock.
Join Now