Go Premium for a chance to win a PS4. Enter to Win

x
?
Solved

Help to optimize a simple algorithm

Posted on 2004-10-21
13
Medium Priority
?
197 Views
Last Modified: 2010-04-01
Hi all,

I have a matrix S[N1][N2] and I need to compute a new matriz C[N2][N2] so with elements:
                     

C[i][j] = 1/(N1-1)*sum( (S[k][i]-P[i]) * (S[k][j] – P[j]) ) where sum runs from k=0 .. N1

and P[i] = 1/N1*sum(k,i)  where k: 0 ..N1

I wrote the following function to:

void CoVar(float**C, float ** S, int N2, int N1)
{
      float* Y = new float[N2];
      for (i=0; i<N2; i++)
            Y[i] =Y[i]/N1;
      
      float** S, **C; //Matrices already initializedinitialized
      for (i=0; i < N2; i++)
      {
            for (j=0; j<N2; j++)
            {
                  double dSum = 0.0;
                  for (k=0; k<N1; k++)
                  {
                        double d1 = S[k][i] - Y[i];
                        double d2 = S[k][j] - Y[j];
                        dSum += d1 * d2;
                  }
                  dSum = dSum/(N1 - 1.0);
                  C[i][j] = dSum;
            }      
      }
      
      delete[] Y;
}


This works fine, however when N2 and N1 are large (e.g. 2048), the algorithm is quite slow. The computational complexity in terms of the number of ploating point operations scales with N2*N2*N1, yet I read somewhere that it should be N2*N2*N1/2.
My question is the following: Is there a way to speed this algorithm up?
I would really appreciate any optimization advice to make this algorithm to work faster.
Thanks,
elito
0
Comment
Question by:elito
  • 3
  • 2
  • 2
  • +3
13 Comments
 

Author Comment

by:elito
ID: 12376832
OK, I found that as C[][] will be a symmetric matrix, I can introduce the following modifications:

for (i=0; i < N2; i++)
     {
          for (j=i; j<N2; j++)   // ->Changed this
          {
               double dSum = 0.0;
               for (k=0; k<N1; k++)
               {
                    double d1 = S[k][i] - Y[i];
                    double d2 = S[k][j] - Y[j];
                    dSum += d1 * d2;
               }
               dSum = dSum/(N1 - 1.0);
               C[i][j] = C[j][i] = dSum;  //Changed this
          }    
     }

Now the number of operation is proportional to N2*N2*N1/2. this makes the program 50% faster.
More ideas?
0
 
LVL 55

Expert Comment

by:Jaime Olivares
ID: 12376921
Try with:

double dSum;

for (i=0; i < N2; i++)
     {
          for (j=i; j<N2; j++) {
               dSum = 0.0;
               for (k=0; k<N1; k++)
                    dSum += (S[k][i] - Y[i]) * (d2 = S[k][j] - Y[j]);
               C[i][j] = C[j][i] = dSum/(N1 - 1.0);
          }    
     }

Also if you use float instead of double you will get faster.
0
 
LVL 4

Expert Comment

by:nagraves
ID: 12377042
Do a time analysis on all three of those, and you'll find they are all in fact going to be done in n^3 time. As far as the algorithm goes, all three of your answers will be equally fast.
0
Concerto Cloud for Software Providers & ISVs

Can Concerto Cloud Services help you focus on evolving your application offerings, while delivering the best cloud experience to your customers? From DevOps to revenue models and customer support, the answer is yes!

Learn how Concerto can help you.

 
LVL 12

Expert Comment

by:stefan73
ID: 12382136
Hi nagraves,
Their order of magnitude is the same, not their runtime.

Cheers!

Stefan
0
 
LVL 55

Accepted Solution

by:
Jaime Olivares earned 256 total points
ID: 12382232
Sorry, there is a mistake in my code, must be:

for (i=0; i < N2; i++)
     {
          for (j=i; j<N2; j++) {
               dSum = 0.0;
               for (k=0; k<N1; k++)
                    dSum += (S[k][i] - Y[i]) * (S[k][j] - Y[j]);
               C[i][j] = C[j][i] = dSum/(N1 - 1.0);
          }    
     }

My version doesn't reduce iteration times, but reduces computation time by sparing some variables.
0
 
LVL 12

Expert Comment

by:stefan73
ID: 12382516
A simple speedup would be to create an SP[N1][N2] matrix first, each element containing S[k][i]-Y[i].
Y[i] and Y[j] are constant throughout the innermost loop, anyway.

0
 
LVL 12

Assisted Solution

by:stefan73
stefan73 earned 248 total points
ID: 12382591
Like:

for (i=0; i < N2; i++){
   for (k=0; k<N1; k++){
        SP[k][i] = S[k][i] - Y[i];
   }
}
   

for (i=0; i < N2; i++)
     {
          for (j=i; j<N2; j++) {
               dSum = 0.0;
               for (k=0; k<N1; k++)
                    dSum += SP[k][i] * SP[k][j];
               C[i][j] = C[j][i] = dSum/(N1 - 1.0);
          }    
     }
0
 
LVL 4

Assisted Solution

by:nagraves
nagraves earned 248 total points
ID: 12384769
To measure the efficiency of the algorithm, you take the basic operation and then determine how many times it will occur.

An iterative algorithm (say a for loop from 0 to 'n') will always be in 'n' time. If you have one nested loop, that will be n*n, or n^2 time.  If you have a third nested loop, it in turn will be n^3 power . This holds true for n^x where 0 <= x.

If you have 100 versions of the same algorithm, and they are all n^3 time, they are all equally efficient. It does not matter if you "sparing some variables." or not. n^3 is n^3 is n^3. :)

About making your algorithm any more efficient, until you either drop the iterations (which isn't going to happen unless). It could happen if you rethink the way of solving your problem and finding a way to do what you need without 3 for loops.
0
 
LVL 22

Assisted Solution

by:grg99
grg99 earned 248 total points
ID: 12385960
You have triply-nested loops, so you're kinda stuck with N^3 behavior.

You could  do a few little things that maybe the compiler didnt optimize already:

Instead of dividing by N1, multiply by a (precomputed) 1/N1.

Factor some of the invariants out of the inner loops, like Y[i] and Y[j].

Step through the arrays with a pointer instead of the double subscripts, something like:

double * Ski;  double * skj;

Ski = &S[k][i];  Skj = &S[k][j];
dSum = 0.0;
               for (k=0; k<N1; k++, Ski += N1, Skj += N1)
                    dSum +=  (* Ski)  *   (* Skj);





0
 
LVL 2

Expert Comment

by:MikeAThon
ID: 12387366
If your S and/or P matrices are sparse, then you might yield significant time savings by testing for a zero value before multiplying
0

Featured Post

Hire Technology Freelancers with Gigs

Work with freelancers specializing in everything from database administration to programming, who have proven themselves as experts in their field. Hire the best, collaborate easily, pay securely, and get projects done right.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

  Included as part of the C++ Standard Template Library (STL) is a collection of generic containers. Each of these containers serves a different purpose and has different pros and cons. It is often difficult to decide which container to use and …
Container Orchestration platforms empower organizations to scale their apps at an exceptional rate. This is the reason numerous innovation-driven companies are moving apps to an appropriated datacenter wide platform that empowers them to scale at a …
The viewer will learn how to pass data into a function in C++. This is one step further in using functions. Instead of only printing text onto the console, the function will be able to perform calculations with argumentents given by the user.
The viewer will be introduced to the technique of using vectors in C++. The video will cover how to define a vector, store values in the vector and retrieve data from the values stored in the vector.

916 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question