Want to win a PS4? Go Premium and enter to win our High-Tech Treats giveaway. Enter to Win

x
?
Solved

Code optimization

Posted on 2004-10-12
21
Medium Priority
?
252 Views
Last Modified: 2010-05-18
Dear all,

I have been trying to optimize the following code in assembly or sse/mmx.
Can anyone help?
THanks

With Beat Regards
HCK


typedef struct PslRbpFeature {
      int noOfBlocks;  
      int* points;          
      int* specs;  
        int** sumPtr;
      double** sqSumPtr;
      float invArea;

      int refWeight;
      int id;  
      int winWidth;
      int winHeight;
 
      //precalculation
      float normStFactor;
}PslRbpFeature;



int evalRbpFeature(
   PslRbpFeature* rb,
   int offset,
   float* strength
){
      *strength=0.0f;

      float strength_plus=0.0f;
      float strength_minus=0.0f;
      int count_plus=0;
      int count_minus=0;

      int** sumPtr0=rb->sumPtr ;
      int** sumPtr1=sumPtr0+1;
      int** sumPtr2=sumPtr0+2;
      int** sumPtr3=sumPtr0+3;
      int* p0 = *sumPtr0 +offset;
      int* p1 = *sumPtr1 +offset;
      int* p2 = *sumPtr2 +offset;
      int* p3 = *sumPtr3 +offset;
      int sRef =   *p0  -  *p1   -  *p2  +  *p3 ;

      int N=rb->noOfBlocks-1;
      strength_plus=0.0;//sRef;
      count_plus=N;
      sRef/=N;  

      int key=0;      
      for(int n=0; n<N; n++){
      
             sumPtr0 +=4;
             sumPtr1 +=4;
             sumPtr2 +=4;
             sumPtr3 +=4;
             p0=*sumPtr0 +offset;
             p1=*sumPtr1 +offset;
             p2=*sumPtr2 +offset;
               p3=*sumPtr3 +offset;
             int s = *p0 -  *p1   -  *p2  +  *p3 ;
             s -= sRef;
             if(s<0) {
                  strength_minus += s;
                  count_minus++;
                  key |=   1  << n ;
             }
             
      }
      if(!key) return key;
 
        strength_plus  -= strength_minus;
      count_plus -= count_minus;
      
      *strength=strength_plus/count_plus-strength_minus/count_minus;
      *strength *= rb->normStFactor;
      return key;
}
0
Comment
Question by:hengck23
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 8
  • 7
  • 6
21 Comments
 
LVL 46

Assisted Solution

by:Kent Olsen
Kent Olsen earned 750 total points
ID: 12294592
Hi hengck23,

Several pointers.

The C compiler optimizers generally do a very good job of generating fast object code.  It's unlikely that you're going to beat this by more than just a few percent.

Indirect references chew up cycles.  Worse, they chew up cycles waiting for memory which are generally among the slowest instructions.  (There is a lot of difference in the number of instructions that AMD 850 and an AMD 2800 can execute in a fixed time interval as long as the operands are within the stack.  Reference memory and you'll find that they all wait at the same speed.)

Don't create stack variables within the function.  Every pass through the for() loop you modify the stack parameters by adding and deleting variables.  This chews up more cycles.  Create them as function variables instead of stack variables.


Good Luck!
Kent
0
 
LVL 16

Accepted Solution

by:
PaulCaswell earned 750 total points
ID: 12295672
I have a few suggestions:

1. Use as few local variables as possible. Loads of extra parameters stops the compiler using registers efficiently.

Try to remove all those sumPtrs and just use one and [].
Try to remove the p0/1/2/3 stuff. If you can get it down to offsets from sumPtrs even better.

2. Try to avoid floating point maths, especially inside loops.

You've got
>>strength_minus += s;

in the loop and the only reason its a float is because 'strength' is a float. If you make 'strength_minus' an int you can work out the maths at the end.

3. Break the process up into smaller parts. Often, the overhead of a function call is insignificant compared to the extra speed the compiler can gain by using just registers.

I'd suggest you understand exactly what the code is doing. It looks like it is running through an array of values and whenever a bit of maths generates a negative result you keep track of it. You then generate a final result called strength using some simple maths.

Try coding the loop in a separate function using as few variables as possible, then perform the maths.

4. Evaluate the current speed of the function before you start optimising. This will give you a good indication of when to stop and teach you what works best in your environment.

Paul


0
 

Author Comment

by:hengck23
ID: 12295969
Dear all,

Based on the comments, I have modified my code to the below. There is an improvement of speed from 1.03 sec to 0.93 sec. However, it is still not fast enough. Is there anymore suggestions? Thanks.


static float pslReciprocalTable[] = {
      0.0f,
      1.0f/1.0f,             1.0f/2.0f,             1.0f/3.0f,
      1.0f/4.0f,             1.0f/5.0f,             1.0f/6.0f,
      1.0f/7.0f,             1.0f/8.0f,             1.0f/9.0f,
      1.0f/10.0f,       1.0f/11.0f,       1.0f/12.0f,
};

int evalRbpFeature2(
   PslRbpFeature* rb,
   int offset,
   float* strength
){
      *strength=0.0f;
      //..............................................................
 
      float strength_minus=0.0f;
      int count_minus=0;

      int** sumPtr=rb->sumPtr;
       int sRef =    sumPtr[0][offset]
                   -  sumPtr[1][offset]  
                     -  sumPtr[2][offset]  
                     +  sumPtr[3][offset];
      
 
    int N=rb->noOfBlocks-1;       
    sRef/=N;       

      int key=0;
      int s;
      for(int n=0; n<N; n++){
             sumPtr +=4;
           s =    sumPtr[0][offset]
                   -  sumPtr[1][offset]  
                   -  sumPtr[2][offset]  
                   +  sumPtr[3][offset];
             s -= sRef;
             if(s<0) {
                  strength_minus += s;
                  count_minus++;
                  key |=   1  << n ;
             }
             
      }
      if(!key) return key;
 
      *strength = strength_minus*(pslReciprocalTable[N-count_minus]+pslReciprocalTable[count_minus]);
    *strength *= rb->normStFactor;
      return key;
      
}
0
VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

 
LVL 16

Assisted Solution

by:PaulCaswell
PaulCaswell earned 750 total points
ID: 12296000
What range of values does offset and rb->noOfBlocks have? This might give us an idea of what is slowing it down.

Paul
0
 
LVL 16

Assisted Solution

by:PaulCaswell
PaulCaswell earned 750 total points
ID: 12296014
I assume you didnt make 'strenght_minus' an integer for a good reason such as it can go out of range.

What is the idea behind 'key'?

Paul
0
 

Author Comment

by:hengck23
ID: 12296015
Sorry, there is a slight typing error in my last comment. It should have been:

     int  strength_minus=0;
     int count_minus=0;
0
 

Author Comment

by:hengck23
ID: 12296034
rb->noOfBlocks should be between 3 to 9
(for future code, i might increase that to 25).

The function evalRbpFeature2() will be called many times, with different PslRbpFeature* rb and int offset. The output of the function is key and strength.

Offset actually is an image postition. If (x,y) is a point in the iomage, then offset = x+y*width

'strenght_minus' would be unlikely to go  out of range. The largest  value it can get is -20*20*255.

0
 
LVL 16

Assisted Solution

by:PaulCaswell
PaulCaswell earned 750 total points
ID: 12296054
This may be a silly one but see if you get any time improvement with:

        //    sumPtr +=4;
        s =    sumPtr[4*n][offset]
                -  sumPtr[4*n+1][offset]  
                -  sumPtr[4*n+2][offset]  
                +  sumPtr[4*n+3][offset];


Would it be possible to reorganise sumPtr so you can use:

        s =    sumPtr[offset][4*n]
                -  sumPtr[offset][4*n+1]  
                -  sumPtr[offset][4*n+2]  
                +  sumPtr[offset][4*n+3];

this would ensure that the four memory accesses are sequential in memory and would therefor increase the chance of cache hits.

Paul

0
 

Author Comment

by:hengck23
ID: 12296093
>Would it be possible to reorganise sumPtr so you can use:

        s =    sumPtr[offset][4*n]
                -  sumPtr[offset][4*n+1]  
                -  sumPtr[offset][4*n+2]  
                +  sumPtr[offset][4*n+3];

I think it would be difficult.

There is an image data:
int* data;

sumPtr is actually a list of pointers that points to the key postitions in data, when the application point is(0,0) .eg)
sumPtr[0]=&data[ .... coordinte(0,0) ...]
sumPtr[1]=&data[ .... coordinte(5,6) ...]
sumPtr[2]=&data[ .... coordinte(8,9) ...]
....

When I change the application point to (x,y), it would be
new_sumPtr[0]=&data[ .... coordinte(0+x,0+y) ...] =sumPtr[0][offset]
new_sumPtr[1]=&data[ .... coordinte(5+x,6+y) ...] =sumPtr[1][offset]
new_sumPtr[2]=&data[ .... coordinte(8+x,9+y) ...] =sumPtr[2][offset]
0
 
LVL 16

Assisted Solution

by:PaulCaswell
PaulCaswell earned 750 total points
ID: 12296216
>>Offset actually is an image postition. If (x,y) is a point in the iomage, then offset = x+y*width
Is there an outer loop then that's calling this function width*eight times? If so, you might shave a little off that.

Try to track your timings further into the code. Take a time at the start of the function and record how long it takes to get to the end of the loop and then how long the float maths takes. This would allow us to focus on the slowest part.

Is there any pre-calculation you can do while you are building the image? Perhaps you can evaluate 's' and 'key' elswhere and store it in the structure.

Are you sure you have switched on the most appropriate optimisations?

We are getting near the limit here. Have you got a specific figure in mind?

Paul
0
 
LVL 46

Assisted Solution

by:Kent Olsen
Kent Olsen earned 750 total points
ID: 12297130
Hi hengck23,

Since rb->noOfBlocks can be between 3 and 25, you might get a pretty good performance boost by gathering the sumPtr data into a "local" array and basing your calculations on the local array.  If necessary, you can always scatter the changed data in the array back via the pointers in sumPtr, but you don't seem to modify the contents of the array(s).  (I used to work on vector systems where the scatter/gather functions were built into the hardware.  A truly nice feature for this kind of work!)  

This should generate a significant performance increase that only gets better as rb->noOfBlocks increases.  (You'll pay the "penalty" of the indirect reference only once, when you build the temporary array.  Thereafter, you require only one reference to get to the data.)

One thing in the code that really concerns me is the definition of sumPtr.  It's defined as a int**, but you commonly reference it as a two-dimensional array.  This is certainly within the rules of C, but I've never seen it done.  Are you sure that you're evaluating the data that you expect?



Kent
0
 
LVL 16

Expert Comment

by:PaulCaswell
ID: 12297359
Kent,

Perhaps you arent as old as you used to be ;).

int** used to be the only reliable way of defining two-dimensional data because early C compilers had problems with int*[] and int[]. You will see this in the definition of the argv parameter to main in older compilers. This was especially true when both the row address array and the rows themselves are created using malloc. Nowadays, [] is handled much better and is even allowed to appear in structs in some compilers so long as it is at the end.

Strangely, I cant find any evidence to back me up on this.

Paul

0
 
LVL 46

Assisted Solution

by:Kent Olsen
Kent Olsen earned 750 total points
ID: 12297432

Hi Paul,

The int** designation is quite common, but I've never seen anything defined as int** dereferenced as array[x][y].  The classic rules for two dimensional arrays state that the upper limit of *y* must be known so that the offset can be calculated whenever x is non-zero.

In the usage above, I assume that sizeof() the *y* dimension is simply sizeof (int*).  If so, the gather technique that I described should work quite well.

>  Strangely, I cant find any evidence to back me up on this.

My wife says this to me a lot, too.  ;)

0
 

Author Comment

by:hengck23
ID: 12297601
Hi,

int** sumPtr is an arrays of pointers.
The actual data is int* data, which is not shown hewre in the code.
(Although an image is two-dimensional, I am treating it as a one demensional array in int* data)

Each element of sumPtr is actually a pointer that points to int* data.

The code is correct. It works the way I want. This is actually an object detector in image.It is a car detector.
0
 
LVL 16

Expert Comment

by:PaulCaswell
ID: 12297656
Ahhh!! I see where youre going! How arrogant of me to assume it was something simple.

>>sumPtr[0][offset]

You're right, it does look a bit odd. Perhaps it would be clearer coding to use:
>>(sumPtr[0])[offset]

>>My wife says this to me a lot, too.  ;)
Consider yourself lucky. If my wife had no evidence to back her up she'd prove she didnt need any first. ;)

Paul
0
 

Author Comment

by:hengck23
ID: 12297845
(sumPtr[0])[offset] is correct and this is what i am doing!

As an example, cosider:

raw image:
x x x x x
x x x x x
x x x x x

Assume that some kind of processing (eg filtering), i end up with the following data:

2 3 5 6 8
7 8 3 7 0
1 2 0 9 0

Then int* data will be
data[0]=2;
data[1]=3;
data[2]=5;
data[3]=6;
data[4]=8;
data[5]=7;
....
int* data will be fixed.

However, i will need to extract useful information (feature) of data.
This would be in the form of
feature = data[point0] - data[point1]-data[point2]+data[point3];

point0,point1,point2,point3 actually defines a rectangle (we call RECT), within a window.

Assuming
point0=(0,0)
point1=(2,0)
point2=(0,1)
point3=(2,1)

then,
 sumPtr[0]=&data[0] //point0
 sumPtr[1]=&data[2] //point1
 sumPtr[2]=&data[0+1*width] //point2
 sumPtr[3]=&data[2+1*width] //point3

(with offset=0)
feature =sumPtr[0][0]-sumPtr[1][0]-sumPtr[0][0]+sumPtr[2][0];
           =2-5-7+3;

Now imagine RECT is shifted by one unit to the right, then
new_point0=(0+1,0)
new_point1=(2+1,0)
new_point2=(0+1,1)
new_point3=(2+1,1)

(with offset=1)
new_feature =sumPtr[0][1]-sumPtr[1][1]-sumPtr[0][1]+sumPtr[2][1];
           =3-6-8+7;
0
 
LVL 46

Assisted Solution

by:Kent Olsen
Kent Olsen earned 750 total points
ID: 12297855


>sumPtr is actually a list of pointers that points to the key postitions in data, when the application point is(0,0) .eg)
>sumPtr[0]=&data[ .... coordinte(0,0) ...]
>sumPtr[1]=&data[ .... coordinte(5,6) ...]
>sumPtr[2]=&data[ .... coordinte(8,9) ...]

The statement is inaccurate.  sumPtr (defined as int**) is a list of pointer to pointers to the key positions in the data.

Because *offset* doesn't change within the function, you should be able to gather the target data into an int* array.  You pay the double reference penalty only once.  As *N* increases, the savings should become more pronounced.

int *newPtr;

  newPtr = malloc (sizeof (int*) * n->noOfBlocks);
  for (idx = 0; idx < n->noOfBlocks; idx++)
    newPtr[idx] = sumPtr[idx][offset];

/*  perform calculations with newPtr[idx] instead of sumptr[idx][offset]  */

  free (newPtr);



Kent
0
 
LVL 16

Expert Comment

by:PaulCaswell
ID: 12297952
I'd see that as a very worthwhile method to try. You've got the timer to check the results. I wouldnt use malloc/free however, I'd use a static array because noOfBlocks is only likely to go up to about 9.

Paul
0
 
LVL 46

Assisted Solution

by:Kent Olsen
Kent Olsen earned 750 total points
ID: 12298093

True.  I'd thought about putting the newPtr array on the stack, but I think that you're right about it needing to be in static memory.  Since the stated limit of N is 25, the buffer is significantly less than 1K.  Well worth the "wasted" storage for the reduced references.


Kent
0
 

Author Comment

by:hengck23
ID: 12304576
Hi,

Thank you for the responses. I will try to create a static memory and let you know the results maybe tomorrow.
0
 
LVL 46

Expert Comment

by:Kent Olsen
ID: 12304855
Hi hengck23,

If you're going to create a static array, its type should be 'int'.

Kent
0

Featured Post

Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Have you thought about creating an iPhone application (app), but didn't even know where to get started? Here's how: ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ Important pre-programming comments: I’ve never tri…
An Outlet in Cocoa is a persistent reference to a GUI control; it connects a property (a variable) to a control.  For example, it is common to create an Outlet for the text field GUI control and change the text that appears in this field via that Ou…
The goal of this video is to provide viewers with basic examples to understand and use structures in the C programming language.
The goal of this video is to provide viewers with basic examples to understand opening and reading files in the C programming language.

636 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question