Link to home
Start Free TrialLog in
Avatar of hengck23
hengck23

asked on

Code optimization

Dear all,

I have been trying to optimize the following code in assembly or sse/mmx.
Can anyone help?
THanks

With Beat Regards
HCK


typedef struct PslRbpFeature {
      int noOfBlocks;  
      int* points;          
      int* specs;  
        int** sumPtr;
      double** sqSumPtr;
      float invArea;

      int refWeight;
      int id;  
      int winWidth;
      int winHeight;
 
      //precalculation
      float normStFactor;
}PslRbpFeature;



int evalRbpFeature(
   PslRbpFeature* rb,
   int offset,
   float* strength
){
      *strength=0.0f;

      float strength_plus=0.0f;
      float strength_minus=0.0f;
      int count_plus=0;
      int count_minus=0;

      int** sumPtr0=rb->sumPtr ;
      int** sumPtr1=sumPtr0+1;
      int** sumPtr2=sumPtr0+2;
      int** sumPtr3=sumPtr0+3;
      int* p0 = *sumPtr0 +offset;
      int* p1 = *sumPtr1 +offset;
      int* p2 = *sumPtr2 +offset;
      int* p3 = *sumPtr3 +offset;
      int sRef =   *p0  -  *p1   -  *p2  +  *p3 ;

      int N=rb->noOfBlocks-1;
      strength_plus=0.0;//sRef;
      count_plus=N;
      sRef/=N;  

      int key=0;      
      for(int n=0; n<N; n++){
      
             sumPtr0 +=4;
             sumPtr1 +=4;
             sumPtr2 +=4;
             sumPtr3 +=4;
             p0=*sumPtr0 +offset;
             p1=*sumPtr1 +offset;
             p2=*sumPtr2 +offset;
               p3=*sumPtr3 +offset;
             int s = *p0 -  *p1   -  *p2  +  *p3 ;
             s -= sRef;
             if(s<0) {
                  strength_minus += s;
                  count_minus++;
                  key |=   1  << n ;
             }
             
      }
      if(!key) return key;
 
        strength_plus  -= strength_minus;
      count_plus -= count_minus;
      
      *strength=strength_plus/count_plus-strength_minus/count_minus;
      *strength *= rb->normStFactor;
      return key;
}
SOLUTION
Avatar of Kent Olsen
Kent Olsen
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of hengck23
hengck23

ASKER

Dear all,

Based on the comments, I have modified my code to the below. There is an improvement of speed from 1.03 sec to 0.93 sec. However, it is still not fast enough. Is there anymore suggestions? Thanks.


static float pslReciprocalTable[] = {
      0.0f,
      1.0f/1.0f,             1.0f/2.0f,             1.0f/3.0f,
      1.0f/4.0f,             1.0f/5.0f,             1.0f/6.0f,
      1.0f/7.0f,             1.0f/8.0f,             1.0f/9.0f,
      1.0f/10.0f,       1.0f/11.0f,       1.0f/12.0f,
};

int evalRbpFeature2(
   PslRbpFeature* rb,
   int offset,
   float* strength
){
      *strength=0.0f;
      //..............................................................
 
      float strength_minus=0.0f;
      int count_minus=0;

      int** sumPtr=rb->sumPtr;
       int sRef =    sumPtr[0][offset]
                   -  sumPtr[1][offset]  
                     -  sumPtr[2][offset]  
                     +  sumPtr[3][offset];
      
 
    int N=rb->noOfBlocks-1;       
    sRef/=N;       

      int key=0;
      int s;
      for(int n=0; n<N; n++){
             sumPtr +=4;
           s =    sumPtr[0][offset]
                   -  sumPtr[1][offset]  
                   -  sumPtr[2][offset]  
                   +  sumPtr[3][offset];
             s -= sRef;
             if(s<0) {
                  strength_minus += s;
                  count_minus++;
                  key |=   1  << n ;
             }
             
      }
      if(!key) return key;
 
      *strength = strength_minus*(pslReciprocalTable[N-count_minus]+pslReciprocalTable[count_minus]);
    *strength *= rb->normStFactor;
      return key;
      
}
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Sorry, there is a slight typing error in my last comment. It should have been:

     int  strength_minus=0;
     int count_minus=0;
rb->noOfBlocks should be between 3 to 9
(for future code, i might increase that to 25).

The function evalRbpFeature2() will be called many times, with different PslRbpFeature* rb and int offset. The output of the function is key and strength.

Offset actually is an image postition. If (x,y) is a point in the iomage, then offset = x+y*width

'strenght_minus' would be unlikely to go  out of range. The largest  value it can get is -20*20*255.

SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
>Would it be possible to reorganise sumPtr so you can use:

        s =    sumPtr[offset][4*n]
                -  sumPtr[offset][4*n+1]  
                -  sumPtr[offset][4*n+2]  
                +  sumPtr[offset][4*n+3];

I think it would be difficult.

There is an image data:
int* data;

sumPtr is actually a list of pointers that points to the key postitions in data, when the application point is(0,0) .eg)
sumPtr[0]=&data[ .... coordinte(0,0) ...]
sumPtr[1]=&data[ .... coordinte(5,6) ...]
sumPtr[2]=&data[ .... coordinte(8,9) ...]
....

When I change the application point to (x,y), it would be
new_sumPtr[0]=&data[ .... coordinte(0+x,0+y) ...] =sumPtr[0][offset]
new_sumPtr[1]=&data[ .... coordinte(5+x,6+y) ...] =sumPtr[1][offset]
new_sumPtr[2]=&data[ .... coordinte(8+x,9+y) ...] =sumPtr[2][offset]
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Kent,

Perhaps you arent as old as you used to be ;).

int** used to be the only reliable way of defining two-dimensional data because early C compilers had problems with int*[] and int[]. You will see this in the definition of the argv parameter to main in older compilers. This was especially true when both the row address array and the rows themselves are created using malloc. Nowadays, [] is handled much better and is even allowed to appear in structs in some compilers so long as it is at the end.

Strangely, I cant find any evidence to back me up on this.

Paul

SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Hi,

int** sumPtr is an arrays of pointers.
The actual data is int* data, which is not shown hewre in the code.
(Although an image is two-dimensional, I am treating it as a one demensional array in int* data)

Each element of sumPtr is actually a pointer that points to int* data.

The code is correct. It works the way I want. This is actually an object detector in image.It is a car detector.
Ahhh!! I see where youre going! How arrogant of me to assume it was something simple.

>>sumPtr[0][offset]

You're right, it does look a bit odd. Perhaps it would be clearer coding to use:
>>(sumPtr[0])[offset]

>>My wife says this to me a lot, too.  ;)
Consider yourself lucky. If my wife had no evidence to back her up she'd prove she didnt need any first. ;)

Paul
(sumPtr[0])[offset] is correct and this is what i am doing!

As an example, cosider:

raw image:
x x x x x
x x x x x
x x x x x

Assume that some kind of processing (eg filtering), i end up with the following data:

2 3 5 6 8
7 8 3 7 0
1 2 0 9 0

Then int* data will be
data[0]=2;
data[1]=3;
data[2]=5;
data[3]=6;
data[4]=8;
data[5]=7;
....
int* data will be fixed.

However, i will need to extract useful information (feature) of data.
This would be in the form of
feature = data[point0] - data[point1]-data[point2]+data[point3];

point0,point1,point2,point3 actually defines a rectangle (we call RECT), within a window.

Assuming
point0=(0,0)
point1=(2,0)
point2=(0,1)
point3=(2,1)

then,
 sumPtr[0]=&data[0] //point0
 sumPtr[1]=&data[2] //point1
 sumPtr[2]=&data[0+1*width] //point2
 sumPtr[3]=&data[2+1*width] //point3

(with offset=0)
feature =sumPtr[0][0]-sumPtr[1][0]-sumPtr[0][0]+sumPtr[2][0];
           =2-5-7+3;

Now imagine RECT is shifted by one unit to the right, then
new_point0=(0+1,0)
new_point1=(2+1,0)
new_point2=(0+1,1)
new_point3=(2+1,1)

(with offset=1)
new_feature =sumPtr[0][1]-sumPtr[1][1]-sumPtr[0][1]+sumPtr[2][1];
           =3-6-8+7;
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
I'd see that as a very worthwhile method to try. You've got the timer to check the results. I wouldnt use malloc/free however, I'd use a static array because noOfBlocks is only likely to go up to about 9.

Paul
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Hi,

Thank you for the responses. I will try to create a static memory and let you know the results maybe tomorrow.
Hi hengck23,

If you're going to create a static array, its type should be 'int'.

Kent