hengck23
asked on
Code optimization
Dear all,
I have been trying to optimize the following code in assembly or sse/mmx.
Can anyone help?
THanks
With Beat Regards
HCK
typedef struct PslRbpFeature {
int noOfBlocks;
int* points;
int* specs;
int** sumPtr;
double** sqSumPtr;
float invArea;
int refWeight;
int id;
int winWidth;
int winHeight;
//precalculation
float normStFactor;
}PslRbpFeature;
int evalRbpFeature(
PslRbpFeature* rb,
int offset,
float* strength
){
*strength=0.0f;
float strength_plus=0.0f;
float strength_minus=0.0f;
int count_plus=0;
int count_minus=0;
int** sumPtr0=rb->sumPtr ;
int** sumPtr1=sumPtr0+1;
int** sumPtr2=sumPtr0+2;
int** sumPtr3=sumPtr0+3;
int* p0 = *sumPtr0 +offset;
int* p1 = *sumPtr1 +offset;
int* p2 = *sumPtr2 +offset;
int* p3 = *sumPtr3 +offset;
int sRef = *p0 - *p1 - *p2 + *p3 ;
int N=rb->noOfBlocks-1;
strength_plus=0.0;//sRef;
count_plus=N;
sRef/=N;
int key=0;
for(int n=0; n<N; n++){
sumPtr0 +=4;
sumPtr1 +=4;
sumPtr2 +=4;
sumPtr3 +=4;
p0=*sumPtr0 +offset;
p1=*sumPtr1 +offset;
p2=*sumPtr2 +offset;
p3=*sumPtr3 +offset;
int s = *p0 - *p1 - *p2 + *p3 ;
s -= sRef;
if(s<0) {
strength_minus += s;
count_minus++;
key |= 1 << n ;
}
}
if(!key) return key;
strength_plus -= strength_minus;
count_plus -= count_minus;
*strength=strength_plus/co unt_plus-s trength_mi nus/count_ minus;
*strength *= rb->normStFactor;
return key;
}
I have been trying to optimize the following code in assembly or sse/mmx.
Can anyone help?
THanks
With Beat Regards
HCK
typedef struct PslRbpFeature {
int noOfBlocks;
int* points;
int* specs;
int** sumPtr;
double** sqSumPtr;
float invArea;
int refWeight;
int id;
int winWidth;
int winHeight;
//precalculation
float normStFactor;
}PslRbpFeature;
int evalRbpFeature(
PslRbpFeature* rb,
int offset,
float* strength
){
*strength=0.0f;
float strength_plus=0.0f;
float strength_minus=0.0f;
int count_plus=0;
int count_minus=0;
int** sumPtr0=rb->sumPtr ;
int** sumPtr1=sumPtr0+1;
int** sumPtr2=sumPtr0+2;
int** sumPtr3=sumPtr0+3;
int* p0 = *sumPtr0 +offset;
int* p1 = *sumPtr1 +offset;
int* p2 = *sumPtr2 +offset;
int* p3 = *sumPtr3 +offset;
int sRef = *p0 - *p1 - *p2 + *p3 ;
int N=rb->noOfBlocks-1;
strength_plus=0.0;//sRef;
count_plus=N;
sRef/=N;
int key=0;
for(int n=0; n<N; n++){
sumPtr0 +=4;
sumPtr1 +=4;
sumPtr2 +=4;
sumPtr3 +=4;
p0=*sumPtr0 +offset;
p1=*sumPtr1 +offset;
p2=*sumPtr2 +offset;
p3=*sumPtr3 +offset;
int s = *p0 - *p1 - *p2 + *p3 ;
s -= sRef;
if(s<0) {
strength_minus += s;
count_minus++;
key |= 1 << n ;
}
}
if(!key) return key;
strength_plus -= strength_minus;
count_plus -= count_minus;
*strength=strength_plus/co
*strength *= rb->normStFactor;
return key;
}
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Sorry, there is a slight typing error in my last comment. It should have been:
int strength_minus=0;
int count_minus=0;
int strength_minus=0;
int count_minus=0;
ASKER
rb->noOfBlocks should be between 3 to 9
(for future code, i might increase that to 25).
The function evalRbpFeature2() will be called many times, with different PslRbpFeature* rb and int offset. The output of the function is key and strength.
Offset actually is an image postition. If (x,y) is a point in the iomage, then offset = x+y*width
'strenght_minus' would be unlikely to go out of range. The largest value it can get is -20*20*255.
(for future code, i might increase that to 25).
The function evalRbpFeature2() will be called many times, with different PslRbpFeature* rb and int offset. The output of the function is key and strength.
Offset actually is an image postition. If (x,y) is a point in the iomage, then offset = x+y*width
'strenght_minus' would be unlikely to go out of range. The largest value it can get is -20*20*255.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
>Would it be possible to reorganise sumPtr so you can use:
s = sumPtr[offset][4*n]
- sumPtr[offset][4*n+1]
- sumPtr[offset][4*n+2]
+ sumPtr[offset][4*n+3];
I think it would be difficult.
There is an image data:
int* data;
sumPtr is actually a list of pointers that points to the key postitions in data, when the application point is(0,0) .eg)
sumPtr[0]=&data[ .... coordinte(0,0) ...]
sumPtr[1]=&data[ .... coordinte(5,6) ...]
sumPtr[2]=&data[ .... coordinte(8,9) ...]
....
When I change the application point to (x,y), it would be
new_sumPtr[0]=&data[ .... coordinte(0+x,0+y) ...] =sumPtr[0][offset]
new_sumPtr[1]=&data[ .... coordinte(5+x,6+y) ...] =sumPtr[1][offset]
new_sumPtr[2]=&data[ .... coordinte(8+x,9+y) ...] =sumPtr[2][offset]
s = sumPtr[offset][4*n]
- sumPtr[offset][4*n+1]
- sumPtr[offset][4*n+2]
+ sumPtr[offset][4*n+3];
I think it would be difficult.
There is an image data:
int* data;
sumPtr is actually a list of pointers that points to the key postitions in data, when the application point is(0,0) .eg)
sumPtr[0]=&data[ .... coordinte(0,0) ...]
sumPtr[1]=&data[ .... coordinte(5,6) ...]
sumPtr[2]=&data[ .... coordinte(8,9) ...]
....
When I change the application point to (x,y), it would be
new_sumPtr[0]=&data[ .... coordinte(0+x,0+y) ...] =sumPtr[0][offset]
new_sumPtr[1]=&data[ .... coordinte(5+x,6+y) ...] =sumPtr[1][offset]
new_sumPtr[2]=&data[ .... coordinte(8+x,9+y) ...] =sumPtr[2][offset]
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Kent,
Perhaps you arent as old as you used to be ;).
int** used to be the only reliable way of defining two-dimensional data because early C compilers had problems with int*[] and int[]. You will see this in the definition of the argv parameter to main in older compilers. This was especially true when both the row address array and the rows themselves are created using malloc. Nowadays, [] is handled much better and is even allowed to appear in structs in some compilers so long as it is at the end.
Strangely, I cant find any evidence to back me up on this.
Paul
Perhaps you arent as old as you used to be ;).
int** used to be the only reliable way of defining two-dimensional data because early C compilers had problems with int*[] and int[]. You will see this in the definition of the argv parameter to main in older compilers. This was especially true when both the row address array and the rows themselves are created using malloc. Nowadays, [] is handled much better and is even allowed to appear in structs in some compilers so long as it is at the end.
Strangely, I cant find any evidence to back me up on this.
Paul
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Hi,
int** sumPtr is an arrays of pointers.
The actual data is int* data, which is not shown hewre in the code.
(Although an image is two-dimensional, I am treating it as a one demensional array in int* data)
Each element of sumPtr is actually a pointer that points to int* data.
The code is correct. It works the way I want. This is actually an object detector in image.It is a car detector.
int** sumPtr is an arrays of pointers.
The actual data is int* data, which is not shown hewre in the code.
(Although an image is two-dimensional, I am treating it as a one demensional array in int* data)
Each element of sumPtr is actually a pointer that points to int* data.
The code is correct. It works the way I want. This is actually an object detector in image.It is a car detector.
Ahhh!! I see where youre going! How arrogant of me to assume it was something simple.
>>sumPtr[0][offset]
You're right, it does look a bit odd. Perhaps it would be clearer coding to use:
>>(sumPtr[0])[offset]
>>My wife says this to me a lot, too. ;)
Consider yourself lucky. If my wife had no evidence to back her up she'd prove she didnt need any first. ;)
Paul
>>sumPtr[0][offset]
You're right, it does look a bit odd. Perhaps it would be clearer coding to use:
>>(sumPtr[0])[offset]
>>My wife says this to me a lot, too. ;)
Consider yourself lucky. If my wife had no evidence to back her up she'd prove she didnt need any first. ;)
Paul
ASKER
(sumPtr[0])[offset] is correct and this is what i am doing!
As an example, cosider:
raw image:
x x x x x
x x x x x
x x x x x
Assume that some kind of processing (eg filtering), i end up with the following data:
2 3 5 6 8
7 8 3 7 0
1 2 0 9 0
Then int* data will be
data[0]=2;
data[1]=3;
data[2]=5;
data[3]=6;
data[4]=8;
data[5]=7;
....
int* data will be fixed.
However, i will need to extract useful information (feature) of data.
This would be in the form of
feature = data[point0] - data[point1]-data[point2]+ data[point 3];
point0,point1,point2,point 3 actually defines a rectangle (we call RECT), within a window.
Assuming
point0=(0,0)
point1=(2,0)
point2=(0,1)
point3=(2,1)
then,
sumPtr[0]=&data[0] //point0
sumPtr[1]=&data[2] //point1
sumPtr[2]=&data[0+1*width] //point2
sumPtr[3]=&data[2+1*width] //point3
(with offset=0)
feature =sumPtr[0][0]-sumPtr[1][0] -sumPtr[0] [0]+sumPtr [2][0];
=2-5-7+3;
Now imagine RECT is shifted by one unit to the right, then
new_point0=(0+1,0)
new_point1=(2+1,0)
new_point2=(0+1,1)
new_point3=(2+1,1)
(with offset=1)
new_feature =sumPtr[0][1]-sumPtr[1][1] -sumPtr[0] [1]+sumPtr [2][1];
=3-6-8+7;
As an example, cosider:
raw image:
x x x x x
x x x x x
x x x x x
Assume that some kind of processing (eg filtering), i end up with the following data:
2 3 5 6 8
7 8 3 7 0
1 2 0 9 0
Then int* data will be
data[0]=2;
data[1]=3;
data[2]=5;
data[3]=6;
data[4]=8;
data[5]=7;
....
int* data will be fixed.
However, i will need to extract useful information (feature) of data.
This would be in the form of
feature = data[point0] - data[point1]-data[point2]+
point0,point1,point2,point
Assuming
point0=(0,0)
point1=(2,0)
point2=(0,1)
point3=(2,1)
then,
sumPtr[0]=&data[0] //point0
sumPtr[1]=&data[2] //point1
sumPtr[2]=&data[0+1*width]
sumPtr[3]=&data[2+1*width]
(with offset=0)
feature =sumPtr[0][0]-sumPtr[1][0]
=2-5-7+3;
Now imagine RECT is shifted by one unit to the right, then
new_point0=(0+1,0)
new_point1=(2+1,0)
new_point2=(0+1,1)
new_point3=(2+1,1)
(with offset=1)
new_feature =sumPtr[0][1]-sumPtr[1][1]
=3-6-8+7;
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
I'd see that as a very worthwhile method to try. You've got the timer to check the results. I wouldnt use malloc/free however, I'd use a static array because noOfBlocks is only likely to go up to about 9.
Paul
Paul
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Hi,
Thank you for the responses. I will try to create a static memory and let you know the results maybe tomorrow.
Thank you for the responses. I will try to create a static memory and let you know the results maybe tomorrow.
Hi hengck23,
If you're going to create a static array, its type should be 'int'.
Kent
If you're going to create a static array, its type should be 'int'.
Kent
ASKER
Based on the comments, I have modified my code to the below. There is an improvement of speed from 1.03 sec to 0.93 sec. However, it is still not fast enough. Is there anymore suggestions? Thanks.
static float pslReciprocalTable[] = {
0.0f,
1.0f/1.0f, 1.0f/2.0f, 1.0f/3.0f,
1.0f/4.0f, 1.0f/5.0f, 1.0f/6.0f,
1.0f/7.0f, 1.0f/8.0f, 1.0f/9.0f,
1.0f/10.0f, 1.0f/11.0f, 1.0f/12.0f,
};
int evalRbpFeature2(
PslRbpFeature* rb,
int offset,
float* strength
){
*strength=0.0f;
//........................
float strength_minus=0.0f;
int count_minus=0;
int** sumPtr=rb->sumPtr;
int sRef = sumPtr[0][offset]
- sumPtr[1][offset]
- sumPtr[2][offset]
+ sumPtr[3][offset];
int N=rb->noOfBlocks-1;
sRef/=N;
int key=0;
int s;
for(int n=0; n<N; n++){
sumPtr +=4;
s = sumPtr[0][offset]
- sumPtr[1][offset]
- sumPtr[2][offset]
+ sumPtr[3][offset];
s -= sRef;
if(s<0) {
strength_minus += s;
count_minus++;
key |= 1 << n ;
}
}
if(!key) return key;
*strength = strength_minus*(pslRecipro
*strength *= rb->normStFactor;
return key;
}