C code optimization

I have attached C code for the  which will run on a DSP.
I need to optimise it for speed (fewer cycles). Please suggest some ways.

Note:
WIDTHX is the width of the image and HEIGHTY is its height. frame_1 is a 1-D array in which the pixels are stored sequentially. I have attached both the original and the modified code.
The original took around 111 million cycles; my modified version takes about 55 million. Further modifications are necessary, because I need to bring it below 10 million.
Attachments: modified.c, Original.c
Asked by: srimallikarthik
satsumo (Software Developer) commented:
You might also try referring to each source pixel with its own pointer. That way there will be four increment/decrement operations but no arithmetic on indexes.
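A minimal sketch of the pointer-per-pixel idea. The attached code is not reproduced here, so this assumes a hypothetical 4-tap operator (gx = right - left, gy = below - above) on a row-major 8-bit image of width w and height h; grad_indexed and grad_pointers are illustrative names. The first version recomputes y * w + x offsets for every tap; the second keeps one pointer per tap and simply advances each by one per pixel.

```c
#include <stdint.h>

/* Hypothetical 4-tap operator, NOT taken from the attached code:
 *   gx = src(x+1, y) - src(x-1, y),  gy = src(x, y+1) - src(x, y-1)
 * Row-major 8-bit image, width w, height h. Writes gx*gx + gy*gy for
 * interior pixels only; border entries of dst are left untouched. */

/* Index-based: every tap recomputes a y * w + x style offset. */
void grad_indexed(const uint8_t *src, uint32_t *dst, int w, int h)
{
    for (int y = 1; y < h - 1; y++)
        for (int x = 1; x < w - 1; x++) {
            int gx = src[y * w + x + 1] - src[y * w + x - 1];
            int gy = src[(y + 1) * w + x] - src[(y - 1) * w + x];
            dst[y * w + x] = (uint32_t)(gx * gx + gy * gy);
        }
}

/* Pointer-based: one pointer per source tap, each just incremented. */
void grad_pointers(const uint8_t *src, uint32_t *dst, int w, int h)
{
    for (int y = 1; y < h - 1; y++) {
        const uint8_t *left  = src + y * w;            /* (x-1, y), x = 1 */
        const uint8_t *right = left + 2;               /* (x+1, y)        */
        const uint8_t *above = src + (y - 1) * w + 1;  /* (x, y-1)        */
        const uint8_t *below = src + (y + 1) * w + 1;  /* (x, y+1)        */
        uint32_t *out = dst + y * w + 1;               /* (x, y)          */
        for (int n = w - 2; n > 0; n--) {
            int gx = *right++ - *left++;
            int gy = *below++ - *above++;
            *out++ = (uint32_t)(gx * gx + gy * gy);
        }
    }
}
```

Both functions write the same values. Whether the pointer form saves cycles depends on how much address arithmetic the DSP compiler was already hoisting out of the indexed loop, so it is worth checking the generated code or cycle counts for both.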
satsumo (Software Developer) commented:
You could build a 256 by 256 table that records which combinations of gradientX and gradientY pass the edge test. Rather than doing _mpyu twice, just look up the edgemap value (0 or 255) in the table. At 1 byte per combination the table is 64 KB, though you only need half of it (the test is symmetric in gradientX and gradientY) and theoretically only 1 bit per combination, so 4 KB as a minimum.
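A sketch of the lookup-table edge test. It assumes gradientX and gradientY are absolute values in 0..255 and that the edge test is gradientX² + gradientY² > thresh_sq; thresh_sq, build_edge_lut and edge_lookup are hypothetical names, so substitute whatever comparison the real code makes.

```c
#include <stdint.h>

/* One byte per (gx, gy) pair: 256 * 256 = 65536 bytes (64 KB).
 * Entry is 255 if the pair passes the (assumed) edge test, else 0. */
static uint8_t edge_lut[256 * 256];

/* Hypothetical edge test: gx*gx + gy*gy > thresh_sq.
 * Build once per threshold, outside the per-pixel loop. */
void build_edge_lut(uint32_t thresh_sq)
{
    for (uint32_t gx = 0; gx < 256; gx++)
        for (uint32_t gy = 0; gy < 256; gy++)
            edge_lut[(gx << 8) | gy] =
                (gx * gx + gy * gy > thresh_sq) ? 255 : 0;
}

/* Per pixel: one shift, one OR, one byte load; no multiplies. */
static inline uint8_t edge_lookup(uint8_t gx, uint8_t gy)
{
    return edge_lut[((uint32_t)gx << 8) | gy];
}
```

Packing one bit per entry would shrink the table to 8 KB, and exploiting the gx/gy symmetry halves that again to about 4 KB, at the cost of extra index and bit-extraction work per pixel. Whether a table beats two multiplies also depends on whether it stays in fast on-chip memory.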
satsumo (Software Developer) commented:
Does locality of reference make a difference on this DSP? If the image is very wide, will accesses to the row below miss the cache? If so, you might consider dividing the image into smaller sub-images, if that helps the caching.
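One way to act on the locality suggestion is to process the image in vertical strips, so that the three source rows touched for each output row stay small enough to remain cached as y advances. A sketch, again assuming a hypothetical 4-tap operator on a row-major 8-bit image; tile_w is a tuning parameter you would choose from the DSP's cache size, not a known-good value.

```c
#include <stdint.h>

/* Hypothetical 4-tap operator: gx = right - left, gy = below - above. */
static uint32_t grad_at(const uint8_t *src, int w, int x, int y)
{
    int gx = src[y * w + x + 1] - src[y * w + x - 1];
    int gy = src[(y + 1) * w + x] - src[(y - 1) * w + x];
    return (uint32_t)(gx * gx + gy * gy);
}

/* Untiled: walks whole rows. For a very wide image, row y-1 may be
 * evicted from the cache before it is reused for the next output row. */
void grad_untiled(const uint8_t *src, uint32_t *dst, int w, int h)
{
    for (int y = 1; y < h - 1; y++)
        for (int x = 1; x < w - 1; x++)
            dst[y * w + x] = grad_at(src, w, x, y);
}

/* Tiled: vertical strips tile_w columns wide (tile_w >= 1). Within a
 * strip each output row touches only about 3 * tile_w source bytes, so
 * rows can stay cached while they are reused by the next two rows. */
void grad_tiled(const uint8_t *src, uint32_t *dst, int w, int h, int tile_w)
{
    for (int x0 = 1; x0 < w - 1; x0 += tile_w) {
        int x1 = x0 + tile_w;
        if (x1 > w - 1)
            x1 = w - 1;                 /* clip the last strip */
        for (int y = 1; y < h - 1; y++)
            for (int x = x0; x < x1; x++)
                dst[y * w + x] = grad_at(src, w, x, y);
    }
}
```

grad_tiled writes exactly the same output as grad_untiled for any tile_w of at least 1; only the order in which pixels are visited changes.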
ozo commented:
If you are multiplying gradientX and gradientY each by itself, there is no need for the abs operation, because squaring discards the sign anyway.
But getting from 55 million to 10 million cycles would probably require better use of the DSP's parallelism.
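A sketch combining both of ozo's points for one output row, assuming the gradients are signed differences from a hypothetical 4-tap operator. mag_row_abs mirrors a version that takes abs before squaring; mag_row_fast drops abs, since (-g) * (-g) == g * g, and handles two independent pixels per iteration through restrict-qualified pointers (C99) so the compiler is free to schedule them in parallel. Both function names are illustrative.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical row kernel: gx = row[x+1] - row[x-1], gy = below[x] - above[x].
 * n is the row width; only interior pixels 1 .. n-2 are written. */

/* With abs: the abs() calls are redundant before squaring. */
void mag_row_abs(const uint8_t *above, const uint8_t *row,
                 const uint8_t *below, uint32_t *out, int n)
{
    for (int x = 1; x < n - 1; x++) {
        int gx = abs(row[x + 1] - row[x - 1]);
        int gy = abs(below[x] - above[x]);
        out[x] = (uint32_t)(gx * gx + gy * gy);
    }
}

/* Without abs, unrolled by two. restrict tells the compiler out[] does
 * not alias the inputs, so the two pixels can be computed in parallel. */
void mag_row_fast(const uint8_t *restrict above, const uint8_t *restrict row,
                  const uint8_t *restrict below, uint32_t *restrict out, int n)
{
    int x = 1;
    for (; x + 1 < n - 1; x += 2) {
        int gx0 = row[x + 1] - row[x - 1], gy0 = below[x] - above[x];
        int gx1 = row[x + 2] - row[x],     gy1 = below[x + 1] - above[x + 1];
        out[x]     = (uint32_t)(gx0 * gx0 + gy0 * gy0);
        out[x + 1] = (uint32_t)(gx1 * gx1 + gy1 * gy1);
    }
    for (; x < n - 1; x++) {            /* leftover pixel if count is odd */
        int gx = row[x + 1] - row[x - 1], gy = below[x] - above[x];
        out[x] = (uint32_t)(gx * gx + gy * gy);
    }
}
```

On a VLIW DSP the compiler's software pipeliner may already overlap loop iterations; restrict mainly removes the possible aliasing between out[] and the inputs that could stop it, and unrolling gives it more independent work per iteration.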
srimallikarthik (Author) commented:
Experts,

Thanks for the support and comments.

First, sorry for the delayed response.

I got it down to 10 million cycles by changing the code as per satsumo's suggestion of using pointer arithmetic for incrementing and decrementing.

This is not the whole solution, though: I used satsumo's pointer suggestion only for the index calculation, and I wrote the rest of the implementation in assembly.


~Karthik
srimallikarthik (Author) commented:
The suggestions were helpful indeed, but they were not one-shot answers that saved me time, so grade B.