speed up yuv422 to yuv420 software conversion

Posted on 2012-03-19
Last Modified: 2012-08-13
Hi,

I have used the following code to convert yuv422 to yuv420 images.

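#include <stdint.h>

/* Convert a packed UYVY (YUV 4:2:2) frame to planar YUV 4:2:0 (Y, U, V). */
void ConvertUyvyToYuv420P(uint8_t* destFrame,
                          uint8_t* srcFrame,
                          int width,
                          int height)
{
    uint8_t* pyFrame = destFrame;                   /* Y plane */
    uint8_t* puFrame = pyFrame + width*height;      /* U plane */
    uint8_t* pvFrame = puFrame + width*height/4;    /* V plane */

    /* A UYVY row is width*2 bytes, so this offset points two rows down. */
    int uvOffset = width * 4 * sizeof(uint8_t);

    int i,j;

    /* Note: the loop stops at height-2, so the last two rows are not converted. */
    for(i=0; i<height-2; i++)
    {
        /* Each inner iteration consumes one U Y0 V Y1 group (two pixels). */
        for(j=0;j<width;j+=2)
        {
            uint16_t calc;
            if ((i&1) == 0)
            {
                /* Even row: average U with the U sample uvOffset bytes ahead. */
                calc = *srcFrame;
                calc += *(srcFrame + uvOffset);
                calc /= 2;
                *puFrame++ = (uint8_t) calc;
            }
            srcFrame++;
            *pyFrame++ = *srcFrame++;
            if ((i&1) == 0)
            {
                /* Even row: average V the same way. */
                calc = *srcFrame;
                calc += *(srcFrame + uvOffset);
                calc /= 2;
                *pvFrame++ = (uint8_t) calc;
            }
            srcFrame++;
            *pyFrame++ = *srcFrame++;
        }
    }
}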

When I run this on 1080p input arriving at 30 frames per second, I can convert only about 15 frames per second. Is there any way to speed up the snippet above, or is there a better algorithm for the conversion?

Any help would be great!!
Thanks
Question by:Shiv_Sg

Accepted Solution

by:
algorith earned 500 total points
ID: 37744127
Hi, a lot of this depends on what system you are programming for. As you probably know, many have multiple CPUs, and hyperthreading can make an individual CPU look like 2.  In this case you could separate your outer loop above into multiple threads, each doing their work in parallel.  Or, you could put your entire subroutine into a thread, then spawn as many frame processing threads as there are (effective) CPUs so that you could do multiple frames in parallel - this is my suggestion for the best approach.
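
As a rough illustration of splitting the outer loop across threads, here is a minimal sketch using POSIX threads. It is not the original poster's routine: the names BandJob, ConvertBand, ConvertUyvyToYuv420PThreaded and MAX_THREADS are made up for this example, it assumes even width and height, and it averages chroma between the two adjacent rows of each row pair (the code in the question averages with a sample width*4 bytes ahead instead). Each thread writes to output positions computed from the row index, so the bands do not depend on each other.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_THREADS 16  /* hypothetical upper bound for this sketch */

typedef struct {
    const uint8_t *src;      /* packed UYVY, width*2 bytes per row */
    uint8_t *dst;            /* planar Y, then U, then V */
    int width, height;
    int pairBegin, pairEnd;  /* half-open range of row pairs for this thread */
} BandJob;

/* Convert the row pairs [pairBegin, pairEnd) of one frame. */
static void *ConvertBand(void *arg)
{
    const BandJob *job = (const BandJob *)arg;
    const size_t w = (size_t)job->width, h = (size_t)job->height;
    const size_t stride = w * 2, cw = w / 2;
    uint8_t *yPlane = job->dst;
    uint8_t *uPlane = yPlane + w * h;
    uint8_t *vPlane = uPlane + cw * (h / 2);

    for (int p = job->pairBegin; p < job->pairEnd; p++) {
        const uint8_t *row0 = job->src + (size_t)(2 * p) * stride;
        const uint8_t *row1 = row0 + stride;
        uint8_t *y0 = yPlane + (size_t)(2 * p) * w;
        uint8_t *y1 = y0 + w;
        uint8_t *u = uPlane + (size_t)p * cw;
        uint8_t *v = vPlane + (size_t)p * cw;
        for (size_t x = 0; x < cw; x++) {
            const uint8_t *a = row0 + 4 * x;  /* U Y0 V Y1 on the even row */
            const uint8_t *b = row1 + 4 * x;  /* U Y0 V Y1 on the odd row */
            u[x] = (uint8_t)((a[0] + b[0]) / 2);
            v[x] = (uint8_t)((a[2] + b[2]) / 2);
            y0[2 * x] = a[1];
            y0[2 * x + 1] = a[3];
            y1[2 * x] = b[1];
            y1[2 * x + 1] = b[3];
        }
    }
    return NULL;
}

/* Split the frame into numThreads bands of row pairs and convert them in parallel. */
void ConvertUyvyToYuv420PThreaded(uint8_t *dst, const uint8_t *src,
                                  int width, int height, int numThreads)
{
    assert(width > 0 && height > 0 && width % 2 == 0 && height % 2 == 0);
    assert(numThreads >= 1 && numThreads <= MAX_THREADS);

    pthread_t threads[MAX_THREADS];
    BandJob jobs[MAX_THREADS];
    int started[MAX_THREADS];
    const int pairs = height / 2;

    for (int t = 0; t < numThreads; t++) {
        jobs[t].src = src;
        jobs[t].dst = dst;
        jobs[t].width = width;
        jobs[t].height = height;
        jobs[t].pairBegin = pairs * t / numThreads;
        jobs[t].pairEnd = pairs * (t + 1) / numThreads;
        started[t] = (pthread_create(&threads[t], NULL, ConvertBand, &jobs[t]) == 0);
        if (!started[t])
            ConvertBand(&jobs[t]);  /* fall back to the calling thread */
    }
    for (int t = 0; t < numThreads; t++)
        if (started[t])
            pthread_join(threads[t], NULL);
}
```

Creating threads for every frame has its own cost, so a long-running pool of worker threads (or the one-frame-per-thread layout suggested above) may work better in practice; that would need to be measured on the target machine.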

On the other hand, if the memory bandwidth is too low, then no amount of threading will improve the situation. To check this for your target system: you need to address at least (width * height) locations per frame, so make sure this is not an outrageous number to do at 30 fps on whatever hardware you have. E.g. 1920*1080*30 is about 62 million locations per second (roughly 124 MB/s of reads for UYVY input at 2 bytes per pixel), not an outrageous number for some hardware, out of the question for others. Keep in mind that addressing many MB of serial locations in memory may not be as fast as you would expect from the computer's specs, so this needs to be tested.
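
The back-of-the-envelope check can be written down as a couple of helper functions. This is only a sketch of the arithmetic; the function names are invented for illustration, and it assumes packed UYVY input at 2 bytes per pixel and planar 4:2:0 output at 1.5 bytes per pixel.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes read per second from a packed UYVY source (2 bytes per pixel). */
uint64_t UyvyReadBytesPerSecond(int width, int height, int fps)
{
    assert(width > 0 && height > 0 && fps > 0);
    return (uint64_t)width * (uint64_t)height * 2u * (uint64_t)fps;
}

/* Bytes written per second to planar YUV 4:2:0 (1.5 bytes per pixel). */
uint64_t Yuv420WriteBytesPerSecond(int width, int height, int fps)
{
    assert(width > 0 && height > 0 && fps > 0);
    return (uint64_t)width * (uint64_t)height * 3u / 2u * (uint64_t)fps;
}
```

For 1920x1080 at 30 fps these give 124,416,000 bytes/s read and 93,312,000 bytes/s written. Comparing those figures with a measured copy rate on the target machine (for example, timing memcpy over buffers of the same size) shows how much headroom is left.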

In lieu of multi-threading, other approaches may be warranted. Among the simpler things to do is to remove anything you can from the inner loop. However, depending on the processor and its out-of-order scheduling, it is sometimes surprising what exactly will speed up a loop.

In particular, it is often possible to trade increased space for increased speed, so adjusting your algorithm to make 2 passes might enable you to eliminate the tests "if((i&1) == 0) {...}".  I have not played with this kind of bit twiddling for a while, but sometimes just changing the addressing scheme from dereferencing pointers to using array subscripts might enable the compiler optimizer or the processor scheduling to figure out what to do.
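
One way the two-pass idea could look is sketched below, using array subscripts and no per-pixel row test. The function name ConvertUyvyToYuv420PTwoPass is made up for this example. It assumes even width and height, rows packed with no padding, and chroma averaged between the two adjacent rows of each pair (the code in the question averages with a sample width*4 bytes ahead). Whether it is actually faster depends on the compiler and the hardware, so it should be timed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

void ConvertUyvyToYuv420PTwoPass(uint8_t *dst, const uint8_t *src,
                                 int width, int height)
{
    assert(width > 0 && height > 0 && width % 2 == 0 && height % 2 == 0);

    const size_t pixels = (size_t)width * (size_t)height;
    const size_t stride = (size_t)width * 2;       /* bytes per UYVY row */
    const size_t chromaWidth = (size_t)width / 2;  /* U (or V) samples per row */
    const size_t chromaRows = (size_t)height / 2;

    uint8_t *yPlane = dst;
    uint8_t *uPlane = dst + pixels;
    uint8_t *vPlane = uPlane + pixels / 4;

    /* Pass 1: luma. In packed UYVY every odd byte is a Y sample. */
    for (size_t i = 0; i < pixels; i++)
        yPlane[i] = src[2 * i + 1];

    /* Pass 2: chroma. One output row per pair of input rows. */
    for (size_t p = 0; p < chromaRows; p++) {
        const uint8_t *row0 = src + 2 * p * stride;
        const uint8_t *row1 = row0 + stride;
        uint8_t *u = uPlane + p * chromaWidth;
        uint8_t *v = vPlane + p * chromaWidth;
        for (size_t x = 0; x < chromaWidth; x++) {
            u[x] = (uint8_t)((row0[4 * x] + row1[4 * x]) / 2);
            v[x] = (uint8_t)((row0[4 * x + 2] + row1[4 * x + 2]) / 2);
        }
    }
}
```

The trade-off is that the source frame is read twice, so this only pays off if the simpler loops are compiled into faster code (for example through auto-vectorization) than the single loop with the branch.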

Finally, I assume that optimization is turned on in the compiler?  Sometimes optimizing for speed is not the correct way to go, and optimizing for minimal code size works better. Again, though, I have not played with this for some time, and today this may no longer be a meaningful distinction.

good luck!

Author Comment

by:Shiv_Sg
ID: 37746071
Hi, thanks a lot for the quick response. I currently have a dual-core CPU, so I will try adding more threads and see how much it improves.

I am also trying to modify the code based on your space tradeoff suggestion in para 3; I'll let you know how things improve.

Thanks again.
Regards
Shiv
