roland4

asked:

Motion Detection

Hi,

I have a webcam and have written a program in VB6 that takes frames from it. Now I'm interested in programming relatively good motion detection. I tried saving two pictures as byte arrays and comparing them, but the problem was that a lot of pixels differ even without any motion (because of lighting, the cheap CMOS sensor, ...)!
I think it would be a nice advantage to write a C++ DLL; that's why I'm posting here.
Could you please give me any explanations, code examples, etc.? (There are a lot of points to earn for good tips! ;-))

Thanks in advance
corduroy9


That's an interesting take on a motion detector. But it seems quite an undertaking to compare the data in two pictures; they would never be exactly the same. You may have to decide how much of a difference you're going to allow between the data in each picture, and when that threshold is passed, you can claim to have detected motion. Also, each picture's data may have parts/tags that you may not want to compare; for example, TIFF images can contain header information which has no impact on the image data itself.

Here's a link to conventional motion detectors:
http://www.glolab.com/pirparts/infrared.html

Kyle Abrahams, PMP
Also, for efficiency, you shouldn't check every pixel; depending on the image size, you may want to check every 5th pixel or so.



Hi Roland!

Motion Detection is usually done using a background model.  That means you have some notion (e.g. an image) of what the view looks like without objects in it.  Every time you then get a new image you can determine for each pixel how much it differs from the same pixel of the empty image.  This can be done by pixelwise differencing and then thresholding.  The resulting binary motion image can be post-processed, e.g. with a (spatial) median filter, to remove noise.  That way you can get the areas in the image which move.
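To make that concrete, here is a small C++ sketch of the differencing and median-filter steps (assuming 8-bit grayscale frames stored as plain width*height byte arrays; all names are just for illustration):

#include <cstdlib>   // abs

// Pixelwise difference of the current frame against the background model,
// thresholded into a binary motion mask (1 = moving, 0 = static).
void MotionMask(const unsigned char* background, const unsigned char* current,
                unsigned char* mask, int width, int height, int threshold)
{
    for (int i = 0; i < width * height; ++i)
        mask[i] = (abs(current[i] - background[i]) > threshold) ? 1 : 0;
}

// 3x3 median filter on the binary mask (for 0/1 values the median is a
// simple majority vote); suppresses isolated noise pixels.  Border pixels
// are simply cleared to keep the sketch short.
void MedianFilter3x3(const unsigned char* in, unsigned char* out,
                     int width, int height)
{
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            if (x == 0 || y == 0 || x == width - 1 || y == height - 1) {
                out[y * width + x] = 0;
                continue;
            }
            int count = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    count += in[(y + dy) * width + (x + dx)];
            out[y * width + x] = (count >= 5) ? 1 : 0;
        }
}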

Your background model is very important. It governs how well you can detect moving pixels. As the lighting conditions can be expected to change over time, the model has to be continuously updated: once in a while (maybe twice a second or so, depending on the speed of your computer) you use the current video image to update your background model. Let's say the background model is a simple image, because other models are more complicated and computationally expensive (albeit much better...). Then, for each pixel position (x,y), you do something like background_model[x][y] = alpha * background_model[x][y] + (1-alpha) * current_image[x][y]; Here alpha (something between 0 and 1) determines how fast your model updates to make up for lighting changes etc. (NB: this is pseudo code; check how your image is addressed.)

A problem with this is that the background will at some point also absorb objects which appear in the image and then stay put. If you do not like that, experiment with making alpha dependent on whether the current model thinks that the pixel at (x,y) is moving (maybe alpha = 1, i.e. no update, if the current background model thinks that the pixel is moving).
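As a sketch (same assumptions as above: 8-bit grayscale byte arrays; 'mask' is the binary motion mask from the differencing step):

// Running-average background update.  alpha close to 1 means the model
// adapts slowly; pixels currently classified as moving are skipped
// (equivalent to alpha = 1) so foreground objects do not bleed into
// the model.
void UpdateBackground(unsigned char* background, const unsigned char* current,
                      const unsigned char* mask, int width, int height,
                      double alpha)
{
    for (int i = 0; i < width * height; ++i) {
        if (mask[i]) continue;  // moving pixel: leave the model untouched
        background[i] = (unsigned char)
            (alpha * background[i] + (1.0 - alpha) * current[i] + 0.5);
    }
}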

Also, one problem is the AGC (auto gain control) of your camera.  If you disable it, this will work much better.

Hope this helps,

Nils.
The amount of contrast between the background and foreground is important.

Even though the background may be static, a number of things unfortunately conspire against two images being identical: (1) sampling error in the CCD of the camera, (2) random noise, (3) changing lighting conditions.

You should investigate averaging the pixels over a number of frames (then you can also calculate the standard deviation; see the sketch below). This would only be good if you expect the lighting to remain fairly constant and there is not a whole lot of background motion (rustling leaves might be okay, but cars constantly zipping by would not).
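A rough sketch of what that could look like in C++ (grayscale frames as byte arrays; the stats array is assumed to be zero-initialised; names are made up):

#include <cmath>

// Per-pixel running statistics: accumulate sum and sum of squares over a
// number of frames, then derive mean and standard deviation on demand.
struct PixelStats {
    double sum;    // sum of observed values
    double sumSq;  // sum of squared values
    int    n;      // number of frames accumulated
};

void Accumulate(PixelStats* stats, const unsigned char* frame, int pixels)
{
    for (int i = 0; i < pixels; ++i) {
        stats[i].sum   += frame[i];
        stats[i].sumSq += (double)frame[i] * frame[i];
        stats[i].n     += 1;
    }
}

// A pixel counts as moving when it deviates from its mean by more than
// k standard deviations (the +1.0 guards against a zero deviation).
bool IsMoving(const PixelStats& s, unsigned char value, double k)
{
    double mean = s.sum / s.n;
    double var  = s.sumSq / s.n - mean * mean;
    double sd   = sqrt(var > 0.0 ? var : 0.0);
    return fabs(value - mean) > k * sd + 1.0;
}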

You should also consider trying to identify objects. Basically, rather than looking at the image as a collection of pixels, you would try to group bunches of pixels into simple polygons. This way, if a new polygon suddenly appears, or a number of polygons significantly distort, you have a clue that something is afoot. The polygons do not have to correspond to anything that we as humans would consider sensible - it is perfectly fine if your sofa gets chopped up into three arbitrary polygons.

You should also try detecting motion on a larger area than a single pixel. For example, if your camera is 320x200, you may need a rule that any perceived motion must come from an object (polygon) with rectangular dimensions of at least 32x20 and a minimum coverage of 400 pixels (these should be customizable).

As well, to detect motion you need to require that the object has travelled somewhere (say, a minimum fifteen-pixel displacement from frame to frame, depending on what your sample rate is).

If you go with the polygon idea, you should measure displacement from the centre of the polygon, since the dimensions are likely to change.

John
Hi Roland!

And thanks for the comment, John.  If you search the literature you will find that there are several ways to handle the problems John and I talked about.  Those are the methods I called "more complicated and computationally expensive (albeit much better...)".  What is usually done is to have a statistical model for each pixel of whether it is "foreground" (i.e. moving) or "background".  The simple method I described boils down to a model of Gaussian noise with zero mean in the channels.  The threshold you use is then connected to the standard deviation sigma if you do a mathematical model.

More complicated methods, again for a static camera, will either have multiple Gaussian distributions per pixel (e.g. a pixel can be a tree leaf or the sky behind it as the tree moves in the wind), normally 3 to 5 [1], or non-parametric models [2].  Both methods work well but require quite some CPU time.

You should probably test simple methods first before resorting to these complicated ones; they take some work to implement.  For instance, the authors of [1] have an SSE-optimised implementation so that it runs in real time.

As for the minimum size, you do that when your motion detector extracts moving blobs (connected components) from the binary motion image.  You simply have a threshold for the minimum size of such blobs.
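A simple way to do that is a flood fill over the mask; something like this (4-connected, clears the mask as it goes, illustration only):

#include <vector>
#include <utility>

// Extract connected components ("blobs") from a binary motion mask and
// keep only those with at least minSize pixels.
std::vector< std::vector< std::pair<int,int> > >
ExtractBlobs(unsigned char* mask, int width, int height, int minSize)
{
    std::vector< std::vector< std::pair<int,int> > > blobs;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            if (!mask[y * width + x]) continue;
            std::vector< std::pair<int,int> > blob;
            std::vector< std::pair<int,int> > stack(1, std::make_pair(x, y));
            mask[y * width + x] = 0;  // mark visited by clearing
            while (!stack.empty()) {
                std::pair<int,int> p = stack.back();
                stack.pop_back();
                blob.push_back(p);
                static const int dx[4] = { 1, -1, 0, 0 };
                static const int dy[4] = { 0, 0, 1, -1 };
                for (int d = 0; d < 4; ++d) {
                    int nx = p.first + dx[d], ny = p.second + dy[d];
                    if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                    if (!mask[ny * width + nx]) continue;
                    mask[ny * width + nx] = 0;
                    stack.push_back(std::make_pair(nx, ny));
                }
            }
            if ((int)blob.size() >= minSize)
                blobs.push_back(blob);
        }
    return blobs;
}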

The minimum motion threshold will not be useful.  Depending on your application and camera's field of view, a) people could move slowly on purpose and thereby avoid detection, or b) people could walk towards the camera, getting larger but not moving much in the image.

Some references (top two motion detection research groups) below.  Code might be available from these people's web sites (or ask google...).

Kind regards,

Nils.

[1] Chris Stauffer and W. Eric L. Grimson (MIT): "Adaptive Background Mixture Models for Real-time Tracking", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '99), Fort Collins, USA, volume 2, pages 246-252, 1999.

[2] Ahmed Elgammal, David Harwood and Larry Davis (University of Maryland): "Non-parametric Model for Background Subtraction", 6th European Conference on Computer Vision (ECCV 2000), Dublin, Ireland, Springer Verlag, pages 751-767, 2000.  For a review see http://www.eecs.lehigh.edu/~tboult/FRAME/Elgammal/bgmodel.html and bgmodel.pdf.

Thanks for your comments, Nils.

Yes, I recognize that requiring a minimum displacement for motion detection can be defeated by moving slowly. However, even for humans, slow movement is hard to detect unless we refer to some previous reference (5 minutes ago, 1 hour ago, last week?).

As well, moving towards or away from the camera can also cause problems. This could be handled by also treating objects that are growing or shrinking as moving.

Roland, it depends on how versatile, how accurate you need the application to be. You may be able to get away with something simple, or you may need something more complicated.

You may want to look at MPEG. Part of the algorithm does motion compensation. To minimize bandwidth, MPEG tries to reuse previous images. It does this by detecting which objects (blocks) are moving in a scene and applying a displacement to them. This way it can reuse the background and the object; all it needs to do is move the object by the calculated displacement. (A block-matching sketch follows the links below.)

http://www.cs.washington.edu/homes/gidon/presentations/mocomp.ppt (this one is quite nice and basic)
http://www.dcs.warwick.ac.uk/research/mcg/bmmc/
http://www.hpl.hp.com/techreports/96/HPL-96-53.pdf 
http://icsl.ee.washington.edu/~woobin/papers/General/node5.html
http://www.newmediarepublic.com/dvideo/compression/adv08.html
http://guido.bruck.bei.t-online.de/kompen/lukompeg.htm
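
To illustrate the displacement search just described, here is a minimal block-matching sketch (grayscale frames as byte arrays; fixed 16x16 blocks and a full search, which real MPEG encoders optimise heavily; names are just for illustration):

#include <cstdlib>
#include <climits>

// Find the displacement (within +/- range pixels) that best maps the 16x16
// block at (bx,by) in the previous frame onto the current frame, using the
// sum of absolute differences (SAD) as the matching cost.  The block itself
// is assumed to lie fully inside the frame.
void BestDisplacement(const unsigned char* prev, const unsigned char* cur,
                      int width, int height, int bx, int by, int range,
                      int* bestDx, int* bestDy)
{
    long bestSad = LONG_MAX;
    *bestDx = *bestDy = 0;
    for (int dy = -range; dy <= range; ++dy)
        for (int dx = -range; dx <= range; ++dx) {
            if (bx + dx < 0 || by + dy < 0 ||
                bx + dx + 16 > width || by + dy + 16 > height) continue;
            long sad = 0;
            for (int y = 0; y < 16; ++y)
                for (int x = 0; x < 16; ++x)
                    sad += abs(prev[(by + y) * width + bx + x] -
                               cur[(by + dy + y) * width + bx + dx + x]);
            if (sad < bestSad) { bestSad = sad; *bestDx = dx; *bestDy = dy; }
        }
}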

You may also want to look into edge detection (a simple Sobel sketch follows these links).

http://prettyview.com/edge/
http://www-ece.rice.edu/~kkelly/elec539/laplacian.html
http://library.wolfram.com/examples/edgedetection/
http://www.cs.cf.ac.uk/Dave/Vision_lecture/node24.html
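
For a feel of what edge detection involves, here is a basic Sobel sketch (8-bit grayscale in and out; border pixels skipped; illustration only):

#include <cstdlib>

// Sobel edge detection: approximate the horizontal (gx) and vertical (gy)
// intensity gradients with 3x3 kernels and threshold their magnitude.
void SobelEdges(const unsigned char* in, unsigned char* out,
                int width, int height, int threshold)
{
    for (int y = 1; y < height - 1; ++y)
        for (int x = 1; x < width - 1; ++x) {
            int gx = -in[(y-1)*width + x-1] + in[(y-1)*width + x+1]
                   - 2*in[y*width + x-1]    + 2*in[y*width + x+1]
                   -   in[(y+1)*width + x-1] +  in[(y+1)*width + x+1];
            int gy = -in[(y-1)*width + x-1] - 2*in[(y-1)*width + x]
                   -   in[(y-1)*width + x+1] +  in[(y+1)*width + x-1]
                   + 2*in[(y+1)*width + x]   +  in[(y+1)*width + x+1];
            int mag = abs(gx) + abs(gy);  // cheap substitute for sqrt(gx^2+gy^2)
            out[y*width + x] = (mag > threshold) ? 255 : 0;
        }
}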

I am not sure if using an MPEG library is the way you want to go. I am also not sure if it is easy to get access to motion information from it. There are probably free MPEG encoding libraries (but I haven't found any in a quick search).

http://starship.python.net/~gward/mpeglib/ (this is a decoder)

John
roland4 (ASKER)

Hi all of you!
Thanks for your numerous comments! I will look at that stuff over the next few days...
roland4 (ASKER)

Wow, that's very difficult stuff. Isn't there a simple piece of code out there?
It's also hard to decide which of you (John or Nils) I should give the points to, because all of your comments are great.
- But a little bit too hard for me :-(
Hi Roland,

Don't despair, doing it the "proper" way is difficult. However, depending on your needs you may be able to get away with something that is just "good enough".

Using MPEG or creating your own object detection algorithms is really the platinum-plated way to go. But you may be able to get by with a less naive version of your byte-array comparison.

(1) Decide if you want to deal with monochrome images or colour images. If you opt for colour images, then you need to deal with each colour plane separately (i.e. you need to process the red, green, and blue images). You can convert a colour pixel into a monochrome pixel with the following formula: Y = 0.3*RED + 0.59*GREEN + 0.11*BLUE (http://www.bobpowell.net/grayscale.htm).
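In C++ that conversion is a one-liner per pixel; for example (assuming the common Windows DIB byte order blue, green, red):

// Convert one 24-bit pixel to a monochrome (luminance) value using the
// weights from the formula above.
unsigned char ToGray(const unsigned char* pixel)
{
    double y = 0.11 * pixel[0]   // blue
             + 0.59 * pixel[1]   // green
             + 0.30 * pixel[2];  // red
    return (unsigned char)(y + 0.5);
}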

(2) You cannot simply byte-compare the two images, because sampling error and random noise guarantee that the two images will never be 100% identical. But you can compensate by establishing a threshold value (call it sensitivity). So your comparison would be something like

if (abs(pixel1 - pixel2) > sensitivity) dirty_pixel = true;

Clearly, the smaller the sensitivity, the more sensitive your algorithm will be to noise. The larger the sensitivity, the more contrast will be required between the two pixels.

Of course, you can experiment with different pixel difference / sensitivity algorithms. You may want to try a non-linear sensitivity.

(3) You collect all the dirty pixels.

(3A) The simplest "motion detection" would say something like

if (total_dirty_pixels > motion_threshold) motion_detected = true;

Of course, it would be insensitive to small objects. It would also be sensitive to a lot of random noise in the image.
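Putting steps (2) and (3A) together gives something like this (monochrome frames as byte arrays; illustration only):

#include <cstdlib>

// Count "dirty" pixels whose difference exceeds the sensitivity, then
// declare motion when the count passes the motion threshold.
bool SimpleMotion(const unsigned char* frame1, const unsigned char* frame2,
                  int pixels, int sensitivity, int motionThreshold)
{
    int dirty = 0;
    for (int i = 0; i < pixels; ++i)
        if (abs(frame1[i] - frame2[i]) > sensitivity)
            ++dirty;
    return dirty > motionThreshold;
}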

(3B) A smarter motion detection would try to detect whether the dirty pixels are grouped together. You could either do a statistical analysis of the distribution of the dirty pixels to see if they cluster together, or you could try to calculate the convex hull of the points - although it would have to be a modified convex hull algorithm that ignores outlying dirty pixels. If you use the modified convex hull, you would probably apply another criterion like "are at least xx% of the pixels in my convex hull dirty?" (The basic hull computation is sketched after the links below.)

http://www.cse.unsw.edu.au/~lambert/java/3d/ConvexHull.html 
http://www.cs.sunysb.edu/~algorith/files/convex-hull.shtml
http://www.cs.princeton.edu/~ah/alg_anim/version1/ConvexHull.html
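
As promised above, here is the basic (unmodified) hull computation, Andrew's monotone chain; the outlier rejection would have to be added on top, e.g. by dropping pixels far from the centroid before calling it:

#include <algorithm>
#include <vector>

struct Pt { int x, y; };

static bool LessPt(const Pt& a, const Pt& b)
{ return a.x < b.x || (a.x == b.x && a.y < b.y); }

// Cross product of (a - o) and (b - o); positive for a counter-clockwise turn.
static long Cross(const Pt& o, const Pt& a, const Pt& b)
{ return (long)(a.x - o.x) * (b.y - o.y) - (long)(a.y - o.y) * (b.x - o.x); }

// Convex hull of the dirty-pixel coordinates, returned counter-clockwise.
std::vector<Pt> ConvexHull(std::vector<Pt> pts)
{
    int n = (int)pts.size();
    if (n < 3) return pts;
    std::sort(pts.begin(), pts.end(), LessPt);
    std::vector<Pt> hull(2 * n);
    int k = 0;
    for (int i = 0; i < n; ++i) {                    // build lower hull
        while (k >= 2 && Cross(hull[k-2], hull[k-1], pts[i]) <= 0) --k;
        hull[k++] = pts[i];
    }
    for (int i = n - 2, t = k + 1; i >= 0; --i) {    // build upper hull
        while (k >= t && Cross(hull[k-2], hull[k-1], pts[i]) <= 0) --k;
        hull[k++] = pts[i];
    }
    hull.resize(k - 1);  // last point repeats the first
    return hull;
}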

Of course the algorithm I present is not "accurate", but it should be good enough if all you want to do is detect that something approaching motion occurred.

Limitations, (1) not likely to detect slow motion, (2) changing lighting conditions can confuse it, (3) small objects may not be detected, (4) low contrast between background and foreground may not be detected, ...

Never give up; remember, even expert mission-critical software has limitations and problems (remember the Iranian Airbus which was mistaken for an F-14 Tomcat 15 years ago, http://www.geocities.com/CapitolHill/5260/july88crash.html - the software was unable to make an accurate identification because the plane was flying directly into the radar).

As for awarding the points, you can always split the points.

John
roland4 (ASKER)

I tried the thing with "sensitivity". It's a little bit better, but not enough for lighting changes or objects that are far away!
I think it would be a nice idea to split the picture up into several regions and check those regions for changes, because it would be less CPU intensive. - John, you talked about polygons, but there I also have the problem with noise, lighting, ...
What about the background model? - I would have to save a new image before each comparison, right? - I already do that. But there is also noise in it...
If you can identify where the noise is coming from, just ignore those points. (You're looking at a still frame with movements. Even if it were a security camera, the frame of reference would just extend beyond what you can view at one time.) With this in mind, identify which points the noise is coming from, and maybe assign weights to them, so that only mass changes in the noise trigger the motion detection.
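One possible reading of that in code (grayscale byte arrays; the noiseCount array is assumed zero-initialised; names are made up for illustration):

#include <cstdlib>

// Learn which pixels are chronically noisy by counting how often each one
// changes while the scene is known to be static.
void LearnNoise(const unsigned char* prev, const unsigned char* cur,
                int* noiseCount, int pixels, int sensitivity)
{
    for (int i = 0; i < pixels; ++i)
        if (abs(cur[i] - prev[i]) > sensitivity)
            ++noiseCount[i];  // flickers even without motion
}

// During detection, skip pixels that were flagged as noisy too often.
int CountDirtyIgnoringNoise(const unsigned char* prev, const unsigned char* cur,
                            const int* noiseCount, int pixels,
                            int sensitivity, int maxNoise)
{
    int dirty = 0;
    for (int i = 0; i < pixels; ++i) {
        if (noiseCount[i] > maxNoise) continue;  // known-noisy pixel
        if (abs(cur[i] - prev[i]) > sensitivity) ++dirty;
    }
    return dirty;
}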
ASKER CERTIFIED SOLUTION
John_Drake
roland4 (ASKER)

Hi John,

Thanks for your time and tips!
I see, I still have a long way to go with that stuff...
So I gave you the points today, but I would be happy if you could explain the thing with polygons in some sort of pseudo code!

Thanks a lot,
Roland
Okay, I'll try to do so. You will have to give me a few days to pull my thoughts and ideas together.

Motion detection is one of those problems that sounds fairly trivial ... but gets murkier and murkier the further you go.

John
You have probably already solved this problem, but here's a simple method I'm using. I am taking the input from a web camera set up at 24-bit, 160x120 resolution, so the example will only work with those parameters, although it's easy to change. Anyway, here's the small class I wrote to do it. Just call ImageDetectionStart(char*, unsigned int)! If there is enough of a change it will return true; if the image is similar it returns false.


#ifndef _MOTIONDETECTION_H_
#define _MOTIONDETECTION_H_

#include <windows.h>  // DWORD
#include <string.h>   // memcpy

class CMotionDetection
{
private:
      char m_CurrentVideoData[160][120];
      char m_OldVideoData[160][120];
      bool m_bFirstRun;
      unsigned int  m_nBufferSize;
      unsigned int  m_nMatched;

public:
      CMotionDetection()
      {
            m_bFirstRun = true;
            m_nBufferSize = 160 * 120 * 3;  //image width * image height * 3 bytes for 24bit
            m_nMatched = 0;
      };

      ~CMotionDetection()
      {

      };



      bool ImageDetectionStart(char* pNewFrame, unsigned int nFrameSize)
      {
            
            
            // 1. Convert to b/w... easier for us to process
            DownSample(pNewFrame,nFrameSize);


            if (m_bFirstRun) //We have done all we can and have a buffer to compare next time around
            {
                  m_bFirstRun = false; //We have been through the process at least once...set the flag...
                  return true;
            }


            m_nMatched = 0; //reset the counter

            // 2. Compare to previous Image
            PixelCompare();


            // 3. Return false if the image is similar, or true if the image has enough difference that we would need to redraw it.
            
            // To test we set the bar at 25%: fewer than 75% matching pixels
            // counts as a change.  (Note m_nMatched counts pixels, max 160*120,
            // while m_nBufferSize / 4 = 160*120*3/4, which happens to equal
            // 75% of the pixel count.)

            if (m_nMatched < (m_nBufferSize / 4)) return true; // We have enough of a change to do a redraw

            
            
            return false;
      };


      //Convert 24bit to B/W custom image.
      void DownSample(char* pNewFrame, unsigned int nFrameSize)
      {

            int x;
            int y;
            DWORD p = 0;             // running pixel index (frame is stored bottom-up)
            DWORD px;                // brightness of the current pixel
            unsigned char pDest[3];  // one 24-bit pixel (3 bytes; [2] was a buffer overflow)
            char clr = 0;            // quantised brightness level (1..24)
            

                  DWORD len = 160 * 120 * 3;

                  for (y = 0; y <  120 ; y++) {
                        for (x = 0 ; x < 160; x++) {

                              memcpy(pDest,((char*)pNewFrame + len - 3) - p * 3,3);
                        
                              p ++;

                  
                              // Average the three colour channels to get a
                              // brightness value, then quantise it into one
                              // of 24 levels (1 = brightest .. 24 = darkest)
                              // in buckets of 10.  This collapses the original
                              // if/else ladder of packed-RGB comparisons,
                              // which also skipped exact boundary values such
                              // as RGB(220,220,220), leaving clr stale.
                              px = (pDest[0] + pDest[1] + pDest[2]) / 3;
                              {
                                    int level = (int)px / 10;   // 0..25
                                    if (level > 23) level = 23; // clamp top bucket
                                    clr = (char)(24 - level);   // 1..24
                              }
                        
                              if (m_bFirstRun) //First time around fill the Old buffer
                              {
                                    m_OldVideoData[x][y] = clr;
                                    
                              }else{
                                    m_CurrentVideoData[x][y] = clr;
                              }
                              
                        }
                  }
            

      }



      void PixelCompare()
      {

            for (int y = 0; y <  120 ; y++)
            {
                  for (int x = 0 ; x < 160; x++)
                  {
                  
                        if (m_OldVideoData[x][y] == m_CurrentVideoData[x][y]) m_nMatched ++;
                        
                        //Swap the old value with the new
                        m_OldVideoData[x][y] = m_CurrentVideoData[x][y];
                  }
            }

            

      };





}; //eof class


#endif //_MOTIONDETECTION_H_
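
A minimal usage sketch for the class above; the capture call is a hypothetical placeholder, substitute whatever your capture framework provides:

#include "MotionDetection.h"  // assuming the header above is saved under this name

CMotionDetection detector;
char frame[160 * 120 * 3];    // one 24-bit 160x120 frame

// Hypothetical capture loop:
while (CaptureNextFrame(frame))  // placeholder, not a real API
{
    if (detector.ImageDetectionStart(frame, sizeof(frame)))
    {
        // Enough pixels changed: treat as motion (or redraw).
    }
}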