Create a Thumbnail Image of a Web Page

DanRollins
Here you will find source code for a C++ program that captures a thumbnail image of any web page.  You might use web page thumbnails in a visual directory of your site, or to show as a sort of "preview" placed next to a link.

I needed to develop a web page that shows a small image of the home page of several thousand sites.  We looked at using some existing programs to generate the thumbnails, but it turned out to be a simple programming task, so we did it in-house.

Overview
1) In a Web Browser control, navigate to a certain URL
2) Set the "Optical Zoom Factor" to a small value (near 25%)
3) Bitblt the resulting image to a memory DC
4) Save the image file using the CImage object.

In the EE Article, Browser Bot -- Automate Browsing Sequences with C++, I described how to set up a simple program that displays a web page.  Read that article for a detailed step-by-step.  Here, I'll just summarize:

1) Create a dialog-based MFC application.
2) Add an ActiveX control: Microsoft Web Browser and make a control-type variable for it.
3) Add a button and a button handler that will load and display the web page.
We'll assume that you have all of that in place.  For instance:
[Figure: Starting point: MFC dialog-based app with ActiveX control]

We are going to use the browser control's ExecWB function to set the zoom, using the OLECMDID_OPTICAL_ZOOM command (new with Internet Explorer 7).  It lets you choose a value as low as 10%.  You can experiment with the code, trying various values until you get a "good enough" image for your thumbnail.  But there are some calculations you can do to zero in on the right values.

We want to generate a specific size of thumbnail image, say 200x200.  We don't want to capture the scrollbars or the borders, so we set the browser control's window size a bit larger than needed:
    int nScrollbarWide= GetSystemMetrics( SM_CXVSCROLL );
    int nEdgeWide= GetSystemMetrics( SM_CXEDGE );
    int nNonClientWide= nScrollbarWide+nEdgeWide;
    int nCtlWide= nThumbnailWide+nNonClientWide;
    int nCtlHigh= nThumbnailHigh+nNonClientWide;  // height uses the thumbnail height

    m_ctlBrowser.MoveWindow( 0,0, nCtlWide,nCtlHigh );

Next, we need to decide on a zoom percentage.  I ended up with a 20% zoom (1/5th normal size), but here's my thinking on the subject:

Most pages these days are designed to look best at a width of about 1000.  Try it out -- go to a page and resize the browser window, making the horizontal scrollbar appear and disappear.   Also watch the top of the page where a navigation bar often appears -- that's usually the key width.   Many pages look fine with a width of 800, often just cutting off part of an advertisement bar on the right side, but I decided on a "base width" of 1000.

I wanted a thumbnail that will show that 1000 pixel-wide image in a width of 200, so I hardly needed to break out my 7th-grade math book to figure the percentage:

   200/1000 = n/100 ... n=20 (20%)

Some other examples:
  page width  800, thumbnail width 150:   150/800  = n/100 ... n=18.75 (19%)
  page width 1024, thumbnail width 180:   180/1024 = n/100 ... n=17.58 (18%)
  page width 1024, thumbnail width 100:   100/1024 = n/100 ... n= 9.77 (10%)

We can't easily control when the browser will display the scrollbars, so I found it best to assume that they are always there.   We are creating a thumbnail, which by its nature is expected to be fuzzy and lossy, so it's no big deal if I cut off a thin strip of the right side of a page, but it would look bad if I show a piece of a scrollbar.
[Figure: Variations in zoom percent: 25, 19, 16]

Note that making the zoom too small ends up with strips of dark gray background on the sides.  We want to avoid capturing that, so it's better to set the zoom value slightly larger rather than slightly smaller.
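The arithmetic above can be wrapped in a small helper.  What follows is only a sketch: ZoomPercentFor is a hypothetical name that is not part of the sample program, it rounds up because a slightly larger zoom avoids the gray strips, and the clamp to 10..1000 assumes that is the range OLECMDID_OPTICAL_ZOOM accepts (the text above only confirms the lower limit of 10%).

```cpp
#include <algorithm>

// Hypothetical helper (not in the original program).
// Returns the zoom percentage that shrinks a page nPageWide pixels wide
// to roughly nThumbWide pixels, rounded up to the next whole percent.
int ZoomPercentFor( int nPageWide, int nThumbWide )
{
    if ( nPageWide <= 0 || nThumbWide <= 0 ) {
        return 100;  // nonsense input: fall back to "no zoom"
    }
    // integer ceiling of (100 * nThumbWide / nPageWide)
    int nPercent= ( 100 * nThumbWide + nPageWide - 1 ) / nPageWide;

    // assumed valid range for the optical zoom command: 10..1000
    return std::min( 1000, std::max( 10, nPercent ) );
}
```

With a base width of 1000 and a 200-pixel thumbnail this returns 20, and it reproduces the rounded values in the examples above (19, 18 and 10).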

Snapping the Shot and Saving to a File
Finally, here's the code that takes the image rendered in the browser control and outputs it to a file:
#include <atlimage.h> // needed for CImage object
void DelayMs( int nMs );

void CThumbnailWebPgDlg::OnBnClickedButton1()
{
    int nThumbnailWide= 200;
    int nThumbnailHigh= 200;
    int nZoomPercent=   21;  // see text for calculations

    //------------------ set the size of the browser control
    int nScrollbarWide= GetSystemMetrics( SM_CXVSCROLL );
    int nEdgeWide= GetSystemMetrics( SM_CXEDGE );
    int nNonClientWide= nScrollbarWide+nEdgeWide;
    int nCtlWide= nThumbnailWide+nNonClientWide;
    int nCtlHigh= nThumbnailHigh+nNonClientWide;
    m_ctlBrowser.MoveWindow( 0,0, nCtlWide,nCtlHigh );

    //--------------- trick to avoid timing problems (see text)
    CString sUrl= L"about:blank";
    m_ctlBrowser.Navigate( sUrl, 0,0,0,0 );
    while ( m_ctlBrowser.get_ReadyState() != READYSTATE_COMPLETE ) {
        DelayMs( 100 );
    }
    CComVariant vZoom( nZoomPercent );
    m_ctlBrowser.ExecWB(
        OLECMDID_OPTICAL_ZOOM, OLECMDEXECOPT_DONTPROMPTUSER,
        &vZoom, NULL);

    //---------------------------------------- now load the target page
    sUrl= L"http://www.experts-exchange.com/Programming/Languages/CPP/";
    m_ctlBrowser.Navigate( sUrl, 0,0,0,0 );

    while ( m_ctlBrowser.get_ReadyState() != READYSTATE_COMPLETE ) {
        DelayMs( 100 );
    }
    DelayMs( 3000 );  // extra delay to let Flash, et al., show its image

    //------------ Get the part of the image we want into a bitmap
    //------------ Create a memory DC and bitmap to hold the image
    CDC* pDC= GetDC();
    CDC dcMemory;
    dcMemory.CreateCompatibleDC(pDC);

    CBitmap cBmp;
    cBmp.CreateCompatibleBitmap( pDC, nThumbnailWide,nThumbnailHigh );
    CBitmap* pOldBitmap = dcMemory.SelectObject( &cBmp );

    //------ copy image from the browser window to the memory DC
    CDC* pBrowserDC= m_ctlBrowser.GetDC();
    dcMemory.BitBlt(
        0,0,                           // dest x,y
        nThumbnailWide,nThumbnailHigh, // dest width,height
        pBrowserDC,
        nEdgeWide,nEdgeWide,           // src x,y skips frame
        SRCCOPY
    );
    dcMemory.SelectObject( pOldBitmap ); // standard cleanup

    //------ release the DCs obtained with GetDC() (avoids a leak per image)
    m_ctlBrowser.ReleaseDC( pBrowserDC );
    ReleaseDC( pDC );

    //-------------- output the thumbnail image to a file
    //-------------- options are .JPG, .BMP and .PNG
    CImage cImg;
    // hand ownership of the bitmap to CImage so it is deleted only once
    cImg.Attach( (HBITMAP)cBmp.Detach() );
    cImg.Save( L"c:\\temp\\temp.jpg", Gdiplus::ImageFormatJPEG );
    // cImg.Save( L"c:\\temp\\temp.bmp"  );
    // cImg.Save( L"c:\\temp\\temp.png", Gdiplus::ImageFormatPNG );
}

//------------- utility fn for waiting for the browser
void DelayMs( int nMs )
{
    // compare elapsed ticks so the wait survives GetTickCount() wrap-around
    DWORD nStartTick= GetTickCount();
    while ( GetTickCount()-nStartTick < (DWORD)nMs ) {
        MSG msg;
        while ( ( GetTickCount()-nStartTick < (DWORD)nMs )
                && ( ::PeekMessage(&msg, NULL, 0, 0, PM_NOREMOVE) )
            ) {
            AfxGetApp()->PumpMessage();  // does a ::GetMessage()
        }
        Sleep(1);
    }
}

There are a few items that need some explanation:
I wanted to do things sequentially, rather than work with the OnDocumentComplete event, so I used the technique from the previous article that lets me wait for the READYSTATE_COMPLETE status as part of a sequence of steps.
 
I could find no way to ask the browser if "Zoom rendering" is complete -- and that is an issue as it may take as much as several seconds for the browser to regenerate the page in the new size.  However, I found a workaround:

If I load an empty page and then set the zoom, that setting "sticks" and the next page is not signaled as "complete" until the page is rendered in the thumbnail size.  So the code loads about:blank, then sets the zoom, then loads the target page.  In production code, you would set the zoom just once as an initialization step, rather than for each web page.
Even with these delays, I found that many popular pages use Flash and other controls that may show as blank unless I add a final delay before snapping the picture (the DelayMs( 3000 ) call just after the second wait loop).
I chose to output the file in JPG format, but as shown in the comments at the end, it is possible to output in BMP or PNG formats, as well.
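One thing the wait loops in the sample do not handle is a page that never reaches READYSTATE_COMPLETE; they would spin forever.  A possible guard, sketched here under my own assumptions rather than taken from the program above, is a polling helper with a timeout.  WaitUntil and its parameters are hypothetical names; the idle callback stands in for whatever keeps messages flowing between polls (DelayMs in the sample).

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper (not in the original program).
// Polls isReady() until it returns true or the timeout expires,
// calling idle() between polls.  Returns false on timeout.
bool WaitUntil( const std::function<bool()>& isReady,
                const std::function<void()>& idle,
                std::chrono::milliseconds timeout )
{
    const auto deadline= std::chrono::steady_clock::now() + timeout;
    while ( !isReady() ) {
        if ( std::chrono::steady_clock::now() >= deadline ) {
            return false;  // gave up: the condition never became true
        }
        idle();            // e.g., pump messages for a short while
    }
    return true;
}
```

In the dialog, a call might pass a lambda that tests m_ctlBrowser.get_ReadyState() against READYSTATE_COMPLETE, a second lambda that calls DelayMs( 100 ), and a limit such as 30 seconds, skipping the URL when the function returns false.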

In my production code, I included logic to read a series of URL records from a database and store the thumbnail images in that database, for output by the web server.  That's all just straightforward database programming, so I've left it out of this little sample program.

[Figure: 200x200 and 100x100 JPG thumbnails]

Comments (5)

Commented:
If the webpage is large, it appears your program only produces a picture with a portion of the webpage showing.  Is it possible to capture the entire webpage and save it to a JPEG (or other format)?
CERTIFIED EXPERT
Author of the Year 2009

Author

Commented:
Yes.  By increasing the size of the window and decreasing the zoom percentage, you can generate output that is tall and narrow -- say to see all of two or three "normal screen heights" worth of page data (albeit in very small writing and images).  However, the technique used here is, in fact, limited by the on-screen viewing area -- it cannot capture image data that is below the bottom of the screen.  

There might be an alternate way to obtain a single large JPG of a long page (say, 1000 x 20,000), possibly by using the WM_PRINT message, but that was not the focus of this article.

Commented:
I modified the above code to read in a list of URLs from a file and then generate a thumbnail for each URL.  However, when doing about 50+ URLs, my program crashes.  It appears to be a write access violation.  I believe I tracked it down to the CImage object.  If I remove these lines, everything works OK.  Do you know why this would be causing my program to crash?
CERTIFIED EXPERT
Author of the Year 2009

Author

Commented:
It looks like I failed to call m_ctlBrowser.ReleaseDC() in the per-image function (OnBnClickedButton1 in the example).  See if adding that solves the problem.  The CImage handling looks like it should work as expected.  But if there is a resource leak, it could affect objects in unexpected ways.

Commented:
I am still having problems.  Do I need to release/delete any more device contexts (pDC or dcMemory)?  How about cBmp?  Should I perform a deleteObject on that?

       
