VapiSoft

CreateDIBSection is very slow - how to change "Hardware Acceleration"

OK, I found that if I set "Hardware Acceleration" to "None", CreateDIBSection works much faster (40 ms vs. 400 ms).
But I found that it causes games to use much more CPU (in some cases).
Currently I change "Hardware Acceleration" through the registry (and I need to restart the computer).

Is there any API call that lets me set "Hardware Acceleration" to None before I call CreateDIBSection, and then (after the call) set it back to Full?
mahesh1402

>>Is there any API call that I can set the "Hardware Acceleration" to None

I think there's no specific API for this, and that the Display control panel applet just modifies the registry and/or SYSTEM.INI.  I'd also assume that exactly what it modifies varies between operating system versions.

>>Now I change the "Hardware Acceleration" from the registry

Whatever you changed in the registry manually, you can also change programmatically, for example with a class such as CRegKey. But as you said, you need to restart the machine, which means that changing this key's value, manually or programmatically, does not immediately set the computer's hardware acceleration level, so I don't think it is useful for your purpose. Another way would be a hack that adjusts the hardware acceleration slider programmatically, but I am not sure about that either.

The closest thing I found uses DDK functions: EngQueryDeviceAttribute instructs the video driver to look up the registry key's value for the hardware acceleration level, and DrvNotify instructs the video driver to set the hardware acceleration level to that value. BUT I am not sure about this.
As described here: http://www.osronline.com/DDKx/graphics/dpyddi_2sh3.htm (EngQueryDeviceAttribute to query the current acceleration level, and DrvNotify to change it).

But it seems there is no direct API to implement this.

-MAHESH
VapiSoft

ASKER

Hi MAHESH,

Sorry, but as I understand it, DrvNotify does not set QDA_ACCELERATION_LEVEL; it only notifies when there is a change.
I also did not understand the parameters of EngQueryDeviceAttribute.
The documentation tells me to get hdev from DrvCompletePDEV, but there it is an IN parameter.
The only thing I have is an hDC. Is it the same as hdev?

In any case, I don't understand how other apps manage with CreateDIBSection if it takes so much time.

>>CreateDIBSection is very slow

As an alternative, I suggest you look at the 'DrawDib' family of functions. That's what the AVI playing engine uses.

Refer :
http://windowssdk.msdn.microsoft.com/en-us/library/ms708083.aspx
http://windowssdk.msdn.microsoft.com/en-us/library/ms708163.aspx

In particular, DrawDibDraw() is faster.

-MAHESH
Also NOTE: Suppose, for example, that your bitmap is 8-bpp and the screen is 24-bpp. For every pixel, GDI or the driver needs to do a table lookup, which is not as fast as a memory copy. So always try to make your bitmap the same format as the current display for better performance.

-MAHESH


The problem is that I need to get the DIB from the screen (I am using GetDIBits).
I tried doing it without CreateDIBSection, and then GetDIBits took all the time.
This is a catch-22.
For two reasons (disk space and comparison) I need it in 8 bits, but when "Hardware Acceleration" is None it takes about 1/10 of the time (about 40 ms for the full screen), which is almost bearable.
It seems no API is available to set Hardware Acceleration to None. As I said, you may look at alternatives such as the DrawDib family of functions above, or otherwise DirectX (DirectDraw).

-MAHESH
OK, I am looking at them.
I also found out that the CPU time is "wasted" in the BitBlt; see the following.

HDC memDC = CreateCompatibleDC(hdc);
HBITMAP hbmp = getDIBS(hdc, wr.cx, wr.cy);  // Here I do CreateDIBSection
DWORD t3 = GetTickCount();
HBITMAP oldBmp = (HBITMAP) SelectObject(memDC, hbmp);
DWORD t31 = GetTickCount();

// ==================
BitBlt(memDC, 0, 0, wr.cx, wr.cy, hdc, x_offset, y_offset, SRCCOPY);  // This takes 800 ms
// ==================
DWORD t32 = GetTickCount();
SelectObject(memDC, oldBmp);  // put the DC's original bitmap back
DWORD t4 = GetTickCount();
If you really need it to be that much faster, use DirectX as suggested. You can get a pointer directly into video memory and manage the bits yourself.

-MAHESH
DanRollins
Offhand, I don't know why the video acceleration setting should make a difference...

However, it seems to me that the problem boils down to the fact that at some point in the sequence, you (or some process) need to convert 24- or 32-bpp data to 8-bpp data.  That takes a lot of CPU... to build a palette that minimizes color artifacts and color loss.  It all takes place in one call, whether it is a BitBlt to an 8-bpp target bitmap or a GetDIBits call.  As the documentation for that function says...

    >> If the requested format for the DIB matches its internal format, the RGB values for
    >> the bitmap are copied. If the requested format doesn't match the internal format, a
    >> color table is synthesized.

A direct copy of the 32-bit data is certain to be 10 times faster than a GetDIBits call that converts from true color to 8-bpp palettized colors.
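To make the cost difference concrete, here is a minimal, portable sketch. It is not GDI's actual algorithm, which this thread does not describe. It assumes pixels stored as 0x00RRGGBB and a palette supplied by the caller; `copyPixels`, `colorDistance` and `toPalette8` are hypothetical names used only for illustration. The straight copy is a single block move, while the 8-bpp path has to search the palette for every pixel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

// Straight 32-bpp copy: one block move, no per-pixel work.
std::vector<std::uint32_t> copyPixels(const std::vector<std::uint32_t>& src) {
    std::vector<std::uint32_t> dst(src.size());
    if (!src.empty())
        std::memcpy(dst.data(), src.data(), src.size() * sizeof(std::uint32_t));
    return dst;
}

// Squared RGB distance between two 0x00RRGGBB colors (assumed layout).
int colorDistance(std::uint32_t a, std::uint32_t b) {
    int dr = int((a >> 16) & 0xFF) - int((b >> 16) & 0xFF);
    int dg = int((a >> 8) & 0xFF) - int((b >> 8) & 0xFF);
    int db = int(a & 0xFF) - int(b & 0xFF);
    return dr * dr + dg * dg + db * db;
}

// 32-bpp -> 8-bpp: pick the nearest palette entry for every pixel.
// The cost grows with pixel count times palette size.
std::vector<std::uint8_t> toPalette8(const std::vector<std::uint32_t>& src,
                                     const std::vector<std::uint32_t>& palette) {
    assert(!palette.empty() && palette.size() <= 256);
    std::vector<std::uint8_t> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        int bestDist = std::numeric_limits<int>::max();
        std::size_t best = 0;
        for (std::size_t p = 0; p < palette.size(); ++p) {
            int d = colorDistance(src[i], palette[p]);
            if (d < bestDist) { bestDist = d; best = p; }
        }
        dst[i] = static_cast<std::uint8_t>(best);
    }
    return dst;
}
```

With a 256-entry palette the inner loop runs up to 256 times per pixel, which is the kind of per-pixel work a plain copy avoids.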

I'll bet that your best (fastest) bet would be to stay with 32-bit colors at every step of the way.  If you are doing something like transferring whole screens (as with PC-Anywhere) then you can identify the changed subset of the screen and then compress the data as the final step before transporting it.

-- Dan
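The "identify the changed subset of the screen" step suggested above could look roughly like this. It is a brute-force sketch under stated assumptions: both frames are 32-bpp buffers of identical size stored row by row, and `DirtyRect` and `findChangedRect` are hypothetical names, not part of any Windows API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bounding box of all changed pixels; right and bottom are exclusive.
struct DirtyRect {
    int left, top, right, bottom;
    bool empty;
};

// Compares two frames of the same size and returns the smallest rectangle
// that contains every pixel that differs between them.
DirtyRect findChangedRect(const std::vector<std::uint32_t>& prev,
                          const std::vector<std::uint32_t>& cur,
                          int width, int height) {
    assert(width >= 0 && height >= 0);
    assert(prev.size() == static_cast<std::size_t>(width) * height);
    assert(cur.size() == prev.size());

    DirtyRect r{width, height, 0, 0, true};
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            std::size_t i = static_cast<std::size_t>(y) * width + x;
            if (prev[i] != cur[i]) {
                r.left = std::min(r.left, x);
                r.top = std::min(r.top, y);
                r.right = std::max(r.right, x + 1);
                r.bottom = std::max(r.bottom, y + 1);
                r.empty = false;
            }
        }
    }
    return r;
}
```

Only the pixels inside the returned rectangle would then need to be compressed and sent.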
That is not the problem, because if you look at the code, the BitBlt is not doing any color conversion. Only after that, in GetDIBits, do I do the conversion.
As I pointed out, BitBlt does the conversion "behind the scenes" so to speak.  It is a time-consuming process.

What options are you using in your calls to CreateDIBSection?
Currently I am not using CreateDIBSection.

HDC hdc = GetWindowDC(hwnd);
HDC memDC = CreateCompatibleDC(hdc);
HBITMAP hbmp = CreateCompatibleBitmap(hdc, wr.cx, wr.cy);  // getDIBS(hdc, wr.cx, wr.cy);
DWORD t3 = GetTickCount();
HBITMAP oldBmp = (HBITMAP) SelectObject(memDC, hbmp);
DWORD t31 = GetTickCount();
BitBlt(memDC, 0, 0, wr.cx, wr.cy, hdc, x_offset, y_offset, SRCCOPY);  // <== This is where it takes 400ms
DWORD t32 = GetTickCount();
DWORD t4 = GetTickCount();

msg->hDIB = getPictureHandle(hbmp, memDC, wr.cy, wr.cx, msg->pixels);  // <== Here I do the GetDIBits

DWORD t5 = GetTickCount();


Before that I used the CreateDIBSection in getDIBS, but it did not change anything.

HBITMAP getDIBS(HDC hdc, int w2, int h2)
{
      BITMAPINFO bi = {0};  // zero every field, including the unused color table entry

      bi.bmiHeader.biSize        = sizeof(BITMAPINFOHEADER);
      bi.bmiHeader.biWidth       = w2;
      bi.bmiHeader.biHeight      = h2;      // positive height = bottom-up DIB
      bi.bmiHeader.biPlanes      = 1;
      bi.bmiHeader.biBitCount    = 32;      // Bitmap.bmBitsPixel;
      bi.bmiHeader.biCompression = BI_RGB;  // uncompressed (BI_RGB is 0)

      void *start = NULL;  // receives the address of the pixel bits
      return CreateDIBSection(hdc, &bi, DIB_RGB_COLORS, &start, NULL, 0);
}
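When sizing a buffer for a DIB like the one above, keep in mind that each scanline is padded to a multiple of four bytes. The helper below is a small portable sketch of that arithmetic; `dibStride` and `dibImageSize` are hypothetical names, not Windows functions.

```cpp
#include <cassert>
#include <cstddef>

// Bytes per scanline of an uncompressed DIB: rows are DWORD-aligned.
std::size_t dibStride(int width, int bitsPerPixel) {
    assert(width > 0 && bitsPerPixel > 0);
    return ((static_cast<std::size_t>(width) * bitsPerPixel + 31) / 32) * 4;
}

// Total bytes of pixel data. A negative height denotes a top-down DIB,
// so only its magnitude matters for the size.
std::size_t dibImageSize(int width, int height, int bitsPerPixel) {
    long long h = height;
    std::size_t rows = static_cast<std::size_t>(h < 0 ? -h : h);
    return dibStride(width, bitsPerPixel) * rows;
}
```

For a 1000 x 1000 bitmap at 32 bpp this gives 4,000,000 bytes. At 8 bpp a width of 1001 pixels needs 1004 bytes per row because of the padding.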


Here is my test run:

void CD22Dlg::OnButton1()
{
      CRect wr(0,0, 1000,1000 );
      int x_offset= 0;
      int y_offset= 0;

      HDC     hdc=   ::GetWindowDC( 0 );
      HDC     memDC= CreateCompatibleDC(hdc);
      HBITMAP hbmp=  CreateCompatibleBitmap( hdc, wr.Width(), wr.Height() );
      HBITMAP oldBmp=(HBITMAP) SelectObject(memDC,hbmp);

  DWORD   t30=GetTickCount();
      BOOL    fOK= BitBlt(memDC,0,0, wr.Width(), wr.Height(), hdc, x_offset, y_offset, SRCCOPY); //<== This is where it takes 400ms
  DWORD   t31= GetTickCount();

      DWORD nTicks30= t31-t30;

      // The GetDIBits documentation requires that the bitmap not be selected into a DC
      SelectObject( memDC, oldBmp );

      BITMAP rBmp;
      GetObject( hbmp, sizeof(BITMAP), &rBmp );

      BITMAPINFO bi;
      memset( &bi, 0, sizeof(bi) );
      bi.bmiHeader.biSize= sizeof(BITMAPINFOHEADER); // 40
      bi.bmiHeader.biWidth= rBmp.bmWidth;            // 1000
      bi.bmiHeader.biHeight= rBmp.bmHeight;          // 1000
      bi.bmiHeader.biPlanes= rBmp.bmPlanes;          // 1
      bi.bmiHeader.biBitCount= rBmp.bmBitsPixel;     // 32
      bi.bmiHeader.biCompression= BI_RGB;            // uncompressed

      BYTE* pBuf= new BYTE[ 1000*1000*4 ];
  DWORD t40=GetTickCount();
      int n= GetDIBits( memDC, hbmp, 0, 1000, pBuf, &bi, DIB_RGB_COLORS );
  DWORD t41=GetTickCount();

      DWORD nTicks40= t41-t40;

      BYTE* pBits= 0;

  DWORD t50=GetTickCount();
      HBITMAP hbm2= CreateDIBSection( hdc, &bi, DIB_RGB_COLORS, (void**)&pBits, 0,0 );
  DWORD t51=GetTickCount();

      DWORD nTicks50= t51-t50;

      // Release everything the test allocated
      delete[] pBuf;
      DeleteObject( hbm2 );
      DeleteObject( hbmp );
      DeleteDC( memDC );
      ::ReleaseDC( 0, hdc );
}

==-=-=-=-=-=-=-=-=-
Both nTicks30 (BitBlt to a memDC) and nTicks50 (CreateDIBSection) come out as 0 -- indicating less time than the GetTickCount timer resolution (I think that is about 15ms).

nTicks40 (GetDIBits) took 95ms.

==-=-=-=-=-=-=-=-=-
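Because readings of 0 only say that a call finished within one GetTickCount tick, a finer timer and several repetitions give more usable numbers. The sketch below uses the standard `std::chrono::steady_clock` rather than a Win32 timer so that it stays portable; `averageMillis` is a hypothetical helper name.

```cpp
#include <cassert>
#include <chrono>

// Runs fn `repeats` times and returns the average wall-clock
// milliseconds per call, measured with a monotonic clock.
template <typename Fn>
double averageMillis(Fn fn, int repeats) {
    assert(repeats > 0);
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i)
        fn();
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / repeats;
}
```

In the test above, the BitBlt call could be wrapped in a lambda and passed in, for example `averageMillis([&] { BitBlt(memDC, 0, 0, wr.Width(), wr.Height(), hdc, 0, 0, SRCCOPY); }, 20)`. On Windows, QueryPerformanceCounter is the native high-resolution alternative.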
You could get much larger values if the bitmaps are huge and/or if you are grabbing a rectangle that does not start on a multiple of four.  What are your values for wr.cx, wr.cy, x_offset and y_offset?

-- Dan
The above times were with hardware acceleration set to FULL.

When I set it down to "NONE", nTicks40 (GetDIBits) went DOWN to 15ms, and nTicks30 (BitBlt to a memDC) went UP to 15ms.

-- Dan
Hi Dan,

As you can see, the first part is almost exactly like my code (except that I used GetWindowDC(hwnd) and you used 0).
Therefore, as I expected, it still has the exact same problem.
The BitBlt takes about 350-400ms.

I checked it on other computers and it is always the same (when Hardware Acceleration is set to Full).
It's interesting that I actually saw the opposite effect.  Turning hardware acceleration to NONE INCREASED the time.  Such discrepancies are obviously related to the device driver, and one would expect them to be machine-dependent (i.e., there may be nothing you can do about it other than trying very hard to optimize -- avoid accesses that are not needed).

I'll go ahead and ask these again...

   Are your  bitmaps huge?  
   Are you grabbing a rectangle that does not start on a multiple of four?
   What are your values for wr.cx, wr.cy, x_offset and y_offset?
Hi Dan,

No, I am just trying to capture a window about the size of the desktop:
wr.cx=1024
wr.cy=738
x_offset=4
y_offset=4

You tell me that for you it works fast.
But for me it works very, very slowly. So now I am starting to think that maybe something is wrong with my libraries or the compiler settings.


Most drivers are optimized for BitBlts that move the entire bitmap.

Just as a test, see if you notice any difference when x_offset and y_offset are 0.
I tried it, it doesn't make any difference.
It does make one wonder how programs like VNC and PC Anywhere do it so quickly.  My bet is that they integrate into the device drivers to know what parts of the screen are changing ... as they change.  Thus, rather than taking snapshot after snapshot and finding differences, they know that they can confine themselves to certain (often small) areas of the screen to get all the work done about 10 times per second.

I believe that the source code for various versions of VNC is freely available, for instance here:
   http://www.koders.com/info.aspx?c=ProjectInfo&pid=AC8QNT72FM4FVWVYFKGLQ6G1LG
... just in case you want to pursue that option; that is, if you want to see how these same problems have been solved by others in the past.

-- Dan
In fact, after looking at the source, it is clear that they install a number of Windows hooks and keep track of messages such as WM_PAINT to stay informed about which parts of the screen have changed.
   http://www.koders.com/c++/fid8BC8E6705291CC1F8A85F59A94C9AA0E32BA4B79.aspx

They also use a "smart" communications protocol that lets the client do a lot of the work -- based on short data packets that describe screen changes (rather than brute-force reproducing the entire screen).
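As a rough illustration of "short data packets that describe screen changes", the sketch below packs one changed rectangle and applies it on the receiving side. It is not the real VNC wire protocol; the `UpdatePacket` layout and the `makeUpdate` and `applyUpdate` names are assumptions made up for this example, and frames are assumed to be 32-bpp buffers stored row by row.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One screen update: a rectangle plus only the pixels inside it.
struct UpdatePacket {
    int x, y, width, height;
    std::vector<std::uint32_t> pixels;  // width * height values, row by row
};

// Server side: copy the changed rectangle out of the full frame.
UpdatePacket makeUpdate(const std::vector<std::uint32_t>& frame, int frameWidth,
                        int x, int y, int w, int h) {
    assert(x >= 0 && y >= 0 && w > 0 && h > 0 && x + w <= frameWidth);
    assert(static_cast<std::size_t>(y + h) * frameWidth <= frame.size());
    UpdatePacket p{x, y, w, h, {}};
    p.pixels.reserve(static_cast<std::size_t>(w) * h);
    for (int row = 0; row < h; ++row) {
        auto first = frame.begin() +
                     (static_cast<std::ptrdiff_t>(y + row) * frameWidth + x);
        p.pixels.insert(p.pixels.end(), first, first + w);
    }
    return p;
}

// Client side: paste the rectangle into the local copy of the screen.
void applyUpdate(std::vector<std::uint32_t>& frame, int frameWidth,
                 const UpdatePacket& p) {
    assert(p.x >= 0 && p.y >= 0 && p.width > 0 && p.height > 0);
    assert(p.x + p.width <= frameWidth);
    assert(p.pixels.size() == static_cast<std::size_t>(p.width) * p.height);
    assert(static_cast<std::size_t>(p.y + p.height) * frameWidth <= frame.size());
    for (int row = 0; row < p.height; ++row) {
        auto src = p.pixels.begin() + static_cast<std::ptrdiff_t>(row) * p.width;
        auto dst = frame.begin() +
                   (static_cast<std::ptrdiff_t>(p.y + row) * frameWidth + p.x);
        std::copy(src, src + p.width, dst);
    }
}
```

A 2 x 2 update of a 1024 x 738 screen carries 4 pixels instead of 755,712, which is the saving the packet approach is after.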
Hi Dan,

My problem is not knowing which part of the screen has changed. I also install hooks (although I now understand that this is a problem in Vista). I will look at the source to see how they read the screen.
I checked the code (vncDesktop.cpp); they do exactly what I do.
So again I am confused: why the hell is my code so slow???
I wanted to see if they have the same problem, so I downloaded the Setup file.
It installs two exe files: WinVNC.exe, which is the server, and vncviewer.exe.
I did not understand how to install the client or, in general, how to work with this.
So I could not see if they have the same problem (slow).
Do you understand how to work with it?

I haven't used VNC.  It seems like setup ought to be fairly straightforward and would be covered in the docs.  Here's a FAQ:
   http://faq.gotomyvnc.com/fom-serve/cache/1.html

This one will be of interest to you:
   Is VNC always this slow?
   http://faq.gotomyvnc.com/fom-serve/cache/58.html
   ... it is necessary that you completely disable "Hardware Acceleration" on the machines that run WinVNC (server). ...
Also, the Google Groups search:
  http://groups.google.com/groups?lnk=hpsg&hl=en&q=VNC+%22Hardware+Acceleration%22
turns up this
   Hardware Acceleration disable in code?
   http://groups.google.com/group/microsoft.public.win32.programmer.gdi/browse_frm/thread/dcdad404e2283329
and a number of other threads that are relevant.  In one...

   The reason it speeds things up for VNC is that it goes through the
   normal GDI functions to blit stuff to the screen.  VNC hooks into this
   code to tell which portions of the screen have been updated.  When
   Windows uses acceleration to draw the desktop, it bypasses this library
   and writes directly to the video card.

   VNC can still detect these changes by polling the video card RAM for
   changes to the screen, but this is slow since it has to go across the
   PCI/AGP bus
I did not understand the last part, but I think that you are wrong about why it is slow.
What I saw is that it uses BitBlt as I do, and this is very slow when Hardware Acceleration is on.
But I know that there are other programs, like PC-Anywhere, that work OK.
The question is how they overcome the problem.
ASKER CERTIFIED SOLUTION
DanRollins

This solution is only available to members.
I checked it, it takes 0ms to do it.
From what I understood replacing the Display device driver is very complicated and very risky.
Yes.  Plus you would need to write a video device driver, which is complicated.