VapiSoft
asked on
CreateDIBSection is very slow - how to change "Hardware Acceleration"
OK, I found that if I set "Hardware Acceleration" to "None", CreateDIBSection works much faster
(40ms vs 400ms).
But I found that this makes games use much more CPU (in some cases).
Right now I change "Hardware Acceleration" in the registry (and I need to restart the computer).
Is there any API call that lets me set "Hardware Acceleration" to None before I call CreateDIBSection, and then (after the call) set it back to Full?
ASKER
Hi MAHESH,
Sorry, but as I understand it, DrvNotify does not set QDA_ACCELERATION_LEVEL; it only notifies when there is a change.
I also did not understand the parameters of the EngQueryDeviceAttribute.
It tells me to get hdev from DrvCompletePDEV but there it is an IN parameter.
The only thing I have is hDC, is it the same as hdev?
In any case, I don't understand how other apps manage with CreateDIBSection if it takes so much time???
>>CreateDIBSection is very slow
I suggest you look at the 'DrawDib' family of functions as an alternative. That's what the AVI playing engine uses.
Refer to:
http://windowssdk.msdn.microsoft.com/en-us/library/ms708083.aspx
http://windowssdk.msdn.microsoft.com/en-us/library/ms708163.aspx
In particular, DrawDibDraw() is faster.
-MAHESH
Also NOTE: Suppose, for example, that your bitmap is 8-bpp and the screen is 24-bpp. For every pixel, GDI or the driver needs to do a table lookup, which is not as fast as a memory copy. So always try to give your bitmap the same format as the current display for better performance.
-MAHESH
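To illustrate the note above, here is a minimal portable C++ sketch (not GDI code) that contrasts a same-format transfer, which reduces to one block copy, with an 8-bpp to 32-bpp expansion, which needs a palette lookup for every pixel. The names `copySameFormat` and `expand8To32` and the packed 0x00RRGGBB palette layout are assumptions made for illustration; the real conversion happens inside GDI and the display driver.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Same pixel format on both sides: the transfer is a single block copy.
std::vector<std::uint32_t> copySameFormat(const std::vector<std::uint32_t>& src) {
    std::vector<std::uint32_t> dst(src.size());
    if (!src.empty()) {
        std::memcpy(dst.data(), src.data(), src.size() * sizeof(std::uint32_t));
    }
    return dst;
}

// 8-bpp source, 32-bpp destination: every pixel goes through the palette.
// `palette` must hold 256 entries (hypothetical 0x00RRGGBB layout).
std::vector<std::uint32_t> expand8To32(const std::vector<std::uint8_t>& src,
                                       const std::vector<std::uint32_t>& palette) {
    std::vector<std::uint32_t> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = palette[src[i]];  // one table lookup per pixel
    }
    return dst;
}
```

The opposite direction (true color down to 8-bpp) is costlier still, because a matching palette entry has to be found or synthesized for each pixel.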
ASKER
The problem is that I need to get the DIB from the screen (I am using GetDIBits).
I tried to do it without CreateDIBSection, and then GetDIBits took all the time.
This is a catch-22.
ASKER
For two reasons (disk space and comparison) I need it in 8 bits, but when "Hardware Acceleration" is None it takes about 1/10 of the time (full screen about 40ms), which is almost bearable.
It seems no API is available to set Hardware Acceleration to None. As I said, you may look at alternatives such as the DrawDib family of functions above, or otherwise DirectX (DirectDraw).
-MAHESH
Have a look at the example code links I gave in your other question:
https://www.experts-exchange.com/questions/21910717/CreateDIBSection-and-Display-Setting-Hardware-acceleration.html
-MAHESH
ASKER
OK, I am looking at them.
I also found out that the CPU time is "wasted" in the BitBlt call; see the following:
HDC memDC =CreateCompatibleDC(hdc);
HBITMAP hbmp=getDIBS(hdc, wr.cx, wr.cy); // Here I do CreateDIBSection
DWORD t3=GetTickCount();
HBITMAP oldBmp=(HBITMAP) SelectObject(memDC,hbmp);
DWORD t31=GetTickCount();
==================
BitBlt(memDC,0,0, wr.cx, wr.cy, hdc, x_offset, y_offset, SRCCOPY); // This takes 800 ms
==================
DWORD t32=GetTickCount();
oldBmp=(HBITMAP) SelectObject(memDC,hbmp);
DWORD t4=GetTickCount();
If you really need it that much faster, use DirectX as suggested. You can get a pointer directly into video memory and manage the bits yourself.
-MAHESH
Offhand, I don't know why the video acceleration setting should make a difference...
However, it seems to me that the problem boils down to the fact that at some point in the sequence you (or some process) need to convert 24- or 32-bpp data to 8-bpp data. That takes a lot of CPU... to build a palette that minimizes color artifacts and color loss. It all takes place in one call, whether it is a BitBlt to an 8-bpp target bitmap or a GetDIBits call. As the documentation for that function says...
>> If the requested format for the DIB matches its internal format, the RGB values for
>> the bitmap are copied. If the requested format doesn't match the internal format, a
>> color table is synthesized.
A direct copy of the 32-bit data should be roughly 10 times faster than a GetDIBits call that converts from true color to 8-bpp palettized colors.
I'll bet that your best (fastest) bet would be to stay with 32-bit colors at every step of the way. If you are doing something like transferring whole screens (as with PC-Anywhere) then you can identify the changed subset of the screen and then compress the data as the final step before transporting it.
-- Dan
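The "identify the changed subset" step can be sketched in portable C++ as follows. This is only an illustration under stated assumptions: both frames are 32-bpp buffers of the same size with no row padding, and `ChangedRect` and `findChangedRect` are hypothetical names, not part of any Windows API.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct ChangedRect {
    int left, top, right, bottom;  // right and bottom are exclusive
    bool empty() const { return left >= right || top >= bottom; }
};

// Compare two frames of identical size and return the smallest rectangle
// that contains every pixel that differs between them.
ChangedRect findChangedRect(const std::vector<std::uint32_t>& prev,
                            const std::vector<std::uint32_t>& cur,
                            int width, int height) {
    ChangedRect r{width, height, 0, 0};  // starts out empty
    for (int y = 0; y < height; ++y) {
        const std::size_t row = static_cast<std::size_t>(y) * width;
        for (int x = 0; x < width; ++x) {
            if (prev[row + x] != cur[row + x]) {
                r.left = std::min(r.left, x);
                r.top = std::min(r.top, y);
                r.right = std::max(r.right, x + 1);
                r.bottom = std::max(r.bottom, y + 1);
            }
        }
    }
    return r;
}
```

Only the pixels inside the returned rectangle would then need to be compressed and transported; an empty result means nothing changed.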
ASKER
That is not the problem, because if you look at the code, the BitBlt is not doing any color conversion. Only after that, in GetDIBits, do I do the conversion.
As I pointed out, BitBlt does the conversion "behind the scenes" so to speak. It is a time-consuming process.
What options are you using in your calls to CreateDIBSection?
ASKER
Currently I am not using CreateDIBSection.
HDC hdc=GetWindowDC(hwnd);
HDC memDC =CreateCompatibleDC(hdc);
HBITMAP hbmp=CreateCompatibleBitmap(hdc, wr.cx, wr.cy); // getDIBS(hdc,wr.cx, wr.cy);
DWORD t3=GetTickCount();
HBITMAP oldBmp=(HBITMAP) SelectObject(memDC,hbmp);
DWORD t31=GetTickCount();
BitBlt(memDC,0,0, wr.cx, wr.cy, hdc, x_offset, y_offset, SRCCOPY); <== This is where it takes 400ms
DWORD t32=GetTickCount();
DWORD t4=GetTickCount();
msg->hDIB=getPictureHandle(hbmp,memDC,wr.cy,wr.cx,msg->pixels); <== Here I do the GetDIBits
DWORD t5=GetTickCount();
Before that I used the CreateDIBSection in getDIBS, but it did not change anything.
HBITMAP getDIBS(HDC hdc, int w2, int h2)
{
BITMAPINFO bi;
bi.bmiHeader.biSize = sizeof(BITMAPINFOHEADER);
bi.bmiHeader.biWidth = w2;
bi.bmiHeader.biHeight = h2;
bi.bmiHeader.biPlanes = 1;
bi.bmiHeader.biBitCount = 32; // Bitmap.bmBitsPixel;
bi.bmiHeader.biCompression = 0;
bi.bmiHeader.biSizeImage = 0;
bi.bmiHeader.biXPelsPerMeter = 0;
bi.bmiHeader.biYPelsPerMeter = 0;
bi.bmiHeader.biClrUsed = 0;
bi.bmiHeader.biClrImportant = 0;
bi.bmiHeader.biSizeImage=0;
void *start;
return CreateDIBSection(hdc,(LPBITMAPINFO) &bi,DIB_RGB_COLORS,&start,0,0);
}
Here is my test run:
void CD22Dlg::OnButton1()
{
CRect wr(0,0, 1000,1000 );
int x_offset= 0;
int y_offset= 0;
HDC hdc= ::GetWindowDC( 0 );
HDC memDC= CreateCompatibleDC(hdc);
HBITMAP hbmp= CreateCompatibleBitmap( hdc, wr.Width(), wr.Height() ); // getDIBS(hdc,wr.cx, wr.cy);
HBITMAP oldBmp=(HBITMAP) SelectObject(memDC,hbmp);
DWORD t30=GetTickCount();
BOOL fOK= BitBlt(memDC,0,0, wr.Width(), wr.Height(), hdc, x_offset, y_offset, SRCCOPY); //<== This is where it takes 400ms
DWORD t31= GetTickCount();
DWORD nTicks30= t31-t30;
BITMAP rBmp;
GetObject( hbmp, sizeof(BITMAP), &rBmp );
BITMAPINFO bi;
memset( &bi, 0, sizeof(BITMAPINFOHEADER) );
bi.bmiHeader.biSize= sizeof(BITMAPINFOHEADER); // 40
bi.bmiHeader.biWidth= rBmp.bmWidth; // 1000
bi.bmiHeader.biHeight= rBmp.bmHeight; // 1000
bi.bmiHeader.biPlanes= rBmp.bmPlanes; // 1
bi.bmiHeader.biBitCount= rBmp.bmBitsPixel; // 32
BYTE* pBuf= new BYTE[ 1000*1000*4 ];
DWORD t40=GetTickCount();
int n= GetDIBits( memDC, hbmp, 0, 1000, pBuf, &bi, DIB_RGB_COLORS );
DWORD t41=GetTickCount();
DWORD nTicks40= t41-t40;
BYTE* pBits= 0;
DWORD t50=GetTickCount();
HBITMAP hbm2= CreateDIBSection( hdc, &bi, DIB_RGB_COLORS, (void**)&pBits, 0,0 );
DWORD t51=GetTickCount();
DWORD nTicks50= t51-t50;
}
==-=-=-=-=-=-=-=-=-
Both nTicks30 (BitBlt to a memDC) and nTicks50 (CreateDIBSection) come out as 0 -- indicating less time than the GetTickCount timer resolution (I think that is about 15ms).
nTicks40 (GetDIBits) took 95ms.
==-=-=-=-=-=-=-=-=-
You could get much larger values if the bitmaps are huge and/or if you are grabbing a rectangle that does not start on a multiple of four. What are your values for wr.cx, wr.cy, x_offset and y_offset?
-- Dan
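A related alignment detail when sizing the GetDIBits buffer: each scan line of an uncompressed DIB is padded to a multiple of 4 bytes. At 32-bpp a row is always a multiple of 4 bytes, so the 1000*1000*4 buffer in the test above is exact, but at 8-bpp or 24-bpp the padded row can be wider than width times bytes per pixel. A minimal sketch, where `dibStride` and `dibBufferSize` are hypothetical helper names:

```cpp
#include <cassert>
#include <cstddef>

// Bytes per scan line of an uncompressed DIB; rows are padded to a
// multiple of 4 bytes (32 bits).
std::size_t dibStride(std::size_t widthPixels, std::size_t bitsPerPixel) {
    assert(bitsPerPixel > 0);
    return ((widthPixels * bitsPerPixel + 31) / 32) * 4;
}

// Total bytes needed for the pixel data of a widthPixels x heightPixels DIB.
std::size_t dibBufferSize(std::size_t widthPixels, std::size_t heightPixels,
                          std::size_t bitsPerPixel) {
    return dibStride(widthPixels, bitsPerPixel) * heightPixels;
}
```

For example, a 1022-pixel-wide 8-bpp row occupies 1024 bytes, and a 1001-pixel-wide 24-bpp row occupies 3004 bytes rather than 3003.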
The above times were with hardware accelleration set to FULL.
When I set it down to "NONE", nTicks40 (GetDIBits) went DOWN to 15ms, and nTicks30 (BitBlt to a memDC) went UP to 15ms.
-- Dan
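Readings of 0 and 15 ms sit right at the GetTickCount resolution mentioned earlier, so they are hard to compare. One common way around that is to time many repetitions with a finer clock and take the mean. The sketch below uses std::chrono::steady_clock to stay portable (on Windows, QueryPerformanceCounter is the usual native choice); `averageMillis` is a hypothetical helper, and the operation being timed is passed in as a callable.

```cpp
#include <chrono>
#include <functional>

// Run `work` `iterations` times and return the mean duration in milliseconds.
double averageMillis(const std::function<void()>& work, int iterations) {
    if (iterations <= 0) {
        return 0.0;
    }
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> total = stop - start;
    return total.count() / iterations;
}
```

In the test run above, the lambda passed as `work` would wrap the BitBlt or GetDIBits call, so that a few dozen iterations give a figure well above the timer granularity.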
ASKER
Hi Dan,
As you can see, the first part is almost exactly like my code (except that I used GetWindowDC(hwnd) and you used 0).
Therefore, as I expected, it still has the exact same problem.
The BitBlt takes about 350-400ms.
I checked it on other computers and it is always the same (when Hardware Acceleration is set to Full).
It's interesting that I actually saw the opposite effect. Turning hardware acceleration to NONE INCREASED the time. Such discrepancies are obviously related to the device driver and one would expect them to be machine-dependent (i.e., there may be nothing you can do about it other than trying very hard to optimize -- avoid accesses that are not needed).
I'll go ahead and ask these again...
Are your bitmaps huge?
Are you grabbing a rectangle that does not start on a multiple of four?
What are your values for wr.cx, wr.cy, x_offset and y_offset?
ASKER
Hi Dan,
No, I am just trying to capture a window about the size of the desktop:
wr.cx=1024
wr.cy=738.
x_offset=4
y_offset =4
You tell me that for you it works fast.
But for me it works very, very slowly. So now I am starting to think that maybe something is wrong with my libraries or the compiler settings.
Most drivers are optimized for BitBlts that move the entire bitmap.
Just as a test, see if you notice any difference when x_offset and y_offset are 0.
ASKER
I tried it, it doesn't make any difference.
It does make one wonder how programs like VNC and PC Anywhere do it so quickly. My bet is that they integrate into the device drivers to know what parts of the screen are changing ... as they change. Thus, rather than taking snapshot after snapshot and finding differences, they know that they can confine themselves to certain (often small) areas of the screen to get all the work done about 10 times per second.
I believe that the source code for various versions of VNC is freely available, for instance here:
http://www.koders.com/info.aspx?c=ProjectInfo&pid=AC8QNT72FM4FVWVYFKGLQ6G1LG
... just in case you want to pursue that option; that is, if you want to see how these same problems have been solved by others in the past.
-- Dan
In fact, after looking at the source, it is clear that they install a number of Windows hooks and keep track of messages such as WM_PAINT to stay informed about which parts of the screen have changed.
http://www.koders.com/c++/fid8BC8E6705291CC1F8A85F59A94C9AA0E32BA4B79.aspx
They also use a "smart" communications protocol that lets the client do a lot of the work -- based on short data packets that describe screen changes (rather than brute-force reproducing the entire screen).
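As a rough picture of that bookkeeping, the sketch below records repainted rectangles against a grid of fixed-size tiles and hands back only the touched tiles for the next update. It is an assumption-laden illustration in portable C++, not VNC's actual code: the `DirtyTiles` class, the `Tile` struct and the tile size are all hypothetical, and in a real server `invalidate` would be fed from the hook notifications.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Tile { int col, row; };

// Tracks which fixed-size tiles of the screen were touched since the last send.
class DirtyTiles {
public:
    DirtyTiles(int screenW, int screenH, int tileSize)
        : tile_(tileSize),
          cols_((screenW + tileSize - 1) / tileSize),
          rows_((screenH + tileSize - 1) / tileSize),
          dirty_(static_cast<std::size_t>(cols_) * rows_, false) {}

    // Record that the area [left,right) x [top,bottom) was repainted.
    void invalidate(int left, int top, int right, int bottom) {
        left = std::max(left, 0);
        top = std::max(top, 0);
        right = std::min(right, cols_ * tile_);
        bottom = std::min(bottom, rows_ * tile_);
        if (left >= right || top >= bottom) return;  // nothing on screen
        for (int r = top / tile_; r <= (bottom - 1) / tile_; ++r)
            for (int c = left / tile_; c <= (right - 1) / tile_; ++c)
                dirty_[static_cast<std::size_t>(r) * cols_ + c] = true;
    }

    // Return the dirty tiles in row-major order and clear the set.
    std::vector<Tile> takeDirty() {
        std::vector<Tile> out;
        for (int r = 0; r < rows_; ++r)
            for (int c = 0; c < cols_; ++c) {
                const std::size_t i = static_cast<std::size_t>(r) * cols_ + c;
                if (dirty_[i]) {
                    out.push_back({c, r});
                    dirty_[i] = false;
                }
            }
        return out;
    }

private:
    int tile_, cols_, rows_;
    std::vector<bool> dirty_;
};
```

With a 1024x738 screen and 64-pixel tiles, a repaint of a small toolbar area marks only a couple of tiles, so only those would need to be read back and sent instead of the full screen.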
ASKER
Hi Dan,
My problem is not knowing what part of the screen has changed. I also install hooks (although I now understand that this is a problem in Vista). I will look at the source to see how they read the screen.
ASKER
I checked the code (vncDesktop.cpp); they do exactly what I do.
So again I am confused: why the hell is my code so slow???
ASKER
I wanted to see if they have the same problem, so I downloaded the Setup file.
It installs two exe files: WinVNC.exe, which is the server, and vncviewer.exe.
I did not understand how to install the client or, in general, how to work with this.
So I could not see if they have the same problem (slow).
Do you understand how to work with it?
I haven't used VNC. It seems like setup ought to be fairly straightforward and would be covered in the docs. Here's a FAQ:
http://faq.gotomyvnc.com/fom-serve/cache/1.html
This one will be of interest to you:
Is VNC always this slow?
http://faq.gotomyvnc.com/fom-serve/cache/58.html
... it is necessary that you completely disable "Hardware Acceleration" on the machines that run WinVNC (server). ...
Also, the Google Groups search:
http://groups.google.com/groups?lnk=hpsg&hl=en&q=VNC+%22Hardware+Acceleration%22
turns up this
Hardware Acceleration disable in code?
http://groups.google.com/group/microsoft.public.win32.programmer.gdi/browse_frm/thread/dcdad404e2283329
and a number of other threads that are relevant. In one...
The reason it speeds things up for VNC is that it goes through the
normal GDI functions to blit stuff to the screen. VNC hooks into this
code to tell which portions of the screen have been updated. When
Windows uses acceleration to draw the desktop, it bypasses this library
and writes directly to the video card.
VNC can still detect these changes by polling the video card RAM for
changes to the screen, but this is slow since it has to go across the
PCI/AGP bus
ASKER
I did not understand the last part but I think that you are wrong about why it is slow.
What I saw is that it uses BitBlt as I do, and this is very slow when hardware acceleration is on.
But I know that there are other programs, like PC-Anywhere, that work OK.
The question is how they overcome the problem.
ASKER CERTIFIED SOLUTION
ASKER
I checked it, it takes 0ms to do it.
From what I understood replacing the Display device driver is very complicated and very risky.
Yes. Plus you would need to write a video device driver, which is complicated.
I think there's no specific API for this, and that the Display control panel applet just modifies the registry and/or SYSTEM.INI. I'd also assume that exactly what it modifies varies between operating system versions.
>>Now I change the "Hardware Acceleration" from the registry
What you changed in the registry manually you can also change programmatically, using a class such as CRegKey. But as you said, you need to restart the machine, which means that changing this key's value, manually or programmatically, does not instantly set the computer's hardware acceleration level, so I think it is not useful for your purpose. Another way would be a hack that adjusts the hardware acceleration slider programmatically, but I am not sure about this either.
The closest thing I found is the DDK functions: EngQueryDeviceAttribute queries the current acceleration level, and DrvNotify instructs the video driver to set the hardware acceleration level to that value. BUT I am not sure of this.
As given here: http://www.osronline.com/DDKx/graphics/dpyddi_2sh3.htm <== EngQueryDeviceAttribute queries the current acceleration level and DrvNotify changes the acceleration level.
But it seems there is no direct API to implement this.
-MAHESH