pvginkel asked:

Memory problems with Windows VCR library

First: I am using Microsoft Visual Studio 2005 with debugging libraries turned on, on a Windows XP installation.

I have problems with the debug versions of the memory functions, namely _malloc_dbg and co. The first time I got this problem was when _malloc_dbg returned NULL on a request. That time I traced it back to a stack overflow created through the WndProc with a recursive SendMessage (yes it was a bug). The problem was that I didn't get a stack overflow error. Why is that?

After that it got worse. When I solved that I stumbled upon a second instance when _malloc_dbg returned NULL. That time I found this in the output log (*** is the app name):

First-chance exception at 0x7c926a36 (ntdll.dll) in ***.exe: 0xC0000005: Access violation writing location 0x454d4545.
First-chance exception at 0x7c910f29 (ntdll.dll) in ***.exe: 0xC0000005: Access violation reading location 0x454d4545.
First-chance exception at 0x7c91b3fb (ntdll.dll) in ***.exe: 0xC0000005: Access violation reading location 0x454d4545.

The stack frames for the first write violation (I am too tired to trace the second and third; they didn't seem that important because I thought that if the first was solved, the rest would follow) show as:

ntdll.dll!_RtlFreeHeapSlowly@12()  + 0x17f bytes      
ntdll.dll!_RtlDebugFreeHeap@12()  + 0x193 bytes      
ntdll.dll!_RtlFreeHeapSlowly@12()  + 0x23d19 bytes      
ntdll.dll!_RtlFreeHeap@12()  + 0x16470 bytes      
msvcr80d.dll!_free_base(void * pBlock=0x009e3fe0)  Line 109 + 0x13 bytes      C
msvcr80d.dll!_free_dbg_nolock(void * pUserData=0x009e4000, int nBlockUse=1)  Line 1329 + 0x9 bytes      C++
msvcr80d.dll!_free_dbg(void * pUserData=0x009e4000, int nBlockUse=1)  Line 1194 + 0xd bytes      C++
***.exe!dbg_safe_free(void * _Memory=0x009e4000, int _BlockType=1)  Line 84 + 0x10 bytes      C
***.exe!AppWindowWmRequestUpdate(tagREQUESTUPDATE * lpUpdate=0x009e3fb8)  Line 2135 + 0xe bytes      C
***.exe!AppWindowWndProc(HWND__ * hWnd=0x007302f6, unsigned int uMsg=1025, unsigned int wParam=0, long lParam=10371000)  Line 385 + 0x9 bytes      C
user32.dll!77d48734()       
[Frames below may be incorrect and/or missing, no symbols loaded for user32.dll]      
user32.dll!77d48816()       
user32.dll!77d4b4c0()       
user32.dll!77d4b50c()       
ntdll.dll!_KiUserCallbackDispatcher@12()  + 0x13 bytes      
user32.dll!77d491be()       
user32.dll!77d51082()       
***.exe!AppWindowMsgLoop()  Line 280 + 0x12 bytes      C
***.exe!WinMain(HINSTANCE__ * hInstance=0x00400000, HINSTANCE__ * hPrevInstance=0x00000000, char * lpCmdLine=0x00151f2c, int nCmdShow=1)  Line 50 + 0x5 bytes      C
***.exe!__tmainCRTStartup()  Line 578 + 0x35 bytes      C
***.exe!WinMainCRTStartup()  Line 403      C
kernel32.dll!_BaseProcessStart@4()  + 0x23 bytes      

I traced the value 0x454d454d back to the value of pHead->pBlockHeaderNext->pBlockHeaderPrev at dbgheap.c:1329, but that's a guess. It could very well be that 0x454d454d is a generic dead fill that just happened to be at that location.
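One quick check for a suspicious address like this is to read its bytes as text. The sketch below is illustrative only (the helper name `dword_to_ascii` is hypothetical, not part of any library): it prints the four bytes of a 32-bit value in x86 memory order. If the result looks like a string fragment, that may hint that character data was written over a pointer, although it proves nothing on its own.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes the four bytes of `value` into out[0..3] in
   little-endian (x86 memory) order, replaces non-printable bytes with '.',
   and NUL-terminates the result. */
void dword_to_ascii(uint32_t value, char out[5])
{
    for (int i = 0; i < 4; ++i) {
        unsigned char byte = (unsigned char)((value >> (8 * i)) & 0xFFu);
        out[i] = (byte >= 0x20 && byte <= 0x7E) ? (char)byte : '.';
    }
    out[4] = '\0';
}

/* Convenience wrapper that prints the value next to its text form. */
void print_dword_as_text(uint32_t value)
{
    char text[5];
    dword_to_ascii(value, text);
    printf("0x%08lx -> \"%s\"\n", (unsigned long)value, text);
}
```

Calling `print_dword_as_text(0x454d4545u)` with the address from the log shows only printable letters, which would be unusual for a real heap pointer in this process.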

What's really neat about this is that _free_dbg has a try / catch so that these errors don't actually throw exceptions (that I can see), which makes it just that much harder to catch this.

Now here we go. I only get this error when I turn on a specific part of the code which is in a secondary thread. It has two modes of operation (an offline for testing and online for retail, but still debug). The notable difference between the offline and retail mode is that the retail mode accesses the internet through libCURL. Now, I've checked this code like a hundred times (it's not that complicated) and I can't find any problems there.

My guess is then that it isn't there, but that's the only thing that changes; how could it be something else?

One last thing. I get the first access error when I free some memory in the main thread that was allocated in the secondary thread (this allocation isn't in the mentioned block; it is also present in the offline version, where it doesn't throw this error). I get the NULL from _malloc_dbg after six new messages on my WndProc.

I suspect this to be a stack overflow (or corruption) too but I don't know how to test for this.

If ANYBODY has ANY ideas, please throw them at me. I am at a total loss.
pvginkel (ASKER)

When I disable free in the application, I just get the last access error:

First-chance exception at 0x7c91b3fb (ntdll.dll) in ExpertPoster.exe: 0xC0000005: Access violation reading location 0x454d4545.

This is what directly causes the _malloc_dbg to return NULL. I think this definitely is heap corruption. How do I test for this? I think the _Crt* functions don't cover this.

This time the try / finally is in dbgheap.c:353. Again a stack trace:

ntdll.dll!_RtlAllocateHeapSlowly@12()  + 0x14f bytes      
ntdll.dll!_RtlDebugAllocateHeap@12()  + 0xaf bytes      
ntdll.dll!_RtlAllocateHeapSlowly@12()  + 0x2ea6c bytes      
ntdll.dll!_RtlAllocateHeap@12()  + 0xacc4 bytes      
msvcr80d.dll!_heap_alloc_base(unsigned int size=68)  Line 105 + 0x28 bytes      C
msvcr80d.dll!_heap_alloc_dbg(unsigned int nSize=32, int nBlockUse=1, const char * szFileName=0x0042f7d4, int nLine=14)  Line 409 + 0x9 bytes      C++
msvcr80d.dll!_nh_malloc_dbg(unsigned int nSize=32, int nhFlag=0, int nBlockUse=1, const char * szFileName=0x0042f7d4, int nLine=14)  Line 266 + 0x15 bytes      C++
msvcr80d.dll!_malloc_dbg(unsigned int nSize=32, int nBlockUse=1, const char * szFileName=0x0042f7d4, int nLine=14)  Line 189 + 0x1b bytes      C++
***.exe!dbg_safe_malloc(unsigned int _Size=32, int _BlockType=1, const char * _Filename=0x0042f7d4, int _LineNumber=14)  Line 52 + 0x18 bytes      C
***.exe!TransferStart(HWND__ * hWnd=0x008f02f6, void * lpUserData=0x009e3e60, char * szDescription=0x00431de8, char * szUrl=0x0012f6bc, char * szRequestForPost=0x00000000, int fUseCookie=0)  Line 14 + 0x17 bytes      C
***.exe!AppWindowSequenceProcessLogon(tagSEQUENCE * lpSequence=0x009e3e60, tagRESPONSE * lpResponse=0x009e42c0, int * lpRetry=0x0012f9d0, int * lpFinished=0x0012f9c4)  Line 2257 + 0x26 bytes      C
***.exe!AppWindowWmRequestDone(tagRESPONSE * lpResponse=0x009e42c0)  Line 2184 + 0x15 bytes      C
***.exe!AppWindowWndProc(HWND__ * hWnd=0x008f02f6, unsigned int uMsg=1026, unsigned int wParam=0, long lParam=10371776)  Line 389 + 0x9 bytes      C
user32.dll!77d48734()       
[Frames below may be incorrect and/or missing, no symbols loaded for user32.dll]      
user32.dll!77d48816()       
user32.dll!77d4b4c0()       
user32.dll!77d4b50c()       
ntdll.dll!_KiUserCallbackDispatcher@12()  + 0x13 bytes      
user32.dll!77d491be()       
user32.dll!77d51082()       
***.exe!AppWindowMsgLoop()  Line 280 + 0x12 bytes      C
***.exe!WinMain(HINSTANCE__ * hInstance=0x00400000, HINSTANCE__ * hPrevInstance=0x00000000, char * lpCmdLine=0x00151f2c, int nCmdShow=1)  Line 50 + 0x5 bytes      C
***.exe!__tmainCRTStartup()  Line 578 + 0x35 bytes      C
***.exe!WinMainCRTStartup()  Line 403      C
kernel32.dll!_BaseProcessStart@4()  + 0x23 bytes      
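
On the question of how to test for heap corruption: the debug CRT does offer some checks, namely `_CrtSetDbgFlag` with `_CRTDBG_CHECK_ALWAYS_DF` and `_CrtCheckMemory`, both declared in `<crtdbg.h>`. The following is a minimal sketch, not a definitive recipe. The helper names `enable_heap_checks` and `heap_looks_intact` are hypothetical, and the code compiles to no-ops outside an MSVC debug build. These checks validate the guard bytes around debug-heap blocks, so they may not catch every kind of damage, but they can move the failure closer to the code that caused it.

```c
#include <stdio.h>
#if defined(_MSC_VER) && defined(_DEBUG)
#include <crtdbg.h>
#endif

/* Hypothetical helper: asks the debug CRT to validate the whole heap on
   every allocation and free. Returns 1 if the checks were enabled, 0 if
   this build has no debug CRT. This makes the program noticeably slower. */
int enable_heap_checks(void)
{
#if defined(_MSC_VER) && defined(_DEBUG)
    int flags = _CrtSetDbgFlag(_CRTDBG_REPORT_FLAG); /* read current flags */
    flags |= _CRTDBG_ALLOC_MEM_DF | _CRTDBG_CHECK_ALWAYS_DF;
    _CrtSetDbgFlag(flags);
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: returns 0 if the debug CRT reports heap damage,
   1 otherwise (including builds where no check is available). */
int heap_looks_intact(const char *where)
{
#if defined(_MSC_VER) && defined(_DEBUG)
    if (!_CrtCheckMemory()) {
        fprintf(stderr, "heap damage detected at %s\n", where);
        return 0;
    }
#else
    (void)where; /* nothing to check in this build */
#endif
    return 1;
}
```

A possible use is to call `enable_heap_checks()` once at the start of WinMain and then sprinkle `heap_looks_intact("after TransferStart")` style calls around the suspect code, narrowing the window until the first failing call is found.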
jkr

>>First-chance exception at 0x7c926a36

That's a misconception, not a problem:

'First-chance exception in xxx...' just means that a function within 'xxx' caused an access-violation exception that was handled successfully inside the SEH frame that was active when the exception occurred. You can think of it as being the same as code like this:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    long l;

    __try // set up current SEH frame
    {
        CopyMemory(&l, 0, sizeof(long)); // read from 0x00000000
    }
    __except (EXCEPTION_EXECUTE_HANDLER) // handler for current frame
    {
        puts("We knew that this would go wrong...");
    }
    return 0;
}

(Additional info: MS KB Article Q105675)

The article can be found at http://support.microsoft.com/support/kb/articles/q105/6/75.asp 

A first-chance exception is so called because it is passed to a debugger before the application 'sees' it. This is done by sending an 'EXCEPTION_DEBUG_EVENT' to the debugger, which can then decide whether to pass it on to the application to handle or to 'ignore' it (e.g. like an 'EXCEPTION_BREAKPOINT' aka 'int 3'). If the exception isn't handled, it becomes a '2nd chance' exception: the debugger 'sees' it a second time and will usually terminate the program. (Without a debugger, these exceptions end up at 'UnhandledExceptionFilter()', which will signal the exception to the user with one of these 'nice' message boxes and also terminate the program.)

In short: This message is only generated by a debugger & you can safely ignore it...
I understand this very well. The thing is, though, that the exception caught in dbgheap.c:353 is the exception that causes _malloc_dbg to return NULL, and it is the first-chance exception I refer to.

There is one simple reason why I believe that this must be the cause of _malloc_dbg returning NULL. Exceptions are slow, so throwing one cannot be standard practice for _malloc (well, it is actually ntdll.dll!_RtlAllocateHeapSlowly, in the ntdll library, that throws the exception). And if it were, your output logs would be riddled with first-chance exceptions. I have never seen one before in these circumstances.
For people who stumbled upon this question in search of an answer, this is what was wrong:

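char * p;

// some code

p = _malloc_dbg(20, _NORMAL_BLOCK, __FILE__, __LINE__); // fresh block: contents are uninitialized
strcat(p, "foo bar"); // looks for a NUL that may lie beyond the 20 bytes, then writes there

This corrupted the heap in such a way that the effects showed up in a very strange and difficult-to-debug place. The correct code in this case was:

char * p;

// some code

p = _realloc_dbg(p, 20, _NORMAL_BLOCK, __FILE__, __LINE__); // resize the existing block, keeping the string already in p
strcat(p, "foo bar");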

I hope this can help someone some day.
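
For readers who want to avoid this class of bug altogether, here is a minimal portable sketch of an append helper that always sizes the buffer from the actual string lengths. It uses plain `realloc` rather than the `_dbg` variants, and the name `append_str` is hypothetical. It assumes `*buf` is either NULL or a NUL-terminated string obtained from `malloc`/`realloc`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: appends `suffix` to the heap string *buf.
   *buf may be NULL, which is treated as an empty string.
   Returns 0 on success, or -1 on allocation failure, in which case
   *buf is left unchanged and still valid. */
int append_str(char **buf, const char *suffix)
{
    size_t old_len = (*buf != NULL) ? strlen(*buf) : 0;
    size_t add_len = strlen(suffix);

    /* Room for the old text, the new text and the terminating NUL. */
    char *grown = realloc(*buf, old_len + add_len + 1);
    if (grown == NULL)
        return -1;

    /* Copy the suffix including its NUL, so the result is always terminated. */
    memcpy(grown + old_len, suffix, add_len + 1);
    *buf = grown;
    return 0;
}
```

Starting from `char *s = NULL;`, two calls `append_str(&s, "foo")` and `append_str(&s, " bar")` leave `s` pointing at "foo bar", and the caller releases it with `free(s)`. Because the size is computed rather than hard-coded, neither the uninitialized-buffer problem nor an undersized block can occur here.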