explorer.exe causing AV with Windows hook and IAT patching
Posted on 2007-10-05
I have a DLL that is loaded into every process using a system-wide CBT hook. On DLL_PROCESS_ATTACH it checks whether the process's primary module is explorer.exe, and if so, performs IAT patching of one function across all loaded modules in that process. On DLL_PROCESS_DETACH, it makes the same explorer.exe check, and if it passes, reverses the changes to the IAT.
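For illustration only, the attach/detach logic described above can be sketched roughly as follows. This is a portable simulation, not the actual DLL: a plain array of function pointers stands in for the IAT entries of the loaded modules, and the names `target_fn`, `is_primary_module`, `swap_slots`, `real_impl` and `hook_impl` are all hypothetical. Real code would walk each module's PE import directory and change page protection before writing.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical signature of the single imported function being hooked. */
typedef int (*target_fn)(int);

/* Returns nonzero if the file-name part of `path` equals `name`,
   ignoring case (e.g. "C:\\WINDOWS\\Explorer.EXE" vs "explorer.exe"). */
static int is_primary_module(const char *path, const char *name)
{
    const char *base = path;
    for (const char *p = path; *p; ++p)
        if (*p == '\\' || *p == '/')
            base = p + 1;
    for (; *base && *name; ++base, ++name)
        if (tolower((unsigned char)*base) != tolower((unsigned char)*name))
            return 0;
    return *base == '\0' && *name == '\0';
}

/* Rewrites every slot that currently holds `from` so it holds `to`,
   and returns the number of slots changed. Slots holding any other
   value are left alone, so the restore pass only undoes entries that
   still point at the hook. */
static size_t swap_slots(target_fn *slots, size_t count,
                         target_fn from, target_fn to)
{
    size_t changed = 0;
    assert(slots != NULL);
    for (size_t i = 0; i < count; ++i) {
        if (slots[i] == from) {
            slots[i] = to;
            ++changed;
        }
    }
    return changed;
}

/* Stand-ins for the original import and the replacement in the hook DLL. */
static int real_impl(int x) { return x + 1; }
static int hook_impl(int x) { return real_impl(x) + 100; }
```

In this sketch, the attach path would call `swap_slots(slots, n, real_impl, hook_impl)` only when `is_primary_module(path, "explorer.exe")` is true, and the detach path would call it with the last two arguments reversed.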
After working with the hook DLL for a while, I noticed that occasionally the DLL wouldn't unload immediately from various random processes after calling UnhookWindowsHookEx(). The solution was to broadcast a message [PostMessage(HWND_BROADCAST, WM_NULL, 0, 0)], which caused the straggling processes to unload the DLL from memory, letting me replace the DLL on disk. This worked great: the DLL was freed immediately after the broadcast was sent. I don't entirely understand why it sometimes unloads quickly and other times needs the broadcast, but that probably isn't relevant to this question.
So now I've got IAT patching of explorer.exe on DLL_PROCESS_ATTACH, which is triggered by SetWindowsHookEx(). A bit later, UnhookWindowsHookEx() is called, followed by PostMessage(HWND_BROADCAST, WM_NULL, 0, 0), and DLL_PROCESS_DETACH occurs in the DLL, causing the IAT restoration in explorer.exe.
--> The Problem: Sometimes, when I go through the unhook procedure, explorer.exe crashes with an AV.
The AV occurs at instructions like TEST EAX, EAX where EAX = 1 (and other seemingly 'normal' instructions), so I'm not sure what is actually causing the AV. It's not trying to dereference a bad pointer, unless I'm completely missing something.
Additionally, I notice that it occurs shortly after a PeekMessageW call in explorer.exe, and sometimes while PeekMessageW is still on the call stack (executing code from SHELL32 or USER32). Again, it AVs on a seemingly 'healthy' instruction, and it's not always the same instruction.
--> Complications: When I don't patch the IAT but *do* the CBT hook/unhook and PostMessage, it doesn't AV. When I don't send the PostMessage but *do* the IAT and CBT hook/unhook, it doesn't AV. It takes both the IAT hooking and PostMessage() together to cause the AV. And it only occurs every few times that I load/unload my application (not consistent). Another note: the function I am hooking in explorer.exe is not WM_* related.
Thoughts? Ideas to further investigate what might be the source of the AV? Any ideas why it would AV on an instruction that appears to be 'healthy' and normal like TEST EAX, EAX where EAX = 1?
I'm currently building up a simplified version of the code that I think is causing the errors (hooking DLL, post message), which so far works fine. Tomorrow I'll add the IAT patching to see whether I can replicate the problem, and then tear the code apart bit by bit until I figure out exactly what is causing the issue. That said, I am hoping that someone on EE will have a suggestion that solves the problem so I don't have to spend countless hours messing around with it.
--> Another note: in my most recent tests my DLL is unloading quickly with my CBT hook (previously I was using a CALLWNDPROC hook, which was causing the unload delays). However, even though I might not need the PostMessage (which is one of the required pieces to repro the issue), I would still like to figure out why the AV is occurring in the scenario above, so I know it's not an issue rooted in my other code.