
explorer.exe causing AV with windows hook and IAT patching

jimstar asked (1,268 Views, Last Modified: 2008-01-09)
I have a DLL that is loaded into every process by a system-wide CBT hook. On DLL_PROCESS_ATTACH it checks whether the process's primary module is explorer.exe and, if so, patches the IAT entry for one function across all loaded modules in that process. On DLL_PROCESS_DETACH it performs the same check and, if it matches, reverses the changes to the IAT.

Fairly simple.
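
To make the attach/detach bookkeeping concrete, here is a minimal portable sketch. It is a toy model, not the poster's code: the "IAT" is just a vector of function pointers, and every name in it (`ModuleIat`, `PatchLog`, `patch_all`, `restore_all`, `real_fn`, `hook_fn`) is hypothetical. A real implementation would walk each module's PE import descriptors and would typically need VirtualProtect to make the IAT page writable first; none of that is shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy stand-ins for the imported function and its replacement (hypothetical).
using Fn = int (*)(int);
int real_fn(int x) { return x + 1; }
int hook_fn(int x) { return real_fn(x) * 100; }

// Toy stand-in for one loaded module's IAT: an array of pointer slots.
struct ModuleIat {
    std::vector<Fn> slots;
};

// Every overwritten slot is remembered together with the value it held.
struct PatchLog {
    std::vector<std::pair<Fn*, Fn> > saved;
};

// "Attach": redirect each slot that points at `target` to `hook`.
std::size_t patch_all(std::vector<ModuleIat>& modules, Fn target, Fn hook,
                      PatchLog& log) {
    std::size_t patched = 0;
    for (std::size_t m = 0; m < modules.size(); ++m) {
        std::vector<Fn>& slots = modules[m].slots;
        for (std::size_t i = 0; i < slots.size(); ++i) {
            if (slots[i] == target) {
                log.saved.push_back(std::make_pair(&slots[i], slots[i]));
                slots[i] = hook;
                ++patched;
            }
        }
    }
    return patched;
}

// "Detach": put back only the slots that still hold our hook, so a slot that
// something else changed in the meantime is left alone.
std::size_t restore_all(PatchLog& log, Fn hook) {
    std::size_t restored = 0;
    for (std::size_t i = 0; i < log.saved.size(); ++i) {
        Fn* slot = log.saved[i].first;
        if (*slot == hook) {
            *slot = log.saved[i].second;
            ++restored;
        }
    }
    log.saved.clear();
    return restored;
}
```

Restoring only the slots that still contain the hook is a design choice in this sketch; it avoids clobbering a slot that some other component re-patched between attach and detach.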

After working with the hook DLL for a while, I noticed that occasionally the DLL wouldn't unload immediately from various random processes after calling UnhookWindowsHookEx(). The solution was to broadcast a WM_* message [PostMessage(HWND_BROADCAST, WM_NULL, 0, 0)], which caused the straggling processes to unload the DLL from memory, letting me replace the DLL on disk. It worked great, and the DLL was freed immediately after the broadcast was sent. I don't entirely understand why it sometimes unloads quickly and other times needs the broadcast, but for this question it probably isn't relevant.

So now I've got IAT patching of explorer.exe on DLL_PROCESS_ATTACH, which is facilitated by SetWindowsHookEx(). A bit later, UnhookWindowsHookEx() is called, PostMessage(HWND_BROADCAST, WM_NULL, 0, 0) is posted, and DLL_PROCESS_DETACH occurs in the DLL, causing the IAT restoration in explorer.exe.

--> The Problem: Sometimes, when I go through the unhook procedure, explorer.exe crashes with an AV.

The AV occurs at instructions like TEST EAX, EAX where EAX = 1 (and other seemingly 'normal' instructions), so I'm not sure what is actually causing the AV. It's not trying to dereference a bad pointer, unless I'm completely missing something.

Additionally, I notice that it occurs a bit after a PeekMessageW call in explorer.exe, and sometimes while PeekMessageW is still in the call stack (executing code from SHELL32 or USER32). Again, it AVs on a seemingly 'healthy' instruction, and it's not always the same instruction.

--> Complications: When I don't patch the IAT but *do* the CBT hook/unhook and PostMessage, it doesn't AV. When I don't send the PostMessage but *do* the IAT and CBT hook/unhook, it doesn't AV. It takes both the IAT hooking and PostMessage() together to cause the AV. And it only occurs every few times that I load/unload my application (not consistent). Another note: the function I am hooking in explorer.exe is not WM_* related.

Thoughts? Ideas to further investigate what might be the source of the AV? Any ideas why it would AV on an instruction that appears to be 'healthy' and normal like TEST EAX, EAX where EAX = 1?

I'm currently building up a simplified version of the code that I think is causing the errors (hooking DLL, post message) which works fine, and will add in the IAT patching tomorrow to see if I can't get it to replicate the problem - then I'll tear the code apart bit by bit until I figure out what exactly is causing the issue. Though, I am hoping that someone on EE will have a suggestion that will solve the problem so I don't have to spend countless hours messing around with it.

--> Another note: in my most recent tests my DLL is unloading quickly with my CBT hook (previously I was using a CALLWNDPROC hook, which was causing the unload delays). However, even though I might not need the PostMessage (which is one of the required pieces to repro the issue), I would still like to figure out why the AV is occurring in the scenario above so I know it's not an issue rooted in my other code.


Author

Commented:
If the above lengthy description wasn't enough for you, here are some more notes/thoughts:

1. Because the error occurs with PeekMessageW in the call stack or after having executed it, I'm wondering if it could be related to unhooking the windows hook. For example, maybe the DLL detached before the DLL's hooking callback was removed from the hook chain. Or, maybe explorer.exe was in the middle of processing a message when the DLL was unloaded, thus having the system call the DLL's callback (which triggers the AV). However, I would think this would cause an AV at the instruction that calls the DLL's old memory... not at TEST EAX, EAX. This also doesn't explain why I can't reproduce the error with just the CBT hook and PostMessage() - the IAT patching is required to repro it.

2. When the debugger attached to explorer.exe after the AV, the DLL that contained the hook function was not loaded into the process, and a log file that I generated showed that DLL_PROCESS_DETACH successfully finished unhooking the IAT (i.e., a normal unload without errors). So it's definitely occurring *after* the DLL finishes unloading from memory.

I know it'd be much easier if I could post some code that would repro the issue, but that will have to wait until I have a simplified version of the code available. Until then, I'm hoping that someone would have some debugging suggestions around the AV. Especially how it could AV on such a simple instruction as TEST EAX, EAX. The only thing I can think of is that the memory containing that instruction had EXECUTE access removed somehow - I haven't checked that yet.
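
One way to test the "EXECUTE access removed" idea is to query the protection of the page that contains the faulting address. The sketch below is only an illustration under assumptions: the helper names `protection_allows_execute` and `address_is_executable` are made up, and the query helper has to run inside the process being examined (VirtualQueryEx is the analogous call from another process). The PAGE_* values are the ones from winnt.h, repeated in the non-Windows branch so the decoding logic can be compiled elsewhere.

```cpp
#include <cassert>

#ifdef _WIN32
#include <windows.h>
#else
// Values of the winnt.h constants, repeated so the decoding logic compiles
// off Windows.
const unsigned long PAGE_NOACCESS          = 0x01;
const unsigned long PAGE_READWRITE         = 0x04;
const unsigned long PAGE_EXECUTE           = 0x10;
const unsigned long PAGE_EXECUTE_READ      = 0x20;
const unsigned long PAGE_EXECUTE_READWRITE = 0x40;
const unsigned long PAGE_EXECUTE_WRITECOPY = 0x80;
#endif

// True if a page protection value includes any of the execute variants.
bool protection_allows_execute(unsigned long protect) {
    const unsigned long exec_mask = PAGE_EXECUTE | PAGE_EXECUTE_READ |
                                    PAGE_EXECUTE_READWRITE |
                                    PAGE_EXECUTE_WRITECOPY;
    return (protect & exec_mask) != 0;
}

#ifdef _WIN32
// Queries the page containing `address` in the current process and reports
// whether it is committed and carries an execute protection.
bool address_is_executable(const void* address) {
    MEMORY_BASIC_INFORMATION mbi;
    if (VirtualQuery(address, &mbi, sizeof(mbi)) == 0) {
        return false;  // the query itself failed
    }
    return mbi.State == MEM_COMMIT && protection_allows_execute(mbi.Protect);
}
#endif
```

If the page turns out not to be committed at all, that would point toward the code having been unmapped rather than merely re-protected.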

Author

Commented:
Hi ZOPPO,

Interesting observation. There are a number of confusing possibilities that could be occurring, and I'll have to do some extra careful checking of my code to see if it might be the cause.

>>Is your function to patch the IAT threadsafe?

The patching is handled entirely during DLL_PROCESS_ATTACH and DLL_PROCESS_DETACH, both of which run inside DllMain. Since each occurs only once per process, it should be threadsafe.

>> that's just a guess, but might be it's a multithreading problem? If i.e. two explorer windows are open they may receive the HWND_BROADCAST message within a short time span - then maybe there's some conflict when they call the same functionality.

The broadcast doesn't actually trigger any functionality of mine - its purpose is to force Windows to realize that the hook was unhooked, which then causes my library to unload. When my library unloads, that's when my cleanup occurs and the IAT is restored. However, you do bring up an interesting thought. Perhaps if there are two explorer windows, and they both receive the broadcast (not sure if two different explorer windows in the same process have separate message loops), it might be possible that both window A and B kick off message processing, A causes Windows to realize it needs to remove the hook, which causes the DLL to unload, and then B (still processing the message) tries to call the CBTProc function that isn't there anymore. However, it seems like this would AV on the call to the CBTProc function, not on a 'normal' instruction like TEST EAX, EAX in user32/shell32.
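
A related scenario that could be checked, and which is only a guess and not something confirmed in this thread, is a thread that is still executing inside code from the DLL (the CBTProc or the IAT replacement function) at the moment the DLL is unmapped. A cheap way to gather evidence is to count in-flight calls and log the count during DLL_PROCESS_DETACH. The sketch below is portable and purely illustrative; `InFlightGuard`, `hooked_function` and `calls_in_flight` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>

// Number of threads currently executing inside the replacement function.
static std::atomic<int> g_in_flight(0);

// RAII guard: increments on entry, decrements on every exit path.
struct InFlightGuard {
    InFlightGuard()  { g_in_flight.fetch_add(1); }
    ~InFlightGuard() { g_in_flight.fetch_sub(1); }
};

// Hypothetical replacement function; the body stands in for the real work.
int hooked_function(int value) {
    InFlightGuard guard;
    return value * 2;
}

// Value to log at detach time. A nonzero count means some thread is still
// inside the replacement and would continue in code that is about to go away.
int calls_in_flight() { return g_in_flight.load(); }
```

This is a diagnostic aid rather than a fix: a thread that has already read the patched IAT slot but has not yet entered the function is not counted, so a zero reading does not prove the unload was safe.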

Author

Commented:
>> it might be possible that both window A and B kick off message processing, A causes windows to realize it needs to remove the hook which causes the DLL to unload, and then B (still processing the message) tries to call the CBTProc function that isn't there anymore

Re-reading this, I strongly suspect this isn't occurring - it's Windows' job to manage the hook proc functions and ensure this type of scenario doesn't occur. Unless something is totally broken with how Windows handles unhooking, I suspect I'm just missing the true cause of the AV.

Author

Commented:
I've decided to use a different mechanism to load my code into other processes. There's just too much going on 'behind the scenes' in Windows with SetWindowsHookEx/UnhookWindowsHookEx that could potentially be causing the issues.

Zoppo,

Thanks for the input on the problem - I do appreciate it. For the help, I went ahead and gave you the question points. Have a great weekend!
CERTIFIED EXPERT

Commented:
Hi jimstar,

I guess that's a good idea anyway - using a global hook to inject code into one specific process is a waste of resources in any case ...

Thanks,

ZOPPO