You Won’t Learn This in School: Disabling Kernel Functions in Your Process

Detecting and reporting unhandled exceptions with SetUnhandledExceptionFilter seemed logical, and, in fact, it worked… for a while. Eventually, we started to notice failures that should have been reported as a last-chance exception but weren’t. After much investigation, we discovered that both Direct3D and Flash were installing their own unhandled exception filters! Worse, they were fighting over it, installing their handlers several times per second! In practice, this meant our last-chance crash reports were rarely generated, convincing us our crash metrics were better than they were. (Bad, bad libraries!)

It’s pretty ridiculous that we had to solve this problem, but, as Avery Lee says, “Just because it is not your fault does not mean it is not your problem.”

The obvious solution is to join the fray, calling SetUnhandledExceptionFilter every frame, right? How about we try something a bit more reliable… I hate implementing solutions that have obvious flaws. Thus, we chose to disable (with code modification) the SetUnhandledExceptionFilter function immediately after installing our own handler. When Direct3D and Flash try to call it, their requests will be ignored, leaving our exception handler installed.

Code modification… isn’t that scary? With a bit of knowledge and defensive programming, it’s not that bad. In fact, I’ll show you the code up front:

// If this doesn't make sense, skip the code and come back!

#include <windows.h>
#include <string.h>
#include <boost/static_assert.hpp>

void lockUnhandledExceptionFilter() {
    HMODULE kernel32 = LoadLibraryA("kernel32.dll");
    if (!kernel32) {
        return;
    }

    if (FARPROC gpaSetUnhandledExceptionFilter = GetProcAddress(kernel32, "SetUnhandledExceptionFilter")) {
        // 32-bit x86 prologue we expect at the start of the function
        unsigned char expected_code[] = {
            0x8B, 0xFF, // mov edi,edi
            0x55,       // push ebp
            0x8B, 0xEC, // mov ebp,esp
        };

        // only replace code we expect
        if (memcmp(expected_code, gpaSetUnhandledExceptionFilter, sizeof(expected_code)) == 0) {
            unsigned char new_code[] = {
                0x33, 0xC0,       // xor eax,eax
                0xC2, 0x04, 0x00, // ret 4
            };

            BOOST_STATIC_ASSERT(sizeof(expected_code) == sizeof(new_code));

            // make the page writable, patch it, then restore the old protection
            DWORD old_protect;
            if (VirtualProtect(gpaSetUnhandledExceptionFilter, sizeof(new_code), PAGE_EXECUTE_READWRITE, &old_protect)) {
                CopyMemory(gpaSetUnhandledExceptionFilter, new_code, sizeof(new_code));

                DWORD dummy;
                VirtualProtect(gpaSetUnhandledExceptionFilter, sizeof(new_code), old_protect, &dummy);

                FlushInstructionCache(GetCurrentProcess(), gpaSetUnhandledExceptionFilter, sizeof(new_code));
            }
        }
    }
}
If that’s obvious to you, then great: We’re hiring!

Otherwise, here is an overview:

Use GetProcAddress to grab the real address of SetUnhandledExceptionFilter. (If you just type &SetUnhandledExceptionFilter you’ll get the relocatable import thunk, not the actual SetUnhandledExceptionFilter function.)

On 32-bit x86, most Windows API functions begin with five bytes of prologue:

mov edi, edi ; 2 bytes for hotpatching support
push ebp     ; stack frame
mov ebp, esp ; stack frame (cont'd)

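We want to replace those five bytes with the equivalent of return 0;. Remember that __stdcall functions return values in the eax register and that the callee pops its own arguments off the stack. SetUnhandledExceptionFilter takes a single pointer argument, which is 4 bytes on 32-bit x86, so the replacement is: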

xor eax, eax ; eax = 0
ret 4        ; pops 4 bytes (arg) and returns

Also five bytes! How convenient! Before we replace the prologue, we verify that the first five bytes match our expectations. (If not, we can’t feel comfortable about the effects of the code replacement.) The VirtualProtect and FlushInstructionCache calls are standard fare for code modification.
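To make the check-then-patch logic concrete without touching a live process, here is a minimal sketch that runs the same comparison and overwrite against an ordinary byte buffer. The helper name patchIfExpected and the two constant arrays are hypothetical names made up for this illustration; the byte values are the ones from the code above. It leaves out VirtualProtect and FlushInstructionCache on purpose, so it only models the "only replace code we expect" step.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// 32-bit x86 prologue expected at the start of the target function.
constexpr std::array<unsigned char, 5> kExpectedPrologue = {
    0x8B, 0xFF, // mov edi, edi
    0x55,       // push ebp
    0x8B, 0xEC, // mov ebp, esp
};

// Replacement stub: "return 0;" for a one-argument __stdcall function.
constexpr std::array<unsigned char, 5> kReturnZeroStub = {
    0x33, 0xC0,       // xor eax, eax
    0xC2, 0x04, 0x00, // ret 4
};

static_assert(kExpectedPrologue.size() == kReturnZeroStub.size(),
              "the stub must overwrite exactly the bytes that were verified");

// Hypothetical helper (not part of any Windows API): overwrites the first
// five bytes of `code` with the stub, but only if they match the expected
// prologue. Returns true when the bytes were replaced.
bool patchIfExpected(unsigned char* code, std::size_t code_size) {
    assert(code != nullptr); // passing null is a programmer error

    if (code_size < kExpectedPrologue.size()) {
        return false; // not enough bytes to compare safely
    }
    if (std::memcmp(code, kExpectedPrologue.data(), kExpectedPrologue.size()) != 0) {
        return false; // unknown prologue: leave the code untouched
    }
    std::memcpy(code, kReturnZeroStub.data(), kReturnZeroStub.size());
    return true;
}
```

Calling patchIfExpected a second time on the same buffer returns false, because the prologue no longer matches. The real function behaves the same way: calling lockUnhandledExceptionFilter twice does nothing the second time.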

After implementing this, it’s worth stepping through the assembly in a debugger to verify that SetUnhandledExceptionFilter no longer has any effect. (If you really enjoy writing unit tests, it’s definitely possible to unit test the desired behavior. I’ll leave that as an exercise for the reader.)

Finally, our last-chance exception reporting actually works!

27 thoughts on “You Won’t Learn This in School: Disabling Kernel Functions in Your Process”

  1. It was obvious to me how it worked, although I didn’t know about the 5-byte preamble thing. I just figured you peeked at the first five bytes of the function.

    Of course, if you asked me to write it, I probably wouldn’t have been able to. Kinda like how I can read Spanish-language newspapers but if you asked me to translate something *to* Spanish, I’d just look at you blankly.

  2. That’s great — until you run into the newer windows kernels which actively seek to prevent code modification. What happens on Win64?

  3. It’s always nice to see posts that dive all the way down to this level. I’m wondering what the impact of this is on other processes. DLLs (particularly system DLLs) are shared modules so they ought to be mapped into your process’ address space with only one instance in physical memory. When you modify the code, *all* processes that load that DLL should see the new version.

    Since you’re tinkering with kernel32.dll, all other processes ought to see the change. Have you noticed this behaviour?

  4. I guess a better way would be to do an IAT hook for direct3d and flash. That way only calls to SetUnhandledExceptionFilter from those modules will be affected. And you don’t have to do nasty code patching (which should probably be done with a detours-like approach), and is x64/x86 portable.

  5. > Since you’re tinkering with kernel32.dll, all other processes ought to see the change. Have you noticed this behaviour?

    Patching code like that will only affect the current process. DLLs are mapped copy-on-write, so the change affects only one process. If it didn’t, a process could go wild and blow away kernel32.dll, and userland would be borked. Win95 was a long time ago. :)

  6. sounds like virus writer stuff. what protections exist to prevent this from being used to turn off antivirus?

  7. This obviously gets the job done, but I think a much “cleaner” solution would be to just hook SetUnhandledExceptionFilter using Detours or madCHook, which works much in the same way and has the same effect, but looks a lot less frightening to people trying to understand your code.

  8. I love how Kraln tries to teach you something yet doesn’t have a clue what he’s even talking about.

    On another note, this isn’t necessarily malware-writer stuff. It’s also useful if you have 3rd party code running in your process. Think plugins. Of course they could just undo your work if they’re malicious, but if they really are malicious you’ve already lost.

    FWIW, you can even consider system DLLs as 3rd party code in your own process. I’ve used a similar technique to change the behavior of shell32.dll when we encountered a bug in ShellExecute (something about ShellExecute behavior on WOW64). The code modified a helper function in shlwapi that ShellExecute/shell32 was calling into. In the end IAT hooking turned out to be sufficient, but at an intermediate stage the code rewrote the helper function’s prologue as shown here. Funky stuff.

    As for the code being obvious – I don’t think so. I certainly would not have known the byte codes off the top of my head and typically forget to call FlushInstructionCache. That mistake can bite you badly.

  9. Addendum: I found out about your blog via reddit and have already read your entire minidump/debug symbols/stack traces series. I love it! I have been cooking my own and have definitely learned a thing or two (and bookmarked this page too).

  10. I did a similar thing back a while ago on OSX.

    What I did though was a bit more ‘pacifist’ and I had set out to make my code modification as small as possible.

    I’d load a DLL into the target process via the equivalent of LoadLibrary, get the address of a function and then make the first instruction of the function you’re hooking `jmp` to a “code island”.

    Essentially, my code island was a declspec(naked) function which had a function call to the hook implementation inside, and a “ret” inline assembly at the end (meaning that the hooked function would turn into a zero stack prepared function: jmp, call, ret).

    That way, I only had to modify one instruction in the binary and had a fully C++ solution afterwards.

    Anyways, all of this has been done by mach_star which injects code and installs the island for you etc.

    And finally, I must say that the title is slightly misleading: you didn’t disable kernel functions. You disabled a kernel library function. The library is but a client. The kernel is the server.

  11. ilm: Cool, glad it was helpful! With x86 so ubiquitous, I think these types of techniques are increasingly valid.

    memet: mach_star sounds really handy. Thanks for the tip!

  12. Your approach is fine. But keep in mind that the title is in fact misleading, as others have mentioned. Not because you are not disabling a kernel system call, but because SetUnhandledExceptionFilter(..) is not even a system call… Sure, it makes indirect calls to the kernel when it calls VirtualQuery(), but that is not what you disable.

  13. How does the OS let you get away with this? I would think that this kind of behavior would have been prohibited by the same class of security measures implemented with DEP (though, DEP clearly doesn’t apply in this case). I could have sworn I read something about how later versions of Windows don’t allow a process to write into the executable space of a loaded module, but I can’t find the references…

  14. On Win9x, kernel DLLs (and maybe all DLLs?) are mapped into the process read-only, so this technique wouldn’t work. On NT, DLLs are mapped copy-on-write, so this modification makes a private copy of that page in your process. Thus, whenever your process calls SetUnhandledExceptionFilter, it will run the modified code, not the original.

  15. But does that technique work on the latest versions of Windows (Vista, Server 2008, Win7)??
    I thought for sure they had updated that functionality to prevent that sort of thing (unless you’re in the .NET runtime or something). I’m not sure that code-signing would be very effective against it, but isn’t there a way for a DLL to force read-only loading? or for the OS to monitor for that sort of behavior on DLLs that it knows are critical for the OS? I thought that was the whole reason why Vista was dog-slow.

    The next edition of Windows Internals won’t be available until May, so I don’t have a text reference to look, and my searches online are turning up bupkis.
    …I guess I’ll just have to try it myself when I get home.

  16. Yeah, it works on Vista too. There are lots of programs that depend on techniques like that, so I can’t imagine it would go away anytime soon.

    This is not a security violation or anything. It just affects the user-mode portion of system API calls in your process.

  17. It works with 3rd party dlls, but doesn’t work with MS CRT dlls (I tested with msvcr71d.dll). When a function in msvcr71d.dll generates an exception, Windows error report still displays.
    This code snippet will generate an exception in msvcr71d.dll:
    char * str1="";
    char str2[20];
    strncpy(str2, str1,strlen(str1)-10);

  18. Hi Paul,

    I agree, and we patch the IAT for a ton of other functions (including HeapAlloc and some of the mm APIs for our Flash integration) but you have to remember to patch the IAT after you LoadLibrary every dynamic component.

    If you modify the function directly, the change will persist for all dynamically-loaded libraries.


  19. @Chad No you don’t – LL will not overwrite your change once the library is loaded, it’ll just increment the refcount on the DLL. Once you rig the IAT, it’ll stay rigged. This is the main mechanism as to how AppCompat shims work

  20. Hm, I’m referring to hooking IATs before libraries are loaded. We use a library called “APIHook” to patch up IATs at application start, but we load Flash and Direct3D dynamically. After loading d3d.dll or flash10b.ocx, we’d have to remember to modify their IATs too.

    Or maybe my terminology is screwed up and we’re talking about different things?

  21. For the guys finding the same for x64 architectures

    xor eax, eax ; eax = 0
    ret 4 ; pops 4 bytes (arg) and returns

    translates on x64 to

    xor rax,rax ;this time rax = 0 is the return value
    ret ;on 64 bit the caller will unwind the stack.

    this leads to the following machine code:

    unsigned char new_code[] = { 0x48, 0x33, 0xC0, 0xC3 };

    as for the part with the expected code, I haven’t checked if it is the same. maybe somebody else does it.
