You Won’t Learn This in School: Disabling Kernel Functions in Your Process

Detecting and reporting unhandled exceptions with SetUnhandledExceptionFilter seemed logical, and, in fact, it worked… for a while. Eventually, we started to notice failures that should have been reported as a last-chance exception but weren’t. After much investigation, we discovered that both Direct3D and Flash were installing their own unhandled exception filters! Worse, they were fighting over it, installing their handlers several times per second! In practice, this meant our last-chance crash reports were rarely generated, convincing us our crash metrics were better than they were. (Bad, bad libraries!)

It’s pretty ridiculous that we had to solve this problem, but, as Avery Lee says, “Just because it is not your fault does not mean it is not your problem.”

The obvious solution is to join the fray, calling SetUnhandledExceptionFilter every frame, right? How about we try something a bit more reliable… I hate implementing solutions that have obvious flaws. Thus, we chose to disable (with code modification) the SetUnhandledExceptionFilter function immediately after installing our own handler. When Direct3D and Flash try to call it, their requests will be ignored, leaving our exception handler installed.

Code modification… isn’t that scary? With a bit of knowledge and defensive programming, it’s not that bad. In fact, I’ll show you the code up front:

// If this doesn't make sense, skip the code and come back!

void lockUnhandledExceptionFilter() {
    HMODULE kernel32 = LoadLibraryA("kernel32.dll");
    Assert(kernel32);

    if (FARPROC gpaSetUnhandledExceptionFilter = GetProcAddress(kernel32, "SetUnhandledExceptionFilter")) {
        unsigned char expected_code[] = {
            0x8B, 0xFF, // mov edi,edi
            0x55,       // push ebp
            0x8B, 0xEC, // mov ebp,esp
        };

        // only replace code we expect
        if (memcmp(expected_code, gpaSetUnhandledExceptionFilter, sizeof(expected_code)) == 0) {
            unsigned char new_code[] = {
                0x33, 0xC0,       // xor eax,eax
                0xC2, 0x04, 0x00, // ret 4
            };

            BOOST_STATIC_ASSERT(sizeof(expected_code) == sizeof(new_code));

            DWORD old_protect;
            if (VirtualProtect(gpaSetUnhandledExceptionFilter, sizeof(new_code), PAGE_EXECUTE_READWRITE, &old_protect)) {
                CopyMemory(gpaSetUnhandledExceptionFilter, new_code, sizeof(new_code));

                DWORD dummy;
                VirtualProtect(gpaSetUnhandledExceptionFilter, sizeof(new_code), old_protect, &dummy);

                FlushInstructionCache(GetCurrentProcess(), gpaSetUnhandledExceptionFilter, sizeof(new_code));
            }
        }
    }
    FreeLibrary(kernel32);
}

If that’s obvious to you, then great: We’re hiring!

Otherwise, here is an overview:

Use GetProcAddress to grab the real address of SetUnhandledExceptionFilter. (If you just type &SetUnhandledExceptionFilter you’ll get the relocatable import thunk, not the actual SetUnhandledExceptionFilter function.)

Most Windows functions begin with five bytes of prologue:

mov edi, edi ; 2 bytes for hotpatching support
push ebp     ; stack frame
mov ebp, esp ; stack frame (con't)

We want to replace those five bytes with return 0;. Remember that __stdcall functions return values in the eax register. We want to replace the above code with:

xor eax, eax ; eax = 0
ret 4        ; pops 4 bytes (arg) and returns

Also five bytes! How convenient! Before we replace the prologue, we verify that the first five bytes match our expectations. (If not, we can’t feel comfortable about the effects of the code replacement.) The VirtualProtect and FlushInstructionCache calls are standard fare for code modification.

After implementing this, it’s worth stepping through the assembly in a debugger to verify that SetUnhandledExceptionFilter no longer has any effect. (If you really enjoy writing unit tests, it’s definitely possible to unit test the desired behavior. I’ll leave that as an exercise for the reader.)

Finally, our last-chance exception reporting actually works!

Reporting Crashes in IMVU: Last-Chance Exceptions

So far, our crash reporting is looking pretty comprehensive. But what if there is a crash in the crash reporting itself? Or perhaps a crash on another thread (outside of any __try ... __except blocks)? Or, heaven forbid, we somehow cause Python itself to crash? In all of these situations, we can’t count on our Python crash reporting code to handle the error.

There are a couple ways to report these failures, and IMVU chose to implement a last-chance exception handler with SetUnhandledExceptionFilter. This handler runs whenever a structured exception bubbles out of any thread in the process. Unfortunately, you can’t run arbitrary code in the handler – maybe your process’s heap is corrupted and further attempts to allocate will cause another access violation.

By the way, the last-chance handler is your opportunity to run code just before the Windows “this program has performed an illegal operation” dialog.

Access Violation

In IMVU’s last-chance handler, we do what we can to save the state of the failure to disk. The next time the client starts, if failure data exists, the client submits it to the server. (Assuming the customer tries to run the client again.) Here is is the last-chance handler’s implementation:

LONG WINAPI handleLastChanceException(
	EXCEPTION_POINTERS* ExceptionInfo
) {
	DWORD error = writeMiniDump(
		ExceptionInfo);
	if (error) {
		CRASHLOG("Failed to write minidump");
	}

	// Try to record some additional system
	// info if we can.

	// Show the application error dialog.
	return EXCEPTION_CONTINUE_SEARCH;
}

Pretty easy! Again, try to restrict yourself to system calls in your last-chance handler.

When we first implemented this, we found out that the IMVU client was crashing a ton and we didn’t even know! Were I to start over from scratch, I’d implement the broadest crash detection possible first, and then implement narrower, more-detailed detection as necessary.

Next time I’ll talk about an unexpected shortcoming of this implementation. (Can you guess what it is?)

Reporting Crashes in IMVU: Creating Your Very Own Symbol Server

With minidumps or a bit of hand-rolled code, it’s pretty easy to report symbolic C++ stack traces whenever your application crashes. But reporting is just one side of the coin. Once you begin collecting crash reports, you’ll need a way to read them. As I mentioned before, you can generate some MAP files and look up your functions manually, but Microsoft provides some lesser-known tools that take care of this for you.

Why should I run my own symbol server?

  • Once you create a symbol server, you can easily debug a release build of your program from any computer with network access to the symbol server.
  • Existing tools such as WinDbg and Visual Studio play nicely with symbol servers.
  • Creating a symbol server is easy: it requires a single download and a directory on a network share.
  • Symbol servers can be cascaded. Microsoft runs a symbol server for their operating system DLLs, so entries from both your code and system DLLs will be displayed in a single call stack.

An example of a call stack containing IMVU code and Windows code.

What is a symbol server?

If you compile your program with the correct options, the compiler and linker will generate symbol files (PDBs). PDBs contain information required to parse stack traces and locate identifiers in your program. A PDB also contains the signature of the DLL or EXE associated with it. A symbol server is a directory of PDBs, organized by signature. Thus, given a DLL or EXE, you can find its PDB signature (the PdbSig70 and PdbAge fields in the corresponding debug PE header) and look up the PDB in
the symbol server directory.

Creating the symbol server

Download and install the Debugging
Tools for Windows
. Make sure symstore.exe is in your path.

Create a directory on an accessible network share. Note the full path; you’ll need it later. Ours is \\hydra\devel\SymbolServer.

Every time you release your software, add the generated PDBs to the symbol server. Our release script runs this command:

symstore.exe add /s \\hydra\devel\SymbolServer /compress /r /f *.pdb /t IMVUClient

This command takes a while to run, but it simply searches for PDBs and indexes them in the SymbolServer directory by their signature so Visual Studio and WinDbg can find them later.

Using the symbol server

There are generally two ways to coerce your debugger to resolve call stacks with your symbol server. You can either set the _NT_SYMBOL_PATH environment variable or configure the debugger with the paths directly.

The syntax for _NT_SYMBOL_PATH is a bit wonky, but it should look something like this:

srv*c:\localsymbols*\\hydra\devel\SymbolServer*http://msdl.microsoft.com/download/symbols

_NT_SYMBOL_PATH behaves just like any other PATH: paths are searched in order until the appropriate PDB is found. To make future lookups faster, if a PDB is found on the network, it will be copied to the local path.

The last entry in the list is Microsoft’s public symbol server. This allows the debugger to find function names for Microsoft operating system DLLs.

Each debugger generally has its own mechanism for configuring symbol servers. Here’s Visual Studio 2005’s symbol path editor:

MSDev Symbol Path Configuration

To save time, you may prefer using _NT_SYMBOL_PATH over per-debugger configuration, since it will work with all debuggers.

I hope this information was useful. I certainly wish we’d discovered it years ago. Happy debugging!

Handling exceptions from XULRunner callbacks

(I was composing an e-mail to IMVU’s engineering team when I realized this information was generally applicable to anyone embedding XULRunner into their application. Hope it’s useful.)

XULRunner is written in a subset of C++ that we’ll call XPCOM. An embedded XULRunner window communicates back to the host application through XPCOM interfaces that the host implements. In IMVU, we generally use C++ exceptions to signal failure. On the other hand, XPCOM uses nsresult error codes. Specifically, XULRunner is not written to support C++ exceptions, nor is it compiled with them enabled. (Note that compiling with exceptions enabled is not sufficient to guarantee defined behavior when they’re thrown. You must use techniques like RAII to properly unwind and restore state if an exception is thrown.)

If XULRunner is calling into our code, and our code uses exceptions to signal failure, and throwing an exception through a XULRunner call stack results in undefined behavior, what do we do? This is the strategy I took:

In every method of every IMVU implementation of an XPCOM interface, I bracketed the function body with IMVU_BEGIN_DISALLOW_EXCEPTIONS_XPCOM and IMVU_END_DISALLOW_EXCEPTIONS_XPCOM. For example:

nsresult xpcom_method_runtime_error_with_error_info() {
    IMVU_BEGIN_DISALLOW_EXCEPTIONS_XPCOM {
        boost::throw_exception(std::runtime_error("error"));
    } IMVU_END_DISALLOW_EXCEPTIONS_XPCOM;
}

These two macros generate a try ... catch clause that handles every C++ exception thrown from the body, returning NS_ERROR_UNEXPECTED to the caller.

If the exception thrown is a Python error (boost::python::error_already_set), then the Python exception is pulled (PyErr_Fetch) and scheduled to be reraised (PyErr_Restore) in the next iteration through the IMVU client’s message loop.

If the exception thrown is a C++ exception, we’d like to take the same approach. However, C++0x has not shipped, so there’s no built-in mechanism for transferring exceptions across contexts. Thus, we take advantage of the boost::exception framework to copy and rethrow the exception from the main message loop. Unfortunately, you can’t just “throw X()”. You have to use boost::throw_exception, which enables the machinery for current_exception() and rethrow_exception(). To enforce this requirement, I have modified our C++ exception hierarchy so that you must use X::throw_(arguments) instead of throw X(arguments).

If the exception thrown is a C++ exception but not a subclass of std::exception, then we catch it with catch (...) or std::uncaught_exception() in a sentry object’s destructor, and raise a structured exception to at least indicate that this is occurring in the field.

For reference, here is the implementation:

void handlePythonError();
void handleStandardException(const std::exception& e);

#define IMVU_BEGIN_DISALLOW_EXCEPTIONS         \
    DisallowExceptionsSentry PP_UNIQUE_NAME(); \
    try

#define IMVU_END_DISALLOW_EXCEPTIONS(block)                             \
    catch (const boost::python::error_already_set&) {               \
        handlePythonError();                                            \
        block ;                                                         \
    }                                                                   \
    catch (const std::exception& e) {                               \
        handleStandardException(e);                                     \
        block ;                                                         \
    }

#define IMVU_BEGIN_DISALLOW_EXCEPTIONS_XPCOM IMVU_BEGIN_DISALLOW_EXCEPTIONS
#define IMVU_END_DISALLOW_EXCEPTIONS_XPCOM IMVU_END_DISALLOW_EXCEPTIONS({ return NS_ERROR_UNEXPECTED; })

#define IMVU_DISALLOW_EXCEPTIONS_XPCOM(block)                           \
    IMVU_BEGIN_DISALLOW_EXCEPTIONS_XPCOM {                              \
        block ;                                                         \
    } IMVU_END_DISALLOW_EXCEPTIONS_XPCOM

And the source file:

DisallowExceptionsSentry::~DisallowExceptionsSentry() {
    if (std::uncaught_exception()) {
        RaiseException(EXC_EXCEPTIONS_NOT_ALLOWED, 0, 0, 0);
    }
}

/*
 * On error handling and the GIL.  Gecko's event handlers run during
 * the message loop, which means the GIL is not held.  Calls back into
 * Python require that the GIL be reacquired.  If the Python call
 * fails, the GIL is released (while error_already_set is unwinding
 * the stack).  The GIL must be reacquired to grab the exception
 * information and marshal it to the main thread.
 *
 * However, PumpWaitingMessages releases the GIL too!  Thus, reraising
 * the error on the main thread requires GIL reacquisition.
 *
 * If another thread acquires the GIL and blocks on the main thread's
 * message pump, a deadlock will occur.  Thus, secondary threads
 * should never block on the main thread's message pump.
 */

void reraisePythonError(PyObject* type, PyObject* value, PyObject* traceback) {
    HoldPythonGIL PP_UNIQUE_NAME();
    PyErr_Restore(type, value, traceback);
    boost::python::throw_error_already_set();
}

void error_but_no_error() {
    throw std::runtime_error("error_already_set but no Python error?");
}

void handlePythonError() {
    PyObject* type;
    PyObject* value;
    PyObject* traceback;
    {
        HoldPythonGIL PP_UNIQUE_NAME();
        PyErr_Fetch(&type, &value, &traceback);
    }
    if (type) {
        boost::function<void()> fn(boost::bind(reraisePythonError, type, value, traceback));
        queueMainThreadJob(fn);
    } else {
        queueMainThreadJob(error_but_no_error);
    }
}

void rethrowStandardException(const std::string& s) {
    std::string prefix("Unknown std::exception: ");
    throw std::runtime_error(prefix + s);
}

void handleStandardException(const std::exception& e) {
    if (boost::exception_detail::get_boost_exception(&e)) {
        queueMainThreadJob(boost::bind(
            boost::rethrow_exception,
            boost::current_exception()));
    } else {
        queueMainThreadJob(boost::bind(rethrowStandardException, std::string(e.what())));
    }
}

Reporting Crashes in IMVU: C++ Call Stacks

Last time, we talked about including contextual information to help us
actually fix crashes that happen in the field. Minidumps are a great
way to easily save a snapshot of the most important parts of a running
(or crashed) process, but it’s often useful to understand the
low-level mechanics of a C++ call stack (on x86). Given some basic
principles about function calls, we will derive the implementation
of code to walk a call stack.

C++ function call stack entries are stored on the x86 stack, which
grows downward in memory. That is, pushing on the stack subtracts
from the stack pointer. The ESP register points to the
most-recently-written item on the stack; thus, push eax
is equivalent to:

sub esp, 4
mov [esp], eax

Let’s say we’re calling a function:

int __stdcall foo(int x, int y)

The __stdcall
calling convention pushes arguments onto the stack from right to left
and returns the result in the EAX register, so calling
foo(1, 2) generates this code:

push 2
push 1
call foo
; result in eax

If you aren’t familiar with assembly, I know this is a lot to absorb,
but bear with me; we’re almost there. We haven’t seen the
call instruction before. It pushes the EIP
register, which is the return address from the called function onto
the stack and then jumps to the target function.
If we didn’t store the instruction pointer, the called function would
not know where to return when it was done.

The final piece of information we need to construct a C++ call stack is
that functions live in memory, functions have names, and thus sections
of memory have names. If we can get access to a mapping of memory
addresses to function names (say, with the /MAP
linker option
), and we can read instruction pointers up the call
stack, we can generate a symbolic stack trace.

How do we read the instruction pointers up the call stack?
Unfortunately, just knowing the return address from the current
function is not enough. How do you know the location of the caller’s
caller? Without extra information, you don’t. Fortunately, most
functions have that information in the form of a function prologue:

push ebp
mov ebp, esp

and epilogue:

mov esp, ebp
pop ebp

These bits of code appear at the beginning and end of every function, allowing you
to use the EBP register as the “current stack frame”.
Function arguments are always accessed at positive offsets from EBP,
and locals at negative offsets:

; int foo(int x, int y)
; ...
[EBP+12] = y argument
[EBP+8]  = x argument
[EBP+4]  = return address (set by call instruction)
[EBP]    = previous stack frame
[EBP-4]  = local variable 1
[EBP-8]  = local variable 2
; ...

Look! For any stack frame EBP, the caller’s address is
at [EBP+4] and the previous stack frame is at [EBP].
By dereferencing EBP, we can walk
the call stack, all the way to the top!

struct stack_frame {
    stack_frame*  previous;
    unsigned long return_address;
};

std::vector<unsigned long> get_call_stack() {
    std::vector<unsigned long> call_stack;

    stack_frame* current_frame;
    __asm mov current_frame, ebp

    while (!IsBadReadPtr(current_frame, sizeof(stack_frame))) {
        call_stack.push_back(current_frame->return_address);
        current_frame = current_frame->previous;
    }
    return call_stack;
}

// Convert the array of addresses to names with the aforementioned MAP file.

Yay, now we know how to grab a stack trace from any location in the
code. This implementation is not robust, but the concepts are
correct: functions have names, functions live in memory, and we can
determine which memory addresses are on the call stack. Now that you
know how to manually grab a call stack, let Microsoft do the heavy
lifting with the StackWalk64
function.

Next time, we’ll talk about setting up your very own Microsoft Symbol Server so you can
grab accurate function names from every version of your software.

Reporting Crashes in IMVU: Call Stacks and Minidumps

So far, we’ve implemented reporting for Python exceptions that bubble
out of the main loop
, C++ exceptions that bubble into Python (and then
out of the main loop), and structured exceptions that bubble into
Python
(and then out of the main loop.) This is a fairly
comprehensive set of failure conditions, but there’s still a big piece
missing from our reporting.

Imagine that you implement this error reporting and have customers try
the new version of your software. You’ll soon have a collection of
crash reports, and one thing will stand out clearly. Without the
context in which crashes happened (call stacks, variable values,
perhaps log files), it’s very hard to determine their cause(s). And
without determining their cause(s), it’s very hard to fix them.

Reporting log files are easy enough. Just attach them to the error
report. You may need to deal with privacy concerns or limit the size
of the log files that get uploaded, but those are straightforward
problems.

Because Python has batteries
included
, grabbing the call stack from a Python exception is
trivial. Just take a quick look at the traceback
module
.

Structured exceptions are a little harder. The structure of a call
stack on x86 is machine- and sometimes compiler-dependent.
Fortunately, Microsoft provides an API to dump the relevant process
state to a file such that it can be opened in Visual
Studio
or WinDbg,
which will let you view the stack trace and select other data. These
files are called minidumps, and they’re pretty small. Just call MiniDumpWriteDump
with the context of the exception and submit the generated file with your crash
report.

Grabbing a call stack from C++ exceptions is even harder, and maybe
not desired. If you regularly use C++ exceptions for communicating
errors from C++ to Python, it’s probably too expensive to grab a call
stack or write a minidump every single time. However, if you want to
do it anyway, here’s one way.

C++ exceptions are implemented on top of the Windows kernel’s
structured exception machinery. Using the try and
catch statements in your C++ code causes the compiler to
generate SEH code behind the scenes. However, by the time your C++
catch clauses run, the stack has already been unwound. Remember
that SEH has three passes: first it runs filter expressions until it
finds one that can handle the exception; then it unwinds the stack
(destroying any objects allocated on the stack); finally it runs the
actual exception handler. Your C++ exception handler runs as the last stage,
which means the stack has already been unwound, which means you can’t
get an accurate call stack from the exception handler. However, we
can use SEH to grab a call stack at the point where the exception was
thrown, before we handle it…

First, let’s determine the SEH exception code of C++ exceptions
(WARNING, this code is compiler-dependent):

int main() {
    DWORD code;
    __try {
        throw std::exception();
    }
    __except (code = GetExceptionCode(), EXCEPTION_EXECUTE_HANDLER) {
        printf("%X\n", code);
    }
}

Once we have that, we can write our exception-catching function like
this:

void throw_cpp_exception() {
    throw std::runtime_error("hi");
}

bool writeMiniDump(const EXCEPTION_POINTERS* ep) {
    // ...
    return true;
}

void catch_seh_exception() {
    __try {
        throw_cpp_exception();
    }
    __except (
        (CPP_EXCEPTION_CODE == GetExceptionCode()) && writeMiniDump(GetExceptionInformation()),
        EXCEPTION_CONTINUE_SEARCH
    ) {
    }
}

int main() {
    try {
        catch_seh_exception();
    }
    catch (const std::exception& e) {
        printf("%s\n", e.what());
    }
}

Now we’ve got call stacks and program state for C++, SEH, and Python
exceptions, which makes fixing reported crashes dramatically easier.

Next time I’ll go into more detail about how C++ stack traces work,
and we’ll see if we can grab them more efficiently.

Reporting Crashes in IMVU: Structured Exceptions

Previously, we discussed the implementation of automated reporting of
unhandled C++ exceptions
. However, if you’ve ever programmed in C++,
you know that C++ exceptions are not the only way your code can fail.
In fact, the most common failures probably aren’t C++ exceptions at
all. You know what I’m referring to: the dreaded access violation
(sometimes called segmentation fault).

Access Violation

How do we detect and report access violations? First, let’s talk
about what an access violation actually is.

Your processor has a mechanism for detecting loads and stores from
invalid memory addresses. When this happens, it raises an interrupt,
which Windows exposes to the program via Structured Exception Handling
(SEH). Matt Pietrek has written an excellent article on how
SEH works
, including a description of C++ exceptions implemented
on top of SEH. The gist is that there is a linked list of stack
frames that can possibly handle the exception. When an exception
occurs, that list is walked, and if an entry claims it can handle it,
it does. Otherwise, if no entry can handle the exception, the program
is halted and the familiar crash dialog box is displayed to the user.

OK, so access violations can be detected with SEH. In fact, with the
same mechanism, we can detect all other types of structured
exceptions, including division by zero and stack overflow. What does
the code look like? It’s approximately:

bool handle_exception_impl_seh(function f) {
    __try {
        // This is the previously-described C++ exception handler.
        // For various reasons, they need to be in different functions.
        // C++ exceptions are implemented in terms of SEH, so the C++
        // exception handling must be deeper in the call stack than
        // the structured exception handling.
        return handle_exception_impl_cpp(f);
    }
    // catch all structured exceptions here
    __except (EXCEPTION_EXECUTE_HANDLER) {
        PyErr_SetString(PyExc_RuntimeError, "Structured exception in C++ function");
        return true; // an error occurred
    }
}

Note the __try and __except keywords. This means we’re using
structured exception handling, not C++ exception handling. The filter
expression in the __except statement evaluates to
EXCEPTION_EXECUTE_HANDLER, indicating that we always want to handle
structured exceptions. From the filter expression, you can optionally
use the GetExceptionCode
and GetExceptionInformation
intrinsics to access information about the actual error.

Now, if you write some code like:

Object* o = 0;
o->method(); // oops!

The error will be converted to a Python exception, and reported
with our existing mechanism. Good enough for now! However, there are
real problems with this approach. Can you think of them?

Soon, I’ll show the full implementation of the structured
exception handler.

Reporting Crashes in IMVU: Part II: C++ Exceptions

A year ago, I explained
how the IMVU client automatically reports unexpected Python exceptions
(crashes) to us. I intended that post to be the first of a long
series that covered all of the tricks we use to detect and report
abnormal situations. Clearly, my intentions have not played out yet,
so I am going to pick up that series by describing how we catch
exceptions that occur in our C++ code. Without further ado,

Reporting C++ Exceptions

As discussed earlier, IMVU’s error handling system can handle any
Python exception that bubbles out of the client’s main loop and
automatically report the failure back to us so that we can fix it for
the next release. However, our application is a
combination of Python and C++, so what happens if our C++ code has a
bug and raises an uncaught C++ exception, such as std::bad_alloc
or std::out_of_range?

Most of our C++ code is exposed to Python via the excellent
Boost.Python library, which automatically catches C++ exceptions at
the boundary and translates them to Python exceptions. The
translation layer looks something like this:

bool handle_errors(function fn) {
    try {
        fn();
        return false; // no error
    }
    catch (const std::runtime_error& e) {
        // raise RuntimeError into Python
        PyErr_SetString(PyExc_RuntimeError, e.what());
    }
    catch (const std::bad_alloc&) {
        // raise MemoryError into Python
        PyErr_SetString(PyExc_MemoryError, "out of memory");
    }
    catch (const std::exception& e) {
        // raise Exception into Python
        PyErr_SetString(PyExc_Exception, e.what());
    }
    catch (...) {
        PyErr_SetString(PyExc_Exception, "Unknown C++ exception");
    }
    return true;
}

Thus, any C++ exception that’s thrown by the C++ function is
caught by Boost.Python and reraised as the appropriate Python
exception, which will already be handled by the previously-discussed
crash reporting system.

Let’s take another look at the client’s main loop:

def mainLoop():
    while running:
        pumpWindowsMessages()
        updateAnimations()
        redrawWindows()

def main():
    try:
        mainLoop()
    except:
        # includes exception type, exception value, and python stack trace
        error_information = sys.exc_info()
        if OK == askUserForPermission():
            submitError(error_information)

If the C++ functions called from updateAnimations() or redrawWindows()
raise a C++ exception, it will be caught by the Python error-handling
code and reported to us the same way Python
exceptions are.

Great! But is this a complete solution to the problem? Exercise
for the reader: what else could go wrong here? (Hint: we use Visual
Studio 2005 and there was a bug in catch (…) that Microsoft fixed in
Visual Studio 2008…)

Evaluating JavaScript in an Embedded XULRunner/Gecko Window

I intended to write something with more substance tonight, but I’m
exhausted from wrasslin’ with Gecko/XULRunner/SpiderMonkey in a
days-long marathon debugging session. None of you will understand
this entry, because its intent is to contain enough keywords and
content that others don’t have to go through the pain that I did.

If you’re embedding Gecko/XULRunner/SpiderMonkey into your
application, and you want to evaluate some JavaScript in the context
of an nsIDOMWindow or nsIWebBrowser, you’d think you’d have many
approaches. You could call JS_EvaluateScript or JS_EvaluateUCScript
directly, getting the JSContext from the nsIScriptContext and the
JSObject* global from the nsIScriptGlobalObject… However, I simply
could not get this to work: I kept running into crazy errors inside of
JS_InitArrayClass. I still don’t understand those errors.

People suggested using EvaluateString and EvaluateStringWithValue on
nsIScriptContext, but that failed in an empty window (I define empty
as not having called nsIWebNavigation::LoadURI) because it did not
have a security principal (nsIPrincipal). Eventually I learned that
you can grab the system principal from the nsIScriptSecurityManager
service and pass that directly to EvaluateStringWithValue. With a few
more minor details, this approach worked in all cases that we care
about so far!

Here is the final magic incantation:

typedef std::map<jsval, boost::python::object> ReferenceMap;

boost::python::object GeckoWindow::evalJavaScript(const std::wstring& js) {
    nsresult rv;

    nsCOMPtr<nsIPrincipal> principal;
    nsCOMPtr<nsIScriptSecurityManager> secMan = do_GetService(
        NS_SCRIPTSECURITYMANAGER_CONTRACTID);
    rv = secMan->GetSystemPrincipal(getter_AddRefs(principal));
    if (NS_FAILED(rv)) {
        throw GeckoError("Failed to get system principal");
    }

    nsCOMPtr<nsIScriptGlobalObject> sgo = do_GetInterface(webBrowser);
    nsCOMPtr<nsIScriptContext> ctx = sgo->GetContext();

    JSContext* cx = reinterpret_cast<JSContext*>(ctx->GetNativeContext());
    Assert(cx);
    uint32 previous = JS_SetOptions(
        cx,
        JS_GetOptions(cx) | JSOPTION_DONT_REPORT_UNCAUGHT);

    jsval out;
    rv = ctx->EvaluateStringWithValue(
        nsString(js.data(), js.size()),
        sgo->GetGlobalJSObject(),
        principal,
        "mozembed",
        0,
        nsnull,
        &out,
        nsnull);

    JS_SetOptions(cx, previous);

    JSAutoRequest ar(cx);
    JSAutoLocalRootScope alrs(cx);

    maybeThrowPythonExceptionFromJsContext(cx);

    if (NS_SUCCEEDED(rv)) {
        ReferenceMap references;
        return buildPythonObjectFromJsval(references, cx, out);
    } else {
        throw GeckoEvalUnknownError("eval failed with no exception set");
    }
}

void GeckoWindow::maybeThrowPythonExceptionFromJsContext(JSContext* cx) {
    jsval exception;
    if (JS_GetPendingException(cx, &exception)) {
        JS_ClearPendingException(cx);
        ReferenceMap references;
        boost::python::object py_exc_value(buildPythonObjectFromJsval(
            references,
            cx,
            exception));
        throw GeckoEvalError(py_exc_value.ptr());
    }
}

boost::python::object GeckoWindow::buildPythonObjectFromJsval(
    ReferenceMap& references,
    JSContext* cx,
    const jsval v
) {
    using namespace boost::python;

    if (v == JSVAL_TRUE) {
        return object(handle<>(Py_True));
    } else if (v == JSVAL_FALSE) {
        return object(handle<>(Py_False));
    } else if (v == JSVAL_NULL) {
        return object(handle<>(Py_None));
    } else if (v == JSVAL_VOID) {
        return object(handle<>(Py_None));
    } else if (JSVAL_IS_INT(v)) {
        return object(handle<>(PyInt_FromLong(JSVAL_TO_INT(v))));
    } else if (JSVAL_IS_NUMBER(v)) {
        return object(handle<>(PyFloat_FromDouble(*JSVAL_TO_DOUBLE(v))));
    // } else if (JSVAL_IS_STRING(v)) {
    //
    } else if (JSVAL_IS_OBJECT(v)) {
        JSObject* obj = JSVAL_TO_OBJECT(v);

        if (references.count(v)) {
            return references[v];
        }

        if (JS_IsArrayObject(cx, obj)) {
            list rv;
            references[v] = rv;
            jsuint length;
            if (JS_GetArrayLength(cx, obj, &length)) {
                jsval element;
                for (jsuint i = 0; i < length; ++i) {
                    if (JS_GetElement(cx, obj, i, &element)) {
                        rv.append(buildPythonObjectFromJsval(references, cx, element));
                    }
                }
            }
            return rv;
        } else {
            dict rv;
            references[v] = rv;

            JSObject* iterator = JS_NewPropertyIterator(cx, obj);
            if (!iterator) {
                throw GeckoEvalUnknownError("Error creating object property iterator while marshalling");
            }
            for (;;) {
                jsid propertyName;
                if (!JS_NextProperty(cx, iterator, &propertyName)) {
                    throw GeckoEvalUnknownError("Error enumerating property list of object while marshalling");
                }

                if (propertyName == JSVAL_VOID) {
                    break;
                }

                jsval propertyNameValue;
                jsval propertyValue;
                object k;

                if (!JS_IdToValue(cx, propertyName, &propertyNameValue)) {
                    throw GeckoEvalUnknownError("Error converting property name to jsval while marshalling");
                }
                if (JSVAL_IS_INT(propertyNameValue)) {
                    jsint propertyIndex = JSVAL_TO_INT(propertyNameValue);
                    k = long_(propertyIndex);

                    if (!JS_LookupElement(cx, obj, propertyIndex, &propertyValue)) {
                        throw GeckoEvalUnknownError("Error looking up property value by index");
                    }
                } else if (JSVAL_IS_STRING(propertyNameValue)) {
                    JSString* kjsstr = JSVAL_TO_STRING(propertyNameValue);
                    std::wstring kstr(JS_GetStringChars(kjsstr), JS_GetStringLength(kjsstr));
                    k = object(kstr);

                    if (!JS_LookupUCProperty(cx, obj, kstr.c_str(), kstr.size(), &propertyValue)) {
                        throw GeckoEvalUnknownError("Error looking up property value by name");
                    }
                } else {
                    throw GeckoEvalUnknownError("Unknown property name type while marshalling");
                }

                rv[k] = buildPythonObjectFromJsval(references, cx, propertyValue);
            }
            return rv;
        }
    } else {
        // We don't know what type it is, or we can't marshal it,
        // so convert it to a string and hope for the best...
        JSString* string = JS_ValueToString(cx, v);
        return str(std::wstring(JS_GetStringChars(string), JS_GetStringLength(string)));
    }
}

Hope that helps, and Godspeed.

A Global Approach to Optimization

Since joining IMVU, I have had two people tell me “Profilers (especially VTune) suck. They don’t help you optimize anything.” That didn’t make sense to me: how could a tool that gives such amazingly detailed data about the execution of your program be useless?

Now, these are two very smart people, so I knew there had to be some truth in what they were saying. I just had to look for it. At least a year later, after I’d dramatically improved the performance of some key subsystems in the IMVU client, I realized what they meant. They should have said: “Don’t just run a profiler to find out which functions are taking the most time, and make them execute faster. Take a global approach to optimization. Prevent those functions from being called at all.”

There you have it: take a global approach to optimization. But how does that work? First, let me ramble a bit about the benefits of performance.

There are two types of features:

  1. interactive
  2. not interactive (i.e. slow)

Searching on Google, opening a Word document, and firing a gun in Team Fortress 2 are all interactive.

Compressing large files, factoring large primes, and downloading HD movies are not.

We wish all features were interactive, but computers can’t do everything instantly. Sometimes, however, a feature switches from non-interactive to interactive, to dramatic effect. Remember way back? Before YouTube? Downloading videos took forever, and probably wouldn’t even play. YouTube made video consumption so fast and so easy that it changed the shape of the internet. Similarly, thanks to Google, it’s faster to search the internet for something than it is to search your own hard drive in Windows XP.

Anyway, if you truly want to make something as fast as it can be, you need to think like this:

  • What’s the starting state A?
  • What’s the ending state B?
  • What’s the minimal set of operations to get from A to B, and how do I execute them as fast as possible?

Optimizing your existing, naive code won’t get you there. You’ll have to build your application around these goals. There’s plenty of room for out-of-the-box thinking here. Take MacOS X’s resumption-from-hibernation feature:

  • The starting state: the computer is off, the memory is saved to disk.
  • The ending state: the computer is on and the user is productive.

MacOS X takes advantage of the fact that this is not purely a technology problem. The user has to remember what they were doing and become reattached to the computer. Thus, they show you a copy of what was on your screen last to remind you what was happening while the computer prepares for your actions. Opportunities for this kind of parallelism abound: why is it that operating system installers ask you questions, download packages, and install them serially? There is dramatic room for improvement there.

I don’t claim that IMVU’s website is the fastest website out there, but here’s an example of a type of optimization that takes the whole picture into account: when you start loading a page on IMVU.com, it optimistically fetches hundreds of keys from our memcache servers, before even looking up your customer information. It’s probable that you’ll need many of those keys, and it’s faster to get them all at once than to fetch them as you need them.

Someday, I hope to apply this global optimization approach to a software build system (a la Make, SCons, MSBuild). It’s insane that we don’t have a build system with all of the flexibility of SCons and instantaneous performance. Sure, the first build may need to scan for dependencies, but there’s no reason that a second build couldn’t reuse the information from the first build and start instantaneously. Just have a server process that watches for changes to files and updates the internal dependency graph. On large projects, I’ve seen SCons take minutes to figure out that it has to rebuild one file, which is simply crazy.

When optimizing a feature, take the user into consideration, and write down the minimum set of steps between the starting state and ending state. Execute those steps as fast as you can, and run them in parallel if it helps. If technology has advanced enough, maybe you have just transformed something non-interactive into an interactive feature.