Designing DFPL – The Broad Strokes

Why not Python?

I love Python. It’s easy to learn, predictable, and works well for a wide variety of programs. Python’s the first language where I regularly found myself writing programs and having them work on the first try. The problem with Python is that it breaks down at scale.


In my experience, whenever a Python codebase reaches a couple hundred thousand lines of code, performance becomes an issue. It’s well-known that Python’s straight-line perf is not even close to C or even Java. In addition, Python programs don’t make efficient use of memory: objects are large and sparse — programs consist of a lot of pointer-chasing. The garbage collector’s not great, and startup and module loading times become an issue at scale, especially for short-lived processes like unit tests, build scripts, or desktop applications.

There are Python JIT compilers, but JITs don’t really help with startup time or memory usage.


Python has a GIL, meaning it’s hard to write programs that make efficient use of the hardware. It’s possible to move CPU-bound work into C, but any kind of coordination between Python and C has to ping-pong across the GIL, which becomes a central bottleneck.

Launching multiple processes can help, but each process pays Python’s base process memory usage overhead, and they can’t easily share in-memory caches.

As we’ve seen from Haskell and Go, the ability to have convenient, lightweight, efficient parallelism is a big deal.


I talked about this in The Long-Term Problem With Dynamically Typed Languages. As codebases get huge, you really want to be able to lean on the computer in order to safely make changes. Type systems are mini proof assistants for expressing guarantees like "I never mix up binary data with textual data" or "I always check the case that optional values are empty". Unit tests help, but are "proof by example". Type systems add a meaningful layer of confidence, allowing software to scale to larger teams and faster changes.

Why not Haskell/Go/Rust/Scala/F#/Whatever?

I agree 100% with Patrick Walton here:

There is room for some statically-typed language to occupy the network-services niche where performance needs to be good—maybe 2x slower than optimal (note that this is not acceptable for all network services)—and speed of development trumps most other considerations. Some sort of hybrid of Swift, Rust, and Go, perhaps, as none of them are 100% ideal for this space.

The dynamic language space is relatively saturated at this point. Rust is doing a great job in the native machine performance space. F# and Swift and, to some degree, Scala are excellent choices for the .NET, Apple, and JVM ecosystems, respectively.

But as of yet, there’s nothing great in either the web frontend space or the application server space.

Haskell’s laziness, curried functions, and nonobvious operational semantics are not suitable for frontend code. While it’s an otherwise perfect language for writing application servers, its syntax is tragically divisive, and I have no hope that Haskell as it stands today will become mainstream.

Go’s type system somehow manages to be both verbose and not expressive enough. Go is sufficient for single-purpose servers, but, from first-hand experience, the lack of generics really hurts in application servers. (I probably owe an essay on the difference between the two types of servers.) The Go community tends to rely on idioms, but it would be better if the language were general and expressive enough to allow direct expression of things like timeouts and cancellation. Also, Go is not actually memory-safe. Go is especially frustrating because, if it had, say, generics and tuples instead of the more limited multiple-return-values, it would be a truly excellent programming language.

Rust’s concurrency is not lightweight enough, and the syntax is extremely verbose.

Scala has a significant platform dependency on the JVM. Check out the size of some Scala.js programs. Also Scala’s type system is insanely complex.

F# brings along a large .NET dependency. It also suffers from the ML syntax. (Actually, it has two syntaxes.)

Maybe Swift is great, but it was only just open sourced, and I’d be worried about the Obj-C platform dependency. I’ve yet to write any real Swift programs, but what I’ve seen looks great.

All of these languages are lovely and have their uses, but none are particularly satisfactory replacements for Go or TypeScript/JavaScript. That is, lightweight code that can run at high performance in a browser and on a concurrent, scalable server.

But, hold on, what about js_of_ocaml? This is where things get interesting. Andy used js_of_ocaml for a while and found that, on paper, OCaml has all the right properties: it’s safe, high-performance, expressive, concurrent, established, and it compiles to very concise, efficient JavaScript… In fact, it’s nearly perfect… if you can stomach the syntax, that is. As I mentioned previously, at this point I’m convinced that ML-family languages haven’t caught on largely for reasons of taste (or lack thereof — just look at the js_of_ocaml website’s font selection).

The Goals

So what are we trying to do with this thing?

The language doesn’t need anything new or innovative. ML gives us bidirectional type inference, sum types with pattern matching, and soundness. Haskell gives us overloading (i.e. type classes, i.e. type-indexed values). Go shows us how important a solid experience is, both the first time and in daily use. Python shows us that predictable syntax and solid, composable semantics can go together.

Thus, this is largely a human factors project. Our goal is to see if we can marry all of those things into one language.

Our goals are to have:

  • Memory safety.
  • Static typing without heavyweight type annotations everywhere.
  • Language features that are composable.
  • A minimal runtime, with no large platform dependency.
  • Obvious execution semantics. As in imperative languages, you should be able to look at a program and understand its runtime behavior.
  • Tight, clean compilation to JavaScript. Should be a suitable alternative to TypeScript or CoffeeScript.
  • A reasonable replacement for Go or Node on the backend.
  • Cost-free abstractions.


Explicitly not goals:

  • C- or Rust-like performance. Rust is already doing a great job there.
  • Support for restricted effects. It’s not clear the usability cost is worth it.

Next time, I’ll either dive into one of the specific design decisions we’ve made, or talk about the language’s overall gestalt or texture.

Tracing Leaks in Python: Find the Nearest Root

Garbage Collection Doesn’t Mean You Can Ignore Memory Altogether…

This post is available on the IMVU Engineering Blog.

Garbage collection removes a great deal of burden from programming. In fact, garbage collection is a critical language feature for all languages where abstractions such as functional closures or coroutines are common, as they frequently create reference cycles.

IMVU is a mix of C++ and Python. The C++ code generally consists of small, cohesive objects with a clear ownership chain. An Avatar SceneObject owns a ModelInstance which owns a set of Meshes which own Materials which own Textures and so on… Since there are no cycles in this object graph, reference-counting with shared_ptr suffices.

The Python code, however, is full of messy object cycles. An asynchronous operation may hold a reference to a Room, while the Room may be holding a reference to the asynchronous operation. Often two related objects will be listening for events from the other. While Python’s garbage collector will happily take care of cycles, it’s still possible to leak objects.

Imagine these scenarios:

  • a leaked or living C++ object has a strong reference to a Python object.
  • a global cache has a reference to an instance’s bound method, which implicitly contains a reference to the instance (a short sketch follows this list).
  • two objects with __del__ methods participate in a cycle with each other, and Python refuses to decide which should destruct first.
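For example, the bound-method case in the second bullet can be reproduced in just a few lines (the class and cache here are illustrative, not IMVU code):

class Dialog(object):
    def on_click(self):
        pass

event_cache = {} # imagine a long-lived global cache

d = Dialog()
event_cache['click'] = d.on_click # the bound method keeps a reference to d
del d # ...so the Dialog instance is never collected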

To detect these types of memory leaks, we use a LifeTimeMonitor utility:

a = SomeObject()
lm = LifeTimeMonitor(a)
del a
lm.assertDead() # succeeds

b = SomeObject()
lm = LifeTimeMonitor(b)
lm.assertDead() # raises ObjectNotDead

We use LifeTimeMonitor’s assertDead facility at key events, such as when a user closes a dialog box or 3D window. Take 3D windows as an example. Since they’re the root of an entire object subgraph, we would hate to inadvertently leak them. LifeTimeMonitor’s assertDead prevents us from introducing an object leak.

It’s good to know that an object leaked, but how can you determine why it can’t be collected?

Python’s Garbage Collection Algorithm

Let’s go over the basics of automatic garbage collection. In a garbage-collected system there are objects, and objects can reference each other. Some objects are roots: they are always considered reachable, so any object referenced by a root, directly or indirectly, cannot be collected. Example roots are the stacks of live threads and the global module list. The graph formed by objects and their references is the object graph.

In SpiderMonkey, Mozilla’s JavaScript engine, the root set is explicitly-managed. SpiderMonkey’s GC traverses the object graph from the root set. If the GC does not reach an object, that object is destroyed. If C code creates a root object but fails to add it to the root set, it risks the GC deallocating the object while it’s still in use.

In Python, however, the root set is implicit. All Python objects are ref-counted, and any that can refer to other objects — and potentially participate in an object cycle — are added to a global list upon construction. Each GC-tracked object can be queried for its referents. Python’s root set is implicit because anyone can create a root simply by incrementing an object’s refcount.

Since Python’s root set is implicit, its garbage collection algorithm differs slightly from SpiderMonkey’s. Python begins by setting GCRefs(o) to CurrentRefCount(o) for each GC-tracked PyObject o. Then it traverses all referents r of all GC-tracked PyObjects and subtracts 1 from GCRefs(r). Then, if GCRefs(o) is nonzero, o has at least one reference the collector cannot account for, and is thus a root. Python traverses the now-known root set and increments GCRefs(o) for any traversed objects. If any object o remains where GCRefs(o) == 0, that object is unreachable and thus collectible.
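To make that concrete, here is a rough pure-Python sketch of the same computation using the gc module. find_roots is an illustrative name; the real tool, shown below, is written in C++ against the CPython API, and this sketch over-counts slightly because getrefcount and the local list hold their own temporary references.

import gc
import sys

def find_roots():
    tracked = gc.get_objects()
    by_id = dict((id(o), o) for o in tracked)
    # Start from each object's current reference count...
    gcrefs = dict((id(o), sys.getrefcount(o)) for o in tracked)
    # ...then subtract every reference that comes from another tracked object.
    for o in tracked:
        for referent in gc.get_referents(o):
            if id(referent) in gcrefs:
                gcrefs[id(referent)] -= 1
    # Anything still positive is referenced from outside the tracked
    # object graph (a C extension, a thread stack, the interpreter): a root.
    return [by_id[i] for i, n in gcrefs.items() if n > 0]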

Finding a Path From the Nearest Root to the Leaked Object

Now that we know how Python’s garbage collector works, we can ask it for its set of roots by calculating GCRefs(o) for all objects o in gc.get_objects(). Then we perform a breadth-first-search from the root set to the leaked object. If the root set directly or indirectly refers to the leaked object, we return the path our search took.

Sounds simple, but there’s a catch! Imagine that the search function has signature:

PyObject* findPathToNearestRoot(PyObject* leakedObject);

leakedObject is a reference (incremented within Python’s function-call machinery itself) to the leaked object, making leakedObject a root!

To work around this, change findPathToNearestRoot so it accepts a singleton list containing a reference to the leaked object. findPathToNearestRoot can borrow that reference and clear the list, ensuring that leakedObject has no untracked references.

findPathToNearestRoot will find paths to expected Python roots like thread entry points and module objects. But, since it directly mirrors the behavior of Python’s GC, it will also find paths to leaked C references! Obviously, it can’t directly point you to the C code that leaked the reference, but the reference path should be enough of a clue to figure it out.
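A hypothetical call site looks like this (native_debug is an illustrative name for wherever the extension is exposed; the singleton-list dance is the workaround described above):

import native_debug # hypothetical extension module exposing findPathToNearestRoot

class Widget(object):
    pass

leaked = Widget()
holder = [leaked] # singleton list, per the workaround above
del leaked        # drop our own strong reference
path = native_debug.findPathToNearestRoot(holder) # clears the list
for obj in path:
    print type(obj), repr(obj)[:80]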

The Code

// Requires the Python and Boost.Python headers.
#include <Python.h>
#include <boost/python.hpp>
#include <map>
#include <set>
#include <vector>
using namespace boost::python;

template<typename ArgType>
void traverse(PyObject* o, int (*visit)(PyObject* visitee, ArgType* arg), ArgType* arg) {
    if (Py_TYPE(o)->tp_traverse) {
        Py_TYPE(o)->tp_traverse(o, (visitproc)visit, arg);
    }
}

typedef std::map<PyObject*, int> GCRefs;

static int subtractKnownReferences(PyObject* visitee, GCRefs* gcrefs) {
    if (gcrefs->count(visitee)) {
        --(*gcrefs)[visitee];
    }
    return 0;
}

typedef int Backlink; // -1 = none

typedef std::vector< std::pair<Backlink, PyObject*> > ReferenceList;
struct Referents {
    std::set<PyObject*>& seen;
    Backlink backlink;
    ReferenceList& referenceList;
};

static int addReferents(PyObject* visitee, Referents* referents) {
    if (!referents->seen.count(visitee) && PyObject_IS_GC(visitee)) {
        referents->referenceList.push_back(std::make_pair(referents->backlink, visitee));
    }
    return 0;
}

static Backlink findNextLevel(
    std::vector<PyObject*>& chain,
    const ReferenceList& roots,
    PyObject* goal,
    std::set<PyObject*>& seen
) {
    if (roots.empty()) {
        return -1;
    }

    for (size_t i = 0; i < roots.size(); ++i) {
        if (roots[i].first != -1) {
            if (goal == roots[i].second) {
                chain.push_back(goal);
                return roots[i].first;
            }
        }
        seen.insert(roots[i].second);
    }

    ReferenceList nextLevel;
    for (size_t i = 0; i < roots.size(); ++i) {
        Referents referents = {seen, static_cast<Backlink>(i), nextLevel};
        traverse(roots[i].second, &addReferents, &referents);
    }

    Backlink backlink = findNextLevel(chain, nextLevel, goal, seen);
    if (backlink == -1) {
        return -1;
    }

    chain.push_back(roots[backlink].second);
    return roots[backlink].first;
}

static std::vector<PyObject*> findReferenceChain(
    const std::vector<PyObject*>& roots,
    PyObject* goal
) {
    std::set<PyObject*> seen;
    ReferenceList unknownReferrer;
    for (size_t i = 0; i < roots.size(); ++i) {
        unknownReferrer.push_back(std::make_pair(Backlink(-1), roots[i]));
    }
    std::vector<PyObject*> rv;
    // going to return -1 no matter what: no backlink from roots
    findNextLevel(rv, unknownReferrer, goal, seen);
    return rv;
}

static object findPathToNearestRoot(const object& o) {
    if (!PyList_Check(o.ptr()) || PyList_GET_SIZE(o.ptr()) != 1) {
        PyErr_SetString(PyExc_TypeError, "findNearestRoot must take a list of length 1");
        throw_error_already_set();
    }

    // target = o.pop()
    object target(handle<>(borrowed(PyList_GET_ITEM(o.ptr(), 0))));
    if (-1 == PyList_SetSlice(o.ptr(), 0, 1, 0)) {
        throw_error_already_set();
    }

    object gc_module(handle<>(PyImport_ImportModule("gc")));
    object tracked_objects_list = gc_module.attr("get_objects")();
    // allocating the returned list may have run a GC, but tracked_objects won't be in the list

    std::vector<PyObject*> tracked_objects(len(tracked_objects_list));
    for (size_t i = 0; i < tracked_objects.size(); ++i) {
        object to = tracked_objects_list[i];
        tracked_objects[i] = to.ptr();
    }
    tracked_objects_list = object();

    GCRefs gcrefs;
    // TODO: store allocation/gc count per generation

    for (size_t i = 0; i < tracked_objects.size(); ++i) {
        gcrefs[tracked_objects[i]] = tracked_objects[i]->ob_refcnt;
    }

    for (size_t i = 0; i < tracked_objects.size(); ++i) {
        traverse(tracked_objects[i], subtractKnownReferences, &gcrefs);
    }

    // BFS time
    std::vector<PyObject*> roots;
    for (GCRefs::const_iterator i = gcrefs.begin(); i != gcrefs.end(); ++i) {
        if (i->second && i->first != target.ptr()) { // Don't count the target as a root.
            roots.push_back(i->first);
        }
    }
    std::vector<PyObject*> chain = findReferenceChain(roots, target.ptr());

    // TODO: assert that allocation/gc count per generation didn't change

    list rv;
    for (size_t i = 0; i < chain.size(); ++i) {
        rv.append(object(handle<>(borrowed(chain[i]))));
    }

    return rv;
}

IMVU Crash Reporting: Stalls and Deadlocks

By mid-2006, we’d primarily focused on access violations and unhandled exceptions, the explosive application failures. After extensive effort, we got our client’s crash rate down to 2% or so, where 2% of all sessions ended in a crash.* Still the customers cried “Fix the crashes!”

It turns out that when a customer says “crash” they mean “it stopped doing what I wanted”, but engineers hear “the program threw an exception or caused an access violation”. Thus, to the customer, crash can mean:

  • the application was unresponsive for a period of time
  • the UI failed to load, making the client unusable
  • the application has been disconnected from the server

In short, any time the customer cannot make progress and it’s not (perceived to be) their fault, the application has crashed.

OK, we’ve got our work cut out for us… Let’s start by considering deadlocks and stalls.

First, some terminology: in computer science, a deadlock is a situation where two threads or processes are waiting for each other, so neither makes progress. That definition is a bit academic for our purposes. Let’s redefine deadlock as any situation where the program becomes unresponsive for an unreasonable length of time. This definition includes livelock, slow operations without progress indication, and network (or disk!) I/O that blocks the program from responding to input.

It actually doesn’t matter whether the program will eventually respond to input. People get impatient quickly. You’ve only got a few seconds to respond to the customer’s commands.

Detecting Deadlocks in C++

The embedded programming world has a “watchdog timer” concept. Your program is responsible for periodically pinging the watchdog, and if for several seconds you don’t, the watchdog restarts your program and reports debugging information.

Implementing this in C++ is straightforward:

  • Start a watchdog thread that wakes up every few seconds to check that the program is still responding to events.
  • Add a heartbeat to your main event loop that frequently pings the watchdog.
  • If the watchdog timer detects the program is unresponsive, record stack traces and log files, then report the failure.

IMVU’s CallStack API allows us to grab the C++ call stack of an arbitrary thread, so, if the main thread is unresponsive, we report its current stack every couple of seconds. This is often all that’s needed to find and fix the deadlock.

Detecting Deadlocks in Python

In Python, we can take the same approach as above:

  1. Start a watchdog thread.
  2. Ping the Python watchdog thread in your main loop.
  3. If the watchdog detects that you’re unresponsive, record the main thread’s Python stack (this time with sys._current_frames) and report it, as sketched below.
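Here is a rough sketch of such a watchdog, assuming a single main loop that calls ping() on every heartbeat. The class and names are illustrative, not IMVU's actual implementation.

import sys
import threading
import time
import traceback

class Watchdog(object):
    def __init__(self, main_thread_id, timeout=5.0, report=None):
        self.main_thread_id = main_thread_id
        self.timeout = timeout
        self.report = report or (lambda stack: sys.stderr.write(stack))
        self.last_ping = time.time()
        t = threading.Thread(target=self._watch)
        t.daemon = True
        t.start()

    def ping(self):
        # Called from the main event loop's heartbeat.
        self.last_ping = time.time()

    def _watch(self):
        while True:
            time.sleep(self.timeout / 2.0)
            if time.time() - self.last_ping > self.timeout:
                # The main thread looks stuck: capture its Python stack.
                frame = sys._current_frames().get(self.main_thread_id)
                if frame is not None:
                    self.report(''.join(traceback.format_stack(frame)))

The main loop would construct Watchdog(threading.current_thread().ident) once at startup and call ping() on every iteration.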

Python’s global interpreter lock (GIL) can throw a wrench in this plan. If one thread enters an infinite loop while keeping the GIL held (say, in a native extension), the watchdog thread will never wake and so cannot report a deadlock. In practice, this isn’t a problem, because the C++ deadlock detector will notice and report a deadlock. Plus, most common deadlocks are caused by calls that release the GIL: threading.Lock.acquire, and so on.

It might help to think of the Python deadlock detector as a fallback that, if successful, adds richer information to your deadlock reports. If it failed, whatever. The C++ deadlock detector is probably enough to diagnose and fix the problem.

What did we learn?

It turned out the IMVU client had several bugs where we blocked the main thread on the network, sometimes for up to 30 seconds. By that point, most users just clicked the close box [X] and terminated the process. Oops.

In addition, the deadlock detectors pointed out places where we were doing too much work in between message pumps. For example, loading some assets into the 3D scene might nominally take 200ms. On a computer with 256 MB of RAM, though, the system might start thrashing, loading the same assets would take 5s, and the operation would report as a “deadlock”. The solution was to reduce the program’s working set and bite off smaller chunks of work in between pumps.

I don’t recall seeing many “computer science” deadlocks, but these watchdogs were invaluable in tracking down important failure conditions in the IMVU client.

Next time, we’ll improve the accuracy of our crash metrics and answer the question “How do you know your metrics are valid?”

* Median session length is a more useful reliability metric. It’s possible to fix crashes and see no change in your percentage of failed sessions, if fixing crashes simply causes sessions to become longer.

Visualizing Python Import Dependencies

In a large Python program such as IMVU, startup time is dominated by Python module imports. Take these warm timings:

$ time python -c 'None'

real    0m0.096s
user    0m0.077s
sys     0m0.061s

$ time python -c 'import urllib2'

real    0m0.314s
user    0m0.155s
sys     0m0.186s

That’s 300ms for a single basic dependency. Importing the entire IMVU client takes 1.5s warm and 20s cold on a typical user’s machine.


The IMVU client’s loading progress bar imports modules bottom-up; that is, leaf modules are imported before their parents. The root module is imported last.

Implementing a bottom-up import sequence requires generating a graph of dependencies between modules:

import sys
import types
import __main__

def get_dependencies(module_name):
    """
    Takes a module name as input (e.g. 'xml.dom') and returns a set of
    (lhs, rhs) tuples where lhs and rhs are module names and lhs
    imports rhs.
    """
    # module_from_key is a dict from a module key, an arbitrary
    # object, to a module object.  While importing, we discover
    # dependencies before we have access to the actual module objects.
    # import_dependencies is a list of (lhs, rhs) tuples where lhs and
    # rhs are module keys, and module_from_key[lhs] imported
    # module_from_key[rhs].

    root_key = object()
    module_from_key = {root_key: __main__}
    import_dependencies = []
    stack = [root_key]

    def import_in_stack(key, name, globals, locals, fromlist, level):
        # Track who is importing whom by keeping a stack of module keys.
        stack.append(key)
        try:
            return original_import(name, globals, locals, fromlist, level)
        finally:
            stack.pop()

    import __builtin__
    original_import = __builtin__.__import__

    def my_import(name, globals=globals(), locals=locals(), fromlist=[], level=-1):
        # fromlist is a whore.  Most of the complexity in this
        # function stems from fromlist's semantics.
        # If a module imports 'xml.dom', then the module depends on
        # both 'xml' and 'xml.dom' modules.
        dotted = name.split('.')
        for i in range(1, len(dotted)):
            my_import('.'.join(dotted[0:i]), globals, locals, [], level)

        module_key = object()
        parent_key = stack[-1]

        def add_dependency_from_parent(key, m):
            module_from_key[key] = m
            import_dependencies.append((parent_key, key))

        submodule = import_in_stack(module_key, name, globals, locals, ['__name__'], level)
        add_dependency_from_parent(module_key, submodule)

        for f in (fromlist or []):
            from_key = object()
            module = import_in_stack(from_key, name, globals, locals, [f], level)
            if f == '*':
                continue
            submodule = getattr(module, f, None)
            if isinstance(submodule, types.ModuleType):
                add_dependency_from_parent(from_key, submodule)

        return original_import(name, globals, locals, fromlist, level)

    # Import module_name with the import hook enabled.
    original_modules = sys.modules.copy()
    __builtin__.__import__ = my_import
    try:
        my_import(module_name)
    finally:
        __builtin__.__import__ = original_import
        # Restore sys.modules so the instrumented import does not
        # leave the interpreter in a surprising state.
        sys.modules.clear()
        sys.modules.update(original_modules)

    assert stack == [root_key], stack

    return sorted(set(
        (module_from_key[lhs].__name__, module_from_key[rhs].__name__)
        for lhs, rhs in import_dependencies
    ))
(You can view all of the code at SourceForge).

First, we install an __import__ hook that discovers import dependencies between modules, and convert those relationships into a directed graph:
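For instance, one straightforward way to get that directed graph into a viewable form is to dump the edge list as Graphviz dot (write_dot is an illustrative helper, not part of the original tool):

def write_dot(edges, out):
    # edges is the set of (importer, imported) name pairs from get_dependencies.
    out.write('digraph imports {\n')
    for lhs, rhs in edges:
        out.write('    "%s" -> "%s";\n' % (lhs, rhs))
    out.write('}\n')

with open('imports.dot', 'w') as f:
    write_dot(get_dependencies('xml.dom.minidom'), f)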


Then, we merge cycles. If module A imports B, B imports C, and C imports A, then it doesn’t matter which you import first. Importing A is equivalent to importing B or C. After this step, we have a DAG:

[Figure: xml.dom.minidom DAG]

Finally, we can postorder traverse the DAG to determine a good import sequence and cost (approximated as the number of modules in the cycle) per import; a sketch of the traversal appears after the listing:

1 xml
3 xml.dom
1 copy_reg
1 types
1 copy
1 xml.dom.NodeFilter
1 xml.dom.xmlbuilder
1 xml.dom.minidom
1 __main__
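Here is a rough sketch of that postorder traversal. The DAG below is a hand-written approximation of the xml.dom.minidom graph (the real tool builds it from get_dependencies and merges cycles first), so the ordering of independent subtrees may differ from the listing above:

def postorder(dag, root):
    # dag maps each node (a module or a merged cycle) to the set of
    # nodes it depends on.  Dependencies are appended before dependents,
    # which is exactly the bottom-up import order.
    seen = set()
    order = []
    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for dep in sorted(dag.get(node, ())):
            visit(dep)
        order.append(node)
    visit(root)
    return order

example_dag = {
    '__main__': set(['xml.dom.minidom']),
    'xml.dom.minidom': set(['xml.dom', 'copy', 'xml.dom.NodeFilter', 'xml.dom.xmlbuilder']),
    'xml.dom': set(['xml']),
    'copy': set(['copy_reg', 'types']),
}
print postorder(example_dag, '__main__')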

Now let’s look at some less-trivial examples. urllib2:






Final notes: Other people have solved this problem with bytecode scanning, but we wanted to know the actual modules imported for an accurate progress bar. A simpler __import__ hook could have calculated the correct import sequence, but I find a visual representation of module dependencies to have additional benefits.

Reporting Crashes in IMVU: Part II: C++ Exceptions

A year ago, I explained how the IMVU client automatically reports unexpected Python exceptions (crashes) to us. I intended that post to be the first of a long series that covered all of the tricks we use to detect and report abnormal situations. Clearly, my intentions have not played out yet, so I am going to pick up that series by describing how we catch exceptions that occur in our C++ code. Without further ado:

Reporting C++ Exceptions

As discussed earlier, IMVU’s error handling system can handle any Python exception that bubbles out of the client’s main loop and automatically report the failure back to us so that we can fix it for the next release. However, our application is a combination of Python and C++, so what happens if our C++ code has a bug and raises an uncaught C++ exception, such as std::bad_alloc or std::out_of_range?

Most of our C++ code is exposed to Python via the excellent Boost.Python library, which automatically catches C++ exceptions at the boundary and translates them to Python exceptions. The translation layer looks something like this:

bool handle_errors(function fn) {
    try {
        fn();
        return false; // no error
    } catch (const std::runtime_error& e) {
        // raise RuntimeError into Python
        PyErr_SetString(PyExc_RuntimeError, e.what());
    } catch (const std::bad_alloc&) {
        // raise MemoryError into Python
        PyErr_SetString(PyExc_MemoryError, "out of memory");
    } catch (const std::exception& e) {
        // raise Exception into Python
        PyErr_SetString(PyExc_Exception, e.what());
    } catch (...) {
        PyErr_SetString(PyExc_Exception, "Unknown C++ exception");
    }
    return true; // an error was set
}

Thus, any C++ exception that’s thrown by the C++ function is caught by Boost.Python and reraised as the appropriate Python exception, which will already be handled by the previously-discussed crash reporting system.

Let’s take another look at the client’s main loop:

def mainLoop():
    while running:
        pumpWindowsMessages()
        updateAnimations()
        redrawWindows()

def main():
    try:
        mainLoop()
    except:
        # includes exception type, exception value, and python stack trace
        error_information = sys.exc_info()
        if OK == askUserForPermission():
            submitError(error_information) # post the report (name illustrative)

If the C++ functions called from updateAnimations() or redrawWindows() raise a C++ exception, it will be caught by the Python error-handling code and reported to us the same way Python exceptions are.

Great! But is this a complete solution to the problem? Exercise for the reader: what else could go wrong here? (Hint: we use Visual Studio 2005, and there was a bug in catch (…) that Microsoft fixed in Visual Studio 2008…)

Finding the shortest path between two objects in Python

I’ve been doing a bit of memory profiling and debugging in the IMVU client lately, and we’re starting to collect a nice set of tools. Today, I’m adding a function that can find the shortest path from one Python object to another. No guarantees about performance or correctness, but it seems to work so far.

The findShortestPath function.

Its tests.

Update: Fixed some ordering bugs in longer paths and optimized it a tad.

Reporting Crashes in IMVU: Catching Python Exceptions

For years now, I have been meaning to write a series of articles on the automated crash reporting system in the IMVU client. This first article will give a bit of background on the structure of the client and show how we handle Python exceptions.

At IMVU, we generally subscribe to the Fail Fast philosophy of handling errors: when the client encounters an unexpected error, we immediately crash the program and ask the user to submit a crash report. As part of the crash report, we send log files, stack traces, system information, and anything else that might help us debug the failure.

You might wonder why we crash the program whenever anything goes wrong rather than trying to catch the error and continue running. Counterintuitively, crashing the program forces us to act on crashes and immediately exposes bugs that might otherwise cause unwanted behavior or data loss down the road.

Now let’s talk a little bit about how the client is structured. The IMVU client is written primarily in Python, with time-critical components such as the 3D renderer written in C++. Since the client is a cross between a normal interactive Windows program and a real-time game, the main loop looks something like this:

def main():
    while running:
        pumpWindowsMessages() # for 1/30th of a second
        updateAnimations()
        redrawWindows()

This structure assumes that no exceptions bubble into or out of the main loop. Let’s imagine that updateAnimations() has a bug and occasionally raises an uncaught exception. If the client were run with a standard command-line python invocation, the program would print the exception and stack trace to the console window and exit. That’s all great, but our users don’t launch our client by invoking python from the command line: we use py2exe to build a standalone executable that users ultimately run. With an unmodified py2exe application, uncaught exceptions are printed to sys.stderr (as above), except there is no console window to display the error. Thus, the py2exe bootstrap code registers a handler so that errors are logged to a file, and when the program shuts down, a dialog box shows something like “An error has been logged. Please see IMVUClient.exe.log.”

From a crash reporting standpoint, this is not good enough. We can’t be asking our users to manually hunt down some log files on their hard drives and mail them to us. It’s just too much work – they will simply stop using our product. (Unfortunately, most of the software out there asks users to do exactly this!) We need a way for the client to automatically handle errors and prompt the users to submit the reports back to us. So let’s rejigger main() a bit:

def mainLoop():
    while running:
        pumpWindowsMessages()
        updateAnimations()
        redrawWindows()

def main():
    try:
        mainLoop()
    except:
        error_information = sys.exc_info()
        if OK == askUserForPermission():
            submitError(error_information) # post the report (name illustrative)

This time, if a bug in updateAnimations() raises an exception, the top-level try: except: clause catches the error and handles it intelligently. In our current implementation, we post the error report to a Bugzilla instance, where we have built custom tools to analyze and prioritize the failures in the field.

This is the main gist of how the IMVU client automatically reports failures. The next post in this series will cover automatic detection of errors in our C++ libraries.

Open sourced our pstats viewer!

IMVU has benefited greatly from open source software. Large components of our client and website are built on top of various open source projects. In fact, you can see the list of software on our technology page.

Well, I’m happy to announce that we’re beginning to contribute source code back! We have created a project on SourceForge and released our first standalone tool: pstats_viewer is a tool for browsing the data stored in a .pstats file generated by the Python profiler.

Tasks and Futures and Concurrency in the IMVU Client

The IMVU client has a platform for developing asynchronous or long-running operations that we call the task system. The following describes why we built it. (Chris finally convinced me to post this information to the web.)

Most programs that we write can be split into two categories. There are short-lived programs like gzip that take command-line arguments and files as inputs and write their output to other files or stdout.

There are also longer-lived programs that run indefinitely, handling events as they arrive. Think of MySQL or memcached: they take inputs from the network, time, and files, do some work, and spit output back onto the network or the file system.

The operating system kernel is one of those event-driven, long-lived programs. It schedules the execution of subsidiary short- and long-lived programs.

We can certainly map from one type to another. Take PHP, for example. It runs within the context of a long-running server process, but the programming model for a single page is a lot closer to gzip’s. The program takes inputs (session data, request variables) and outputs HTML.

You can also think of a web service as a set of computers all running a bunch of short-lived processes that combine into one large program. PHP is the CPU, memcached is memory, the database is the file system, etc.

Desktop applications, including the IMVU client and other “rich internet clients”, are structured a lot like servers: they are long-lived programs that respond to events from the keyboard, mouse, timers, the windowing system, and network requests, and interact with their users via graphics, sound, files, and the network.

The key notion here is that all of these are abstractions on top of an enormous state machine: your processor and memory.

Now let’s talk a little bit about concepts and modeling. When you come into a code base, say, our client, you expect to see structures that look and behave like concepts you know: avatars, chat windows, buddy list, inventory, products from the catalog, etc. This is one reason OOP is so powerful – it’s an effective method for describing concepts and the way they interact.

Now let’s talk about a simpler concept. One so ingrained you probably don’t even think about it. Functions (in the imperative sense). Functions are an abstraction on top of the state machine that is your processor. We’re all very familiar with functions: “This operation takes X and Y as inputs, does steps A, B, and C, and returns Z.” Even if you throw in a few branches and loops and call some other functions, we’re good at following the logic. For example:

def _readNumber(self):
    isfloat = False
    result = self._next()
    peek = self._peek()
    while peek is not None and (peek.isdigit() or peek == "."):
        isfloat = isfloat or peek == "."
        result = result + self._next()
        peek = self._peek()
    try:
        if isfloat:
            return float(result)
        else:
            return int(result)
    except ValueError:
        raise ReadException, "Not a valid JSON number: '%s'" % result

This is a fairly elaborate function, but we have no problem following it. Still, it’s an abstraction on top of the processor and memory: it’s compiled by Python into bytecode, executed by the interpreter loop, which is written in C, which is compiled to x86 assembly, which is being converted by the processor into micro-ops… The point here is that we can build our own abstractions if they make us more effective at doing our jobs.

There is a concept in the IMVU client that may be unfamiliar to some of you: long-running processes. These are like functions, except that many can be running at the same time, and they can be blocked, waiting for results from the network or disk. Or perhaps just waiting for some time to elapse. Pseudocode for one of these processes might look like this:

def loadAvatar(chatId, userId):
    avatarInfo = getAvatarInfo(userId)
    if failure, show dialog box and stop loading
    products = getCurrentOutfit(chatId, userId)
    for p in products:
        loadProduct(p) # may hit the network or disk
    loadAvatarIntoScene(products) # this might take a while

In fact, these processes can be fairly easily implemented on top of threads, and many people do it that way, to varying degrees of success.

However, threads have several disadvantages:

  • Each thread (on CPython/Windows) requires 1 MB of memory for the stack (plus any kernel objects), limiting a 32-bit process to roughly 2000 threads.
  • Because threads are scheduled by the operating system, they’re prone to nondeterminism and race conditions. In our case, threads are a large source of intermittent test failures and real bugs.
  • Threads in CPython don’t actually buy you any concurrency because of the global interpreter lock, so you might as well just run everything on the main thread.
  • Threads are GC roots and must be explicitly cleaned up.
  • You have to be extremely careful when interacting with the windowing system or any third-party system from threads. Many APIs are not thread safe.

Another way to manage these types of long-running operations is by building an explicit state machine and periodically pumping it; on each pump, the machine checks and updates its state if it can. This is generally how interactive games work, because they can get away with using up all of the CPU and having a fixed frame rate. Much of the code in the IMVU client uses this model, but we’ve found it 1) hard to understand the interactions between states and 2) inefficient. Most of the time you’re _not_ updating the state of a particular object, so you don’t need to check every single frame. For example, some of our most avid users have thousands of items in their inventory, and our old inventory UI walks that entire list 30 times per second, asking “does this one have any events to process? does this one have any events to process? does this one have any events to process?” This takes up to 10% of the CPU on the user’s computer without a 3D window open! Finally, this approach generally makes the application feel sluggish: the latency between an event (say, a mouse click) reaching the client and being processed depends on the rate at which the application is pumped, rather than on the client handling messages immediately as they come in.

A third technique is what you might call continuation-passing style, or “onComplete callbacks”. Basically, any function that can be asynchronous takes an onComplete callback function that it runs (with the result or error) when the operation is completed. The problem with this style is that it hearkens back to the days of goto, with spaghetti callback loops and non-linear control flow. Moreover, if you accidentally forget to call an onComplete, or you call it twice, you’ve just introduced a very hard-to-find bug. We have had both of these types of bugs multiple times in our client.

We did not find any of the above approaches particularly compelling, so we built the task system on top of Python’s bidirectional generators. First, I’ll enumerate the advantages:

  • Tasks are a lightweight resource (sorta) – you can start as many as you’d like.
  • Tasks look like normal functions with ‘yield’ statements sprinkled in.
  • There’s no chance you’ll forget to call onComplete or call it twice.
  • Exceptions propagate automatically — with stack traces!
  • Tasks can be implicitly cancelled by releasing the reference to their result, so you won’t leak them as is so easy with threads.
  • All tasks run interleaved on the main thread, so you don’t have to worry about most kinds of race conditions or calling into external libraries.
  • Tasks run as part of the Windows event loop, which means that popping up modal dialog boxes or modal menus does not block them from executing, keeping the application responsive.

Here’s what the hypothetical example from above would look like as a task:

def loadAvatar(chatId, userId):
    try:
        avatarInfo = yield getAvatarInfo(userId)
    except Exception:
        # show a dialog box and stop loading, as in the pseudocode above
        showLoadErrorDialog()
        return

    products = yield getCurrentOutfit(chatId, userId)
    for p in products:
        try:
            yield loadProduct(p)
        except Exception:
            pass # ignore products that fail to load?
    yield loadAvatarIntoScene(products)

At every ‘yield’, other tasks get a chance to run, and if that task does not return immediately, then the task is paused until the asynchronous call is done. getAvatarInfo, for example, will contact the server in order to fetch the avatar name. Tasks can yield on other task calls, suspending their execution until the subsidiary call returns a value or raises an exception. (Sound familiar?)

Tasks can start other tasks, with the admittedly strange “yield Start(task)” syntax which gives back a Future object. Think of futures as values that may not have materialized yet. You can later wait for the result of a future by yielding on it.

def printFuture(future):
    # this will throw an exception if the future
    # contains an error
    result = yield future
    print result

def printAvatarInfo(userId):
    avatarInfo = yield Start(getAvatarInfo(userId))
    print 'started fetching avatarInfo'
    yield printFuture(avatarInfo)

You can use futures to start several tasks in parallel and then wait on all of them, making efficient use of parallel resources. Some examples with real code might help. Unfortunately you can’t run them, as we have not (yet) open sourced the task system.
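As a rough illustration of that pattern, reusing the Start()/yield conventions from the examples above (getAvatarInfo is the same hypothetical call):

def loadAllAvatarInfo(userIds):
    # Kick off every fetch before waiting on any of them...
    futures = []
    for userId in userIds:
        future = yield Start(getAvatarInfo(userId))
        futures.append(future)
    # ...then wait for the results; the requests overlap in time.
    for future in futures:
        avatarInfo = yield future
        print avatarInfo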

Today, much of the client’s code uses threads, polling, and onCompletes, but we’re slowly converting things to tasks, usually to fix bugs. I hope this means that the quality of the client from a user’s perspective will continue to improve and engineers will be able to work in the code more easily.