The Story of Haskell at IMVU

An interesting blog post and its corresponding comment thread on Hacker News made me realize that, while I’d told the story of Haskell at IMVU to many people, I’d never written it down. :)

IMVU’s backend had been written in PHP from the beginning. Startup code is always gross, but after years of learning PHP’s ins and outs, we bent it to our will. Part of me looks back and says “It wasn’t that bad”, and indeed, it has a lot of great properties. But then I remember the horror.

The terrible straight-line performance, the lack of concurrent IO, the insane memory usage… Once the codebase reached about a million lines, the latency on our REST service response times was over half a second and in the worst case several seconds. For some types of services, that’s okay, but IMVU strives to be a real-time, delightful experience, and service latency on interactive things like dressing up, browsing the catalog, sending messages to your friends, and shopping is really important.

Some of the performance-minded of us tried to get PHP to be fast, but it was basically intractable, and we couldn’t afford to fork PHP with a high-performance JIT like Facebook ended up doing years later.

The combination of PHP’s awful performance, terrifying pitfalls, and the fact that millions of lines of code in a dynamic language (even with copious test coverage) makes it really hard to refactor, led us to evaluate alternatives. Someone suggested Java (which would have been totally fine in hindsight) but the infrastructure team decided to adopt this newfangled Node.js thing. Node sucked. This predated ES6, promises, and the rich ecosystem of npm, so we spent far too long rebuilding basic, basic, infrastructure (like propagating stack traces across async operations) in Node that we already had in PHP, and it still wasn’t as good. In the end we decided to drop Node and stick with PHP.

Fast forward a year or two…

There were some people at IMVU who had dabbled with Haskell, but it always seemed too idealistic and not very practical. In fact, one weekend, Andy went home with the explicit goal of proving that Haskell was stupid and not good for writing production software.

He came back on Monday glowing. “Haskell is perfect for web services! We should use it! It’s fast and safe and concurrent!” We smiled and nodded (sometimes Andy gets enthusiastic about new things) and went on with our daily work. But in the evenings and weekends he plugged away, and built a prototype web service that was 1) dramatically faster than the corresponding PHP services 2) actually quite short in implementation and 3) surprisingly clear. Huh, maybe this Haskell thing did have legs.

OK, so let’s consider the possibility that Haskell could be something IMVU could adopt. How we would vet it?

We started by, during the next hack week, forming a small team of everyday engineers to learn Haskell and work on a feature. We learned two things: teaching Haskell was a nontrivial problem. BUT, Haskell newcomers hacking on the codebase only caused one regression, and that regression actually exposed a bug in some of our infrastructure. Well that’s pretty cool. For years, IMVU had a tradition of throwing new engineers into the deep end, after which they’d invariably take down the site, and then we’d demonstrate that we’re egoless and all failures are an opportunity to improve our process with a postmortem. What if Haskell could help us avoid some of these problems?

Over time we slowly ramped up Haskell within the company. At first we had growing pains. The infrastructure in our PHP stack was extremely mature and well-baked. The corresponding infrastructure in our Haskell (error reporting, DB access, unit testing) was not. It took us a while to sort out exactly what our Haskell best practices would be, and by that point a few people started to get a negative vibe about Haskell. But we kept pushing forward.

(By the way, I keep saying “we”, but at this point I hadn’t written any Haskell yet, and was not particularly for or against it, even though I did think PHP was terrible.)

By this point we’d shipped at least two sets of mobile services on Haskell, and many people considered it a success. However… Tensions about Haskell had really started to heat up, with some people saying things like “I never want to even look at Haskell again” or “at this stage in my career I don’t want another programming language” or “I think we’re making a horrible mistake by building another stack, especially in Haskell.” People were complaining to their managers, and the managers were coming to me. Some people loved Haskell, some people hated it, but very few were in the middle.

As a neutral third party, and as someone with a lot of respect in the organization, the engineering managers sent me out to gather ground truth. I took my little black notebook and a pen and interviewed something like half the engineering team on their opinions on Haskell, whether they were fans or not. I spent an hour with each person, asking what they liked, what they didn’t like, what they thought we should do, etc.

I then collated all the feedback and presented back to the engineering managers and leadership team (of which I was a part too, but since I had no direct reports and had no experience with Haskell at IMVU, I was fairly unbiased). I don’t have my report handy, but it went approximately like this:

75% of the people I interviewed were positive about or fine with Haskell.

25% of the people had major concerns. The concerns, in roughly decreasing priority, were:

  • Haskell strings suck
    • Specifically there’s a lot of cognitive overhead in switching between the five common string representations as needed
  • Syntactic concerns
    • Haskell’s syntax is iconoclastic: parens, $, /= instead of !=
    • some people in the Haskell community love point-free style
    • cognitive overhead in switching between pure and monadic style
    • let vs. where
  • concerns about a second backend stack
    • duplicate infrastructure across PHP and Haskell
  • concerns about it not being a mainstream language
    • teaching the whole organization
    • interaction between Haskell experts and novices sometimes leaves the novices feeling dumb or inadequate
    • bus factor to organization on key infrastructure
    • Haskell is less debuggable by ops engineers

The wins were:

  • GHC’s support for safe concurrent programming is world-class
  • Sound type safety led to refactoring being a breeze
  • Fun, stretches the mind (for some people)

Ultimately, we settled on a middle ground. We would not deprecate PHP, but we would continue to support Haskell as a technology stack. We had specific services in mind that needed high throughput and low latency (<100 ms) and we knew that PHP could not achieve those. That said, we respected the fact that several engineers, even those in leadership roles, put their foot down and said they didn’t want to touch Haskell. (From my perspective it was kind of a weird experience. Highly-skilled people that I respect just got really angry and irrational and basically refused to even try to learn Haskell, which resulted in a culture divide in the company.)

In the end, I never did manage to convince some of my peers to try it. That said, there were many engineers who loved Haskell and we built a great deal of critical infrastructure in it. The Haskell infrastructure basically hums like a well-oiled machine. Perfectly reliable unit tests. Concurrent batched IO. Abstractions for things like timeouts and running tasks on other threads while correctly bubbling failures. Fantastic support for safely encoding and decoding JSON. One of the best benchmarking tools I’ve ever used.

Since one of the biggest concerns about Haskell was education, I taught a sequence of four or five classes for the engineering team with my take on how it should be taught. Traditional Haskell materials tend to talk about laziness and currying and all this FP junk that is certainly interesting and different, but mostly irrelevant to people trying to come in hot and get the job done. Instead, I focused on specifics first, and then generalizations.

“You know how you can add two integers and get a new integer back? Well, you can also add two floating point numbers. Int and Float are instances of Num.”

“You know how you can concatenate strings? And concatenating with the empty string gives you the same value? It turns out many things satisfy these properties. They’re called Monoids.”

“You know how you can map across a list? We do it all the time in PHP or Python. Well you can also map across Maybe. Think of Maybe as a list that can have either zero or one element. Things that can be mapped are called Functors.”

etc. etc.

We started with specific examples of how to do stuff with Haskell, then talked about type classes, and the final optional class was about how Haskell is compiled down to the machine. You’d have to ask the attendees how much they remember, but the classes were well-reviewed at the time. :)

In summary, I definitely wouldn’t say that Haskell was a failure. GHC is an amazing piece of technology, and IMVU built infrastructure in it that wouldn’t have been possible in PHP. I also can’t say Haskell was a resounding success. I was truly disturbed by the culture drama that it brought. And, frankly, saddened and confused by how some people closed their minds. Personally, I would have happily traded a lot of Haskell’s safety for avoiding drama. I sometimes wonder what would have happened had we chosen Java instead of Node.js back in 2010. We could have hit our performance goals in Java too.

Oh well. Many of us still love Haskell. It’s not going away. I left IMVU last year (needed some more experience elsewhere) but I personally miss Haskell a great deal. If I were starting a new project with people that I trusted, I’d definitely reach for Haskell first. The expressiveness, brevity, and safety let you move extremely quickly, and Haskell is definitely mature enough for production use.

Dropbox Hack Week: GraphQL Server in Haskell

Last week was hack week at Dropbox. I took the opportunity to explore the implementation of a GraphQL server that does optimal IO batching and concurrency.

Serial IO

A common performance problem in the implementation of web services is serial IO. Let’s say you have a service that returns the names of all of your friends. It’s easy and natural to implement it like this:

me = fetch_user_info(my_user_id)
friend_names = []
for friend_id in me.friends:
return friend_names

The problem is that each user info lookup for my friends occurs sequentially. Even with TCP connection pooling, that’s still a packet round-trip in the datacenter. “But that should be faster than 10 ms right?” Even if it is, it doesn’t take many friends to blow your performance budget and send your response time across the threshold of perceptible delay.

Moreover, this problem compounds on itself. If your inner functions have serial IO and you call them in a loop, you’ve just added potentially thousands of round-trips to your backend data sources. I would hazard a guess and say serial IO is the largest contributor to service response latencies.

Manually Batched IO

One solution is to always batch IO. Every function takes a list and returns a list. This can be made to work (indeed, I’ve achieved excellent performance by carefully, manually, batching) but doesn’t compose as your services scale. Sometimes it’s just too hard to know all of the data you will need to fetch, and the dependencies between that data.

OK, so serial IO is bad. More on that in a bit. Let’s talk about REST now.


At IMVU, we went all-in on REST, building a rather substantial framework to make it easy to create REST services. We anticipated the fact that REST services tend to require many round-trips from clients, so we built a response denormalization system, where the service can anticipate the other services a client will need to hit and include their responses too.

This sounded great at first, and in fact was mostly an improvement from the prior state of affairs. But, at scale, REST has some challenging performance problems. For one, REST services have a defined schema, and they always return the data they advertise. As I mentioned, if a service doesn’t return all the data a client needs, the client needs to hit more services, increasing the number of client-to-server round-trips. In addition, because the service always returns the same set of data, it must query the backend database to fetch more data than the client even cares about.

This phenomenon tends to happen most frequently with core domain objects like users. Because users are so important, they accumulate relationships with so many pieces of data (e.g. list of subscribed experiments, contact lists, shopping carts, etc.), almost all of which is irrelevant to most clients.

Why GraphQL?

This is where GraphQL comes in. In GraphQL, the client specifies the data that it wants. There is only one request, even if the data is deeply nested, and the server only has to fetch exactly what’s needed.

Consider the query:

query HeroNameQuery {
    newhope_hero: hero(episode: NEWHOPE) {
    empire_hero: hero(episode: EMPIRE) {
    jedi_hero: hero(episode: JEDI) {

It looks up the hero of each of the first three Star Wars movies, fetches any information it needs from the backend, and returns only what is requested:

"data": {
    "HeroNameQuery": {
        "jedi_hero": {
            "name": "R2-D2"
        "empire_hero": {
            "name": "Luke Skywalker"
        "newhope_hero": {
            "name": "R2-D2"

There are GraphQL implementations for many languages but many of them don’t solve the serial IO problem I described to start this post. In fact, a naive GraphQL implementation might issue IO per field of each object requested.

For hack week, I wanted to explore the design of a GraphQL server that issued all of its backend IO in optimally concurrent batches.

Why Haskell?

Dropbox doesn’t use Haskell, but I find it to be a great language for exploring design spaces, particularly around execution models. Also, Facebook open sourced their excellent Haxl library which converts code written with serial IO into efficient batched requests. Haxl provides an effect type that, when it can understand that two data fetches are independent, runs them both in parallel. When all Haxl operations are blocked on backend data fetches, only then does it issue the backends. My prototype GraphQL resolvers are surprisingly naive, specified with sequential code. Haxl automatically batches up the requests and hands them to the DataSource for execution.

In addition, there is nothing clever about the GraphQL request handler or graph traversal and filtering — all cleverness is handled by Haxl.

At this point, you might be thinking “So was anything challenging about this project?” On the data fetch side, no, not really. :) However, I did run into one unexpected snag when using Haxl for data updates: because Haxl tries really hard — and in a general, composable way — to run your IO in parallel, you must be careful about how writes and reads are sequenced together. If you leave out the () <- on line 105, Haskell sequences the operations with >> instead of >>=, and Haxl’s >> uses the Applicative bind operation instead of the Monad bind operation, and thus it assumes it can run them in parallel. And, as you might expect, issuing a write and read to the same data concurrently doesn’t end well. :)


I am very thankful for jdnavarro’s excellent GraphQL query parser. With it, in four days, I was able to get a prototype GraphQL server up and running. Using Haxl and Hedis, I have it hooked up to a Redis data source, and it correctly batches all independent IO reads and runs them concurrently.

The Star Wars hero names query above results in two batched backend requests:

fetch star wars batch of size 3:
fetch star wars batch of size 2:

You can even see that it noticed that R2-D2 is the hero of both movies, and only requested its info once.

The performance is pretty good: on some AWS instances, I measured about 3500 queries per second per machine and a query latency averaging 3 ms. Of course, that could worsen as permissions checks and so on are implemented. On the other hand, the code is completely unoptimized, full of lazy data structures and an unoptimized parser without a parser cache.

The prototype code is open sourced on the Dropbox GitHub.

It’s probably possible to build something like Haxl in Python with generators, but you’d have to give up standard loops and list comprehensions, instead using some kind of parallel map operation. You also would not benefit from anything that teases concurrency out of imperative functions like GHC’s upcoming ApplicativeDo extension. There are some things that Haskell’s restricted effects are uniquely good at. :)

I’d guess it would probably be even trickier to do a good implementation in Go given that the programmer has less control over goroutines than Python’s generators and Monads in Haskell. That said, perhaps someone will discover a clean, optimal implementation. We should pay attention to efforts like this.

I think GraphQL will be a huge deal for high-performance web services. It’s harder to implement than REST or traditional RPC services, but the reduction in response latencies could be substantial. Naive implementations aren’t going to have good performance, but if you are interested in aiming straight at the finish line, Haskell and Haxl will give you a head start.


Several years ago, when we started training IMVU’s client team on modern web technologies, an engineer suggested we give CoffeeScript a chance. The purported benefit was that CoffeeScript helps you stay within "the good parts" of JavaScript by providing concise syntax for lambdas, object literals, destructuring binds, object iteration, and other idioms.

However, after a month or two of the team using it, we all agreed to switch back to JavaScript.

You see, the problem with CoffeeScript is that its benefits are shallow. CoffeeScript’s motto is "It’s just JavaScript", even though it has the opportunity to be so much more.

Programming languages have several aspects. The syntax of a language affects a person’s ability to visually parse its structure, the amount of typing required, and a language’s overall visual style. Syntax has a large effect on a language’s overall pleasantness, so people tend to give it a great deal of attention. Also, syntax is the first impression people have of a language. I’ve heard people say "JavaScript is in the C family" even though almost the only thing the two languages have in common is curly braces.

The semantics of a language, on the other hand, provide upper bounds on the size of programs that can be written with it. Language semantics apply constraints, and thus guarantees, making it easier and safer to work on larger programs. Examples follow:

  • this value is always an integer
  • this program does not write to memory it doesn’t own
  • this function has the same result no matter how many times it is called (idempotence)
  • this function has no observable side effects (purity)
  • memory will be automatically reclaimed when it is no longer reachable
  • the completion callback for this asynchronous operation is called exactly once
  • MySQL queries cannot be issued in the middle of a Redis transaction
  • this object always contains properties x, y, and z
  • if you have an object of type X, it will never be null, and it will always be fully-constructed
  • if an object is constructed on the stack, its destructor will run before the current function returns

Different languages provide different sets of guarantees.

Both syntax and semantics are important. Syntax solves for pleasantness and human factors, but strong semantics are what help people safely write larger and more complicated programs.

To be especially clear, I am not saying syntax is unimportant. As an example, consider C++11 lambda functions. They are "merely" syntax sugar for code structures possible in C++98. However, because they’re so convenient, they fundamentally change the economics of writing in certain styles. Thus syntax provides a vital role in guiding programmers down good paths.

The problem with CoffeeScript, and ultimately the reason why IMVU dropped it, is that, while it looks nice at first glance, it doesn’t substantially improve the semantics of JavaScript. Missing properties still return undefined. It’s still possible to trip over this in lambda functions. There are a few wins: it’s no longer easy to accidentally write to a global variable (like you might in JavaScript by forgetting var), and CoffeeScript’s for own makes it easy to iterate over an object’s properties without walking the prototype chain. But those wins can be easily obtained in JavaScript through tools such as jshint.

At IMVU, we decided CoffeeScript’s syntactic brevity was not worth the cost of having a translation step for our code. That translation step adds a bit of latency between editing a file and reloading it in your browser, and it adds a degree of mental overhead for programmers, especially those not deeply familiar with the web. That is, not only must a programmer know JavaScript semantics, but they must also know how CoffeeScript translates into JavaScript.

And, sometimes, that translation is not at all obvious. Consider the following examples. I encourage you to try to guess what they do and then paste the code into

The following snippets all call fn with two arguments:

fn a, b
fn a,
fn x:y, b

However, if the first argument is not an object literal, then it’s a syntax error:


How many parameters do you think are passed to fn? What are their values?

fn (

There are too many ways to express branches. I hypothesize that having so many different control flow idioms makes rapid scanning of flow through a function harder, with no meaningful benefit.

if x then return
unless x then return
return if x
return unless x
for x in ls then foo(x)
foo(x) for x in ls
foo(x) while i++ < 10
foo(x) until i++ > 10

How do you think the following snippets parse?



t for t in ls if t > 1
t for t in ls when t > 1

o = (t for t in ls when t > 1)
o = (for t in ls then)

foo ? 1 : 2 # legal
foo ? [] : 2 # not legal

fn a, debugger, b
a = debugger
fn a debugger # not legal?? is debugger an expression or not?

e = try f

a = if b then

a = [f for f in x]
a = (f for f in x)

I’ve literally been told the best way to cope is to install an editor plugin that makes it easy to compile bits of code so I can verify that the compiled JavaScript matches the behavior I intended.

The following code is legal! And produces a rather strange result.

a = (for b in x then)

Yet not everything is an expression.

a = (return) # illegal

CoffeeScript has semi-open and closed intervals. Guess which is which?


Real CoffeeScript Hazards

Now that I’m done making fun of the syntax, in all seriousness, there are actually a couple real dangers in CoffeeScript too.

Because CoffeeScript has no syntax for explicitly introducing a variable binding, the assignment to a name introduces a variable binding in that scope. If an existing variable is defined in an outer scope, the inner assignment will reuse the outer variable binding. This means that changes to outer scopes can change the meaning of code inner scopes.

Consider the following code (and in case it’s not clear, do simply calls its argument function with no arguments):

a = 'hello world'
do ->
  for i in [0...10]
    print i
print a

It prints the values 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, "hello world".

Now rename the outer variable a to i.

i = 'hello world'
do ->
  for i in [0...10]
    print i
print i

We didn’t change the inner scope at all. But now the inner loop does not create a new variable binding — instead it reuses the outer variable i. This code now prints 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 9.

Another risk that comes up in CoffeeScript comes from the fact that the last expression in a function is the return value. Consider the following function:

processStrings = ->
  for longString in dataSource
    process longString

Let’s assume processStrings is intended to have no return value. Let’s also assume that the process function has a large return value that we intend to discard.

processStrings will actually return a list containing the return value of process for each iteration through the loop. This is not a correctness hazard so much as a performance hazard, but it’s come up several times in the codebase I’ve been working on. To fix processStrings, it should be written as follows:

processStrings = ->
  for longString in dataSource
    process longString

This situation is why Haskell distinguishes between forM and forM_.

CoffeeScript Has Benefits Too

I’ve just listed a lot of unfortunate things about CoffeeScript, but it does provide some rather significant benefits. In fact, I think that a team that deeply understands JavaScript will enjoy CoffeeScript simply because it’s so concise.

Arrow functions and CoffeeScript’s braceless object literals do make it easy to visually scan through large chunks of code. Traditional JavaScript is littered with curly braces and parentheses – sometimes JavaScript feels as noisy as a LISP.

Classes nicely encode the pattern that everyone expresses manually with prototypes in JavaScript. Someone recently made the comment "I’ve been writing CoffeeScript for years and I’ve never needed to understand JavaScript prototypes". I can believe that, especially if you’re not interacting with JavaScript libraries that make direct use of prototypes.

Destructuring binds are a huge deal. They dramatically simplify the keyword arguments pattern. Consider the following imvujs JavaScript module:

  foo: 'foo.js',
  bar: 'bar.js',
}, function(imports) {
  var foo =;
  var bar =;

The corresponding CoffeeScript is much shorter and easier to read:

  foo: 'foo.js'
  bar: 'bar.js'
, ({foo, bar}) ->

Fortunately, many of the niceties of CoffeeScript are making it into ES6 and TypeScript, so CoffeeScript’s value proposition in the future is not very high.

So what are you saying, Chad?

My opinions tend to be nuanced. I am not the kind of person to come out guns blazing and say "X sucks!" or "Y is the best thing ever!"

Do I think CoffeeScript is a big win over JavaScript? Nope.

Would I start a new project in CoffeeScript today? Nope.

Would I be happy working in an existing CoffeeScript codebase? Sure.

Do I think CoffeeScript has some advantages over JavaScript? Sure.

Do I think CoffeeScript has much of a future given that ES6 and TypeScript have arrow functions, classes, and destructuring binds? Nope.

Do I think it’s possible to have a language with the syntax niceties of CoffeeScript and real semantic benefits like coroutines and static types? Absolutely, and I’m very much looking forward to the new few years of AltJS innovation.

I think Jeremy Ashkenas deserves a lot of credit for exploring this programming language space. It was a worthwhile experiment, and it validated the usefulness of several features that made it into ES6.


My opinions on this topic are not new. Many others have come to the same conclusion. For example:

The Long-Term Problem With Dynamically Typed Languages

This may be the only time I weigh in on the static vs. dynamic typing discussion. Each side has its extreme proponents, and people differ in their ability and desire to work in systems with implicit invariants.

Many years ago, back when Java and C++ were the Mainstream Languages and Python was the shiny new up-and-comer, I read Bruce Eckel’s arguments in support of dynamically typed languages, and some of the nonobvious (at the time) ways you can get more done at higher quality in a more flexible language.

If I recall correctly, the arguments were that type systems can be approximated with unit tests (neither subsumes the other), and the ease of getting code up and running in a dynamically-typed language outweighs the benefits static types provide, especially when (explicit) static types require more thought and commitment before the problem space has been developed and understood. That is, dynamic languages are more fluid, and you can test bits of the program even before they’re made to fit with the rest of the code. In statically typed languages, on the other hand, the code must compile before it can be run at all.

Note: I feel a temptation to qualify every statement I’m making, because the programming language space has since been explored in more depth, and significantly more nuance is understood now than was in the late 90s and early 2000s. We are in the midst of a programming language renaissance.

I have some empathy for Bruce Eckel’s argument. Early on in a system’s life it is important to have some flexibility and "play". However, as the correct shape or factoring of the software is discovered, it becomes useful to have the computer enforce more and more invariants.

Note that some people treat static typing vs. dynamic typing as a binary switch. In reality it’s a bit more of a continuum. Python will tend to throw exceptions rather than implicitly convert values. JavaScript will implicitly convert values sometimes, but not always. PHP tries to make whatever code you wrote run, even if it’s nonsensical. Thus, among dynamically-typed languages, some languages tend to catch type errors sooner, simply because they’re less forgiving at runtime.

IMVU’s technology stack is weighted heavily towards dynamic languages. This is largely a product of the year of its birth. Everyone was writing PHP and Python in the mid 2000s. The website backend consists of a prodigious amount of PHP. The website frontend is JavaScript, because that’s really the only option. The client is built out of C++ (for the engine), Python (for all the business logic), and JavaScript (for the UI).

After ten years, however, I can look back and speak with confidence about the long-term effects of building a business on top of dynamic languages.

Most of the core PHP APIs from IMVU’s early years are still around. And they’re terrible. They’re terrible but they’re "impossible" to change, because the iteration time on running all the tests, verifying in production, and so on is never worth the smallish, indirect benefits of improving the APIs. Obviously, nothing is impossible, but relying on a giant test suite and test infrastructure to prove the correctness of renaming a function or adding a parameter, in practice, is a significant coefficient of friction on the software’s ability to evolve over time.

Not improving core APIs results in a kind of broken windows effect. When APIs are confusing or vague, people tend not to notice other confusing or vague APIs, and it slows everyone down in the long term.

Thus, instead of easily refactoring the legacy APIs, people think "I’ll just make a new one and migrate the code over!" And now you have two hard-to-change APIs. And then three. And the cycle continues. Additionally, this cycle is fed by architect types who know or think they know a better way to do things, but can’t be bothered to update the old systems.

With a type system (such as in C++, Go, Haskell, or Java), the iteration time on renaming a function or changing its signature is a single build. In Haskell, it’s even better: simply type “:r” into ghci and see all the places where the code needs to be updated. That is, a type system flattens the cost of change curve. Small API or performance improvements that otherwise wouldn’t be worth it suddenly are, because the compiler can quickly tell you all the places that need to be updated. You don’t need to run all the tests across the history of the company in order to see whether you’ve missed a spot.

There’s another huge cost of dynamic languages. It has to do with engineers building an understanding of existing systems. For all the mental energy spent on whitespace and curly braces, the most important component of understanding software is grasping data flow. Programs exist to transform data, and understanding how that’s done is paramount. Types accelerate the process of building a mental understanding of the program, especially when lightweight types such as CustomerId (vs. Int) are used.

I can’t say I regret IMVU building its technology on top of PHP or Python or JavaScript but I now believe that the long-term reduction in agility is not worth the short-term benefits, especially given the plethora of other options: Java, Go, Haskell, perhaps F# or some other ML, and so on.

I wonder how Facebook feels on this topic. :)

Announcing BufferBuilder: Encode JSON in Haskell 2-5x faster than Aeson


This tale is a little long and detailed, so for those of you who want immediate payoff…

Andy and I wrote a Haskell library called Data.BufferBuilder (including Data.BufferBuilder.Utf8 and Data.BufferBuilder.Json) which makes it easy to safely and efficiently encode Haskell data structures into JSON. In our benchmarks, using Data.BufferBuilder.Json to encode JSON is 4-5x as fast as using Aeson.

Even if you’re already using Aeson, you can benefit somewhat from BufferBuilder’s improved encoding performance. The buffer-builder-aeson package adds a ToJson instance for Aeson’s Value type, which our benchmarks show is 50% to 100% faster than Aeson’s built-in encoder. All you need to do is call Data.BufferBuilder.Json.encodeJson instead of Data.Aeson.encode!

Why did we build BufferBuilder?

Some of IMVU’s backend services are written in Haskell. While Haskell is incredible for many use cases, we ran into an unexpected bottleneck: JSON encoding. Our service response structure produces quite a lot of JSON, and much of that JSON contains URLs encoded into JSON strings.

Amazingly, URL and JSON encoding was showing up as a significant cost center when generating JSON responses. Some services spent over a second encoding JSON!

When faced with a performance problem, my first instinct is to either pound the problem into the ground or unmake the problem so the code isn’t necessary in the first place.

So let’s look at the problem holistically:

  • A JSON response is represented in memory as a collection of varied, in-memory data structures. The response happens to contain many URLs — sometimes more than a hundred.
  • URLs are represented by a data structure consisting of the protocol, hostname, path segments, query string, and so on.
  • Each URL becomes a JSON string.
  • Each in-memory data structure is converted to a JSON object whose properties depend on the type on the corresponding data structure.

Using Aeson to encode all of this results in the following steps:

  • ToJSON instances convert Haskell data types to an AST of Aeson Values.
  • The keys of an Aeson object are Text values. In memory, Text is encoded in UTF-16. Thus, URLs must be translated from their in-memory representation (ours is ASCII) into UTF-16 before they fit into the Aeson AST.
  • Then, the entity bodies are converted into JSON objects, where the keys are known at compile-time, but must be encoded as UTF-16.
  • The entire AST is encoded into a Lazy Text Builder.
  • Then the Lazy Text is encoded into a Lazy ByteString containing UTF-8.
  • Then, to generate the response body, the Lazy ByteString is concatenated into a single strict ByteString.

That’s a lot of steps! To summarize:

(URLs and Custom Data Types) -> Aeson AST -> Lazy Text -> Lazy UTF-8 ByteString -> Strict ByteString

The Design of BufferBuilder

This is where Andy and I sat down to create an API to let us cleanly express JSON encoding without sacrificing type safety OR performance.

We know that a fast, if not the fastest, way to build up a buffer of bytes is to allocate a chunk of memory, stream writes to it, and either chunk or realloc() as needed. Obviously, this kind of code can be trivially expressed in C:

void buffer_append(buffer* b, const char* s, size_t length) {
    if (!b->has_room(length)) {
    memcpy(b->data + b->size, s, length);

Because I’d been told that bouncing between Haskell and C with the foreign function interface can be slow, my first approach was to attempt to build a Haskell monad that grabbed the RealWorld token out of IO (IO a is basically a newtype around RealWorld -> (RealWorld, a)), augmented it with some extra “local variables” like the output ptr, capacity, and current size, and manually implemented allocation and memory writes with GHC.Prim APIs. GHC did not like this at all. The generated code ran 20 times slower than naive usage of Data.ByteString.Builder. Nonetheless, it was an interesting technique, so maybe I’ll write about it another time.

Surely it was possible to do better. So I tried the foreign function interface after all.

I wrote a tiny C API that allocated the underlying growable buffer. It provided APIs for appending bytes, buffers, UTF-16-to-UTF-8 transcoding, and so on. These FFI calls can only happen within IO actions, but building buffers is fundamentally a pure operation, and the provided Haskell API should be effectively pure. The solution is to offer a restricted state monad like ST which limits the operations within the monad to safe buffer-building operations.

This approach was by the fastest of any that I tried. In fact, in a sequence of appendBS operations, if the arguments are known-strict ByteStrings, GHC will compile the appendBS sequence directly into a series of cheap C function calls. For example, the following code:

data BSTriple = BSTriple !ByteString !ByteString !ByteString

writeBSTriple :: BSTriple -> BufferBuilder ()
writeBSTriple !(BSTriple a b c) = do
    appendBS a
    appendBS b
    appendBS c

compiles into something like:

movq %rbx,%rdi
movq 120(%rsp),%rax
movq %rax,%rsi
movq 96(%rsp),%rax
movq %rax,%rdx
movq 112(%rsp),%rax
addq %rax,%rdx
subq $8,%rsp
movl $0,%eax
call bw_append_bs
addq $8,%rsp
movq %rbx,%rdi
movq 152(%rsp),%rax
movq %rax,%rsi
movq 128(%rsp),%rax
movq %rax,%rdx
movq 144(%rsp),%rax
addq %rax,%rdx
subq $8,%rsp
movl $0,%eax
call bw_append_bs
addq $8,%rsp
movq %rbx,%rdi
movq 216(%rsp),%rax
movq %rax,%rsi
movq 160(%rsp),%rax
movq %rax,%rdx
movq 208(%rsp),%rax
addq %rax,%rdx
subq $8,%rsp
movl $0,%eax
call bw_append_bs
addq $8,%rsp

Obviously GHC’s code generator has some room to improve, but the main thrust is exactly right. An equivalent C++ API would generate much the same kind of code, modulo the poor instruction and register selection, which overall doesn’t matter too much here.

In almost all situations, Haskell isn’t going to be as fast as straightforward C, but with some work and appropriate use of FFI, it’s possible to come close.


Once we had an API to safely and efficiently build up buffers of bytes, we wanted to build safe APIs on top for constructing valid UTF-8 buffers and valid JSON.

Utf8Builder is a newtype around BufferBuilder with a restricted API. If you only call safe functions in Data.BufferBuilder.Utf8, the result is guaranteed to be valid UTF-8. Unsafe functions are provided for when you know precisely what you’re doing.


Data.BufferBuilder.Json is built on top of Data.BufferBuilder.Utf8. Data.BufferBuilder.Json’s Value type is a newtype around Utf8Builder, meaning there’s no Aeson-style AST. Each Value simply knows how to write itself into an output buffer. Just like how the safe Utf8Builder functions guarantee the output is legal UTF-8, the safe JsonBuilder functions guarantee (almost) that the output is a legal JSON document. (There are a couple caveats, see the documentation for details.)

I suspect Data.BufferBuilder.Json is approaching the limit of how fast a JSON encoder can be. And thanks to the beauty of Haskell, it’s convenient and safe!

If you’re using Aeson and encoding performance matters to you, give BufferBuilder a shot!

High Order Bits (or: mostly correct but in the right direction)

First, the moral: Unit tests are good. But reliable design is better.

Even if you have to deal with short-term pain. Even if you haven’t figured out all of the edge cases.

Let me back up. I love automated tests. I’ve been test-driving code at IMVU since I started. We buy new engineers a copy of Test-Driven Development: By Example. Whenever there is a bug, we write tests to make sure it never happens again.

After years of working this way, seeing projects succeed and fail, I’d like to refine my perspective. Let me share a story.

IMVU was originally a bolt-on addition to AOL Instant Messenger. Two IMVU clients communicated with each other by manipulating AOL IM’s UI and scanning the window for new text messages, much like a screen reader would. This architecture propagated some implications through our entire codebase:

1) The messaging layer was inherently unreliable. AOL IM chat windows could be manipulated by the user or other programs. Thus, our chat protocol was built around eventual consistency.

2) We could not depend on an authoritative source of truth. Since text-over-IM is peer-to-peer, no client has a true view into where all of the avatars are sitting or who currently owns the room.

Thus, in 2008, long after we’d dropped support for integration with third-party IM clients and replaced it with an authoritative state database, we continued to have severe state consistency bugs. The client’s architecture still pretended like the chat protocol was unreliable and state was peer-to-peer instead of authoritative.

To address these bugs, we wrote copious test coverage. These were deep tests: start up a local Apache and MySQL instance, connect a couple ClientApp Python processes to them, have one invite another to chat, and assert that their scene graphs were consistent. We have dozens of these tests, for all kinds of edge cases. And we thought we’d fixed the bugs for good…

But the bugs returned. These tests are still running and passing, mind you, but differences in timing and sequencing result in the same state consistency issues we saw in 2008. It’s clear that test coverage is not sufficient to prevent these types of bugs.

So what’s the other ingredient in reliable software? I argue that, in agile software development, correct-by-design systems are underemphasized.

Doesn’t Test Driven Development guide me to build correct-by-design systems?

TDD prescribes a “red, green, refactor” rhythm, where you write a failing test, do the simplest work to make it pass, and then refactor the code so it’s high quality. TDD helps you reach the “I haven’t seen it fail” stage, by verifying that yes, your code can pass these tests. But just because you’ve written some tests doesn’t mean your code will always work.

So there’s another level of reliability: “I have considered the ways it can fail, but I can’t think of any.” This statement is stronger, assuming you’re sufficiently imaginative. Even still, you won’t think of everything, especially if you’re working at the edge of your human capacity (as you should be).

Nonetheless, thoughtfulness is better than nothing. I recommend adding a fourth step to your TDD rhythm: “Red, green, refactor, what else could go wrong?” In that fourth step, you deeply examine the code and think of additional tests to write.

The strongest level of software correctness is not about finding possible failure conditions; it’s about proving that your system works in the presence of all inputs. Correctness proofs for non-trivial algorithms are too challenging for all of the code we write, but in a critical subsystem like chat state management, the time spent on a lightweight proof will easily pay for itself. Again, I’m not advocating that we always prove the correctness of our software, but we should at least generally be convinced of its correctness and investigate facts that indicate otherwise. TDD by itself is not enough.

OK, so we can’t easily test-drive or refactor our way out of the chat system mess we got ourselves into, because it’s simply too flawed, so what can we do? The solution is especially tricky, because in situations like this, there are always features that depend on subtleties of the poor design. A rewrite would break those features, which is unacceptable, right? Even if breaking those features is acceptable to the company, there are political challenges. Imagine the look on your product owner’s face when you announce “Hey I have a new architecture that will break your feature but provide no customer benefit yet.”

The ancient saying “You can’t make an omelette without breaking some eggs” applies directly here. Preserving 100% feature compatibility is less important than fixing deep flaws.

Why? High-order bits are hardest to change, but in the end, are all that matters. The low-order bits are easy to change, and any competent organization will fix the small things over time.

I can’t help but recall the original iPhone. Everyone said “What?! No copy and paste?!” Indeed, the iPhone couldn’t copy and paste until 18 months and two major OS releases later. Even still, the iPhone reshaped the mobile industry. Clearly 100% feature compatibility is not a requirement for success.

My attitude towards unit testing has changed. While I write, run, and love unit testing, I value correct-by-design subsystems even more. When it comes down to it, tests are low-order bits compared to code that just doesn’t break.

For those curious, what are we doing about the chat system? I’ll let Jon’s GDC presentation speak for itself.

Tracing Leaks in Python: Find the Nearest Root

Garbage Collection Doesn’t Mean You Can Ignore Memory Altogether…

This post is available on the IMVU Engineering Blog.

Garbage collection removes a great deal of burden from programming. In fact, garbage collection is a critical language feature for all languages where abstractions such as functional closures or coroutines are common, as they frequently create reference cycles.

IMVU is a mix of C++ and Python. The C++ code generally consists of small, cohesive objects with a clear ownership chain. An Avatar SceneObject owns a ModelInstance which owns a set of Meshes which own Materials which own Textures and so on… Since there are no cycles in this object graph, reference-counting with shared_ptr suffices.

The Python code, however, is full of messy object cycles. An asynchronous operation may hold a reference to a Room, while the Room may be holding a reference to the asynchronous operation. Often two related objects will be listening for events from the other. While Python’s garbage collector will happily take care of cycles, it’s still possible to leak objects.

Imagine these scenarios:

  • a leaked or living C++ object has a strong reference to a Python object.
  • a global cache has a reference to an instance’s bound method, which implicitly contains a reference to the instance.
  • two objects with __del__ methods participate in a cycle with each other, and Python refuses to decide which should destruct first

To detect these types of memory leaks, we use a LifeTimeMonitor utility:

a = SomeObject()
lm = LifeTimeMonitor(a)
del a
lm.assertDead() # succeeds

b = SomeObject()
lm = LifeTimeMonitor(b)
lm.assertDead() # raises ObjectNotDead

We use LifeTimeMonitor’s assertDead facility at key events, such as when a user closes a dialog box or 3D window. Take 3D windows as an example. Since they’re the root of an entire object subgraph, we would hate to inadvertently leak them. LifeTimeMonitor’s assertDead prevents us from introducing an object leak.

It’s good to know that an object leaked, but how can you determine why it can’t be collected?

Python’s Garbage Collection Algorithm

Let’s go over the basics of automatic garbage collection. In a garbage-collected system there are objects and objects can reference each other. Some objects are roots; that is, if an object is referenced by a root, it cannot be collected. Example roots are the stacks of live threads and the global module list. The graph formed by objects and their references is the object graph.

In SpiderMonkey, Mozilla’s JavaScript engine, the root set is explicitly-managed. SpiderMonkey’s GC traverses the object graph from the root set. If the GC does not reach an object, that object is destroyed. If C code creates a root object but fails to add it to the root set, it risks the GC deallocating the object while it’s still in use.

In Python however, the root set is implicit. All Python objects are ref-counted, and any that can refer to other objects — and potentially participate in an object cycle — are added to a global list upon construction. Each GC-tracked object can be queried for its referents. Python’s root set is implicit because anyone can create a root simply by incrementing an object’s refcount.

Since Python’s root set is implicit, its garbage collection algorithm differs slightly from SpiderMonkey’s. Python begins by setting GCRefs(o) to CurrentRefCount(o) for each GC-tracked PyObject o. Then it traverses all referents r of all GC-tracked PyObjects and subtracts 1 from GCRefs(r). Then, if GCRefs(o) is nonzero, o is an unknown reference, and thus a root. Python traverses the now-known root set and increments GCRefs(o) for any traversed objects. If any object o remains where GCRefs(o) == 0, that object is unreachable and thus collectible.

Finding a Path From the Nearest Root to the Leaked Object

Now that we know how Python’s garbage collector works, we can ask it for its set of roots by calculating GCRefs(o) for all objects o in gc.get_objects(). Then we perform a breadth-first-search from the root set to the leaked object. If the root set directly or indirectly refers to the leaked object, we return the path our search took.

Sounds simple, but there’s a catch! Imagine that the search function has signature:

PyObject* findPathToNearestRoot(PyObject* leakedObject);

leakedObject is a reference (incremented within Python’s function-call machinery itself) to the leaked object, making leakedObject a root!

To work around this, change findPathToNearestRoot so it accepts a singleton list containing a reference to the leaked object. findPathToNearestRoot can borrow that reference and clear the list, ensuring that leakedObject has no untracked references.

findPathToNearestRoot will find paths to expected Python roots like thread entry points and module objects. But, since it directly mirrors the behavior of Python’s GC, it will also find paths to leaked C references! Obviously, it can’t directly point you to the C code that leaked the reference, but the reference path should be enough of a clue to figure it out.

The Code

template<typename ArgType>
void traverse(PyObject* o, int (*visit)(PyObject* visitee, ArgType* arg), ArgType* arg) {
    if (Py_TYPE(o)->tp_traverse) {
        Py_TYPE(o)->tp_traverse(o, (visitproc)visit, arg);

typedef std::map<PyObject*, int> GCRefs;

static int subtractKnownReferences(PyObject* visitee, GCRefs* gcrefs) {
    if (gcrefs->count(visitee)) {
    return 0;

typedef int Backlink; // -1 = none

typedef std::vector< std::pair<Backlink, PyObject*> > ReferenceList;
struct Referents {
    std::set<PyObject*>& seen;
    Backlink backlink;
    ReferenceList& referenceList;

static int addReferents(PyObject* visitee, Referents* referents) {
    if (!referents->seen.count(visitee) && PyObject_IS_GC(visitee)) {
        referents->referenceList.push_back(std::make_pair(referents->backlink, visitee));
    return 0;

static Backlink findNextLevel(
    std::vector<PyObject*>& chain,
    const ReferenceList& roots,
    PyObject* goal,
    std::set<PyObject*>& seen
) {
    if (roots.empty()) {
        return -1;

    for (size_t i = 0; i < roots.size(); ++i) {
        if (roots[i].first != -1) {
            if (goal == roots[i].second) {
                return roots[i].first;

    ReferenceList nextLevel;
    for (size_t i = 0; i < roots.size(); ++i) {
        Referents referents = {seen, i, nextLevel};
        traverse(roots[i].second, &addReferents, &referents);

    Backlink backlink = findNextLevel(chain, nextLevel, goal, seen);
    if (backlink == -1) {
        return -1;

    return roots[backlink].first;

static std::vector<PyObject*> findReferenceChain(
    const std::vector<PyObject*>& roots,
    PyObject* goal
) {
    std::set<PyObject*> seen;
    ReferenceList unknownReferrer;
    for (size_t i = 0; i < roots.size(); ++i) {
        unknownReferrer.push_back(std::make_pair<Backlink>(-1, roots[i]));
    std::vector<PyObject*> rv;
    // going to return -1 no matter what: no backlink from roots
    findNextLevel(rv, unknownReferrer, goal, seen);
    return rv;

static object findPathToNearestRoot(const object& o) {
    if (!PyList_Check(o.ptr()) || PyList_GET_SIZE(o.ptr()) != 1) {
        PyErr_SetString(PyExc_TypeError, "findNearestRoot must take a list of length 1");

    // target = o.pop()
    object target(handle<>(borrowed(PyList_GET_ITEM(o.ptr(), 0))));
    if (-1 == PyList_SetSlice(o.ptr(), 0, 1, 0)) {

    object gc_module(handle<>(PyImport_ImportModule("gc")));
    object tracked_objects_list = gc_module.attr("get_objects")();
    // allocating the returned list may have run a GC, but tracked_objects won't be in the list

    std::vector<PyObject*> tracked_objects(len(tracked_objects_list));
    for (size_t i = 0; i < tracked_objects.size(); ++i) {
        object to = tracked_objects_list[i];
        tracked_objects[i] = to.ptr();
    tracked_objects_list = object();

    GCRefs gcrefs;
    // TODO: store allocation/gc count per generation

    for (size_t i = 0; i < tracked_objects.size(); ++i) {
        gcrefs[tracked_objects[i]] = tracked_objects[i]->ob_refcnt;

    for (size_t i = 0; i < tracked_objects.size(); ++i) {
        traverse(tracked_objects[i], subtractKnownReferences, &gcrefs);

    // BFS time
    std::vector<PyObject*> roots;
    for (GCRefs::const_iterator i = gcrefs.begin(); i != gcrefs.end(); ++i) {
        if (i->second && i->first != target.ptr()) { // Don't count the target as a root.
    std::vector<PyObject*> chain = findReferenceChain(roots, target.ptr());

    // TODO: assert that allocation/gc count per generation didn't change

    list rv;
    for (size_t i = 0; i < chain.size(); ++i) {

    return rv;

How to Write an Interactive, 60 Hz Desktop Application

This post is available on the IMVU Engineering Blog.

IMVU’s client application doesn’t fit neatly into a single development paradigm:

  • IMVU is a Windows desktop application. Mouse clicks, window resizes, and dialog boxes must all respond with imperceptible latency. Running IMVU should not significantly affect laptop battery life.
  • IMVU is an interactive 3D game. The 3D scene must be simulated and drawn at smooth, interactive frame rates, 60 Hz if possible.
  • IMVU is a networked application. Sending and receiving network packets must happen quickly and the UI should never have to wait for I/O.

Thus, let us clarify some specific requirements:

  • Minimal CPU usage (and thus battery consumption) when the application is minimized or obscured.
  • Minimal CPU usage in low-complexity scenes. Unlike most games, IMVU must never unnecessarily consume battery life while waiting in spin loops.
  • Animation must continue while modal dialog boxes and menus are visible. You don’t have control over these modal event loops, but it looks terrible if animation pauses while menus and dialogs are visible.
  • Animation must be accurate and precise. It looks much better if every frame takes 22 milliseconds (45 Hz) than if some frames take 30 milliseconds and some take 15 milliseconds (averaging 45 Hz).
  • Animation must degrade gracefully. In a really complex room with a dozen avatars, IMVU can easily spend all of a core’s CPU trying to animate the scene. In this case, the frame rate should gradually drop while the application remains responsive to mouse clicks and other input events.
  • Support for Windows XP, Vista, and 7.

Naive Approach #1

Windows applications typically have a main loop that looks something like:

MSG msg;
while (GetMessage(&msg, 0, 0, 0) > 0) {

What went wrong

Using SetTimer/WM_TIMER sounds like a good idea for simulation and painting, but it’s way too imprecise for interactive applications.

Naive Approach #2

Games typically have a main loop that looks something like the following:

while (running) {
    // process input events
    MSG msg;
    while (PeekMessage(&msg, 0, 0, 0, PM_REMOVE)) {

    if (frame_interval_has_elapsed) {

What went wrong

The above loop never sleeps, draining the user’s battery and burning her legs.

Clever Approach #1: Standard Event Loop + timeSetEvent

void runMainLoop() {
    MSG msg;
    while (GetMessage(&msg, 0, 0, 0) > 0) {

void customWindowProc(...) {
    if (message == timerMessage) {
        // schedules paint with InvalidateRect

    if (0 == InterlockedExchange(&inFlight, 1)) {
        PostMessage(frameTimerWindow, timerMessage, 0, 0);

void startFrameTimer() {
    RegisterClass(customWindowProc, ...);
    frameTimerWindow = CreateWindow(...);
    timeSetEvent(FRAME_INTERVAL, 0, &TimerProc, 0, TIME_PERIODIC);

What went wrong

The main loop’s GetMessage call always returns messages in a priority order. Slightly oversimplified, posted messages come first, then WM_PAINT messages, then WM_TIMER. Since timerMessage is a normal message, it will preempt any scheduled paints. This would be fine for us, since simulations are cheap, but the dealbreaker is that if we fail to maintain frame rate, WM_TIMER messages are entirely starved. This violates our graceful degradation requirement. When frame rate begins to degrade, code dependent on WM_TIMER shouldn’t stop entirely.

Even worse, the modal dialog loop has a freaky historic detail. It waits for the message queue to be empty before displaying modal dialogs. When painting can’t keep up, modal dialogs simply don’t appear.

We tried a bunch of variations, setting flags when stepping or painting, but they all had critical flaws. Some continued to starve timers and dialog boxes and some degraded by ping-ponging between 30 Hz and 15 Hz, which looked terrible.

Clever Approach #2: PostThreadMessage + WM_ENTERIDLE

A standard message loop didn’t seem to be getting us anywhere, so we changed our timeSetEvent callback to PostThreadMessage a custom message to the main loop, who knew how to handle it. Messages sent via PostThreadMessage don’t go to a window, so the event loop needs to process them directly. Since DialogBox and TrackPopupMenu modal loops won’t understand this custom message, we will fall back on a different mechanism.

DialogBox and TrackPopupMenu send WM_ENTERIDLE to their owning windows. Any window in IMVU that can host a dialog box or popup menu handles WM_ENTERIDLE by notifying a global idle handler, which can decide to schedule a new frame immediately or in N milliseconds, depending on how much time has elapsed.

What Went Wrong

So close! In our testing under realistic workloads, timeSetEvent had horrible pauses and jitter. Sometimes the multimedia thread would go 250 ms between notifications. Otherwise, the custom event loop + WM_ENTERIDLE approach seemed sound. I tried timeSetEvent with several flags, but they all had accuracy and precision problems.

What Finally Worked

Finally, we settled on MsgWaitForMultipleObjects with a calculated timeout.

Assuming the existence of a FrameTimeoutCalculator object which returns the number of milliseconds until the next frame:

int runApp() {
    FrameTimeoutCalculator ftc;

    for (;;) {
        const DWORD timeout = ftc.getTimeout();
        DWORD result = (timeout
            ? MsgWaitForMultipleObjects(0, 0, TRUE, timeout, QS_ALLEVENTS)
            : WAIT_TIMEOUT);
        if (result == WAIT_TIMEOUT) {

        MSG msg;
        while (PeekMessage(&msg, 0, 0, 0, PM_REMOVE)) {
            if (msg.message == WM_QUIT) {
                return msg.wParam;


Well, what about modal dialogs?

Since we rely on a custom message loop to animate 3D scenes, how do we handle standard message loops such as the modal DialogBox and TrackPopupMenu calls? Fortunately, DialogBox and TrackPopupMenu provide us with the hook required to implement frame updates: WM_ENTERIDLE.

When the standard DialogBox and TrackPopupMenu modal message loops go idle, they send their parent window a WM_ENTERIDLE message. Upon receiving WM_ENTERIDLE, the parent window determines whether it’s time to render a new frame. If so, we animate all visible 3D windows, which will trigger a WM_PAINT, which triggers a subsequent WM_ENTERIDLE.

On the other hand, if it’s not time to render a new frame, we call timeSetEvent with TIME_ONESHOT to schedule a frame update in the future.

As we saw previously, timeSetEvent isn’t as reliable as a custom loop using MsgWaitForMultipleObjectsEx, but if a modal dialog or popup menu is visible, the user probably isn’t paying very close attention anyway. All that matters is that the UI remains responsive and animation continues while modal loops are open. Code follows:

LRESULT CALLBACK ModalFrameSchedulerWndProc(HWND hwnd, UINT message, WPARAM wparam, LPARAM lparam) {
    if (message == idleMessage) {
    return DefWindowProc(hwnd, message, wparam, lparam);

struct AlmostMSG {
    HWND hwnd;
    UINT message;
    WPARAM wparam;
    LPARAM lparam;

    AlmostMSG* msg = reinterpret_cast<AlmostMSG*>(user_data);
    PostMessage(msg->hwnd, msg->message, msg->wparam, msg->lparam);
    delete msg;

void PostMessageIn(DWORD timeout, HWND hwnd, UINT message, WPARAM wparam, LPARAM lparam) {
    if (timeout) {
        AlmostMSG* msg = new AlmostMSG;
        msg->hwnd = hwnd;
        msg->message = message;
        msg->wparam = wparam;
        msg->lparam = lparam;
        timeSetEvent(timeout, 1, timeForPost, reinterpret_cast<DWORD_PTR>(msg), TIME_ONESHOT | TIME_CALLBACK_FUNCTION);
    } else {
        PostMessage(hwnd, message, wparam, lparam);

class ModalFrameScheduler : public IFrameListener {
    ModalFrameScheduler() { stepping = false; }

    // Call when WM_ENTERIDLE is received.
    void onIdle() {
        if (!frameListenerWindow) {
            idleMessage = RegisterWindowMessageW(L"IMVU_ScheduleFrame");

            WNDCLASS wc;
            ZeroMemory(&wc, sizeof(wc));
            wc.hInstance = GetModuleHandle(0);
            wc.lpfnWndProc = ModalFrameSchedulerWndProc;
            wc.lpszClassName = L"IMVUModalFrameScheduler";

            frameListenerWindow = CreateWindowW(
                0, 0, 0, 0, 0, 0, 0,
                GetModuleHandle(0), 0);

        if (!stepping) {
            const unsigned timeout = ftc.getTimeout();
            stepping = true;
            PostMessageIn(timeout, frameListenerWindow, idleMessage, 0, 0);
    void step() { stepping = false; }

    bool stepping;
    FrameTimeoutCalculator ftc;

How has it worked out?

A custom message loop and WM_ENTERIDLE neatly solves all of the goals we laid out:

  • No unnecessary polling, and thus increased battery life and performace.
  • When possible, the 3D windows animate at 60 Hz.
  • Even degradation. If painting a frame takes 40 ms, the frame rate will drop from 60 Hz to 25 Hz, not from 60 Hz to 15 Hz, as some of the implementations did.
  • Animation continue to play, even while modal dialogs and popup menus are visible.
  • This code runs well on XP, Vista, and Windows 7.

At Least Interview (or: How I Ended Up at IMVU)

Recent conversations have pointed out my career philosophy isn’t as obvious as I thought. Thus, I’d like to share the story of how I joined IMVU and what it means to me and to those I interview.

Why Won’t You Interview?

You wouldn’t believe the number of times I’ve tried and failed to get somebody to take a weekend and fly out to IMVU for an interview. I don’t understand: we’ll happily pay you and your significant other to spend a vacation in San Francisco for the small price of a day’s interview.

There are three possible outcomes:

  1. We make you an offer and you accept.
  2. We make you an offer and you decline.
  3. We don’t make you an offer.

What’s the worst that could happen? Maybe you’ll be forced to actually decide whether IMVU is the right home for you. Maybe IMVU won’t be a fit and you’ll feel a little worse for it.

Either way you’ll have a better sense of yourself and maybe you’ll have stumbled upon a more fulfilling life. Plus you’ll have a free vacation!

I Could Have Joined IMVU 9 Months Earlier

I was halfway through my graduate degree at Iowa State, implementing a functional GPU language. I figured I was headed towards a job working on concurrent languages at Microsoft Research or something. Indeed, that would have been fine! I’m still glad concurrent programming languages aren’t a solved a problem – I can still fantasize about someday contributing to the field.

On July 2nd, 2004 (my birthday!), a guy named Eric Ries e-mails me out of the blue “Are you the same Chad Austin from the boost and cal3d mailing lists? Interested in some contract work?” He was working on some wack AOL Instant Messenger add-on that used BitTorrent as its installer and had a hideous website, so I wasn’t terribly interested. He persisted, and by GDC 2005, he convinced me to come interview.

Once I met the founding team, I came to a few conclusions:

  1. IMVU’s founders were smart. I’d be silly not to work with them.
  2. Coming from graduate school, I didn’t expect much of a salary, so I could take a bunch of stock in exchange.
  3. If IMVU succeeded, win!
  4. If IMVU failed, at least I’d learn a lot.

I wasn’t super excited about the product at first, but IMVU’s founders convinced me to give them a shot, and it was definitely the right decision.

How I “Sell” to Candidates

When I interview candidates, I truly believe that IMVU is a great opportunity. If the candidate is hesitant about committing to such a huge life change, I understand. Moving across the country and taking a new job is a gigantic personal decision, and I can’t make that decision for them.

I never aggressively push IMVU, but I do my best to provide the data necessary to make the right decision. “I’ve been here a while. What information do you need to know whether IMVU is right for you?” I like to believe honesty is as effective as aggressive salesmanship. :)

What This Means

I heartily endorse the philosophy espoused by NetFlix: periodically reconsider your place in the world. I’d be a hypocrite if I said otherwise.

That said, I think our culture overvalues salary. Money is but one (uncorrelated?) component of our motivation. Since humans are notoriously bad at predicting what makes us happy, it’s critical that we weigh facets such as personal freedom, your colleagues, social context, future opportunities, and how your work fits into your personal narrative.

We once tried to hire a frighteningly smart man away from Google. He interviewed but declined our generous offer, saying that his entire social life was tied into Google. In hindsight, the sacrifice we asked of him is clear, and I respect his decision.

In short, stay open-minded, but consciously consider what makes you happy.

Extracting Color and Transparency from Flash

The original source of this post is at the IMVU engineering blog. Subscribe now!

For clarity, I slightly oversimplified my previous discussion on efficiently rendering Flash in a 3D scene. The sticky bit is extracting transparency information from the Flash framebuffer so we can composite the overlay into the scene.

Flash does not give you direct access to its framebuffer. It does, with IViewObject::Draw, allow you to composite the Flash framebuffer onto a DIB section of your choice.

Remembering your Porter-Duff, composition of source over dest is:

Color = SourceColor * SourceAlpha + DestColor * (1 - SourceAlpha)

If the source color is premultiplied, you get:

Color = SourceColor + DestColor * (1 - SourceAlpha)

Assuming we want premultiplied color and alpha from Flash for efficient rendering in the 3D scene, applying the above requires solving for FlashAlpha and FlashColor:

RenderedColor = FlashColor * FlashAlpha + DestColor * (1 - FlashAlpha)

RenderedColor = FlashColor * FlashAlpha + DestColor - DestColor * FlashAlpha

RenderedColor - DestColor = FlashColor * FlashAlpha - DestColor * FlashAlpha

RenderedColor - DestColor = FlashAlpha * (FlashColor - DestColor)

FlashAlpha = (RenderedColor - DestColor) / (FlashColor - DestColor)

If FlashColor and DestColor are equal, then FlashAlpha is undefined. Intuitively, this makes sense. If you render a translucent black SWF on a black background, you can’t know the transparency data because all of the pixels are still black. This doesn’t matter, as I’ll show in a moment.

FlashColor is trivial:

RenderedColor = FlashColor * FlashAlpha + DestColor * (1 - FlashAlpha)

RenderedColor - DestColor * (1 - FlashAlpha) = FlashColor * FlashAlpha

FlashColor = (RenderedColor - DestColor * (1 - FlashAlpha)) / FlashAlpha

FlashColor is undefined if FlashAlpha is 0. Transparency has no color.

What do these equations give us? We know RenderedColor, since it’s the result of calling IViewObject::Draw. We have control over DestColor, since we configure the DIB Flash is drawn atop. What happens if we set DestColor to black (0)?

FlashColor = (RenderedColorOnBlack) / FlashAlpha

What happens if we set it to white (1)?

FlashColor = (RenderedColorOnWhite - (1 - FlashAlpha)) / FlashAlpha

Now we’re getting somewhere! Since FlashColor and FlashAlpha are constant, we can define a relationship between FlashAlpha and RenderedColorOnBlack and RenderedColorOnWhite:

(RenderedColorOnBlack) / FlashAlpha = (RenderedColorOnWhite - (1 - FlashAlpha)) / FlashAlpha

RenderedColorOnBlack = RenderedColorOnWhite - 1 + FlashAlpha

FlashAlpha = RenderedColorOnBlack - RenderedColorOnWhite + 1

FlashAlpha = RenderedColorOnWhite - RenderedColorOnBlack

So all we have to do is render the SWF on a white background and a black background and subtract the two to extract the alpha channel.

Now what about color? Just plug the calculated FlashAlpha into the following when rendering on black.

FlashColor = (RenderedColor - DestColor * (1 - FlashAlpha)) / FlashAlpha

FlashColor = RenderedColorOnBlack / FlashAlpha

Since we want premultiplied alpha:

FlashColor = RenderedColorOnBlack

Now that we know FlashColor and FlashAlpha for the overlay, we can copy it into a texture and render the scene!