The Story of Haskell at IMVU

An interesting blog post and its corresponding comment thread on Hacker News made me realize that, while I’d told the story of Haskell at IMVU to many people, I’d never written it down. :)

IMVU’s backend had been written in PHP from the beginning. Startup code is always gross, but after years of learning PHP’s ins and outs, we bent it to our will. Part of me looks back and says “It wasn’t that bad”, and indeed, it has a lot of great properties. But then I remember the horror.

The terrible straight-line performance, the lack of concurrent IO, the insane memory usage… Once the codebase reached about a million lines, our REST service response times were over half a second, and several seconds in the worst case. For some types of services that’s okay, but IMVU strives to be a real-time, delightful experience, and service latency on interactive things like dressing up, browsing the catalog, sending messages to your friends, and shopping is really important.

Some of the performance-minded of us tried to get PHP to be fast, but it was basically intractable, and we couldn’t afford to fork PHP with a high-performance JIT like Facebook ended up doing years later.

The combination of PHP’s awful performance, its terrifying pitfalls, and the fact that millions of lines of code in a dynamic language (even with copious test coverage) are really hard to refactor led us to evaluate alternatives. Someone suggested Java (which would have been totally fine in hindsight), but the infrastructure team decided to adopt this newfangled Node.js thing. Node sucked. This predated ES6, promises, and the rich npm ecosystem, so we spent far too long rebuilding basic, basic infrastructure in Node (like propagating stack traces across async operations) that we already had in PHP, and it still wasn’t as good. In the end we decided to drop Node and stick with PHP.

Fast forward a year or two…

There were some people at IMVU who had dabbled with Haskell, but it always seemed too idealistic and not very practical. In fact, one weekend, Andy went home with the explicit goal of proving that Haskell was stupid and not good for writing production software.

He came back on Monday glowing. “Haskell is perfect for web services! We should use it! It’s fast and safe and concurrent!” We smiled and nodded (sometimes Andy gets enthusiastic about new things) and went on with our daily work. But in the evenings and weekends he plugged away, and built a prototype web service that was 1) dramatically faster than the corresponding PHP services 2) actually quite short in implementation and 3) surprisingly clear. Huh, maybe this Haskell thing did have legs.

OK, so let’s consider the possibility that Haskell could be something IMVU could adopt. How would we vet it?

We started by, during the next hack week, forming a small team of everyday engineers to learn Haskell and work on a feature. We learned two things: teaching Haskell was a nontrivial problem. BUT, Haskell newcomers hacking on the codebase caused only one regression, and that regression actually exposed a bug in some of our infrastructure. Well, that’s pretty cool. For years, IMVU had a tradition of throwing new engineers into the deep end, after which they’d invariably take down the site, and then we’d demonstrate that we’re egoless and that all failures are an opportunity to improve our process with a postmortem. What if Haskell could help us avoid some of these problems?

Over time we slowly ramped up Haskell within the company. At first we had growing pains: the infrastructure in our PHP stack was extremely mature and well-baked, while the corresponding infrastructure in our Haskell stack (error reporting, DB access, unit testing) was not. It took us a while to sort out exactly what our Haskell best practices would be, and by that point a few people had started to get a negative vibe about Haskell. But we kept pushing forward.

(By the way, I keep saying “we”, but at this point I hadn’t written any Haskell yet, and was not particularly for or against it, even though I did think PHP was terrible.)

By this point we’d shipped at least two sets of mobile services on Haskell, and many people considered it a success. However… Tensions about Haskell had really started to heat up, with some people saying things like “I never want to even look at Haskell again” or “at this stage in my career I don’t want another programming language” or “I think we’re making a horrible mistake by building another stack, especially in Haskell.” People were complaining to their managers, and the managers were coming to me. Some people loved Haskell, some people hated it, but very few were in the middle.

As a neutral third party with a lot of respect in the organization, I was sent out by the engineering managers to gather ground truth. I took my little black notebook and a pen and interviewed something like half the engineering team about their opinions on Haskell, fans or not. I spent an hour with each person, asking what they liked, what they didn’t like, what they thought we should do, etc.

I then collated all the feedback and presented back to the engineering managers and leadership team (of which I was a part too, but since I had no direct reports and had no experience with Haskell at IMVU, I was fairly unbiased). I don’t have my report handy, but it went approximately like this:

75% of the people I interviewed were positive about or fine with Haskell.

25% of the people had major concerns. The concerns, in roughly decreasing priority, were:

  • Haskell strings suck
    • Specifically, there’s a lot of cognitive overhead in switching between the five common string representations as needed (see the sketch below this list)
  • Syntactic concerns
    • Haskell’s syntax is iconoclastic: parens, $, /= instead of !=
    • some people in the Haskell community love point-free style
    • cognitive overhead in switching between pure and monadic style
    • let vs. where
  • concerns about a second backend stack
    • duplicate infrastructure across PHP and Haskell
  • concerns about it not being a mainstream language
    • teaching the whole organization
    • interaction between Haskell experts and novices sometimes leaves the novices feeling dumb or inadequate
    • bus-factor risk to the organization on key infrastructure
    • Haskell is less debuggable by ops engineers
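
For the curious, the five representations are String (a lazy linked list of Char), strict and lazy Text, and strict and lazy ByteString. A minimal sketch of the conversions you end up juggling, using the standard text and bytestring packages:

import qualified Data.Text as T                 -- strict Text
import qualified Data.Text.Lazy as TL           -- lazy Text
import qualified Data.Text.Encoding as TE
import qualified Data.ByteString as BS          -- strict ByteString
import qualified Data.ByteString.Lazy as BL     -- lazy ByteString

s :: String          -- the default: a lazy linked list of Char
s = "hello"

t :: T.Text
t = T.pack s         -- String -> strict Text

lt :: TL.Text
lt = TL.fromStrict t -- strict Text -> lazy Text

b :: BS.ByteString
b = TE.encodeUtf8 t  -- Text -> UTF-8 bytes

lb :: BL.ByteString
lb = BL.fromStrict b -- strict -> lazy ByteString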

The wins were:

  • GHC’s support for safe concurrent programming is world-class
  • Sound type safety led to refactoring being a breeze
  • Fun, stretches the mind (for some people)

Ultimately, we settled on a middle ground. We would not deprecate PHP, but we would continue to support Haskell as a technology stack. We had specific services in mind that needed high throughput and low latency (<100 ms) and we knew that PHP could not achieve those. That said, we respected the fact that several engineers, even those in leadership roles, put their foot down and said they didn’t want to touch Haskell. (From my perspective it was kind of a weird experience. Highly-skilled people that I respect just got really angry and irrational and basically refused to even try to learn Haskell, which resulted in a culture divide in the company.)

In the end, I never did manage to convince some of my peers to try it. That said, there were many engineers who loved Haskell and we built a great deal of critical infrastructure in it. The Haskell infrastructure basically hums like a well-oiled machine. Perfectly reliable unit tests. Concurrent batched IO. Abstractions for things like timeouts and running tasks on other threads while correctly bubbling failures. Fantastic support for safely encoding and decoding JSON. One of the best benchmarking tools I’ve ever used.

Since one of the biggest concerns about Haskell was education, I taught a sequence of four or five classes for the engineering team with my take on how it should be taught. Traditional Haskell materials tend to talk about laziness and currying and all this FP junk that is certainly interesting and different, but mostly irrelevant to people trying to come in hot and get the job done. Instead, I focused on specifics first, and then generalizations.

“You know how you can add two integers and get a new integer back? Well, you can also add two floating point numbers. Int and Float are instances of Num.”
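
In GHCi, that first lesson looks like this:

ghci> (1 :: Int) + 2
3
ghci> (1.5 :: Float) + 2.25
3.75
ghci> :t (+)
(+) :: Num a => a -> a -> a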

“You know how you can concatenate strings? And concatenating with the empty string gives you the same value? It turns out many things satisfy these properties. They’re called Monoids.”
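
Again in GHCi (with a recent GHC, where <> is in the Prelude; older GHCs need import Data.Monoid):

ghci> "foo" <> "bar"
"foobar"
ghci> "foo" <> ""      -- the empty string is the identity
"foo"
ghci> [1, 2] <> [3]    -- lists satisfy the same laws
[1,2,3]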

“You know how you can map across a list? We do it all the time in PHP or Python. Well you can also map across Maybe. Think of Maybe as a list that can have either zero or one element. Things that can be mapped are called Functors.”
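
And the Functor lesson:

ghci> fmap (+1) [1, 2, 3]
[2,3,4]
ghci> fmap (+1) (Just 41)   -- Maybe: a list of zero or one element
Just 42
ghci> fmap (+1) Nothing
Nothing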

etc. etc.

We started with specific examples of how to do stuff with Haskell, then talked about type classes, and the final optional class was about how Haskell is compiled down to the machine. You’d have to ask the attendees how much they remember, but the classes were well-reviewed at the time. :)

In summary, I definitely wouldn’t say that Haskell was a failure. GHC is an amazing piece of technology, and IMVU built infrastructure in it that wouldn’t have been possible in PHP. I also can’t say Haskell was a resounding success. I was truly disturbed by the culture drama that it brought. And, frankly, saddened and confused by how some people closed their minds. Personally, I would have happily traded a lot of Haskell’s safety for avoiding drama. I sometimes wonder what would have happened had we chosen Java instead of Node.js back in 2010. We could have hit our performance goals in Java too.

Oh well. Many of us still love Haskell. It’s not going away. I left IMVU last year (needed some more experience elsewhere) but I personally miss Haskell a great deal. If I were starting a new project with people that I trusted, I’d definitely reach for Haskell first. The expressiveness, brevity, and safety let you move extremely quickly, and Haskell is definitely mature enough for production use.

Dropbox Hack Week: GraphQL Server in Haskell

Last week was hack week at Dropbox. I took the opportunity to explore the implementation of a GraphQL server that does optimal IO batching and concurrency.

Serial IO

A common performance problem in the implementation of web services is serial IO. Let’s say you have a service that returns the names of all of your friends. It’s easy and natural to implement it like this:

def get_friend_names(my_user_id):
    me = fetch_user_info(my_user_id)
    friend_names = []
    for friend_id in me.friends:
        # one blocking round-trip per friend -- this is the serial IO
        friend_names.append(fetch_user_info(friend_id).name)
    return friend_names

The problem is that each user info lookup for my friends occurs sequentially. Even with TCP connection pooling, that’s still a packet round-trip in the datacenter. “But that should be faster than 10 ms, right?” Even if it is, it doesn’t take many friends to blow your performance budget and send your response time across the threshold of perceptible delay.

Moreover, this problem compounds on itself. If your inner functions have serial IO and you call them in a loop, you’ve just added potentially thousands of round-trips to your backend data sources. I would hazard a guess and say serial IO is the largest contributor to service response latencies.

Manually Batched IO

One solution is to always batch IO: every function takes a list and returns a list. This can be made to work (indeed, I’ve achieved excellent performance by careful, manual batching), but it doesn’t compose as your services scale. Sometimes it’s just too hard to know up front all of the data you will need to fetch and the dependencies between that data.
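
To make the convention concrete, here’s a sketch in Haskell (all names hypothetical; the fetch is stubbed where a real implementation would issue one multi-get):

import qualified Data.Map as Map

type UserId = Int
data UserInfo = UserInfo { name :: String, friends :: [UserId] }

-- The batching convention: every fetch takes a list of keys and issues
-- a single round-trip, however many keys there are.
fetchUserInfos :: [UserId] -> IO (Map.Map UserId UserInfo)
fetchUserInfos ids =
    pure (Map.fromList [(i, UserInfo ("user-" ++ show i) []) | i <- ids])

-- Two round-trips total, no matter how many friends: one for me, one
-- batched fetch for all of my friends.
friendNames :: UserId -> IO [String]
friendNames me = do
    meInfo <- fetchUserInfos [me]
    let fs = maybe [] friends (Map.lookup me meInfo)
    friendInfos <- fetchUserInfos fs
    pure (map name (Map.elems friendInfos))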

OK, so serial IO is bad. More on that in a bit. Let’s talk about REST now.

REST

At IMVU, we went all-in on REST, building a rather substantial framework to make it easy to create REST services. We anticipated that REST services tend to require many round-trips from clients, so we built a response denormalization system, where a service can anticipate the other services a client will need to hit and include their responses too.

This sounded great at first, and in fact was mostly an improvement over the prior state of affairs. But, at scale, REST has some challenging performance problems. For one, REST services have a defined schema, and they always return the data they advertise. As I mentioned, if a service doesn’t return all the data a client needs, the client must hit more services, increasing the number of client-to-server round-trips. In addition, because the service always returns the same set of data, it must query the backend database for more data than the client even cares about.

This phenomenon tends to happen most frequently with core domain objects like users. Because users are so important, they accumulate relationships with so many pieces of data (e.g. list of subscribed experiments, contact lists, shopping carts, etc.), almost all of which is irrelevant to most clients.

Why GraphQL?

This is where GraphQL comes in. In GraphQL, the client specifies the data that it wants. There is only one request, even if the data is deeply nested, and the server only has to fetch exactly what’s needed.

Consider the query:

query HeroNameQuery {
    newhope_hero: hero(episode: NEWHOPE) {
        name
    }
    empire_hero: hero(episode: EMPIRE) {
        name
    }
    jedi_hero: hero(episode: JEDI) {
        name
    }
}

It looks up the hero of each of the first three Star Wars movies, fetches any information it needs from the backend, and returns only what is requested:

"data": {
    "HeroNameQuery": {
        "jedi_hero": {
            "name": "R2-D2"
        },
        "empire_hero": {
            "name": "Luke Skywalker"
        },
        "newhope_hero": {
            "name": "R2-D2"
        }
    }
}

There are GraphQL implementations for many languages, but many of them don’t solve the serial IO problem described at the start of this post. In fact, a naive GraphQL implementation might issue IO per field of each object requested.

For hack week, I wanted to explore the design of a GraphQL server that issued all of its backend IO in optimally concurrent batches.

Why Haskell?

Dropbox doesn’t use Haskell, but I find it to be a great language for exploring design spaces, particularly around execution models. Also, Facebook open sourced their excellent Haxl library, which converts code written with serial IO into efficient batched requests. Haxl provides an effect type that runs two data fetches in parallel whenever it can see that they are independent, and only when all outstanding Haxl operations are blocked on data fetches does it issue the batched requests to the backends. My prototype’s GraphQL resolvers are surprisingly naive, specified with sequential code; Haxl automatically batches up the requests and hands them to the DataSource for execution.
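
A schematic sketch of what that looks like (the request constructors mirror the log output shown below, but the data source definition and Haxl boilerplate are omitted, so treat the names as hypothetical):

-- Written as if it were serial code...
heroName :: Episode -> GenHaxl () Text
heroName episode = do
    heroId <- dataFetch (FetchEpisode episode)
    dataFetch (FetchCharacter heroId)

-- ...but mapM goes through Haxl's Applicative instance, so the three
-- FetchEpisode requests are issued as one batch, and then the character
-- fetches as another.
allHeroNames :: GenHaxl () [Text]
allHeroNames = mapM heroName [NewHope, Empire, Jedi]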

In addition, there is nothing clever about the GraphQL request handler or graph traversal and filtering — all cleverness is handled by Haxl.

At this point, you might be thinking “So was anything challenging about this project?” On the data fetch side, no, not really. :) However, I did run into one unexpected snag when using Haxl for data updates: because Haxl tries really hard — and in a general, composable way — to run your IO in parallel, you must be careful about how writes and reads are sequenced together. If you leave out the () <- binding on a write (as I did on line 105 of my prototype), Haskell desugars the statement with >> instead of >>=, and Haxl implements >> with the Applicative composition rather than the monadic bind, so it assumes it can run the two operations in parallel. And, as you might expect, issuing a write and a read to the same data concurrently doesn’t end well. :)
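
To make the pitfall concrete, here’s a sketch (updateName and getName are hypothetical write and read operations against the same data source):

-- BAD: a bare statement desugars with >>, which Haxl implements with
-- Applicative composition, so the write and the read may land in the
-- same parallel batch.
rename :: UserId -> GenHaxl () Text
rename uid = do
    updateName uid "R2-D2"
    getName uid

-- GOOD: binding the () result forces desugaring through >>=, which
-- sequences the read strictly after the write.
rename' :: UserId -> GenHaxl () Text
rename' uid = do
    () <- updateName uid "R2-D2"
    getName uid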

Conclusions

I am very thankful for jdnavarro’s excellent GraphQL query parser. With it, in four days, I was able to get a prototype GraphQL server up and running. Using Haxl and Hedis, I have it hooked up to a Redis data source, and it correctly batches all independent IO reads and runs them concurrently.

The Star Wars hero names query above results in two batched backend requests:

fetch star wars batch of size 3:
["FetchEpisode(Jedi)","FetchEpisode(Empire)","FetchEpisode(NewHope)"]
fetch star wars batch of size 2:
["FetchCharacter(1000)","FetchCharacter(2001)"]

You can even see that it noticed that R2-D2 is the hero of both movies, and only requested its info once.

The performance is pretty good: on some AWS instances, I measured about 3500 queries per second per machine and a query latency averaging 3 ms. Of course, that could worsen as permissions checks and so on are implemented. On the other hand, the code is completely unoptimized, full of lazy data structures and an unoptimized parser without a parser cache.

The prototype code is open sourced on the Dropbox GitHub.

It’s probably possible to build something like Haxl in Python with generators, but you’d have to give up standard loops and list comprehensions, instead using some kind of parallel map operation. You also would not benefit from anything that teases concurrency out of imperative-looking code, like GHC’s upcoming ApplicativeDo extension. There are some things that Haskell’s restricted effects are uniquely good at. :)

I’d guess a good implementation would be even trickier in Go, given that the programmer has less control over goroutines than over Python’s generators or Haskell’s monads. That said, perhaps someone will discover a clean, optimal implementation. We should pay attention to efforts like this.

I think GraphQL will be a huge deal for high-performance web services. It’s harder to implement than REST or traditional RPC services, but the reduction in response latencies could be substantial. Naive implementations aren’t going to have good performance, but if you are interested in aiming straight at the finish line, Haskell and Haxl will give you a head start.

Announcing BufferBuilder: Encode JSON in Haskell 2-5x faster than Aeson

Conclusion

This tale is a little long and detailed, so for those of you who want immediate payoff…

Andy and I wrote a Haskell library called Data.BufferBuilder (including Data.BufferBuilder.Utf8 and Data.BufferBuilder.Json) which makes it easy to safely and efficiently encode Haskell data structures into JSON. In our benchmarks, using Data.BufferBuilder.Json to encode JSON is 4-5x as fast as using Aeson.

Even if you’re already using Aeson, you can benefit somewhat from BufferBuilder’s improved encoding performance. The buffer-builder-aeson package adds a ToJson instance for Aeson’s Value type, which our benchmarks show is 50% to 100% faster than Aeson’s built-in encoder. All you need to do is call Data.BufferBuilder.Json.encodeJson instead of Data.Aeson.encode!
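
For illustration, a minimal sketch of the swap (I’m assuming the orphan-instance module name here; check the buffer-builder-aeson docs for the exact import):

import Data.Aeson (Value, object, (.=))
import Data.BufferBuilder.Json (encodeJson)
import Data.BufferBuilder.Aeson ()  -- assumed module name; provides ToJson for Value

payload :: Value
payload = object ["name" .= ("R2-D2" :: String), "id" .= (2001 :: Int)]

main :: IO ()
main = print (encodeJson payload)   -- a strict ByteString of JSON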

Why did we build BufferBuilder?

Some of IMVU’s backend services are written in Haskell. While Haskell is incredible for many use cases, we ran into an unexpected bottleneck: JSON encoding. Our service response structure produces quite a lot of JSON, and much of that JSON contains URLs encoded into JSON strings.

Amazingly, URL and JSON encoding was showing up as a significant cost center when generating JSON responses. Some services spent over a second encoding JSON!

When faced with a performance problem, my first instinct is to either pound the problem into the ground or unmake the problem so the code isn’t necessary in the first place.

So let’s look at the problem holistically:

  • A JSON response is represented in memory as a collection of varied, in-memory data structures. The response happens to contain many URLs — sometimes more than a hundred.
  • URLs are represented by a data structure consisting of the protocol, hostname, path segments, query string, and so on.
  • Each URL becomes a JSON string.
  • Each in-memory data structure is converted to a JSON object whose properties depend on the type of the corresponding data structure.

Using Aeson to encode all of this results in the following steps:

  • ToJSON instances convert Haskell data types to an AST of Aeson Values.
  • The keys of an Aeson object are Text values. In memory, Text is encoded in UTF-16. Thus, URLs must be translated from their in-memory representation (ours is ASCII) into UTF-16 before they fit into the Aeson AST.
  • Then, the entity bodies are converted into JSON objects, where the keys are known at compile-time, but must be encoded as UTF-16.
  • The entire AST is encoded into a Lazy Text Builder.
  • Then the Lazy Text is encoded into a Lazy ByteString containing UTF-8.
  • Then, to generate the response body, the Lazy ByteString is concatenated into a single strict ByteString.

That’s a lot of steps! To summarize:

(URLs and Custom Data Types) -> Aeson AST -> Lazy Text -> Lazy UTF-8 ByteString -> Strict ByteString

The Design of BufferBuilder

This is where Andy and I sat down to create an API to let us cleanly express JSON encoding without sacrificing type safety OR performance.

We know that a fast, if not the fastest, way to build up a buffer of bytes is to allocate a chunk of memory, stream writes to it, and either chunk or realloc() as needed. Obviously, this kind of code can be trivially expressed in C:

void buffer_append(buffer* b, const char* s, size_t length) {
    if (!buffer_has_room(b, length)) {
        buffer_grow(b, length);
    }
    memcpy(b->data + b->size, s, length);
    b->size += length;  /* advance the write cursor past the appended bytes */
}

Because I’d been told that bouncing between Haskell and C with the foreign function interface can be slow, my first approach was to attempt to build a Haskell monad that grabbed the RealWorld token out of IO (IO a is basically a newtype around RealWorld -> (RealWorld, a)), augmented it with some extra “local variables” like the output ptr, capacity, and current size, and manually implemented allocation and memory writes with GHC.Prim APIs. GHC did not like this at all. The generated code ran 20 times slower than naive usage of Data.ByteString.Builder. Nonetheless, it was an interesting technique, so maybe I’ll write about it another time.

Surely it was possible to do better. So I tried the foreign function interface after all.

I wrote a tiny C API that allocated the underlying growable buffer. It provided APIs for appending bytes, buffers, UTF-16-to-UTF-8 transcoding, and so on. These FFI calls can only happen within IO actions, but building buffers is fundamentally a pure operation, and the provided Haskell API should be effectively pure. The solution is to offer a restricted state monad like ST which limits the operations within the monad to safe buffer-building operations.
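
Here’s the shape of the trick in a self-contained miniature. This is just the idea, not BufferBuilder’s actual internals (which write through a pointer to the C-managed buffer rather than an IORef):

{-# LANGUAGE GeneralizedNewtypeDeriving #-}
module MiniBuilder (Builder, appendByte, runBuilder) where

import Control.Monad.Trans.Reader (ReaderT (..))
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import Data.Word (Word8)
import System.IO.Unsafe (unsafePerformIO)

-- The constructor is not exported, so the only operations callers can
-- perform inside Builder are the safe ones this module provides.
newtype Builder a = Builder (ReaderT (IORef [Word8]) IO a)
    deriving (Functor, Applicative, Monad)

appendByte :: Word8 -> Builder ()
appendByte w = Builder (ReaderT (\ref -> modifyIORef' ref (w :)))

-- Running a builder is observably pure: the IORef never escapes.
runBuilder :: Builder () -> [Word8]
runBuilder (Builder m) = unsafePerformIO $ do
    ref <- newIORef []
    runReaderT m ref
    reverse <$> readIORef ref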

This approach was by far the fastest of any that I tried. In fact, in a sequence of appendBS operations, if the arguments are known-strict ByteStrings, GHC will compile the appendBS sequence directly into a series of cheap C function calls. For example, the following code:

data BSTriple = BSTriple !ByteString !ByteString !ByteString

writeBSTriple :: BSTriple -> BufferBuilder ()
writeBSTriple !(BSTriple a b c) = do
    appendBS a
    appendBS b
    appendBS c

compiles into something like:


movq %rbx,%rdi
movq 120(%rsp),%rax
movq %rax,%rsi
movq 96(%rsp),%rax
movq %rax,%rdx
movq 112(%rsp),%rax
addq %rax,%rdx
subq $8,%rsp
movl $0,%eax
call bw_append_bs
addq $8,%rsp
movq %rbx,%rdi
movq 152(%rsp),%rax
movq %rax,%rsi
movq 128(%rsp),%rax
movq %rax,%rdx
movq 144(%rsp),%rax
addq %rax,%rdx
subq $8,%rsp
movl $0,%eax
call bw_append_bs
addq $8,%rsp
movq %rbx,%rdi
movq 216(%rsp),%rax
movq %rax,%rsi
movq 160(%rsp),%rax
movq %rax,%rdx
movq 208(%rsp),%rax
addq %rax,%rdx
subq $8,%rsp
movl $0,%eax
call bw_append_bs
addq $8,%rsp

Obviously GHC’s code generator has some room to improve, but the main thrust is exactly right. An equivalent C++ API would generate much the same kind of code, modulo the poor instruction and register selection, which overall doesn’t matter too much here.

In almost all situations, Haskell isn’t going to be as fast as straightforward C, but with some work and appropriate use of FFI, it’s possible to come close.

Data.BufferBuilder.Utf8

Once we had an API to safely and efficiently build up buffers of bytes, we wanted to build safe APIs on top for constructing valid UTF-8 buffers and valid JSON.

Utf8Builder is a newtype around BufferBuilder with a restricted API. If you only call safe functions in Data.BufferBuilder.Utf8, the result is guaranteed to be valid UTF-8. Unsafe functions are provided for when you know precisely what you’re doing.

Data.BufferBuilder.Json

Data.BufferBuilder.Json is built on top of Data.BufferBuilder.Utf8. Data.BufferBuilder.Json’s Value type is a newtype around Utf8Builder, meaning there’s no Aeson-style AST. Each Value simply knows how to write itself into an output buffer. Just like how the safe Utf8Builder functions guarantee the output is legal UTF-8, the safe JsonBuilder functions guarantee (almost) that the output is a legal JSON document. (There are a couple caveats, see the documentation for details.)
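
A usage sketch, going by the package documentation (object renders an ObjectBuilder into a Value, and ObjectBuilder is a Monoid, so fields are combined with <>):

{-# LANGUAGE OverloadedStrings #-}
import Data.BufferBuilder.Json (ToJson (..), encodeJson, object, (.=))
import Data.Monoid ((<>))
import Data.Text (Text)

data Friend = Friend { fId :: !Int, fName :: !Text }

instance ToJson Friend where
    toJson friend = object $
        "id" .= fId friend
        <> "name" .= fName friend

-- encodeJson (Friend 2001 "R2-D2") yields {"id":2001,"name":"R2-D2"}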

I suspect Data.BufferBuilder.Json is approaching the limit of how fast a JSON encoder can be. And thanks to the beauty of Haskell, it’s convenient and safe!

If you’re using Aeson and encoding performance matters to you, give BufferBuilder a shot!