The Story of Haskell at IMVU

An interesting blog post and its corresponding comment thread on Hacker News made me realize that, while I’d told the story of Haskell at IMVU to many people, I’d never written it down. :)

IMVU’s backend had been written in PHP from the beginning. Startup code is always gross, but after years of learning PHP’s ins and outs, we bent it to our will. Part of me looks back and says “It wasn’t that bad”, and indeed, it has a lot of great properties. But then I remember the horror.

The terrible straight-line performance, the lack of concurrent IO, the insane memory usage… Once the codebase reached about a million lines, the latency on our REST service response times was over half a second and in the worst case several seconds. For some types of services, that’s okay, but IMVU strives to be a real-time, delightful experience, and service latency on interactive things like dressing up, browsing the catalog, sending messages to your friends, and shopping is really important.

Some of the performance-minded of us tried to get PHP to be fast, but it was basically intractable, and we couldn’t afford to fork PHP with a high-performance JIT like Facebook ended up doing years later.

The combination of PHP’s awful performance, terrifying pitfalls, and the fact that millions of lines of code in a dynamic language (even with copious test coverage) are really hard to refactor led us to evaluate alternatives. Someone suggested Java (which would have been totally fine in hindsight) but the infrastructure team decided to adopt this newfangled Node.js thing. Node sucked. This predated ES6, promises, and the rich ecosystem of npm, so we spent far too long rebuilding basic, basic infrastructure (like propagating stack traces across async operations) in Node that we already had in PHP, and it still wasn’t as good. In the end we decided to drop Node and stick with PHP.

Fast forward a year or two…

There were some people at IMVU who had dabbled with Haskell, but it always seemed too idealistic and not very practical. In fact, one weekend, Andy went home with the explicit goal of proving that Haskell was stupid and not good for writing production software.

He came back on Monday glowing. “Haskell is perfect for web services! We should use it! It’s fast and safe and concurrent!” We smiled and nodded (sometimes Andy gets enthusiastic about new things) and went on with our daily work. But in the evenings and weekends he plugged away, and built a prototype web service that was 1) dramatically faster than the corresponding PHP services, 2) actually quite short in implementation, and 3) surprisingly clear. Huh, maybe this Haskell thing did have legs.

OK, so let’s consider the possibility that Haskell could be something IMVU could adopt. How would we vet it?

During the next hack week, we started by forming a small team of everyday engineers to learn Haskell and work on a feature. We learned two things: teaching Haskell was a nontrivial problem. BUT, Haskell newcomers hacking on the codebase caused only one regression, and that regression actually exposed a bug in some of our infrastructure. Well, that’s pretty cool. For years, IMVU had a tradition of throwing new engineers into the deep end, after which they’d invariably take down the site, and then we’d demonstrate that we’re egoless and that all failures are an opportunity to improve our process with a postmortem. What if Haskell could help us avoid some of these problems?

Over time we slowly ramped up Haskell within the company. At first we had growing pains. The infrastructure in our PHP stack was extremely mature and well-baked. The corresponding infrastructure in our Haskell stack (error reporting, DB access, unit testing) was not. It took us a while to sort out exactly what our Haskell best practices would be, and by that point a few people had started to get a negative vibe about Haskell. But we kept pushing forward.

(By the way, I keep saying “we”, but at this point I hadn’t written any Haskell yet, and was not particularly for or against it, even though I did think PHP was terrible.)

By this point we’d shipped at least two sets of mobile services on Haskell, and many people considered it a success. However… Tensions about Haskell had really started to heat up, with some people saying things like “I never want to even look at Haskell again” or “at this stage in my career I don’t want another programming language” or “I think we’re making a horrible mistake by building another stack, especially in Haskell.” People were complaining to their managers, and the managers were coming to me. Some people loved Haskell, some people hated it, but very few were in the middle.

Since I was a neutral third party, and someone with a lot of respect in the organization, the engineering managers sent me out to gather ground truth. I took my little black notebook and a pen and interviewed something like half the engineering team about their opinions on Haskell, whether they were fans or not. I spent an hour with each person, asking what they liked, what they didn’t like, what they thought we should do, etc.

I then collated all the feedback and presented back to the engineering managers and leadership team (of which I was a part too, but since I had no direct reports and had no experience with Haskell at IMVU, I was fairly unbiased). I don’t have my report handy, but it went approximately like this:

75% of the people I interviewed were positive about or fine with Haskell.

25% of the people had major concerns. The concerns, in roughly decreasing priority, were:

  • Haskell strings suck
    • Specifically there’s a lot of cognitive overhead in switching between the five common string representations as needed
  • Syntactic concerns
    • Haskell’s syntax is iconoclastic: parens, $, /= instead of !=
    • some people in the Haskell community love point-free style
    • cognitive overhead in switching between pure and monadic style
    • let vs. where
  • concerns about a second backend stack
    • duplicate infrastructure across PHP and Haskell
  • concerns about it not being a mainstream language
    • teaching the whole organization
    • interaction between Haskell experts and novices sometimes leaves the novices feeling dumb or inadequate
    • bus factor to organization on key infrastructure
    • Haskell is less debuggable by ops engineers

The wins were:

  • GHC’s support for safe concurrent programming is world-class
  • Sound type safety led to refactoring being a breeze
  • Fun, stretches the mind (for some people)

Ultimately, we settled on a middle ground. We would not deprecate PHP, but we would continue to support Haskell as a technology stack. We had specific services in mind that needed high throughput and low latency (<100 ms) and we knew that PHP could not achieve those. That said, we respected the fact that several engineers, even those in leadership roles, put their foot down and said they didn’t want to touch Haskell. (From my perspective it was kind of a weird experience. Highly-skilled people that I respect just got really angry and irrational and basically refused to even try to learn Haskell, which resulted in a culture divide in the company.)

In the end, I never did manage to convince some of my peers to try it. That said, there were many engineers who loved Haskell and we built a great deal of critical infrastructure in it. The Haskell infrastructure basically hums like a well-oiled machine. Perfectly reliable unit tests. Concurrent batched IO. Abstractions for things like timeouts and running tasks on other threads while correctly bubbling failures. Fantastic support for safely encoding and decoding JSON. One of the best benchmarking tools I’ve ever used.

Since one of the biggest concerns about Haskell was education, I taught a sequence of four or five classes for the engineering team with my take on how it should be taught. Traditional Haskell materials tend to talk about laziness and currying and all this FP junk that is certainly interesting and different, but mostly irrelevant to people trying to come in hot and get the job done. Instead, I focused on specifics first, and then generalizations.

“You know how you can add two integers and get a new integer back? Well, you can also add two floating point numbers. Int and Float are instances of Num.”

“You know how you can concatenate strings? And concatenating with the empty string gives you the same value? It turns out many things satisfy these properties. They’re called Monoids.”

“You know how you can map across a list? We do it all the time in PHP or Python. Well you can also map across Maybe. Think of Maybe as a list that can have either zero or one element. Things that can be mapped are called Functors.”

etc. etc.
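In code, those first lessons looked roughly like this (a sketch in the spirit of the classes, not the actual course material):

addInts :: Int -> Int -> Int
addInts x y = x + y                  -- Int is an instance of Num

addFloats :: Float -> Float -> Float
addFloats x y = x + y                -- and so is Float

greeting :: String
greeting = "hello" <> mempty         -- appending the Monoid identity is a no-op

doubledList :: [Int]
doubledList = fmap (* 2) [1, 2, 3]   -- [2, 4, 6]

doubledMaybe :: Maybe Int
doubledMaybe = fmap (* 2) (Just 21)  -- Just 42; lists and Maybe are both Functors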

We started with specific examples of how to do stuff with Haskell, then talked about type classes, and the final optional class was about how Haskell is compiled down to the machine. You’d have to ask the attendees how much they remember, but the classes were well-reviewed at the time. :)

In summary, I definitely wouldn’t say that Haskell was a failure. GHC is an amazing piece of technology, and IMVU built infrastructure in it that wouldn’t have been possible in PHP. I also can’t say Haskell was a resounding success. I was truly disturbed by the culture drama that it brought. And, frankly, saddened and confused by how some people closed their minds. Personally, I would have happily traded a lot of Haskell’s safety for avoiding drama. I sometimes wonder what would have happened had we chosen Java instead of Node.js back in 2010. We could have hit our performance goals in Java too.

Oh well. Many of us still love Haskell. It’s not going away. I left IMVU last year (needed some more experience elsewhere) but I personally miss Haskell a great deal. If I were starting a new project with people that I trusted, I’d definitely reach for Haskell first. The expressiveness, brevity, and safety let you move extremely quickly, and Haskell is definitely mature enough for production use.

Designing Crux – Import Statements

Manipulating a module’s import list is a high-frequency activity in modern programming languages. Sadly, few popular languages get the usability of the import list right.

To do a good job in Crux, let’s start from the use cases, roughly sorted by importance:

  • import a module and refer to it by a qualified import name
  • import a list of names unqualified into this module’s scope
  • import everything unqualified into this module’s scope
  • import a module solely for its side effects, without importing any symbols
  • visually scan the import list
  • sort the import list (with Emacs’s sort-lines or the like)
  • and, of course, tools must easily parse the import list

Designing for usability means optimizing for the most common use cases, while making sure the less common use cases are still possible.

The most common use case is to import a module and refer to it qualified. Consider this hypothetical syntax:

import fs.path
path.combine("/usr", "local")

The qualified import reference (here, path) defaults to the last segment of the module name. Now, for this to work well, the basename has to be short, distinct, and meaningful.

Haskell gets this wrong.

import qualified Data.ByteString as BS

From left to right: import qualified is long to start with. Then you have most packages in some kind of arbitrary namespace like Data. Then, since ByteString is too long to be convenient (especially with the mixed case), people typically name the import BS. Typically being the key word. Sometimes people name it Bytes. Sometimes ByteString. Sometimes B. Sometimes Data.ByteString.Char8 is imported as BS. The fact that the programmer has to make a choice here naturally leads to inconsistency.
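For illustration, here’s the kind of naming drift you’d find across four files of one codebase (contrived, but representative):

import qualified Data.ByteString as BS
import qualified Data.ByteString as B
import qualified Data.ByteString as Bytes
import qualified Data.ByteString.Char8 as BS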

Go does a great job here. In Go, you import like this:

import "fs/path"

Afterwards, path functions can be accessed as path.Function. Short and meaningful.

Go goes even further and provides guidelines for naming functions in the context of a module. Conventions tend to be copied, so by setting a good standard early on, Go positively affected every project built from then on.

ES6 modules are also pretty interesting. They are statically analyzable and syntactically lightweight, but they have a fairly significant problem at scale: sorting imports.

import Dispatcher from 'flux/dispatcher';
import {foo, bazzle} from 'app/utils';

The problem with having the list of imported symbols first is that they can’t easily be machine-sorted. Every ES6 project I’ve worked on has encouraged its engineers to sort the imports by hand. Also, the import list coming first gets annoying when import lists are long.

import {
  foo, bar, baz,
  qux, harf, barf,
  hurp, floop
} from 'my_module';

With Crux, we learned from all of the above import systems and came up with something that meets every use case. Here is the proposed syntax:

import {
  bytes,
  flux.dispatcher,
  my_encoding_system as mes,
  imported_only_for_side_effects as _,
  utils(foo, bar, baz),
  more_utils(
    foo2,
    bar2,
    baz2,
  ),
  all_the_utils(...), // import everything into scope
}

The easiest import syntax is the most common one, but other types of imports, like renaming the import or importing some symbols unqualified, are straightforward too.

This directly satisfies all of our use cases!

We may change our minds and make the commas after each import optional in the future.

Side Note: Implicit vs. Explicit Imports

OCaml and Java allow qualifying names at their use sites, making import statements unnecessary. Import statements are syntax sugar at that point. I don’t have a strong justification here, but Crux went with explicit import statements (like Go, Python, Haskell, ES6) because it’s easier for humans and machines to see a module’s dependencies at a glance.

Two Kinds of Servers

Occasionally I overhear (or get sucked into) an argument that goes like this:

“Go is the best language for writing servers because X.”

“No, you neanderthal, Haskell is, because Y.”

“Well I’ve never needed Y! And Haskell is too out of touch with real problems.”

“But how can you tolerate such a limited and poorly-designed tool?”

“Well, I like to use it and I get stuff done, so what do you care?”

Rarely do these types of discussions change anyone’s mind. :)

People will argue all day long in a vacuum about which language is best, but languages are tools, and proper evaluation of a tool must be tied to a concrete context. Thus, the subject at hand: there are two types of servers. Specialized single-purpose servers differ from general business application servers.

Single-Purpose Infrastructure

Single-purpose infrastructure, like databases, request proxies, and messaging systems, emphasizes latency, throughput, reliability, and efficient use of hardware. Some examples are Redis, MySQL query proxies, and nginx. The choice of programming language matters to the extent that it facilitates the performance and reliability goals of the infrastructure. For these types of servers, Go is actually a great language. With Go, you get performance comparable to the JVM, a bunch of high-quality libraries, and the ability to plop your static binary onto any machine or container.

I’ll share an example from IMVU. IMVU’s primary content delivery system for all user-generated content, including 3D assets and user photos, was served by a pool of Perlbal proxies. This pool of machines served some of the highest traffic for the whole business, but the performance was not good and the code inside Perlbal was hard to maintain. Normally we wouldn’t have cared about the code except that we had discovered some race conditions in Perlbal. One of my coworkers finally got fed up and, within a couple weeks, replaced this entire part of our system with a rewrite in Go. Two machines (hooked up to 10g switches of course) running the Go implementation could serve all of the traffic previously allocated to the entire Perlbal pool. (We ended up running more instances for redundancy, however.) Not only was the new code faster, it was dramatically more maintainable. This was a big success story for Go.

Special-purpose servers usually have only a few skilled authors, clear goals, and eventually reach a point where they’re “done” and further maintenance is increasingly rare. (Though they might eventually be replaced with new infrastructure.)

Application Servers

On the other hand, every company has that server codebase with all the logic that makes the business go. You know the one. Where the code dates back to the founding of the company, and the programming language is whatever the founders liked at the time. :) It’s had hundreds of contributors from many backgrounds. Perhaps the original authors are no longer with the company. The needs of the code have evolved over time with the business, and after about four or five years it’s a person’s or team’s full-time job to manage large-scale refactoring.

I call these application servers, but I’ve heard the term “frontend server” used before too. Performance is a concern, but straight-line code execution is often less of an issue than efficient IO to the backend services.

Application servers are where the features live, and therefore they evolve with the needs of the business. The ability to define safe abstractions and refactor the code is more important than with special-purpose servers. Programming safety is also critical. You wouldn’t want somebody’s hack week feature to turn into a buffer overflow, runtime exception, corrupt data, or security vulnerability. Thus, type systems (or at least runtime invariant enforcement in languages like Python or PHP) are critical in application servers.

Go does not make a good language for application servers. The lack of immutable data makes it really awkward to enforce invariants. The lack of generics makes it awkward to express IO unless you generate code for each entity type in your database. Complicated arrangements of goroutines and channels are necessary to parallelize backend IO. Data structure traversals must be manually spelled out every time. Haskell, on the other hand, is brilliant for application servers (assuming your team is comfortable with the learning curve). Libraries like Haxl allow specification of business rules in natural imperative style, while automatically extracting IO parallelism from the structure of the code. In Go, timeouts must be implemented manually every time, whereas in Haskell there is a single, always-correct timeout function. Go also doesn’t have anything like Haskell’s mapConcurrently, and while it can be done manually, it’s tricky to remain correct in the presence of unhandled errors. (In a high-reliability system, you probably want to transfer a runtime error like division by zero or null pointer dereference to your request handler’s error handler so it can be logged appropriately, rather than shutting down your whole server.) With Haskell, you can enforce that unit tests are 100% reliable or that transactions are never nested.
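As a sketch of those last two points (fetchUser, UserId, and User here are hypothetical stand-ins, not IMVU’s actual infrastructure):

import Control.Concurrent.Async (mapConcurrently)
import System.Timeout (timeout)

type UserId = Int
newtype User = User { userName :: String }

-- Hypothetical backend call standing in for real IO.
fetchUser :: UserId -> IO User
fetchUser uid = return (User ("user" ++ show uid))

-- timeout wraps any IO action; mapConcurrently fans the fetches out across
-- lightweight threads and rethrows the first failure in the calling thread.
fetchUsersWithin :: Int -> [UserId] -> IO (Maybe [User])
fetchUsersWithin micros ids = timeout micros (mapConcurrently fetchUser ids)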

Even pre-generics Java was better than Go at defining invariants – with final, you could feel confident that a value is never changed after it’s initialized. This entire post by Dave Cheney could have been replaced with one sentence: “Too bad Go doesn’t support constant values in general.”

I’m not saying Go can’t be made to work for application servers, nor am I saying Haskell is necessarily the answer. Haskell has relatively poor usability characteristics, and I may write more about that later. But when you are trying to pick a language for a server, make sure you understand how it’s likely to evolve over time. A small group of people solving a focused problem is a completely different beast from hundreds of people hacking on business logic across years, so choose your language appropriately. :)

Dropbox Hack Week: GraphQL Server in Haskell

Last week was hack week at Dropbox. I took the opportunity to explore the implementation of a GraphQL server that does optimal IO batching and concurrency.

Serial IO

A common performance problem in the implementation of web services is serial IO. Let’s say you have a service that returns the names of all of your friends. It’s easy and natural to implement it like this:

def get_friend_names(my_user_id):
  me = fetch_user_info(my_user_id)
  friend_names = []
  for friend_id in me.friends:
    # one blocking round-trip per friend
    friend_names.append(fetch_user_info(friend_id).name)
  return friend_names

The problem is that each user info lookup for my friends occurs sequentially. Even with TCP connection pooling, that’s still a packet round-trip in the datacenter. “But that should be faster than 10 ms right?” Even if it is, it doesn’t take many friends to blow your performance budget and send your response time across the threshold of perceptible delay.

Moreover, this problem compounds on itself. If your inner functions have serial IO and you call them in a loop, you’ve just added potentially thousands of round-trips to your backend data sources. I would hazard a guess and say serial IO is the largest contributor to service response latencies.

Manually Batched IO

One solution is to always batch IO. Every function takes a list and returns a list. This can be made to work (indeed, I’ve achieved excellent performance by carefully, manually, batching) but doesn’t compose as your services scale. Sometimes it’s just too hard to know all of the data you will need to fetch, and the dependencies between that data.

OK, so serial IO is bad. More on that in a bit. Let’s talk about REST now.

REST

At IMVU, we went all-in on REST, building a rather substantial framework to make it easy to create REST services. We anticipated the fact that REST services tend to require many round-trips from clients, so we built a response denormalization system, where the service can anticipate the other services a client will need to hit and include their responses too.

This sounded great at first, and in fact was mostly an improvement from the prior state of affairs. But, at scale, REST has some challenging performance problems. For one, REST services have a defined schema, and they always return the data they advertise. As I mentioned, if a service doesn’t return all the data a client needs, the client needs to hit more services, increasing the number of client-to-server round-trips. In addition, because the service always returns the same set of data, it must query the backend database to fetch more data than the client even cares about.

This phenomenon tends to happen most frequently with core domain objects like users. Because users are so important, they accumulate relationships with so many pieces of data (e.g. list of subscribed experiments, contact lists, shopping carts, etc.), almost all of which is irrelevant to most clients.

Why GraphQL?

This is where GraphQL comes in. In GraphQL, the client specifies the data that it wants. There is only one request, even if the data is deeply nested, and the server only has to fetch exactly what’s needed.

Consider the query:

query HeroNameQuery {
    newhope_hero: hero(episode: NEWHOPE) {
        name
    }
    empire_hero: hero(episode: EMPIRE) {
        name
    }
    jedi_hero: hero(episode: JEDI) {
        name
    }
}

It looks up the hero of each of the first three Star Wars movies, fetches any information it needs from the backend, and returns only what is requested:

"data": {
    "HeroNameQuery": {
        "jedi_hero": {
            "name": "R2-D2"
        },
        "empire_hero": {
            "name": "Luke Skywalker"
        },
        "newhope_hero": {
            "name": "R2-D2"
        }
    }
}

There are GraphQL implementations for many languages but many of them don’t solve the serial IO problem I described to start this post. In fact, a naive GraphQL implementation might issue IO per field of each object requested.

For hack week, I wanted to explore the design of a GraphQL server that issued all of its backend IO in optimally concurrent batches.

Why Haskell?

Dropbox doesn’t use Haskell, but I find it to be a great language for exploring design spaces, particularly around execution models. Also, Facebook open sourced their excellent Haxl library, which converts code written with serial IO into efficient batched requests. Haxl provides an effect type that, when it can understand that two data fetches are independent, runs them both in parallel. Only when all Haxl operations are blocked on backend data fetches does it issue the batched requests to the backends. My prototype GraphQL resolvers are surprisingly naive, specified with sequential code. Haxl automatically batches up the requests and hands them to the DataSource for execution.
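To give a flavor of what “naive” means here, a resolver might look something like this (a hedged sketch: the FetchEpisode and FetchCharacter constructors match the trace shown below, but the surrounding types are my assumptions, not the prototype’s actual code):

-- Classic Haxl: GenHaxl u a is the fetch monad, and dataFetch issues a
-- request through a batching DataSource.
heroName :: Episode -> GenHaxl () Text
heroName episode = do
  heroId <- dataFetch (FetchEpisode episode)
  hero   <- dataFetch (FetchCharacter heroId)
  return (characterName hero)

-- The three lookups are independent, so Haxl batches the FetchEpisode
-- requests into one round and the distinct FetchCharacters into another.
allHeroNames :: GenHaxl () [Text]
allHeroNames = mapM heroName [NewHope, Empire, Jedi]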

In addition, there is nothing clever about the GraphQL request handler or graph traversal and filtering — all cleverness is handled by Haxl.

At this point, you might be thinking “So was anything challenging about this project?” On the data fetch side, no, not really. :) However, I did run into one unexpected snag when using Haxl for data updates: because Haxl tries really hard — and in a general, composable way — to run your IO in parallel, you must be careful about how writes and reads are sequenced together. If you leave out the () <- on line 105, Haskell sequences the operations with >> instead of >>=, and Haxl’s >> uses the Applicative bind operation instead of the Monad bind operation, and thus it assumes it can run them in parallel. And, as you might expect, issuing a write and read to the same data concurrently doesn’t end well. :)
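Concretely, the failure mode looks something like this (hypothetical names, not the prototype’s code):

-- Without binding the result, do-notation desugars through (>>), and Haxl's
-- Applicative (>>) may batch the write and the read together:
nameAfterUpdate :: GenHaxl () Text
nameAfterUpdate = do
  updateName userId "new name"    -- may run concurrently with the read below!
  fetchName userId

-- Binding the () result forces the Monad bind (>>=), so the write finishes first:
nameAfterUpdate' :: GenHaxl () Text
nameAfterUpdate' = do
  () <- updateName userId "new name"
  fetchName userId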

Conclusions

I am very thankful for jdnavarro’s excellent GraphQL query parser. With it, in four days, I was able to get a prototype GraphQL server up and running. Using Haxl and Hedis, I have it hooked up to a Redis data source, and it correctly batches all independent IO reads and runs them concurrently.

The Star Wars hero names query above results in two batched backend requests:

fetch star wars batch of size 3:
["FetchEpisode(Jedi)","FetchEpisode(Empire)","FetchEpisode(NewHope)"]
fetch star wars batch of size 2:
["FetchCharacter(1000)","FetchCharacter(2001)"]

You can even see that it noticed that R2-D2 is the hero of both movies, and only requested its info once.

The performance is pretty good: on some AWS instances, I measured about 3500 queries per second per machine and a query latency averaging 3 ms. Of course, that could worsen as permissions checks and so on are implemented. On the other hand, the code is completely unoptimized, full of lazy data structures and an unoptimized parser without a parser cache.

The prototype code is open sourced on the Dropbox GitHub.

It’s probably possible to build something like Haxl in Python with generators, but you’d have to give up standard loops and list comprehensions, instead using some kind of parallel map operation. You also would not benefit from anything that teases concurrency out of imperative functions like GHC’s upcoming ApplicativeDo extension. There are some things that Haskell’s restricted effects are uniquely good at. :)

I’d guess it would probably be even trickier to do a good implementation in Go, given that the programmer has less control over goroutines than over Python’s generators or Haskell’s monads. That said, perhaps someone will discover a clean, optimal implementation. We should pay attention to efforts like this.

I think GraphQL will be a huge deal for high-performance web services. It’s harder to implement than REST or traditional RPC services, but the reduction in response latencies could be substantial. Naive implementations aren’t going to have good performance, but if you are interested in aiming straight at the finish line, Haskell and Haxl will give you a head start.

Designing DFPL – The Broad Strokes

Why not Python?

I love Python. It’s easy to learn, predictable, and works well for a wide variety of programs. Python’s the first language where I regularly found myself writing programs and having them work on the first try. The problem with Python is that it breaks down at scale.

Performance

In my experience, whenever a Python codebase reaches a couple hundred thousand lines of code, performance becomes an issue. It’s well-known that Python’s straight-line perf is not even close to C or even Java. In addition, Python programs don’t make efficient use of memory: objects are large and sparse — programs consist of a lot of pointer-chasing. The garbage collector’s not great, and startup and module loading times become an issue at scale, especially for short-lived processes like unit tests, build scripts, or desktop applications.

There are Python JIT compilers, but JITs don’t really help with startup time or memory usage.

Parallelism

Python has a GIL, meaning it’s hard to write programs that make efficient use of the hardware. It’s possible to move CPU-bound work into C, but any kind of coordination between Python and C has to ping-pong across the GIL, which becomes a central bottleneck.

Launching multiple processes can help, but each process pays Python’s base process memory usage overhead, and they can’t easily share in-memory caches.

As we’ve seen from Haskell and Go, the ability to have convenient, lightweight, efficient parallelism is a big deal.

Maintainability

I talked about this in The Long-Term Problem With Dynamically Typed Languages. As codebases get huge, you really want to be able to lean on the computer in order to safely make changes. Type systems are mini proof assistants for expressing guarantees like "I never mix up binary data with textual data" or "I always check the case that optional values are empty". Unit tests help, but are "proof by example". Type systems add a meaningful layer of confidence, allowing software to scale to larger teams and faster changes.
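A minimal Haskell sketch of the kind of guarantee I mean (the greet function is hypothetical, but the types are standard):

import qualified Data.Text as T

-- Maybe forces the caller to handle the empty case before using the value;
-- likewise, distinct ByteString and Text types keep binary and textual data
-- from ever being mixed up.
greet :: Maybe T.Text -> T.Text
greet (Just name) = T.pack "hello, " <> name
greet Nothing     = T.pack "hello, stranger"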

Why not Haskell/Go/Rust/Scala/F#/Whatever?

I agree 100% with Patrick Walton here:

There is room for some statically-typed language to occupy the network-services niche where performance needs to be good—maybe 2x slower than optimal (note that this is not acceptable for all network services)—and speed of development trumps most other considerations. Some sort of hybrid of Swift, Rust, and Go, perhaps, as none of them are 100% ideal for this space.

The dynamic language space is relatively saturated at this point. Rust is doing a great job in the native machine performance space. F# and Swift and, to some degree, Scala are excellent choices for the .NET, Apple, and JVM ecosystems, respectively.

But as of yet, there’s nothing great in either the web frontend space or the application server space.

Haskell’s laziness, curried functions, and nonobvious operational semantics are not suitable for frontend code. While it’s an otherwise perfect language for writing application servers, its syntax is tragically divisive, and I have no hope that Haskell as it stands today will become mainstream.

Go’s type system somehow manages to be both verbose and not expressive enough. Go is sufficient for single-purpose servers, but, from first-hand experience, the lack of generics really hurts in application servers. (I probably owe an essay on the difference between the two types of servers.) The Go community tends to rely on idioms, but it would be better if the language were general and expressive enough to allow direct expression of things like timeouts and cancellation. Also, Go is not actually memory-safe. Go is especially frustrating because, if it had, say, generics and tuples instead of the more limited multiple-return-values, it would be a truly excellent programming language.

Rust’s concurrency is not lightweight enough, and the syntax is extremely verbose.

Scala has a significant platform dependency on the JVM. Check out the size of some Scala.js programs. Also Scala’s type system is insanely complex.

F# brings along a large .NET dependency. It also suffers from the ML syntax. (Actually, it has two syntaxes.)

Maybe Swift is great, but it was only just open sourced, and I’d be worried about the Obj-C platform dependency. I’ve yet to write any real Swift programs, but what I’ve seen looks great.

All of these languages are lovely and have their uses, but none are particularly satisfactory replacements for Go or TypeScript/JavaScript. That is, lightweight code that can run at high performance in a browser and on a concurrent, scalable server.

But, hold on, what about js_of_ocaml? This is where things get interesting. Andy used js_of_ocaml for a while and found that, on paper, OCaml has all the right properties: it’s safe, high-performance, expressive, concurrent, established, and it compiles to very concise, efficient JavaScript… In fact, it’s nearly perfect… if you can stomach the syntax, that is. As I mentioned previously, at this point I’m convinced that ML-family languages haven’t caught on largely for reasons of taste (or lack thereof — just look at the js_of_ocaml website’s font selection).

The Goals

So what are we trying to do with this thing?

The language doesn’t need anything new or innovative. ML gives us bidirectional type inference, sum types with pattern matching, and soundness. Haskell gives us overloading (i.e. type classes, i.e. type-indexed values). Go shows us how important a solid experience is, both for the first-time user and in daily use. Python shows us how it’s possible to connect predictable syntax and solid, composable semantics.

Thus, this is largely a human factors project. Our goal is to see if we can marry all of those things into one language.

Our goals are to have

  • Memory safety.
  • Static typing without heavyweight type annotations everywhere.
  • Language features that are composable.
  • A minimal runtime, with no large platform dependency.
  • Obvious execution semantics. Like imperative languages, you should be able to look at a program and understand its runtime behavior.
  • Tight, clean compilation to JavaScript. Should be a suitable alternative to TypeScript or CoffeeScript.
  • A reasonable replacement for Go or Node on the backend.
  • Cost-free abstractions.

Non-goals

  • C- or Rust-like performance. Rust is already doing a great job there.
  • Support for restricted effects. It’s not clear the usability cost is worth it.

Next time, I’ll either dive into one of the specific design decisions we’ve made, or talk about the language’s overall gestalt or texture.

Designing a Delightful Functional Programming Language

ML and Haskell have amazingly powerful type systems. Their type systems, and to some degree syntax, have widely influenced the modern crop of languages. Both can achieve machine performance. Haskell has fantastic concurrency. Both have concise, boilerplate-free syntax — some would say too concise. Both have the ability to build abstractions without impacting performance.

So why aren’t there more Haskell or ML programmers? Like, it’s fantastic that Rust and Swift borrowed some of their ideas, but why did it take two decades? Why do things like CoffeeScript and Go spread like wildfire instead? Go is especially bizarre because it’s not really very different from Java in many ways, and a lot worse in others.

I’m sure there are many reasons. Tooling is important. Marketing is important. Haskell is (or perhaps just was) largely a research language. The first-time user experience is important. I don’t claim to know every reason why some languages spread and others don’t, but I know at least one thing: syntax matters a whole lot more than we’d like to admit.

Consider all of the bracing and spacing and indentation arguments people have about C.

Remember that brilliant people have whined about the fact that Python uses whitespace to indicate nesting, as if that materially affects how much work they can get done.

Hell, CoffeeScript gained significant popularity by adding nothing but syntax to JavaScript. There were many opportunities to add actual features to CoffeeScript, like lightweight coroutine syntax, but CoffeeScript’s motto was “It’s just JavaScript.” Its popularity is based on the strength of the syntax alone.

There are two ways to interpret this phenomenon. One is “stupid humans and their quirks and biases, if only the uneducated masses would see the light and focus on what really matters”. Another is the realization that programming is a deeply human activity. A programming language lets humans take their human thoughts and map them into something a computer can execute. The aesthetics of that activity are important.

If a language is going to get adopted, at least without significant corporate backing, it needs to appeal to people’s tastes.

Back to Haskell and ML. They have their proponents, but only now are they starting to dance with the mainstream. It’s a shame, because software would be in a much better place if all the material benefits of ML and Haskell (bidirectional type inference, cheap and safe concurrency, sum types) had spread faster.

I’ll share some relevant experiences.

My personal OCaml experience occurred in university. One day I decided to play with this OCaml thing. My rationale is perhaps embarrassing, but I quit for two reasons: the addition operator for integers is + but for floats it’s +.. The other thing was the pervasive use of ;;, at least in the repl. Honestly, it felt tasteless, so I dropped it and moved on, never getting to see the real benefits (polymorphism, type inference, modules, sum types).

Now, Haskell. I was part of the effort to get Haskell adopted at IMVU. The benefits of using Haskell were simply too great to ignore. But when we tried to get everyone to learn it, a nontrivial fraction of engineers developed a visceral aversion. “I understand all the benefits but… I just hate it.” “Why do you hate it?” “The whitespace, $, the two let forms, the weird function names… I dunno, it’s just ugly and I hate it.”

It made me deeply sad that these seemingly superficial complaints prevented what should have been a slam dunk, at least when it came to safety and performance and maintainability and development speed.

When I joined Dropbox, I had some conversations with teammates about the benefits of something like Haskell (most people either don’t know about it or think it’s some academic obscurity), and I got a serious negative reaction to the idea of static types. “They’re heavyweight.” “They’re hard to learn.” “They hinder refactoring.” “They hurt development speed.”

None of these are necessarily factual. After I poked and prodded, I figured out that the resistance to the idea of static typing comes from prior exposure to things like C++ or Java (or Go), which have especially verbose syntaxes and type systems. This is similar to what happens with functional programming, where a common reaction to the idea is “Ugh, I hated my lisp class in college.”

These realizations led Andy and me to take on a project: design a programming language with all the expressive power and semantic correctness of an ML, yet with a syntax and aesthetic that would be instantly recognizable to your typical Go or JavaScript programmer. Surely it’s achievable. This is largely a human factors project – while the PL theory is important, we do not intend to significantly innovate there. Instead, the main innovation is applying familiarity and usability to a domain with the opposite reputation.

In subsequent posts, I will describe some of DFPL’s design decisions.

PL Usability: Name Fewer Things

If you follow me on Twitter or know me in person, you know I’ve been somewhat obsessed with programming languages lately. Why? In my career I’ve written high-performance desktop software, embedded controllers, 3D engines, scalable and low-latency web stacks, frontend JavaScript stacks… and at this stage in my life I’m tired of dealing with stupid human mistakes like null pointer exceptions, uninitialized data, unhandled cases, accidentally passing a customer ID where you meant an experiment ID, accidentally running some nontransactional code inside of a transaction, specifying timeouts with milliseconds when the function expects seconds…

All of these problems are solvable with type systems. Having learned and written a pile of Haskell at IMVU, I’m now perpetually disappointed with other languages. Haskell has a beautiful mix of a powerful type system, bidirectional type inference, and concise syntax, but it also has a reputation for being notoriously hard to teach.

In fact, getting Haskell adopted in the broader IMVU engineering organization was a significant challenge, with respect to both evangelization (why is this important) and education (how do we use it).

Why is that? If Haskell is so great, what’s the problem? Well, there are a few reasons, but I personally interviewed nearly a dozen IMVU engineers to get their thoughts about Haskell and a common theme arose again and again:

There are too many NAMES in Haskell. String, ByteString, Text. Functor, Applicative, Monad. map, fmap. foldl, foldl’, foldr. Pure let … in syntax vs. monadic let syntax. Words words words.

Some people are totally okay with this. They have no problem reading reference documentation, memorizing and assigning precise meaning to every name. This is just a hypothesis, but I wonder whether mathematically-oriented people tend to fall into this category. Either way, it’s definitely true that some people tolerate a proliferation of names less than others.

As someone who can’t remember anything, I naturally stumbled into a guiding principle for API design: express concepts with a minimal set of names. This necessitates that the concepts be composable.

And now I’m going to pick on a few specific examples.

Maybe

My first example is Haskell’s maybe function.

First of all, what kind of name is that. I bet that, if you didn’t know what it did, you wouldn’t be able to guess. (A classic usability metric is how accurately things can be guessed.)

Second of all, why? Haskell already has pattern matches. You don’t need a tiny special function to pattern match Maybes. Compare, for some m :: Maybe Int:

maybe (return ()) (putStrLn . show) m

with:

case m of
  Just i -> putStrLn (show i)
  Nothing -> return ()

Sure, the latter is a bit more verbose, but I’d posit the explicit pattern match is almost always more readable in context. Why? Because maybe is yet another word to memorize. You need to know the order of its arguments and its meaning. The standard library would be no weaker without this function. Multiple experienced Haskellers have told me to prefer it over explicit pattern matches, but it doesn’t optimize for readability over the long term.

Pattern matches are great! They always work the same way (predictability!), and they’re quite explicit about what’s going on.

pathwalk

For my second example, the last thing I want to do is pick on this contributor, because the use case was valid and the pull request meaningful. (In the end, we found another way to provide the same functionality, so the contributor did add value to the project.)

But what I want to show is why I rejected this pull request. pathwalk was intended to be a small, light, learnable API that anyone could drop into a project. It exposes one function, pathWalk, with three variants, one for early exit, one for accumulation of a result, and one that uses lazy IO for convenience. All of these are intended to be, if not memorizable, quite predictable. The pull request, on the other hand, added a bunch of variations with nonobvious names, as well as six different callback types. For an idea as simple as directory traversal, introducing a pile of exported symbols would have a negative impact on usability. Thus, to keep the library’s interface small and memorizable, I rejected the PR.

TypeScript

Now I’ll discuss the situation that led me to writing this post. My team at Dropbox recently adopted TypeScript to prevent a category of common mistakes. For our own sanity, we were already annotating parameters and return types — now we’re simply letting the computer verify correctness for us.

However, as we started to fill in type annotations for the program, a coworker of mine expressed some concern. "We shouldn’t write so many type definitions in this frontend code. JavaScript is a scripting language, not heavyweight like Java." That argument didn’t make sense to me. Surely it doesn’t matter whether we’re writing high-performance backend code or frontend code — type systems satisfy the same need in either situation. And all I was doing was annotating the parsed return type of a service request to prevent mistakes. The specific code looked something like this:

// at the top of the file
interface ParsedResponse {
  users: UserModel[];
  payloads: PayloadModel[];
}
 
// ...
 
// way down in the file
function makeServiceRequest(request: Request): Promise<ParsedResponse> {
  return new Promise((resolve, reject) => {
    // ...
    resolve({
      users: parsedUserModels,
      payloads: parsedPayloadModels,
    });
  });
}

At first I could not understand my coworker’s reaction. But I realized my coworker’s concerns weren’t about the types — after all, he would have documented the fields of the response in comments anyway, and, all else equal, who doesn’t want compiler verification that their code accesses valid properties?

What made him react so negatively was that we needed to come up with a name for this parsed response thing. Moreover, the definition of the response type was located far away from the function that used it. This adds some mental overhead. It’s another thing to look at and remember.

Fortunately, TypeScript uses structural typing (as opposed to nominal typing), and the advantage of structural typing is that types are compatible whenever their fields line up, so you can avoid naming intermediate types. Thus, if we change the code to:

function makeServiceRequest(request: Request): Promise<{
  users: UserModel[],
  payloads: PayloadModel[],
}> {
  return new Promise((resolve, reject) => {
    // ...
    resolve({
      users: parsedUserModels,
      payloads: parsedPayloadModels,
    });
  });
}

Now we can have type safety as well as the concision of a dynamic language. (And if we had bidirectional type inference, it would be even better. Alas, bidirectional type inference and methods are at odds with each other.)

Other Examples

git has the concept of an "index" where changes can be placed in preparation for an upcoming commit. However, the terminology around the index is inconsistent. Sometimes it’s referred to as the "cache", sometimes the "index", sometimes the "stage". I get it, "stage" is supposed to be the verb and "index" is the noun, but how does that help me remember which option I need to pass to git diff? It doesn’t help that "stage" and "index" each work as both a noun and a verb.

In the early 2000s I wrote a sound library, Audiere, which was praised for how easy it was to get up and running relative to other libraries at the time. I suspect some of this was due to needing to know only a couple concepts to play sounds or music. FMOD, on the other hand, required that you learn about devices and mixing channels and buffers. Now, all of these are useful concepts, and every serious sound programmer will have to deeply understand them in the end. But from the perspective of someone coming in fresh, they just want to play a sound. They don’t want to spend time studying a bunch of foundational concepts. Audiere had the deeper concepts too, but it was useful to provide a simpler API for the simple 80% use case.

Python largely gets this right. If you want to know the size of some x, for any x, you simply write len(x). You don’t need a different function for each data type.

To summarize, whenever you provide an API or language of some kind, think hard about the names of the ideas, and whether you can eliminate any. It will improve the usability of your system.

Haskell is Not a Purely Functional Language

Haskell has a strange reputation. Some of the best programmers I've known, coming to Haskell, have made ridiculous assumptions like "It must be pretty hard to write real programs in Haskell, given you can't have side effects." Of course that statement can't be true — there are many real programs written in Haskell, and a program without a side effect wouldn't be worth running, right?

Everyone describes Haskell as purely-functional. Even the haskell.org front page uses the phrase "purely-functional" within the first ten words. I argue that purely-functional, while in some ways accurate, gives the wrong impression about Haskell.

I propose instead that, for most programmers, it's better to think of Haskell as a language with restricted effects.

Out of the box, Haskell has two effect contexts, one at each extreme: pure functions and IO. Pure functions cannot[1] have side effects at all. Their outputs depend solely on their inputs; thus, they can be evaluated in any order[2] (or multiple times or never at all). IO actions, however, can have arbitrary side effects. Thus, they must be sequenced and run in order.

IO is unrestricted, but usefully restricted subsets of IO can be defined atop it. Consider software transactional memory. One reason STM works so well in Haskell compared to other languages is that it does not need to track reads and writes to all memory: only those inside of the STM execution context. STM is a restricted execution context – if arbitrary side effects were allowed in a transaction, then they might be run zero or multiple times, which would be unsafe. But there's another benefit of this restricted context: it means that transactions only need to track reads and writes to STM variables like TVars. Independent memory reads and writes, such as those that occur as part of the evaluation of pure functions, can be completely ignored — pure functions can have no side effects, so it's fine to reevaluate them as needed. This is why Haskell has some of the best STM support, as explained in Haskell is useless.
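For a taste, here's a minimal STM sketch (the standard stm library API; the example itself is mine):

import Control.Concurrent.STM

-- Only the TVar reads and writes below land in the transaction log; the
-- runtime can safely retry the whole transaction, including any pure
-- computation inside it, as often as it needs to.
transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to amount = do
  balance <- readTVar from
  check (balance >= amount)          -- blocks and retries until this holds
  writeTVar from (balance - amount)
  modifyTVar' to (+ amount)

-- Run it from IO with: atomically (transfer a b 10)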

Restricted effect contexts are powerful. IMVU uses them to guarantee unit tests are deterministic. They also let you guarantee that, for example, a Redis transaction cannot issue a MySQL query or write to a file, because the Redis execution engine may arbitrarily retry or abort the transaction.

OK, so Haskell has restricted effects. Restricted effects are a useful, though quite uncommon, language feature.

Should it be called a "restricted effects language"? No, I don't think so. It is totally possible to write all of your code in IO with imperative variable updates and mutable data structures, just like you would in Python or C. It's also fine to write all of your code with pure functions invoked from a minimal main function. That is, Haskell supports many styles of programming.
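For instance, this is thoroughly imperative Haskell (my example, not from any particular codebase):

import Data.IORef

-- A mutable accumulator updated in a loop, much as you'd write it in Python or C.
main :: IO ()
main = do
  total <- newIORef (0 :: Int)
  mapM_ (\n -> modifyIORef' total (+ n)) [1 .. 100]
  readIORef total >>= print    -- prints 5050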

One word or phrase is simply not sufficient to describe anything meaningfully complex, such as a programming language. Languages consist of many features.

In one of my favorite papers, Teaching Programming Languages in a Post-Linnean Age, Shriram Krishnamurthi argues that programming language paradigms are outdated — languages are designed with interacting sets of features and should be evaluated as such.

Programming language “paradigms” are a moribund and tedious legacy of a bygone age. Modern language designers pay them no respect, so why do our courses slavishly adhere to them? This paper argues that we should abandon this method of teaching languages, offers an alternative, reconciles an important split in programming language education, and describes a textbook that explores these matters.

The problem with using paradigms to describe a language is that they come across as exclusionary. "This language is functional, not imperative." "This language is object-oriented, not functional." However, almost all languages support a wide range of programming styles.

So let's compare a few languages broken down into a mix of individual language features:

| Feature | Java | Haskell | Go | Rust | C | Python |
|---------|------|---------|----|------|---|--------|
| Memory safety | Yes | Yes | Kinda[3] | Yes | No | Yes |
| Garbage collection | Yes | Yes | Yes | No | No | Yes |
| Precise control over memory layout | No | No | No | Yes | Yes | No |
| Concurrency model | 1:1 | M:N | M:N | 1:1 | 1:1 | GIL |
| Obvious translation to machine code | No | No | Somewhat | Yes | Yes | No |
| Generic programming[4] | Yes | Yes | Kinda, via interface{} | Yes | No | Yes |
| Mutable data structures | Yes | Yes | Yes | Yes | Yes | Yes |
| Higher-order functions | Kinda[5] | Yes | Yes | Yes | No | Yes |
| Sum types | No | Yes | No | Yes | No | No |
| Runtime eval | Kinda[6] | No | No | No | No | Yes |
| Type inference | Limited | Yes | Limited | Limited | No | No |
| Tail calls | No | Yes | No | No | No | No |
| Exceptions | Yes | Yes | Kinda[7] | No | No | Yes |
| "Goroutines", e.g. user threads | No[8] | Yes | Yes | No | No | Yes |
| Function overloading | Yes | Yes | No | No | No | No |
| Costless abstractions | Depends on JIT | Yes | No | Yes | Kinda (macros) | No |

The chart above is pretty handwavy, and we could quibble on the details, but my point is that languages should be compared using feature sets rather than paradigms. With feature sets it's possible to see which languages occupy similar points in the overall language design space. Haskell and Rust have related type systems, and Haskell and Java and Go have similar runtime semantics. It should be clear, then, that there is no taxonomy as simple as "functional", "imperative", or "object-oriented" that accurately classifies the differences between these languages.

(An aside: Whether a language supports a feature is often fuzzy, especially because some features can be broken apart into several aspects. Also, you can sometimes approximate one feature with others. For example, since Go doesn't support sum types, you can use type switches instead, at the cost of exhaustiveness checks and efficiency[9].)

"Purely-functional" is a description so generic that it doesn't fully convey what is interesting about Haskell. While Haskell has great support for functional programming, it has uniquely-powerful support for imperative programming too.

I recommend avoiding using the terms "functional language", "procedural language", "imperative language", "object-oriented language", and whatever other paradigms that have been popularized over the years.

Thanks to Imran Hameed for his feedback and suggestions. Of course, however, any mistakes are mine.

[1] Except via unsafe, user-beware mechanisms like unsafePerformIO and unsafeDupablePerformIO.

[2] I'm hand-waving a bit. If evaluating a value throws an exception, the exception must not be thrown unless the value is needed.

[3] Go is memory-safe except when one thread writes to an interface variable or slice and another reads from it.

[4] By generic programming I mean the ability to conveniently express, in the language, a computation that can operate on many different types of input. Any dynamic language will have this property because all values inhabit the same type. Typed languages with generic type systems also have this property. Go doesn't, so you must use interface{} with explicit casts or code generation to implement generic algorithms.

[5] The JVM doesn't really support higher-order functions; Java's lambdas are sugar for objects implementing single-method interfaces.

[6] In the JVM, code can be loaded dynamically with a ClassLoader. But runtime implementations don't generally ship with a Java language compiler.

[7] Go has panic and recover which are roughly equivalent to try…catch but with manual exception filtering and rebubbling.

[8] Java has a user threading mechanism implemented with bytecode translation in a library called Quasar.

[9] A direct encoding for sum types can be unboxed, smaller than a word, and only require one indirection to access the payload. The equivalent Go using type switches and interfaces would use two words for the interface pointer and an additional indirection.

Sum Types Are Coming: What You Should Know

Just as all mainstream languages now have lambda functions, I predict sum types are the next construct to spread outward from the typed functional programming community to the mainstream. Sum types are very useful, and after living with them in Haskell for a while, I miss them deeply when using languages without them.

Fortunately, sum types seem to be catching on: both Rust and Swift have them, and it sounds like TypeScript’s developers are at least open to the idea.

I am writing this article because, while sum types are conceptually simple, most programmers I know don’t have hands-on experience with them and so don’t have a good sense of their usefulness.

In this article, I’ll explain what sum types are, how they’re typically represented, and why they’re useful. I will also dispel some common misconceptions that cause people to argue sum types aren’t necessary.

What is a Sum Type?

Sum types can be explained a couple ways. First, I’ll compare them to product types, which are extremely familiar to all programmers. Then I’ll show how sum types look (unsafely) implemented in C.

Every language has product types – tuples and structs or records. They are called product types because they’re analogous to the cartesian products of sets. That is, int * float is the set of pairs of values (int, float). Each pair contains an int AND a float.

If product types correspond to AND, sum types correspond to OR. A sum type indicates a value that is either X or Y or Z or …

Let's take a look at enumerations and unions and see how sum types are a safe generalization of the two. Sum types are a more general form of enumerations. Consider C enums:

enum Quality {
  LOW,
  MEDIUM,
  HIGH
};

A value of type Quality must be one of LOW, MEDIUM, or HIGH (excepting uninitialized data or unsafe casts — we are talking about C of course). C also has a union language feature:

union Event {
  struct ClickEvent ce;
  struct PaintEvent pe;
};

ClickEvent and PaintEvent share the same storage location in the union, so if you write to one, you ought not read from the other. Depending on the version of the C or C++ specification, the memory will either alias or your program will have undefined behavior. Either way, at any point in time, it’s only legal to read from one of the components of a union.

A sum type, sometimes called a discriminated union or tagged variant, is a combination of a tag (like an enum) and a payload per possibility (like a union).

In C, to implement a kind of sum type, you could write something like:

enum EventType {
  CLICK,
  PAINT
};

struct ClickEvent {
  int x, y;
};
struct PaintEvent {
  Color color;
};

struct Event {
  enum EventType type;
  union {
    struct ClickEvent click;
    struct PaintEvent paint;
  };
};

The type field indicates which event struct is legal to access. Usage looks as follows:

// given some Event event
switch (event.type) {
  case CLICK:
    handleClickEvent(&event.click);
    break;
  case PAINT:
    handlePaintEvent(&event.paint);
    break;
  // most compilers will let us know if we didn't handle every event type
}

However, there is some risk here. Nothing prevents code from accessing .paint in the case that type is CLICK. At all times, every possible field in event is visible to the programmer.

A sum type is a safe formalization of this idea.

Sum Types are Safe

Languages like ML, Haskell, F#, Scala, Rust, Swift, and Ada provide direct support for sum types. I’m going to give examples in Haskell because the Haskell syntax is simple and clear. In Haskell, our Event type would look like this:

data Event = ClickEvent Int Int
           | PaintEvent Color

That syntax can be read as follows: there is a data type Event that contains two cases: it is either a ClickEvent containing two Ints or a PaintEvent containing a Color.

A value of type Event contains two parts: a small tag describing whether it's a ClickEvent or PaintEvent, followed by a payload that depends on the specific case. If it's a ClickEvent, the payload is two integers. If it's a PaintEvent, the payload is one color. The physical representation in memory would look something like [CLICK_EVENT_TAG][Int][Int] or [PAINT_EVENT_TAG][Color], much like our C code above. Some languages can even store the tag in the low bits of a pointer, which is more efficient still.

Now, to see what type of Event a value contains, and to read the event’s contents, you must pattern match on it.

-- given some event :: Event
case event of
  ClickEvent x y -> handleClickEvent x y
  PaintEvent color -> handlePaintEvent color

Sum types, paired with pattern matching, provide nice safety guarantees. You cannot read x out of an event without first verifying that it’s a ClickEvent. You cannot read color without verifying it’s a PaintEvent. Moreover, the color value is only in scope when the event is known to be a PaintEvent.

Sum Types are General

We’ve already discussed how sum types are more general than simple C-style enumerations. In fact, in a simple enumeration, since none of the options have payloads, the sum type can be represented as a single integer in memory. The following DayOfWeek type, for example, can be represented as efficiently as the corresponding C enum would.

data DayOfWeek = Sunday | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday

Sum types can also be used to create nullable data types like C pointers or Java references. Consider F#’s option, or Rust’s Option, or Haskell’s Maybe:

data Maybe a = Nothing | Just a

(a is a generic type variable – that is, you can have a Maybe Int or Maybe Customer or Maybe (Maybe String)).

Appropriate use of Maybe comes naturally to programmers coming from Java or Python or Objective-C — it's just like using NULL or None or nil instead of an object reference, except that the type signature of a data type or function indicates whether a value is optional or not.

When nullable references are replaced by explicit Maybe or Option, you no longer have to worry about NullPointerExceptions, NullReferenceExceptions, and the like. The type system enforces that required values exist and that optional values are safely pattern-matched before they can be dereferenced.

Imagine writing some code that closes whatever window has focus. The C++ would look something like this:

Window* window = get_focused_window();
window->close_window();

Oops! What if there is no focused window? Boom. The fix:

Window* window = get_focused_window();
if (window) {
  window->close_window();
}

But then someone could accidentally call a different method on window outside of the if statement… reintroducing the problem. To help mitigate this possibility, C++ does allow introducing names inside of a conditional:

if (Window* window = get_focused_window()) {
  window->close_window();
}

Pattern matching on Maybe avoids this problem entirely. There’s no way to even call close_window unless an actual window is returned, and the variable w is never bound unless there is an actual focused window:

window <- get_focused_window
case window of
  Nothing -> return ()
  Just w -> close_window w

This is a big win for correctness and clarity.

Thinking in Sum Types

Once you live with sum types for a while, they change the way you think. People coming from languages like Python or Java (myself included) to Haskell immediately gravitate towards tuples and Maybe since they’re familiar. But once you become accustomed to sum types, they subtly shift how you think about the shape of your data.

I’ll share a specific memorable example. At IMVU we built a Haskell URL library and we wanted to represent the "head" of a URL, which includes the optional scheme, host, username, password, and port. Everything before the path, really. This data structure has at least one important invariant: it is illegal to have a scheme with no host. But it is possible to have a host with no scheme in the case of protocol-relative URLs.

At first I structured UrlHead roughly like this:

type SchemeAndHost = (Maybe Scheme, Host)
type UrlHead = Maybe (Maybe SchemeAndHost)
-- to provide some context, the complete URL type follows
type Url = (UrlHead, [PathSegment], QueryString)

In hindsight, the structure is pretty ridiculous and hard to follow. But the idea is that, if the URL head is Nothing, then the URL is relative. If it’s Just Nothing, then the path is treated as absolute. If it’s Just (Just (Nothing, host)), then it’s a protocol-relative URL. Otherwise it’s a fully-qualified URL, and the head contains both a scheme and a host.

However, after I started to grok sum types, a new structure emerged:

data UrlHead
  = FullyQualified ByteString UrlHost
  | ProtocolRelative UrlHost
  | Absolute
  | Relative

Now the cases are much clearer. And they have explicit names and appropriate payloads!

Sum Types You Already Know

There are several sum types that every programmer has already deeply internalized. They’ve internalized them so deeply that they no longer think about the general concept. For example, "This variable either contains a valid reference to class T or it is null." We’ve already discussed optional types in depth.

Another common example is when functions can return failure conditions. A fallible function either returns a value or it returns some kind of error. In Haskell, this is usually represented with Either, where the error is on the Left. Similarly, Rust uses the Result type. It’s relatively rare, but I’ve seen Python functions that either return a value or an object that derives from Exception. In C++, functions that need to return more error information will usually return the error status by value and, in the case of success, copy the result into an out parameter.
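
For example, a minimal sketch of the Either convention (the function name is mine, for illustration):

-- Errors go on the Left, successful results on the Right.
safeDivide :: Int -> Int -> Either String Int
safeDivide _ 0 = Left "division by zero"
safeDivide x y = Right (x `div` y)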

Obviously languages with exceptions can throw an exception, but exceptions aren’t a general error-handling solution for the following reasons:

  1. What if you temporarily want to store the result or error? Perhaps in a cache or promise or async job queue (see the sketch after this list). Using sum types allows sidestepping the complicated issue of exception transferability.
  2. Some languages either don’t have exceptions or limit where they can be thrown or caught.
  3. If used frequently, exceptions generally have worse performance than simple return values.

I’m not saying exceptions are good or bad – just that they shouldn’t be used as an argument for why sum types aren’t important. :)
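
To make the first point concrete, here's a minimal sketch of storing outcomes, successes and failures alike, in a cache (all of these names are mine, for illustration):

import qualified Data.Map as Map

data JobError = Crashed | TimedOut deriving (Show)

-- The outcome of a job is an ordinary value, so caching it requires no
-- special exception-transfer machinery.
type ResultCache = Map.Map Int (Either JobError String)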

Another sum type many people are familiar with is the value type in dynamic languages like JavaScript. A JavaScript value is one of many things: either undefined or null or a boolean or a number or an object or… In Haskell, the JavaScript value type would be defined approximately as such:

data JSValue = Undefined
             | Null
             | JSBool Bool
             | JSNumber Double
             | JSString Text
             | JSObject (HashMap Text JSValue)

I say approximately because, for the sake of clarity, I left out all the other junk that goes on a JSObject too, like whether it's an array, its prototype, and so on. ;)

JSON is a little simpler to define:

data JSONValue = Null
               | Bool Bool
               | String Text
               | Number Double
               | Array [JSONValue]
               | Object (HashMap Text JSONValue)

Notice this type is recursive — Arrays and Objects can refer to other JSONValues.
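
To make the recursion concrete, here's a small value of the JSONValue type above, equivalent to the JSON text {"xs": [1, 2], "ok": true} (assuming OverloadedStrings and the HashMap from unordered-containers):

{-# LANGUAGE OverloadedStrings #-}
import qualified Data.HashMap.Strict as HashMap

example :: JSONValue
example = Object (HashMap.fromList
    [ ("xs", Array [Number 1, Number 2])
    , ("ok", Bool True)
    ])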

Protocols, especially network protocols, are another situation where sum types frequently come up. Network packets will often contain a bitfield of some sort describing the type of packet, followed by a payload, just like discriminated unions. This same structure is also used to communicate over channels or queues between concurrent processes. Some example protocol definitions modeled as sum types:

data JobQueueCommand = Quit | LogStatus | RunJob Job
data AuthMethod = Cookie Text | Secret ByteString | Header ByteString

Sum types also come up whenever sentinel values are needed in API design.

Approximating Sum Types

If you’ve ever tried to implement a JSON AST in a language like C++ or Java or Go you will see that the lack of sum types makes safely and efficiently expressing the possibilities challenging. There are a couple ways this is typically handled. The first is with a record containing many optional values.

struct JSONValue {
  JSONBoolean* b;
  JSONString* s;
  JSONNumber* n;
  JSONArray* a;
  JSONObject* o;
};

The implied invariant is that only one value is defined at a time. (And perhaps, in this example, JSON null is represented by all pointers in JSONValue being null.) The limitation here is that nothing stops someone from making a JSONValue where, say, both a and o are set. Is it an array? Or an object? The invariant is broken, so it's ambiguous. This costs us some type safety. This approximation, by the way, is equivalent to Go's errors-as-multiple-return-values idiom. Go functions return a result and an error, and it's assumed (but not enforced) that only one is valid at a time.

Another approach to approximating sum types is using an interface and classes like the following Java:

interface JSONValue {}
class JSONBoolean implements JSONValue { bool value; }
class JSONString implements JSONValue { String value; }
class JSONArray implements JSONValue { ArrayList<JSONValue> elements; }
// and so on

To check the specific type of a JSONValue, you need a runtime type lookup, something like C++’s dynamic_cast or Go’s type switch.

This is how Go, and many C++ and Java JSON libraries, represent the AST. The reason this approach isn’t ideal is because there’s nothing stopping anyone from deriving new JSONValue classes and inserting them into JSON arrays or objects. This weakens some of the static guarantees: given a JSONValue, the compiler can’t be 100% sure that it’s only a boolean, number, null, string, array, or object, so it’s possible for the JSON AST to be invalid. Again, we lose type safety without sum types.

There is a third approach for implementing sum types in languages without direct support, but it involves a great deal of boilerplate. In the next section I’ll discuss how this can work.

The Expression Problem

Sometimes, when it comes up that a particular language doesn’t support sum types (as most mainstream languages don’t), people make the argument "You don’t need sum types, you can just use an interface for the type and a class for each constructor."

That argument sounds good at first, but I'll explain how interfaces and sum types have different (and somewhat opposite) use cases.

As I mentioned in the previous section, it’s common in languages without sum types, such as Java and Go, to represent a JSON AST as follows:

interface JSONValue {}
class JSONBoolean implements JSONValue { bool value; }
class JSONString implements JSONValue { String value; }
class JSONNumber implements JSONValue { double value; }
class JSONArray implements JSONValue { ArrayList<JSONValue> elements; }
class JSONObject implements JSONValue { HashMap<String, JSONValue> properties; }

As I also mentioned, this structure does not rigorously enforce that the ONLY thing in, say, a JSON array is a null, a boolean, a number, a string, an object, or another array. Some other random class could derive from JSONValue, even if it’s not a sensible JSON value. The JSON encoder wouldn’t know what to do with it. That is, interfaces and derived classes here are not as type safe as sum types, as they don’t enforce valid JSON.

With sum types, given a value of type JSONValue, the compiler and programmer know precisely which cases are possible, so any code in the program can safely and completely enumerate the possibilities. We can use JSONValue anywhere in the program without modifying the cases at all. But if we add a new case to JSONValue, then we potentially have to update all uses. That is, it is much easier to use the sum type in new situations than to modify the list of cases. (Imagine how much code you'd have to update if someone said "Oh, by the way, all pointers in this Java program can have a new state: they're either null, valid, or lazy, in which case you have to force them." Remember that nullable references are a limited form of sum types; such a change would require a bloodbath of code updates across every Java program ever written.)

The opposite situation occurs with interfaces and derived classes. Given an interface, you don’t know what class implements it — code consuming an interface is limited to the API provided by the interface. This gives a different degree of freedom: it’s easy to add new cases (e.g. classes deriving from the interface) without updating existing code, but your uses are limited to the functionality exposed by the interface. To add new methods to the interface, all existing implementations must be updated.

To summarize:

  • With sum types, it’s easy to add uses but hard to add cases.
  • With interfaces, it’s easy to add cases but hard to add uses.

Each has its place and neither is a great substitute for the other. This is known as The Expression Problem.

To show why the argument that sum types can be replaced with interfaces is weak, let us reduce the scenario to the simplest non-enumeration sum type: Maybe.

interface Maybe {
}

class Nothing implements Maybe {
}

class Just implements Maybe {
  Object value;
}

What methods should Maybe have? Well, really, all you can do is run different code depending on whether the Maybe is a Nothing or a Just.

interface MaybeVisitor {
  void visitNothing();
  void visitJust(Object value);
}

interface Maybe {
  void visit(MaybeVisitor visitor);
}

This is the visitor pattern, which is another way to approximate sum types in languages without them. Visitor has the right maintainability characteristics (easy to add uses, hard to add cases), but it involves a great deal of boilerplate. It also requires two indirect function calls per pattern match, so it’s dramatically less efficient than a simple discriminated union would be. On the other hand, direct pattern matches of sum types can be as cheap as a tag check or two.

Another reason visitor is not a good replacement for sum types in general is that the boilerplate is onerous enough that you won’t start "thinking in sum types". In languages with lightweight sum types, like Haskell and ML and Rust and Swift, it’s quite reasonable to use a sum type to reflect a lightweight bit of user interface state. For example, if you’re building a chat client, you may represent the current scroll state as:

type DistanceFromBottom = Int
data ScrollState = ScrolledUp DistanceFromBottom | PeggedToBottom

This data type only has a distance from bottom when scrolled up, not pegged to the bottom. Building a visitor just for this use case is so much code that most people would sacrifice a bit of type safety and instead simply add two fields.

bool scrolledUp;
int distanceFromBottom; // only valid when scrolledUp

Another huge benefit of pattern matching sum types over the visitor pattern is that pattern matches can be nested or have wildcards. Consider a function that can either return a value or some error type. Haskell convention is that errors are on the Left and values are on the Right branch of an Either.

data FetchError = ConnectionError | PermissionError String | JsonDecodeError
fetchValueFromService :: IO (Either FetchError Int)

-- Later on...

result <- fetchValueFromService
case result of
  Right value -> processResult value
  Left ConnectionError -> error "failed to connect"
  Left (PermissionError username) -> error ("try logging in, " ++ username)
  Left JsonDecodeError -> error "failed to decode JSON"
  _ -> error "unknown error"

Expressing this with the visitor pattern would be extremely painful.

Paul Koerbitz comes to a similar conclusion.

Named Variants? or Variants as Types?

Now I'd like to talk a little about how sum types are specified, from a language design perspective.

Programming languages that implement sum types have to decide how the ‘tag’ of the sum type is represented in code. There are two main approaches languages take. Either the cases are given explicit names or each case is specified with a type.

Haskell, ML, Swift, and Rust all take the first approach. Each case in the type is given a name. This name is not a type – it’s more like a constant that describes which ‘case’ the sum type value currently holds. Haskell calls the names "type constructors" because they produce values of the sum type. From the Rust documentation:

enum Message {
    Quit,
    ChangeColor(i32, i32, i32),
    Move { x: i32, y: i32 },
    Write(String),
}

Quit and ChangeColor are not types. They are values. Quit is a Message by itself, but ChangeColor is a function taking three ints and returning a Message. Either way, the names Quit, ChangeColor, Move, and Write indicate which case a Message contains. These names are also used in pattern matches. Again, from the Rust documentation:

fn quit() { /* ... */ }
fn change_color(r: i32, g: i32, b: i32) { /* ... */ }
fn move_cursor(x: i32, y: i32) { /* ... */ }
match msg {
    Message::Quit => quit(),
    Message::ChangeColor(r, g, b) => change_color(r, g, b),
    Message::Move { x: x, y: y } => move_cursor(x, y),
    Message::Write(s) => println!("{}", s),
};

The other way to specify the cases of a sum type is to use types themselves. This is how C++’s boost.variant and D’s std.variant libraries work. An example will help clarify the difference. The above Rust code translated to C++ would be:

struct Quit {};
struct ChangeColor { int r, g, b; };
struct Move { int x; int y; };
struct Write { std::string message; };
typedef variant<Quit, ChangeColor, Move, Write> Message;

msg.match(
  [](const Quit&) { quit(); },
  [](const ChangeColor& cc) { change_color(cc.r, cc.g, cc.b); },
  [](const Move& m) { move_cursor(m.x, m.y); },
  [](const Write& w) { std::cout << w.message << std::endl; }
);

Types themselves are used to index into the variant. There are several problems with using types to specify the cases of sum types. First, it’s incompatible with nested pattern matches. In Haskell I could write something like:

data MouseButton = LeftButton | RightButton | MiddleButton | ExtraButton Int
data MouseEvent = MouseDown MouseButton Int Int | MouseUp MouseButton Int Int | MouseMove Int Int

-- ...

case mouseEvent of
  MouseDown LeftButton x y -> beginDrag x y
  MouseUp LeftButton x y -> endDrag x y

In C++, using the variant<> template described above, I’d have to do something like:

struct LeftButton {};
struct RightButton {};
struct MiddleButton {};
struct ExtraButton { int b; };
typedef variant<LeftButton, RightButton, MiddleButton, ExtraButton> MouseButton;

struct MouseDown { MouseButton button; int x; int y; };
struct MouseUp { MouseButton button; int x; int y; };
struct MouseMove { int x; int y; };
typedef variant<MouseDown, MouseUp, MouseMove> MouseEvent;

// given: MouseEvent mouseEvent;

mouseEvent.match(
  [](const MouseDown& event) {
    event.button.match([&](const LeftButton&) {
      beginDrag(event.x, event.y);
    });
  },
  [](const MouseUp& event) {
    event.button.match([&](const LeftButton&) {
      endDrag(event.x, event.y);
    });
  }
);

You can see that, in C++, you can’t pattern match against MouseDown and LeftButton in the same match expression.

(Note: It might look like I could compare with == to simplify the code, but in this case I can’t because the pattern match extracts coordinates from the event. That is, the coordinates are a "wildcard match" and their value is irrelevant to whether that particular branch runs.)

Also, it’s so verbose! Most C++ programmers I know would give up some type safety in order to fit cleanly into C++ syntax, and end up with something like this:

struct Button {
  enum ButtonType { LEFT, MIDDLE, RIGHT, EXTRA } type;
  int b; // only valid if type == EXTRA
};

struct MouseEvent {
  enum EventType { MOUSE_DOWN, MOUSE_UP, MOUSE_MOVE } type;
  Button button; // only valid if type == MOUSE_DOWN or type == MOUSE_UP
  int x;
  int y;
};

Using types to index into variants is attractive – it doesn’t require adding any notion of type constructors to the language. Instead it uses existing language functionality to describe the variants. However, it doesn’t play well with type inference or pattern matching, especially when generics are involved. If you pattern match using type names, you must explicitly spell out each fully-qualified generic type, rather than letting type inference figure out what is what:

auto result = some_fallible_function();
// result is an Either<Error, std::map<std::string, std::vector<int>>>
result.match(
  [](Error& e) {
    handleError(e);
  },
  [](std::map<std::string, std::vector<int>>& result) {
    handleSuccess(result);
  }
);

Compare to the following Haskell, where the error and success types are inferred and thus implicit:

result <- some_fallible_action
case result of
  Left e ->
    handleError e
  Right result ->
    handleSuccess result

There’s an even deeper problem with indexing variants by type: it becomes illegal to write variant<int, int>. How would you know if you’re referring to the first or second int? You might say "Well, don’t do that", but in generic programming that can be difficult or annoying to work around. Special-case limitations should be avoided in language design if possible – we’ve already learned how annoying void can be in generic programming.
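
For contrast, named constructors make two cases with identical payloads perfectly unambiguous. A tiny Haskell sketch (the type is mine, for illustration):

-- Both cases carry an Int, but the names keep them distinct.
data Offset = FromStart Int | FromEnd Int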

These are all solid reasons, from a language design perspective, to give each case in a sum type an explicit name. This could address many of the concerns raised with respect to adding sum types to the Go language. (See also this thread.) The Go FAQ specifically calls out that sum types are not supported in Go because they interact confusingly with interfaces, but that problem is entirely sidestepped by named type constructors. (There are other reasons retrofitting sum types into Go at this point is challenging, but their interaction with interfaces is a red herring.)

It’s likely not a coincidence that languages with sum types and type constructors are the same ones with pervasive type inference.

Summary

I hope I’ve convinced you that sum types, especially when paired with pattern matching, are very useful. They’re easily one of my favorite features of Haskell, and I’m thrilled to see that new languages like Rust and Swift have them too. Given their utility and generality, I expect more and more languages to grow sum types in one form or another. I hope the language authors do some reading, explore the tradeoffs, and similarly come to the conclusion that Haskell, ML, Rust, and Swift got it right and should be copied, especially with respect to named cases rather than "union types". :)

To summarize, sum types:

  • provide a level of type safety not available otherwise.
  • have an efficient representation, more efficient than vtables or the visitor pattern.
  • give programmers an opportunity to clearly describe possibilities.
  • with pattern matching, provide excellent safety guarantees.
  • are an old idea, and are finally coming back into mainstream programming!

I don’t know why, but there’s something about the concept of sum types that makes them easy to dismiss, especially if you’ve spent your entire programming career without them. It takes experience living in a sum types world to truly internalize their value. I tried to use compelling, realistic examples to show their utility and I hope I succeeded. :)

Endnotes

Terminology

In this article, I’ve used the name sum type, but tagged variant, tagged union, or discriminated union are fine names too. The phrase sum type originates in type theory and is a denotational description. The other names are operational in that they describe the implementation strategy.

Terminology is important though. When Rust introduced sum types, they had to name them something. They happened to settle on enum, which is a bit confusing for people coming from languages where enums cannot carry payloads. There’s a corresponding argument that they should have been called union, but that’s confusing too, because sum types aren’t about sharing storage either. Sum types are a combination of the two, so neither keyword fits exactly. Personally, I’m partial to Haskell’s data keyword because it is used for both sum and product types, sidestepping the confusion entirely. :)

More Reading

If you’re convinced, or perhaps not yet, and you’d like to read more, some great articles have been written about the subject:

Wikipedia has two excellent articles, distinguishing between Tagged Unions and Algebraic Data Types.

Sum types and pattern matching go hand in hand. Wikipedia, again, describes pattern matching in general.

TypeScript’s union types are similar to sum types, though they don’t use named type constructors.

FP Complete has another nice introduction to sum types.

For what it’s worth, Ada has had a pretty close approximation of sum types for decades, but it did not spread to other mainstream languages. Ada’s implementation isn’t quite type safe, as accessing the wrong case results in a runtime error, but it’s probably close enough to safe in practice.

Much thanks goes to Mark Laws for providing valuable feedback and corrections. Of course, any errors are my own.

Announcing BufferBuilder: Encode JSON in Haskell 2-5x faster than Aeson

Conclusion

This tale is a little long and detailed, so for those of you who want immediate payoff…

Andy and I wrote a Haskell library called Data.BufferBuilder (including Data.BufferBuilder.Utf8 and Data.BufferBuilder.Json) which makes it easy to safely and efficiently encode Haskell data structures into JSON. In our benchmarks, using Data.BufferBuilder.Json to encode JSON is 4-5x as fast as using Aeson.

Even if you’re already using Aeson, you can benefit somewhat from BufferBuilder’s improved encoding performance. The buffer-builder-aeson package adds a ToJson instance for Aeson’s Value type, which our benchmarks show is 50% to 100% faster than Aeson’s built-in encoder. All you need to do is call Data.BufferBuilder.Json.encodeJson instead of Data.Aeson.encode!
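
The swap looks something like this sketch (assuming both packages are installed; note that encodeJson produces a strict ByteString, whereas Data.Aeson.encode produces a lazy one):

import qualified Data.Aeson as Aeson
import Data.BufferBuilder.Json (encodeJson)
import Data.ByteString (ByteString)

-- buffer-builder-aeson provides the ToJson instance for Aeson's Value,
-- so the faster encoder drops in at the final serialization step.
encodeValue :: Aeson.Value -> ByteString
encodeValue = encodeJson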

Why did we build BufferBuilder?

Some of IMVU’s backend services are written in Haskell. While Haskell is incredible for many use cases, we ran into an unexpected bottleneck: JSON encoding. Our service response structure produces quite a lot of JSON, and much of that JSON contains URLs encoded into JSON strings.

Amazingly, URL and JSON encoding was showing up as a significant cost center when generating JSON responses. Some services spent over a second encoding JSON!

When faced with a performance problem, my first instinct is to either pound the problem into the ground or unmake the problem so the code isn’t necessary in the first place.

So let’s look at the problem holistically:

  • A JSON response is represented in memory as a collection of varied, in-memory data structures. The response happens to contain many URLs — sometimes more than a hundred.
  • URLs are represented by a data structure consisting of the protocol, hostname, path segments, query string, and so on.
  • Each URL becomes a JSON string.
  • Each in-memory data structure is converted to a JSON object whose properties depend on the type of the corresponding data structure.

Using Aeson to encode all of this results in the following steps:

  • ToJSON instances convert Haskell data types to an AST of Aeson Values.
  • The keys of an Aeson object are Text values. In memory, Text is encoded in UTF-16. Thus, URLs must be translated from their in-memory representation (ours is ASCII) into UTF-16 before they fit into the Aeson AST.
  • Then, the entity bodies are converted into JSON objects, where the keys are known at compile-time, but must be encoded as UTF-16.
  • The entire AST is encoded into a Lazy Text Builder.
  • Then the Lazy Text is encoded into a Lazy ByteString containing UTF-8.
  • Then, to generate the response body, the Lazy ByteString is concatenated into a single strict ByteString.

That’s a lot of steps! To summarize:

(URLs and Custom Data Types) -> Aeson AST -> Lazy Text -> Lazy UTF-8 ByteString -> Strict ByteString

The Design of BufferBuilder

This is where Andy and I sat down to create an API to let us cleanly express JSON encoding without sacrificing type safety OR performance.

We know that a fast, if not the fastest, way to build up a buffer of bytes is to allocate a chunk of memory, stream writes to it, and either chunk or realloc() as needed. Obviously, this kind of code can be trivially expressed in C:

void buffer_append(buffer* b, const char* s, size_t length) {
    if (!buffer_has_room(b, length)) {
        buffer_grow(b, length);
    }
    memcpy(b->data + b->size, s, length);
    b->size += length;
}

Because I’d been told that bouncing between Haskell and C with the foreign function interface can be slow, my first approach was to attempt to build a Haskell monad that grabbed the RealWorld token out of IO (IO a is basically a newtype around RealWorld -> (RealWorld, a)), augmented it with some extra “local variables” like the output ptr, capacity, and current size, and manually implemented allocation and memory writes with GHC.Prim APIs. GHC did not like this at all. The generated code ran 20 times slower than naive usage of Data.ByteString.Builder. Nonetheless, it was an interesting technique, so maybe I’ll write about it another time.

Surely it was possible to do better. So I tried the foreign function interface after all.

I wrote a tiny C API that allocated the underlying growable buffer. It provided APIs for appending bytes, buffers, UTF-16-to-UTF-8 transcoding, and so on. These FFI calls can only happen within IO actions, but building buffers is fundamentally a pure operation, and the provided Haskell API should be effectively pure. The solution is to offer a restricted state monad like ST which limits the operations within the monad to safe buffer-building operations.
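
Here's a sketch of the shape of this approach (not the library's actual internals; the handle type and C symbol are illustrative, echoing the bw_* calls visible in the assembly below):

{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Control.Monad.Reader (ReaderT (..))
import Data.Word (Word8)
import Foreign.Ptr (Ptr)

-- Opaque handle to the C-side growable buffer.
data BWHandle

-- Assumed C entry point, linked in from the tiny C API.
foreign import ccall unsafe "bw_append_byte"
    bw_append_byte :: Ptr BWHandle -> Word8 -> IO ()

-- The restricted monad: IO underneath, but the constructor stays private to
-- the module, so user code can only reach the safe appends exported here.
newtype BufferBuilder a = BufferBuilder (ReaderT (Ptr BWHandle) IO a)
    deriving (Functor, Applicative, Monad)

appendByte :: Word8 -> BufferBuilder ()
appendByte b = BufferBuilder (ReaderT (\h -> bw_append_byte h b))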

This approach was by far the fastest of any that I tried. In fact, in a sequence of appendBS operations, if the arguments are known-strict ByteStrings, GHC will compile the appendBS sequence directly into a series of cheap C function calls. For example, the following code:

data BSTriple = BSTriple !ByteString !ByteString !ByteString

writeBSTriple :: BSTriple -> BufferBuilder ()
writeBSTriple !(BSTriple a b c) = do
    appendBS a
    appendBS b
    appendBS c

compiles into something like:


movq %rbx,%rdi
movq 120(%rsp),%rax
movq %rax,%rsi
movq 96(%rsp),%rax
movq %rax,%rdx
movq 112(%rsp),%rax
addq %rax,%rdx
subq $8,%rsp
movl $0,%eax
call bw_append_bs
addq $8,%rsp
movq %rbx,%rdi
movq 152(%rsp),%rax
movq %rax,%rsi
movq 128(%rsp),%rax
movq %rax,%rdx
movq 144(%rsp),%rax
addq %rax,%rdx
subq $8,%rsp
movl $0,%eax
call bw_append_bs
addq $8,%rsp
movq %rbx,%rdi
movq 216(%rsp),%rax
movq %rax,%rsi
movq 160(%rsp),%rax
movq %rax,%rdx
movq 208(%rsp),%rax
addq %rax,%rdx
subq $8,%rsp
movl $0,%eax
call bw_append_bs
addq $8,%rsp

Obviously GHC’s code generator has some room to improve, but the main thrust is exactly right. An equivalent C++ API would generate much the same kind of code, modulo the poor instruction and register selection, which overall doesn’t matter too much here.

In almost all situations, Haskell isn’t going to be as fast as straightforward C, but with some work and appropriate use of FFI, it’s possible to come close.

Data.BufferBuilder.Utf8

Once we had an API to safely and efficiently build up buffers of bytes, we wanted to build safe APIs on top for constructing valid UTF-8 buffers and valid JSON.

Utf8Builder is a newtype around BufferBuilder with a restricted API. If you only call safe functions in Data.BufferBuilder.Utf8, the result is guaranteed to be valid UTF-8. Unsafe functions are provided for when you know precisely what you’re doing.
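
Continuing the sketch from above (again illustrative, not the module's actual definitions):

-- Same monad underneath, but a smaller safe surface on top.
newtype Utf8Builder a = Utf8Builder (BufferBuilder a)
    deriving (Functor, Applicative, Monad)

-- One example of a safe operation: ASCII code points are always valid UTF-8.
appendAsciiChar :: Char -> Utf8Builder ()
appendAsciiChar c
    | c < '\128' = Utf8Builder (appendByte (fromIntegral (fromEnum c)))
    | otherwise  = error "appendAsciiChar: non-ASCII character"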

Data.BufferBuilder.Json

Data.BufferBuilder.Json is built on top of Data.BufferBuilder.Utf8. Data.BufferBuilder.Json’s Value type is a newtype around Utf8Builder, meaning there’s no Aeson-style AST. Each Value simply knows how to write itself into an output buffer. Just like how the safe Utf8Builder functions guarantee the output is legal UTF-8, the safe JsonBuilder functions guarantee (almost) that the output is a legal JSON document. (There are a couple caveats, see the documentation for details.)

I suspect Data.BufferBuilder.Json is approaching the limit of how fast a JSON encoder can be. And thanks to the beauty of Haskell, it’s convenient and safe!

If you’re using Aeson and encoding performance matters to you, give BufferBuilder a shot!