Code Density – Efficient but Painful?

by Chad Austin
Aug 11 15

When most people try to read math equations, a paper, some Haskell code, or even abstraction-heavy Python, they tend to gnash their teeth a bit and then whine about it being too hard to read. And some of that is fair – there is a great deal of overly complicated stuff out there.

But, in general, I’ve seen negative reactions to code that is simply dense, even if it’s factored well.

I’ve long been fascinated by how perceived time differs from actual time. In particular, humans are susceptible to believing that certain things are faster than others just because they feel faster. Consider the classic example of the MacOS mouse-based UI versus keyboard shortcuts.

We’ve done a cool $50 million of R & D on the Apple Human Interface. We discovered, among other things, two pertinent facts:

  • Test subjects consistently report that keyboarding is faster than mousing.
  • The stopwatch consistently proves mousing is faster than keyboarding.

This contradiction between user-experience and reality apparently forms the basis for many user/developers’ belief that the keyboard is faster.

Or consider automation versus just doing the grunt work.

Taking this analogy back to programming with abstractions, consider two ways to add one to every element in a list. One uses the map abstraction and the other manually iterates with a for loop.

new_list = map(lambda v: v + 1, my_list)


new_list = []
for v in my_list:
    new_list.append(v + 1)

map, once you understand it, immediately conveys more information than the for loop. When you see map being used, you know a few things:

  1. the output list has the same length as the input list
  2. the input list is not modified
  3. each output element only depends on the corresponding input element

To build this same level of understanding given a for loop, you have to read the loop body. In trivial examples like this one, it’s easy, but most loop bodies aren’t so simple.

map is common enough in programming languages that most people have no problem with it. But there are abstractions everywhere: monads, semigroups, actors, currying, coroutines, threads, sockets, closures… Each abstraction conveys some useful meaning to the programmer, often by applying some rules or restrictions.

Consider asynchronous programming with callback soup versus coroutines, tasks, or lightweight threads. When you are programming with callbacks it’s very easy to forget to call a callback or call it twice. This can result in some extremely hard-to-diagnose bugs. Coroutines / tasks provide a useful guarantee: asynchronous operations will return exactly once. This abstraction comes with a cost, however: the code is more terse and indirect, and depends on your knowledge of the abstraction, just like map above.
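To make the contrast concrete, here's a minimal Python sketch (the function names are hypothetical, and this requires Python 3.5+ for `async`/`await`): the callback version silently invokes its continuation twice, while an awaited coroutine completes exactly once by construction.

```python
import asyncio

# Callback style: nothing in the language stops this function from invoking
# its continuation twice (or zero times).
def fetch_with_callback(value, on_done):
    on_done(value)
    on_done(value)  # the double-call bug slips through silently

# Coroutine style: an awaited coroutine returns exactly once by construction.
async def fetch(value):
    await asyncio.sleep(0)  # stand-in for real asynchronous work
    return value

calls = []
fetch_with_callback(42, calls.append)
print(len(calls))             # the callback ran twice: prints 2

print(asyncio.run(fetch(42)))  # prints 42
```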

So right. Abstractions exist. They must be learned, but they provide some nice guarantees.

Applied in the extreme, abstractions result in extremely dense, terse code. In languages with weak support for cheap abstraction-building, like Go, the same logic would have to be spelled out manually. One might look at a few lines of dense Haskell and exclaim "Agh, that’s unreadable." But, ignoring language familiarity, consider the amount of knowledge gained per unit of mental effort and time. The algorithm has a certain amount of inherent complexity. You can either read a whole lot of "simple" lines of code or a few "hard" lines of code, but you’ll build the same amount of understanding in the end.

My conjecture: People don’t like reading dense code, so it feels less productive than reading a lot of fluffy code, even if it’s actually faster. This is the same psychological effect as the MacOS mouse vs. keyboard shortcut feedback. I’m not aware of any comprehension speed studies across, say, Haskell and Java, but I wouldn’t be surprised if people feel slower when reading Haskell but are actually faster.

Perhaps, to maximize productivity, you want to optimize for unpleasantness – the day will be "longer" so you can get more done.

Why might density be unpleasant? I’m not at all sure why papers full of equations are so intimidating compared to prose, but I have a guess. When most people read, their eyes and attention subconsciously jump around a bit. When reading less dense material, this is fine — perhaps even beneficial. But with very dense material, each symbol or word conveys a lot of meaning, forcing the reader’s eyes and attention to move much more slowly and deliberately so the brain can stay caught up. This forced slowness hurts and feels wrong, even if it actually results in quicker comprehension. Again, totally a guess. I have no idea.

Maybe it would be worth bumping up the font size when reading math or dense Haskell code?

If you have any resources or citations on this topic, send them my way!

An Aside

My blog posts are frequently misinterpreted as if I’m making broader statements than I am, so here are some things I’m NOT saying:

I’m definitely not saying that code that’s terse for the sake of it is always a win. For example, consider Haskell’s maybe function: some people love it, but is it really any clearer than just pattern matching as necessary?

Also, I am not saying Haskell is always faster to read than anything else. I’m talking about code density in general, including abstraction-heavy or functional-style Python, or a super-tight C algorithm versus fluffy OOP object soup.

Sum Types Are Coming: What You Should Know

by Chad Austin
Jul 9 15

Just as all mainstream languages now have lambda functions, I predict sum types are the next construct to spread outward from the typed functional programming community to the mainstream. Sum types are very useful, and after living with them in Haskell for a while, I miss them deeply when using languages without them.

Fortunately, sum types seem to be catching on: both Rust and Swift have them, and it sounds like TypeScript’s developers are at least open to the idea.

I am writing this article because, while sum types are conceptually simple, most programmers I know don’t have hands-on experience with them and thus don’t have a good sense of their usefulness.

In this article, I’ll explain what sum types are, how they’re typically represented, and why they’re useful. I will also dispel some common misconceptions that cause people to argue sum types aren’t necessary.

What is a Sum Type?

Sum types can be explained a couple ways. First, I’ll compare them to product types, which are extremely familiar to all programmers. Then I’ll show how sum types look (unsafely) implemented in C.

Every language has product types – tuples and structs or records. They are called product types because they’re analogous to the cartesian products of sets. That is, int * float is the set of pairs of values (int, float). Each pair contains an int AND a float.

If product types correspond to AND, sum types correspond to OR. A sum type indicates a value that is either X or Y or Z or …

Let’s take a look at enumerations and unions and show how sum types are a safe generalization of the two. Sum types are a more general form of enumerations. Consider C enums:

enum Quality {
  LOW,
  MEDIUM,
  HIGH,
};

A value of type Quality must be one of LOW, MEDIUM, or HIGH (excepting uninitialized data or unsafe casts — we are talking about C of course). C also has a union language feature:

union Event {
  struct ClickEvent ce;
  struct PaintEvent pe;
};

ClickEvent and PaintEvent share the same storage location in the union, so if you write to one, you ought not read from the other. Depending on the version of the C or C++ specification, the memory will either alias or your program will have undefined behavior. Either way, at any point in time, it’s only legal to read from one of the components of a union.

A sum type, sometimes called a discriminated union or tagged variant, is a combination of a tag (like an enum) and a payload per possibility (like a union).

In C, to implement a kind of sum type, you could write something like:

enum EventType {
  CLICK,
  PAINT,
};

struct ClickEvent {
  int x, y;
};

struct PaintEvent {
  Color color;
};

struct Event {
  enum EventType type;
  union {
    struct ClickEvent click;
    struct PaintEvent paint;
  };
};

The type field indicates which event struct is legal to access. Usage looks as follows:

// given some Event event
switch (event.type) {
  case CLICK:
    // ... handle event.click ...
    break;
  case PAINT:
    // ... handle event.paint ...
    break;
  // most compilers will let us know if we didn't handle every event type
}

However, there is some risk here. Nothing prevents code from accessing .paint in the case that type is CLICK. At all times, every possible field in event is visible to the programmer.

A sum type is a safe formalization of this idea.

Sum Types are Safe

Languages like ML, Haskell, F#, Scala, Rust, Swift, and Ada provide direct support for sum types. I’m going to give examples in Haskell because the Haskell syntax is simple and clear. In Haskell, our Event type would look like this:

data Event = ClickEvent Int Int
           | PaintEvent Color

That syntax can be read as follows: there is a data type Event that contains two cases: it is either a ClickEvent containing two Ints or a PaintEvent containing a Color.

A value of type Event contains two parts: a small tag describing whether it’s a ClickEvent or PaintEvent followed by a payload that depends on the specific case. If it’s a ClickEvent, the payload is two integers. If it’s a PaintEvent, the payload is one color. The physical representation in memory would look something like [CLICK_EVENT_TAG][Int][Int] or [PAINT_EVENT_TAG][Color], much like our C code above. Some languages can even store the tag in the bottom bits of the pointer, which is even more efficient.

Now, to see what type of Event a value contains, and to read the event’s contents, you must pattern match on it.

-- given some event :: Event
case event of
  ClickEvent x y -> handleClickEvent x y
  PaintEvent color -> handlePaintEvent color

Sum types, paired with pattern matching, provide nice safety guarantees. You cannot read x out of an event without first verifying that it’s a ClickEvent. You cannot read color without verifying it’s a PaintEvent. Moreover, the color value is only in scope when the event is known to be a PaintEvent.

Sum Types are General

We’ve already discussed how sum types are more general than simple C-style enumerations. In fact, in a simple enumeration, since none of the options have payloads, the sum type can be represented as a single integer in memory. The following DayOfWeek type, for example, can be represented as efficiently as the corresponding C enum would.

data DayOfWeek = Sunday | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday

Sum types can also be used to create nullable data types like C pointers or Java references. Consider F#’s option, or Rust’s Option, or Haskell’s Maybe:

data Maybe a = Nothing | Just a

(a is a generic type variable – that is, you can have a Maybe Int or Maybe Customer or Maybe (Maybe String)).

Appropriate use of Maybe comes naturally to programmers coming from Java or Python or Objective C — it’s just like using NULL or None or nil instead of an object reference except that the type signature of a data type or function indicates whether a value is optional or not.

When nullable references are replaced by explicit Maybe or Option, you no longer have to worry about NullPointerExceptions, NullReferenceExceptions, and the like. The type system enforces that required values exist and that optional values are safely pattern-matched before they can be dereferenced.

Imagine writing some code that closes whatever window has focus. The C++ would look something like this:

Window* window = get_focused_window();
window->close();

Oops! What if there is no focused window? Boom. The fix:

Window* window = get_focused_window();
if (window) {
  window->close();
}

But then someone could accidentally call a different method on window outside of the if statement… reintroducing the problem. To help mitigate this possibility, C++ does allow introducing names inside of a conditional:

if (Window* window = get_focused_window()) {
  window->close();
}

Pattern matching on Maybe avoids this problem entirely. There’s no way to even call close_window unless an actual window is returned, and the variable w is never bound unless there is an actual focused window:

window <- get_focused_window
case window of
  Nothing -> return ()
  Just w -> close_window w

This is a big win for correctness and clarity.

Thinking in Sum Types

Once you live with sum types for a while, they change the way you think. People coming from languages like Python or Java (myself included) to Haskell immediately gravitate towards tuples and Maybe since they’re familiar. But once you become accustomed to sum types, they subtly shift how you think about the shape of your data.

I’ll share a specific memorable example. At IMVU we built a Haskell URL library and we wanted to represent the "head" of a URL, which includes the optional scheme, host, username, password, and port. Everything before the path, really. This data structure has at least one important invariant: it is illegal to have a scheme with no host. But it is possible to have a host with no scheme in the case of protocol-relative URLs.

At first I structured UrlHead roughly like this:

type SchemeAndHost = (Maybe Scheme, Host)
type UrlHead = Maybe (Maybe SchemeAndHost)
-- to provide some context, the complete URL type follows
type Url = (UrlHead, [PathSegment], QueryString)

In hindsight, the structure is pretty ridiculous and hard to follow. But the idea is that, if the URL head is Nothing, then the URL is relative. If it’s Just Nothing, then the path is treated as absolute. If it’s Just (Just (Nothing, host)), then it’s a protocol-relative URL. Otherwise it’s a fully-qualified URL, and the head contains both a scheme and a host.

However, after I started to grok sum types, a new structure emerged:

data UrlHead
  = FullyQualified ByteString UrlHost
  | ProtocolRelative UrlHost
  | Absolute
  | Relative

Now the cases are much clearer. And they have explicit names and appropriate payloads!

Sum Types You Already Know

There are several sum types that every programmer has already deeply internalized. They’ve internalized them so deeply that they no longer think about the general concept. For example, "This variable either contains a valid reference to class T or it is null." We’ve already discussed optional types in depth.

Another common example is when functions can return failure conditions. A fallible function either returns a value or it returns some kind of error. In Haskell, this is usually represented with Either, where the error is on the Left. Similarly, Rust uses the Result type. It’s relatively rare, but I’ve seen Python functions that either return a value or an object that derives from Exception. In C++, functions that need to return more error information will usually return the error status by value and, in the case of success, copy the result into an out parameter.

Obviously languages with exceptions can throw an exception, but exceptions aren’t a general error-handling solution for the following reasons:

  1. What if you temporarily want to store the result or error? Perhaps in a cache or promise or async job queue. Using sum types allows sidestepping the complicated issue of exception transferability.
  2. Some languages either don’t have exceptions or limit where they can be thrown or caught.
  3. If used frequently, exceptions generally have worse performance than simple return values.

I’m not saying exceptions are good or bad – just that they shouldn’t be used as an argument for why sum types aren’t important. :)

Another sum type many people are familiar with is values in dynamic languages like JavaScript. A JavaScript value is one of many things: either undefined or null or a boolean or a number or an object or… In Haskell, the JavaScript value type would be defined approximately as such:

data JSValue = Undefined
             | Null
             | JSBool Bool
             | JSNumber Double
             | JSString Text
             | JSObject (HashMap Text JSValue)

I say approximately because, for the sake of clarity, I left out all the other junk that goes on JSObject too. ;) Like whether it’s an array, its prototype, and so on.

JSON is a little simpler to define:

data JSONValue = Null
               | True
               | False
               | String Text
               | Number Double
               | Array [JSONValue]
               | Object (HashMap Text JSONValue)

Notice this type is recursive — Arrays and Objects can refer to other JSONValues.

Protocols, especially network protocols, are another situation where sum types frequently come up. Network packets will often contain a bitfield of some sort describing the type of packet, followed by a payload, just like discriminated unions. This same structure is also used to communicate over channels or queues between concurrent processes. Some example protocol definitions modeled as sum types:

data JobQueueCommand = Quit | LogStatus | RunJob Job
data AuthMethod = Cookie Text | Secret ByteString | Header ByteString

Sum types also come up whenever sentinel values are needed in API design.

Approximating Sum Types

If you’ve ever tried to implement a JSON AST in a language like C++ or Java or Go you will see that the lack of sum types makes safely and efficiently expressing the possibilities challenging. There are a couple ways this is typically handled. The first is with a record containing many optional values.

struct JSONValue {
  JSONBoolean* b;
  JSONString* s;
  JSONNumber* n;
  JSONArray* a;
  JSONObject* o;
};

The implied invariant is that only one value is defined at a time. (And perhaps, in this example, JSON null is represented by all pointers in JSONValue being null.) The limitation here is that nothing stops someone from making a JSONValue where, say, both a and o are set. Is it an array? Or an object? The invariant is broken, so it’s ambiguous. This costs us some type safety. This approximation, by the way, is equivalent to Go’s errors-as-multiple-return-values idiom. Go functions return a result and an error, and it’s assumed (but not enforced) that only one is valid at a time.

Another approach to approximating sum types is using an interface and classes like the following Java:

interface JSONValue {}
class JSONBoolean implements JSONValue { bool value; }
class JSONString implements JSONValue { String value; }
class JSONArray implements JSONValue { ArrayList<JSONValue> elements; }
// and so on

To check the specific type of a JSONValue, you need a runtime type lookup, something like C++’s dynamic_cast or Go’s type switch.

This is how Go, and many C++ and Java JSON libraries, represent the AST. The reason this approach isn’t ideal is because there’s nothing stopping anyone from deriving new JSONValue classes and inserting them into JSON arrays or objects. This weakens some of the static guarantees: given a JSONValue, the compiler can’t be 100% sure that it’s only a boolean, number, null, string, array, or object, so it’s possible for the JSON AST to be invalid. Again, we lose type safety without sum types.

There is a third approach for implementing sum types in languages without direct support, but it involves a great deal of boilerplate. In the next section I’ll discuss how this can work.

The Expression Problem

Sometimes, when it comes up that a particular language doesn’t support sum types (as most mainstream languages don’t), people make the argument "You don’t need sum types, you can just use an interface for the type and a class for each constructor."

That argument sounds good at first, but I’ll explain generally that interfaces and sum types have different (and somewhat opposite) use cases.

As I mentioned in the previous section, it’s common in languages without sum types, such as Java and Go, to represent a JSON AST as follows:

interface JSONValue {}
class JSONBoolean implements JSONValue { bool value; }
class JSONString implements JSONValue { String value; }
class JSONNumber implements JSONValue { double value; }
class JSONArray implements JSONValue { ArrayList<JSONValue> elements; }
class JSONObject implements JSONValue { HashMap<String, JSONValue> properties; }

As I also mentioned, this structure does not rigorously enforce that the ONLY thing in, say, a JSON array is a null, a boolean, a number, a string, an object, or another array. Some other random class could derive from JSONValue, even if it’s not a sensible JSON value. The JSON encoder wouldn’t know what to do with it. That is, interfaces and derived classes here are not as type safe as sum types, as they don’t enforce valid JSON.

With sum types, given a value of type JSONValue, the compiler and programmer know precisely which cases are possible. Thus, any code in the program can safely and completely enumerate the possibilities, and we can use JSONValue anywhere in the program without modifying the cases at all. But if we add a new case to JSONValue, we potentially have to update all uses. That is, it is much easier to use the sum type in new situations than to modify the list of cases. (Imagine how much code you’d have to update if someone said "Oh, by the way, all pointers in this Java program can have a new state: they’re either null, valid, or lazy, in which case you have to force them." Remember that nullable references are a limited form of sum types, so that change would require a bloodbath of code updates across every Java program ever written.)

The opposite situation occurs with interfaces and derived classes. Given an interface, you don’t know what class implements it — code consuming an interface is limited to the API provided by the interface. This gives a different degree of freedom: it’s easy to add new cases (e.g. classes deriving from the interface) without updating existing code, but your uses are limited to the functionality exposed by the interface. To add new methods to the interface, all existing implementations must be updated.

To summarize:

  • With sum types, it’s easy to add uses but hard to add cases.
  • With interfaces, it’s easy to add cases but hard to add uses.

Each has its place and neither is a great substitute for the other. This is known as The Expression Problem.

To show why the argument that sum types can be replaced with interfaces is weak, let us reduce the scenario to the simplest non-enumeration sum type: Maybe.

interface Maybe {
}

class Nothing implements Maybe {
}

class Just implements Maybe {
  Object value;
}

What methods should Maybe have? Well, really, all you can do is run different code depending on whether the Maybe is a Nothing or Just.

interface MaybeVisitor {
  void visitNothing();
  void visitJust(Object value);
}

interface Maybe {
  void visit(MaybeVisitor visitor);
}

This is the visitor pattern, which is another way to approximate sum types in languages without them. Visitor has the right maintainability characteristics (easy to add uses, hard to add cases), but it involves a great deal of boilerplate. It also requires two indirect function calls per pattern match, so it’s dramatically less efficient than a simple discriminated union would be. On the other hand, direct pattern matches of sum types can be as cheap as a tag check or two.

Another reason visitor is not a good replacement for sum types in general is that the boilerplate is onerous enough that you won’t start "thinking in sum types". In languages with lightweight sum types, like Haskell and ML and Rust and Swift, it’s quite reasonable to use a sum type to reflect a lightweight bit of user interface state. For example, if you’re building a chat client, you may represent the current scroll state as:

type DistanceFromBottom = Int
data ScrollState = ScrolledUp DistanceFromBottom | PeggedToBottom

This data type only has a distance from bottom when scrolled up, not pegged to the bottom. Building a visitor just for this use case is so much code that most people would sacrifice a bit of type safety and instead simply add two fields.

bool scrolledUp;
int distanceFromBottom; // only valid when scrolledUp

Another huge benefit of pattern matching sum types over the visitor pattern is that pattern matches can be nested or have wildcards. Consider a function that can either return a value or some error type. Haskell convention is that errors are on the Left and values are on the Right branch of an Either.

data FetchError = ConnectionError | PermissionError String | JsonDecodeError
fetchValueFromService :: IO (Either FetchError Int)

-- Later on...

result <- fetchValueFromService
case result of
  Right value -> processResult value
  Left ConnectionError -> error "failed to connect"
  Left (PermissionError username) -> error ("try logging in, " ++ username)
  Left JsonDecodeError -> error "failed to decode JSON"
  _ -> error "unknown error"

Expressing this with the visitor pattern would be extremely painful.

Paul Koerbitz comes to a similar conclusion.

Named Variants or Variants as Types?

Now I’d like to talk a little about how sum types are specified from a language design perspective.

Programming languages that implement sum types have to decide how the ‘tag’ of the sum type is represented in code. There are two main approaches languages take. Either the cases are given explicit names or each case is specified with a type.

Haskell, ML, Swift, and Rust all take the first approach. Each case in the type is given a name. This name is not a type – it’s more like a constant that describes which ‘case’ the sum type value currently holds. Haskell calls the names "type constructors" because they produce values of the sum type. From the Rust documentation:

enum Message {
    Quit,
    ChangeColor(i32, i32, i32),
    Move { x: i32, y: i32 },
    Write(String),
}

Quit and ChangeColor are not types. They are values. Quit is a Message by itself, but ChangeColor is a function taking three ints and returning Message. Either way, the names Quit, ChangeColor, Move, and Write indicate which case a Message contains. These names are also used in pattern matches. Again, from the Rust documentation:

fn quit() { /* ... */ }
fn change_color(r: i32, g: i32, b: i32) { /* ... */ }
fn move_cursor(x: i32, y: i32) { /* ... */ }
match msg {
    Message::Quit => quit(),
    Message::ChangeColor(r, g, b) => change_color(r, g, b),
    Message::Move { x: x, y: y } => move_cursor(x, y),
    Message::Write(s) => println!("{}", s),
}

The other way to specify the cases of a sum type is to use types themselves. This is how C++’s boost.variant and D’s std.variant libraries work. An example will help clarify the difference. The above Rust code translated to C++ would be:

struct Quit {};
struct ChangeColor { int r, g, b; };
struct Move { int x; int y; };
struct Write { std::string message; };
typedef variant<Quit, ChangeColor, Move, Write> Message;

// assuming a match() member that dispatches on the variant's current case
msg.match(
  [](const Quit&) { quit(); },
  [](const ChangeColor& cc) { change_color(cc.r, cc.g, cc.b); },
  [](const Move& m) { move_cursor(m.x, m.y); },
  [](const Write& w) { std::cout << w.message << std::endl; });

Types themselves are used to index into the variant. There are several problems with using types to specify the cases of sum types. First, it’s incompatible with nested pattern matches. In Haskell I could write something like:

data MouseButton = LeftButton | RightButton | MiddleButton | ExtraButton Int
data MouseEvent = MouseDown MouseButton Int Int | MouseUp MouseButton Int Int | MouseMove Int Int

-- ...

case mouseEvent of
  MouseDown LeftButton x y -> beginDrag x y
  MouseUp LeftButton x y -> endDrag x y

In C++, using the variant<> template described above, I’d have to do something like:

struct LeftButton {};
struct RightButton {};
struct MiddleButton {};
struct ExtraButton { int b; };
typedef variant<LeftButton, RightButton, MiddleButton, ExtraButton> MouseButton;

struct MouseDown { MouseButton button; int x; int y; };
struct MouseUp { MouseButton button; int x; int y; };
struct MouseMove { int x; int y; };
typedef variant<MouseDown, MouseUp, MouseMove> MouseEvent;

// given: MouseEvent mouseEvent;

// assuming the same kind of match() member as before
mouseEvent.match(
  [](const MouseDown& event) {
    event.button.match([&](const LeftButton&) {
      beginDrag(event.x, event.y);
    });
  },
  [](const MouseUp& event) {
    event.button.match([&](const LeftButton&) {
      endDrag(event.x, event.y);
    });
  });

You can see that, in C++, you can’t pattern match against MouseDown and LeftButton in the same match expression.

(Note: It might look like I could compare with == to simplify the code, but in this case I can’t because the pattern match extracts coordinates from the event. That is, the coordinates are a "wildcard match" and their value is irrelevant to whether that particular branch runs.)

Also, it’s so verbose! Most C++ programmers I know would give up some type safety in order to fit cleanly into C++ syntax, and end up with something like this:

struct Button {
  enum ButtonType { LEFT, MIDDLE, RIGHT, EXTRA } type;
  int b; // only valid if type == EXTRA
};

struct MouseEvent {
  enum EventType { MOUSE_DOWN, MOUSE_UP, MOUSE_MOVE } type;
  Button button; // only valid if type == MOUSE_DOWN or type == MOUSE_UP
  int x;
  int y;
};

Using types to index into variants is attractive – it doesn’t require adding any notion of type constructors to the language. Instead it uses existing language functionality to describe the variants. However, it doesn’t play well with type inference or pattern matching, especially when generics are involved. If you pattern match using type names, you must explicitly spell out each fully-qualified generic type, rather than letting type inference figure out what is what:

auto result = some_fallible_function();
// result is an Either<Error, std::map<std::string, std::vector<int>>>
result.match(
  [](Error& e) {
    handleError(e);
  },
  [](std::map<std::string, std::vector<int>>& result) {
    handleSuccess(result);
  });

Compare to the following Haskell, where the error and success types are inferred and thus implicit:

result <- some_fallible_action
case result of
  Left e ->
    handleError e
  Right result ->
    handleSuccess result

There’s an even deeper problem with indexing variants by type: it becomes illegal to write variant<int, int>. How would you know if you’re referring to the first or second int? You might say "Well, don’t do that", but in generic programming that can be difficult or annoying to work around. Special-case limitations should be avoided in language design if possible – we’ve already learned how annoying void can be in generic programming.

These are all solid reasons, from a language design perspective, to give each case in a sum type an explicit name. This could address many of the concerns raised with respect to adding sum types to the Go language. (See also this thread). The Go FAQ specifically calls out that sum types are not supported in Go because they interact confusingly with interfaces, but that problem is entirely sidestepped by named type constructors. (There are other reasons retrofitting sum types into Go at this point is challenging, but their interaction with interfaces is a red herring.)

It’s likely not a coincidence that languages with sum types and type constructors are the same ones with pervasive type inference.


I hope I’ve convinced you that sum types, especially when paired with pattern matching, are very useful. They’re easily one of my favorite features of Haskell, and I’m thrilled to see that new languages like Rust and Swift have them too. Given their utility and generality, I expect more and more languages to grow sum types in one form or another. I hope the language authors do some reading, explore the tradeoffs, and similarly come to the conclusion that Haskell, ML, Rust, and Swift got it right and should be copied, especially with respect to named cases rather than "union types". :)

To summarize, sum types:

  • provide a level of type safety not available otherwise.
  • have an efficient representation, more efficient than vtables or the visitor pattern.
  • give programmers an opportunity to clearly describe possibilities.
  • with pattern matching, provide excellent safety guarantees.
  • are an old idea, and are finally coming back into mainstream programming!
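As a small illustration of the pattern-matching guarantee above, here's a hedged TypeScript sketch (TypeScript's discriminated unions approximate sum types; the Shape type is my own example, not from any library): the compiler checks that the switch covers every case, so no fallback branch is needed, and adding a new case to Shape turns every non-exhaustive match into a type error.

```typescript
type Shape =
  | { kind: "circle"; radius: number }
  | { kind: "rectangle"; width: number; height: number };

// No default branch: the compiler verifies every case of Shape is
// handled, so the return type stays `number` rather than
// `number | undefined`.
function area(shape: Shape): number {
  switch (shape.kind) {
    case "circle":
      return Math.PI * shape.radius ** 2;
    case "rectangle":
      return shape.width * shape.height;
  }
}
```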

I don’t know why, but there’s something about the concept of sum types that makes them easy to dismiss, especially if you’ve spent your entire programming career without them. It takes experience living in a sum types world to truly internalize their value. I tried to use compelling, realistic examples to show their utility and I hope I succeeded. :)



In this article, I’ve used the name sum type, but tagged variant, tagged union, or discriminated union are fine names too. The phrase sum type originates in type theory and is a denotational description. The other names are operational in that they describe the implementation strategy.

Terminology is important though. When Rust introduced sum types, they had to name them something. They happened to settle on enum, which is a bit confusing for people coming from languages where enums cannot carry payloads. There’s a corresponding argument that they should have been called union, but that’s confusing too, because sum types aren’t about sharing storage either. Sum types are a combination of the two, so neither keyword fits exactly. Personally, I’m partial to Haskell’s data keyword because it is used for both sum and product types, sidestepping the confusion entirely. :)

More Reading

If you’re convinced, or perhaps not yet, and you’d like to read more, some great articles have been written about the subject:

Wikipedia has two excellent articles, distinguishing between Tagged Unions and Algebraic Data Types.

Sum types and pattern matching go hand in hand. Wikipedia, again, describes pattern matching in general.

TypeScript’s union types are similar to sum types, though they don’t use named type constructors.

FP Complete has another nice introduction to sum types.

For what it’s worth, Ada has had a pretty close approximation of sum types for decades, but it did not spread to other mainstream languages. Ada’s implementation isn’t quite type safe, as accessing the wrong case results in a runtime error, but it’s probably close enough to safe in practice.

Much thanks goes to Mark Laws for providing valuable feedback and corrections. Of course, any errors are my own.


by Chad Austin
May 31 15

Several years ago, when we started training IMVU’s client team on modern web technologies, an engineer suggested we give CoffeeScript a chance. The purported benefit was that CoffeeScript helps you stay within "the good parts" of JavaScript by providing concise syntax for lambdas, object literals, destructuring binds, object iteration, and other idioms.

However, after a month or two of the team using it, we all agreed to switch back to JavaScript.

You see, the problem with CoffeeScript is that its benefits are shallow. CoffeeScript’s motto is "It’s just JavaScript", even though it has the opportunity to be so much more.

Programming languages have several aspects. The syntax of a language affects a person’s ability to visually parse its structure, the amount of typing required, and a language’s overall visual style. Syntax has a large effect on a language’s overall pleasantness, so people tend to give it a great deal of attention. Also, syntax is the first impression people have of a language. I’ve heard people say "JavaScript is in the C family" even though almost the only thing the two languages have in common is curly braces.

The semantics of a language, on the other hand, provide upper bounds on the size of programs that can be written with it. Language semantics apply constraints, and thus guarantees, making it easier and safer to work on larger programs. Examples follow:

  • this value is always an integer
  • this program does not write to memory it doesn’t own
  • this function has the same result no matter how many times it is called (idempotence)
  • this function has no observable side effects (purity)
  • memory will be automatically reclaimed when it is no longer reachable
  • the completion callback for this asynchronous operation is called exactly once
  • MySQL queries cannot be issued in the middle of a Redis transaction
  • this object always contains properties x, y, and z
  • if you have an object of type X, it will never be null, and it will always be fully-constructed
  • if an object is constructed on the stack, its destructor will run before the current function returns

Different languages provide different sets of guarantees.
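Two of the guarantees listed above can be expressed directly in a type system. A hedged TypeScript sketch (the Point type and magnitude function are invented for illustration):

```typescript
// "This object always contains properties x, y, and z."
interface Point {
  x: number;
  y: number;
  z: number;
}

// "If you have a value of this type, it is never null."
// With strictNullChecks, a Point parameter cannot receive null or
// undefined, so the body may use all three properties unconditionally.
function magnitude(p: Point): number {
  return Math.sqrt(p.x * p.x + p.y * p.y + p.z * p.z);
}
```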

Both syntax and semantics are important. Syntax solves for pleasantness and human factors, but strong semantics are what help people safely write larger and more complicated programs.

To be especially clear, I am not saying syntax is unimportant. As an example, consider C++11 lambda functions. They are "merely" syntax sugar for code structures possible in C++98. However, because they’re so convenient, they fundamentally change the economics of writing in certain styles. Thus syntax provides a vital role in guiding programmers down good paths.

The problem with CoffeeScript, and ultimately the reason why IMVU dropped it, is that, while it looks nice at first glance, it doesn’t substantially improve the semantics of JavaScript. Missing properties still return undefined. It’s still possible to trip over this in lambda functions. There are a few wins: it’s no longer easy to accidentally write to a global variable (like you might in JavaScript by forgetting var), and CoffeeScript’s for own makes it easy to iterate over an object’s properties without walking the prototype chain. But those wins can be easily obtained in JavaScript through tools such as jshint.

At IMVU, we decided CoffeeScript’s syntactic brevity was not worth the cost of having a translation step for our code. That translation step adds a bit of latency between editing a file and reloading it in your browser, and it adds a degree of mental overhead for programmers, especially those not deeply familiar with the web. That is, not only must a programmer know JavaScript semantics, but they must also know how CoffeeScript translates into JavaScript.

And, sometimes, that translation is not at all obvious. Consider the following examples. I encourage you to guess what each one does and then paste the code into a CoffeeScript compiler to see what it actually does.

The following snippets all call fn with two arguments:

fn a, b
fn a,
  b
fn x:y, b

However, if the first argument is not an object literal, then it’s a syntax error:


How many parameters do you think are passed to fn? What are their values?

fn (

There are too many ways to express branches. I hypothesize that having so many different control flow idioms makes rapid scanning of flow through a function harder, with no meaningful benefit.

if x then return
unless x then return
return if x
return unless x
for x in ls then foo(x)
foo(x) for x in ls
foo(x) while i++ < 10
foo(x) until i++ > 10

How do you think the following snippets parse?



t for t in ls if t > 1
t for t in ls when t > 1

o = (t for t in ls when t > 1)
o = (for t in ls then)

foo ? 1 : 2 # legal
foo ? [] : 2 # not legal

fn a, debugger, b
a = debugger
fn a debugger # not legal?? is debugger an expression or not?

e = try f

a = if b then

a = [f for f in x]
a = (f for f in x)

I’ve literally been told the best way to cope is to install an editor plugin that makes it easy to compile bits of code so I can verify that the compiled JavaScript matches the behavior I intended.

The following code is legal! And produces a rather strange result.

a = (for b in x then)

Yet not everything is an expression.

a = (return) # illegal

CoffeeScript has semi-open and closed intervals. Guess which is which?

[1..5]
[1...5]

Real CoffeeScript Hazards

Now that I’m done making fun of the syntax, in all seriousness, there are actually a couple real dangers in CoffeeScript too.

Because CoffeeScript has no syntax for explicitly introducing a variable binding, an assignment to a name introduces a variable binding in that scope. If a variable with the same name already exists in an outer scope, the inner assignment reuses the outer variable binding. This means that changes to outer scopes can change the meaning of code in inner scopes.

Consider the following code (and in case it’s not clear, do simply calls its argument function with no arguments):

a = 'hello world'
do ->
  for i in [0...10]
    print i
print a

It prints the values 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, "hello world".

Now rename the outer variable a to i.

i = 'hello world'
do ->
  for i in [0...10]
    print i
print i

We didn’t change the inner scope at all. But now the inner loop does not create a new variable binding — instead it reuses the outer variable i. This code now prints 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 9.

Another risk that comes up in CoffeeScript comes from the fact that the last expression in a function is the return value. Consider the following function:

processStrings = ->
  for longString in dataSource
    process longString

Let’s assume processStrings is intended to have no return value. Let’s also assume that the process function has a large return value that we intend to discard.

processStrings will actually return a list containing the return value of process for each iteration through the loop. This is not a correctness hazard so much as a performance hazard, but it’s come up several times in the codebase I’ve been working on. To fix processStrings, it should be written as follows:

processStrings = ->
  for longString in dataSource
    process longString
  return  # explicit return discards the loop's accumulated results

This situation is why Haskell distinguishes between forM and forM_.

CoffeeScript Has Benefits Too

I’ve just listed a lot of unfortunate things about CoffeeScript, but it does provide some rather significant benefits. In fact, I think that a team that deeply understands JavaScript will enjoy CoffeeScript simply because it’s so concise.

Arrow functions and CoffeeScript’s braceless object literals do make it easy to visually scan through large chunks of code. Traditional JavaScript is littered with curly braces and parentheses – sometimes JavaScript feels as noisy as a LISP.

Classes nicely encode the pattern that everyone expresses manually with prototypes in JavaScript. Someone recently made the comment "I’ve been writing CoffeeScript for years and I’ve never needed to understand JavaScript prototypes". I can believe that, especially if you’re not interacting with JavaScript libraries that make direct use of prototypes.

Destructuring binds are a huge deal. They dramatically simplify the keyword arguments pattern. Consider the following imvujs JavaScript module:

module({
  foo: 'foo.js',
  bar: 'bar.js',
}, function(imports) {
  var foo = imports.foo;
  var bar = imports.bar;
  // ...
});

The corresponding CoffeeScript is much shorter and easier to read:

module
  foo: 'foo.js'
  bar: 'bar.js'
, ({foo, bar}) ->
  # ...

Fortunately, many of the niceties of CoffeeScript are making it into ES6 and TypeScript, so CoffeeScript’s value proposition in the future is not very high.

So what are you saying, Chad?

My opinions tend to be nuanced. I am not the kind of person to come out guns blazing and say "X sucks!" or "Y is the best thing ever!"

Do I think CoffeeScript is a big win over JavaScript? Nope.

Would I start a new project in CoffeeScript today? Nope.

Would I be happy working in an existing CoffeeScript codebase? Sure.

Do I think CoffeeScript has some advantages over JavaScript? Sure.

Do I think CoffeeScript has much of a future given that ES6 and TypeScript have arrow functions, classes, and destructuring binds? Nope.

Do I think it’s possible to have a language with the syntax niceties of CoffeeScript and real semantic benefits like coroutines and static types? Absolutely, and I’m very much looking forward to the next few years of AltJS innovation.

I think Jeremy Ashkenas deserves a lot of credit for exploring this programming language space. It was a worthwhile experiment, and it validated the usefulness of several features that made it into ES6.


My opinions on this topic are not new. Many others have come to the same conclusion. For example:

Thinking About Performance

by Chad Austin
Apr 28 15

Update: Bill Gropp covers this topic in depth in a presentation at University of Illinois: Engineering for Performance in High-Performance Computing.

Two things happened recently that made me think my approach to performance may not be widely shared.

First, when I announced BufferBuilder on reddit, one early commenter asked "Did you run your Haskell through a profiler to help identify the slow parts of your code?" A fairly reasonable question, but no, I didn’t. And the reason is that, if stuff shows up in a Haskell profiler, it’s already going to be too slow. I knew, up front, that an efficient way to build up buffers is a bounds check followed by a store (in the case of a single byte) or copy (in the case of a bytestring). Any instructions beyond that are heat, not light. Thus, I started BufferBuilder by reading the generated assembly and building a mental model of how Haskell translates to instructions, not by running benchmarks and profilers. (Benchmarks came later.) Any "typical" CPU-bound Haskell program is going to spend most of its time in stg_upd_frame_info and stg_ap_*_info anyway. ;)

Next, on my local programming IRC channel, I shared a little trick I’d seen on Twitter for replacing a div with a shift: (i+1)%3 == (1<<i)&3 for i in [0, 2]. One person strenuously objected to the idea of using this trick. Paraphrasing, their argument went something like "the meaning of the code is not clear, so tricks like that should be left to the compiler, but it doesn’t work for all values of i, so a compiler would never actually substitute the left for the right. Just don’t bother." We went back and forth, after which it became clear that what the person REALLY meant was "don’t use this trick unless you’re sure it matters". And then we got to "you have to run a profiler to know it matters." I contend it’s possible to use your eyes and brain to see that a div is on the critical path in an inner loop without fancy tools. You just have to have a rough sense of operation cost and how out-of-order CPUs work.
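The identity is easy to verify by brute force; a quick sketch (function names are mine):

```typescript
// Verify (i + 1) % 3 === (1 << i) & 3 for i in {0, 1, 2}.
function divVersion(i: number): number {
  return (i + 1) % 3;
}

function shiftVersion(i: number): number {
  return (1 << i) & 3;
}

// For i = 3 the identity breaks: 4 % 3 === 1 but (1 << 3) & 3 === 0.
// That is exactly why a compiler could never substitute one for the
// other automatically; only the programmer knows i stays in range.
```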

Over the years I have built a mental framework for thinking about performance beyond the oft-recommended "get the program working, then profile and optimize the hot spots". My mental framework comes from three sources:

  1. Rico Mariani has written a lot about designing for performance so you aren’t caught, late in the project, unable to hit your perf targets. That is, understand the machine and problem constraints, and sketch out on paper the solution so you can make sure it fits. Use prototypes as necessary.
  2. I’ve always been interested in how our programs run on physical silicon. The Pentium Chronicles: The People, Passion, and Politics Behind Intel’s Landmark Chips is an excellent history of the design of Intel’s out-of-order P6 line.
  3. I’ve been involved with too many projects run with the philosophy of "get it working, profile and optimize later". It’s very easy for this strategy to fail, resulting in a program that’s uniformly sluggish and uneconomical to fix.

Performance is Engineering

To hit your performance goals, you first need to define your goals. Consider what you’re trying to accomplish. Some example performance goals:

  • Interactive VR on a mobile phone
  • Time between mouse click in menus and action taken is under 100 ms
  • Load a full 3D chat room in five seconds
  • First page of search results from a mountain in Hawaii in half a second
  • Latest experimental analyses in half a day

Also, understand why you want to hit those goals. Todd Hoff’s excellent Latency is Everywhere and it Costs You Sales has several links to research showing how performance can affect your business metrics and customer satisfaction.

For BufferBuilder, our goal was to match the performance, in Haskell, of a naive C++ buffer-building library.

Now that you have a goal in mind, and presumably you understand the problem you’re trying to solve, there’s one final piece to understand. You could call them the "fundamental constants of human-computer interaction", which can be split into the human and computer halves.

Very roughly:

  • ~16 ms – frame budget for interactive animation
  • 100 ms – user action feels instantaneous
  • 200 ms – typical human reaction time
  • 500+ ms – perceptible delay
  • 3+ seconds – perceived sluggishness
  • 10+ seconds – attention span is broken

And on the computer:

  • 1 cycle – latency of simple bitwise operations
  • a few – maximum number of independent instructions that can be retired per cycle
  • 3-4 cycles – latency of L1 cache hit
  • ~dozen cycles – latency of L2 cache hit
  • ~couple dozen cycles – integer division on modern x86
  • couple hundred cycles – round-trip to main memory
  • 50-250 us – round-trip to SSD
  • 250-1000 us – in-network ethernet/tcp round-trip
  • 10 ms – spinny disk seek
  • 150 ms – IP latency between Australia and USA

Throughput numbers:

  • 8.5 GB/s – memory bandwidth on iPhone 5
  • ~100 MB/s – saturated gigabit ethernet connection
  • 50-100 MB/s – spinny disk transfer speed
  • 4.5 Mb/s – global average connection speed
  • 3.2 Mb/s – average mobile bandwidth in USA

These numbers can be composed into higher-level numbers. For example:

  • 1 ms – parsing 100 KB of JSON on a modern, native parser such as RapidJSON or sajson.

It’s unlikely JSON parsing will become appreciably faster in the future – parsing is a frustrating, latency-bound problem for computers. "Read one byte. Branch. Read next byte. Branch."

While throughput numbers increase over time, latency has only inched downwards. Thus, on most typical programs, you’re likely to find yourself latency-bound before being throughput-bound.

The numbers above are collected from several published sources.

Designing for Performance

Once you’ve defined your performance goal, you need to make the numbers fit. If your goal is interactive animation, you can barely afford a single blocking round-trip to a spinny disk per frame. (Don’t do that.) If your goal is an interactive AJAX application that feels instant from all around the world, you must pay VERY close attention to the number of IP round-trips required. Bandwidth, modulo TCP windowing, is usually available in spades – but accumulated, serial round-trips will rapidly blow your experience budget. If you want a WebGL scene to load in five seconds on a typical global internet connection, the data had better fit in (5s * 4.5 Mb/s) = 2.8 MB, minus initial round-trip time.

For BufferBuilder, since we knew our goal was to match (or at least come reasonably close to) C++ performance in Haskell, we didn’t need a profiler. We knew that appending a single byte to a buffer could be minimally expressed with a load of the current output pointer, a (predicted) bounds check, and a write of the new output pointer. Appending a buffer to another is a (predicted) bounds check, a memcpy, and updating the output pointer.

A profiler is not needed to achieve the desired performance characteristics. An understanding of the problem, an understanding of the constraints, and careful attention to the generated code is all you need.
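The append operations described above can be sketched as follows. This is a simplified model in TypeScript, not BufferBuilder's actual Haskell/C implementation; the class and method names are invented:

```typescript
// A minimal growable byte buffer: each append is a bounds check,
// a store (or copy), and an updated output position.
class ByteBuffer {
  private buf = new Uint8Array(64);
  private pos = 0;

  // The (usually well-predicted) bounds check; growth is the rare path.
  private ensure(extra: number): void {
    if (this.pos + extra > this.buf.length) {
      const grown = new Uint8Array(Math.max(this.buf.length * 2, this.pos + extra));
      grown.set(this.buf);
      this.buf = grown;
    }
  }

  appendByte(b: number): void {
    this.ensure(1);
    this.buf[this.pos++] = b; // store + pointer update
  }

  appendBytes(bytes: Uint8Array): void {
    this.ensure(bytes.length);
    this.buf.set(bytes, this.pos); // the memcpy
    this.pos += bytes.length;
  }

  toBytes(): Uint8Array {
    return this.buf.slice(0, this.pos);
  }
}
```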

We can apply this approach at almost any scale. Want a rich website that uses less than 100 MB of RAM yet has high-resolution images, gradients, transitions, and effects? Build prototypes to measure the costs of each component, build up a quantitative understanding, and make the numbers fit. (Of course, don’t forget to actually measure real memory consumption on the resulting page, lest you be surprised by something you missed.)

Want to design a mobile application that goes from tap to displayed network data in 200 ms? Well, good luck, because Android devices have touchscreen latency of 100+ ms. (Edit: mdwrigh2 on Hacker News points out that this data is out of date for modern Android handsets.) And it can take a second or more to spin up the radio!

To summarize, given a rough understanding of the latencies of various layers of a system, and knowledge of the critical path through the code, simply sum the numbers together and you should have a close approximation to the total latency. Periodically double-check your understanding against real measurements, of course. :)
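As a worked example of summing the numbers (the latency figures come from the tables above; the scenario itself is invented):

```typescript
// Estimate the latency of a page load that makes four serial
// cross-continental requests, each requiring one spinny disk seek
// on the server.
const IP_ROUND_TRIP_MS = 150; // Australia <-> USA
const DISK_SEEK_MS = 10;      // spinny disk seek

const serialRequests = 4;
const estimatedMs = serialRequests * (IP_ROUND_TRIP_MS + DISK_SEEK_MS);
// 4 * 160 = 640 ms: already past the 500 ms "perceptible delay"
// threshold before any rendering happens. The fix is structural,
// batching or parallelizing the requests, not micro-optimization.
```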


Are you saying profilers are useless?!

Absolutely not. Profilers are awesome. I have spent a great deal of time with the Python profiler. I particularly love AMD’s old CodeAnalyst tool. (Not the new CodeXL thing.) Sampling profilers in general are great. I’ve also built a bunch of intrusive profilers for various purposes.

But always keep in mind that profilers are for exploring and learning something new. By the time you’ve built your application, you may not actually be able to "profile your way to success".

Should I unilaterally apply this advice, irrespective of context?

Of course not. A large number of programs, on a modern CPU, trivially fit in all the performance budgets you care about. In these situations, by all means, write O(n) loops, linked lists, call malloc() inside every function, use Python, whatever. At that point your bottleneck is human development speed, so don’t worry about it.

But continue to develop an understanding of the costs of various things. I remember a particular instance when someone replaced hundreds of <img> tags on a page with <canvas> for some kind of visual effect. Bam, the page became horrifically slow and consumed an enormous amount of memory. Browsers are very smart about minimizing working set in the presence of lots of images (flushing decoded JPEGs from memory until they’re in view is one possible technique), but <canvas> is freeform and consumes at least width*height*4 bytes of pagefile-backed memory.

What about algorithmic complexity?

Improving algorithmic complexity can yield significant wins. Especially when you’re Accidentally Quadratic. Reach for those wins first. But once you get into O(n) vs. O(lg n), you’re almost certainly limited by constant factors.
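A classic accidentally-quadratic shape, sketched here with invented names and data: a linear scan inside a loop. Replacing the inner scan with a set lookup removes the n² without changing the result.

```typescript
// O(n^2): for every item, scan the entire blocklist.
function filterBlockedSlow(items: string[], blocked: string[]): string[] {
  return items.filter(item => !blocked.includes(item));
}

// O(n): one pass to build the set, one pass to filter.
function filterBlockedFast(items: string[], blocked: string[]): string[] {
  const blockedSet = new Set(blocked);
  return items.filter(item => !blockedSet.has(item));
}
```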

In any context, you should aim for the biggest wins first. Let’s say you’re writing a web service that talks to a database, fetching info for 100 customers. By far the biggest optimization there is to run one query to fetch 100 customers (as a single round-trip can be 1-10 ms) instead of 100 queries of one customer each. Whenever someone says "My new web service is taking 1.5 seconds!" I can almost guarantee this is why. Both approaches are technically O(n), but the query issue overhead is a HUGE constant factor.
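The difference is easy to model. In the sketch below the database is simulated by an in-memory map and a round-trip counter (all names are invented); the point is that the batched version issues one round-trip where the naive version issues n.

```typescript
type Customer = { id: number; name: string };

let roundTrips = 0;
const db = new Map<number, Customer>([
  [1, { id: 1, name: "Ada" }],
  [2, { id: 2, name: "Grace" }],
]);

// Simulates `SELECT ... WHERE id IN (...)`: one round-trip, many rows.
function queryCustomers(ids: number[]): Customer[] {
  roundTrips += 1;
  return ids.map(id => db.get(id)!);
}

// Naive: one query per customer, so n round-trips.
function fetchOneByOne(ids: number[]): Customer[] {
  return ids.map(id => queryCustomers([id])[0]);
}

// Batched: a single query, so one round-trip regardless of n.
function fetchBatched(ids: number[]): Customer[] {
  return queryCustomers(ids);
}
```

At 1-10 ms per round-trip, fetching 100 customers one at a time costs 100-1000 ms of pure query overhead; batched, it costs one round-trip.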

In interviews I sometimes ask candidates about the performance difference between two algorithms that have the same algorithmic complexity. There is no precisely correct answer to my question, but "the execution time is the same" is wrong. Constant factors can be very large, and optimizing them can result in multiple-orders-of-magnitude improvements.

But Knuth said…!

Read what he actually said. Also, since then, we’ve developed a much deeper understanding of when mature optimizations are appropriate. Passing const std::string& instead of std::string is not a premature optimization – it’s just basic hygiene you should get right.

The Long-Term Problem With Dynamically Typed Languages

by Chad Austin
Apr 23 15

This may be the only time I weigh in on the static vs. dynamic typing discussion. Each side has its extreme proponents, and people differ in their ability and desire to work in systems with implicit invariants.

Many years ago, back when Java and C++ were the Mainstream Languages and Python was the shiny new up-and-comer, I read Bruce Eckel’s arguments in support of dynamically typed languages, and some of the nonobvious (at the time) ways you can get more done at higher quality in a more flexible language.

If I recall correctly, the arguments were that type systems can be approximated with unit tests (neither subsumes the other), and the ease of getting code up and running in a dynamically-typed language outweighs the benefits static types provide, especially when (explicit) static types require more thought and commitment before the problem space has been developed and understood. That is, dynamic languages are more fluid, and you can test bits of the program even before they’re made to fit with the rest of the code. In statically typed languages, on the other hand, the code must compile before it can be run at all.

Note: I feel a temptation to qualify every statement I’m making, because the programming language space has since been explored in more depth, and significantly more nuance is understood now than was in the late 90s and early 2000s. We are in the midst of a programming language renaissance.

I have some empathy for Bruce Eckel’s argument. Early on in a system’s life it is important to have some flexibility and "play". However, as the correct shape or factoring of the software is discovered, it becomes useful to have the computer enforce more and more invariants.

Note that some people treat static typing vs. dynamic typing as a binary switch. In reality it’s a bit more of a continuum. Python will tend to throw exceptions rather than implicitly convert values. JavaScript will implicitly convert values sometimes, but not always. PHP tries to make whatever code you wrote run, even if it’s nonsensical. Thus, among dynamically-typed languages, some languages tend to catch type errors sooner, simply because they’re less forgiving at runtime.

IMVU’s technology stack is weighted heavily towards dynamic languages. This is largely a product of the year of its birth. Everyone was writing PHP and Python in the mid 2000s. The website backend consists of a prodigious amount of PHP. The website frontend is JavaScript, because that’s really the only option. The client is built out of C++ (for the engine), Python (for all the business logic), and JavaScript (for the UI).

After ten years, however, I can look back and speak with confidence about the long-term effects of building a business on top of dynamic languages.

Most of the core PHP APIs from IMVU’s early years are still around. And they’re terrible. They’re terrible but they’re "impossible" to change, because the iteration time on running all the tests, verifying in production, and so on is never worth the smallish, indirect benefits of improving the APIs. Obviously, nothing is impossible, but relying on a giant test suite and test infrastructure to prove the correctness of renaming a function or adding a parameter, in practice, is a significant coefficient of friction on the software’s ability to evolve over time.

Not improving core APIs results in a kind of broken windows effect. When APIs are confusing or vague, people tend not to notice other confusing or vague APIs, and it slows everyone down in the long term.

Thus, instead of easily refactoring the legacy APIs, people think "I’ll just make a new one and migrate the code over!" And now you have two hard-to-change APIs. And then three. And the cycle continues. Additionally, this cycle is fed by architect types who know or think they know a better way to do things, but can’t be bothered to update the old systems.

With a type system (such as in C++, Go, Haskell, or Java), the iteration time on renaming a function or changing its signature is a single build. In Haskell, it’s even better: simply type “:r” into ghci and see all the places where the code needs to be updated. That is, a type system flattens the cost of change curve. Small API or performance improvements that otherwise wouldn’t be worth it suddenly are, because the compiler can quickly tell you all the places that need to be updated. You don’t need to run all the tests across the history of the company in order to see whether you’ve missed a spot.

There’s another huge cost of dynamic languages. It has to do with engineers building an understanding of existing systems. For all the mental energy spent on whitespace and curly braces, the most important component of understanding software is grasping data flow. Programs exist to transform data, and understanding how that’s done is paramount. Types accelerate the process of building a mental understanding of the program, especially when lightweight types such as CustomerId (vs. Int) are used.
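TypeScript offers one lightweight way to get a CustomerId that is not interchangeable with a plain number: a "branded type". This is a common community idiom rather than a built-in feature, sketched here with invented names:

```typescript
// The __brand field exists only at the type level; at runtime a
// CustomerId is still just a number, with zero overhead.
type CustomerId = number & { readonly __brand: "CustomerId" };

function customerId(raw: number): CustomerId {
  return raw as CustomerId;
}

function loadCustomer(id: CustomerId): string {
  return `customer:${id}`;
}

// loadCustomer(42) would be a compile-time error: a bare number is
// not a CustomerId. Callers must go through the constructor, which
// documents data flow at every call site.
const label = loadCustomer(customerId(42));
```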

I can’t say I regret IMVU building its technology on top of PHP or Python or JavaScript, but I now believe that the long-term reduction in agility is not worth the short-term benefits, especially given the plethora of other options: Java, Go, Haskell, perhaps F# or some other ML, and so on.

I wonder how Facebook feels on this topic. :)

In Remembrance of Mike Ey

by Chad Austin
Mar 7 15

We hired Mike Ey out of RIT back when IMVU had a Palo Alto office.  My first impression meeting him was that he looked like John Carmack.

My second impression came after completion of a project I can no longer remember.  In the post-project code review, a task came up – he had modified a flexible system to add a feature-specific conditional.  I asked him to fix it, and noticed a month later that nothing had been done.  So I asked him again – still nothing changed.  Finally I had to go to his manager.  Eventually the task was completed.

In hindsight, I see that, like many people fresh out of school, he struggled with the tension between the tasks on the board and the important stuff that has to get done anyway.

Later on, he and I worked a lot more closely.  His background in 3D graphics led us to working on IMVU’s content pipeline together.  He did much of the research into whether COLLADA would be suitable for IMVU’s needs.

I considered Mike a talented but junior engineer when my manager came to me and said “I want to take a chance and make Mike Ey the tech lead of our new engine team.”  I resisted a bit, but eventually we agreed that I would support him in this new role.

And he did great.  Mike brought creativity, positivity, willingness to help, and a constructive attitude, all of which are critical to getting a team off the ground.  His concrete experience building games in Unity3D on the weekends kept us focused on solving for actual customer needs.

Around this time, he and I became friends.  There’s something irreplaceable about working with someone you implicitly trust.  He also made me laugh all the time – we were both fans of Dr. Weird and most days you could hear Dr. Weird quotes in the office.

Mike taught me to solder, inspired me in many of my random woodworking projects, and was always available to chat about what recent games were great.  He even built his own quadcopters.

I was super bummed when he announced he was going to Microsoft to work on a secret project with his childhood friend.  As it turned out, he went to work on HoloLens, and none of us can blame him.  :)  You could tell how proud he was when HoloLens was finally announced.  After a year of being unable to talk about it, he spent days gushing about the team, the coolness of the project, and the pranks he’d played on other team members during development.

The day he died, I couldn’t believe it.  I had just sent him a note asking for a recommendation on something, and while building some shelves in the garage I wondered how his new drill press was working out.  Only then did I realize how much he meant to me, and how much he affected my life.

Mike, we miss you!

Announcing BufferBuilder: Encode JSON in Haskell 2-5x faster than Aeson

by Chad Austin
Feb 23 15


This tale is a little long and detailed, so for those of you who want immediate payoff…

Andy and I wrote a Haskell library called Data.BufferBuilder (including Data.BufferBuilder.Utf8 and Data.BufferBuilder.Json) which makes it easy to safely and efficiently encode Haskell data structures into JSON. In our benchmarks, using Data.BufferBuilder.Json to encode JSON is 4-5x as fast as using Aeson.

Even if you’re already using Aeson, you can benefit somewhat from BufferBuilder’s improved encoding performance. The buffer-builder-aeson package adds a ToJson instance for Aeson’s Value type, which our benchmarks show is 50% to 100% faster than Aeson’s built-in encoder. All you need to do is call Data.BufferBuilder.Json.encodeJson instead of Data.Aeson.encode!

Why did we build BufferBuilder?

Some of IMVU’s backend services are written in Haskell. While Haskell is incredible for many use cases, we ran into an unexpected bottleneck: JSON encoding. Our service response structure produces quite a lot of JSON, and much of that JSON contains URLs encoded into JSON strings.

Amazingly, URL and JSON encoding was showing up as a significant cost center when generating JSON responses. Some services spent over a second encoding JSON!

When faced with a performance problem, my first instinct is to either pound the problem into the ground or unmake the problem so the code isn’t necessary in the first place.

So let’s look at the problem holistically:

  • A JSON response is represented in memory as a collection of varied, in-memory data structures. The response happens to contain many URLs — sometimes more than a hundred.
  • URLs are represented by a data structure consisting of the protocol, hostname, path segments, query string, and so on.
  • Each URL becomes a JSON string.
  • Each in-memory data structure is converted to a JSON object whose properties depend on the type of the corresponding data structure.

Using Aeson to encode all of this results in the following steps:

  • ToJSON instances convert Haskell data types to an AST of Aeson Values.
  • The keys of an Aeson object are Text values. In memory, Text is encoded in UTF-16. Thus, URLs must be translated from their in-memory representation (ours is ASCII) into UTF-16 before they fit into the Aeson AST.
  • Then, the entity bodies are converted into JSON objects, where the keys are known at compile-time, but must be encoded as UTF-16.
  • The entire AST is encoded into a Lazy Text Builder.
  • Then the Lazy Text is encoded into a Lazy ByteString containing UTF-8.
  • Then, to generate the response body, the Lazy ByteString is concatenated into a single strict ByteString.

That’s a lot of steps! To summarize:

(URLs and Custom Data Types) -> Aeson AST -> Lazy Text -> Lazy UTF-8 ByteString -> Strict ByteString
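The cost of that first arrow is easiest to see in miniature. Here's a toy AST-based encoder (hypothetical types, much simpler than Aeson's real Value): every document is first built as a tree of nodes, and only then flattened to text.

```haskell
import Data.List (intercalate)

-- Toy AST-based encoder: build a tree first, flatten it second.
-- (Illustrative only; Aeson's real Value type is far richer.)
data JValue = JString String | JObject [(String, JValue)]

render :: JValue -> String
render (JString s)   = show s  -- show quotes and escapes (Haskell-style; fine for a toy)
render (JObject kvs) =
    "{" ++ intercalate "," [show k ++ ":" ++ render v | (k, v) <- kvs] ++ "}"
```

Every URL in the response becomes a JString node before a single output byte is written. BufferBuilder's goal is to skip the intermediate tree entirely.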

The Design of BufferBuilder

This is where Andy and I sat down to create an API to let us cleanly express JSON encoding without sacrificing type safety OR performance.

We know that a fast, if not the fastest, way to build up a buffer of bytes is to allocate a chunk of memory, stream writes to it, and either chunk or realloc() as needed. Obviously, this kind of code can be trivially expressed in C:

void buffer_append(buffer* b, const char* s, size_t length) {
    if (!buffer_has_room(b, length)) {
        buffer_grow(b, length);
    }
    memcpy(b->data + b->size, s, length);
    b->size += length;
}

Because I’d been told that bouncing between Haskell and C with the foreign function interface can be slow, my first approach was to attempt to build a Haskell monad that grabbed the RealWorld token out of IO (IO a is basically a newtype around RealWorld -> (RealWorld, a)), augmented it with some extra “local variables” like the output ptr, capacity, and current size, and manually implemented allocation and memory writes with GHC.Prim APIs. GHC did not like this at all. The generated code ran 20 times slower than naive usage of Data.ByteString.Builder. Nonetheless, it was an interesting technique, so maybe I’ll write about it another time.

Surely it was possible to do better. So I tried the foreign function interface after all.

I wrote a tiny C API that allocated the underlying growable buffer. It provided APIs for appending bytes, buffers, UTF-16-to-UTF-8 transcoding, and so on. These FFI calls can only happen within IO actions, but building buffers is fundamentally a pure operation, and the provided Haskell API should be effectively pure. The solution is to offer a restricted state monad like ST which limits the operations within the monad to safe buffer-building operations.
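That restricted monad can be sketched in pure Haskell (an IORef of chunks stands in for the real C growable buffer, and this version stays in IO rather than being run purely — the library's actual internals differ):

```haskell
import Data.IORef
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BSC

-- Sketch only: the hidden IORef plays the role of the C buffer.
-- Clients never see it, so the only possible effects are appends.
newtype BufferBuilder a = BufferBuilder (IORef [BS.ByteString] -> IO a)

instance Functor BufferBuilder where
    fmap f (BufferBuilder g) = BufferBuilder (fmap f . g)

instance Applicative BufferBuilder where
    pure x = BufferBuilder (\_ -> pure x)
    BufferBuilder f <*> BufferBuilder x =
        BufferBuilder (\buf -> f buf <*> x buf)

instance Monad BufferBuilder where
    BufferBuilder x >>= f = BufferBuilder $ \buf -> do
        a <- x buf
        let BufferBuilder g = f a
        g buf

-- The exported primitive: append a strict ByteString.
appendBS :: BS.ByteString -> BufferBuilder ()
appendBS bs = BufferBuilder (\buf -> modifyIORef' buf (bs :))

-- Observably pure: the same builder always yields the same bytes.
runBufferBuilder :: BufferBuilder () -> IO BS.ByteString
runBufferBuilder (BufferBuilder go) = do
    buf <- newIORef []
    go buf
    BS.concat . reverse <$> readIORef buf
```

Because the constructor is hidden, code inside the monad can only do what the exported primitives allow — the same trick ST uses to keep mutation local.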

This approach was by far the fastest of any that I tried. In fact, in a sequence of appendBS operations, if the arguments are known-strict ByteStrings, GHC will compile the appendBS sequence directly into a series of cheap C function calls. For example, the following code:

data BSTriple = BSTriple !ByteString !ByteString !ByteString

writeBSTriple :: BSTriple -> BufferBuilder ()
writeBSTriple !(BSTriple a b c) = do
    appendBS a
    appendBS b
    appendBS c

compiles into something like:

movq %rbx,%rdi
movq 120(%rsp),%rax
movq %rax,%rsi
movq 96(%rsp),%rax
movq %rax,%rdx
movq 112(%rsp),%rax
addq %rax,%rdx
subq $8,%rsp
movl $0,%eax
call bw_append_bs
addq $8,%rsp
movq %rbx,%rdi
movq 152(%rsp),%rax
movq %rax,%rsi
movq 128(%rsp),%rax
movq %rax,%rdx
movq 144(%rsp),%rax
addq %rax,%rdx
subq $8,%rsp
movl $0,%eax
call bw_append_bs
addq $8,%rsp
movq %rbx,%rdi
movq 216(%rsp),%rax
movq %rax,%rsi
movq 160(%rsp),%rax
movq %rax,%rdx
movq 208(%rsp),%rax
addq %rax,%rdx
subq $8,%rsp
movl $0,%eax
call bw_append_bs
addq $8,%rsp

Obviously GHC’s code generator has some room to improve, but the main thrust is exactly right. An equivalent C++ API would generate much the same kind of code, modulo the poor instruction and register selection, which overall doesn’t matter too much here.

In almost all situations, Haskell isn’t going to be as fast as straightforward C, but with some work and appropriate use of FFI, it’s possible to come close.


Once we had an API to safely and efficiently build up buffers of bytes, we wanted to build safe APIs on top for constructing valid UTF-8 buffers and valid JSON.

Utf8Builder is a newtype around BufferBuilder with a restricted API. If you only call safe functions in Data.BufferBuilder.Utf8, the result is guaranteed to be valid UTF-8. Unsafe functions are provided for when you know precisely what you’re doing.


Data.BufferBuilder.Json is built on top of Data.BufferBuilder.Utf8. Data.BufferBuilder.Json’s Value type is a newtype around Utf8Builder, meaning there’s no Aeson-style AST. Each Value simply knows how to write itself into an output buffer. Just as the safe Utf8Builder functions guarantee the output is legal UTF-8, the safe JsonBuilder functions (almost) guarantee that the output is a legal JSON document. (There are a couple of caveats; see the documentation for details.)
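The no-AST design fits in a few lines of miniature (String difference lists stand in for the real UTF-8 buffer writes; none of these names are the library's actual API): a value is a function that appends its own serialized form, so composing values composes writes with no intermediate tree.

```haskell
-- Miniature of the design: a Value IS its own serializer.
newtype Value = Value (String -> String)

string :: String -> Value
string s = Value (show s ++)   -- show quotes and escapes (Haskell-style; fine for a toy)

object :: [(String, Value)] -> Value
object kvs = Value $ \rest -> '{' : go kvs ('}' : rest)
  where
    go []                    r = r
    go [(k, Value v)]        r = show k ++ ':' : v r
    go ((k, Value v) : more) r = show k ++ ':' : v (',' : go more r)

-- "Encoding" is just running the accumulated writes.
encode :: Value -> String
encode (Value write) = write ""
```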

I suspect Data.BufferBuilder.Json is approaching the limit of how fast a JSON encoder can be. And thanks to the beauty of Haskell, it’s convenient and safe!

If you’re using Aeson and encoding performance matters to you, give BufferBuilder a shot!

Code Reviews: Follow the Data

by Chad Austin
Jan 24 15

This is a mirror of the corresponding post at the IMVU Engineering Blog.

After years of reviewing other people’s code, I’d like to share a couple practices that improve the effectiveness of code reviews.

Why Review Code?

First, why review code at all? There are a few reasons:

  • Catching bugs
  • Enforcing stylistic consistency with the rest of the code
  • Finding opportunities to share code across systems
  • Helping new engineers spin up with the team and project
  • Helping API authors see actual problems that API consumers face
  • Maintaining the overall health of the system

It seems most people reach for one of two standard techniques when reviewing code; they either review diffs as they arrive or they review every change in one large chunk. We’ve used both techniques, but I find there’s a more effective way to spend code review time.

Reviewing Diffs

It seems the default code review technique for most people is to sign up for commit emails or proposed patches and read every diff as it goes by. This has some benefits – it’s nice to see when someone is working in an area or that a library had a deficiency needing correction. However, on a large application, diffs become blinding. When all you see is streams of diffs, you can lose sight of how the application’s architecture is evolving.

I’ve ended up in situations where every diff looked perfectly reasonable but, when examining the application at a higher level, key system invariants had been broken.

In addition, diffs tend to emphasize small, less important details over more important integration and design risks. I’ve noticed that, when people only review diffs, they tend to point out things like whitespace style, how iteration over arrays is expressed, and the names of local variables. In some codebases these are important, but higher-level structure is often much more important over the life of the code.

Reviewing diffs can also result in wasted work. Perhaps someone is iterating towards a solution. The code reviewer may waste time reviewing code that its author is intending to rework anyway.

Reviewing Everything

Less often, I’ve seen another code review approach similar to reviewing diffs, but on entire bodies of work at a time. This approach can work, but it’s often mind-numbing. See, there are two types of code: core, foundational code and leaf code. Foundational code sits beneath and between everything else in the application. It’s important that it be correct, extensible, and maintainable. Leaf code is the specific functionality a feature needs. It is likely to be used in a single place and may never be touched again. Leaf code is not that interesting, so most of your code review energy should go towards the core pieces. Reviewing all the code in a project or user story mixes the leaf code in with the foundational code and makes it harder to see exactly what’s going on.

I think there’s a better way to run code reviews. It’s not as boring, tends to catch important changes to core systems, and is fairly efficient in terms of time spent.

Follow the Data

My preferred technique for reviewing code is to trace data as it flows through the system. This should be done after a meaningful, but not TOO large, body of work. You want about as much code as you can review in an hour: perhaps more than a user story, but less than an entire feature. Start with a single piece of data, say, some text entered on a website form. Then, trace that data all the way through the system to the output. This includes any network protocols, transformation functions, text encoding, decoding, storage in databases, caching, and escaping.

Following data through the code makes its high-level structure apparent. After all, code only exists to transform data. You may even notice scenarios where two transformations can be folded into one. Or perhaps eliminated entirely — sometimes abstraction adds no value at all.

This style of code review frequently covers code that wasn’t written by the person or team that initiated the code review. But that’s okay! It helps people get a bigger picture, and if the goal is to maintain overall system health, new code and existing code shouldn’t be treated differently.

It’s also perfectly fine for the code review not to cover every new function. You’ll likely hit most of them while tracing the data (otherwise the functions would be dead code) but it’s better to emphasize the main flows. Once the code’s high-level structure is apparent, it’s usually clear which functions are more important than others.

After experimenting with various code review techniques, this approach has been the most effective and reliable over time. Make sure code reviews are somewhat frequent, however. After completion of every “project” or “story” or “module” or whatever, sit down for an hour with the code’s authors and appropriate tech leads and review the code. If the code review takes longer than an hour, people become too fatigued to add value.

Handling Code Review Follow-Up Tasks

At IMVU in particular, as we’re reviewing the code, someone will rapidly take notes into a shared document. The purpose of the review meeting is to review the code, not discuss the appropriate follow-up actions. It’s important not to interrupt the flow of the review meeting with a drawn-out discussion about what to do about one particular issue.

After the meeting, the team leads should categorize follow-up tasks into one of three categories:

  1. Do it right now. Usually tiny tweaks, for example: rename a function, call a different API, delete some commented-out code, move a function to a different file.
  2. Do it at the top of the next iteration. This is for small or medium-sized tasks that are worth doing. Examples: fix a bug, rework an API a bit, change an important but not-yet-ubiquitous file format.
  3. Would be nice someday. Delete these tasks. Don’t put them in a backlog. Maybe mention them to a research or infrastructure team. Example: “It would be great if our job scheduling system could specify dependencies declaratively.”

Nothing should float around on an amorphous backlog. If a task is important, it will come up again. Plus, it’s very tempting to say “We’ll get to it,” but you never will, and even if you find the time, nobody will have context. So either get it done right away or be honest with yourself and consciously drop it.

Now go and review some code! :)

My Political Views

by Chad Austin
Dec 30 14

I lean towards empiricism. We use empirical evidence to judge the effectiveness of drugs and the safety of our food. Yet we enact legislation based on wishful thinking, hope, and what feels right. Most of my standpoints stem from what has generally been shown to work — though I also support social experiments like The Kansas Experiment.

Education is critical for the country’s long-term health. Higher education is important but preschool, per dollar, is at least equally valuable. I support public funding of preschool, especially in lower-income settings. College education is also valuable, but I actually think it’s overpriced and trade schools and apprenticeships ought to come back into vogue. In addition, much of the cost of college education appears to stem from the increase in “staff costs” such as health care, as teaching is a lower-leverage activity than factory production or software development.

Science education in particular is critical. There appears to be a growing antiscience movement seeded from the far right that jeopardizes the USA’s future as a world leader in innovation.

Public statements made by politicians should be publicly fact-checked and advertised as such. False advertising is illegal for product advertisements – why is it not illegal for those running for office? It’s disturbing how easily humans are swayed by information that is not true, even after it’s been shown to be untrue. This is my biggest fear about our modern hyper-connected world.

Solving energy is critical for a sustainable world. We should continue investing in fusion just in case it works. I support the development and maintenance of a nuclear power infrastructure, as it’s cleaner than coal. It’s a good idea to continue investing in renewable energy: it’s quite viable, and hydrocarbons’ externalities are not priced into the market the way renewables’ costs are.

I do think that free markets are effective at optimizing efficiency. However, they must be monitored every so often because corporations love to form monopolies and/or capture their regulators. Effective regulation can prevent monopoly formation and enable private markets to succeed. Consider the Swiss healthcare system and UK forcing British Telecom to open its wires.

If legislation has a very small effect, it should not be enacted.

Government systems should maximize transparency and checks and balances to mitigate the inevitable long-term drift towards corruption.

I believe that a solid and practical understanding of systems theory is one of western society’s biggest blind spots. I might go as far as saying systems thinking is a new literacy. As our world becomes increasingly connected and interdependent, systems theory’s relevance will increase.

Empathy is key. In general, people either have good intentions or believe they have good intentions (is there a difference?). Trying to understand their viewpoints is a good idea. (Though there’s a point where you have to give up. Once, during a debate, my counterpart literally said “I’m sorry, I won’t read that information. It’s against my beliefs.”)

The US Department of Defense is dramatically over-funded. It’s totally not clear that meddling in the affairs of other countries has been beneficial to us, especially relative to the enormous cost. I’d love to see a portion of the DOD’s budget redirected towards the NSF and other basic research organizations. Basic research pays long-term dividends.

I think it’s worth preserving the environment. The environment is a broad concern over time and space, thus should be managed at a higher level than individuals, local communities, or corporations. The cost of environmental regulation is peanuts compared to the long-term benefits. [citation needed ;)]

All people should have equal basic rights. This includes people of any race, gender, socioeconomic status, sexual orientation, age, or ability.

Immigration policy should be relaxed, especially for skilled immigrants.

We should strive to demonstrate, through our actions, character, wisdom, and culture, that we are a great nation. I’m not the most patriotic person, but there is value in striving to be great in competition with others.

More Thoughts on Haskell

by Chad Austin
Dec 29 14

I’d like to follow up on my previous post with some points that are more nuanced and probably only make sense to people who’ve written Haskell before.


Some syntax elements in Haskell were annoying at first, but I adjusted to them quite easily, and I now believe my annoyance was simply unfamiliarity. Separating arguments by spaces and using the $ operator (that is, a $ b c as shorthand for a (b c)) come naturally now. The only thing I still get wrong is operator precedence.


Regarding category theory terminology: when I even mention words like functor or monad, I’ve seen engineers’ eyes glaze over. They instantly think “Oh, that’s too complicated to understand.” I swear, if Functor were named Mappable and Monad were named Chainable or something like that, it would make Haskell seem much less intimidating to beginners. Explaining monads doesn’t require any fancy tutorials or stretched analogies. Similarly, explaining functors doesn’t require using the words “lifting” or “context”. It’s as simple as “Well, you know how you can map across a list in every other language? Well, consider Maybe as a list of zero or one elements. You can map across that too. What about a promise or future? You can map across that too. Functions themselves can be mapped, transforming their return values. Anything that can be mapped is a functor.”
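That whole pitch fits in a few lines of plain Prelude — nothing exotic, just fmap specialized to different functors:

```haskell
-- One function, fmap, at work across several functors:
incList :: [Int]
incList = fmap (+1) [1, 2, 3]            -- [2,3,4]

incJust :: Maybe Int
incJust = fmap (+1) (Just 41)            -- Just 42

incNone :: Maybe Int
incNone = fmap (+1) Nothing              -- Nothing: nothing to map over

incFn :: Int -> Int
incFn = fmap (+1) (* 2)                  -- functions: fmap is composition
```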

In general, I do think it’s a good idea for programmers to use the same terminology as mathematicians, as they’re generally more rigorous and precise, but… I’ll just quote An Introduction to Category Theory:

In her charming Italian style she asked me (instructed me) to cut out all the jokes. This was quite difficult since some of the official categorical terminology is a joke, but I have done my best.


This is a relatively minor nit: Haskell module import syntax is cluttered. I always feel like import qualified Data.Vector as Vector would be better written in Python style: from Data import Vector. This would have the side benefit of mitigating a common, and in my opinion unfortunate, pattern you see in Haskell code: importing modules by abbreviations. import qualified Data.Vector as V. For some common modules, like Data.ByteString as BS and Data.ByteString.Char8 as BSC, the abbreviations are common enough that everyone knows in context what module has been imported. However, for your own modules, you should import with explicit, non-abbreviated names, so it’s clear in context which module is being referenced.


I’m unsure about laziness by default. Laziness can be really great, and it’s somewhat cheap in Haskell, but there are many situations where, if you’re going to compute anything for a data structure, you might as well compute it all right then, while the caches are still hot. The theoretical benefits of only computing the values necessary have a nontrivial cost: lazy thunked data structures have branching and dereferencing costs that strict languages can avoid.
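The classic demonstration of that thunk cost uses nothing beyond Data.List: summing with the lazy foldl defers every addition, building a long chain of thunks before forcing any of them, while the strict foldl' forces each partial sum as it goes.

```haskell
import Data.List (foldl')

-- foldl defers every (+), building a chain of 100,000 nested thunks
-- that must be walked before the answer appears; foldl' forces each
-- partial sum immediately and stays in constant space.
lazySum, strictSum :: Int
lazySum   = foldl  (+) 0 [1 .. 100000]
strictSum = foldl' (+) 0 [1 .. 100000]
```

Both compute the same number; only the shape of the intermediate computation differs, and on large inputs that shape is the difference between constant space and a pile of thunks.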

Streaming and Chunking

I feel like Haskell streaming and chunking libraries, like Conduit, have the same problem as laziness for most reasonably-sized data structures. If your machine has 32 GiB of RAM and 50 GB/s of memory bandwidth, who cares if you allocate 100 MB of data and dump it all on a socket in one go? Chunking only matters much if your peak memory usage times concurrency doesn’t fit in RAM. I’ve seen similar performance issues with Python 2’s xrange() function. range() is frequently faster in context because it avoids the extra iteration overhead.

Partial Functions

Some people have vehemently argued that pattern matching is so superior to conditionals and dereferencing that Java and C++ are fundamentally flawed. That is:

case p of
  Nothing -> bar
  Just v -> foo v

is safer than:

if (ptr) {
    foo(*ptr);
} else {
    bar();
}

I have some sympathy for this argument. Pattern matching syntax does mean that the dereferenced value is only visible in the scope where it is valid. However, the real problem with languages like C++ and Java is not the lack of pervasive pattern matching. It’s that Java and C++ don’t have a way to distinguish in the type system between a boxed value (aka T*) and a potentially-null boxed value (aka T*).

Pattern matching by itself is nice, but the real benefit comes from its combination with sum types.
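A small example of that combination (hypothetical types, purely for illustration): once the alternatives are encoded as a sum type, the compiler can check that every match handles all of them.

```haskell
-- A sum type: a value is exactly one of these alternatives.
data Shape
    = Circle Double
    | Rect Double Double

-- Pattern matching consumes it; -Wincomplete-patterns would flag
-- any constructor this match forgot to handle.
area :: Shape -> Double
area (Circle r) = pi * r * r
area (Rect w h) = w * h
```

There is no way to "dereference" a Shape without saying which case you meant, which is exactly the property T* lacks.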

What is the Haskell equivalent of NullPointerException? Partial functions!

Let’s imagine I wrote the above code as follows, except assuming p is not Nothing:

foo $ fromJust p

If p is Nothing, this code throws an error. fromJust is a partial function in that, when the input has an unexpected value, it throws an error. Haskell has many partial functions.
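The total alternatives are just as short — handle the Nothing (or empty) case explicitly, either by matching or with a combinator like maybe (the names below are my own, for illustration):

```haskell
-- head is partial too; a total variant takes a default.
headOr :: a -> [a] -> a
headOr d []      = d
headOr _ (x : _) = x

-- Total replacement for `foo $ fromJust p`: both cases are explicit.
-- Here 0 plays the role of `bar` and (+1) the role of `foo`.
fooOrBar :: Maybe Int -> Int
fooOrBar = maybe 0 (+ 1)
```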

In your code, you should strive to write total functions. However, I am of the opinion that Haskell should distinguish, in the type system, between the two types of bottom. There are two ways that a function in Haskell can have no value. It can either throw an exception or it can enter an infinite loop. Partial functions throw exceptions.

In practice, infinite loops are rare (except through accidental recursive references). However, it would be useful to disallow certain classes of pure code from throwing exceptions. This would make it illegal for pure code to call partial functions. However, certain types of errors could still occur, especially out of memory errors and stack overflows. Thus, the language would have to draw a distinction between synchronous errors such as partial matches and asynchronous errors such as resource exhaustion.

I can understand why the language adopted a simpler model, even though it would be nice to guarantee that some pure functions cannot throw non-resource-exhaustion errors.

Purity is a vague concept. Several useful attributes can be applied to a call graph: “has no observable side effects,” “does not allocate heap memory,” “fits within a defined resource budget,” “does not throw exceptions.” I’m not aware of any mainstream languages that distinguish between these notions of purity.