Several years ago, when we started training IMVU’s client team on modern web technologies, an engineer suggested we give CoffeeScript a chance. The purported benefit was that CoffeeScript helps you stay within "the good parts" of JavaScript by providing concise syntax for lambdas, object literals, destructuring binds, object iteration, and other idioms.

However, after a month or two of the team using it, we all agreed to switch back to JavaScript.

You see, the problem with CoffeeScript is that its benefits are shallow. CoffeeScript’s motto is "It’s just JavaScript", even though it has the opportunity to be so much more.

Programming languages have several aspects. The syntax of a language affects a person’s ability to visually parse its structure, the amount of typing required, and a language’s overall visual style. Syntax has a large effect on a language’s overall pleasantness, so people tend to give it a great deal of attention. Also, syntax is the first impression people have of a language. I’ve heard people say "JavaScript is in the C family" even though almost the only thing the two languages have in common is curly braces.

The semantics of a language, on the other hand, provide upper bounds on the size of programs that can be written with it. Language semantics impose constraints, and thus provide guarantees, making it easier and safer to work on larger programs. Examples follow:

  • this value is always an integer
  • this program does not write to memory it doesn’t own
  • this function has the same result no matter how many times it is called (idempotence)
  • this function has no observable side effects (purity)
  • memory will be automatically reclaimed when it is no longer reachable
  • the completion callback for this asynchronous operation is called exactly once
  • MySQL queries cannot be issued in the middle of a Redis transaction
  • this object always contains properties x, y, and z
  • if you have an object of type X, it will never be null, and it will always be fully-constructed
  • if an object is constructed on the stack, its destructor will run before the current function returns

Different languages provide different sets of guarantees.

Both syntax and semantics are important. Syntax solves for pleasantness and human factors, but strong semantics are what help people safely write larger and more complicated programs.

To be especially clear, I am not saying syntax is unimportant. As an example, consider C++11 lambda functions. They are "merely" syntax sugar for code structures possible in C++98. However, because they’re so convenient, they fundamentally change the economics of writing in certain styles. Thus syntax provides a vital role in guiding programmers down good paths.

The problem with CoffeeScript, and ultimately the reason why IMVU dropped it, is that, while it looks nice at first glance, it doesn’t substantially improve the semantics of JavaScript. Missing properties still return undefined. It’s still possible to trip over this in lambda functions. There are a few wins: it’s no longer easy to accidentally write to a global variable (like you might in JavaScript by forgetting var), and CoffeeScript’s for own makes it easy to iterate over an object’s properties without walking the prototype chain. But those wins can be easily obtained in JavaScript through tools such as jshint.
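To make those two wins concrete, here is a minimal plain-JavaScript sketch (mine, not from the original discussion) of the accidental-global hazard and of the own-property iteration that CoffeeScript’s for own compiles down to:

// 1. Forgetting "var" silently assigns to a global in sloppy mode; jshint (or "use strict") flags it.
function sum(values) {
    total = 0;                          // oops: missing "var"
    for (var i = 0; i < values.length; i++) {
        total += values[i];
    }
    return total;
}

// 2. Iterating only an object's own properties, like CoffeeScript's "for own key of obj":
var obj = { x: 1, y: 2 };
for (var key in obj) {
    if (obj.hasOwnProperty(key)) {      // skip anything inherited from the prototype chain
        console.log(key, obj[key]);
    }
}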

At IMVU, we decided CoffeeScript’s syntactic brevity was not worth the cost of having a translation step for our code. That translation step adds a bit of latency between editing a file and reloading it in your browser, and it adds a degree of mental overhead for programmers, especially those not deeply familiar with the web. That is, not only must a programmer know JavaScript semantics, but they must also know how CoffeeScript translates into JavaScript.

And, sometimes, that translation is not at all obvious. Consider the following examples. I encourage you to try to guess what they do and then paste the code into the live compiler at coffeescript.org to check.

The following snippets all call fn with two arguments:

fn a, b
fn a,
  b
fn x:y, b
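For reference, here is the JavaScript I would expect the compiler to emit for the three snippets above (worth confirming against an actual compile):

fn(a, b);
fn(a, b);
fn({x: y}, b);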

However, if the first argument is not an object literal, then it’s a syntax error:


How many parameters do you think are passed to fn? What are their values?

fn (

There are too many ways to express branches. I hypothesize that having so many different control flow idioms makes rapid scanning of flow through a function harder, with no meaningful benefit.

if x then return
unless x then return
return if x
return unless x
for x in ls then foo(x)
foo(x) for x in ls
foo(x) while i++ < 10
foo(x) until i++ > 10

How do you think the following snippets parse?



t for t in ls if t > 1
t for t in ls when t > 1

o = (t for t in ls when t > 1)
o = (for t in ls then)

foo ? 1 : 2 # legal
foo ? [] : 2 # not legal

fn a, debugger, b
a = debugger
fn a debugger # not legal?? is debugger an expression or not?

e = try f

a = if b then

a = [f for f in x]
a = (f for f in x)

I’ve literally been told the best way to cope is to install an editor plugin that makes it easy to compile bits of code so I can verify that the compiled JavaScript matches the behavior I intended.

The following code is legal! And produces a rather strange result.

a = (for b in x then)

Yet not everything is an expression.

a = (return) # illegal

CoffeeScript has semi-open and closed intervals. Guess which is which?

[1..5]
[1...5]
Real CoffeeScript Hazards

Now that I’m done making fun of the syntax, in all seriousness, there are actually a couple real dangers in CoffeeScript too.

Because CoffeeScript has no syntax for explicitly introducing a variable binding, the first assignment to a name introduces a variable binding in the enclosing scope. If a variable with the same name is already defined in an outer scope, the inner assignment reuses that outer binding. This means that changes to outer scopes can change the meaning of code in inner scopes.

Consider the following code (and in case it’s not clear, do simply calls its argument function with no arguments):

a = 'hello world'
do ->
  for i in [0...10]
    print i
print a

It prints the values 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, "hello world".

Now rename the outer variable a to i.

i = 'hello world'
do ->
  for i in [0...10]
    print i
print i

We didn’t change the inner scope at all. But now the inner loop does not create a new variable binding — instead it reuses the outer variable i. This code now prints 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 9.

Another risk that comes up in CoffeeScript comes from the fact that the last expression in a function is the return value. Consider the following function:

processStrings = ->
  for longString in dataSource
    process longString

Let’s assume processStrings is intended to have no return value. Let’s also assume that the process function has a large return value that we intend to discard.

processStrings will actually return a list containing the return value of process for each iteration through the loop. This is not a correctness hazard so much as a performance hazard, but it’s come up several times in the codebase I’ve been working on. To fix processStrings, it should be written as follows:

processStrings = ->
  for longString in dataSource
    process longString
  return

This situation is why Haskell distinguishes between forM and forM_.

CoffeeScript Has Benefits Too

I’ve just listed a lot of unfortunate things about CoffeeScript, but it does provide some rather significant benefits. In fact, I think that a team that deeply understands JavaScript will enjoy CoffeeScript simply because it’s so concise.

Arrow functions and CoffeeScript’s braceless object literals do make it easy to visually scan through large chunks of code. Traditional JavaScript is littered with curly braces and parentheses – sometimes JavaScript feels as noisy as a LISP.

Classes nicely encode the pattern that everyone expresses manually with prototypes in JavaScript. Someone recently made the comment "I’ve been writing CoffeeScript for years and I’ve never needed to understand JavaScript prototypes". I can believe that, especially if you’re not interacting with JavaScript libraries that make direct use of prototypes.

Destructuring binds are a huge deal. They dramatically simplify the keyword arguments pattern. Consider the following imvujs JavaScript module:

module({
  foo: 'foo.js',
  bar: 'bar.js',
}, function(imports) {
  var foo = imports.foo;
  var bar = imports.bar;
  // ...
});

The corresponding CoffeeScript is much shorter and easier to read:

module
  foo: 'foo.js'
  bar: 'bar.js'
, ({foo, bar}) ->
  # ...

Fortunately, many of the niceties of CoffeeScript are making it into ES6 and TypeScript, so CoffeeScript’s value proposition in the future is not very high.
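For comparison, a rough sketch of those same features in then-draft ES6 syntax (illustrative only):

let square = (x) => x * x;              // arrow function
let {foo, bar} = {foo: 1, bar: 2};      // destructuring bind
class Point {
    constructor(x, y) {
        this.x = x;
        this.y = y;
    }
}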

So what are you saying, Chad?

My opinions tend to be nuanced. I am not the kind of person to come out guns blazing and say "X sucks!" or "Y is the best thing ever!"

Do I think CoffeeScript is a big win over JavaScript? Nope.

Would I start a new project in CoffeeScript today? Nope.

Would I be happy working in an existing CoffeeScript codebase? Sure.

Do I think CoffeeScript has some advantages over JavaScript? Sure.

Do I think CoffeeScript has much of a future given that ES6 and TypeScript have arrow functions, classes, and destructuring binds? Nope.

Do I think it’s possible to have a language with the syntax niceties of CoffeeScript and real semantic benefits like coroutines and static types? Absolutely, and I’m very much looking forward to the next few years of AltJS innovation.

I think Jeremy Ashkenas deserves a lot of credit for exploring this programming language space. It was a worthwhile experiment, and it validated the usefulness of several features that made it into ES6.


My opinions on this topic are not new. Many others have come to the same conclusion.

The Gamepad API

The Gamepad API is a general API supporting a large number of possible input devices. However, it’s named after the most common use case: gamepad controllers. It could definitely support IR remote controls, switches, audio mixers, and so on… Maybe it should be named the “Input Device API”. :) It’s not as general as DirectInput or USB HID though. No access to positional information (like mouse deltas, finger coordinates on a trackpad, or 3D tracking), no force feedback, and no access to device accelerometers.

Overall, the Gamepad API is narrow in scope. With a little more specification effort and implementation complexity (copy the good bits of DirectInput or Raw Input?), the API could support a huge number of use cases.

That said, within its narrow scope, the Gamepad API is well-designed. Devices can expose an arbitrary number of buttons and axes.

I’d make a few changes, however.


Remove Fingerprinting Mitigation

Gamepads MUST only appear in the list if they are currently connected to the user agent, and at least one device has been interacted with by the user. If no devices have been interacted with, devices MUST NOT appear in the list to avoid a malicious page from fingerprinting the user.

That’s such a lost cause it’s not even funny. There are dozens of ways to retrieve identifying information from a user (and companies that license implementations of such). As long as there are machine learning algorithms and any differences at all between different computers and browsers, users will be fingerprinted. My favorite example is when researchers demonstrated a fingerprinting technique by rendering text into a canvas and analyzing the anti-aliasing and font rendering. Attempting to mitigate fingerprinting by penalizing the user experience seems like the wrong tradeoff here, except perhaps on an opt-in basis.

Allow Standard-Mapped Devices to have Extra Buttons or Axes

Two sentences in the spec imply that an input device that is recognized to implement a standard mapping will not expose more functionality than is defined by the mapping object.

When the user agent recognizes the attached device, it is RECOMMENDED that it be remapped to a canonical ordering when possible.


The standard gamepad has 4 axes, and up to 17 buttons.

Simply changing the wording to “has at least 4 axes, and at least 17 buttons” would allow games that default to standard mapping inputs to work with devices that support additional capabilities, like racing wheels with standard-mapped controls in the middle.
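For example, under the suggested wording a page could keep using the standard button indices while still seeing any extra controls. A sketch (navigator.getGamepads() and the mapping property come from the spec; the rest is illustrative):

var pad = navigator.getGamepads()[0];
if (pad && pad.mapping === 'standard') {
    var dpadUp = pad.buttons[12];             // d-pad up per the standard mapping
    var extraButtons = pad.buttons.slice(17); // anything the device exposes beyond the standard 17
    var extraAxes = pad.axes.slice(4);        // e.g. a racing wheel's additional axes
}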

Add support for multiple standard mappings per device.

Input controller idioms come and go. Some become entrenched in a generation’s gamepads, and some fade away.

Assigning a single canonical mapping per device limits the discovery of useful structure. For example, an SNES controller, Wiimote, or Logitech Driving Force wheel wouldn’t satisfy the “Standard Gamepad” mapping, but all of them have a directional pad. If there was a “Directional Pad” mapping, and devices implemented multiple mappings, then any game that relied on a Directional Pad would work out of the box.

Add an event-based input mechanism.

The Gamepad API’s sole input discovery mechanism relies on JavaScript polling the current gamepad state at high frequency to detect input events from the gamepad. For interactive applications like games, this is usually fine. However, polling at 60 Hz in JavaScript is excessive if you just need to know when a button was pressed. We don’t poll for mouse or keyboard events – why are gamepads different? If the argument is convenience, libraries can always offer that.
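To illustrate, here is roughly what the polling model requires just to notice a single button press (a sketch using navigator.getGamepads(); some older implementations expose buttons as plain numbers rather than objects with a pressed flag):

var wasPressed = false;

function poll() {
    var pads = navigator.getGamepads ? navigator.getGamepads() : [];
    var pad = pads[0];
    if (pad) {
        var pressed = pad.buttons[0].pressed;
        if (pressed && !wasPressed) {
            console.log('button 0 pressed');  // the only thing we actually wanted to know
        }
        wasPressed = pressed;
    }
    requestAnimationFrame(poll);
}
requestAnimationFrame(poll);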

Moreover, underneath it all, some high-priority operating system thread is polling the device, and translating the current device state into an event stream. This is why, even though XInput is a polling-based API, if your game drops a few frames, button presses aren’t lost.

For certain classes of problems, like mental chronometry, you need to know when the button press occurred so you can measure elapsed time within a millisecond or so. If JavaScript is polling button state, it doesn’t know when the input event originated – perhaps 10-50 ms of latency has elapsed by the time it sees the button change – but the underlying high-priority polling thread knows. (Or at least has a better sense.)

Let’s say I’m playing a game running at 10 Hz and I press a button to open a bit of UI. Display of the UI hitches (maybe WebGL textures are being uploaded), pausing the game’s polling loop for one second or so, blocking requestAnimationFrame from polling the device. If the Gamepad API isn’t polling the device at a higher frequency under the hood, any buttons pressed and released within that one second period would not register. So we have to assume that, to avoid missed button presses, they have to be queued. But, since only ONE change can be measured each frame, each press has to be queued for two frames (button DOWN, button UP).

Thus, even though the presses occurred at T, T+100ms, and T+200ms, JavaScript won’t see them until T+1000ms, T+1200ms, and T+1400ms. In games that rely on high-precision gestures, this is the difference between it recognizing gestures even in the presence of frame drops, and it missing gestures entirely.

If you think dropped events aren’t a problem in “real games”, try playing SimCity 2013 on an average Mac sometime… The low frame rate would be completely tolerable except for the dropped events.

In contrast, an API where JavaScript would ask “Give me all the input events since I last asked” would associate a timestamp with each event, allowing for accurate combination recognition. If the Gamepad API is backed by Windows’s Raw Input API, this data can be retrieved with GetMessageTime().
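Here is a hypothetical sketch of that model; none of these names exist in the actual API. The point is that events are queued with the timestamp recorded when the input actually occurred:

function InputQueue() {
    this.pending = [];
}
// In a real implementation, a high-priority thread would push events as the hardware reports them.
InputQueue.prototype.push = function(type, button) {
    this.pending.push({ type: type, button: button, timestamp: performance.now() });
};
// "Give me all the input events since I last asked."
InputQueue.prototype.drainEvents = function() {
    var events = this.pending;
    this.pending = [];
    return events;
};

// Even if the page hitches for a second before draining, each event keeps its original timestamp.
var queue = new InputQueue();
queue.push('down', 0);
queue.push('up', 0);
queue.drainEvents().forEach(function(e) {
    console.log(e.type, 'button', e.button, 'at', e.timestamp);
});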

The Gamepad API is definitely a step in the right direction, but it feels like it ought to be a little lower-level and more general to avoid being yet another Almost Good web API.

Web Platform Limitations, Part 1 – XMLHttpRequest Priority

Note: this post is a mirror of the post at the IMVU Engineering Blog.

The web is inching towards being a general-purpose applications platform that rivals native apps for performance and functionality. However, to this day, API gaps make certain use cases hard or impossible.

Today I want to talk about streaming network resources for a 3D world.

Streaming Data

In IMVU, when joining a 3D room, hundreds of resources need to be downloaded: object descriptions, 3D mesh geometry, textures, animations, and skeletons. The size of this data is nontrivial: a room might contain 40 MB of data, even after gzip. The largest assets are textures.

To improve load times, we first request low-resolution thumbnail versions of each texture. Then, once the scene is fully loaded and interactive, high-resolution textures are downloaded in the background. High-resolution textures are larger and not critical for interactivity. That is, each resource is requested with a priority:

| High Priority | Low Priority |
| --- | --- |
| Skeletons | High-resolution textures |
| Low-resolution textures | |

Browser Connection Limits

Let’s imagine what would happen if we opened an XMLHttpRequest for each resource right away. What happens depends on whether the browser is using plain HTTP or SPDY.

HTTP
Browsers limit the number of simultaneous TCP connections to each domain. That is, if the browser’s limit is 8 connections per domain, and we open 50 XMLHttpRequests, the 9th would not even submit its request until the 8th finished. (In theory, HTTP pipelining helps, but browsers don’t enable it by default.) Since there is no way to indicate priority in the XMLHttpRequest API, you would have to be careful to issue XMLHttpRequests in order of decreasing priority, and hope no higher-priority requests would arrive later. (Otherwise, they would be blocked by low-priority requests.) This assumes the browser issues requests in sequence, of course. If not, all bets are off.

There is a way to approximate a prioritizing network atop HTTP XMLHttpRequest. At the application layer, limit the number of open XMLHttpRequests to 8 or so and have them pull the next request from a priority queue as requests finish.
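A minimal sketch of that approach (the names and the concurrency limit are illustrative):

var MAX_CONCURRENT = 8;
var pending = [];    // waiting requests; lowest priority value = most important
var inFlight = 0;

function request(url, priority, onLoad) {
    pending.push({ url: url, priority: priority, onLoad: onLoad });
    pending.sort(function(a, b) { return a.priority - b.priority; });
    pump();
}

function pump() {
    while (inFlight < MAX_CONCURRENT && pending.length > 0) {
        var next = pending.shift();
        var xhr = new XMLHttpRequest();
        xhr.open('GET', next.url);
        xhr.onload = next.onLoad;
        xhr.onloadend = function() {   // success or failure, free the slot and pull the next request
            inFlight--;
            pump();
        };
        xhr.send();
        inFlight++;
    }
}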

Soon I’ll show why this doesn’t work that well in practice.

SPDY
If the browser and server both support SPDY, then there is no theoretical limit on the number of simultaneous requests to a server. The browser could happily blast out hundreds of HTTP requests, and the responses will make full use of your bandwidth. However, a low-priority response might burn bandwidth that could otherwise be spent on a high-priority response.

SPDY has a mechanism for prioritizing requests. However, that mechanism is not exposed to JavaScript, so, like HTTP, you either issue requests from high priority to low priority or you build the prioritizing network approximation described above.

However, the prioritizing network reduces effective bandwidth utilization by limiting the number of concurrent requests at any given time.

Prioritizing Network Approximation

Let’s consider the prioritizing network implementation described above. Besides the fact that it doesn’t make good use of the browser’s available bandwidth, it has another serious problem: imagine we’re loading a 3D scene with some meshes, 100 low-resolution textures, and 100 high-resolution textures. Imagine the high-resolution textures are already in the browser’s disk cache, but the low-resolution textures aren’t.

Even though the high-resolution textures could be displayed immediately (and would trump the low-resolution textures), because they have low priority, the prioritizing network won’t even check the disk cache until all low-resolution textures have been downloaded.

That is, even though the customer’s computer has all of the high-resolution textures on disk, they won’t show up for several seconds! This is an unnecessarily poor experience.

Browser Knows Best

In short, the browser has all of the information needed to effectively prioritize HTTP requests. It knows whether it’s using HTTP or SPDY. It knows what’s in cache and not.

It would be super fantastic if browsers let you tell them how important each request is. I’ve seen some discussions about adding priority hints, but they seem to have languished.

tl;dr Not being able to tell the browser about request priority prevents us from making effective use of available bandwidth.


Why not download all resources in one large zip file or package?

Each resource lives at its own URL, which maximizes utilization of HTTP caches and data sharing. If we downloaded resources in a zip file, we wouldn’t be able to leverage CDNs and the rest of the HTTP ecosystem. In addition, HTTP allows trivially sharing resources across objects. Plus, with protocols like SPDY, per-request overhead is greatly reduced.

JavaScript’s new Operator (or: How to Read ECMA-262)

Before we dig in, I must briefly describe the state of JavaScript, aka ECMAScript, aka ECMA-262.

Two versions of JavaScript are in wide use today. ECMA-262, 3rd edition (commonly known as ES3) was standardized in 1999 and is supported by pretty much anything that claims to support JavaScript. ECMA-262, 5th edition (commonly known as ES5) is a slightly more modern language that, while common among the latest web browsers, is not ubiquitous. Note that ECMA-262, 5.1 edition, is a bugfix release of the ES5 standard.

The ECMA-262 standards are quite approachable and freely available online.

(This is a tangent but it’s cool. You should know the test262 project provides an ES5.1 conformance suite, and it appears that IE10 is the most compliant implementation at this time.)

On to the topic at hand, which is the behavior of JavaScript’s new operator and why you might want a JavaScript implementation of it.

Open the ES5 spec and examine section 11.2.2. You can disregard the fact that there are two versions of the new operator: the specification distinguishes between new X and new X.Y(Z).

The key point is that new evaluates its argument expression, which must result in an Object, and calls said object’s internal [[Construct]] method. Only Function objects have a [[Construct]] method, and its behavior is precisely defined in section 13.2.2.

I will demonstrate a JavaScript function that implements new. The first argument is the constructor.

function new_(T) {
    if (!(T instanceof Function)) {
        throw new TypeError(T + ' is not a constructor');
    }
    var proto = T.prototype;
    if (!(proto instanceof Object)) {
        proto = Object.prototype;
    }
    var obj = Object.create(proto);
    var result = T.apply(obj, [].slice.call(arguments, 1));
    return (result instanceof Object) ? result : obj;
}

Why is such a function useful?

First, and primarily, new_ can take a variable list of arguments, via new_.apply. That can’t be done with the traditional new operator.
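For example, a usage sketch (assuming the new_ defined above; Point is just an illustrative constructor):

function Point(x, y) { this.x = x; this.y = y; }

var args = [3, 4];
var p = new_.apply(undefined, [Point].concat(args)); // like "new Point(3, 4)", but the argument list is built at runtime
p instanceof Point;      // true
p.x === 3 && p.y === 4;  // true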

Secondly, new_ can be bound to produce factory functions. Let’s say you have a constructor SomeClass. You can create a SomeClass factory as such:

var SomeClassFactory = new_.bind(undefined, SomeClass);
var instance = SomeClassFactory();
(instance instanceof SomeClass); // true

Thirdly, new_ illustrates some non-intuitive behaviors of the new operator. For example, the created object’s prototype is set to Object.prototype if the constructor’s prototype is not an object. Consider:

function foo() {}
foo.prototype = null;
var instance = new foo;
Object.getPrototypeOf(instance) === Object.prototype; // true

A lesser-known behavior is that constructors can themselves return values. If the value returned is an Object (remember that Functions are Objects too), new returns it instead of the object passed to the constructor as this. For example:

function Wooo() { return Wooo; }
Wooo === new new new new new new new new Wooo; // true

I don’t recommend it, but this technique could be used to write constructors that behave identically whether they are called directly or as a constructor:

function Flexible(arg) {
    var this_ = Object.create(Flexible.prototype);
    this_.arg = arg;
    return this_;
}
Flexible.prototype.method = function() {
    return this.arg;
};
var a = Flexible('success');
var b = new Flexible('success');
'success' === a.method(); // true
'success' === b.method(); // true

I hope that you’ve learned something about JavaScript’s new operator. Before sending you on your way, I’d like to point out that Object.create is an ES5 method and thus unsupported in ES3 implementations. So could we write an implementation of new_ that runs in old browsers by avoiding Object.create?

Object.create is effectively a primitive operation on which the new operator is built. And indeed, we can bypass the need for Object.create by using some of new‘s behavior to create an object with a specified prototype. Here is the ES3 version of new_:

function new_(T) {
    if (!(T instanceof Function)) {
        throw new TypeError(T + ' is not a constructor');
    }
    var proto = T.prototype;
    if (!(proto instanceof Object)) {
        proto = Object.prototype;
    }
    function dummy() {}
    dummy.prototype = proto;
    var obj = new dummy; // Object.getPrototypeOf(obj) === proto
    var result = T.apply(obj, [].slice.call(arguments, 1));
    return (result instanceof Object) ? result : obj;
}

Emscripten Results: Firefox 19 shows dramatic improvement

Last time, we looked at Emscripten’s performance with current JS JITs on an in-order Atom core and found a penalty relative to out-of-order cores.

However, I told @js_dev I’d give updated numbers on a more typical out-of-order x86 core like my 2010 MacBook Pro’s i5.

There are a couple of interesting things here. Firefox 19 shows substantial Emscripten performance improvements over Firefox 17. And while JavaScript JITs are still an order of magnitude away from native code performance, Emscripten’s output now meets or exceeds the performance of hand-written JavaScript. Progress!

The machine is a 2010 Macbook Pro, Core i5 2.53 GHz, OS X 10.6.

For each compiler, I compiled with -O0, -O1, -O2, -O3, and picked the best result.

| Language | Compiler | Variant | Vertex Rate | Slowdown |
| --- | --- | --- | --- | --- |
| C++ | clang -O2 | SSE | 100142197 | 1 |
| C++ | gcc -O3 | SSE | 93109180 | 1.08 |
| C++ | gcc -O3 | scalar | 60398333 | 1.66 |
| C++ | clang -O2 | scalar | 58324308 | 1.72 |
| JavaScript | Chrome 23 | untyped | 9510489 | 10.5 |
| Emscripten -O3 | Aurora 19.0a2 | scalar | 7666000 | 13.1 |
| Emscripten -O3 | Firefox 17 | scalar | 6044000 | 16.6 |
| JavaScript | Chrome 23 | typed arrays | 5890000 | 17 |
| Emscripten -O3 | Chrome 25.0 canary | scalar | 5733706 | 17.5 |
| JavaScript | Firefox 17 | untyped | 5264735 | 19 |
| JavaScript | Firefox 17 | typed arrays | 5240000 | 19.1 |
| Emscripten -O2 | Chrome 23 | scalar | 4586165 | 21.8 |
| Emscripten -O1 | nodejs 0.8.10 | scalar | 4453109 | 22.5 |
| Emscripten -O2 | nodejs 0.8.10 | scalar | 1483406 | 67.5 |
| Emscripten -O3 | nodejs 0.8.10 | scalar | 668796 | 150 |

Here are the results for various Emscripten optimization levels:

| Browser | Compilation Level | Vertex Rate |
| --- | --- | --- |
| Firefox 17 | emscripten -O0 | 2451509 |
| Firefox 17 | emscripten -O1 | 4080000 |
| Firefox 17 | emscripten -O2 | 5146000 |
| Firefox 17 | emscripten -O3 | 6044000 |
| Chrome 23 | emscripten -O0 | 1229754 |
| Chrome 23 | emscripten -O1 | 4152339 |
| Chrome 23 | emscripten -O2 | 4586165 |
| Chrome 23 | emscripten -O3 | 465162 |
| Aurora 19.0a2 | emscripten -O0 | 2062762 |
| Aurora 19.0a2 | emscripten -O1 | 4900000 |
| Aurora 19.0a2 | emscripten -O2 | 6214757 |
| Aurora 19.0a2 | emscripten -O3 | 7666000 |
| Chrome 25.0 canary | emscripten -O0 | 3001399 |
| Chrome 25.0 canary | emscripten -O1 | 4410235 |
| Chrome 25.0 canary | emscripten -O2 | 5482000 |
| Chrome 25.0 canary | emscripten -O3 | 5733706 |

I updated the benchmark to automate compiling and running the native and node.js builds.

JavaScript, Emscripten, and the Atom D2700

Lately I’ve been doing some work with Emscripten. As predicted, the quality of Emscripten’s generated code is improving and JITs are learning to understand its generated code. I have high hopes for asm.js, a formalization of high-performance, low-level JavaScript. I now believe it’s conceivable that Emscripten could approach the same level of performance as PNaCl, though whether that happens remains to be seen.

However, having a rough understanding of how today’s JavaScript JITs work, I’ve always wondered whether Emscripten-generated code would be especially penalized relative to native code on an in-order core like Intel Atom. Having recently built an Intel Atom home server, I figured I’d update my recent Emscripten skinning benchmark results and find out.

First I’ll describe the hardware. The CPU is an Atom D2700 on the Intel D2700DC board. 1066 MHz DDR3 memory. Two cores hyperthreaded. Running Ubuntu 12.04 Server. Firefox and Chromium packages are stock. Node.js and clang 3.1 are x64 Linux binaries downloaded from their respective websites. Emscripten is commit 26250471b46a68204711f037f33790bfb4ba37c7 in the master branch.

Now the results. Remember there are three JavaScript implementations: hand-written JS with untyped arrays and objects (“untyped”), hand-written JS with typed arrays (“typed arrays”), and Emscripten-compiled C++ (“scalar”). Emscripten’s compiler was invoked with -O1; I saw significant performance drop-offs with -O2 and -O3.

| Language | Compiler | Variant | Vertex Rate | Slowdown |
| --- | --- | --- | --- | --- |
| C++ | gcc 4.6.3 -O3 | SSE | 24040000 | 1 |
| C++ | clang 3.1 -O3 | SSE | 22530000 | 1.07 |
| C++ | gcc 4.6.3 -O3 | scalar | 18730000 | 1.28 |
| C++ | clang 3.1 -O3 | scalar | 13040000 | 1.84 |
| JavaScript | Chromium 20.0 | untyped | 3150000 | 7.63 |
| JavaScript | Firefox 17 | typed arrays | 2437562 | 9.86 |
| JavaScript | Firefox 17 | untyped | 1084577 | 22.2 |
| Emscripten | Firefox 17 | scalar | 944333 | 25.5 |
| JavaScript | Chromium 20.0 | typed arrays | 807577 | 29.8 |
| Emscripten | node 0.8.14 | scalar | 679802 | 35.4 |
| Emscripten | Chromium 20.0 | scalar | 677966 | 35.5 |

Based on the previous benchmark results and my recent experience with Emscripten, it appears that JavaScript JITted code indeed pays a penalty relative to native code on in-order cores, or at least on the Atom D2700.

Next time I hope to update these benchmarks on a high-end desktop CPU.

As always, if you’d like to reproduce these results or question them, the code is available on my github.

Digging into JavaScript Performance, Part 2

UPDATE. After I posted these numbers, Alon Zakai, Emscripten’s author, pointed out options for generating optimized JavaScript. I reran my benchmarks; check out the updated table below and the script used to generate the new results.

At the beginning of the year, I tried to justify my claim that JavaScript has a long way to go before it can compete with the performance of native code.

Well, 10 months have passed. WebGL is catching on, Native Client has been launched, Unreal Engine 3 targets Flash 11, and Crytek has announced they might target Flash 11 too. Exciting times!

On the GPU front, we’re in a good place. With WebGL, iOS, and Flash 11 all roughly exposing shader model 2.0, it’s not a ton of work to target all of the above. Even on the desktop you can’t assume higher than shader model 2.0: the Intel GMA 950 is still at the top.

However, shader model 2.0 isn’t general enough to offload all of your compute-intensive workloads to the GPU. With 16 vertex attributes and no vertex texture fetch, you simply can’t get enough data into your vertex shaders to do everything you need, e.g. blending morph targets.

Thus, for the foreseeable future, we’ll need to write fast CPU code that can run on the web, mobile devices, and the desktop. Today, that means at least JavaScript and a native language like C++. And, because Microsoft has not implemented WebGL, the Firefox and Chrome WebGL blacklists are so strict, and no major browsers fall back on software, you probably care about targeting Flash 11 too. (It does have a software fallback!) If you care about Flash 11, then your code had better target ActionScript 3 / AVM2 too.

How can we target native platforms, the web, and Flash at the same time?

Native platforms are easy: C++ is well-supported on Windows, Mac, iOS, and Android. SSE2 is ubiquitous on x86, ARM NEON is widely available, and both have high-quality intrinsics-based implementations.

As for Flash… I’m just counting on Adobe Alchemy to ship.

On the web, you have two choices. Write your code in C++ and cross-compile it to JavaScript with Emscripten or write it in JavaScript and run via your native JavaScript engine. Ideally, cross-compiling C++ to JS via Emscripten would be as fast as writing your code in JavaScript. If it is, then targeting all platforms is easy: just use C++ and the browsers will do as well as they would with native JavaScript.

Over the last two evenings, while weathering a dust storm, I set about updating my skeletal animation benchmark results: for math-heavy code, how does JavaScript compare to C++ today? And how does Emscripten compare to hand-written JavaScript?

If you’d like, take a look at the raw results.

| Language | Compiler | Variant | Vertex Rate | Slowdown |
| --- | --- | --- | --- | --- |
| C++ | clang 2.9 | SSE | 101580000 | 1 |
| C++ | gcc 4.2 | SSE | 96420454 | 1.05 |
| C++ | gcc 4.2 | scalar | 63355501 | 1.6 |
| C++ | clang 2.9 | scalar | 62928175 | 1.61 |
| JavaScript | Chrome 15 | untyped | 10210000 | 9.95 |
| JavaScript | Firefox 7 | typed arrays | 8401598 | 12.1 |
| JavaScript | Chrome 15 | typed arrays | 5790000 | 17.5 |
| Emscripten | Chrome 15 | scalar | 5184815 | 19.6 |
| JavaScript | Firefox 7 | untyped | 5104895 | 19.9 |
| JavaScript | Firefox 9a2 | untyped | 2005988 | 50.6 |
| JavaScript | Firefox 9a2 | typed arrays | 1932271 | 52.6 |
| Emscripten | Firefox 9a2 | scalar | 734126 | 138 |
| Emscripten | Firefox 7 | scalar | 729270 | 139 |


  • JavaScript is still a factor of 10-20 away from well-written native code. Adding SIMD support to JavaScript will help, but obviously that’s not the whole story…
  • It’s bizarre that Chrome and Firefox disagree on whether typed arrays are faster than untyped objects.
  • Firefox 9 clearly has performance issues that need to be worked out. I wanted to benchmark its type inference capabilities.
  • Emscripten… ouch :( I wish it were even comparable to hand-written JavaScript, but it’s another factor of 10-20 slower…
  • Emscripten on Chrome 15 is within a factor of two of hand-written JavaScript. I think that means you can target all platforms with C++, because hand-written JavaScript won’t be that much faster than cross-compiled C++.
  • Emscripten on Firefox 7 and 9 still has issues, but Alon Zakai informs me that the trunk version of SpiderMonkey is much faster.

In the future, I’d love to run the same test on Flash 11 / Alchemy and Native Client but the former hasn’t shipped and the latter remains a small market.

One final note: it’s very possible my test methodology is screwed up, my benchmarks are wrong, or I suck at copy/pasting numbers. Science should be reproducible: please try to reproduce these results yourself!

In Defense of Language Democracy (Or: Why the Browser Needs a Virtual Machine)

Years ago, Mark Hammond did a bunch of work to get Python running inside Mozilla’s script tags. Parts of Mozilla are ostensibly designed to be language-independent, even. Unfortunately, even if Mozilla had succeeded at shipping multiple language implementations, it’s unlikely other browser vendors would have followed suit. It’s just not logistically feasible for every browser to ship and maintain the whole set of interesting client-side languages.

I can hear you asking “Why do I care about Python in the browser? Or C++? Or OCaml? JavaScript is a great language.” I agree! JavaScript is a great language. Given the extremely short timeframe and immense political pressure, I’m thrilled we ended up with something as capable as JavaScript.

Nonetheless, fair competition benefits everyone. Take a look at what’s happened in the web server space in the last few years: Ruby on Rails. Django. Node.js. nginx. Tornado. Twisted. AppEngine. MochiWeb. HipHop-PHP. ASP.NET MVC. A proliferation of interesting datastores: memcache, redis, riak, etc. That’s an incredible amount of innovation in a short period of time.

Now let’s go through the same exercise, but on the client. jQuery, YUI, fast JavaScript JITs, CSS3, CoffeeScript, proliferation of standards-compliant browsers, some amount of HTML5… Maybe ubiquitous usage of Flash video? These advancements are significant, but it’s clear the front-end stack is changing much more slowly than the back-end.

Why is the back-end evolving faster than the front-end?

When building an application backend, even atop a virtualized hosting provider such as EC2, you are given approximately raw access to a machine: x86 instruction set, sockets, virtual memory, operating system APIs, and all. Any software that runs on that machine competes at the same level. You can use Python or Ruby or C++ or some combination thereof. If Redis wants to innovate with new memory management schemes, nothing is stopping it. This ecosystem democratized – nay, meritocratized – innovation.

On the front-end, the problem boils down to the fact that JavaScript is built atop the underlying hardware but does not expose its capabilities, meaning browsers and JavaScript implementations are inherently more capable than anything built atop them.

Of course, any client-side technology is going to rev slower simply because it’s hard to get people to update their bits. Also, users decide which client bits they like best, whether they be Internet Explorer, Chrome, or Firefox. Now the technology-takes-time-to-gain-ubiquity problem has a new dimension: each browser vendor must also decide to implement this technology in a compatible way. It took years for even JavaScript to standardize across browsers.

However, if we could instead standardize the web on a performant and safe VM such as CLR, JVM, or LLVM, including explicit memory layout and allocation and access to extra cores and the GPU, JavaScript becomes a choice rather than a mandate.

This point of view depends on my prediction that JavaScript will not become competitive with native code, but not everyone agrees. If JavaScript does eventually match native code, then I’d expect the browser itself to be written in it. It’s impossible for me to claim that JavaScript will never match native code, but the sustained success of C++ in systems programming, games, and high-performance computing is a testament to the value of systems languages.

Native Client, however, gives web developers the opportunity to write code within 5-10% of native code performance, in whatever language they want, without losing the safety and convenience of the web. You can write web applications that leverage multiple cores, and with WebGL, you can harness dedicated graphics hardware as well. Native Client does restrict access to operating system APIs, but I expect APIs to evolve reasonably quickly.

Let’s take a particular example: the HTML5 video tag. Native Client could have sidestepped the entire which-video-codec-should-we-standardize spat between Mozilla, Google, Apple, and Microsoft by allowing each site to choose the codec it prefers. YouTube could safely deploy whatever codecs it wanted, and even evolve them over time.

With Native Client, we could share Python code between the front-end and the back-end. We could use languages that support weak references. We could implement IMVU’s asynchronous task system. We could embed new JavaScript interpreters in old browsers.

Native Client is not the only option here. The JVM and CLR are other portable and performant VMs that have seen considerable language innovation while approximating native code performance.

A standardized, performant, and safe VM in the browser would increase the strength of the open web versus native app stores and their arbitrary technology limitations.

Finally, I’d like to thank Alon Zakai (author of Emscripten), Mike Shaver, and Chris Saari for engaging in open, honest discussion. I hope this public discourse leads to a better web. Either way, I hope this is my last post on this topic. :)

Native Client is Widely Misunderstood (And What Google Should Do About It)

Wow. My recent post about why Mozilla should adopt Native Client stirred up quite a storm. Some folks don’t believe the web needs high-performance applications. Some are happy with whatever APIs browsers expose. I disagree with these points, but I can respect them.

Most surprisingly, several respondents had simply untrue objections to Native Client, so I’d like to clear up their misconceptions. Then I will make recommendations to the Native Client team on how to fix their perception problems.

If you want to spend a few minutes learning about Native Client and LLVM from the horse’s mouth, watch this video.

Misconceptions about Native Client

Native Client implies x86

False. Originally, Native Client was positioned as an x86 sandbox technology, but now it has a clear LLVM story, with x86-32, x86-64, and partially-implemented ARM backends. Portability is a key benefit of the web, and Google understands this.

Native Client is complicated

True, it’s certainly not a trivial amount of code. But compare the amount of code in NativeClient vs. Mozilla’s JavaScript engine:

$ wc -l native_client/src/**/*.{c,h,cc}
155082 total
$ find mozilla-central/js/src -path '*tests*' -prune -o \( -iname '*.c' -o -iname '*.cc' -o -iname '*.h' -o -iname '*.cpp' \) -print0 | wc -l --files0-from=-
363471 total

NativeClient is at least on the same order of complexity as a modern JavaScript engine, and since it already provides performance within 5% of native code, I’d guess it’s less susceptible to change.

Native Client / LLVM is not an open standard

I empathize with this concern, but Flash isn’t an open standard and it sees wide adoption. The difference between Flash and Native Client is that Native Client / LLVM is open source and could easily become an open standard.

Native Client is insecure

Native Client was designed to be a secure x86 sandbox. Under the assumption that its basic security model is sound, the question then becomes “how large is the attack surface and how likely is it to be broken?” Given the amount of code in a modern web browser and JavaScript JIT, I don’t see how Native Client is any worse.

With a little more work, JavaScript will perform at the same level as native code

I’m not informed or involved enough to claim JavaScript can never be as fast as native code. However, I have my doubts. A friend was working on a Monte Carlo Go AI, and he initially wrote his algorithm in JavaScript. Monte Carlo requires simulating a large number of game states, and a naïve port of his JavaScript to C++ gave a 100x performance improvement.

Check out my skeletal animation benchmark, where the JavaScript JITs need another 10x to compete with native code.

Even if JITs can match native code in some benchmarks (and I hope they do), performance across browsers will depend on the particulars of the JIT implementation. Native Client, at least for pure computation, would perform the same in every browser.

We can simply compile languages like Haskell, Python, and C to optimized JavaScript and let the JIT sort it out.

There are some attempts to use JavaScript as a backend for other language implementations, but they rarely perform well. For example, a CPython compiled to JavaScript via LLVM/Emscripten runs about 30x slower than a native build in Chrome, and 200x slower in Firefox 4 beta 8.

I’ve heard the argument for an RPython-like statically-analyzable subset of JavaScript that browsers could run very efficiently. This subset could operate as a de facto bytecode, and Emscripten could compile LLVM to it with minimal performance loss. It’s possible this could work, but directly exposing LLVM seems more fruitful.
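For a rough sense of what such a subset looks like, Emscripten-style output already pins integer types with bitwise coercions; a sketch (the exact rules of any such subset are hypothetical here):

// The |0 coercions tell the JIT these values are 32-bit integers.
function add(a, b) {
    a = a | 0;
    b = b | 0;
    return (a + b) | 0;
}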

Red Herring Arguments

JavaScript is easier to develop with than native languages

Sure, but that doesn’t mean native languages don’t have a purpose. My hypothesis is that there are problems for which JavaScript is not and will not be suited, and that exposing the native power of the machine is better for application developers, and thus the web.

Binaries are obscure

Minified JS isn’t human-readable either, but machines can reconstruct both. Drdaemon nails it in his comment.


“If you want native performance, just download software or install a plug-in!”

While this sentiment reflects today’s reality, it doesn’t reflect trends on the web. Web applications continue to supplant desktop applications. Google Docs, Creately, Pivotal Tracker, Gmail, Mockingbird, and all of the games on Facebook are examples where I would have used a desktop application in the past. It seems that, whenever browsers provide new capabilities, applications consume them. Why would that trend stop now?

Recommendations to the Native Client team

  1. Get a move on! Enable it by default! More flashy demos!
  2. Reposition Native Client as a portable technology and make sure it’s clear that LLVM is key to its strategy.

Finally, NativeClient is still new. I expect it will be some time before it’s solid enough to rely on for production use. That said, it has the potential to disrupt the desktop operating system and I’m excited for a future where all software is web-based.

Digging Into JavaScript Performance

While JavaScript implementations have been improving by leaps and bounds, I predict that they still won’t meet the performance of native code within the next couple of years, even when plenty of memory is available and the algorithms are restricted to long, homogeneous loops. (Death-by-1000-cuts situations where your profile is completely flat and function call overhead dominates may be permanently relegated to statically compiled languages.)

Thus, I really want to see Native Client succeed, as it neatly jumps to a world where it’s possible to have code within 5-10% of the performance of native code, securely deployed on the web. I wrote a slightly inflammatory post about why the web should compete at the same level as native desktop applications, and why Native Client is important for getting us there.

Mike Shaver called me out. “Write a benchmark that’s important to you, submit it as a bug, and we’ll make it fast.” So I took the Cal3D skinning loop and wrote four versions: C++ with SSE intrinsics, C++ with scalar math, JavaScript, and JavaScript with typed arrays. I tested on a MacBook Pro, Core i5, 2.5 GHz, with gcc and Firefox 4.0 beta 8.

First, the code is on github.

The numbers:

Millions of vertices skinned per second (bigger is better)
  • C++ w/ SSE intrinsics: 98.3
  • C++ w/ scalars: 61.2
  • JavaScript: 5.1
  • JavaScript w/ typed arrays: 8.4

It’s clear we’ve got a ways to go until JavaScript can match native code, but the Mozilla team is confident they can improve this benchmark. Even late on a Sunday night, Vlad took a look and found some suspiciously-inefficient code generation. If JavaScript grows SIMD intrinsics, that will help a lot.

From a coding style perspective, writing high-performance JavaScript is a challenge. In C++, it’s easy to express that a BoneTransform contains three four-float vectors, and they’re all stored contiguously in memory. In JavaScript, that involves using typed arrays and being very careful with your offsets. I would love to be able to specify memory layout without changing all property access to indices and offsets.
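For example, here is a sketch (my own illustration, not the benchmark code) of emulating a contiguous array of bone transforms with a Float32Array and manual offsets:

var FLOATS_PER_BONE = 12;   // three four-float vectors per bone
var boneCount = 64;
var bones = new Float32Array(boneCount * FLOATS_PER_BONE);

// Reading the y component of bone i's second vector means doing the offset math by hand:
function getSecondVectorY(bones, i) {
    return bones[i * FLOATS_PER_BONE + 4 + 1];
}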

Finally, if you want to track Mozilla’s investigation into this benchmark, here is the bug. I’m excited to see what they can do.