Digging into JavaScript Performance, Part 2

Update: After I posted these numbers, Alon Zakai, Emscripten’s author, pointed out options for generating optimized JavaScript. I reran my benchmarks; check out the updated table below and the script used to generate the new results.

At the beginning of the year, I tried to justify my claim that JavaScript has a long way to go before it can compete with the performance of native code.

Well, 10 months have passed. WebGL is catching on, Native Client has been launched, Unreal Engine 3 targets Flash 11, and Crytek has announced they might target Flash 11 too. Exciting times!

On the GPU front, we’re in a good place. With WebGL, iOS, and Flash 11 all roughly exposing shader model 2.0, it’s not a ton of work to target all of the above. Even on the desktop you can’t assume higher than shader model 2.0: the Intel GMA 950 is still at the top.

However, shader model 2.0 isn’t general enough to offload all of your compute-intensive workloads to the GPU. With 16 vertex attributes and no vertex texture fetch, you simply can’t get enough data into your vertex shaders to do everything you need, e.g. blending morph targets.

Thus, for the foreseeable future, we’ll need to write fast CPU code that can run on the web, mobile devices, and the desktop. Today, that means at least JavaScript and a native language like C++. And because Microsoft has not implemented WebGL, the Firefox and Chrome WebGL blacklists are so strict, and no major browser falls back to software rendering, you probably care about targeting Flash 11 too. (It does have a software fallback!) If you care about Flash 11, then your code had better target ActionScript 3 / AVM2 as well.

How can we target native platforms, the web, and Flash at the same time?

Native platforms are easy: C++ is well-supported on Windows, Mac, iOS, and Android. SSE2 is ubiquitous on x86, ARM NEON is widely available, and both have high-quality intrinsics-based implementations.
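
To make “intrinsics-based” concrete, here is a minimal sketch of a four-wide multiply-add using SSE; the function name, array layout, and alignment assumptions are mine, not from any particular library:

#include <xmmintrin.h>  // SSE intrinsics
#include <cstddef>      // std::size_t

// out[i] = a[i] * b[i] + c[i], four floats per iteration.
// Assumes n is a multiple of 4 and all pointers are 16-byte aligned.
void madd4(float* out, const float* a, const float* b,
           const float* c, std::size_t n) {
    for (std::size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_load_ps(a + i);
        __m128 vb = _mm_load_ps(b + i);
        __m128 vc = _mm_load_ps(c + i);
        _mm_store_ps(out + i, _mm_add_ps(_mm_mul_ps(va, vb), vc));
    }
}

The ARM NEON version is a near-mechanical translation (vld1q_f32, vmlaq_f32, vst1q_f32), which is what makes the intrinsics approach practical across both x86 and ARM.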

As for Flash… I’m just counting on Adobe Alchemy to ship.

On the web, you have two choices: write your code in C++ and cross-compile it to JavaScript with Emscripten, or write it in JavaScript and run it on the browser’s native JavaScript engine. Ideally, cross-compiling C++ to JS via Emscripten would be as fast as writing your code in JavaScript. If it is, then targeting all platforms is easy: just use C++, and the browsers will do as well as they would with native JavaScript.

Over the last two evenings, while weathering a dust storm, I set about updating my skeletal animation benchmark results: for math-heavy code, how does JavaScript compare to C++ today? And how does Emscripten compare to hand-written JavaScript?

If you’d like, take a look at the raw results.

Language     Compiler      Variant        Vertices/sec   Slowdown
C++          clang 2.9     SSE             101,580,000       1.00
C++          gcc 4.2       SSE              96,420,454       1.05
C++          gcc 4.2       scalar           63,355,501       1.60
C++          clang 2.9     scalar           62,928,175       1.61
JavaScript   Chrome 15     untyped          10,210,000       9.95
JavaScript   Firefox 7     typed arrays      8,401,598       12.1
JavaScript   Chrome 15     typed arrays      5,790,000       17.5
Emscripten   Chrome 15     scalar            5,184,815       19.6
JavaScript   Firefox 7     untyped           5,104,895       19.9
JavaScript   Firefox 9a2   untyped           2,005,988       50.6
JavaScript   Firefox 9a2   typed arrays      1,932,271       52.6
Emscripten   Firefox 9a2   scalar              734,126        138
Emscripten   Firefox 7     scalar              729,270        139

  • JavaScript is still a factor of 10-20 away from well-written native code. Adding SIMD support to JavaScript will help, but obviously that’s not the whole story…
  • It’s bizarre that Chrome and Firefox disagree on whether typed arrays are faster than plain JavaScript arrays.
  • Firefox 9 clearly has performance issues that need to be worked out; I had wanted to benchmark its type inference capabilities.
  • Emscripten… ouch :( I wish it were even comparable to hand-written JavaScript, but it’s another factor of 10-20 slower…
  • Emscripten on Chrome 15 is within a factor of two of hand-written JavaScript. I think that means you can target all platforms with C++, because hand-written JavaScript won’t be that much faster than cross-compiled C++.
  • Emscripten on Firefox 7 and 9 still has issues, but Alon Zakai informs me that the trunk version of SpiderMonkey is much faster.

In the future, I’d love to run the same test on Flash 11 / Alchemy and Native Client, but the former hasn’t shipped and the latter remains a small market.

One final note: it’s very possible my test methodology is screwed up, my benchmarks are wrong, or I suck at copy/pasting numbers. Science should be reproducible: please try to reproduce these results yourself!

In Defense of Language Democracy (Or: Why the Browser Needs a Virtual Machine)

Years ago, Mark Hammond did a bunch of work to get Python running inside Mozilla’s script tags. Parts of Mozilla are even ostensibly designed to be language-independent. Unfortunately, even if Mozilla had succeeded at shipping multiple language implementations, it’s unlikely other browser vendors would have followed suit. It’s just not logistically feasible for every browser to implement and maintain the full set of interesting languages on the client.

I can hear you asking “Why do I care about Python in the browser? Or C++? Or OCaml? JavaScript is a great language.” I agree! JavaScript is a great language. Given the extremely short timeframe and immense political pressure, I’m thrilled we ended up with something as capable as JavaScript.

Nonetheless, fair competition benefits everyone. Take a look at what’s happened in the web server space in the last few years: Ruby on Rails. Django. Node.js. nginx. Tornado. Twisted. AppEngine. MochiWeb. HipHop-PHP. ASP.NET MVC. A proliferation of interesting datastores: memcache, redis, riak, etc. That’s an incredible amount of innovation in a short period of time.

Now let’s go through the same exercise, but on the client. jQuery, YUI, fast JavaScript JITs, CSS3, CoffeeScript, proliferation of standards-compliant browsers, some amount of HTML5… Maybe ubiquitous usage of Flash video? These advancements are significant, but it’s clear the front-end stack is changing much more slowly than the back-end.

Why is the back-end evolving faster than the front-end?

When building an application backend, even atop a virtualized hosting provider such as EC2, you are given approximately raw access to a machine: x86 instruction set, sockets, virtual memory, operating system APIs, and all. Any software that runs on that machine competes at the same level. You can use Python or Ruby or C++ or some combination thereof. If Redis wants to innovate with new memory management schemes, nothing is stopping it. This ecosystem democratized – nay, meritocratized – innovation.

On the front-end, the problem boils down to this: JavaScript is built atop the underlying hardware but does not expose its capabilities, so browsers and JavaScript implementations are inherently more capable than anything built atop them.

Of course, any client-side technology is going to rev slower simply because it’s hard to get people to update their bits. Also, users decide which client bits they like best, whether they be Internet Explorer, Chrome, or Firefox. Now the technology-takes-time-to-gain-ubiquity problem has a new dimension: each browser vendor must also decide to implement this technology in a compatible way. It took years for even JavaScript to standardize across browsers.

However, if we could instead standardize the web on a performant and safe VM such as CLR, JVM, or LLVM, including explicit memory layout and allocation and access to extra cores and the GPU, JavaScript becomes a choice rather than a mandate.

This point of view depends on my prediction that JavaScript will not become competitive with native code, but not everyone agrees. If JavaScript does eventually match native code, then I’d expect the browser itself to be written in it. It’s impossible for me to claim that JavaScript will never match native code, but the sustained success of C++ in systems programming, games, and high-performance computing is a testament to the value of systems languages.

Native Client, however, gives web developers the opportunity to write code within 5-10% of native code performance, in whatever language they want, without losing the safety and convenience of the web. You can write web applications that leverage multiple cores, and with WebGL, you can harness dedicated graphics hardware as well. Native Client does restrict access to operating system APIs, but I expect APIs to evolve reasonably quickly.

Let’s take a particular example: the HTML5 video tag. Native Client could have sidestepped the entire which-video-codec-should-we-standardize spat between Mozilla, Google, Apple, and Microsoft by allowing each site to choose the codec it prefers. YouTube could safely deploy whatever codecs it wanted, and even evolve them over time.

With Native Client, we could share Python code between the front-end and the back-end. We could use languages that support weak references. We could implement IMVU’s asynchronous task system. We could embed new JavaScript interpreters in old browsers.

Native Client is not the only option here. The JVM and CLR are other portable and performant VMs that have seen considerable language innovation while approximating native code performance.

A standardized, performant, and safe VM in the browser would increase the strength of the open web versus native app stores and their arbitrary technology limitations.

Finally, I’d like to thank Alon Zakai (author of Emscripten), Mike Shaver, and Chris Saari for engaging in open, honest discussion. I hope this public discourse leads to a better web. Either way, I hope this is my last post on this topic. :)

Native Client is Widely Misunderstood (And What Google Should Do About It)

Wow. My recent post about why Mozilla should adopt Native Client stirred up quite a storm. Some folks don’t believe the web needs high-performance applications. Some are happy with whatever APIs browsers expose. I disagree with these points, but I can respect them.

Most surprisingly, several respondents had simply untrue objections to Native Client, so I’d like to clear up their misconceptions. Then I will make recommendations to the Native Client team on how to fix their perception problems.

If you want to spend a few minutes learning about Native Client and LLVM from the horse’s mouth, watch this video.

Misconceptions about Native Client

Native Client implies x86

False. Originally, Native Client was positioned as an x86 sandbox technology, but now it has a clear LLVM story, with x86-32, x86-64, and partially-implemented ARM backends. Portability is a key benefit of the web, and Google understands this.

Native Client is complicated

True, it’s certainly not a trivial amount of code. But compare the amount of code in Native Client vs. Mozilla’s JavaScript engine:

# Lines of code in Native Client (zsh recursive glob):
$ wc -l native_client/src/**/*.{c,h,cc}
155082 total
# Lines of code in SpiderMonkey, excluding tests:
$ find mozilla-central/js/src -path '*tests*' -prune -o \( -iname '*.c' -o -iname '*.cc' -o -iname '*.h' -o -iname '*.cpp' \) -print0 | wc -l --files0-from=-
363471 total

Native Client is at least on the same order of complexity as a modern JavaScript engine, and since it already provides performance within 5% of native code, I’d guess it’s less susceptible to change.

Native Client / LLVM is not an open standard

I empathize with this concern, but Flash isn’t an open standard and it sees wide adoption. The difference between Flash and Native Client is that Native Client / LLVM is open source and could easily become an open standard.

Native Client is insecure

Native Client was designed to be a secure x86 sandbox. Under the assumption that its basic security model is sound, the question then becomes “how large is the attack surface and how likely is it to be broken?” Given the amount of code in a modern web browser and JavaScript JIT, I don’t see how Native Client is any worse.

With a little more work, JavaScript will perform at the same level as native code

I’m not informed or involved enough to claim JavaScript can never be as fast as native code. However, I have my doubts. A friend was working on a Monte Carlo Go AI, and he initially wrote his algorithm in JavaScript. Monte Carlo requires simulating a large number of game states, and a naïve port of his JavaScript to C++ gave a 100x performance improvement.

Check out my skeletal animation benchmark, where the JavaScript JITs need another 10x to compete with native code.

Even if JITs can match native code in some benchmarks (and I hope they do), performance across browsers will depend on the particulars of the JIT implementation. Native Client, at least for pure computation, would perform the same in every browser.

We can simply compile languages like Haskell, Python, and C to optimized JavaScript and let the JIT sort it out

There are some attempts to use JavaScript as a backend for other language implementations, but they rarely perform well. For example, CPython compiled to JavaScript via LLVM/Emscripten runs about 30x slower than a native build in Chrome, and 200x slower in Firefox 4 beta 8.

I’ve heard the argument for an RPython-like, statically-analyzable subset of JavaScript that browsers could run very efficiently. This subset could operate as a de facto bytecode, and Emscripten could compile LLVM to it with minimal performance loss. It’s possible this could work, but directly exposing LLVM seems more fruitful.

Red Herring Arguments

JavaScript is easier to develop with than native languages

Sure, but that doesn’t mean native languages don’t have a purpose. My hypothesis is that there are problems for which JavaScript is not and will not be suited, and that exposing the native power of the machine is better for application developers, and thus the web.

Binaries are obscure

Minified JS isn’t human-readable either, but machines can reconstruct both. Drdaemon nails it in his comment.

“If you want native performance, just download software or install a plug-in!”

While this sentiment reflects today’s reality, it doesn’t reflect trends on the web. Web applications continue to supplant desktop applications. Google Docs, Creately, Pivotal Tracker, Gmail, Mockingbird, and all of the games on Facebook are examples where I would have used a desktop application in the past. It seems that, whenever browsers provide new capabilities, applications consume them. Why would that trend stop now?

Recommendations to the Native Client team

  1. Get a move on! Enable it by default! More flashy demos!
  2. Reposition Native Client as a portable technology and make sure it’s clear that LLVM is key to its strategy.

Finally, Native Client is still new. I expect it will be some time before it’s solid enough to rely on for production use. That said, it has the potential to disrupt the desktop operating system, and I’m excited for a future where all software is web-based.

Digging Into JavaScript Performance

While JavaScript implementations have been improving by leaps and bounds, I predict that they still won’t meet the performance of native code within the next couple of years, even when plenty of memory is available and the algorithms are restricted to long, homogeneous loops. (Death-by-1000-cuts situations, where your profile is completely flat and function call overhead dominates, may be permanently relegated to statically compiled languages.)

Thus, I really want to see Native Client succeed, as it neatly jumps to a world where it’s possible to have code within 5-10% of the performance of native code, securely deployed on the web. I wrote a slightly inflammatory post about why the web should compete at the same level as native desktop applications, and why Native Client is important for getting us there.

Mike Shaver called me out. “Write a benchmark that’s important to you, submit it as a bug, and we’ll make it fast.” So I took the Cal3D skinning loop and wrote four versions: C++ with SSE intrinsics, C++ with scalar math, JavaScript, and JavaScript with typed arrays. I tested on a MacBook Pro, Core i5, 2.5 GHz, with gcc and Firefox 4.0 beta 8.

First, the code is on github.
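
To give a sense of what’s being measured, here is a hedged sketch of the scalar skinning loop’s shape. It is illustrative, not the exact Cal3D code: real skinning blends several weighted bone influences per vertex, while this version uses one bone per vertex for brevity.

struct Vector3 { float x, y, z; };

// A 3x4 bone matrix, row-major: rotation/scale in columns 0-2,
// translation in column 3.
struct BoneTransform { float m[3][4]; };

// Transform each input vertex by its bone's matrix.
void skinVertices(Vector3* out, const Vector3* in, const int* boneIndices,
                  const BoneTransform* bones, int vertexCount) {
    for (int i = 0; i < vertexCount; ++i) {
        const BoneTransform& b = bones[boneIndices[i]];
        const Vector3 v = in[i];
        out[i].x = b.m[0][0] * v.x + b.m[0][1] * v.y + b.m[0][2] * v.z + b.m[0][3];
        out[i].y = b.m[1][0] * v.x + b.m[1][1] * v.y + b.m[1][2] * v.z + b.m[1][3];
        out[i].z = b.m[2][0] * v.x + b.m[2][1] * v.y + b.m[2][2] * v.z + b.m[2][3];
    }
}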

The numbers:

Millions of vertices skinned per second (bigger is better)
  • C++ w/ SSE intrinsics: 98.3
  • C++ w/ scalars: 61.2
  • JavaScript: 5.1
  • JavaScript w/ typed arrays: 8.4

It’s clear we’ve got a ways to go until JavaScript can match native code, but the Mozilla team is confident they can improve this benchmark. Even late on a Sunday night, Vlad took a look and found some suspiciously-inefficient code generation. If JavaScript grows SIMD intrinsics, that will help a lot.

From a coding style perspective, writing high-performance JavaScript is a challenge. In C++, it’s easy to express that a BoneTransform contains three four-float vectors, and they’re all stored contiguously in memory. In JavaScript, that involves using typed arrays and being very careful with your offsets. I would love to be able to specify memory layout without changing all property access to indices and offsets.
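
To illustrate the contrast (the types below are hypothetical, not the benchmark’s exact code):

// C++: the layout is part of the type. Three four-float vectors per bone,
// 48 contiguous bytes, and an array of them is one flat, cache-friendly
// allocation.
struct BoneTransform {
    float rows[3][4];
};
BoneTransform bones[128];

// The JavaScript equivalent flattens everything into one Float32Array and
// spells out each "field" access by hand, e.g.:
//   boneData[boneIndex * 12 + row * 4 + col]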

Finally, if you want to track Mozilla’s investigation into this benchmark, here is the bug. I’m excited to see what they can do.

Mozilla’s Rejection of NativeClient Hurts the Open Web

Update: To avoid potential confusion, I will plainly state my overall thesis. The primary benefit of the internet is its openness, connectedness, standardness. By not adopting a technology capable of competing with native apps on iOS, Android, Windows, and Mac, web vendors are preventing important classes of applications such as high-end games and simulations from moving to the open web.

Tom Forsyth writes that clock speeds have grown disproportionately relative to memory access, implying that dynamic languages such as Python or JavaScript, which perform more dependent memory reads, don’t reap the full benefits of Moore’s law. Tom then digs into Data-Oriented Design, whose proponents think primarily about how data is laid out in memory (physical structure) and secondarily about code’s syntax (logical structure). I would have loved to have seen Tom dig into empirical data about the performance of Python and JavaScript across a variety of architectures, especially now that memory subsystems are better and tracing JITs have caught on, but his point stands: memory analysis is critical for low-latency code on today’s architectures. Dynamic languages and virtual tables are at odds with predictable memory access patterns.

How does this apply to the web? Google has developed an x86 sandboxing technology called NativeClient which allows web pages to securely embed x86 code. NativeClient enables Data-Oriented Design on the web, bringing web applications to the same playing field as native applications, especially in domains such as 2D and 3D graphics, video encoding/decoding, audio processing, and simulation.

Mozilla publicly rejects NativeClient and its portable LLVM equivalent, PNaCl. Instead, Mozilla is choosing to invest in JavaScript improvements, predicting that JavaScript performance will come “close enough” to native code performance.

I argue that native code’s primary benefit lies in memory layout and access patterns, not instruction set benefits such as SIMD. With typed arrays, WebGL has brought some degree of explicit memory layout to JavaScript, but it’s still restrictive: typed arrays don’t provide pointers, structures, structure-of-arrays vs. array-of-structures, or variable-width records. These aren’t always easy to specify in C either, but at least NativeClient gives us the possibility to innovate on systems-level design, while preserving the convenience, security, and portability of web-based code.
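
As a sketch of the layout choices in question (the particle fields are illustrative):

// Array-of-structures: each particle's fields sit together in memory,
// ideal when a pass touches every field of each particle.
struct ParticleAoS {
    float x, y, z;
    float vx, vy, vz;
};
ParticleAoS aos[1024];

// Structure-of-arrays: each field is its own contiguous array, ideal for
// SIMD and for passes that read only one or two fields.
struct ParticlesSoA {
    float x[1024], y[1024], z[1024];
    float vx[1024], vy[1024], vz[1024];
};
ParticlesSoA soa;

Typed arrays can emulate either layout with manual index arithmetic, but C expresses both directly, and switching between them is a local change rather than a rewrite of every access.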

Predictability is a further advantage of native code. In today’s browser climate, JavaScript engines have sometimes wildly different performance characteristics. Even if each browser vendor implemented its own x86 or LLVM sandbox, it’s unlikely that an application would perform very differently across browsers.

Beyond performance, NativeClient gives us the ability to bring existing code written in C, C++, or even languages like Haskell to the web. Emscripten and similar “translation taxes” are no longer necessary.

Finally, notice that web-based installation of native code is becoming more prevalent: iOS App Store, upcoming Mac App Store, Games for Windows Live, and Steam have shown it’s possible to make a seamless and compelling native code installation experience. However, these are all restrictive walled gardens! For the open web to compete, it needs a realistic answer to native code.

I believe that Mozilla’s insistence on pushing JavaScript over NativeClient hurts the open web by giving native applications an indefinite leg up. I want the web to support applications as rich as Supreme Commander, a game with thousands of units where each weapon trajectory is physically simulated. NativeClient would give us that capability.

Preemptive response: But NativeClient is x86! Basing the open web on a peculiar, old instruction set is a terrible idea! That’s why I point to LLVM and Portable NativeClient (PNaCl). It’s not a burden to target PNaCl by default and cross-compile to x86 and ARM if they matter to you.