Emscripten Results: Firefox 19 shows dramatic improvement

Last time, we looked at Emscripten’s performance with current JS JITs on an in-order Atom core and found a penalty relative to out-of-order cores.

However, I told @js_dev I’d give updated numbers on a more typical out-of-order x86 core like my 2010 MacBook Pro’s i5.

There are a couple of interesting things here. Firefox 19 shows substantial Emscripten performance improvements over Firefox 17, which was itself already on par with hand-written JavaScript. While JavaScript JITs are still an order of magnitude away from native code performance, Emscripten’s performance in Firefox now meets or exceeds the performance of hand-written JavaScript in the same browser. Progress!

The machine is a 2010 MacBook Pro, Core i5 2.53 GHz, OS X 10.6.

For each compiler, I compiled with -O0, -O1, -O2, and -O3, and picked the best result.

| Language | Compiler | Variant | Vertex Rate | Slowdown |
|---|---|---|---|---|
| C++ | clang -O2 | SSE | 100142197 | 1 |
| C++ | gcc -O3 | SSE | 93109180 | 1.08 |
| C++ | gcc -O3 | scalar | 60398333 | 1.66 |
| C++ | clang -O2 | scalar | 58324308 | 1.72 |
| JavaScript | Chrome 23 | untyped | 9510489 | 10.5 |
| Emscripten -O3 | Aurora 19.0a2 | scalar | 7666000 | 13.1 |
| Emscripten -O3 | Firefox 17 | scalar | 6044000 | 16.6 |
| JavaScript | Chrome 23 | typed arrays | 5890000 | 17 |
| Emscripten -O3 | Chrome 25.0 canary | scalar | 5733706 | 17.5 |
| JavaScript | Firefox 17 | untyped | 5264735 | 19 |
| JavaScript | Firefox 17 | typed arrays | 5240000 | 19.1 |
| Emscripten -O2 | Chrome 23 | scalar | 4586165 | 21.8 |
| Emscripten -O1 | nodejs 0.8.10 | scalar | 4453109 | 22.5 |
| Emscripten -O2 | nodejs 0.8.10 | scalar | 1483406 | 67.5 |
| Emscripten -O3 | nodejs 0.8.10 | scalar | 668796 | 150 |
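The Slowdown column appears to be the fastest vertex rate divided by each row's vertex rate, rounded to three significant figures. Here is a minimal sketch of that arithmetic, using a few rows from the table. The `slowdowns` helper and the `results` array are names made up for illustration; they are not part of the benchmark code.

```javascript
// Vertex rates (vertices per second) copied from a few rows of the table.
const results = [
  { name: "C++ clang -O2 SSE", rate: 100142197 },
  { name: "JavaScript Chrome 23 untyped", rate: 9510489 },
  { name: "Emscripten -O3 Aurora 19.0a2 scalar", rate: 7666000 },
  { name: "Emscripten -O3 nodejs 0.8.10 scalar", rate: 668796 },
];

// Assumption: slowdown = fastest rate / this row's rate,
// rounded to three significant figures.
function slowdowns(rows) {
  const best = Math.max(...rows.map((r) => r.rate));
  return rows.map((r) => ({
    name: r.name,
    slowdown: Number((best / r.rate).toPrecision(3)),
  }));
}

for (const { name, slowdown } of slowdowns(results)) {
  console.log(`${name}: ${slowdown}x`);
}
```

For these four rows the sketch gives 1, 10.5, 13.1, and 150, matching the table.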

Here are the results for various Emscripten optimization levels:

| Browser | Compilation Level | Vertex Rate |
|---|---|---|
| Firefox 17 | emscripten -O0 | 2451509 |
| Firefox 17 | emscripten -O1 | 4080000 |
| Firefox 17 | emscripten -O2 | 5146000 |
| Firefox 17 | emscripten -O3 | 6044000 |
| Chrome 23 | emscripten -O0 | 1229754 |
| Chrome 23 | emscripten -O1 | 4152339 |
| Chrome 23 | emscripten -O2 | 4586165 |
| Chrome 23 | emscripten -O3 | 465162 |
| Aurora 19.0a2 | emscripten -O0 | 2062762 |
| Aurora 19.0a2 | emscripten -O1 | 4900000 |
| Aurora 19.0a2 | emscripten -O2 | 6214757 |
| Aurora 19.0a2 | emscripten -O3 | 7666000 |
| Chrome 25.0 canary | emscripten -O0 | 3001399 |
| Chrome 25.0 canary | emscripten -O1 | 4410235 |
| Chrome 25.0 canary | emscripten -O2 | 5482000 |
| Chrome 25.0 canary | emscripten -O3 | 5733706 |
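The "picked the best result" step is just a maximum over optimization levels per browser. A rough sketch, using the Firefox 17 and Chrome 23 rows from the table; `bestPerBrowser` is a hypothetical helper written for this post, not something in the benchmark repository.

```javascript
// Vertex rates per browser and Emscripten optimization level,
// copied from the Firefox 17 and Chrome 23 rows of the table.
const runs = [
  { browser: "Firefox 17", level: "-O0", rate: 2451509 },
  { browser: "Firefox 17", level: "-O1", rate: 4080000 },
  { browser: "Firefox 17", level: "-O2", rate: 5146000 },
  { browser: "Firefox 17", level: "-O3", rate: 6044000 },
  { browser: "Chrome 23", level: "-O0", rate: 1229754 },
  { browser: "Chrome 23", level: "-O1", rate: 4152339 },
  { browser: "Chrome 23", level: "-O2", rate: 4586165 },
  { browser: "Chrome 23", level: "-O3", rate: 465162 },
];

// Keep the highest-rate run seen so far for each browser.
function bestPerBrowser(allRuns) {
  const best = new Map();
  for (const run of allRuns) {
    const current = best.get(run.browser);
    if (!current || run.rate > current.rate) {
      best.set(run.browser, run);
    }
  }
  return best;
}

for (const [browser, run] of bestPerBrowser(runs)) {
  console.log(`${browser}: best at ${run.level} (${run.rate} vertices/s)`);
}
```

This is why Chrome 23 shows up in the summary table at -O2 while Firefox 17 shows up at -O3: higher optimization levels are not always faster in every engine.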

I updated the benchmark to automate compiling and running the native and node.js builds.

JavaScript, Emscripten, and the Atom D2700

Lately I’ve been doing some work with Emscripten. As predicted, the quality of Emscripten’s generated code is improving and JITs are learning to understand its generated code. I have high hopes for asm.js, a formalization of high-performance, low-level JavaScript. I now believe it’s conceivable that Emscripten could approach the same level of performance as PNaCl, though whether that happens remains to be seen.

However, having a rough understanding of how today’s JavaScript JITs work, I’ve always wondered whether Emscripten-generated code would be especially penalized relative to native code on an in-order core like Intel Atom. Having recently built an Intel Atom home server, I figured I’d update my recent Emscripten skinning benchmark results and find out.

First I’ll describe the hardware. The CPU is an Atom D2700 on the Intel D2700DC board. 1066 MHz DDR3 memory. Two cores hyperthreaded. Running Ubuntu 12.04 Server. Firefox and Chromium packages are stock. Node.js and clang 3.1 are x64 Linux binaries downloaded from their respective websites. Emscripten is commit 26250471b46a68204711f037f33790bfb4ba37c7 in the master branch.

Now the results. Remember there are three JavaScript implementations: hand-written JS with untyped arrays and objects (“untyped”), hand-written JS with typed arrays (“typed arrays”), and Emscripten-compiled C++ (“scalar”). Emscripten’s compiler was invoked with -O1. I saw significant performance drop-offs with -O2 and -O3.

| Language | Compiler | Variant | Vertex Rate | Slowdown |
|---|---|---|---|---|
| C++ | gcc 4.6.3 -O3 | SSE | 24040000 | 1 |
| C++ | clang 3.1 -O3 | SSE | 22530000 | 1.07 |
| C++ | gcc 4.6.3 -O3 | scalar | 18730000 | 1.28 |
| C++ | clang 3.1 -O3 | scalar | 13040000 | 1.84 |
| JavaScript | Chromium 20.0 | untyped | 3150000 | 7.63 |
| JavaScript | Firefox 17 | typed arrays | 2437562 | 9.86 |
| JavaScript | Firefox 17 | untyped | 1084577 | 22.2 |
| Emscripten | Firefox 17 | scalar | 944333 | 25.5 |
| JavaScript | Chromium 20.0 | typed arrays | 807577 | 29.8 |
| Emscripten | node 0.8.14 | scalar | 679802 | 35.4 |
| Emscripten | Chromium 20.0 | scalar | 677966 | 35.5 |
Based on the previous benchmark results and my recent experience with Emscripten, it appears that JavaScript JITted code does indeed pay a penalty relative to native code on in-order cores, or at least on the Atom D2700.

Next time I hope to update these benchmarks on a high-end desktop CPU.

As always, if you’d like to reproduce these results or question them, the code is available on my github.