Digging into JavaScript Performance, Part 2

Nov 6, 2011 - c++, emscripten, flash, games, javascript, nativeclient, performance, x86

UPDATE. After I posted these numbers, Alon Zakai, Emscripten's author, pointed out options for generating optimized JavaScript. I reran my benchmarks; check out the updated table below and the script used to generate the new results.

At the beginning of the year, I tried to justify my claim that JavaScript has a long way to go before it can compete with the performance of native code.

Well, 10 months have passed. WebGL is catching on, Native Client has been launched, Unreal Engine 3 targets Flash 11, and Crytek has announced they might target Flash 11 too. Exciting times!

On the GPU front, we're in a good place. With WebGL, iOS, and Flash 11 all roughly exposing shader model 2.0, it's not a ton of work to target all of the above. Even on the desktop you can't assume higher than shader model 2.0: the Intel GMA 950 is still at the top.

However, shader model 2.0 isn't general enough to offload all of your compute-intensive workloads to the GPU. With only 16 vertex attributes and no vertex texture fetch, you simply can't get enough data into your vertex shaders to do everything you need, e.g. blending morph targets.
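To make the morph-target case concrete: without vertex texture fetch, each morph target's per-vertex deltas have to arrive as vertex attributes, and position, normal, texture coordinates, and skinning data typically claim several of the 16 slots already, so only a handful of targets fit. The fallback is to blend on the CPU and upload the result. Below is a minimal TypeScript sketch of that CPU blend, assuming each target is stored as position deltas from the base mesh in a flat, interleaved xyz `Float32Array`. The name `blendMorphTargets` and this data layout are hypothetical, not taken from any particular engine.

```typescript
// Hypothetical CPU morph-target blend:
//   out = base + sum over targets t of weights[t] * deltas[t]
// All arrays are flat, interleaved xyz positions (3 floats per vertex).
function blendMorphTargets(
  base: Float32Array,
  deltas: Float32Array[],
  weights: number[],
  out: Float32Array,
): void {
  if (deltas.length !== weights.length) {
    throw new Error("expected one weight per morph target");
  }
  out.set(base); // start from the undeformed mesh
  for (let t = 0; t < deltas.length; t++) {
    const w = weights[t];
    if (w === 0) continue; // skip inactive targets
    const d = deltas[t];
    for (let i = 0; i < out.length; i++) {
      out[i] += w * d[i];
    }
  }
}
```

The cost is one multiply-add per float per active target, so it scales with vertex count times the number of active targets.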

Thus, for the foreseeable future, we'll need to write fast CPU code that can run on the web, mobile devices, and the desktop. Today, that means at least JavaScript and a native language like C++. And because Microsoft has not implemented WebGL, the Firefox and Chrome WebGL blacklists are so strict, and no major browser falls back to software rendering, you probably care about targeting Flash 11 too. (It does have a software fallback!) If you care about Flash 11, then your code had better target ActionScript 3 / AVM2 too.

How can we target native platforms, the web, and Flash at the same time?

Native platforms are easy: C++ is well-supported on Windows, Mac, iOS, and Android. SSE2 is ubiquitous on x86, ARM NEON is widely available, and both have high-quality intrinsics-based implementations.

As for Flash... I'm just counting on Adobe Alchemy to ship.

On the web, you have two choices: write your code in C++ and cross-compile it to JavaScript with Emscripten, or write it directly in JavaScript and run it on the browser's JavaScript engine. Ideally, C++ cross-compiled to JS via Emscripten would run as fast as hand-written JavaScript. If it does, then targeting all platforms is easy: just use C++, and the browsers will do as well as they would with hand-written JavaScript.

Over the last two evenings, while weathering a dust storm, I set about updating my skeletal animation benchmark results: for math-heavy code, how does JavaScript compare to C++ today? And how does Emscripten compare to hand-written JavaScript?
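The post doesn't spell out the benchmark kernel, so the following is an assumption: a typical skeletal-animation inner loop is linear blend skinning, where each vertex is transformed by up to four bone matrices and the results are blended by per-vertex weights. "Vertex rate" is then the number of vertices skinned per second. Here is a minimal TypeScript sketch of such a scalar kernel; the data layout and the names `skinVertices` and `vertexRate` are hypothetical, not taken from the benchmark's source.

```typescript
// Hypothetical data layout for a linear-blend-skinning benchmark
// (an assumption, not the benchmark's actual code):
//   bones:   12 floats per bone, a row-major 3x4 affine matrix
//   pos:     3 floats (x, y, z) per vertex
//   indices: 4 bone indices per vertex
//   weights: 4 blend weights per vertex, expected to sum to 1
function skinVertices(
  bones: Float32Array,
  pos: Float32Array,
  indices: Uint16Array,
  weights: Float32Array,
  out: Float32Array,
): void {
  const vertexCount = pos.length / 3;
  for (let v = 0; v < vertexCount; v++) {
    const x = pos[3 * v];
    const y = pos[3 * v + 1];
    const z = pos[3 * v + 2];
    let ox = 0;
    let oy = 0;
    let oz = 0;
    for (let k = 0; k < 4; k++) {
      const w = weights[4 * v + k];
      const m = 12 * indices[4 * v + k]; // offset of this bone's matrix
      // Transform the rest-pose position by the bone matrix, scaled by its weight.
      ox += w * (bones[m] * x + bones[m + 1] * y + bones[m + 2] * z + bones[m + 3]);
      oy += w * (bones[m + 4] * x + bones[m + 5] * y + bones[m + 6] * z + bones[m + 7]);
      oz += w * (bones[m + 8] * x + bones[m + 9] * y + bones[m + 10] * z + bones[m + 11]);
    }
    out[3 * v] = ox;
    out[3 * v + 1] = oy;
    out[3 * v + 2] = oz;
  }
}

// Vertex rate: vertices skinned per second of wall-clock time.
function vertexRate(vertexCount: number, iterations: number, elapsedMs: number): number {
  return (vertexCount * iterations * 1000) / elapsedMs;
}
```

To get a number comparable to a vertex rate, you would call `skinVertices` repeatedly over a large mesh, read `Date.now()` before and after, and pass the elapsed milliseconds to `vertexRate`.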

If you'd like, take a look at the raw results.

Language	Compiler / Runtime	Variant	Vertex Rate (vertices/s)	Slowdown (vs. clang 2.9 SSE)
C++	clang 2.9	SSE	101580000	1
C++	gcc 4.2	SSE	96420454	1.05
C++	gcc 4.2	scalar	63355501	1.6
C++	clang 2.9	scalar	62928175	1.61
JavaScript	Chrome 15	untyped	10210000	9.95
JavaScript	Firefox 7	typed arrays	8401598	12.1
JavaScript	Chrome 15	typed arrays	5790000	17.5
Emscripten	Chrome 15	scalar	5184815	19.6
JavaScript	Firefox 7	untyped	5104895	19.9
JavaScript	Firefox 9a2	untyped	2005988	50.6
JavaScript	Firefox 9a2	typed arrays	1932271	52.6
Emscripten	Firefox 9a2	scalar	734126	138
Emscripten	Firefox 7	scalar	729270	139
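The Slowdown column is the fastest vertex rate (clang 2.9 with SSE) divided by each row's vertex rate, and the published values match rounding to three significant figures. A short sketch that recomputes it from a few rows of the table, handy if you reproduce the benchmark and want to compare; the `Result` type and `slowdowns` function are just for illustration.

```typescript
// One benchmark result; vertexRate is assumed to be vertices per second.
interface Result {
  label: string;
  vertexRate: number;
}

// Slowdown of each result relative to the fastest one, rounded to three
// significant figures (toPrecision(3), then back to a number).
function slowdowns(results: Result[]): { label: string; slowdown: number }[] {
  const best = Math.max(...results.map((r) => r.vertexRate));
  return results.map((r) => ({
    label: r.label,
    slowdown: Number((best / r.vertexRate).toPrecision(3)),
  }));
}

// A few rows copied from the table above.
const rows: Result[] = [
  { label: "C++ clang 2.9 SSE", vertexRate: 101580000 },
  { label: "C++ gcc 4.2 SSE", vertexRate: 96420454 },
  { label: "JavaScript Chrome 15 untyped", vertexRate: 10210000 },
  { label: "Emscripten Chrome 15 scalar", vertexRate: 5184815 },
  { label: "Emscripten Firefox 7 scalar", vertexRate: 729270 },
];

for (const row of slowdowns(rows)) {
  console.log(`${row.label}: ${row.slowdown}`);
}
```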

Conclusions?

JavaScript is still a factor of 10-20 away from well-written native code. Adding SIMD support to JavaScript will help, but obviously that's not the whole story...
It's bizarre that Chrome and Firefox disagree on whether typed arrays or untyped arrays are faster.
Firefox 9 clearly has performance issues that need to be worked out. I tested it because I wanted to benchmark its type inference capabilities.
~~Emscripten... ouch :( I wish it were even comparable to hand-written JavaScript, but it's another factor of 10-20 slower...~~
Emscripten on Chrome 15 is within a factor of two of hand-written JavaScript. I think that means you can target all platforms with C++, because hand-written JavaScript won't be that much faster than cross-compiled C++.
Emscripten on Firefox 7 and 9 still has issues, but Alon Zakai informs me that the trunk version of SpiderMonkey is much faster.
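The table distinguishes "untyped" and "typed arrays" variants. I'm assuming here that this means the same kernel storing its data in ordinary JavaScript arrays versus `Float32Array`s; the sketch below is illustrative, not the benchmark's code, and `NumericBuffer` and `transformPoints` are hypothetical names. One function can accept both because they share numeric indexing, but they are not equivalent: plain arrays hold 64-bit doubles, while a `Float32Array` rounds every stored value to single precision. How well each engine's JIT handles the two representations is engine-specific, which is at least consistent with Chrome and Firefox ranking them differently.

```typescript
// Assumed meaning of the table's variants:
//   number[]     -> "untyped"
//   Float32Array -> "typed arrays"
// Both satisfy this structural interface.
interface NumericBuffer {
  [index: number]: number;
  readonly length: number;
}

// Transforms packed xyz points by a row-major 3x4 affine matrix.
function transformPoints(m: NumericBuffer, src: NumericBuffer, dst: NumericBuffer): void {
  for (let i = 0; i < src.length; i += 3) {
    const x = src[i];
    const y = src[i + 1];
    const z = src[i + 2];
    dst[i] = m[0] * x + m[1] * y + m[2] * z + m[3];
    dst[i + 1] = m[4] * x + m[5] * y + m[6] * z + m[7];
    dst[i + 2] = m[8] * x + m[9] * y + m[10] * z + m[11];
  }
}

// Scale by 2, then add 1 to x.
const matrix = [2, 0, 0, 1, 0, 2, 0, 0, 0, 0, 2, 0];

// "Untyped": plain arrays of doubles.
const untypedOut: number[] = [0, 0, 0];
transformPoints(matrix, [0.1, 1, 2], untypedOut);

// "Typed": Float32Array storage; every load and store is single precision,
// so typedOut[0] differs from untypedOut[0] in the low bits.
const typedOut = new Float32Array(3);
transformPoints(new Float32Array(matrix), new Float32Array([0.1, 1, 2]), typedOut);
```

In a real benchmark you would likely keep separate copies of the kernel for each array type, so that each call site only ever sees one kind of array.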

In the future, I'd love to run the same test on Flash 11 / Alchemy and Native Client, but the former hasn't shipped and the latter remains a small market.

One final note: it's very possible my test methodology is screwed up, my benchmarks are wrong, or I suck at copy/pasting numbers. Science should be reproducible: please try to reproduce these results yourself!



Imported Comments

Kevin Gadd on Nov 6, 2011

Nice post. I'm shocked that Emscripten performs so badly on your test case - when I last looked at the code it was generating, it did pretty well in some cases. I'm not really surprised by the general performance numbers, though - albeit disappointed that things still aren't any better and that we still don't have access to anything better than typed arrays.

Kevin Gadd on Nov 6, 2011

Since the huge performance regression from FF7 to FF9a2 was worrying me, I did some local testing and then filed a bug on bugzilla:

https://bugzilla.mozilla.org/show_bug.cgi?id=700101

Paco Lopez on Nov 6, 2011

Has the Emscripten code been optimized with Closure? There is a big difference between optimized and plain Emscripten output. I think Native Client is the way to go, and I hope Firefox integrates it some day. :)

Greetings!

Chad Austin on Nov 6, 2011

How do I know whether the emscripten code has been optimized with closure or not?

Paco Lopez on Nov 6, 2011

I was talking about the post-Emscripten optimization described at the bottom of https://github.com/kripken/emscripten/wiki/Optimizing-Code; this additional optimization is done with the Closure Compiler (http://code.google.com/closure/compiler/).

The output of Emscripten is too verbose. Closure Compiler restructures and reduces that code. AFAIK the results would be much better with Closure.

azakai on Nov 6, 2011

The Emscripten code tested here has no optimizations on it - not LLVM's, not Emscripten's, and not Closure's - so it is not surprising it is very very slow. Here are some numbers with optimizations:

https://gist.github.com/1343182

The optimized Emscripten code ends up 8.34 times slower than optimized native code. The unoptimized Emscripten code is 169.78 times slower than optimized native code, or over 20 times slower than optimized Emscripten code (that's the typical ratio).

A slowdown of 8.34 is worse than in most benchmarks; in fact it is about the same as the raytrace benchmark in the Emscripten benchmark suite, which is the slowest there. Both code samples suffer from the same issue: they are heavy on memory reads and writes, which are not that fast even with typed arrays, at least until we write a better optimizer for that. For now it's possible to hand-optimize the inner loops; there are lots of obvious optimizations we don't have yet.

(Yes, optimizing Emscripten code is not a trivial process at this point, as mentioned in the gist. The wiki does explain the important parts though.)

azakai on Nov 7, 2011

Thanks for updating the post with optimized Emscripten code.

Emscripten code being 2X slower than handwritten is slower than I'd expect, but this benchmark does look like it would benefit greatly from hand-optimizing the inner loops (which projects using Emscripten are doing, like Broadway).

To elaborate on the trunk versions of browsers, both Chrome and Firefox's trunk versions are much faster than the tested versions here (and that is typically the case - progress is very fast). Firefox's update cycle is tomorrow btw, so your Firefox 9a2 will auto-update to 10a, which should be much faster.

In general though I would recommend simply testing with the trunk JS engines themselves, and not browsers at all, if you are interested in either the raw speed of JS engines, or in how fast browsers will be in a month or two. (But if you care about how fast browsers are right now, then of course test current release versions.)

paPus on Nov 8, 2011

How would Java and GWT compiled javascript behave for the same tests?

Paco Lopez on Nov 8, 2011

Really interesting results. Thank you for the update! I would like to see Native Client results. BTW I don't think that the Chrome market is small.

Chad Austin on Nov 8, 2011

I'd love to see more results too! Anyone care to run some Java/NaCl benchmarks? :)