Digging into JavaScript Performance, Part 2
UPDATE. After I posted these numbers, Alon Zakai, Emscripten's author, pointed out options for generating optimized JavaScript. I reran my benchmarks; check out the updated table below and the script used to generate the new results.
At the beginning of the year, I tried to justify my claim that JavaScript has a long way to go before it can compete with the performance of native code.
Well, 10 months have passed. WebGL is catching on, Native Client has been launched, Unreal Engine 3 targets Flash 11, and Crytek has announced they might target Flash 11 too. Exciting times!
On the GPU front, we're in a good place. With WebGL, iOS, and Flash 11 all roughly exposing shader model 2.0, it's not a ton of work to target all of the above. Even on the desktop you can't assume higher than shader model 2.0: the Intel GMA 950 is still at the top.
However, shader model 2.0 isn't general enough to offload all of your compute-intensive workloads to the GPU. With 16 vertex attributes and no vertex texture fetch, you simply can't get enough data into your vertex shaders to do everything you need, e.g. blending morph targets.
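To make the morph target example concrete, here is a minimal sketch of the kind of work that stays on the CPU. It assumes each target stores full positions in the same layout as the base mesh and that blending is a weighted sum of per-target deltas; the function name `blendMorphTargets` and its parameters are hypothetical, not taken from the benchmark.

```javascript
// Hypothetical CPU-side morph target blend (illustration only).
// base:    packed xyz positions of the neutral mesh
// targets: array of position arrays, each the same length as base
// weights: one blend weight per target
// out:     destination array, same length as base
function blendMorphTargets(base, targets, weights, out) {
  for (let i = 0; i < base.length; ++i) {
    let value = base[i];
    for (let t = 0; t < targets.length; ++t) {
      // Add this target's weighted offset from the neutral pose.
      value += weights[t] * (targets[t][i] - base[i]);
    }
    out[i] = value;
  }
  return out;
}

const base = new Float32Array([0, 0, 0]);
const targets = [new Float32Array([1, 0, 0]), new Float32Array([0, 2, 0])];
const blended = blendMorphTargets(base, targets, [0.5, 0.25], new Float32Array(3));
console.log(Array.from(blended));
```

With many targets, each one needs its own stream of per-vertex data, which is why a 16-attribute limit runs out quickly.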
Thus, for the foreseeable future, we'll need to write fast CPU code that can run on the web, mobile devices, and the desktop. Today, that means at least JavaScript and a native language like C++. And, because Microsoft has not implemented WebGL, the Firefox and Chrome WebGL blacklists are so strict, and no major browsers fall back on software, you probably care about targeting Flash 11 too. (It does have a software fallback!) If you care about Flash 11, then your code had better target ActionScript 3 / AVM2 too.
How can we target native platforms, the web, and Flash at the same time?
Native platforms are easy: C++ is well-supported on Windows, Mac, iOS, and Android. SSE2 is ubiquitous on x86, ARM NEON is widely available, and both have high-quality intrinsics-based implementations.
As for Flash... I'm just counting on Adobe Alchemy to ship.
On the web, you have two choices: write your code in C++ and cross-compile it to JavaScript with Emscripten, or write it in JavaScript and run it on the browser's native JavaScript engine. Ideally, cross-compiling C++ to JS via Emscripten would be as fast as writing your code in JavaScript. If it is, then targeting all platforms is easy: just use C++ and the browsers will do as well as they would with native JavaScript.
Over the last two evenings, while weathering a dust storm, I set about updating my skeletal animation benchmark results: for math-heavy code, how does JavaScript compare to C++ today? And how does Emscripten compare to hand-written JavaScript?
If you'd like, take a look at the raw results.
Language | Compiler | Variant | Vertex Rate | Slowdown |
---|---|---|---|---|
C++ | clang 2.9 | SSE | 101580000 | 1 |
C++ | gcc 4.2 | SSE | 96420454 | 1.05 |
C++ | gcc 4.2 | scalar | 63355501 | 1.6 |
C++ | clang 2.9 | scalar | 62928175 | 1.61 |
JavaScript | Chrome 15 | untyped | 10210000 | 9.95 |
JavaScript | Firefox 7 | typed arrays | 8401598 | 12.1 |
JavaScript | Chrome 15 | typed arrays | 5790000 | 17.5 |
Emscripten | Chrome 15 | scalar | 5184815 | 19.6 |
JavaScript | Firefox 7 | untyped | 5104895 | 19.9 |
JavaScript | Firefox 9a2 | untyped | 2005988 | 50.6 |
JavaScript | Firefox 9a2 | typed arrays | 1932271 | 52.6 |
Emscripten | Firefox 9a2 | scalar | 734126 | 138 |
Emscripten | Firefox 7 | scalar | 729270 | 139 |
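For anyone checking the arithmetic, the Slowdown column appears to be the fastest vertex rate divided by each row's vertex rate, rounded to three significant figures. The sketch below recomputes it from the rates in the table; the variable and function names are my own.

```javascript
// Vertex rates copied from the table above.
const results = [
  { label: "C++ / clang 2.9 / SSE", rate: 101580000 },
  { label: "C++ / gcc 4.2 / SSE", rate: 96420454 },
  { label: "C++ / gcc 4.2 / scalar", rate: 63355501 },
  { label: "C++ / clang 2.9 / scalar", rate: 62928175 },
  { label: "JavaScript / Chrome 15 / untyped", rate: 10210000 },
  { label: "JavaScript / Firefox 7 / typed arrays", rate: 8401598 },
  { label: "JavaScript / Chrome 15 / typed arrays", rate: 5790000 },
  { label: "Emscripten / Chrome 15 / scalar", rate: 5184815 },
  { label: "JavaScript / Firefox 7 / untyped", rate: 5104895 },
  { label: "JavaScript / Firefox 9a2 / untyped", rate: 2005988 },
  { label: "JavaScript / Firefox 9a2 / typed arrays", rate: 1932271 },
  { label: "Emscripten / Firefox 9a2 / scalar", rate: 734126 },
  { label: "Emscripten / Firefox 7 / scalar", rate: 729270 },
];

// Slowdown relative to a baseline rate: higher means slower.
function slowdown(rate, baselineRate) {
  return baselineRate / rate;
}

// The baseline is the fastest configuration in the table.
const baseline = Math.max(...results.map((r) => r.rate));

for (const r of results) {
  // Round to three significant figures, then drop trailing zeros.
  const rounded = Number(slowdown(r.rate, baseline).toPrecision(3));
  console.log(`${r.label}: ${rounded}`);
}
```

The same function compares any two rows directly, e.g. `slowdown(5184815, 10210000)` for Emscripten against hand-written untyped JavaScript on Chrome 15.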
Conclusions?
- JavaScript is still a factor of 10-20 away from well-written native code. Adding SIMD support to JavaScript will help, but obviously that's not the whole story...
- It's bizarre that Chrome and Firefox disagree on whether typed arrays are faster than untyped arrays.
- Firefox 9 clearly has performance issues that need to be worked out. I wanted to benchmark its type inference capabilities.
- ~~Emscripten... ouch :( I wish it were even comparable to hand-written JavaScript, but it's another factor of 10-20 slower...~~
- Emscripten on Chrome 15 is within a factor of two of hand-written JavaScript. I think that means you can target all platforms with C++, because hand-written JavaScript won't be that much faster than cross-compiled C++.
- Emscripten on Firefox 7 and 9 still has issues, but Alon Zakai informs me that the trunk version of SpiderMonkey is much faster.
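To illustrate what "typed arrays" versus "untyped" means in the table, here is a rough sketch of the same kernel run over `Float32Array` and plain `Array` storage. It is not the benchmark itself: the kernel (one 3x4 matrix applied to packed xyz positions), the names, and the timing harness are all simplifications I am assuming for illustration.

```javascript
// Hypothetical kernel: transform packed xyz positions by one row-major 3x4 matrix.
// Works on either plain Arrays or Float32Arrays.
function transformPositions(matrix, input, output, vertexCount) {
  for (let i = 0; i < vertexCount; ++i) {
    const x = input[3 * i];
    const y = input[3 * i + 1];
    const z = input[3 * i + 2];
    output[3 * i]     = matrix[0] * x + matrix[1] * y + matrix[2]  * z + matrix[3];
    output[3 * i + 1] = matrix[4] * x + matrix[5] * y + matrix[6]  * z + matrix[7];
    output[3 * i + 2] = matrix[8] * x + matrix[9] * y + matrix[10] * z + matrix[11];
  }
}

// Allocate and fill buffers using the given constructor (Array or Float32Array).
function makeBuffers(ArrayType, vertexCount) {
  const input = new ArrayType(3 * vertexCount);
  const output = new ArrayType(3 * vertexCount);
  for (let i = 0; i < input.length; ++i) {
    input[i] = i % 7;
    output[i] = 0;
  }
  // Identity rotation plus a translation of (10, 20, 30).
  const matrix = ArrayType.from([1, 0, 0, 10, 0, 1, 0, 20, 0, 0, 1, 30]);
  return { matrix, input, output };
}

// Returns an approximate rate in vertices per second.
function measureVertexRate(ArrayType, vertexCount, iterations) {
  const { matrix, input, output } = makeBuffers(ArrayType, vertexCount);
  const start = Date.now();
  for (let n = 0; n < iterations; ++n) {
    transformPositions(matrix, input, output, vertexCount);
  }
  // Guard against a zero-millisecond measurement on very short runs.
  const elapsedMs = Math.max(Date.now() - start, 1);
  return (vertexCount * iterations) / (elapsedMs / 1000);
}

console.log("untyped:     ", measureVertexRate(Array, 10000, 100));
console.log("typed arrays:", measureVertexRate(Float32Array, 10000, 100));
```

Which variant wins depends on the engine and version, so treat any numbers from a toy like this as a prompt for further measurement rather than a result.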
In the future, I'd love to run the same test on Flash 11 / Alchemy and Native Client, but the former hasn't shipped and the latter remains a small market.
One final note: it's very possible my test methodology is screwed up, my benchmarks are wrong, or I suck at copy/pasting numbers. Science should be reproducible: please try to reproduce these results yourself!
Nice post. I'm shocked that Emscripten performs so badly on your test case - when I last looked at the code it was generating, it did pretty well in some cases. I'm not really surprised by the general performance numbers, though - albeit disappointed that things still aren't any better and that we still don't have access to anything better than typed arrays.
Since the huge performance regression from FF7 to FF9a2 was worrying me, I did some local testing and then filed a bug on bugzilla:
https://bugzilla.mozilla.org/show_bug.cgi?id=700101
Has the Emscripten code been optimized with Closure? There is a big difference between optimized and plain Emscripten output. I think Native Client is the way to go, and I hope Firefox integrates it some day. :)
Greetings!
How do I know whether the emscripten code has been optimized with closure or not?
I was talking about the post-Emscripten optimization described at the bottom of https://github.com/kripken/emscripten/wiki/Optimizing-Code. This additional optimization is done with the Closure Compiler (http://code.google.com/closure/compiler/).
The output of Emscripten is too verbose. The Closure Compiler restructures and reduces that code. AFAIK the results would be much better with Closure.
The Emscripten code tested here has no optimizations on it - not LLVM's, not Emscripten's, and not Closure's - so it is not surprising it is very very slow. Here are some numbers with optimizations:
https://gist.github.com/1343182
The optimized Emscripten code ends up 8.34 times slower than optimized native code. The unoptimized Emscripten code is 169.78 times slower than optimized native code, or over 20 times slower than optimized Emscripten code (that's the typical ratio).
8.34 is slower than most benchmarks; in fact, it is about the same as the raytrace benchmark in the Emscripten benchmark suite, which is the slowest there. Both code samples suffer from the same issue: they are heavy on memory reads and writes, which are not that fast even with typed arrays. At least until we write a better optimizer for that, it's possible to hand-optimize the inner loops; there are lots of obvious optimizations we don't have yet.
(Yes, optimizing Emscripten code is not a trivial process at this point, as mentioned in the gist. The wiki does explain the important parts though.)
Thanks for updating the post with optimized Emscripten code.
Emscripten code being 2X slower than handwritten is slower than I'd expect, but this benchmark does look like it would benefit greatly from hand-optimizing the inner loops (which projects using Emscripten are doing, like Broadway).
To elaborate on the trunk versions of browsers, both Chrome and Firefox's trunk versions are much faster than the tested versions here (and that is typically the case - progress is very fast). Firefox's update cycle is tomorrow btw, so your Firefox 9a2 will auto-update to 10a, which should be much faster.
In general though I would recommend simply testing with the trunk JS engines themselves, and not browsers at all, if you are interested in either the raw speed of JS engines, or in how fast browsers will be in a month or two. (But if you care about how fast browsers are right now, then of course test current release versions.)
How would Java and GWT compiled javascript behave for the same tests?
Really interesting results. Thank you for the update! I would like to see Native Client results. BTW I don't think that the Chrome market is small.
I'd love to see more results too! Anyone care to run some Java/NaCl benchmarks? :)