Web Platform Limitations, Part 2 - Web Audio API is a Mess
The Web Audio API is a beautiful example of how bizarre web APIs can get.
Originally, Mozilla proposed its Audio Data API, which allowed programmatic creation of a stream of audio sample data. You simply opened an output stream, specified the number of channels and sample frequency, and wrote sample data to it. It was simple, beautiful, low-latency, and exposed a minimum baseline set of functionality, meaning that each browser would likely have had a high-quality implementation. The Audio Data API had another nice feature: the application developer specified the desired sample rate. In practice, every OS already has a high-quality resampler and mixer, so there's no penalty when using a non-native sample rate.
I don't know exactly why the Audio Data API lost, but I suspect it had something to do with the idea that JavaScript is too slow or high-latency to generate sample data in small buffers on demand.
Of course, with typed arrays, asm.js, and the upcoming SIMD.js spec (more on that later), that concern is not very well-founded.
Either way, the Audio Data API lost and the Web Audio API won.
The Web Audio API is an enormous mess. It has a huge API surface with a signal processing graph containing HRTFs, sound cones, doppler, convolutions, oscillators, filters, and so on. Quoth the spec: "It is a goal of this specification to include the capabilities found in modern game audio engines as well as some of the mixing, processing, and filtering tasks that are found in modern desktop audio production applications."
We may end up with at least four separate implementations: Blink, WebKit, Gecko, and IE. Some friends of mine and I have this suspicion that, since each browser has to implement such a wide API, in practice, browsers won't always do a great job with each node, so serious audio people will simply generate samples in JavaScript (or Emscripten-compiled C++) anyway.
I created a little demo that shows how poorly browsers handle the basic task of playing a handful of buffers, with fixed sample rates, at exact, scheduled times. I did not find a single browser on either Windows or Mac that could play five 200 Hz sine wave buffers without glitching. You can try the demo yourself.
And I would be remiss if I did not link to the jsmess plea for help.
These are almost certainly solvable problems (the Web Audio API allows you to specify when buffers begin playback in seconds at 64-bit floating point precision), but given that the Web Audio API has been in development for years, you'd think the fundamentals of playing gapless buffers would be solved by now. All the media graph stuff could be optional and layered on.
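To make the scheduling problem concrete, here is a minimal sketch of the arithmetic involved in playing five 200 Hz sine buffers back to back. It is not the code of the demo mentioned above; the function names, the 44100 Hz rate, the half-second chunk length, and the 0.1 second lead time are all assumptions chosen for illustration.

```typescript
// Sketch: build sine buffers that tile without a phase jump, and compute
// the start time of each one. All names and constants are hypothetical.

const SAMPLE_RATE = 44100; // assumed output rate, in Hz
const FREQUENCY = 200;     // tone frequency, in Hz
const CHUNK_LENGTH = 22050; // samples per buffer (half a second, assumed)

// Fill a buffer with a sine wave. `startSample` is the absolute index of
// the first sample, so consecutive chunks continue the same waveform.
function makeSineChunk(startSample: number, length: number): Float32Array {
  const out = new Float32Array(length);
  for (let i = 0; i < length; i++) {
    const t = (startSample + i) / SAMPLE_RATE;
    out[i] = Math.sin(2 * Math.PI * FREQUENCY * t);
  }
  return out;
}

// Start time of chunk `index`, in seconds. It is derived from an integer
// sample count rather than by repeatedly adding a duration, so rounding
// error does not accumulate from one chunk to the next.
function chunkStartTime(firstStart: number, index: number, chunkLength: number): number {
  return firstStart + (index * chunkLength) / SAMPLE_RATE;
}

const chunks: Float32Array[] = [];
const startTimes: number[] = [];
for (let i = 0; i < 5; i++) {
  chunks.push(makeSineChunk(i * CHUNK_LENGTH, CHUNK_LENGTH));
  startTimes.push(chunkStartTime(0.1, i, CHUNK_LENGTH));
}
```

In a browser, each chunk would be copied into an AudioBuffer and handed to an AudioBufferSourceNode, with the computed time (offset by the context's currentTime) passed to start(). If the browser honors those times exactly, the five chunks form one continuous tone; any audible click means the timing or resampling slipped.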
Final Thoughts
For posterity, audio APIs should work much like DirectSound: at some fairly high, reliable frequency, your application should query the current 'play' and 'write' heads in a circular audio buffer, fill the buffer from the write head up to the play head, and ta da. Streaming audio. Let the OS's audio stack handle the final resampling and mixing, and let the particular application perform whatever transforms it needs to. Of course, the actual polling and buffer-filling could happen in a browser thread, and JavaScript could stream arbitrary numbers of samples in advance if latency doesn't matter that much.
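The fill step described above can be sketched roughly as follows. This is an illustration of the circular-buffer idea only, not DirectSound's actual interface: the function name, the callback, and the convention that equal heads mean "nothing to write" are assumptions.

```typescript
// Sketch of filling a circular audio buffer from the write head up to
// (but not including) the play head. Names are hypothetical.

// Writes samples produced by `nextSample` into `ring`, starting at
// `writeHead` and wrapping around, until `playHead` is reached.
// Returns the number of samples written.
// Assumption: when both heads are equal, nothing is writable.
function fillRing(
  ring: Float32Array,
  playHead: number,
  writeHead: number,
  nextSample: () => number
): number {
  const capacity = ring.length;
  // Distance from the write head forward to the play head, with wraparound.
  const count = (playHead - writeHead + capacity) % capacity;
  for (let i = 0; i < count; i++) {
    ring[(writeHead + i) % capacity] = nextSample();
  }
  return count;
}

// Example: an 8-sample ring with the play head at 2 and the write head
// at 6. Indices 6, 7, 0, and 1 are free, so four samples get written.
const ring = new Float32Array(8);
let counter = 0;
const written = fillRing(ring, 2, 6, () => ++counter);
```

An application would call something like this from a timer or callback that fires reliably, passing the two head positions reported by the audio stack on each tick.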
(Side note: OpenAL is terrible too. Audio is a stream, not a collection of buffers that you queue.)
"But, but, JavaScript performance!"
If your concern is truly about JavaScript performance, then we're all in trouble anyway. Why should only audio benefit from SIMD and low-latency callbacks? Even if JavaScript numerical performance never approaches native numerical performance, the Web Audio graph could be replaced entirely by a handful of signal processing functions that 1) are mathematically defined and 2) work on arbitrary typed arrays. This would be simpler for browsers to implement, easier to verify, AND more flexible.
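As a rough illustration of what "mathematically defined functions over typed arrays" could look like, here are two hypothetical examples. In the proposal above the browser would supply such functions natively; they are written out in TypeScript here only to show the shape of the interface, and the names and signatures are assumptions, not part of any spec.

```typescript
// Hypothetical signal processing primitives defined purely by their math
// and operating on caller-supplied typed arrays.

type Samples = Float32Array | Float64Array;

// Gain: output[n] = g * input[n]
function applyGain(input: Samples, output: Samples, g: number): void {
  const n = Math.min(input.length, output.length);
  for (let i = 0; i < n; i++) {
    output[i] = g * input[i];
  }
}

// One-pole low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1]), with 0 < a <= 1.
// `previous` is y[-1]; the final y is returned so that the caller can
// carry the filter state over to the next block of samples.
function onePoleLowPass(input: Samples, output: Samples, a: number, previous: number = 0): number {
  const n = Math.min(input.length, output.length);
  let y = previous;
  for (let i = 0; i < n; i++) {
    y = y + a * (input[i] - y);
    output[i] = y;
  }
  return y;
}

// Example usage on small blocks.
const halved = new Float32Array(3);
applyGain(new Float32Array([1, 2, 3]), halved, 0.5);

const smoothed = new Float32Array(3);
const filterState = onePoleLowPass(new Float32Array([1, 1, 1]), smoothed, 0.5);
```

Because each function is specified by a formula, two implementations can be checked against each other sample by sample, which is much harder to do for a whole processing graph.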
So, VSTi in the browser is just a wild dream, and even if some simple stuff is possible, the poor quality makes it just a toy. Correct?
I'm trying to stream mono audio sampled at 4 kHz and getting glitches between chunks of samples, as you mention above. I've read that the Web Audio API's automatic upsampling (in this case from 4 kHz to 44.1 kHz) does not result in exact timing, which I believe is what I'm hearing: 'clicks' at the ends of sample chunks.
Any suggestions on how to get around this? Nothing I've read online has worked.