The Web Audio API is a beautiful example of how bizarre web APIs can get.
Originally, Mozilla proposed their Audio Data API, which allowed programmatic creation of a stream of audio sample data. You simply opened an output stream, specified the number of channels and the sample rate, and wrote sample data to it. It was simple, beautiful, and low-latency, and it exposed a minimal baseline of functionality, so each browser would likely have had a high-quality implementation. The Audio Data API had another nice feature: the application developer specified the desired sample rate. In practice, every OS already has a high-quality resampler and mixer, so there’s no penalty for using a non-native sample rate.
Of course, with typed arrays, asm.js, and the upcoming SIMD.js spec (more on that later), any concern that JavaScript is too slow to generate audio samples is not very well-founded.
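To make the push-style model concrete, here is a minimal TypeScript sketch. It assumes a hypothetical `PushAudioStream` interface with `setup(channels, sampleRate)` and `write(samples)`. The interface is loosely modeled on Firefox's `mozSetup()` and `mozWriteAudio()`, which are not called here. A fake `RecordingStream` sink stands in for the browser so the code runs anywhere. The application picks the rate (22050 Hz here) and generates samples into a `Float32Array`. It carries the sine phase across chunks so that consecutive writes join seamlessly.

```typescript
// Hypothetical push-style stream, loosely modeled on the Audio Data API:
// the app chooses channels and sample rate, then writes interleaved samples.
interface PushAudioStream {
  setup(channels: number, sampleRate: number): void;
  write(samples: Float32Array): number; // returns samples accepted
}

// Generates a sine tone chunk by chunk, carrying the phase between chunks
// so consecutive writes join without a discontinuity.
class SineWriter {
  private phase = 0; // radians, kept in [0, 2π)
  private readonly step: number;
  private readonly channels: number;

  constructor(frequency: number, sampleRate: number, channels: number) {
    this.step = (2 * Math.PI * frequency) / sampleRate;
    this.channels = channels;
  }

  nextChunk(frames: number): Float32Array {
    const out = new Float32Array(frames * this.channels);
    for (let i = 0; i < frames; i++) {
      const value = Math.sin(this.phase);
      for (let c = 0; c < this.channels; c++) {
        out[i * this.channels + c] = value; // interleaved: L R L R ...
      }
      this.phase += this.step;
      if (this.phase >= 2 * Math.PI) this.phase -= 2 * Math.PI;
    }
    return out;
  }
}

// Hypothetical fake sink that records writes, standing in for the browser.
class RecordingStream implements PushAudioStream {
  channels = 0;
  sampleRate = 0;
  written: number[] = [];

  setup(channels: number, sampleRate: number): void {
    this.channels = channels;
    this.sampleRate = sampleRate;
  }

  write(samples: Float32Array): number {
    for (let i = 0; i < samples.length; i++) this.written.push(samples[i]);
    return samples.length;
  }
}

const stream = new RecordingStream();
stream.setup(2, 22050); // app-chosen rate; the OS would resample if needed
const tone = new SineWriter(200, 22050, 2);
for (let k = 0; k < 4; k++) stream.write(tone.nextChunk(512));
```

Because the writer keeps its own phase, splitting the tone into four 512-frame writes produces exactly the same samples as one 2048-frame write. The stream itself never needs to know where the chunk boundaries were.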
Either way, the Audio Data API lost and the Web Audio API won.
The Web Audio API is an enormous mess. It has a huge API surface with a signal processing graph containing HRTFs, sound cones, Doppler, convolutions, oscillators, filters, and so on. Quoth the spec: “It is a goal of this specification to include the capabilities found in modern game audio engines as well as some of the mixing, processing, and filtering tasks that are found in modern desktop audio production applications.”
I created a little demo that shows how poorly browsers handle the basic task of playing a handful of buffers, each at a fixed sample rate, at exact scheduled times. I did not find a single browser on Windows or Mac that could play five 200 Hz sine wave buffers without glitching. You can try the demo yourself.
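The scheduling task the demo exercises can be sketched against the real Web Audio calls: `createBuffer`, `getChannelData`, `createBufferSource`, `connect`, and `start(when)`. This is a hypothetical reconstruction, not the demo's actual source. The `ContextLike`, `SourceLike`, and `BufferLike` interfaces are assumptions that describe only the subset used, so the code type-checks without the DOM library. `makeFakeContext` is a hypothetical recorder for running outside a browser. In a browser, an `AudioContext` should be usable in its place.

```typescript
// Structural types for the subset of the Web Audio API used here (assumed
// shapes; a real AudioContext should satisfy them).
interface BufferLike {
  getChannelData(channel: number): Float32Array;
}
interface SourceLike {
  buffer: BufferLike | null;
  connect(destination: unknown): unknown;
  start(when?: number): void;
}
interface ContextLike {
  readonly currentTime: number;
  readonly sampleRate: number;
  readonly destination: unknown;
  createBuffer(channels: number, length: number, sampleRate: number): BufferLike;
  createBufferSource(): SourceLike;
}

// Schedules `count` back-to-back mono sine buffers of `frames` frames each,
// starting `lead` seconds from now. Each sample's phase comes from its global
// frame index, so buffer k + 1 continues exactly where buffer k ended.
function scheduleSineBuffers(
  ctx: ContextLike,
  count: number,
  frames: number,
  frequency: number,
  lead = 0.1,
): number[] {
  const rate = ctx.sampleRate;
  const t0 = ctx.currentTime + lead;
  const starts: number[] = [];
  for (let k = 0; k < count; k++) {
    const buffer = ctx.createBuffer(1, frames, rate);
    const data = buffer.getChannelData(0);
    for (let i = 0; i < frames; i++) {
      const n = k * frames + i; // global frame index
      data[i] = Math.sin((2 * Math.PI * frequency * n) / rate);
    }
    const src = ctx.createBufferSource();
    src.buffer = buffer;
    src.connect(ctx.destination);
    // Derive each start time from a frame count rather than summing
    // durations, so rounding error cannot accumulate.
    const when = t0 + (k * frames) / rate;
    src.start(when);
    starts.push(when);
  }
  return starts;
}

// Hypothetical fake context that records buffers and start times.
function makeFakeContext(sampleRate: number) {
  const buffers: Float32Array[] = [];
  const startTimes: number[] = [];
  const ctx: ContextLike = {
    currentTime: 0,
    sampleRate,
    destination: {},
    createBuffer(_channels, length) {
      const data = new Float32Array(length);
      buffers.push(data);
      return { getChannelData: () => data };
    },
    createBufferSource() {
      return {
        buffer: null,
        connect: (d: unknown) => d,
        start: (when = 0) => {
          startTimes.push(when);
        },
      };
    },
  };
  return { ctx, buffers, startTimes };
}

const fake = makeFakeContext(44100);
const starts = scheduleSineBuffers(fake.ctx, 5, 1000, 200);
```

At 44.1 kHz, 1000 frames is not a whole number of 200 Hz cycles. Because phase is tracked by global frame index, the five buffers still concatenate into one continuous sine wave. If a schedule like this glitches, the timing arithmetic is not the likely culprit.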
I would also be remiss not to link the jsmess plea for help.
These are almost certainly solvable problems (the Web Audio API lets you specify when a buffer begins playback, in seconds, at 64-bit floating-point precision), but given that the Web Audio API has been in development for years, you’d think the fundamentals of gapless buffer playback would be solved by now. All the media graph stuff could be optional and layered on top.
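A quick back-of-the-envelope check supports the point about 64-bit precision. The sketch assumes 128-frame buffers at 44.1 kHz over one hour; these parameters are chosen only for illustration. It compares two ways of computing start times in seconds: repeatedly adding one buffer's duration, and deriving each time from an exact integer frame count. It reports the worst error of each method in sample frames.

```typescript
// Compares two ways of computing buffer start times (in seconds) and returns
// the worst error of each, measured in sample frames.
function startTimeDrift(count: number, frames: number, rate: number) {
  const duration = frames / rate;
  let summed = 0;
  let worstSummed = 0;
  let worstDerived = 0;
  for (let k = 0; k < count; k++) {
    const exactFrames = k * frames; // integer, exactly representable in a double
    const derived = exactFrames / rate; // a single rounding per value
    worstSummed = Math.max(worstSummed, Math.abs(summed * rate - exactFrames));
    worstDerived = Math.max(worstDerived, Math.abs(derived * rate - exactFrames));
    summed += duration; // rounding error accumulates with each addition
  }
  return { worstSummed, worstDerived };
}

// Roughly one hour of 128-frame buffers at 44.1 kHz (illustrative parameters).
const drift = startTimeDrift(Math.ceil((3600 * 44100) / 128), 128, 44100);
```

Both errors stay well under half a sample, so a double can place every buffer on the correct frame. Deriving times from frame counts avoids accumulation altogether, which makes it the safer habit.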
(Side note: OpenAL is terrible too. Audio is a stream, not a collection of buffers that you queue.)
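The stream-versus-queue distinction can be sketched as a pull model. The device requests blocks of frames, and a render callback fills each block from the running frame position. `RenderCallback`, `pullFrames`, and `sineSource` are hypothetical names for this illustration, not OpenAL or Web Audio API calls. `pullFrames` stands in for the OS audio device.

```typescript
// Hypothetical pull-model callback: fill `out` with the frames that start at
// `firstFrame`. There is no queue of discrete buffers, only a position.
type RenderCallback = (out: Float32Array, firstFrame: number) => void;

// Hypothetical stand-in for the audio device: requests `blocks` blocks of
// `blockSize` mono frames and concatenates what the callback renders.
function pullFrames(render: RenderCallback, blockSize: number, blocks: number): Float32Array {
  const all = new Float32Array(blockSize * blocks);
  let position = 0;
  for (let b = 0; b < blocks; b++) {
    const block = new Float32Array(blockSize);
    render(block, position);
    all.set(block, position);
    position += blockSize;
  }
  return all;
}

// A sine source expressed as a stream: its only state is the frame position
// it is asked for, so block boundaries cannot introduce gaps or clicks.
function sineSource(frequency: number, sampleRate: number): RenderCallback {
  return (out, firstFrame) => {
    for (let i = 0; i < out.length; i++) {
      out[i] = Math.sin((2 * Math.PI * frequency * (firstFrame + i)) / sampleRate);
    }
  };
}

const small = pullFrames(sineSource(200, 48000), 128, 10);
const large = pullFrames(sineSource(200, 48000), 640, 2);
```

Rendering in ten 128-frame blocks or two 640-frame blocks yields identical samples. The block size is a scheduling detail of the device, not something the application has to queue and stitch together.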