Web Audio API – things I learned the hard way

April 2nd, 2014 · 9 Comments · Technology

Firefox recently dropped support for the Mozilla Audio Data API, so I started porting my two audio-enabled Web apps (Plasticity and Online Tone Generator) to the Web Audio API, which is the W3C-blessed standard way to work with audio in the browser.

In the process, I ran into a few problems, which I thought I’d share here for the benefit of other developers making their first steps with the Web Audio API.

AudioBufferSourceNodes and OscillatorNodes are single-use entities

Suppose we want to generate two 440 Hz tones, each with a duration of 1 second, separated with a 1-second pause. This is not the way to do it:

oscillator = context.createOscillator();
oscillator.frequency.value = 440;
oscillator.connect(context.destination);
currentTime = context.currentTime;
oscillator.start(currentTime);
oscillator.stop(currentTime + 1); //stop after 1 second
oscillator.start(currentTime + 2); //resume after 2 seconds
oscillator.stop(currentTime + 3); //stop again after 3 seconds

What’s wrong? We cannot call .start() on an OscillatorNode or AudioBufferSourceNode more than once. The second call in the above code will result in an error. Both OscillatorNodes (which are used to generate simple tones) and AudioBufferSourceNodes (which are used to play back short samples like sound effects) are meant to be thrown away after each use.

Instead, we should create a separate node for every time we want to play a sound. Every time, we must also connect it to the audio graph:

oscillator = context.createOscillator();
oscillator.frequency.value = 440;
oscillator.connect(context.destination);
currentTime = context.currentTime;
oscillator.start(currentTime);
oscillator.stop(currentTime + 1);

oscillator2 = context.createOscillator(); //create 2nd oscillator
oscillator2.frequency.value = 440;
oscillator2.connect(context.destination);
oscillator2.start(currentTime + 2);
oscillator2.stop(currentTime + 3);

ChannelMergerNode inputs don’t map to channels

Warning: This will be fixed in the next Working Draft of the W3C spec; the below text will eventually become obsolete when browsers implement the new spec.

What do you do when you have two mono sources – like OscillatorNodes, which are always mono, or AudioBufferSourceNodes connected to a mono buffer – and you want to mix them into a stereo signal, for example, play one sample in the left channel, and the other in the right? You use a ChannelMergerNode.

A ChannelMergerNode has a number of inputs, but only one output. It takes the input audio signals and mixes them into a single multichannel signal. Sounds pretty simple, but it’s easy to fall into the trap of assuming that inputs correspond to channels in the output signal. For example, take a look at the following code, which tries to play a tone on the right channel only:

oscillatorR = context.createOscillator();
oscillatorR.frequency.value = 440;
mergerNode = context.createChannelMerger(2);
//create mergerNode with 2 inputs
mergerNode.connect(context.destination);

oscillatorR.connect(mergerNode, 0, 1);
//connect output #0 of the oscillator to input #1 of the mergerNode
//we're leaving input #0 of the mergerNode empty
currentTime = context.currentTime;
oscillatorR.start(currentTime);
oscillatorR.stop(currentTime + 2);

The result of running this code is a tone playing in both channels at the same time. Why? Because inputs of a ChannelMergerNode do not map to channels in the output signal. If an input is not connected, ChannelMergerNode will ignore it. In this case, the first input (numbered 0) is not connected. The only connected input is #1, and it has a mono signal. ChannelMerger merges all the channels on all the connected inputs into a single output. Here, it receives only a single mono signal, so it will output a mono signal, which you will hear coming from both speakers, as you always do with mono sounds.

The right way to have a sound playing only in one channel is to create a “dummy” source node and connect it to the ChannelMergerNode:

context = makeAudioContext();
oscillatorR = context.createOscillator();
oscillatorR.frequency.value = 440;
mergerNode = context.createChannelMerger(2); //create mergerNode with 2 inputs
mergerNode.connect(context.destination);

silence = context.createBufferSource();
silence.connect(mergerNode, 0, 0);
//connect dummy source to input #0 of the mergerNode
oscillatorR.connect(mergerNode, 0, 1);
//connect output #0 of the oscillator to input #1 of the mergerNode
currentTime = context.currentTime;
oscillatorR.start(currentTime);
oscillatorR.stop(currentTime + 2);

You create a silence node by creating an AudioBufferSourceNode, just like you would for any sample, and then not initializing the buffer property. The W3C spec guarantees that this produces a single channel of silence. (As of April 2014, this works in Chrome, but does not work in Firefox 28. In Firefox, the input is ignored and the result is the tone playing on both channels.)

Update: I’m happy to say that my feedback to the W3C Audio Working Group has resulted in changes to the spec. Per the latest Editor’s Draft, ChannelMergerNodes accept up to 6 mono inputs (other types of inputs will be downmixed to mono), with empty inputs treated as silence rather than omitted. Now the changes have to be published in the next Working Draft, and then the browsers have to implement them.

Unused nodes get removed automatically

You might think that if you want to have two sounds playing in different channels – one sound in left, another in right – you don’t need to create dummy nodes. After all, the ChannelMergerNode will have two input channels.

In the code below, we want to play a 440 Hz tone in the left channel for 2 seconds, and a 2400 Hz tone in the right channel for 4 seconds. Both tones start at the same time.

oscillatorL = context.createOscillator();
oscillatorL.frequency.value = 440;
oscillatorR = context.createOscillator();
oscillatorR.frequency.value = 2400;
mergerNode = context.createChannelMerger(2); //create mergerNode with 2 inputs
mergerNode.connect(context.destination);

oscillatorL.connect(mergerNode, 0, 0);
//connect output #0 of the oscillator to input #0 of the mergerNode
oscillatorR.connect(mergerNode, 0, 1);
//connect output #0 of the oscillator to input #1 of the mergerNode
currentTime = context.currentTime;
oscillatorL.start(currentTime);
oscillatorL.stop(currentTime + 2); //stop "left" tone after 2 s
oscillatorR.start(currentTime);
oscillatorR.stop(currentTime + 4); //stop "right" tone after 4 s

This code works as expected for the first 2 seconds – each tone is audible only on one channel. But then the left tone stops playing, and the right tone starts playing on both channels. What’s going on?

When oscillatorL stops playing, it gets disconnected from mergerNode and deleted. The browser is allowed to do this because – as you recall – an OscillatorNode or AudioBufferSourceNode can only be used once, so after we call oscillatorL.stop(), oscillatorL becomes unusable.
The ChannelMergerNode notices that it is left with only one channel of input, and starts outputting a mono signal.

As you can see, the most stable solution, if you want to access individual audio channels, is to always have a dummy node (or several, if you’re dealing with 5.1 or 7.1 audio) connected to your ChannelMergerNode. What’s more, it’s probably best to make sure the dummy nodes remain referenceable for as long as you need them. If you assign them to local variables in a function, and that function returns, the browser may remove those nodes from the audio graph:

function playRight()
{
    var oscillatorR = context.createOscillator();
    oscillatorR.frequency.value = 440;
    var mergerNode = context.createChannelMerger(2);
    mergerNode.connect(context.destination);
    var silence = context.createBufferSource();
    silence.connect(mergerNode, 0, 0);
    oscillatorR.connect(mergerNode, 0, 1);
    currentTime = context.currentTime;
    oscillatorR.start(currentTime);
    oscillatorR.stop(currentTime + 2);    
}

playRight();

Consider what happens at the time playRight() finishes. oscillatorR won’t get removed because it’s playing (scheduled to stop in 2 seconds). But the silence node is not doing anything and when the function exits, it won’t be referenceable, so the browser might decide to get rid of it. This would of course switch the output of the ChannelMergerNode into mono mode.

It’s worth noting that the above code currently works in Chrome, but it might not in the future. The W3C spec gives browsers a lot of leeway when it comes to removing AudioNodes:

An implementation may choose any method to avoid unnecessary resource usage and unbounded memory growth of unused/finished nodes. (source)

In Firefox, You cannot modify an AudioBuffer after you’ve assigned it to AudioBufferSourceNode.buffer

The following code attempts to generate and play a 440 Hz tone over a single channel:

SAMPLE_RATE = 44100;
buffer = context.createBuffer(1, 44100*2, SAMPLE_RATE);
//create a mono 44.1 kHz buffer, 2 seconds length
bufferSource = context.createBufferSource();
bufferSource.buffer = buffer;

soundData = buffer.getChannelData(0);
for (var i = 0; i < soundData.length; i++)
  soundData[i] = Math.sin(2*Math.PI*i*440/SAMPLE_RATE);
bufferSource.connect(context.destination);
bufferSource.start(0);

It works in Chrome, but fails in Firefox (28) without any error message. Why? The moment you assign buffer to bufferSource.buffer, Firefox makes the buffer immutable. Further attempts to write to the buffer are ignored.

This behavior is not covered by the W3C spec (at least I couldn’t find any relevant passage), but here’s a Mozilla dev explaining it on StackOverflow.

Hi! If you find this blog helpful, please support it with a little bit of money. My goal is to keep maintaining it and adding more in-depth content. Unfortunately, this takes a non-trivial amount of time (for example, writing a single article can take hundreds of hours of research), which is a problem because I have to make a living. Donations from awesome users like you buy me time to work on this blog. Thanks! – Tomasz

9 Comments so far

Yuri Plaksyuk (@YuriPlaksyuk) May 9, 2014 at 10:29 am

Comment on “AUDIOBUFFERSOURCENODES AND OSCILLATORNODES ARE SINGLE-USE ENTITIES” section. I have found more elegant solution – use GainNode: OscillatorNode -> GainNode -> DestinationNode.

Start OscillatorNode once, then change gain value from 0.0 to 1.0 and vise versa.

Reply
- Tomasz May 9, 2014 at 12:29 pm
  
  I tried this solution and decided I don’t like having an OscillatorNode that’s constantly playing, whether I need it or not.
  
  Reply
Craig Nov 13, 2014 at 6:02 pm

Only being able to call start() once was driving me nuts. The stop() method here is misleading – perhaps it should be re-named remove() or destroy(). Thanks for this post – very helpful. Here is a stop/stop solution that I came up with: http://codepen.io/deadlygeek/pen/mydevQ/

Reply
Gibran Malheiros Sep 22, 2015 at 6:05 pm

Really nice article Tomasz!
The first case was exactly what I was looking for.

Reply
Pete Nov 4, 2015 at 4:30 pm

I’ve read this article at least a dozen times, learning something vital each time. My merger node works now! All hail the dummy input!

I had some trouble with your use of the term “mono.” To me, mono means a one-channel signal; stereo is a two-channel signal, no matter what the input. By this definition, an oscillator node doesn’t produce a mono signal. It usually produces a stereo signal with both channels being identical. I think the API documentation is confusing on this subject.

Great article — most useful I’ve found so far for the web audio API. Thanks!

Reply
- Tomasz Nov 4, 2015 at 6:11 pm
  
  Hi Pete, thanks for writing! It says here that an OscillatorNode has a single mono output:
  http://webaudio.github.io/web-audio-api/#the-oscillatornode-interface
  However, if you just have a single OscillatorNode, it will be upmixed to stereo because your sound card expects at least 2 channels.
  
  Note that the ChannelMergerNode behavior will change — I’m proud to say I got the W3C to change the spec (YES WE CAN!) and the current Editor’s Draft has a much more sensible definition of ChannelMergerNode:
  http://webaudio.github.io/web-audio-api/#the-channelmergernode-interface
  
  Reply
  - Pete Nov 4, 2015 at 6:57 pm
    
    The up-mixing seems like a bad design to me. I usually work with mono signals, and up-mixing must have a cost in time and memory. If the sound card requires a stereo (2-channel) signal, that should only happen at the destination, not at the first node in the graph (the oscillator, in my case) and propagated through (possibly many) more nodes, unnecessarily. It would be clearer overall if up- and down-mixing occurred only programmatically. If an oscillator is supposed to be mono, it should always output just one channel, and methods should be provided for intentional channel increase/reduction.
    
    I’ll look forward to the new changes — thanks again.
    
    Reply
    - Tomasz Nov 5, 2015 at 11:34 am
      
      Yes, ideally the upmixing should happen at the destination. The oscillator should output a mono signal per the spec. If it doesn’t in your browser, then I’d say it’s a bug.
      
      Reply
Erik Thysell Feb 21, 2024 at 1:09 pm

Super helpful!

Reply