If you are an audiophile who uses a PC as a source in your audio system, you probably know that Windows Vista introduced a brand-new audio engine to replace the much-hated KMixer of Windows XP. In my opinion, there are a few reasons why audiophiles should be happy with this change:
- The new audio stack automatically upconverts all streams to a 32-bit floating-point sample depth (the same that is used in professional studios) and mixes them with the same precision. Because of the headroom that 32-bit floats provide, mixing two sounds that play at the same time no longer clips inside the mixer. There is also no loss of resolution when you lower the volume of a stream (see below).
- The Vista/Win7 audio engine automatically feeds your sound card with the highest-quality output stream that it can handle, which is usually 24 bits per sample. Perhaps you’re wondering why you should care, given that most music uses only 16 bits per sample. Suppose you’re playing a 16-bit song with a digital volume control set to 10%. This corresponds to dividing each sample by 10. Now let’s assume the song contains the following two adjacent samples: 41 and 48. In an ideal world, after the volume control we would get 4.1 and 4.8. However, if the output stream has a 16-bit depth just like the input stream, then both output samples will have to be truncated to 4. There is now no difference between the two samples, which means we have lost some resolution. But if we can have an output stream with 24 bits per sample, for each 16-bit level we get 2^8 = 256 additional (“fractional”) levels, so we can still preserve the difference between the two attenuated samples. In fact, we can have ≈4.1016 and ≈4.8008, which is within 0.04% of the “ideal” samples of 4.1 and 4.8.
- Don’t you hate it when you change the volume in your movie player or instant messaging software and instead of changing its own volume, it changes your system volume? Or have you ever used an application with its own poorly implemented volume control (iTunes, I’m pointing at you!)? Well, these abominations should now be behind us. In Vista and Win7, each application gets its own audio stream (or streams) and a separate high-quality volume control, so there should no longer be any reason for application vendors to mess with the system volume or roll their own and botch the job.
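The attenuation arithmetic from the second bullet can be checked with a few lines of Python. This is only a sketch of the numbers in the example, not of how the Windows engine is implemented. The function names are made up for illustration, and I am assuming truncation for the 16-bit case (as described above) and round-to-nearest for the 24-bit case (which is what reproduces the 4.1016 and 4.8008 figures).

```python
import math

EXTRA_BITS = 24 - 16               # extra bits in a 24-bit output stream
STEPS = 1 << EXTRA_BITS            # 2^8 = 256 fractional levels per 16-bit level


def attenuate_16bit(sample: int, divisor: int) -> int:
    """Divide a 16-bit sample and truncate back to a whole 16-bit level."""
    return math.trunc(sample / divisor)


def attenuate_24bit(sample: int, divisor: int) -> float:
    """Divide a 16-bit sample and round to the nearest 24-bit level.

    The result is expressed in 16-bit units so that it can be compared
    directly with the ideal value.
    """
    return round(sample / divisor * STEPS) / STEPS


for sample in (41, 48):
    ideal = sample / 10
    out16 = attenuate_16bit(sample, 10)
    out24 = attenuate_24bit(sample, 10)
    error = abs(out24 - ideal) / ideal
    print(f"{sample}: ideal={ideal}, 16-bit={out16}, 24-bit={out24}, error={error:.4%}")
```

Both samples collapse to 4 in the 16-bit case, while the 24-bit results stay distinct and close to the ideal values.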
So Windows Vista and Windows 7 upconvert all your samples to 32-bit floats and mix them with 32-bit precision into an output stream that, by default, has the highest bit depth that your hardware can handle. The output bit depth is customizable; you can change it in the properties of your audio device. If you change it to, say, 16 bits, the audio engine will still use 32-bit floats for internal processing; it will just downconvert the resulting stream to 16 bits before sending it to your device.
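To make that pipeline a bit more concrete, here is a toy model of it in Python: convert 16-bit samples to 32-bit floats, apply a per-stream volume, sum, and only then convert to the output depth. All names here are hypothetical, and the choice to round and clip at the final conversion is my own assumption for the sketch; I don't know exactly what the real engine does at that stage.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def int16_to_float(sample: int) -> float:
    """Map a signed 16-bit sample to the range [-1.0, 1.0)."""
    return to_f32(sample / 32768.0)


def mix(samples, gains) -> float:
    """Apply a gain to each stream's sample and sum them in 32-bit floats."""
    total = 0.0
    for sample, gain in zip(samples, gains):
        total = to_f32(total + to_f32(int16_to_float(sample) * gain))
    return total


def float_to_int(x: float, bits: int) -> int:
    """Convert a float sample to a signed integer of the given bit depth.

    Assumption: round to nearest and clip, at the output stage only.
    """
    full_scale = 1 << (bits - 1)
    value = round(x * full_scale)
    return max(-full_scale, min(full_scale - 1, value))


loud = [30000, 30000]              # two streams, both near 16-bit full scale

# The float sum exceeds 1.0 but is still represented without distortion.
print(mix(loud, [1.0, 1.0]))

# With each stream at 50% volume, the mix fits into a 24-bit output stream.
print(float_to_int(mix(loud, [0.5, 0.5]), 24))
```

The point of the sketch is that the sum of the two loud samples survives intact inside the mixer, so lowering the volumes afterwards gives a clean result. An integer mixer working at 16 bits would have had to clip or wrap the sum straight away.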
Now, what about the sample rate? You can set the output sample rate in the audio device properties window, but is there also some internal sample rate that the Windows audio engine uses regardless of your setting? For example, does it upsample your 44.1 kHz songs to 96 or 128 kHz? Unlike the upconversion from 16-bit integers to 32-bit floats (which should be completely lossless), this could potentially introduce some distortion, because going from 44.1 kHz to 96 or 128 kHz requires at least some interpolation.
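Both halves of that claim are easy to check with a small Python sketch. The first part tests every possible 16-bit value to confirm that a round trip through a 32-bit float returns the original sample. The second part looks at the 44.1 kHz to 96 kHz case and counts how many output samples line up exactly with an input sample. The scaling by 32768 and the helper name are my own assumptions, not something taken from the Windows engine.

```python
import struct
from fractions import Fraction


def to_f32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Part 1: int16 -> float32 -> int16 for every possible 16-bit sample.
lossless = all(
    round(to_f32(s / 32768.0) * 32768.0) == s
    for s in range(-32768, 32768)
)
print("16-bit to float round trip lossless:", lossless)

# Part 2: resampling ratio for 44.1 kHz -> 96 kHz, reduced to lowest terms.
ratio = Fraction(96000, 44100)
print("resampling ratio:", ratio)

# Output sample n sits at input position n / ratio. Over one full cycle of
# the ratio, count the output samples that coincide with an input sample.
outputs_per_cycle = ratio.numerator
on_grid = sum(
    1 for n in range(outputs_per_cycle)
    if (Fraction(n) / ratio).denominator == 1
)
print(f"{on_grid} of {outputs_per_cycle} output samples land on an input sample")
```

Every other output sample falls between two input samples, so its value has to be computed by some kind of interpolation filter, and that is where distortion could creep in.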
I couldn’t find the answer to this question anywhere, so I wrote to Larry Osterman, who developed the Vista and Win7 audio stacks at Microsoft. His answer was that the sample rate that the engine uses is the one that the user specifies in the Properties window. The default sample rate is chosen by the audio driver (44.1 kHz on most devices). So if your music has a sample rate of 44.1 kHz, you can choose that setting and no sample rate conversion will take place. (Of course, any audio with a sample rate of 48 kHz or higher will then be downsampled to 44.1 kHz.)
There is some interesting technical information on the Windows Vista audio stack in this Channel9 video.