Things I’ve learned, published for the public benefit

An audiophile’s look at the audio stack in Windows Vista and 7

If you are an audiophile who uses a PC as a source in your audio system, you’re probably aware of the fact that Windows Vista introduced a brand-new audio engine to replace the much hated KMixer of Windows XP. In my opinion, there are a few reasons why audiophiles should be happy with this change:

  • The new audio stack automatically upconverts all streams to a 32-bit floating-point sample depth (the same that is used in professional studios) and mixes them with the same precision. Because of the amount of headroom that comes with using 32-bit floats, the mixer itself no longer clips when two loud sounds play at the same time. There is also no loss of resolution when you lower the volume of a stream (see below).
  • The Vista/Win7 audio engine automatically feeds your sound card with the highest-quality output stream that it can handle, which is usually 24 bits per sample. Perhaps you’re wondering why you should care, given that most music uses only 16 bits per sample. Suppose you’re playing a 16-bit song with a digital volume control set to 10%. This corresponds to dividing each sample by 10. Now let’s assume the song contains the following two adjacent samples: 41 and 48. In an ideal world, after the volume control we would get 4.1 and 4.8. However, if the output stream has a 16-bit depth just like the input stream, then both output samples will have to be truncated to 4. There is now no difference between the two samples, which means we have lost some resolution. But if we can have an output stream with 24 bits per sample, for each 16-bit level we get 2^8 = 256 additional (“fractional”) levels, so we can still preserve the difference between the two attenuated samples. In fact, we can have ≈4.1016 and ≈4.8008, which is within 0.04% of the “ideal” samples of 4.1 and 4.8.
  • Don’t you hate it when you change the volume in your movie player or instant messaging software and instead of changing its own volume, it changes your system volume? Or have you ever used an application with its own poorly implemented volume control (iTunes, I’m pointing at you!)? Well, these abominations should now be behind us. In Vista and Win7, each application gets its own audio stream (or streams) and a separate high-quality volume control, so there should no longer be any reason for application vendors to mess with the system volume or roll their own and botch the job.
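
The 41/48 example from the second point can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, not of what the Windows engine actually does internally; the function name `attenuate` and its `quantize` parameter are made up for illustration. Truncation is used for the 16-bit case and rounding to the nearest level for the 24-bit case, to match the figures quoted above.

```python
import math

def attenuate(sample_16bit, volume_percent, out_bits, quantize=round):
    """Apply a digital volume control and quantize to the output depth.

    The result is expressed in 16-bit units, so outputs at different
    bit depths can be compared directly. `quantize` is a hypothetical
    knob: math.floor truncates, round picks the nearest output level.
    """
    levels_per_step = 1 << (out_bits - 16)   # 1 at 16 bits, 256 at 24 bits
    exact = sample_16bit * levels_per_step * volume_percent / 100
    return quantize(exact) / levels_per_step

for sample in (41, 48):
    coarse = attenuate(sample, 10, 16, quantize=math.floor)
    fine = attenuate(sample, 10, 24)
    print(sample, coarse, fine)
```

With a 16-bit output both samples collapse to 4.0, while the 24-bit output keeps them apart at 4.1015625 and 4.80078125.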

So Windows Vista and Windows 7 upconvert all your samples to 32-bit floats and mix them with 32-bit precision into an output stream that, by default, has the highest bit depth that your hardware can handle. The output bit depth is customizable; you can change it in the properties of your audio device. If you change it e.g. to 16 bits, the audio engine will still use 32-bit floats for internal processing — it will just downconvert the resulting stream to 16 bits before sending it to your device.
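
To make the pipeline concrete, here is a rough sketch of "convert to float, mix, convert to the output depth". It rests on several assumptions: all helper names are hypothetical, the final stage is a plain hard clamp rather than whatever limiting the real engine applies, and Python floats are 64-bit, so this illustrates the idea of headroom rather than exact 32-bit behaviour.

```python
def to_float(sample_16bit):
    """Map a 16-bit integer sample to the range [-1.0, 1.0)."""
    return sample_16bit / 32768.0

def mix(streams):
    """Sum the streams sample by sample; the float sum may exceed 1.0."""
    return [sum(frame) for frame in zip(*streams)]

def to_int(sample_float, bits):
    """Quantize to a signed integer of the given depth (hard clamp, an assumption)."""
    full_scale = 1 << (bits - 1)
    value = round(sample_float * full_scale)
    return max(-full_scale, min(full_scale - 1, value))

def wrap16(value):
    """What a naive 16-bit integer sum would store: it wraps around."""
    return (value + 32768) % 65536 - 32768

loud_a = [30000]
loud_b = [30000]

mixed = mix([[to_float(s) for s in loud_a], [to_float(s) for s in loud_b]])
print(mixed[0])                    # above 1.0, but still stored intact
print(wrap16(30000 + 30000))       # a 16-bit integer mixer would wrap instead
print(to_int(mixed[0] * 0.5, 16))  # turn the volume down first, then convert
```

The point of the design is that an over-range sum survives inside the mixer, so a later volume reduction (or limiter) can still bring it back into range before the conversion to the device format.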

Now, what about the sample rate? You can set the output sample rate in the audio device properties window, but is there also some internal sample rate that the Windows audio engine uses regardless of your setting? For example, does it upsample your 44.1 kHz songs to 96 or 128 kHz? Unlike the upconverting from 16-bit integers to 32-bit floats (which should be completely lossless), this could potentially introduce some distortion as going from 44.1 kHz to 96 or 128 kHz requires at least some interpolation.
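
The claim that going from 16-bit integers to 32-bit floats is lossless is easy to check exhaustively. The sketch below assumes the common convention of dividing by 32768 to map samples into [-1.0, 1.0); the standard `struct` module is used to force each value through real 32-bit float storage, since Python's own floats are 64-bit.

```python
import struct

def to_float32(sample_16bit):
    """Scale to [-1.0, 1.0) and round-trip through 32-bit float storage."""
    value = sample_16bit / 32768.0
    return struct.unpack("<f", struct.pack("<f", value))[0]

def to_int16(sample_float):
    """Undo the scaling; the product is an exact integer for these inputs."""
    return int(sample_float * 32768.0)

# Try every possible 16-bit sample value.
lossless = all(to_int16(to_float32(s)) == s for s in range(-32768, 32768))
print(lossless)
```

Every value survives because a 32-bit float has a 24-bit significand, which is more than enough to hold a 16-bit integer exactly.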

I couldn’t find the answer to this question anywhere, so I wrote to Larry Osterman, who developed the Vista and Win7 audio stacks at Microsoft. His answer was that the sample rate that the engine uses is the one that the user specifies in the Properties window. The default sample rate is chosen by the audio driver (44.1 kHz on most devices). So if your music has a sample rate of 44.1 kHz, you can choose that setting and no sample rate conversion will take place. (Of course, any 48 kHz and higher samples will then be downsampled to 44.1 kHz.)
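
As an illustration of why matching rates matter, here is a deliberately naive resampler. It is an assumption-laden sketch: linear interpolation is about the simplest possible method, real converters use band-limited filters, and nothing here claims to be the algorithm Windows uses. The function name `resample_linear` is made up.

```python
from fractions import Fraction

def resample_linear(samples, src_rate, dst_rate):
    """Resample by linear interpolation; pass through when rates match."""
    if src_rate == dst_rate:
        return list(samples)               # nothing to convert
    step = Fraction(src_rate, dst_rate)    # input samples per output sample
    out = []
    pos = Fraction(0)
    while pos <= len(samples) - 1:
        i = int(pos)                       # input sample at or before pos
        frac = float(pos - i)              # how far we are towards the next one
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(samples[i] * (1 - frac) + nxt * frac)
        pos += step
    return out

print(Fraction(48000, 44100))              # 160/147: not a whole-number ratio
ramp = list(range(161))
print(len(resample_linear(ramp, 44100, 44100)), len(resample_linear(ramp, 48000, 44100)))
```

Because 48000/44100 reduces to 160/147, almost every output sample falls between two input samples and has to be estimated, which is where conversion artifacts can creep in. When the stream rate equals the device rate, the samples pass through untouched.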

There is some interesting technical information on the Windows Vista audio stack in this Channel9 video.

28 Comments so far

  • Mandeep Baidwan

    Thanks for the roundup of information. This answered both of my questions regarding sample rate conversion and bit depth in Win7 :).

  • Bing Crosby

    Thanks for this article!

    A quick question. Is the volume control not done in 32 bit floating point prior to forming the output stream? So if I have 24 bit song with a 24 bit output stream to my 24 bit sound card at say 10% volume, Windows will have already truncated the 32 bit upsample so I won’t have lost any resolution? So in the 16 bit song case you actually had 16 spare bits and not 8?

    Many Thanks!

    • Tomasz

      Not sure I understand your question. If your input stream is 24-bit and your output stream is 24-bit, then you are going to lose resolution with a digital volume control set to 10% (or anything below 100%). The fact that the volume control is 32-bit float doesn’t matter because the floating-point stream will get converted to 24-bit integer right before it is sent to your sound card.

  • Bing Crosby

    Thanks for your prompt reply.

    Ok, so say I have the 32 bit Windows floating point audio stack, a 24 bit 96 kHz file to play, and my soundcard accepts a maximum output stream (via WASAPI) of 24 bit. So I will lose resolution using the digital volume control? Any way to calculate to what bit depth I will be reduced for around 10 % volume? But I’m not likely to lose anything for 16 bit?

    I’m just interested because I have a sound card with a high quality headphone amp in it (I also have an outboard one but I don’t want to use it). I was wondering about this issue because I couldn’t really distinguish between a 24 bit and 16 bit file with around 8 % volume. However, if the digital volume control doesn’t lose too much resolution for the 24 bit flac (5-15 % volume is around where I listen), I’ll still use the digital one as the outboard analogue head amp probably only has an SNR ~ 100 dB.

    Thanks for your help! Hope this clarifies things a bit!

    Thanks!

    • Tomasz

      At 6.25% volume, you are throwing away 4 bits (the number of possible levels is divided by 16). At 12.5% volume, you are throwing away 3 bits (the number of possible levels is divided by 8).

      Another way of looking at it is that a 24-bit sound card accepts 16777216 possible values for each sample. If you listen at 10% volume, you are only using the lower 10% of possible values (1677721 possible values). Sounds like a big loss, but a 16-bit sample has “only” 65536 possible values and sounds pretty damn good.

      If you cannot distinguish between 24-bit and 16-bit at 8% volume, that means you cannot distinguish between 16-bit sound and (about) 20-bit sound. That makes it extremely unlikely that you could tell the difference between 16-bit and 24-bit (each additional bit makes less of an audible difference). In fact, I’m not sure you or I could tell the difference between 16 and 15 bits. As you can probably tell, I am not an enthusiast of 24-bit recordings.
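
The figures in this reply follow from one formula: a linear volume setting of v (as a fraction of full scale) discards log2(1/v) bits. A small sketch, with hypothetical function names:

```python
import math

def bits_lost(volume_percent):
    """Bits of resolution discarded by a linear digital volume control."""
    return math.log2(100 / volume_percent)

def effective_bits(output_bits, volume_percent):
    """Resolution left at the output after attenuation."""
    return output_bits - bits_lost(volume_percent)

for volume in (6.25, 8, 10, 12.5):
    print(volume, round(bits_lost(volume), 2), round(effective_bits(24, volume), 2))
```

At 6.25% and 12.5% this gives exactly 4 and 3 bits lost, and at 8% a 24-bit output is left with a little over 20 bits, which is where the "(about) 20-bit" figure comes from.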

  • Grr

    The sample rate conversion they do use, though, is really terrible. Try playing a sine wave at 48 kHz when your sound card’s properties are set to 44.1 kHz. (Sound –> Playback –> Speakers Properties –> Advanced –> Default Format) The distortion is clear as day, and I’m not even a crazy audiophile.

  • Greg Sullivan

    I agree that the sample rate conversion is not very good. In Windows XP, there was a setting to control the quality of the conversion. I can’t find an equivalent setting in Windows 7. (anyone?) In XP, and on my particular system, it’s available at:
    Control Panel | Sounds and Audio Devices Properties | Audio | Sound Playback | Advanced | Performance | Sample Rate Conversion Quality

    Does it depend on the specific audio interface & driver?

    I noticed the poor conversion quality immediately when I played a YouTube clip in Windows 7. When I changed the sample rate of the audio interface from 48kHz to 44.1kHz, the problem went away.

    Greg.

  • Greg Sullivan

    I found this YouTube clip of a 40Hz to 100Hz test tone:
    http://www.youtube.com/watch?v=KGlCdZ0j2FQ

    With my interface set to 48kHz, the aliasing is very noticeable.

    I tried a test with a program that can generate test tones, and has multiple audio driver options. It supports ASIO, DirectX, and “MMSys WAV”.
    When using MMS WAV I hear aliasing, but when I use ASIO or DirectX, I do NOT hear aliasing. The program is “Multistream ASIO Player”: http://www.tropicalcoder.com/MStreamPlayer.htm I will contact the author to see whether he can shed any light on this problem.

    It’s conceivable that some apps, if they determine that they do not have exclusive access to the audio interface, and thus cannot change its sample rate to match the content, will perform their own sample rate conversion.

    Note that I am doing this testing on a netbook running Windows 7 Starter Edition. I wonder whether Windows automatically improves the sample rate conversion if it detects a faster processor and/or a higher grade of Windows?
    Greg.

    • Tomasz

      I generated a 40 Hz tone in Cool Edit Pro, both in 44.1 kHz and 48 kHz. Whenever the sample rate doesn’t match the Windows sample rate, I hear the same high-frequency distortion artifact.

      Here are the results of playing a 44.1 kHz WAV file (40 Hz tone) in various apps with the Windows sample rate set to 48 kHz:

      Cool Edit Pro 2.0 – distortion
      Windows Media Player – no distortion
      Foobar2000 – no distortion
      iTunes 7 – no distortion
      Winamp (with Directsound output) – no distortion
      Winamp (WaveOut output) – distortion

      In short, Directsound output upsamples correctly. WaveOut produces distortion.
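
For anyone who wants to repeat this test without an audio editor, a 40 Hz tone can be generated at both rates using only the Python standard library. This is a sketch under assumptions: the function name, file names, duration and amplitude are arbitrary choices, and the output is plain 16-bit mono PCM.

```python
import math
import struct
import wave

def write_tone(path, sample_rate, freq_hz=40.0, seconds=5.0, amplitude=0.5):
    """Write a mono 16-bit PCM sine tone to `path`; return the frame count."""
    n_frames = int(sample_rate * seconds)
    frames = bytearray()
    for n in range(n_frames):
        value = amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
        frames += struct.pack("<h", int(round(value * 32767)))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return n_frames

if __name__ == "__main__":
    # One file per rate; play each with the device set to the other rate.
    write_tone("tone_44100.wav", 44100)
    write_tone("tone_48000.wav", 48000)
```

Playing the file whose rate does not match the device setting forces a sample rate conversion, so any artifacts introduced by the playback path should stand out against the clean low tone.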

      • Greg Sullivan

        Thanks!

        On Windows XP, it seems that even WaveOut works ok.

        Now, is there any way to configure YouTube to use Directsound on Windows 7?

        Greg.

        • Greg Sullivan

          Tomasz,
          I did some testing with Winamp. I verified that it behaves the same as yours on Windows 7. (no distortion with Directsound, distortion with WaveOut).

          I then tested it on Windows XP. I first opened another Directsound application, to lock the hardware at 48kHz. I then opened Winamp, and played a 44.1kHz test tone, using WaveOut. There was no distortion. I used the USB interface so I could monitor the actual sample rate of the hardware – it stayed on 48kHz the entire time.

          So, as I suspected, WaveOut appears to perform good quality sample rate conversion on Windows XP.

          Greg.

  • Greg Sullivan

    I’ve reproduced the problem with a high quality USB audio interface. (the M-Audio Fast Track Ultra). The nice thing about this interface is that it has its own control panel, so I can be sure of the actual sample rate being used at any given time.

    Windows 7 honours whatever sample rate I set in the M-Audio control panel – all formats available in Windows are restricted to the M-Audio sample rate. So, again, when I play back 44.1kHz material when the M-Audio interface is set to 48kHz, I hear aliasing, either in YouTube, or when using the MMSys WAV driver in the Multistream ASIO player.

    The standard Windows Media Player does NOT produce aliasing, so that’s encouraging. I also tried the Media Player Classic player, and it too worked fine, no matter what output renderer I chose.

    I’ve also done some testing on Windows XP. Before I did the tests, I set the sample rate conversion quality to the LOWEST setting. In a nutshell, I am unable to get XP to produce aliasing. If only one application is open, Windows will set the audio interface to the same sample rate as the content.
    If more than one application is open simultaneously, XP seems to leave the hardware configured to the rate that the first application set it to, and then, presumably, perform sample rate conversion. (I made sure that the second application was not simply outputting at the rate of the first application, by comparing pitches of test tones) Even when I open the Multistream player (which seems to use a sample rate of 44.1kHz for the test tone generator) when another app is holding the hardware rate at 48kHz, and then use MMSys WAV, the result is STILL clean! (it is clean when using DirectX as well) So, Windows XP seems to be behaving a lot better than Windows 7!

    As an interesting aside, I am unable to get the VLC player to play a test tone cleanly, regardless of all the settings, but ONLY on Windows 7. On XP, it works fine. (?!)

    Greg.

  • Greg Sullivan

    I have cross posted about the sample rate conversion quality issue in a few places, one of them being the Microsoft Pro Audio Developer’s Forum:

    http://social.msdn.microsoft.com/Forums/en/windowspro-audiodevelopment/thread/725546ce-57bf-40d0-b7aa-47e51de9c3ae

    Greg.

  • G0ukI

    My post is a little late in the day, but I stumbled across this article and simply had to throw in my thoughts as an audiophile sound engineer with a lot of time on his hands!
    I currently have my onboard soundcard set to 96khz 24bit, using Media Player Classic FFDshow Audio Decoder to resample audio to 96khz libavcodec highest quality, with the output format for uncompressed or decoded streams set to 32bit floating point and nothing else. The upshot of this is that decent quality mp3’s now sound more ‘airy’ and the bottom end more natural and deep – they no longer sound like crunchy mp3’s! That, and it’s also possible to select which applications then use these settings in Directshow control, so it’s possible to apply these settings to Youtube videos, etc.
    Technically I know I’m out of my depth regarding this article, however, I have spent extensive time testing different configurations to achieve the highest possible audio quality in Windows 7 with minimum perceived artifacts using this player and am more than satisfied with the results.
    It took quite a while and a lot of testing to work out how to configure Media Player Classic to achieve this, but now that it’s working the way it should, it sounds like I have a far superior soundcard! 🙂

  • Robert Halvarsson

    Apparently Microsoft is aware of the problem, yet I had hopes that this would be fixed by now. I am personally affected, and even though I can remedy the problem by decreasing the khz manually, I know a lot of people who don’t…

  • xpclient

    The fix is produced for Windows 7 SP1 but not for Vista. 🙁

  • DJ Zath

    I have a way to “lock” the internal bit/sample rates in the WDM engine (for XP) by fixing specific binary strings in the registry and then locking the keys in place, which greatly improves its audio pipeline. It is much like what many here do manually by loading a file set to a given bit/sample rate and having Windows play it right off the bat (like playing the start.wav @ 48KHz, for example). I use this method to play my radio station feeds @ 24bit with outstanding dynamic range. Unfortunately, this trick does NOT work in Windows 7! With Windows 7, I still find that the audio is distorted and aliased in comparison to the XP machine running my registry modifications.

  • Paolo Rodriguez

    Have they improved it further in Windows 8?

  • hoppesbrain

    Thanks for this! As you may have noticed, there are some improvements to the audio stack in Windows 10!

    I use an M-Audio 192, and since installing Windows 10, I can set the sound mixer to output to the sound card in 96K 32-bit. It would only do 24-bit before. I know, it’s unlikely I’ll hear a difference, but I like that it’s doing everything in 32-bit floating point, and leaving it to the sound card for conversion down to the 24-bit DAC. (The M-Audio actually processes internally at 36-bit before downsampling to 24.)

    Too bad I can’t set it to 192K, but I’ll take it.

    Some discussion and interesting links here…
    https://www.reddit.com/r/audioengineering/comments/33sr2c/microsoft_seminar_on_windows_10_audio_stack/

  • Lucas

    I have Windows 7 32-bit and I noticed that the default audio driver is way better than the crappy IDT High Definition one, even with that crappy equalizer panel. I made a good decision in uninstalling the IDT audio bullshit. Now I’m happy with the quality of the sound, and I can even mess with the sample rate and hear the difference!

  • xp user

    Sorry, but I have two operating systems, XP and Vista, on the same PC, with drivers of the same date, the same software and the same properties, and Vista sounds much worse, even with the same ASIO output on a USB hi-fi card. Sorry for this, but I always listen to music on XP.

  • Matt Bentley

    Almost everything about this article and the comments is incorrect.
    32-bit doesn’t mean no clipping.
    AFAIK the system does not automatically feed the soundcard with 24-bit audio, it has to be specifically supported by the app.
    The understanding of audio, binary math and PCM is pretty bad here – even I don’t entirely understand PCM, but I know it’s not just ‘steps based on a range of values between 0 and 65535’ for 16-bit audio.

    • Teeluck

      1. The system, after processing the audio, feeds the final audio samples to a limiter, which will limit all clipping samples in a proper way to prevent excessive audio distortion.
      2. The system feeds the audio device with what was selected by the user in the device advanced settings. The final data sent is a dithered (and limited as described before) version as selected. My Realtek codec is as default set for 24bit 48khz.

    • Tomasz P. Szynalski

      The floats (whether 32-bit or other) allow no clipping when mixing two sounds, because they have enormous range. You can add two loud sounds and get something that’s over the maximum possible sample value. If the system ran on 16-bit integers, you’d be unable to store the sum of these sounds – you’d exceed the integer range. Of course later you have to limit it.
      About PCM, of course it’s steps (linear steps in computer PCM). It’s not that hard to find out — all you need to do is search for “PCM” in Wikipedia and read the first two sentences.
