
The Hidden Shadow

The flower delivery van had been parked across the street for far too long. Cahey peered outside through the window blinds for the third time. By now he was certain they had him under surveillance. He had been careful not to discuss the subject matter of his current project with anyone, but there were a few souls at the Tribune who knew he was working on a major investigative piece. Apparently that was enough to pique the government’s interest.

Cahey lit a cigarette and reflected on the van’s relatively conspicuous location. Sloppy surveillance work or a deliberate attempt to scare him into silence? There was no way to know. He was, however, sure of one thing: if they came here, they would find nothing. Knowing that digital content was much easier to protect from prying eyes than papers, photographs and recordings, he had disposed of every physical record of his investigation, leaving only a digitized copy on the hard drive of his laptop computer. Two days ago, he had encrypted all this data using an open-source application called TrueCrypt, making sure to overwrite the original files several times before deletion. Now his data was unrecoverable without the password, and there was nothing anybody could do about it, not even the NSA with their army of PhDs and their supercomputers. The spooks would be in for a surprise.

“Drrrrrt” — the sound of the doorbell pierced the smoke-infused air. Cahey glanced through the window. The van was gone. As he walked towards the door, he contemplated logging out of his Windows account, but decided against it. Bypassing that layer of security would be a trivial exercise, and it wouldn’t do the government much good anyway, given the fact that everything of interest was now encrypted. He opened the door. On his porch stood five serious-looking men in suits. “Stephen Cahey? We have a warrant to search the premises.”

———-

Agent Jack Trallis looked at the machine he had been ordered to process. It was a pretty standard Dell laptop with a dual-core CPU and a 15-inch screen that was covered with fingerprints. “God, do I hate those glossy displays”, he muttered to himself. He was alone in the room; the other agents were in the living room questioning the suspect. Trallis noticed the prominent TrueCrypt icon on the machine’s desktop. “Uh oh. Strong encryption.” He fixed his eyes on the taskbar at the bottom of the screen. There was a row of oversized, unlabeled icons that reminded him of the Hackintosh he had once built for his girlfriend. The guy’s laptop was running Windows 7. There was still a chance.

He located the Documents folder, opened its Properties window, and clicked on the “Previous Versions” tab. Just as he thought, there were five previous versions of the folder – “shadow copies” created regularly by the operating system as part of the System Restore mechanism. As these snapshots were prepared silently in the background and stored on a hidden disk volume, few users were aware of them. Agent Trallis was smiling. The good guys from Redmond were going to make his job easy again.

He selected one of the snapshots and clicked Open. An Explorer window popped up, showing the contents of the Documents folder exactly as it had appeared three days ago. “This is too funny”, he thought. There was a subfolder labeled Project Foxhunt full of scanned documents and audio files. Trallis grabbed his radio. “Sir”, he called out to his commanding officer, “I’ve got something you might want to have a look at.”

For technical information on Volume Shadow Copy, read “What you should know about Volume Shadow Copy/System Restore in Windows 7 & Vista”.

An audiophile’s look at the audio stack in Windows Vista and 7

If you are an audiophile who uses a PC as a source in your audio system, you’re probably aware of the fact that Windows Vista introduced a brand-new audio engine to replace the much-hated KMixer of Windows XP. In my opinion, there are a few reasons why audiophiles should be happy with this change:

  • The new audio stack automatically upconverts all streams to a 32-bit floating-point sample depth (the same that is used in professional studios) and mixes them with the same precision. Because of the amount of headroom that comes with using 32-bit floats, there is no more clipping when playing two samples at the same time. There is also no loss of resolution when you lower the volume of a stream (see below).
  • The Vista/Win7 audio engine automatically feeds your sound card with the highest-quality output stream that it can handle, which is usually 24 bits per sample. Perhaps you’re wondering why you should care, given that most music uses only 16 bits per sample. Suppose you’re playing a 16-bit song with a digital volume control set to 10%. This corresponds to dividing each sample by 10. Now let’s assume the song contains the following two adjacent samples: 41 and 48. In an ideal world, after the volume control we would get 4.1 and 4.8. However, if the output stream has a 16-bit depth just like the input stream, then both output samples will have to be truncated to 4. There is now no difference between the two samples, which means we have lost some resolution. But if we can have an output stream with 24 bits per sample, for each 16-bit level we get 2^8 = 256 additional (“fractional”) levels, so we can still preserve the difference between the two attenuated samples. In fact, we can have ≈4.1016 and ≈4.8008, which is within 0.04% of the “ideal” samples of 4.1 and 4.8.
  • Don’t you hate it when you change the volume in your movie player or instant messaging software and instead of changing its own volume, it changes your system volume? Or have you ever used an application with its own poorly implemented volume control (iTunes, I’m pointing at you!)? Well, these abominations should now be behind us. In Vista and Win7, each application gets its own audio stream (or streams) and a separate high-quality volume control, so there should no longer be any reason for application vendors to mess with the system volume or roll their own and botch the job.

So Windows Vista and Windows 7 upconvert all your samples to 32-bit floats and mix them with 32-bit precision into an output stream that, by default, has the highest bit depth that your hardware can handle. The output bit depth is customizable; you can change it in the properties of your audio device. If you change it e.g. to 16 bits, the audio engine will still use 32-bit floats for internal processing — it will just downconvert the resulting stream to 16 bits before sending it to your device.
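
The 10% volume example from the list above can be reproduced in a few lines of Python. This is only a sketch of the arithmetic, not of how the Windows audio engine is actually implemented: the function name `attenuate` and its parameters are my own, and I assume truncation for the 16-bit case (as in the example) and rounding to the nearest level for the 24-bit case (which is what yields the ≈4.1016 and ≈4.8008 figures).

```python
from fractions import Fraction

def attenuate(sample, volume, extra_bits, rounding=False):
    """Scale a 16-bit sample by `volume` and quantize it to an output grid
    with `extra_bits` more bits of resolution than the input.
    Hypothetical helper for illustration only."""
    steps = 1 << extra_bits                 # fractional levels per 16-bit level
    exact = Fraction(sample) * volume * steps
    # int() truncates toward zero; round() picks the nearest level
    level = round(exact) if rounding else int(exact)
    return Fraction(level, steps)

volume = Fraction(1, 10)                    # volume control at 10%

for sample in (41, 48):
    out16 = attenuate(sample, volume, 0)                  # 16-bit output, truncated
    out24 = attenuate(sample, volume, 8, rounding=True)   # 24-bit output, rounded
    print(sample, float(out16), float(out24))
# prints:
# 41 4.0 4.1015625
# 48 4.0 4.80078125
```

Exact fractions are used instead of floats so that the quantization step is the only source of error in the result.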

Now, what about the sample rate? You can set the output sample rate in the audio device properties window, but is there also some internal sample rate that the Windows audio engine uses regardless of your setting? For example, does it upsample your 44.1 kHz songs to 96 or 128 kHz? Unlike the upconverting from 16-bit integers to 32-bit floats (which should be completely lossless), this could potentially introduce some distortion as going from 44.1 kHz to 96 or 128 kHz requires at least some interpolation.

I couldn’t find the answer to this question anywhere, so I wrote to Larry Osterman, who developed the Vista and Win7 audio stacks at Microsoft. His answer was that the sample rate that the engine uses is the one that the user specifies in the Properties window. The default sample rate is chosen by the audio driver (44.1 kHz on most devices). So if your music has a sample rate of 44.1 kHz, you can choose that setting and no sample rate conversion will take place. (Of course, any 48 kHz and higher samples will then be downsampled to 44.1 kHz.)
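
Two of the claims above are easy to check numerically: that going from 16-bit integers to 32-bit floats loses nothing, and that going from 44.1 kHz to 96 kHz cannot be done by simply repeating or inserting whole samples. The sketch below assumes that samples are scaled by 1/32768 when converted to floats; that scaling is a common convention and my assumption, not something confirmed for the Windows engine, and the helper name is made up.

```python
import struct
from fractions import Fraction

def survives_float32(sample):
    """Scale a 16-bit integer sample to [-1.0, 1.0), store it as a 32-bit
    float, and check that it converts back to the same integer.
    Hypothetical helper; the 1/32768 scaling is an assumption."""
    as_float = struct.unpack("<f", struct.pack("<f", sample / 32768.0))[0]
    return int(as_float * 32768.0) == sample

# Every possible 16-bit value makes the round trip unchanged.
lossless = all(survives_float32(s) for s in range(-32768, 32768))
print(lossless)            # True

# The resampling ratio from 44.1 kHz to 96 kHz is not a whole number,
# so most output samples fall between input samples and must be interpolated.
ratio_96 = Fraction(96000, 44100)
print(ratio_96)            # 320/147
```

A 32-bit float has a 24-bit significand, which is why any 16-bit integer fits into it exactly.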

There is some interesting technical information on the Windows Vista audio stack in this Channel9 video.

Should we care about ABX test results?

Policy no. 8 in the Terms of Service of the respected audiophile community Hydrogenaudio states:

8. All members that put forth a statement concerning subjective sound quality, must — to the best of their ability — provide objective support for their claims. Acceptable means of support are double blind listening tests (ABX or ABC/HR) demonstrating that the member can discern a difference perceptually, together with a test sample to allow others to reproduce their findings.

What a breath of fresh air. Other audio forums are full of snake-oil-peddling and Kool-Aid-drinking evangelists who go on and on about how replacing $200 speaker wires with $400 speaker wires “really opened up the soundstage and made the upper-midrange come alive”. The people at Hydrogenaudio know that such claims demand proper scientific evidence. How nice to see that they dismiss subjective nonsense and rely instead on the ultimate authority of ABX tests, which really tell us what makes a difference and what doesn’t.

Except that ABX tests don’t measure what really matters to us. ABX tests tell us whether we can hear a difference between A and B. What we really want to know, however, is whether A is as good as B.

1.

“Wait a second!”, I hear you exclaim. “Surely if I cannot tell A from B, then for all intents and purposes, A is as good as B and vice versa. If you can’t see the difference, why pay more?”

Actually, there could be tons of reasons. To take a somewhat contrived example, suppose I magically replaced the body of your car with one that was less resistant to corrosion, leaving all the other features of your vehicle intact. Looking at the car and driving it, you would not notice any difference. Even if I gave you a chance to choose between your original car and the doctored one, they would seem identical to you and you could choose either of them. However, if you were to choose the one I tampered with, five years later your vehicle’s body would be covered in spots of rust.

The obvious lesson here is that “not seeing a difference” does not guarantee that A is as good as B. Choosing one thing over another can have consequences that are hard to detect in a test because they are delayed, subtle, or so odd-ball that no one even thinks to record them during the test.

But how is this relevant to listening tests? Assuming that music affects us through our hearing, how could we be affected by differences that we cannot hear?

In his fascinating book Burning House: Unlocking the Mysteries of the Brain, Jay Ingram describes the case of a 49-year-old woman suffering from a condition called hemispatial neglect (the case was researched by neuropsychologists John Marshall and Peter Halligan). Patients with hemispatial neglect are unable to perceive one (usually the left) side of the objects they see. When asked to copy drawings, they draw only one side; when reading out words, they read them only in half (e.g. they read simile as mile).

In Marshall and Halligan’s experiment, the woman was given two simple drawings showing two houses. In one of the drawings, the left side of the house was covered in flames and smoke; the houses looked the same otherwise. Since the flames were located on the left side, the patient was unable to see them and claimed to see no difference between the drawings. When Marshall and Halligan asked her which of the houses she would rather live in, she replied, rather unsurprisingly, that it was a silly question, given that the houses were identical.

However, when the experimenters persuaded her to make a choice anyway, she picked the flameless house 14 out of 17 times, all the time insisting that both houses looked the same.

Marshall and Halligan’s experiment shows (as do other well-known psychological experiments, including those pertaining to subliminal messages) that it is possible for information to be in a part of the brain where it is inaccessible to conscious processes. This information can influence one’s state of mind and even take part in decision-making processes without one realizing it.

If people can be affected by information that they don’t even know is there, then who says they cannot be affected by inaudible differences between an MP3 and a CD? Failing an ABX test tells you that you are unable to consciously tell the difference between two music samples. It does not mean that the information isn’t in your brain somewhere — it just means that your conscious processes cannot access it.

So the fact that you cannot tell the difference between an MP3 and a CD in an ABX test does not mean that an MP3 is as good as a CD. Who knows? Maybe listening to MP3s causes more fatigue in the long run. Maybe it makes you get bored with your music more quickly. Or maybe the opposite is true and MP3s are actually better. We can formulate and test all sorts of plausible hypotheses — the point is, an ABX test which shows no audible difference is not the end of the discussion.

2.

I have shown that the lack of audible differences between A and B in an ABX test does not imply that A is as good as B. Before you read this post as an apology for lossless audio formats, here is a statement that will surely upset hard-core audiophiles:

The fact that you can tell the difference between an MP3 and a CD in an ABX test does not mean that the MP3 is worse than a CD.

First of all, the differences between MP3s encoded at mainstream bitrates (128 kbps and 192 kbps) and original recordings are really subtle and can be detected only under special conditions (quiet environment, good equipment, full listener concentration, direct comparisons of short samples). Because the differences are so tiny, we cannot automatically assume that it is the uncompressed version that sounds better. Subtle compression artifacts such as slightly reduced sharpness of attacks on short, loud sounds may in fact be preferred by some listeners in a direct comparison.

Secondly, even if we found that the uncompressed version is preferred by listeners, that wouldn’t necessarily mean that it is better. People prefer sitting in front of the TV to exercising, but the latter might make them feel much better overall. If it were discovered, for example, that compressed music is less tiring to listen to (this is of course pure speculation), then that fact might outweigh any preference for uncompressed sound in blind tests.

Summary

The relevance of ABX tests to the lives of music lovers is questionable. The absence of audible differences does not imply equal quality, and the presence of audible differences does not imply that the compressed version is inferior. Rather than being the argument to end all debate, the results of ABX tests are just one data point, and the relative strengths of various audio formats may well be put in a new light by further research.

Blind-testing MP3 compression

Among music listeners, the use of lossy audio compression technologies such as MP3 is a controversial topic. On one side, we have the masses who are glad to listen to their favorite tunes on $20 speakers connected to their PC’s onboard audio device and couldn’t care less what bitrate MP3s they get as long as the sound quality is better than FM radio. On another side, we have the quasi-audiophiles (not true audiophiles, of course, as those would never touch anything other than a high-quality CD or LP player properly matched to the amplifier) who stick to lossless formats like FLAC due to MP3’s alleged imperfections.

If I considered myself part of either group, my life would be easy, as I would know exactly what to do. Unfortunately, I fall somewhere in between. I appreciate music played through good equipment and I own what could be described as a budget audiophile system. On the other hand, I am not prepared to follow the lead of the hard-core lossless format advocates, who keep repeating how bad MP3s sound, yet do not offer anything in the way of objective evidence.

So, me being me, I had to come to my own conclusions about MP3 compression. Is it okay for me to listen to MP3s and if so, what bitrate is best? To answer these questions, I spent many hours doing so-called ABX listening tests.

What is an ABX test?

An ABX test works like this: You get four samples of the same musical passage: A, B, X and Y. A is the original (uncompressed) version. B is the compressed version. With X and Y, one is the original version (same as A), the other is the compressed version (same as B), and you don’t know which is which. You can listen to each version (A, B, X or Y) as many times as you like. You can select a short section of the passage and listen to it in each version. Your objective is to decide whether X = A (and Y = B) or X = B (and Y = A). If you can get a sufficient number of right answers (e.g. 7 times out of 7 or 9 times out of 10), you can conclude that there is an audible difference between the compressed sample and the original sample.

What I found

  1. The first thing I found was that telling the difference between a well-encoded 128 kbps MP3 and a WAV file is pretty damn hard. Since 128 kbps is really the lowest of the popular MP3 bitrates and it gets such a bad rap on forums like Head-Fi, I expected that it would fail miserably when confronted with the exquisite work of artists like Pink Floyd or Frank Sinatra. Not so. Amazingly, the Lame encoder set at 128 kbps (ABR, high quality encoding) held its own against pretty much anything I’d throw at it. The warm, deeply human quality of Gianna Nannini’s voice in Meravigliosa Creatura, the measured aggression of Metallica’s Blitzkrieg, the spacious guitar landscapes of Pink Floyd’s Pulse concert: it all sounded exactly the same after compression. There were no changes to the ambiance of the recording, the quality of the vocals, the sound of vowels and consonants, the spatial relationships between the instruments on the soundstage, or the ease with which individual instruments could be picked out.
  2. That said, MP3s at 128 kbps are not truly transparent. With some training, it is possible to distinguish them from original recordings in blind listening tests. My trick was to look for brief, sharp, loud sounds like beats or certain types of guitar sounds — I found that compression takes some of the edge off them. Typically, the difference is so subtle that successful identification is only possible with very short (a few seconds long) samples, a lot of concentration and a lot of going back and forth between the samples. Even then, the choice was rarely obvious for me; more often, making the decision felt like guessing. Which of the identical bass riffs I just heard seemed to carry more energy? A few times I was genuinely surprised that I was able to get such high ABX scores after being so unsure of my answers.
  3. With some effort, it is possible to find passages that make the difference between 128 kbps MP3 and uncompressed audio quite obvious. For me, it was just a matter of finding a sound that was sharp enough and short enough. In David Bowie’s Rock ‘n Roll Suicide, I used a passage where Bowie sings the word “song” in a particular, Dylanesque way (WAV file). Another example is a 1.2-second-long sample from Thom Yorke’s Harrowdown Hill (WAV file). The second beat in the sample is accompanied by a static-like click (clipping) that is considerably quieter in the compressed version. More samples that are “difficult” for the MP3 format can be found on the Lame project page (I found the “Castanets” sample especially revealing).
  4. What about higher bitrates? As I increased the bitrate, the differences that were barely audible at 128 kbps became inaudible and the differences that were obvious became less obvious.
    • At 192 kbps, the Bowie and Yorke samples were still too much of a challenge and I was able to reliably tell the MP3 from the original, though with much less confidence and with more going back and forth between the two versions.
    • At 256 kbps (the highest bitrate I tested), I was not able to identify the MP3 version reliably — my ABX results were 7/10, 6/10 and 6/7, which can be put down to chance.
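
To see how ABX scores relate to chance, one can compute the probability of getting at least that many answers right by pure guessing. The snippet below is a minimal sketch that assumes each trial is an independent 50/50 guess; the function name `p_value` is my own, and the usual 5% cutoff mentioned afterwards is a common convention rather than anything built into the ABX plugin.

```python
from math import comb

def p_value(correct, trials):
    """Probability of getting at least `correct` answers right out of
    `trials` when every answer is an independent coin flip."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

# Scores I treat as evidence of an audible difference
print(p_value(7, 7))     # 1/128   = 0.0078125
print(p_value(9, 10))    # 11/1024 = 0.0107421875

# My scores at 256 kbps
print(p_value(7, 10))    # 176/1024 = 0.171875
print(p_value(6, 10))    # 386/1024 = 0.376953125
print(p_value(6, 7))     # 8/128    = 0.0625
```

By the common 5% convention, none of the three 256 kbps scores would count as evidence of an audible difference, although 6/7 comes close.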

Caveats

Obviously, the results I got apply to my particular situation. If you have better equipment or better hearing, it is perfectly possible that you will be able to identify 256 kbps MP3s in a blind test. Conversely, if your equipment and/or hearing is worse, 192 kbps or even 128 kbps MP3s may sound transparent to you, even on “difficult” samples.

Test setup

  • Lame MP3 encoder version 3.98.2. I used Joint Stereo, High Quality, and average bitrate encoding (ABR).
  • Foobar2000 player with ABX plugin. I used ReplayGain to equalize the volume between the MP3 and the original file — otherwise I found it too easy to tell the difference in ABX tests, since MP3 encoding seems to change the volume of the track somewhat.
  • Auzentech X-Meridian 7.1 — a well-respected audiophile-quality sound card with upgraded LM4562 op-amps.
  • RealCable copper jack-RCA interconnect.
  • Denon PMA-350SE amplifier — an entry-level audiophile receiver designed in England.
  • Sennheiser HD 25-1 II, top-of-the-line closed headphones with stock steel cable.

When I write that there was an audible difference in an ABX test, I mean that I got 7/7 or 9/10 correct answers without repeating the test.

Conclusions

If my goal was to use an MP3 bitrate that is indistinguishable from the original in a blind listening test, I would use 256 kbps, since that is the bitrate which I was unable to identify in a reliable way, despite repeated attempts on a variety of samples (including the “difficult” samples posted on the Lame website).

Whether I will actually standardize on 256 kbps, I’m not sure. The fact that a 192 kbps MP3 can be distinguished from the original in a contrived test (good equipment, quiet environment, high listener concentration, specially selected samples) does not mean it is unsuitable for real-world scenarios. Sure, at 192 kbps the music is not always identical to the original, but judging by my experiments, the difference affects less than 1% of my music (in a 100-second sample, more than 99 seconds would probably be transparent). Even if all I did was listen to this tiny proportion of my music, I would be in a position to perceive the difference less than 1% of the time (what percent of the time do I listen to music in a quiet environment? what percent of the time am I really focused on the music as opposed to other things I’m doing?). Besides, there is the rarely-posed question of whether “different” necessarily means “inferior” — it is quite possible that subtle compression artifacts might actually improve the perceived quality of music in some cases.