Among music listeners, the use of lossy audio compression technologies such as MP3 is a controversial topic. On one side, we have the masses who are glad to listen to their favorite tunes on $20 speakers connected to their PC’s onboard audio device and couldn’t care less what bitrate their MP3s are as long as the sound quality is better than FM radio. On the other side, we have the quasi-audiophiles (not true audiophiles, of course, as those would never touch anything other than a high-quality CD or LP player properly matched to the amplifier) who stick to lossless formats like FLAC due to MP3’s alleged imperfections.
If I considered myself part of either group, my life would be easy, as I would know exactly what to do. Unfortunately, I fall somewhere in between. I appreciate music played through good equipment and I own what could be described as a budget audiophile system. On the other hand, I am not prepared to follow the lead of the hard-core lossless format advocates, who keep repeating how bad MP3s sound, yet do not offer anything in the way of objective evidence.
So, me being me, I had to come to my own conclusions about MP3 compression. Is it okay for me to listen to MP3s and if so, what bitrate is best? To answer these questions, I spent many hours doing so-called ABX listening tests.
What is an ABX test?
An ABX test works like this: You get four samples of the same musical passage: A, B, X and Y. A is the original (uncompressed) version. B is the compressed version. With X and Y, one is the original version (same as A), the other is the compressed version (same as B), and you don’t know which is which. You can listen to each version (A, B, X or Y) as many times as you like. You can select a short section of the passage and listen to it in each version. Your objective is to decide whether X = A (and Y = B) or X = B (and Y = A). If you can get a sufficient number of right answers (e.g. 7 times out of 7 or 9 times out of 10), you can conclude that there is an audible difference between the compressed sample and the original sample.
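One way to put a number on "a sufficient number of right answers" is a one-sided binomial calculation: if you were purely guessing, each trial would be right with probability 0.5, so you can ask how likely a given score is by luck alone. The sketch below is an illustration under that assumption, not part of any ABX tool; `abx_p_value` is a hypothetical helper name.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Probability of scoring at least `correct` out of `trials` by pure guessing."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Under guessing, each trial is a fair coin flip (p = 0.5), so every
    # sequence of answers is equally likely and we only need to count them.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


for correct, trials in [(7, 7), (9, 10)]:
    print(f"{correct}/{trials}: p = {abx_p_value(correct, trials):.4f}")
# 7/7: p = 0.0078
# 9/10: p = 0.0107
```

Both example scores would come up by luck in roughly 1 session out of 100, which is why they are reasonable evidence of an audible difference.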
What I found
- The first thing I found was that telling the difference between a well-encoded 128 kbps MP3 and a WAV file is pretty damn hard. Since 128 kbps is really the lowest of the popular MP3 bitrates and it gets such a bad rap on forums like Head-Fi, I expected that it would fail miserably when confronted with the exquisite work of artists like Pink Floyd or Frank Sinatra. Not so. Amazingly, the Lame encoder set at 128 kbps (ABR, high quality encoding) held its own against pretty much anything I threw at it. The warm, deeply human quality of Gianna Nannini’s voice in Meravigliosa Creatura, the measured aggression of Metallica’s Blitzkrieg, the spacious guitar landscapes of Pink Floyd’s Pulse concert: it all sounded exactly the same after compression. There were no changes to the ambiance of the recording, the quality of the vocals, the sound of vowels and consonants, the spatial relationships between the instruments on the soundstage, or the ease with which individual instruments could be picked out.
- That said, MP3s at 128 kbps are not truly transparent. With some training, it is possible to distinguish them from original recordings in blind listening tests. My trick was to look for brief, sharp, loud sounds like beats or certain types of guitar sounds — I found that compression takes some of the edge off them. Typically, the difference is so subtle that successful identification is only possible with very short (a few seconds long) samples, a lot of concentration and a lot of going back and forth between the samples. Even then, the choice was rarely obvious for me; more often, making the decision felt like guessing. Which of the identical bass riffs I just heard seemed to carry more energy? A few times I was genuinely surprised that I was able to get such high ABX scores after being so unsure of my answers.
- With some effort, it is possible to find passages that make the difference between 128 kbps MP3 and uncompressed audio quite obvious. For me, it was just a matter of finding a sound that was sharp enough and short enough. In David Bowie’s Rock ’n’ Roll Suicide, I used a passage where Bowie sings the word “song” in a particular, Dylanesque way (WAV file). Another example is a 1.2-second sample from Thom Yorke’s Harrowdown Hill (WAV file). The second beat in the sample is accompanied by a static-like click (clipping) that is considerably quieter in the compressed version. More samples that are “difficult” for the MP3 format can be found on the Lame project page (I found the “Castanets” sample especially revealing).
- What about higher bitrates? As I increased the bitrate, the differences that were barely audible at 128 kbps became inaudible and the differences that were obvious became less obvious.
- At 192 kbps, the Bowie and Yorke samples were still too much of a challenge and I was able to reliably tell the MP3 from the original, though with much less confidence and with more going back and forth between the two versions.
- At 256 kbps (the highest bitrate I tested), I was not able to identify the MP3 version reliably — my ABX results were 7/10, 6/10 and 6/7, which can be put down to chance.
Obviously, the results I got apply to my particular situation. If you have better equipment or better hearing, it is perfectly possible that you will be able to identify 256 kbps MP3s in a blind test. Conversely, if your equipment and/or hearing is worse, 192 kbps or even 128 kbps MP3s may sound transparent to you, even on “difficult” samples.
- Lame MP3 encoder version 3.98.2. I used Joint Stereo, High Quality, and average bitrate (ABR) encoding, which varies the bitrate from frame to frame around a target average.
- Foobar2000 player with ABX plugin. I used ReplayGain to equalize the volume between the MP3 and the original file — otherwise I found it too easy to tell the difference in ABX tests, since MP3 encoding seems to change the volume of the track somewhat.
- Auzentech X-Meridian 7.1 — a well-respected audiophile-quality sound card with upgraded LM4562 op-amps.
- RealCable copper jack-RCA interconnect.
- Denon PMA-350SE amplifier, an entry-level audiophile amplifier designed in England.
- Sennheiser HD 25-1 II, top-of-the-line closed headphones with stock steel cable.
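For anyone who wants to reproduce the encoder settings from the list above, here is a minimal sketch of how they might map onto the LAME command line. It assumes the command-line `lame` binary and that "High Quality" corresponds to the `-h` switch; the original encodes may well have been made through a front end, so treat the mapping as a guess. `build_lame_command` and the file names are hypothetical.

```python
from typing import List


def build_lame_command(wav_path: str, mp3_path: str, bitrate_kbps: int = 128) -> List[str]:
    """Build a LAME command line for ABR, high-quality, joint-stereo encoding."""
    return [
        "lame",
        "--abr", str(bitrate_kbps),  # average bitrate target in kbps
        "-h",                        # higher-quality (slower) mode; assumed to match "High Quality"
        "-m", "j",                   # joint stereo
        wav_path,
        mp3_path,
    ]


# Print one command per bitrate tested in this post (file names are placeholders).
for kbps in (128, 192, 256):
    print(" ".join(build_lame_command("sample.wav", f"sample_{kbps}.mp3", kbps)))
```

The function only builds the argument list; to actually encode, pass the list to `subprocess.run(cmd, check=True)` on a machine where LAME is installed.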
When I write that there was an audible difference in an ABX test, I mean that I got 7/7 or 9/10 correct answers without repeating the test.
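The 7/7 and 9/10 cutoffs can be sanity-checked by turning the question around: for a given number of trials, what is the lowest score that a pure guesser would reach less than 5% of the time? The sketch below assumes that 5% level, which is a common convention and not something stated in this post; `min_correct_for_significance` is a hypothetical name.

```python
from math import comb


def min_correct_for_significance(trials: int, alpha: float = 0.05):
    """Smallest score out of `trials` that a guesser reaches with probability below `alpha`.

    Returns None if even a perfect score is not rare enough.
    """
    tail = 0.0   # running P(at least k correct) under pure guessing
    best = None
    for k in range(trials, -1, -1):
        tail += comb(trials, k) / 2 ** trials
        if tail >= alpha:
            break
        best = k
    return best


for n in (7, 10):
    print(f"{n} trials: need at least {min_correct_for_significance(n)} correct")
# 7 trials: need at least 7 correct
# 10 trials: need at least 9 correct
```

Under that assumption, 7/7 and 9/10 are exactly the minimum passing scores for 7 and 10 trials, while scores such as 6/7 or 7/10 fall short.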
If my goal were to use an MP3 bitrate that is indistinguishable from the original in a blind listening test, I would use 256 kbps, since that is the bitrate which I was unable to identify in a reliable way, despite repeated attempts on a variety of samples (including the “difficult” samples posted on the Lame website).
Whether I will actually standardize on 256 kbps, I’m not sure. The fact that a 192 kbps MP3 can be distinguished from the original in a contrived test (good equipment, quiet environment, high listener concentration, specially selected samples) does not mean it is unsuitable for real-world scenarios. Sure, at 192 kbps the music is not always identical to the original, but judging by my experiments, the difference affects less than 1% of my music (in a 100-second sample, more than 99 seconds would probably be transparent). And even if all I did was listen to this tiny proportion of my music, I would be in a position to perceive the difference less than 1% of the time (what percentage of the time do I listen to music in a quiet environment? what percentage of the time am I really focused on the music as opposed to other things I’m doing?). Besides, there is the rarely posed question of whether “different” necessarily means “inferior”: it is quite possible that subtle compression artifacts might actually improve the perceived quality of music in some cases.