Blind-testing MP3 compression

July 5th, 2009 · 25 Comments · Technology

Among music listeners, the use of lossy audio compression technologies such as MP3 is a controversial topic. On one side, we have the masses who are glad to listen to their favorite tunes on $20 speakers connected to their PC’s onboard audio device and couldn’t care less what bitrate MP3s they get as long as the sound quality is better than FM radio. On the other side, we have the quasi-audiophiles (not true audiophiles, of course, as those would never touch anything other than a high-quality CD or LP player properly matched to the amplifier) who stick to lossless formats like FLAC due to MP3’s alleged imperfections.

If I considered myself part of either group, my life would be easy, as I would know exactly what to do. Unfortunately, I fall somewhere in between. I appreciate music played through good equipment and I own what could be described as a budget audiophile system. On the other hand, I am not prepared to follow the lead of the hard-core lossless format advocates, who keep repeating how bad MP3s sound, yet do not offer anything in the way of objective evidence.

So, me being me, I had to come to my own conclusions about MP3 compression. Is it okay for me to listen to MP3s and if so, what bitrate is best? To answer these questions, I spent many hours doing so-called ABX listening tests.

What is an ABX test?

An ABX test works like this: You get four samples of the same musical passage: A, B, X and Y. A is the original (uncompressed) version. B is the compressed version. With X and Y, one is the original version (same as A), the other is the compressed version (same as B), and you don’t know which is which. You can listen to each version (A, B, X or Y) as many times as you like. You can select a short section of the passage and listen to it in each version. Your objective is to decide whether X = A (and Y = B) or X = B (and Y = A). If you can get a sufficient number of right answers (e.g. 7 times out of 7 or 9 times out of 10), you can conclude that there is an audible difference between the compressed sample and the original sample.
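
To put a number on "a sufficient number of right answers", one can ask how likely it is that a listener who hears no difference, and simply guesses, would score at least that well. Here is a minimal Python sketch, assuming every trial is an independent 50/50 guess (the function name `abx_p_value` is made up for this illustration and is not part of any ABX tool):

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Probability of getting at least `correct` answers right out of
    `trials` by pure guessing (each trial treated as a fair coin flip)."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Count every outcome with `correct` or more right answers
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

# The two example scores mentioned above
for correct, trials in [(7, 7), (9, 10)]:
    print(f"{correct}/{trials}: p = {abx_p_value(correct, trials):.4f}")
```

Both example scores work out to roughly a 1% chance of arising from guessing alone (1/128 and 11/1024 respectively), which is why they make reasonable thresholds.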

What I found

The first thing I found was that telling the difference between a well-encoded 128 kbps MP3 and a WAV file is pretty damn hard. Since 128 kbps is really the lowest of the popular MP3 bitrates and it gets such a bad rap on forums like Head-Fi, I expected that it would fail miserably when confronted with the exquisite work of artists like Pink Floyd or Frank Sinatra. Not so. Amazingly, the Lame encoder set at 128 kbps (ABR, high quality encoding) held its own against pretty much anything I’d throw at it. The warm, deeply human quality of Gianna Nannini’s voice in Meravigliosa Creatura, the measured aggression of Metallica’s Blitzkrieg, the spacious guitar landscapes of Pink Floyd’s Pulse concert: it all sounded exactly the same after compression. There were no changes to the ambiance of the recording, the quality of the vocals, the sound of vowels and consonants, the spatial relationships between the instruments on the soundstage, or the ease with which individual instruments could be picked out.

That said, MP3s at 128 kbps are not truly transparent. With some training, it is possible to distinguish them from original recordings in blind listening tests. My trick was to look for brief, sharp, loud sounds like beats or certain types of guitar sounds; I found that compression takes some of the edge off them. Typically, the difference is so subtle that successful identification is only possible with very short (a few seconds long) samples, a lot of concentration and a lot of going back and forth between the samples. Even then, the choice was rarely obvious for me; more often, making the decision felt like guessing. Which of the identical bass riffs I just heard seemed to carry more energy? A few times I was genuinely surprised that I was able to get such high ABX scores after being so unsure of my answers.

With some effort, it is possible to find passages that make the difference between 128 kbps MP3 and uncompressed audio quite obvious. For me, it was just a matter of finding a sound that was sharp enough and short enough. In David Bowie’s Rock ‘n Roll Suicide, I used a passage where Bowie sings the word “song” in a particular, Dylanesque way (WAV file). Another example is a 1.2-second-long sample from Thom Yorke’s Harrowdown Hill (WAV file). The second beat in the sample is accompanied by a static-like click (clipping) that is considerably quieter in the compressed version. More samples that are “difficult” for the MP3 format can be found on the Lame project page (I found the “Castanets” sample especially revealing).

What about higher bitrates? As I increased the bitrate, the differences that were barely audible at 128 kbps became inaudible and the differences that were obvious became less obvious.
- At 192 kbps, the Bowie and Yorke samples were still too much of a challenge and I was able to reliably tell the MP3 from the original, though with much less confidence and with more going back and forth between the two versions.
- At 256 kbps (the highest bitrate I tested), I was not able to identify the MP3 version reliably — my ABX results were 7/10, 6/10 and 6/7, which can be put down to chance.

Caveats

Obviously, the results I got apply to my particular situation. If you have better equipment or better hearing, it is perfectly possible that you will be able to identify 256 kbps MP3s in a blind test. Conversely, if your equipment and/or hearing is worse, 192 kbps or even 128 kbps MP3s may sound transparent to you, even on “difficult” samples.

Test setup

Lame MP3 encoder version 3.98.2. I used Joint Stereo, High Quality, and average bitrate (ABR) encoding, which is a constrained form of variable bitrate encoding.
Foobar2000 player with ABX plugin. I used ReplayGain to equalize the volume between the MP3 and the original file — otherwise I found it too easy to tell the difference in ABX tests, since MP3 encoding seems to change the volume of the track somewhat.
Auzentech X-Meridian 7.1 — a well-respected audiophile-quality sound card with upgraded LM4562 op-amps.
RealCable copper jack-RCA interconnect.
Denon PMA-350SE amplifier — an entry-level audiophile receiver designed in England.
Sennheiser HD 25-1 II, top-of-the-line closed headphones with stock steel cable.
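
The level-matching step in this setup deserves a closer look, because even a small loudness offset can give the game away in an ABX test. ReplayGain relies on a perceptual loudness model. The Python sketch below substitutes a much cruder plain-RMS comparison just to show the idea, and the helper names and the toy signal are invented for illustration:

```python
import math

def rms_db(samples):
    """RMS level of samples (floats in -1.0..1.0), in dB relative to full scale."""
    if not samples:
        raise ValueError("samples must not be empty")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10 * math.log10(mean_square)

def matching_gain_db(reference, candidate):
    """Gain in dB to apply to `candidate` so its RMS level equals `reference`."""
    return rms_db(reference) - rms_db(candidate)

def apply_gain(samples, gain_db):
    """Scale samples by a gain given in dB."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]

# Toy data (an assumption, not real audio): a 440 Hz tone and a copy
# that is 0.5 dB quieter, standing in for a decoded MP3.
original = [0.5 * math.sin(2 * math.pi * 440 * n / 44100) for n in range(44100)]
decoded = [s * 10 ** (-0.5 / 20) for s in original]

gain = matching_gain_db(original, decoded)
matched = apply_gain(decoded, gain)
print(f"gain needed: {gain:+.2f} dB")
```

A real comparison would read the decoded samples from the actual files, and a perceptual loudness measure like the one ReplayGain uses is the better choice when the two versions differ in frequency content rather than just level.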

When I write that there was an audible difference in an ABX test, I mean that I got 7/7 or 9/10 correct answers without repeating the test.
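
The "without repeating the test" condition matters because every extra attempt gives a guesser another chance to pass. Here is a rough Python sketch of that effect, assuming independent attempts and pure guessing (`chance_of_lucky_pass` is a hypothetical helper written for this illustration):

```python
from math import comb

def guess_pass_probability(needed: int, trials: int) -> float:
    """Chance that pure guessing scores at least `needed` out of `trials`."""
    return sum(comb(trials, k) for k in range(needed, trials + 1)) / 2 ** trials

def chance_of_lucky_pass(needed: int, trials: int, attempts: int) -> float:
    """Chance that a guesser passes at least once in `attempts` independent tests."""
    single = guess_pass_probability(needed, trials)
    return 1 - (1 - single) ** attempts

# How the false-positive risk of a 9/10 criterion grows with re-runs
for attempts in (1, 2, 5):
    p = chance_of_lucky_pass(9, 10, attempts)
    print(f"9/10 within {attempts} attempt(s): {p:.3f}")
```

With a single attempt the risk of a lucky pass is about 1%. Allowing one re-run roughly doubles it, and the risk keeps growing with each further attempt.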

Conclusions

If my goal was to use an MP3 bitrate that is indistinguishable from the original in a blind listening test, I would use 256 kbps, since that is the bitrate which I was unable to identify in a reliable way, despite repeated attempts on a variety of samples (including the “difficult” samples posted on the Lame website).

Whether I will actually standardize on 256 kbps, I’m not sure. The fact that a 192 kbps MP3 can be distinguished from the original in a contrived test (good equipment, quiet environment, high listener concentration, specially selected samples) does not mean it is unsuitable for real-world scenarios. Sure, at 192 kbps the music is not always identical to the original, but judging by my experiments, the difference affects less than 1% of my music (in a 100-second sample, more than 99 seconds would probably be transparent). Even if all I did was listen to this tiny proportion of my music, I would be in a position to perceive the difference less than 1% of the time (what percent of the time do I listen to music in a quiet environment? what percent of the time am I really focused on the music as opposed to other things I’m doing?). Besides, there is the rarely-posed question of whether “different” necessarily means “inferior” — it is quite possible that subtle compression artifacts might actually improve the perceived quality of music in some cases.

25 Comments so far

ZeeKat Jul 28, 2009 at 6:53 am

I guess this ruins my reputation as a music lover, but I couldn’t tell the difference between LAME encoded 128kbit and CD to save my life (ABX’d twice, to be sure). Or maybe people bashing lower-bitrate mp3s just use crap compressors, LAME seems to be really smart.

tszynalski Jul 28, 2009 at 4:41 pm

Well, a lot depends on the quality of your source (good sound card) and the speakers (isolating headphones are the most revealing). It also takes a bit of practice to learn what you should listen for.

Check out the “Castanets” sample here. The difference at 128 kbps is quite conspicuous: the compressed sample sounds a bit like “shshsh”, while the original is crisp and aggressive, with clear separation between individual clicks.

ZeeKat Jul 31, 2009 at 9:34 pm

Might be, it was some bog standard Audigy2 soundcard with a not-so-great Technics receiver and then Grado SR60 headphones. Still, 128kbit files made by an older compressor had a rather hideous fluttery effect on e.g. cymbals – that was hard to overlook even on crap hardware. I’ll check that castanets example as soon as I figure out what to do with the .wv file 🙂

- tszynalski Aug 1, 2009 at 11:35 am
  
  Foobar2000 plays WV files out of the box; for Winamp, you need a plug-in.
  
mwalimu Dec 29, 2009 at 10:23 pm

If I understood correctly, you said that the 128 kbps samples were actually VBR with an average bit rate of 128. Is that correct?

If so, then I have to wonder what kind of results you would have gotten had you used 128 kbps CBR (constant bit rate) instead. When a good encoder (such as Lame) is encoding at VBR, it figures out which parts need to be encoded at a higher bitrate, so that cymbal crash or complex guitar riff might be encoded at 256 kbps or 320 kbps in those fractions of a second where it’s most needed, while other less complex parts are encoded at lower bit rates, to keep the target average near 128 kbps. In practice, when most people talk about 128 kbps files, they are talking about CBR files. And forcing all of those complex waveforms into 128k frames can take a much greater toll on sound quality than when the encoder has the flexibility to use higher bitrate frames as needed.

- Tomasz Jan 4, 2010 at 11:15 pm
  
  Yep, I used VBR in all my tests.
  
  - noa shiruba Jul 23, 2022 at 7:49 pm
    
    I was also thinking about this as well. A 128 VBR MP3 should sound better than a 128 CBR in most cases, because it can use higher rates.
    
    I have always ripped all my music as FLAC whenever I can, but part of the reason is future proofing. FLAC can always be converted into MP3, AAC, OGG, or whatever comes down the road, including other lossless formats.
    
    The space it takes up is really trivial these days, so you may as well take the safe route.
    
AlanAudio Jan 4, 2010 at 9:46 pm

I consider myself an audiophile, having modified or paid to have modified most of my gear. I did an A/B test of 3 different CDs played through a highly modified Pioneer SACD player into a NAD C372 integrated into some Monitor RS6s. I was frustrated to find that I could not tell the difference between this and the same music recorded in 320 kbps MP3 through my Squeezebox music server. I had both running at the same time.

This will now cause me to quit ripping all my new CDs in FLAC. I can now use just the high quality MP3 and swallow my audiophile pride.

One MP3 folder structure will be much easier to manage than having to deal with making two copies of the same music. I use the MP3 files to copy to my 16GB cell phone card.

- Tomasz Jan 4, 2010 at 11:20 pm
  
  Did you try 256 or 192 kbps? If not, you might be in for another surprise… 🙂
  
  (Make sure you use the latest LAME version and equalize loudness before ABX’ing.)
  
  - noa shiruba Jul 23, 2022 at 7:50 pm
    
    Interestingly, I can tell the difference between 320 CBR MP3 and FLAC on my audio player, but not my computer. The main difference is probably the headphones (IEM-M9) vs. the computer speaker (Jawbone Jambox).
    
Tim Schmidt Feb 22, 2010 at 10:02 am

These comments are amazing.

So many listeners bash mp3s but never indicate they have ever compared them to less lossy formats.

Thank you for your hard work and scientifically-minded testing.

Bing Crosby Oct 4, 2010 at 1:56 pm

As an aside, I had a $500 DVD player (before 24 bit/ 192 kHz DAC was standard) from 2001, a $2600 CD player from 2005 and a $9000 CD player from 2002 and there was barely a difference between them. And I’ve played Steinway pianos, and I can tell the difference from a Yamaha!

Maybe I’ll write some more of my opinions about the audiophile market, and their recipes for system building, later.

Day Eleven: MP3 Ate My HMV « A Life Less Lived Jan 11, 2011 at 9:09 pm

[…] the quality compared to digital isn’t really that inferior. If you have high quality mp3s or CD quality equivalent FLAC files of records, the average music […]

Arturo Aug 23, 2011 at 3:02 pm

Hey man, your post is very useful. I was searching for something that confirmed my own test. I just want to say: with LAME VBR -V 6 instead of ABR 128, you get the same quality and files about 10% smaller in most cases (remember that with VBR the file size is unpredictable).

cj little Sep 5, 2011 at 4:40 am

I must say that I used to think that MP3 192 sounded just fine (circa 2000). I didn’t ABX back then, but I did listen to the same track encoded in various settings, and MP3 192 sounded fine. When AAC started gaining marketshare, I felt the same about AAC 128. I managed to rip my entire collection to AAC 128 (1800+ CDs). I was happy, it sounded fine. It was 2004/5ish. Total investment: $200-$300 (ipod)

Then, a couple years back I started going into hifi audio. I started with a small solid state headphone amp / DA converter and a pair of AKG 240DF’s. I started noticing all this “noise” and “distortion” in my listening. I presumed it was because the headphones were “too good” and got myself a pair of LiveWires. I was happy again (no noise from the amp in my MacBook pro). Total investment: +$300 (amp, headphones)

Then I bought a small tube amp for home listening, a set of a couple year old Vandersteen speakers, and a Logitech Transporter. Now I can ABX MP3 192 v. 320, and the 192 sounds horrible. How could I ever listen to that?! Then I ABX the MP3 192 v. PCM 44kHz/16-bit (FLAC) and the difference is as great as the difference between the 192/320; now, how could I ever listen to the 320? Going further, from the PCM 44kHz/16b vs. PCM 96kHz/24b I can hear a noticeable step on my system, but not nearly as great as the previous two steps. Ok, I’m very, very happy now. Total investment: +$2500 (speakers, amp, transporter)

And when I think I’ve reached the top of the listening chain, I go and listen to a much “better” system (better tubes, separate stages, better speakers). I still don’t hear as much of a step between the PCM 44 and PCM 96, but I hear a _HUGE_ step between the PCM 44kHz and a DSD (SACD) of the same recording. The SACD sounds even better compared to the PCM 44 than the PCM 44 did compared to the MP3 320. It is amazing! Now, do I have to spend $35k to get here? Absolutely not. Probably just adding a DSD playback source to my setup would allow for appreciably better playback.

If you made it this far, my point is this: for not a lot of money, you will start to hear your encoding choices. Given that at this time, 2TB hard drives are <$80, I would strongly urge you all to rip your CDs into a lossless encoding (I prefer FLAC). Then transcode them down to whatever your current player will accept. At least this way when (if) you do start to upgrade your components, you don't have to re-rip your collection because it sounds horrible. Learn from my short-sightedness and certainty years ago. :o)

FWIW, all of my testing has been very blind and well-controlled. I do not buy any components, switch encoding types, change room layouts, etc. without first doing a complete ABX set of tests. The world of HiFi is full of snake oil and placebos, and I tread very, very lightly through them all.

Whatever you do, if you love music, keep listening!

– cj

- Tomasz Sep 6, 2011 at 11:54 am
  
  Thanks for sharing your experiences in such detail.
  
  Naturally, I am skeptical when I hear about audible differences between a 320 kbps MP3 file and PCM. I would suspect an old encoder (old versions of LAME can introduce artifacts in very specific cases) or poor test design (for example, treating 5/7 as a success, repeating an ABX test until you get the result you expect — even one re-run can dramatically change the statistical validity!).
  
  As you see, in order to accept your conclusion, I would have to believe that you did every single thing correctly in your tests. And I have no reason to believe that, or believe the opposite.
  
  I also wonder if you were ABX-ing whole tracks or segments of a few seconds, and how many times you had to play back a segment in order to reach a decision. (I would submit that if you have to listen 10 times until you finally hear the difference, then the difference is of little practical importance. Of course purists, by definition, won’t care about “practical”; they want “identical”.)
  
  The advice to rip everything into FLAC in order to be future-proof is not without merit, but there are also downsides. It may all fit on your hard drive, but will it fit on your iPod? And how long will it take to sync your iPod over USB?
  
Sami Jan 3, 2012 at 8:09 pm

If you can’t tell WAVs and MP3s apart, then play them back at 22.05 kHz and you will! 🙂 It is important that you _PLAY_ them back at 22.05 kHz, _NOT_ resample them to 22.05 kHz and then play. This works even at 320 kbps CBR or the highest VBR. They are worlds apart because the psychoacoustic packing that MP3 is based on does not work anymore when you play at half the speed. The same goes for 11.025 kHz of course; even 32 kHz works. MP3 is crap. It only fools your ears. Of course I listen to MP3s myself, upsampling them to 96 kHz/24-bit in Foobar2000, but when it comes to data integrity MP3 is crap. It is not the same signal by far anymore. The same goes for OGG, AAC=MP4 and most other packers.

I found quite a nice utility called WavPack that could pack stuff to 1:4 with very little difference, even if you played back at 22.05 kHz. I would subjectively say the signal is 99.8% the same. But if you go lower than 1:4, say 1:8, then WavPack quickly becomes garbage. I needed a packer that would preserve information as much as possible but pack more than FLAC. So I found WavPack. You don’t do anything with it for normal music listening purposes, but it’s a nice app if you need to preserve signal AND pack it. This gave me a whole new view of psychoacoustic packing like MP3.

Fraser Jan 19, 2012 at 8:42 pm

WavPack isn’t 99.8% the same. It is 100% the same. It’s not 1/4 of the original size either, unless you take the lossy route instead, which can then be combined with the missing information to give you a lossless copy again. Very few portable players support WV anyway.

I think most people underestimate the quality of the MP3 format. A V0 encode using one of the newest LAME plugins with “Joint Stereo” will be virtually transparent to 99.9% of people on virtually any equipment. Anybody who claims to notice an “obvious” difference has either been using a seriously outdated encoder, or is suffering from the placebo effect and is only hearing artifacts that were contained in the original recording to begin with.

If you were in a club or at a friend’s house and they were playing high quality MP3s, you wouldn’t even question in your mind whether the music you were hearing was a lossy encode or not.

Simple comparison of lossy formats: tape vs MP3 « FM DXing Mar 21, 2012 at 11:10 am

[…] from the CD album using CDex software. The MP3 was encoded using LAME version 3.99 using the highly-rated 192 Kbps Variable Bit Rate (VBR) encoding (192 Kbps is the average rate of the musical recording). […]

christopher May 11, 2012 at 10:23 pm

About 2-3 years ago the very serious c’t computer magazine tested this very topic. They set up an ultra high-end rig to see whether selected (professional) listeners could tell the difference between 128, 192 and 256. The conditions were thorough (German!) and the result was that 128 could, just about, be recognized, but not 192. The test is in German, which is why the blogs never picked it up.

Ann Jul 24, 2012 at 3:07 pm

> I guess this ruins my reputation as a music lover, but I couldn’t tell the difference between LAME encoded 128kbit and CD to save my life

Don’t worry, the audiophiles can’t tell the difference either. 🙂 They don’t realize that the DSD version was mastered differently from the CD version, which was mastered differently from the LP. They’re not actually hearing a difference in encoding.

taiganaut Aug 2, 2012 at 6:42 pm

Just a quick note, but ABR 128 is quite a different beast than CBR 128. ABR x will in 99.9% of cases (with LAME) be considerably better quality than CBR x. ABR is VBR with much tighter limits, but it can detect the difficult frames and give them more bits.

Most of the 128Kbps MP3s still floating around out there are CBR 128. The high end (cymbals, sharp percussion, sibilant vocals) has a distinct “watery” sound to it, a kind of underwater warbling with a big loss of definition. I can hear this distortion in noisy environments on crap speakers more often than not without an ABX test – I just say “huh, that sounds like a 128Kbps MP3” and then look at my player and yep. It is.

This has only happened to me one or two times in the 14 years of my MP3-enabled life with 192, and never higher.

Unscientific, but there you go. I use -V 0 for encoding my CDs, and I’m encoding my dad’s entire giant collection of classical music to put on an ipod so he can press one button and shuffle by album on wireless speakers all through the house. I’m using -V 2 for that.

Interestingly, with most classical music, -V 0 in my prior tests almost always uses a much LOWER bitrate than rock/pop music. Symphonies and operas may sound exquisite in -V 0 when it’s allocated 170Kbps, whereas garden variety rock, folk pop, pop music generally takes 190-240Kbps at -V 0.

taiganaut Aug 2, 2012 at 6:48 pm

Couple other observations:

@Sami: MP3 “fools your ears” because that’s how it works. That’s why they call it psychoacoustic. Playing back at a different sample rate isn’t how the codec is intended to be used, so yeah, I’m not surprised it sounds different. If an MP3 encoded at 44.1 and played at 44.1 is indistinguishable in ABX testing, the only difference in the listener’s experience is based on their awareness of listening to an MP3 instead of whatever the original format was, and their expectations about the same. https://en.wikipedia.org/wiki/Confirmation_bias

Also, a while back I was recording lectures for a class to post online, and found that Ogg had MUCH better performance at very low bitrates than MP3. Squishing a 16KHz 16-bit mono speech stream down to 32Kbps MP3 sounded like one was under a layer of mud. Ogg was distinguishable but much better at 24Kbps. This may not have much to do with higher bitrate performance, but it’s interesting.

Stromkraft (@Stromkraft) Oct 5, 2013 at 3:14 pm

The fact that DJs manipulate digital files in real time seems absent from this discussion as if music is only heard when played back at original speed and in original key. This is not so.

The real question for DJs is whether MP3s can stack up to FLAC, ALAC, AIFF and WAV when being manipulated.
