The Ghost in the MP3
Jul 30, 2015 at 8:27 PM Thread Starter Post #1 of 33
I found an interesting article with samples of what is removed by MP3 compression of a song.
 
http://theghostinthemp3.com/theghostinthemp3.html
 
The MPEG-1 or MPEG-2 Layer III standard, more commonly referred to as MP3, has become a nearly ubiquitous digital audio file format. First published in 1993, this codec implements a lossy compression algorithm based on a perceptual model of human hearing. Listening tests, primarily designed by and for western-european men, and using the music they liked, were used to refine the encoder. These tests determined which sounds were perceptually important and which could be erased or altered, ostensibly without being noticed. What are these lost sounds? Are they sounds which human ears can not hear in their original context due to universal perceptual limitations or are they simply encoding detritus? It is commonly accepted that MP3's create audible artifacts such as pre-echo, but what does the music which this codec deletes sound like? In the work presented here, techniques are considered and developed to recover these lost sounds, the ghosts in the MP3, and reformulate these sounds as art.

 
Quote:
 [T]he MP3 codec was refined using listening tests designed by european audio engineers and featuring the music they chose. In a sense, each of these songs acts as a resonant filter for every file encoded in the MP3 format. Tom's Diner by Suzanne Vega, Fast Car by Tracy Chapman, a Haydn Trumpet concerto... these songs carved out the space of sounds that could be successfully encoded as MP3's. To that end, these songs represent a kind of best-case scenario for an MP3 encoding. If anything can be encoded well by this format, it should be these files. And yet these files do leave a residue behind when encoded to MP3. Exploring these sounds helps to define a boundary case for MP3 salvaging.

 
What are your thoughts?
 
Jul 30, 2015 at 9:04 PM Post #3 of 33
  I think that horse died a long, long time ago.
When you play those files back at the correct volume with respect to the originals, guess what?
They're inaudible.

 
So what? No fascination allowed, just the obligatory Sound-Science skepticism?

 
Jul 30, 2015 at 9:24 PM Post #4 of 33
It is interesting, but not so much that I would claim it to be fascinating.  Any lossy format is going to save space by removing specific data.  An effort is made to remove data that would not be heard in the context of the song being played due to physical limitations of human hearing.  These lossy formats are excellent at discarding data that cannot be heard due to masking.  Without the masking, some of the data removed can be heard.  Am I missing something?
 
Jul 31, 2015 at 6:48 AM Post #5 of 33
  It is interesting, but not so much that I would claim it to be fascinating.  Any lossy format is going to save space by removing specific data.  An effort is made to remove data that would not be heard in the context of the song being played due to physical limitations of human hearing.  These lossy formats are excellent at discarding data that cannot be heard due to masking.  Without the masking, some of the data removed can be heard.  Am I missing something?


that!
 
 
 
 
 
and it's not so hard to read all about it on the net, or even better, to try to null some tracks ourselves in our good old (and free) Audacity. (those who try, remember to zoom in like crazy to time-align as best you can, or else the test means nothing.)
and for the lazy people: https://cdvsmp3.wordpress.com/2014/06/21/aac-vs-mp3-a-comparison-through-null-testing/
it shows what is shown on all websites doing that kind of test: the differences start showing up in the -60dB range and below for high quality mp3. now, as most songs don't go much past 50dB of dynamic range, leaving a little headroom, because not everything is recorded walled up to the 0dB signal, is pretty rational.
60dB also happens to be what we tend to accept as the instantaneous dynamic range for humans (so many coincidences, it's as if the guys making mp3 knew a few things ^_^).
and just as a very simple test to get a sense of what 60dB is, as always, listen to your favorite music at your usual preferred loudness, then use whatever means you have to reduce the loudness by about 60dB and be amazed by how loud the signal is (what signal? ^_^). of course if you give it time and are in a very quiet room you will end up hearing a lot more, but the rapid change just goes to show the difference in ability when music is playing louder at the same time. it has nothing to do with the usual claims that we can hear 100dB or more of dynamic range. that 100dB number is obtained by making a dude listen to a super quiet sound, and then later on to a super loud sound. if both sounds were heard at the same time the guy wouldn't notice the quiet one.
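For anyone who would rather script the null test than eyeball it in an audio editor, here is a minimal sketch in Python with NumPy. It assumes both tracks are already decoded to mono float arrays at the same sample rate; the function names (`find_delay`, `null_test`, and so on) are made up for illustration, and the "decoded" signal in the demo is synthetic (a delayed copy plus noise about 60 dB down), not a real MP3 decode.

```python
import numpy as np

def db_to_gain(db):
    """Convert a level change in dB to a linear amplitude factor (-60 dB -> 0.001)."""
    return 10.0 ** (db / 20.0)

def rms_db(x):
    """RMS level in dB relative to an amplitude of 1.0 (floored to avoid log of zero)."""
    rms = float(np.sqrt(np.mean(np.square(x))))
    return 20.0 * np.log10(max(rms, 1e-12))

def find_delay(original, decoded):
    """Estimate by how many samples `decoded` lags `original` (FFT cross-correlation)."""
    n = len(original) + len(decoded) - 1
    nfft = 1 << (n - 1).bit_length()  # next power of two, so the correlation does not wrap
    spec = np.fft.rfft(decoded, nfft) * np.conj(np.fft.rfft(original, nfft))
    xcorr = np.fft.irfft(spec, nfft)
    lag = int(np.argmax(xcorr))
    return lag - nfft if lag > nfft // 2 else lag  # upper half holds negative lags

def null_test(original, decoded):
    """Time-align, subtract, and report the residual level relative to the original."""
    delay = find_delay(original, decoded)
    if delay >= 0:
        decoded = decoded[delay:]
    else:
        original = original[-delay:]
    n = min(len(original), len(decoded))
    residual = original[:n] - decoded[:n]
    return residual, rms_db(residual) - rms_db(original[:n]), delay

# Synthetic demo: the "decoded" track is the original delayed by 37 samples,
# plus noise scaled 60 dB below the signal. Real codec residue is not white noise.
rng = np.random.default_rng(0)
original = 0.1 * rng.standard_normal(8000)
noise = db_to_gain(-60.0) * 0.1 * rng.standard_normal(8037)
decoded = np.concatenate([np.zeros(37), original]) + noise

residual, relative_db, delay = null_test(original, decoded)
print(f"delay: {delay} samples, residual: {relative_db:.1f} dB relative to the original")
```

Skipping the alignment step (or getting it wrong by even one sample) makes the residual look far louder than it is, which is the same warning as the zoom-in advice above.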
 
and as a side note, on almost all the DAPs I've bought, I ended up noticing some background hiss with sensitive IEMs. that's because when I have finished setting my preferred loudness (rather quiet), the hiss is above -60dB compared to the music with those IEMs. what I'm saying is that with some of my favorite IEMs and most DAPs, I'm getting a noise floor that is above the limits of encoding of a 320kbps mp3. yet when I talked about the little "tac" noise on the F886, almost everybody using the DAP told me they had no noise, even some with my IEMs. in fact, to this day I've had 3 people agreeing with me on the matter. so how come when we ask people if they hear a difference between mp3 and lossless, a great many say yes (most of whom never actually tried an ABX, but that's another matter)?
of course many just don't use IEMs sensitive enough and indeed end up with the noise 10 or 20dB lower. but on Head-Fi a great many people have high end IEMs and CIEMs that are of very low impedance and above 110dB sensitivity @1mW, so within the specs of my IEMs.
when I got the DX50 I sold it back less than one hour after starting to listen to it. the background hiss was very obvious, the bass roll-off on my low impedance IEM too. still, it took some time before other guys agreed about the hiss in the dedicated topic.
 
 
that's the irony of audio, getting entangled into stuff we usually don't hear because we know for a fact it's different(jitter, distortions, mp3...), but not noticing the stuff that is actually very audible because nobody told us it was there.
 
Jul 31, 2015 at 7:32 AM Post #6 of 33
It's obvious that you entirely missed the point.
 
That's why I think this topic doesn't belong to the Sound Science forum.
 
Jul 31, 2015 at 7:48 AM Post #7 of 33
  It's obvious that you entirely missed the point.
 
That's why I think this topic doesn't belong to the Sound Science forum.


how so? I'm saying that the choices to remove or leave some sounds happen below about -60dB for 320kbps mp3 (I don't know if there really is a fixed threshold value?), so even if they weren't psychoacoustically profiled at all, they would still not be very significant sounds. so does it matter at all what song was used as a reference to create the algo? it might, if you're confident you can notice changes down there (I'm not). the signal at -40dB doesn't care about all that, it's the same as in the lossless track.
 
Jul 31, 2015 at 8:59 AM Post #8 of 33
  I found an interesting article with samples of what is removed by MP3 compression of a song.
 
http://theghostinthemp3.com/theghostinthemp3.html
 
 
What are your thoughts?

 
What are your thoughts?
 
Aug 2, 2015 at 3:35 AM Post #9 of 33
 
  I found an interesting article with samples of what is removed by MP3 compression of a song.
 
http://theghostinthemp3.com/theghostinthemp3.html
 
 
What are your thoughts?

 
What are your thoughts?


I think it is fascinating to hear the actual audio that is lost from MP3 compression and it would be interesting to listen to the ghosts of other types of music. I remember someone posted a 128k LAME mp3 vs. lossless file comparison using a good classical recording and on the equipment I had at the time I could just make out which was which. 
 
The "ghosts" would probably be most useful to codec designers as a means of feedback on the effects of their work. 
 
Aug 2, 2015 at 4:47 AM Post #10 of 33
  I found an interesting article with samples of what is removed by MP3 compression of a song.
 
http://theghostinthemp3.com/theghostinthemp3.html
 
 
What are your thoughts?

 
My thoughts are that we have yet another example of a pseudo scientific publicity grab. The true facts are distorted and twisted to play on people's ignorance and fears.
 
This article was written by Ryan Maguire, a Ph.D. student in Composition and Computer Technologies at the University of Virginia Center for Computer Music, who appears to be on a mission of sorts. He's not the first PhD candidate to get things wrong and thus embarrass himself in public, and probably won't be the last.
 
For example  Ryan Maguire writes: "As previously stated, the MP3 codec was refined using listening tests designed by European audio engineers and featuring the music they chose."
 
The real topic is perceptual coding which is a means of data reduction based on removing information that is normally masked by the human ear. This is a very broad topic with many highly varied operational examples in use under different names, patent suites, and technology.
 
(1) MP3 is not the only form of perceptual coding, so the article initially fails by trying to make a bad example out of something that is actually far, far more general, and of which there are very, very many different examples. For example, in addition to MP3 there is AAC, the various forms of Windows Media, and a long, large list of others. They are all based on the same basic technology with, at this point, about 30 years of refinements. The refinement process is ongoing, and new MP3-like encoders that are incompatible at the file level, such as Opus, have been developed just lately. The goal has always been higher fidelity in smaller audio files.
 
(2) MP3 was developed as a joint venture between both US and European engineers, so the claim that only European engineers developed it is false. It's probably an attempt at exploiting Xenophobia. The Wikipedia MP3 article linked below has the correct facts.
 
(3) The development of MP3 did not stop, and both MP3 encoders and many, many other related and sequel technologies are still being developed and optimized, so it is a lie to say that only certain pieces of music were used to develop this technology. The playlist that the developers back in the 1990s used was partially mentioned, but many, many other songs that have been recorded and released right up to this day have also been used. The initial suite of songs was carefully chosen to make the current technology fail, and included 20 or 30 other songs. Eventually a suite of music called SQAM was developed and released to the public.
 
This appears to be an argument by means of falsely claiming that the technology and procedures that have been used to develop MP3 were far more exclusive and narrow than they actually were. MP3 was not the first perceptual coder, it is not the only perceptual coder, and it won't be the last perceptual coder.  
 
Actually, MP3 is not a coder design at all, it is a decoder design. The basic rule is that any encoder that encodes music so that it is successfully decoded by the standard MP3 decoder is OK. Notably this includes an encoder called LAME, which is so successful at avoiding the basic patented technology in the history of MP3 encoders that it is open source freeware.
 
Get the straight story here: 
 
https://en.wikipedia.org/wiki/MP3
 
http://createdigitalmusic.com/2010/05/the-myth-of-falling-fidelity-and-audio-history-unburdened-by-fact/
 
Aug 2, 2015 at 8:43 AM Post #11 of 33
  My thoughts are that we have yet another example of a pseudo scientific publicity grab. The true facts are distorted and twisted to play on people's ignorance and fears.
 

 
  who appears to be on a mission of sorts. [...] He's not the first PhD candidate to get things wrong and thus embarrass himself in public

 
  It's probably an attempt at exploiting Xenophobia.

 
Are your first thoughts character assassination?
 
Seriously, I am interested in some discussion of the science. The above could be done without and I don't think has any merit here.
 
Aside from the factual errors you mention, I think that his demonstration of the effects on pink, white and brown noise aren't representative of what the mp3 codec was intended for, so it is not a surprise that it doesn't perform well on them.
 
Aug 2, 2015 at 9:15 AM Post #12 of 33
 
I think it is fascinating to hear the actual audio that is lost from MP3 compression and it would be interesting to listen to the ghosts of other types of music. I remember someone posted a 128k LAME mp3 vs. lossless file comparison using a good classical recording and on the equipment I had at the time I could just make out which was which. 
 
The "ghosts" would probably be most useful to codec designers as a means of feedback on the effects of their work. 

 
I can't imagine they don't already listen to difference files, as even lowly nerds like myself do this. I'm sure foobar has a plugin for it, but it's easy enough on the command line. A good thing to try is to put the original and the difference file into an ABX, which lets you quick switch between the uncompressed material and the stuff that was taken out, letting you hear relative loudness as well.
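As one way to make such a difference file without a dedicated plugin, here is a rough Python sketch using the standard `wave` module and NumPy. It assumes both inputs are 16-bit PCM WAV files with the same sample rate and channel count, and that the MP3 was already decoded to WAV and time-aligned with the original. The helper names and the file names in the usage comment are hypothetical.

```python
import wave
import numpy as np

def read_wav16(path):
    """Read a 16-bit PCM WAV; return interleaved samples as int32 plus (channels, rate)."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        params = (w.getnchannels(), w.getframerate())
        data = np.frombuffer(w.readframes(w.getnframes()), dtype="<i2")
    return data.astype(np.int32), params  # int32 so the subtraction cannot overflow

def write_wav16(path, samples, channels, rate):
    """Write interleaved samples as a 16-bit PCM WAV."""
    with wave.open(path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(np.asarray(samples).astype("<i2").tobytes())

def write_difference(original_path, decoded_path, out_path):
    """Write original minus decoded to out_path and return the difference samples."""
    a, params_a = read_wav16(original_path)
    b, params_b = read_wav16(decoded_path)
    if params_a != params_b:
        raise ValueError("channel count or sample rate differ")
    n = min(len(a), len(b))  # decoders often pad, so trim to the shorter file
    diff = np.clip(a[:n] - b[:n], -32768, 32767)
    write_wav16(out_path, diff, params_a[0], params_a[1])
    return diff

# Hypothetical usage, with file names invented for illustration:
# write_difference("original.wav", "decoded_from_mp3.wav", "difference.wav")
```

The difference file is written at its true level with no normalization, so loading it next to the original in an ABX tool keeps the relative loudness intact.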
 
Aug 2, 2015 at 9:21 AM Post #13 of 33
   
Are your first thoughts character assassination?
 
Seriously, I am interested in some discussion of the science. The above could be done without and I don't think has any merit here.
 
Aside from the factual errors you mention, I think that his demonstration of the effects on pink, white and brown noise aren't representative of what the mp3 codec was intended for, so it is not a surprise that it doesn't perform well on them.

 
Do you not agree that any reasonable discussion of the science must start with the true facts of the situation?
 
I documented the errors in the original cited document and provided references to accurate accounts of the same evidence. What more do I have to do to show good faith?
 
I happen to be more aware of the true facts of the situation than many because I have been long acquainted with some of the actual people who did the actual hands-on work.  I was especially concerned by the claim that MP3 was a European invention ca. 1993, and that the development of MP3 technology was so heavily weighted by the selection of such a limited number of musical works.  
 
In fact the original 1993 work wasn't all that good sounding, and MP3 SQ improved steadily until about 2005 and has only slowed a bit since then. These improvements were based on careful investigation and development by many people using many musical and dramatic works as the basis of their work.   MP3 and the other perceptual coders were literally listened into existence by people, many of whom love good sound.
 
I agree that using pseudorandom noise is a very backwards way to evaluate perceptual coders. The actual general behavior of most coders with  pseudorandom noise is to transform the noise into somewhat different forms of pseudorandom noise. Perceptual coders were designed to do their best work with music and drama which is what they are commonly and most advantageously used with. That's how they should be evaluated.  The goal of testing things is not to break them in irrelevant ways, but to discover useful things they do well and perhaps not-so-well. 
 
Perceptual coding is very common in modern media development and distribution. It's hard to find a broadcast, a/v stream, or a/v file that has not been itself perceptually coded or that at least some of its component parts were previously perceptually coded. For example you may find a .wav file or AVI of a podcast, musical performance, or dramatic work that contains uncompressed audio, but under the covers some or much of its contents were based on perceptually coded files.
 
Aug 2, 2015 at 9:25 AM Post #14 of 33
   
I can't imagine they don't already listen to difference files, as even lowly nerds like myself do this. I'm sure foobar has a plugin for it, but it's easy enough on the command line. A good thing to try is to put the original and the difference file into an ABX, which lets you quick switch between the uncompressed material and the stuff that was taken out, letting you hear relative loudness as well.

 
Difference files are an irrelevant way to judge perceptual coding.
 
Coder developers do most of their work based on listening comparisons of encoded files with the original files they were made from.
 
One of the problems with difference files is that they combine all errors into one receptacle.
 
Diagnosis is typically based on sorting errors by type, and then studying each type of error to find the common causes.
 
Aug 2, 2015 at 9:31 AM Post #15 of 33
I didn't say they judged based on them; more that cats have curiosity.
 
