ABX testing consensus on the question of audibility
May 31, 2015 at 12:37 AM Thread Starter Post #1 of 57

safulop

I am a speech scientist and member of the Acoustical Society of America. I have never put much stock in ABX tests to tell anything but the most egregiously audible differences in the sound.  Is there a scientific publication which proves that if you can't hear a change in an ABX test, it is not audible?  I don't know of one.  There are other possibilities for testing audibility of changes in the sound, such as sound morphing during playback.  But even these are limited because they fail to account for the evolving nature of every sound.
 
May 31, 2015 at 1:28 AM Post #2 of 57
naturally negative ABX results are "weak" - don't say much except that given the conditions of the test no statistical significance was found - usually summarized as "failed to reject the null hypothesis" - but you knew that didn't you?
 
and still chose a confrontational approach - care to contribute here constructively?
 
Quote:
  I am a speech scientist and member of the Acoustical Society of America. I have never put much stock in ABX tests to tell anything but the most egregiously audible differences in the sound.  Is there a scientific publication which proves that if you can't hear a change in an ABX test, it is not audible?  I don't know of one.  There are other possibilities for testing audibility of changes in the sound, such as sound morphing during playback.  But even these are limited because they fail to account for the evolving nature of every sound
 
some of us do have decades of serious amateur "audio"/music reproduction appreciation experience including reading, observing, attending talks, demos, participating in various tests - and some things thought to be "subtle" - not audiophile's "night and day" differences can be demonstrated with positive discrimination in controlled, level matched, blind ABX tests
 
we naturally think "night and day" characterizations by audiophiles fall into the "egregious audible difference" category - but you want to say absolutely nothing can be inferred from the failure of well executed, "fair" ABX tests of these claims
 
care to deconstruct/detail that position for us?
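
To put a number on the "failed to reject the null hypothesis" point above, here is a minimal sketch of the usual one-sided binomial calculation for an ABX score. The function name `abx_p_value` and the 16-trial session length are illustrative assumptions, not something specified in this thread.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Probability of scoring at least `correct` out of `trials` by pure guessing.

    Under the null hypothesis the listener hears no difference, so each
    trial is a coin flip with a 0.5 chance of being right.
    """
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Sum the upper tail of the Binomial(trials, 0.5) distribution.
    tail = sum(comb(trials, k) for k in range(correct, trials + 1))
    return tail / 2 ** trials


if __name__ == "__main__":
    # Hypothetical scores from a 16-trial session.
    for score in (9, 12, 14):
        print(f"{score}/16 correct: p = {abx_p_value(score, 16):.4f}")
```

A large p-value only says that this listener, in this session, did not beat guessing. It does not by itself show that the difference is inaudible.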

 
May 31, 2015 at 1:57 AM Post #3 of 57
  I am a speech scientist and member of the Acoustical Society of America. I have never put much stock in ABX tests to tell anything but the most egregiously audible differences in the sound.  Is there a scientific publication which proves that if you can't hear a change in an ABX test, it is not audible?  I don't know of one.  There are other possibilities for testing audibility of changes in the sound, such as sound morphing during playback.  But even these are limited because they fail to account for the evolving nature of every sound.

 
 
I really hate to give you a hard time over this, but you are asking a very unscientific question because it is phrased so negatively. If you read the relevant scientific papers and study Experimental Design, nobody who gets science  tries to prove that something can't be heard. That is a negative hypothesis, and they are very difficult or impossible to prove or even develop evidence for. 
 
A more relevant question might be whether one can confirm the thresholds of hearing for various effects that are given in various time-honored scientific papers using ABX of the kind we are talking about here, and the answer is a resounding yes. Usually ABX develops reliable evidence for significantly lower thresholds for the reason given below.
 
The other thing you may not know is that there are two ABX tests. One is the test often found in JASA papers, going back to the 1950 JASA paper by W. A. Munson and Mark B. Gardner titled "Standardizing Auditory Tests". The test described in that paper is not the same as the ABX test we are talking about, which was developed independently for a different purpose and differs by allowing the listener a lot more freedom to train himself to hear what we are testing for. That is the 1977 test that David Clark describes in his Audio Engineering Society Journal paper, "High-Resolution Subjective Testing Using a Double-Blind Comparator". I think that both tests have their merits, but it is a matter of the right tool for the right job. I'm aware that many JASA papers have been written criticizing the 1950 ABX test.
 
The ABX test we are talking about certainly accounts for the evolving nature of sound subject to the limitation that you can't test for something that has not evolved yet. :wink: 
 
May 31, 2015 at 2:06 AM Post #4 of 57
I've seen in the audiophile press a kind of "religion" devoted to ABX tests.  But this is not necessarily the scientific consensus on the question of audibility. I mean, there is a definite consensus about evolution, climate change, and a few other things.  But there is not a consensus position which says that "if you can't prove it with an ABX test no one can hear it."
So I'm thinking about exploring other methodologies like sound morphing, and comparing the results with ABX tests.  An ideal study would find a small audio change (e.g. from 256 kbps MP3 to lossless) and get some test results with, say, 20 subjects, in which they could generally hear a point in a sound file where you switch it by morphing, but yet cannot get statistically significant detection in an ABX test.  This would help to establish the sensitivity limits of ABX testing, with effects that are still audible.
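
One way to frame "sensitivity limits" is statistical power: how often a listener who really does hear something, but only faintly, would reach significance in an ABX session of a given length. The sketch below is a rough illustration only. The 0.6 per-trial hit rate, the trial counts, and the 0.05 threshold are assumptions chosen for the example, and the helper names are hypothetical.

```python
from math import comb


def critical_score(trials: int, alpha: float = 0.05) -> int:
    """Smallest number correct whose guessing p-value is at most alpha."""
    tail = 0
    for k in range(trials, -1, -1):
        tail += comb(trials, k)
        if tail / 2 ** trials > alpha:
            # Requiring only k correct would be too lenient, so k + 1 is needed.
            return k + 1
    return 0


def abx_power(p_correct: float, trials: int, alpha: float = 0.05) -> float:
    """Chance that a listener who is right with probability p_correct
    on each trial reaches the critical score."""
    need = critical_score(trials, alpha)
    return sum(
        comb(trials, k) * p_correct ** k * (1 - p_correct) ** (trials - k)
        for k in range(need, trials + 1)
    )


if __name__ == "__main__":
    # Assumed listener who picks correctly 60% of the time.
    for n in (16, 50, 100):
        print(f"{n} trials: need {critical_score(n)} correct, "
              f"power = {abx_power(0.6, n):.2f}")
```

Under these assumptions a short session will usually miss a weak but real effect, which is a limit of the trial count rather than of the ABX procedure itself.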
 
May 31, 2015 at 2:28 AM Post #5 of 57
  I've seen in the audiophile press a kind of "religion" devoted to ABX tests.  But this is not necessarily the scientific consensus on the question of audibility. I mean, there is a definite consensus about evolution, climate change, and a few other things.  But there is not a consensus position which says that "if you can't prove it with an ABX test no one can hear it."
So I'm thinking about exploring other methodologies like sound morphing, and comparing the results with ABX tests.  An ideal study would find a small audio change (e.g. from 256 kbps MP3 to lossless) and get some test results with, say, 20 subjects, in which they could generally hear a point in a sound file where you switch it by morphing, but yet cannot get statistically significant detection in an ABX test.  This would help to establish the sensitivity limits of ABX testing, with effects that are still audible.

 
People who develop MP3 encoders often use ABX tests to prove the value of their innovations. They also use a blind testing scheme that was developed from ABX called ABC/hr. And they also have another blind testing scheme called MUSHRA.
 
Common wisdom among encoder developers is that ABX has the greatest sensitivity to any kind of difference, small and large. It is the Gold Standard for finding small differences.
 
ABC/hr is more complex but it is good for developing information about the rankings of the accuracy of encoders for the purpose of high accuracy encoding (e.g. high bitrate MP3 and AAC)
 
MUSHRA is designed for developing and ranking coders for which high sonic accuracy is knowingly being sacrificed for small dataset size.
 
Here's a challenge for you. This is a link for a set of files for determining the audibility of small interchannel time delays: http://www.hydrogenaud.io/forums/index.php?showtopic=107570&view=findpost&p=899713
 
If you have not done so, download Foobar2000 and the ABX plug in for it. Use the files as directed by the page at the link. Compare your results to the scientific literature for that kind of effect.  
 
This test is designed to be very sensitive but also very insensitive to the quality of your monitoring system. Headphones are recommended for the best results but I suspect that fairly crappy headphones will be as good as anything.
 
May 31, 2015 at 2:41 AM Post #6 of 57
Thanks for the detailed information; this way if I want to pursue the research I know what I am up against.
I am aware that ABX testing is the "gold standard" for finding small differences; nevertheless it is so only because no one has yet established any other procedure that is more sensitive.  It is not pre-ordained to remain the gold standard forever, although some amateur scientists like to elevate it to this level.  An "objectivist" audiophile once calmly explained to me that science has now uncovered everything there is to be known about what can be heard.  I am always skeptical of such finality in science.
 
May 31, 2015 at 2:50 AM Post #7 of 57
  Thanks for the detailed information; this way if I want to pursue the research I know what I am up against.
I am aware that ABX testing is the "gold standard" for finding small differences; nevertheless it is so only because no one has yet established any other procedure that is more sensitive.  It is not pre-ordained to remain the gold standard forever, although some amateur scientists like to elevate it to this level.  An "objectivist" audiophile once calmly explained to me that science has now uncovered everything there is to be known about what can be heard.  I am always skeptical of such finality in science.

 
I agree with you. The only thing good about ABX is that nothing better for small differences seems to be before us.
 
ABC/hr was developed in the 90s if memory serves but its purpose is different.
 
MUSHRA is different but again so is the purpose.
 
I favor using the right tool for the job, and depending on the job (e.g. managing technical changes that necessarily involve audible differences AKA mixing recordings and live sound) I even favor and use sighted evaluations. :wink:
 
I resent turning ABX into a religion for obvious reasons. I don't know if you know this but ABX of the 1977 variety is my baby - I built the first ABX Comparator and did the first ABX test. I'll still throw it out the window when I find something better!
 
May 31, 2015 at 8:22 AM Post #10 of 57
Arny / @safulop
 
Looking purely at the audibility of differences between audio codecs / containers (eg lossless vs high bit-rate lossy) - assuming same master, properly re-encoded from lossless master to lossy copy, what would both of you consider to be the gold standard for comparison?  This really interests me - as when I first came here, I was one of the many who considered that "of course I could tell the difference".  When I performed my first volume matched ABX tests - confident I could ace it - I suddenly discovered that I was merely human. :wink: The discovery was actually surprisingly liberating. My own personal transparency level seems to be about aac200, so I use aac256 for my portable audio.
 
If there is a better method than blind volume matched ABX, I'd be really interested.
 
BTW - I'm Paul.  Feel free to use first name basis if you're comfortable with that.
 
May 31, 2015 at 1:29 PM Post #11 of 57
Can anyone speak to potential hardware/software issues due to quick-switching in ABX tests? I've generally had to use this when testing the limits of my own hearing, for instance when seeing just how far below 16/44.1 I can push things before I hear a difference in a given track. But part of me feels like it's cheating, and certainly I worry about the more philosophical matter of whether a quick-switch difference is really what we mean by an "audible" difference.
 
May 31, 2015 at 2:37 PM Post #12 of 57
Well I don't know of a better method, but I know that ABX is expected to miss a certain amount of audible difference because the two different sounds are not "side by side" in a literal sense.  You have to present first one and then another, so it becomes like a test of memory.  Imagine if we did the same thing with swatches of color.  I bet there are numerous pairs of color swatches from the paint store that you "couldn't tell apart" if I showed them to you in the manner of an ABX test - first one and then the other.  But you can sure tell the difference when you see them both, and they do have to be butted up against each other as well.  Even a few inches of separation and you can lose the ability to distinguish them even while seeing them both at the same time.
 
I'm thinking about a method for sound comparison that has more of this side-by-side juxtaposition.  I'm thinking about trying the change from one sound to the other in the middle of the sound, or maybe changing  back and forth several times, but in a way that doesn't cause hard cuts or clicking noises obviously.  A colleague of mine has developed a technique for "sound morphing" by which one sound can be transformed into another smoothly, so that might be applicable to this problem.  He once changed a cello note smoothly into a cat's meow, that was interesting.
 
For the moment I am at the brainstorming stage with this.  Thanks for discussing it with me.
 
-Sean
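
As a rough illustration of switching from one sound to the other mid-playback without a hard cut, here is a minimal sketch of a plain linear crossfade between two time-aligned versions of the same material. This is not the colleague's sound-morphing technique, whose details are not given here; it is a much simpler stand-in. The function name and parameters are hypothetical, and the signals are assumed to be equal-length lists of float samples.

```python
def crossfade_switch(a, b, switch_at, fade_len):
    """Play `a`, then fade linearly into `b`.

    The fade starts at sample index `switch_at` and lasts `fade_len`
    samples. Both inputs must be time-aligned and the same length.
    """
    if len(a) != len(b):
        raise ValueError("signals must have the same length")
    if fade_len <= 0:
        raise ValueError("fade_len must be positive")
    out = []
    for i, (x, y) in enumerate(zip(a, b)):
        # Weight of `b`: 0 before the switch, rising to 1 across the fade.
        w = min(1.0, max(0.0, (i - switch_at) / fade_len))
        out.append((1.0 - w) * x + w * y)
    return out


if __name__ == "__main__":
    # Toy signals: constant 1.0 fading into constant 0.0.
    print(crossfade_switch([1.0] * 10, [0.0] * 10, switch_at=2, fade_len=4))
```

A linear fade suits two nearly identical signals, such as a lossless file and its decoded lossy copy. Very different sounds would need something closer to real morphing.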
 
May 31, 2015 at 2:42 PM Post #13 of 57
Your post makes me wonder if some kind of test that puts different signals in the L and R channels might have some kind of utility; just shooting in the dark, of course.
 
May 31, 2015 at 3:14 PM Post #14 of 57
  Your post makes me wonder if some kind of test that puts different signals in the L and R channels might have some kind of utility; just shooting in the dark, of course.

 
A stereo file made of 2 different mono files alternating L/R every 5 seconds? or 2 stereo files made of 2 different mono files, one L-R and the other R-L?
could be a nice experiment....
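
The first variant could be sketched along these lines, assuming the two mono versions are already time-aligned, equal-length sample lists. The function name and the `period_s` parameter are made up for illustration, and writing the result out as a WAV file is left aside.

```python
def alternate_channels(mono_a, mono_b, sample_rate, period_s=5.0):
    """Build (left, right) frames in which A and B swap sides every period_s seconds.

    Starts with A on the left and B on the right.
    """
    if len(mono_a) != len(mono_b):
        raise ValueError("both mono signals must have the same length")
    block = int(round(sample_rate * period_s))
    if block <= 0:
        raise ValueError("period must cover at least one sample")
    frames = []
    for i, (a, b) in enumerate(zip(mono_a, mono_b)):
        # Odd-numbered blocks have the channels swapped.
        swapped = (i // block) % 2 == 1
        frames.append((b, a) if swapped else (a, b))
    return frames


if __name__ == "__main__":
    # Tiny example: 1 sample per second, swapping every 2 seconds.
    print(alternate_channels([1, 1, 1, 1], [2, 2, 2, 2], sample_rate=1, period_s=2))
```

Note that this swaps abruptly at each boundary, so with real audio a short crossfade at the swap points would probably be needed to avoid clicks.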
 
May 31, 2015 at 3:17 PM Post #15 of 57
   
A stereo file made of 2 different mono files alternating L/R every 5 seconds? or 2 stereo files made of 2 different mono files, one L-R and the other R-L?
could be a nice experiment....

 
Something like that, if it didn't drive you absolutely bonkers.
 
