The flaws in blind testing
Nov 1, 2010 at 3:44 PM Thread Starter Post #1 of 44

Prog Rock Man

Starting with this article by Robert Harley, first published in 2008 in The Absolute Sound
 
http://www.avguide.com/forums/blind-listening-tests-are-flawed-editorial?page=1
 
The main part being -
 
"The Blind (Mis-) Leading the Blind
 
Every few years, the results of some blind listening test are announced that purportedly “prove” an absurd conclusion. These tests, ironically, say more about the flaws inherent in blind listening tests than about the phenomena in question.
The latest in this long history is a double-blind test that, the authors conclude, demonstrates that 44.1kHz/16-bit digital audio is indistinguishable from high-resolution digital. Note the word “indistinguishable.” The authors aren’t saying that high-res digital might sound a little different from Red Book CD but is no better. Or that high-res digital is only slightly better and not worth the additional cost. Rather, they reached the rather startling conclusion that CD-quality audio sounds exactly the same as 96kHz/24-bit PCM and DSD, the encoding scheme used in SACD. That is, under double-blind test conditions, 60 expert listeners over 554 trials couldn’t hear any differences between CD, SACD, and 96/24. The study was published in the September, 2007 Journal of the Audio Engineering Society.
 
I contend that such tests are an indictment of blind listening tests in general because of the patently absurd conclusions to which they lead. A notable example is the blind listening test conducted by Stereo Review that concluded that a pair of Mark Levinson monoblocks, an output-transformerless tubed amplifier, and a $220 Pioneer receiver were all sonically identical. (“Do All Amplifiers Sound the Same?” published in the January, 1987 issue.)
Most such tests, including this new CD vs. high-res comparison, are performed not by disinterested experimenters on a quest for the truth but by partisan hacks on a mission to discredit audiophiles. But blind listening tests lead to the wrong conclusions even when the experimenters’ motives are pure. A good example is the listening tests conducted by Swedish Radio (analogous to the BBC) to decide whether one of the low-bit-rate codecs under consideration by the European Broadcast Union was good enough to replace FM broadcasting in Europe.
Swedish Radio developed an elaborate listening methodology called “double-blind, triple-stimulus, hidden-reference.” A “subject” (listener) would hear three “objects” (musical presentations); presentation A was always the unprocessed signal, with the listener required to identify if presentation B or C had been processed through the codec.
 
The test involved 60 “expert” listeners spanning 20,000 evaluations over a period of two years. Swedish Radio announced in 1991 that it had narrowed the field to two codecs, and that “both codecs have now reached a level of performance where they fulfill the EBU requirements for a distribution codec.” In other words, Swedish Radio said the codec was good enough to replace analog FM broadcasts in Europe. This decision was based on data gathered during the 20,000 “double-blind, triple-stimulus, hidden-reference” listening trials. (The listening-test methodology and statistical analysis are documented in detail in “Subjective Assessments on Low Bit-Rate Audio Codecs,” by C. Grewin and T. Rydén, published in the proceedings of the 10th International Audio Engineering Society Conference, “Images of Audio.”)
 
After announcing its decision, Swedish Radio sent a tape of music processed by the selected codec to the late Bart Locanthi, an acknowledged expert in digital audio and chairman of an ad hoc committee formed to independently evaluate low-bit rate codecs. Using the same non-blind observational-listening techniques that audiophiles routinely use to evaluate sound quality, Locanthi instantly identified an artifact of the codec. After Locanthi informed Swedish Radio of the artifact (an idle tone at 1.5kHz), listeners at Swedish Radio also instantly heard the distortion. (Locanthi’s account of the episode is documented in an audio recording played at workshop on low-bit-rate codecs at the 91st AES convention.)
How is it possible that a single listener, using non-blind observational listening techniques, was able to discover—in less than ten minutes—a distortion that escaped the scrutiny of 60 expert listeners, 20,000 trials conducted over a two-year period, and elaborate “double-blind, triple-stimulus, hidden-reference” methodology, and sophisticated statistical analysis?
 
The answer is that blind listening tests fundamentally distort the listening process and are worthless in determining the audibility of a certain phenomenon.
As exemplified by yet another reader letter published in this issue, many people naively assume that blind listening tests are somehow more rigorous and honest than the “single-presentation” observational listening protocols practiced in product reviewing. There’s a common misperception that the undeniable value of blind studies of new drugs, for example, automatically confers utility on blind listening tests.
I’ve thought quite a bit about this subject, and written what I hope is a fairly reasoned and in-depth analysis of why blind listening tests are flawed. This analysis is part of a larger statement on critical listening and the conflict between audio “subjectivists” and “objectivists,” which I presented in a paper to the Audio Engineering Society entitled “The Role of Critical Listening in Evaluating Audio Equipment Quality.” You can read the entire paper here http://www.avguide.com/news/2008/05/28/the-role-of-critical-listening-in-evaluating-audio-equipment-quality/. I invite readers to comment on the paper, and discuss blind listening tests, on a special new Forum on AVguide.com. The Forum, called “Evaluation, Testing, Measurement, and Perception,” will explore how to evaluate products, how to report on that evaluation, and link that evaluation to real experience/value. I look forward to hearing your opinions and ideas.
 
Robert Harley"
 
 
Can we trust blind tests when they failed so badly to spot something that was then found very quickly with a sighted test?
 
Nov 1, 2010 at 4:29 PM Post #2 of 44
No, the testing done by Swedish Radio tells me something about the brain's plasticity (rewiring) and about the hold that "science" has on people who picture an authority figure in a lab coat telling them about the world (of a limited scientific field).
 
Plasticity refers to the brain's ability to "rewire" itself based on changes in its functioning. When an individual learns something new, there are changes made in the wiring (the synapses) of the brain.

The subconscious part of the auditory nervous system works like a computer that sorts, organizes, and routes the different signals to provide the cortex with the information needed for survival and well being of the whole body. As part of this process, signals are filtered out that contain information that is irrelevant for survival and well being. Those signals, such as signals for the sound from an air conditioner, are blocked from reaching a person's consciousness. This blocking/filtering function enables a person to concentrate on what is most important for efficient functioning of the body.
 
Is this absurd rambling even of relevance to a controlled environment of DBT?

Consider living in a nest or animal burrow! In the relative silence of houses with double-glazed windows, often hermetically sealed from the outside world, the absence of sound stimulation leads to an increase in auditory gain (amplification) in the subconscious auditory pathways. The brain is always searching as best it can for auditory signals. This process is enhanced by silence, which is considered to be one of the signs of possible predator activity. The auditory filters 'open' in an attempt to monitor the external sound environment. External sounds may then increase dramatically in their perceived intensity and intrusiveness.

During the past decade, brain imaging has provided important insights into the enormous capacity of the human brain to adapt to complex demands. Brain plasticity is best observed in complex tasks with high behavioral relevance for the individual, i.e. that cause strong emotional and motivational activation. Plastic changes are more pronounced in situations where the task or activity has been developed early in life and whose performance is intense. Obviously, the continued activities of accomplished musicians provide the ideal prerequisites for brain plasticity, and it is not astonishing that the most dramatic brain plasticity effects have been demonstrated in professional musicians.

Music elicits strong emotional (and intellectual) responses. For humans, such responses are as essential to high-quality performance as to the reception of music and they are accompanied by strong activations of the limbic system – a network of brain centers at the inner border of the cortex – which is involved in reward, emotion and motivation. Much more research is required to show whether and how it is activity in areas of this network that mediates the strong and dynamic neuroplastic adaptations seen in performing musicians.

The human nervous system processes external source data at very high rates, perhaps as much as several billions of bits per second (several Gbit/sec). A majority of such processing occurs at a subconscious level, but may nevertheless guide the body in its maintenance and control functions. The associated neurological data processing “subsystems” typically call for the attention of the human conscious “executive system” only as needed, such as when something is awry or when the conscious executive has requested a special report. One may appreciate the advantage of such sub-conscious processing in light of the observation that the highest rates of conscious information processing have been estimated to be less than 100 bits per second, even as low as 25 bits per second.

Kyoto, 16 Jan 1995: Ha, ha, all the cats and dogs have suddenly vanished from this beautiful city...

Clearly lower animals (at least birds and mammals) show an ability to recognize 'things' in their environments. They show a capacity to recognize process and dynamic relationships. These abilities suggest that there is something very basic about systems representation and model building. Here is what we know about the brain that provides clues as to how it builds dynamic models of systems composed of subsystems and themselves composing meta-systems.

I haven't said anything as yet about the fact that the two hemispheres of the brain are lateralized or functionally dual. This issue is terribly overplayed in popular psychology (left-brain/right-brain people!) but there are some obvious differences in functions performed on either side by mature brains. One of the more interesting findings is that the left hemisphere (or at least the frontal lobes and parts of both the parietal and temporal lobes) is the site of enduring patterns of processing. Most often noted is that ordinarily Wernicke's area and Broca's area work their speech processing magic in the left hemisphere. Other evidence suggests that other routine processing modules are instantiated in the left hemisphere cortex.
 
This leads to questions about the popularly viewed 'heart' side of the brain — the right hemisphere. Goldberg has developed a very interesting model that suggests that the right cortex is largely involved in processing novelty or newly developed circuits — new models. It could be that the left hemisphere, in particular of the prefrontal and pre-motor areas of the frontal lobe, has the machinery in place to guide the construction of a model to be built in the right hemisphere where it can be 'tried out'. The model could also arise by copying circuit relations from an existing model (from the left hemisphere) into the right hemisphere and then guiding changes. This would be essentially what we mean by analogic thinking.

Once a model is constructed and 'tested', perhaps validated by experience, it might be copied back into the left hemisphere for future use in routine thinking or as the basis for a new analogy.

This scheme requires a tremendous degree of plasticity in the wiring between neurons and cortical columns in the right hemisphere. If this is the case, one test of the hypothesis would be to look for dynamic and possibly amorphous (that is, dense, but weakly activating) connectivity patterns in the right hemisphere. Indeed a great deal of work on working memory involving novel task learning implicates the right hemisphere frontal lobe. Baars has developed a theory of working memory that is accessible to all relevant regions of the brain as a 'Global Workspace', though the idea here is related to consciousness and would not apply to subconscious processes of model building, strictly speaking.

Nevertheless, a general vision of the right hemisphere acting as a giant white board where images can be temporarily written and adjusted is appealing. The left hemisphere, frontal lobe acting as a controller, initiating the writing, guiding the adjusting, initiating testing, and finally encoding a permanent image of a dynamic model for later automatic use provides for a compelling model of how the brain can think new thoughts.

Strategic control (thinking) requires that we build models of how the external environment works. This includes making decisions and judgments about what should be learned and how to go about learning. The models are used to simulate how the world will evolve into the future under different starting conditions, particularly with respect to actions that we might take in the present. The models themselves are organized concept clusters probably residing in the anterior parietal lobes. Their dynamics (i.e. running the models) is likely orchestrated by the premotor regions of the frontal lobes in conjunction with the choreography directions mediated in the cerebellum. The outputs from these models are analyzed by regions of the prefrontal cortex (other Brodmann areas conjoined with area 10) and supplied to BA10 for disposition and ultimately for making strategic (e.g. long-term) decisions.

Bottom line: DBT testing cuts off the subconscious part of hearing and goes all statistical, so the rush, the part that gets one excited and going ballistic, gets ignored. As I see it, DAB radio isn't all that, especially with this loudness war, but people no longer care, I guess.
 
Sources:
http://www.tinnitus-audiology.com/trt101.html
http://www.tinnitus.org/home/frame/hyp1.htm
http://www.karger.com/gazette/70/altenmueller/art_4.htm
http://www.freepatentsonline.com/7648366.html
http://faculty.washington.edu/gmobus/Background/SapienceExplained/4.neuroscienceSapience/neuroscienceOfSapience.html
 
Nov 1, 2010 at 4:33 PM Post #3 of 44
His main point seems to be that blind tests are ridiculous because he thinks blind tests are ridiculous.

Also, it would be wrong to use that one single example to discredit all blind tests. First, educated ears can pick out things like idle tones. It is completely possible that the expert could have picked it out during a blind test. Second, the idle tone *actually existed* and was measurable, demonstrable and repeatable.

There is quite a difference between things that exist and the Land of Make Believe, where all wishes and dreams come true. Usually at ridiculous prices and where radically divergent design philosophies somehow provide the exact same benefit on two completely different systems. Not one single belief system in the Land of Make Believe squares with competing belief systems.

Not every belief system can be right. But all of them can be wrong.

Then again, who needs critical thought when you're frolicking with unicorns in the Land of Make Believe?
 
Nov 1, 2010 at 4:56 PM Post #4 of 44
I think blind tests as used in the examples above are somewhat limited and don't always lead to the correct conclusion. However, having been a victim of the placebo effect myself, I think blind tests are still useful when you really want to prove a difference as definitive by successfully completing a test.
 
Question: FWIR on Hydrogenaudio... a long time ago... FAILING a blind test doesn't actually prove anything one way or another, so why was it used to conclude there was no difference in some cases? I'm not sure this is the way it is supposed to be done. If you incorporate this rule into blind testing, then it makes more sense and has the more specific purpose outlined in italics. (I think)
 
PLEASE CORRECT ME IF I'M WRONG!! I might be. . . 
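The point about a failed test can be put in numbers. What follows is a minimal sketch, not anything taken from the tests discussed in this thread: it assumes a forced-choice ABX test where a pure guesser is right 50% of the time, a 5% significance level, 16 trials, and a hypothetical listener who genuinely hears a difference but only gets it right 60% of the time. All function names are illustrative.

```python
from math import comb

def binom_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def pass_mark(n: int, alpha: float = 0.05) -> int:
    """Smallest score whose probability under pure guessing is <= alpha."""
    for k in range(n + 1):
        if binom_tail(n, k, 0.5) <= alpha:
            return k
    return n + 1  # no score is rare enough to count as a pass

def power(n: int, true_rate: float, alpha: float = 0.05) -> float:
    """Chance that a listener with this true hit rate reaches the pass mark."""
    return binom_tail(n, pass_mark(n, alpha), true_rate)

n_trials = 16            # assumed test length
listener_rate = 0.6      # assumed true hit rate of a listener who does hear something
print("pass mark:", pass_mark(n_trials), "of", n_trials)
print("chance this listener passes:", round(power(n_trials, listener_rate), 3))
```

Under these assumptions the pass mark is 12 of 16, and the 60% listener reaches it only about one time in six. A fail on a short test is therefore weak evidence of "no difference", while a pass is strong evidence that a difference was heard.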
 
 
Nov 1, 2010 at 5:00 PM Post #5 of 44
Don't you think you are throwing the baby out w/ the bath water Erik?  I see nothing wrong w/ illuminating previously inconceivable factors and challenging accepted Dogma w/ a skeptical viewpoint.  Surely, testing for veracity is but to reinforce the truth.  
 
On a side note I think it would be best to refrain from making universal judgements and claims w/ respect to thread integrity.  Otherwise history just repeats itself.
 
Nov 1, 2010 at 5:48 PM Post #6 of 44
There is nothing inherently wrong with testing methodologies. Some are more practical than others for different questions. There are obstacles and pitfalls to properly conducting a DBT for hearing subtle differences, but I believe it is possible to do a good test, as long as it does a good job of minimizing false negatives.
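One way to think about minimizing false negatives is to ask how many trials a test needs before a real but subtle difference is likely to show up. The sketch below is only an illustration under stated assumptions: a forced-choice test with a 50% guessing rate, a 5% significance level, an 80% target for catching a true difference, and hypothetical listener hit rates. The function names and the 400-trial search cap are my own choices, not part of any protocol mentioned here.

```python
from math import comb

def pmf(n: int, p: float) -> list[float]:
    """Binomial probabilities P(X = i) for i = 0..n."""
    return [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]

def pass_mark(n: int, alpha: float) -> int:
    """Smallest score whose upper-tail probability under guessing is <= alpha."""
    chance = pmf(n, 0.5)
    tail, k = 0.0, n + 1
    for i in range(n, -1, -1):      # build the upper tail from the top score down
        if tail + chance[i] > alpha:
            break
        tail += chance[i]
        k = i
    return k

def power(n: int, true_rate: float, alpha: float) -> float:
    """Probability that a listener with this hit rate reaches the pass mark."""
    return sum(pmf(n, true_rate)[pass_mark(n, alpha):])

def trials_needed(true_rate: float, alpha: float = 0.05,
                  target_power: float = 0.8, max_n: int = 400):
    """First trial count at which the test reaches the target power, or None."""
    for n in range(1, max_n + 1):
        if power(n, true_rate, alpha) >= target_power:
            return n
    return None

for rate in (0.9, 0.7, 0.6):        # assumed true hit rates
    print(rate, trials_needed(rate))
```

The subtler the difference (the closer the hit rate is to 50%), the more trials are needed, which is one reason short tests on subtle effects tend to produce false negatives.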
 
Nov 1, 2010 at 7:06 PM Post #7 of 44
Testing introduces another factor (DBT) that has nothing to do with how I usually listen to music. The mind alters my perception when, for example, I'm in an analytical mood and locking in on one part or instrument of a musical piece, ignoring the less essential parts.
 
The test Swedish Radio performed was "Subjective Assessments on Low Bit-Rate Audio Codecs", so one can always confute its relevance and then ignore auditory plasticity and brain hemispheres. Such data are only of interest in the scientific field of research concerning hearing disability, not to those suffering from stereophilism. Science in that field in its current state knows very little about the subconscious processing speed (of light) @ 1,000,000,000 bits per second.
 
Is anyone saying that a rigorous ABX environment, where the processing speed (of darkness) @ less than 100 bits per second is the only way to listen to a musical piece and get excited, only produces an invalid test score?
 
Can anyone shed some light upon that aspect of hearing?
 
As for the test...
 
Robert Harley -> http://www.stereophile.com/asweseeit/894awsi/index1.html
It is ironic that Swedish Radio's extensive listening tests, with over 20,000 separate trials and 60 "expert listeners," failed to detect a flaw immediately apparent to a single listener. Their listening-test methodology—called "hidden reference, double-blind, triple stimulus"—was beyond scientific reproach. Yet a single listener in "unscientific" listening conditions immediately identified this fundamental problem. A paper by Michael Gerzon described later in this report comments peripherally on this issue of double-blind listening-test protocols not revealing the very flaws they are designed to detect.

 
..beyond scientific reproach, well I don't know and don't have $20 -> http://www.aes.org/e-lib/browse.cfm?elib=5396
 
Must add that I'm very tempted to quote one particular philosopher's view on aspect-blindness, which BTW has something to do with the subconscious. Going on about that is not very fruitful, as one has to rely on esoteric philosophy, which I've managed to point to already... well, IME and all that.
 
Nov 1, 2010 at 7:28 PM Post #8 of 44
If you get into an analytical mood during a DBT, one thing that the test designer may try to implement is something that will minimize the analytical mood you feel.
 
I know most people reading this don't see the need to consider such a measure, but when dealing with something as unpredictable as human perception, such creative approaches can sometimes yield good fruit.
 
Albedo, give me your thoughts on some tips I wrote on what I believe would make for a decent cable DBT.
 
http://www.head-fi.org/forum/thread/435801/propose-your-protocol-here/15#post_5986220
 
Nov 1, 2010 at 7:52 PM Post #9 of 44
Hmm..
 
The Genie in the Bottle.. there's always a sexual side to it -> http://www.sacred-texts.com/oto/lib811.htm that's not usually applied, but to reproduce as in cloning (statistically significant) is difficult.
 
Energized Enthusiasm:
To sum up, I can always trace a connection between my sexual condition and the condition of artistic creation, which is so close as to approach identity, and yet so loose that I cannot predicate a single important proposition.

 
.. and there's also the instant refusal (subconsciously) to becoming a tool.
 
Nov 2, 2010 at 12:04 AM Post #10 of 44
The amount of debating here is hilarious.  Listen and compare.  If you can't hear a difference, then don't spend money on it.  I think failing to blind test equipment just leads to false views on it. (This is seen here all the time with people's bogus claims about their equipment.)
 
I personally cannot hear a difference between 128 kbps MP3s and lossless.  But I always try to get either 320 kbps MP3s or lossless because I have no reason not to.  Space is super cheap.  On the other hand, I cannot hear a difference using entry-level hifi amps and DACs, so I just use my computer's motherboard sound.  And boy does it sound good.

 
Nov 2, 2010 at 12:07 AM Post #11 of 44


Quote:
I cannot hear a difference using entry-level hifi amps and DACs, so I just use my computer's motherboard sound.  And boy does it sound good.


That's a joke, right?
 
Nov 2, 2010 at 12:09 AM Post #12 of 44


Quote:
That's a joke, right?


No, it's not a joke.  I have blind tested 3 different hifi amps against my iPod shuffle and computer, and neither I nor the other person testing was ever able to distinguish them.
 
Head-fi does not wanna hear the truth: an iPod is as good as an Audio-gd Compass.
 
Old computers used to have problems with noise, but it just is not the case anymore.  Claim what you want about the "noisy" environment the dac is in, I can't hear it.
 
In some cases, at least with sources, hifi is becoming a thing of the past.
 
Nov 2, 2010 at 12:12 AM Post #13 of 44


Quote:
No, it's not a joke.  I have blind tested 3 different hifi amps against my iPod shuffle and computer, and neither I nor the other person testing was ever able to distinguish them.
 
Head-fi does not wanna hear the truth: an iPod is as good as an Audio-gd Compass.


I think you have a bottleneck somewhere in your testing.  If you want to tell me an HD800 plugged into your laptop playing FLAC sounds the same as using a DACport I'll say you're crazy and you can call me a Koolaid drinker.
 
Nov 2, 2010 at 12:17 AM Post #14 of 44


Quote:
I think you have a bottleneck somewhere in your testing.  If you want to tell me an HD800 plugged into your laptop playing FLAC sounds the same as using a DACport I'll say you're crazy and you can call me a Koolaid drinker.



I have never heard the HD800, so I can't say.  But on my HD580 what I said holds true.  The 300 ohm Sennheisers are supposed to benefit greatly from dedicated amps.  New head-fi users feel they are missing out because they don't have an amp, but I found it all to be nonsense.
 
Nov 2, 2010 at 12:23 AM Post #15 of 44


Quote:
I have never heard the HD800, so I can't say.  But on my HD580 what I said holds true.  The 300 ohm Sennheisers are supposed to benefit greatly from dedicated amps.  New head-fi users feel they are missing out because they don't have an amp, but I found it all to be nonsense.


I haven't heard the HD580, but the HD555 benefited little from amping, plus its veiled signature and average detail retrieval leave little transparency to be observed from sources and amps.  If I did the same test w/ my 555 that you did, I would reach the same conclusion; however, I would also know the HD555 was the bottleneck here.  300 ohms doesn't guarantee benefits from amping either.  I could easily hear a 50 ohm phone clearly benefit where a 150 ohm phone might not at all.  I'm afraid it's a bit more complicated than that.
 
