is this really a problem with blind tests?
Jun 26, 2016 at 11:50 PM Thread Starter Post #1 of 126

johncarm (100+ Head-Fier; joined Jun 27, 2014; 304 posts; 18 likes)
I'm not an expert in sound science, but rather I'm a professional musician. My understanding is that many blind audio tests are done with short clips of sound, but the curious thing is that in evaluating a musician's sound we would never think that we could pick up all the details in a short clip. From my musical experience, the shorter the sound, the less you notice about it. So one would not notice much at all except for the largest factors in a very short clip. Is there some evidence to the contrary?
 
EDIT to clarify:
 
I know I sound like I'm asking about the "blind" part of blind tests, and the actual details of the test (long or short signals, for instance) can vary. However, I think I'm really asking more about the body of knowledge that results from doing blind tests.
 
(1) Let's just WAY oversimplify the situation to get a starting point. Let's say there's a body of tested hypotheses called "Sound Science." Let's say this knowledge includes the idea that most premium cables are snake oil, and that mp3 files of a certain bit rate are indistinguishable from uncompressed formats. I'm just trying to give a general idea here.
 
(2) I assume that a great deal of blind listening experimentation went into forming this body of knowledge.
 
(3) It seems to me that realities, practicalities, the need to run many tests for statistically valid results, and listener psychology would contribute to put pressure on shortening the time a listener spends with any one trial. What's curious is that from my perspective as a professional orchestral musician, we would never evaluate a player on a very short excerpt. We don't have forever to spend with them, but we know that the impression that music makes on a listener takes time to form.
 
Jun 27, 2016 at 12:04 AM Post #2 of 126
  I'm not an expert in sound science, but rather I'm a professional musician. My understanding is that many blind audio tests are done with short clips of sound, but the curious thing is that in evaluating a musician's sound we would never think that we could pick up all the details in a short clip. From my musical experience, the shorter the sound, the less you notice about it. So one would not notice much at all except for the largest factors in a very short clip. Is there some evidence to the contrary?

 
Blind testing just means that the testing is blind. There is no limit on the length of the clip. I mean, there can be time limitations, but by definition a test being blind has nothing to do with length of clips. That said, if you're talking about fast switching between sources, that has been shown as the only consistent way to tell the difference between more than one source. 
 
Jun 27, 2016 at 12:13 AM Post #3 of 126
   
Blind testing just means that the testing is blind. There is no limit on the length of the clip. I mean, there can be time limitations, but by definition a test being blind has nothing to do with length of clips. That said, if you're talking about fast switching between sources, that has been shown as the only consistent way to tell the difference between more than one source. 

 
I understand, it seems like there would be many possible ways to carry out blind tests. Can you define fast switching? What do you mean by "consistent"?
 
Jun 27, 2016 at 1:18 AM Post #4 of 126
   
I understand, it seems like there would be many possible ways to carry out blind tests. Can you define fast switching? What do you mean by "consistent"?

You have a point in that there are self-imposed time limits even if a test is not officially timed, but there are some blind tests that have taken place over very extended periods, and it seems that instantaneous abx switching is generally just as good as extended tests for spotting differences.
 
A word of warning: in my experience this forum is not a good place to discuss potential shortcomings of blind tests, just as the rest of headfi isn't a great place to talk about the usefulness of blind tests. It's a very partisan environment. If you want to see how offensive people find asking the wrong kind of question about blind tests, check out this thread that I started a while ago, in which I asked a similar question:
 
http://www.head-fi.org/t/809738/could-unconscious-auditory-processing-complicate-the-picture-of-what-is-and-what-is-not-audible-over-the-long-term 
 
I had to qualify my question about a dozen times and explain every 3 or so posts that I wasn't attacking or trying to replace blind tests, that I was just trying to explore the boundaries of blind tests' usefulness. After several pages I had to rename the thread to something less incendiary than "another flaw of blind tests?" at which point people stopped paying attention to it. I learned that sound science is generally not very friendly to theorizing. They want hard evidence and they may imply that you are stupid if you ask questions that are not immediately falsifiable.
 
Jun 27, 2016 at 1:57 AM Post #5 of 126
  You have a point in that there are self-imposed time limits even if a test is not officially timed, but there are some blind tests that have taken place over very extended periods, and it seems that instantaneous abx switching is generally just as good as extended tests for spotting differences.
 

Can you be more specific about what "instantaneous" switching is? For instance, is there a continuous music signal which is switched from A to B while the music is in progress? Or is there a short clip which is first played through A, then played through B?
 
It seems to me that in the big picture, there is not much motivation to encourage listeners to take their time. Research tests cost money, and time is money. Listeners themselves may feel certain they know the answer, but that's the thing about perception of sound--you always feel certain, but the thing you are certain is right changes over time. When a listener in a test thinks they know the answer, what motivation do they have to keep listening? In fact, it takes years of training for a musician to become more conscious of their perceptual process. How are listeners in tests trained, graded, and controlled?
 
Jun 27, 2016 at 2:05 AM Post #6 of 126
  Can you be more specific about what "instantaneous" switching is? For instance, is there a continuous music signal which is switched from A to B while the music is in progress? Or is there a short clip which is first played through A, then played through B?
 
It seems to me that in the big picture, there is not much motivation to encourage listeners to take their time. Research tests cost money, and time is money. Listeners themselves may feel certain they know the answer, but that's the thing about perception of sound--you always feel certain, but the thing you are certain is right changes over time. When a listener in a test thinks they know the answer, what motivation do they have to keep listening? In fact, it takes years of training for a musician to become more conscious of their perceptual process. How are listeners in tests trained, graded, and controlled?

Usually abx testing uses a continuous music signal. Here is an example that you can take comparing lossless to compressed music files: http://abx.digitalfeed.net/. I think that most blind tests  do not meet the standards of academic behavioral science research, but I could be wrong about that. One way that testers are commonly stratified is to separate the engineers from the audiophiles, and the engineers generally do better.
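For a sense of how a run on a tool like that is usually judged, here is a minimal sketch. It assumes each trial is an independent guess with a 50% chance of being right if there is no audible difference; the function name and the example scores are made up for illustration and are not taken from any particular ABX tool.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """One-sided binomial p-value for an ABX run.

    Returns the probability of getting at least `correct` answers right
    out of `trials` by pure guessing (0.5 chance per trial).
    """
    if trials < 0 or not 0 <= correct <= trials:
        raise ValueError("need 0 <= correct <= trials")
    # Count the outcomes with `correct` or more hits, out of 2**trials.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


if __name__ == "__main__":
    # Hypothetical scores, only to show how the number behaves.
    for correct, trials in [(9, 16), (12, 16), (16, 16)]:
        p = abx_p_value(correct, trials)
        print(f"{correct}/{trials} correct: p = {p:.6f}")
```

A small p-value only says the score would be unlikely from guessing alone. It says nothing about how large the difference is, and a high p-value from a short run does not prove the two sources are indistinguishable.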
 
Jun 27, 2016 at 2:06 AM Post #7 of 126
  I learned that sound science is generally not very friendly to theorizing. They want hard evidence and they may imply that you are stupid if you ask questions that are not immediately falsifiable.

not stupid - just not useful - infinite "theorizing" is possible - falsifiable propositions testing/distinguishing specific hypothesis allow advance
 
sometimes accumulation of data/observations lead to hypothesis - but the real test is whether the hypothesis predicts results of new experiments
 
 
that people have internal "state" - hard neural wiring from experience, training, expectation - and the plasticity of that state from learning, changing focus, mood, or even social pressure makes any detailed theorizing about that internal brain state rather fraught
 
 
 
for experience with blind testing try foobar abx plugin with whatever you want to compare - I have shown "polarity"/relative phase of harmonics as audible with test files, convincing Ethan Winer to change his position
 
test signals, particularly with differences that have a uniform metric that can be ordered from "big difference" to "very little" are much better at showing thresholds of human auditory processing than random music cuts
 
but some cuts can be found with features that are good for specific tests - there are "killer samples" for many psychoacoustic codec tunings/levels
 
 
I believe most "musical features" of interest to professional musicians are actually much higher level in complexity than we expect most audio playback electronics and transducer systems to have the subtlety or "memory" to influence
 
a popular audiophile meme is that professional musicians pull what interests them in a performance from very weak by audiophile standards audio systems
 
while analogies can mislead - maybe you could think of the audio gear as a book's paper, print quality, font details - but do you need to read big L Literature, judge whether the dramatic content of the story is affected by the kerning, serifs?
 
Jun 27, 2016 at 3:01 AM Post #8 of 126
  not stupid - just not useful - infinite "theorizing" is possible - falsifiable propositions testing/distinguishing specific hypothesis allow advance

A nice addendum to "just not useful" would be "in my opinion". Theories can be developed with limited evidence. Einstein said "Physical concepts are free creations of the human mind, and are not, however it may seem, uniquely determined by the external world. In our endeavor to understand reality we are somewhat like a man trying to understand the mechanism of a closed watch". Many of his theories were not directly validated by experimental evidence for decades after he published them but he knew that this did not make them useless. Both abstract thought and experiments are tools that can be misused. If you are afraid of misusing your capacity for abstract thought there is no need to participate in or derail discussions that you aren't comfortable with.
 
Jun 27, 2016 at 3:27 AM Post #9 of 126
  I'm not an expert in sound science, but rather I'm a professional musician. My understanding is that many blind audio tests are done with short clips of sound, but the curious thing is that in evaluating a musician's sound we would never think that we could pick up all the details in a short clip. From my musical experience, the shorter the sound, the less you notice about it. So one would not notice much at all except for the largest factors in a very short clip. Is there some evidence to the contrary?

first something involving rapid switching implies that you're looking for audible differences between 2 similar sounds. it's obvious, but I prefer to put it there.

such a test would tell you if you can notice some differences or not, at a given loudness. it may not help you establish a list of differences, or say if an instrument is more realistic or played with more expertise. so the question you're asking, first needs to be one that a blind test and rapid switching can help answer. something @AutumnCrown seemingly took too often as us trying to defend blind testing, when all we've been saying is "the right tool for the right job". but is it my bias that stops me from seeing that I'm overly protective? or is it AC's preconception that made him see it that way? ^_^. a world of mysteries!!!!!!
 
short samples are often suggested in tests, but only because it has been established that our audio memories start failing us after 3 to 10 seconds (depending on what paper you read on the matter). so deciding to use short music samples is simply the decision to try and remove the memory variable from the test. it's often a game of compromises as we seldom can remove all variables for a perfect test. of course if a longer sample could help answer the specific question we have, then we would probably go for it. as said above the important part of comparing 2 samples is the rapid switching. but the length of the test, the number of samples, the order ... can very well change to adapt to a specific problem.
blind test is really just a generic term.
 
Jun 27, 2016 at 4:36 AM Post #10 of 126
  Usually abx testing uses a continuous music signal. Here is an example that you can take comparing lossless to compressed music files: http://abx.digitalfeed.net/. I think that most blind tests  do not meet the standards of academic behavioral science research, but I could be wrong about that. One way that testers are commonly stratified is to separate the engineers from the audiophiles, and the engineers generally do better.


It's curious that a continuous signal is used. Because music is always changing. At the moment you flip from A to B, the music has changed even in that moment. You are comparing something after the switch time that is not the same as what happened before the switch time. So you are supposed to sort out the change in the music from the change in the equipment?
 
Jun 27, 2016 at 4:38 AM Post #11 of 126
  short samples are often suggested in tests, but only because it has been established that our audio memories start failing us after 3 to 10 seconds (depending on what paper you read on the matter). so deciding to use short music samples is simply the decision to try and remove the memory variable from the test. it's often a game of compromises as we seldom can remove all variables for a perfect test. of course if a longer sample could help answer the specific question we have, then we would probably go for it. as said above the important part of comparing 2 samples is the rapid switching. but the length of the test, the number of samples, the order ... can very well change to adapt to a specific problem.
blind test is really just a generic term.

 
How was it determined by sound science that audio memory is so short? Musicians and instrument makers can only function if they can track changes in sound that result from experiments that take place over days or weeks.
 
Jun 27, 2016 at 4:43 AM Post #12 of 126
  a popular audiophile meme is that professional musicians pull what interests them in a performance from very weak by audiophile standards audio systems
 
while analogies can mislead - maybe you could think of the audio gear as a book's paper, print quality, font details - but do you need to read big L Literature, judge whether the dramatic content of the story is affected by the kerning, serifs?

 
Some musicians have crummy systems, while some have very good systems. My interest is in how details of performance affect the overall musical effect, and you need good resolution to hear that.
 
I realize your analogy may be imperfect, but let me just state an objection to it, and then you can clarify if necessary. The font doesn't have a large effect on how you perceive a story other than making it easier to read, but the details of a musical performance have EVERYTHING to do with how it is perceived. For example, small changes in the overlap of successive piano notes (that is, changes in legato) can make a performance come alive, but you can only hear that overlap if you can hear the relatively quiet signal of the prior (damped) note ringing along with the attack of the new note.
 
Jun 27, 2016 at 8:01 AM Post #13 of 126
 
It's curious that a continuous signal is used. Because music is always changing. At the moment you flip from A to B, the music has changed even in that moment. You are comparing something after the switch time that is not the same as what happened before the switch time. So you are supposed to sort out the change in the music from the change in the equipment?

 
You can move back and forth in time within the track, so you can replay the same passage on both A and B.
 
   
How was it determined by sound science that audio memory is so short? Musicians and instrument makers can only function if they can track changes in sound that result from experiments that take place over days or weeks.

 
It's entirely possible that you think you're recalling it a lot better than you are. 
 
Here is a test where a group of audiophiles took home some equipment, had all the time in the world to test, while some engineers stayed and used fast switching to try to determine which of a couple sources had significant amounts of THD added:
 
http://audiosciencereview.com/forum/index.php?threads/aes-paper-digest-sensitivity-and-reliability-of-abx-blind-testing.186/
 
The group doing fast switching could pick it out. The group doing long-term listening tests couldn't. 
 
But the real answer is that years and lots of studies are what got to the answer you're looking for. The ability to quickly switch has been the only method wherein humans are able to overcome (or not overcome) their hubris when it comes to the ability of their ears and actually identify differences between sources. 
 
   
Some musicians have crummy systems, while some have very good systems. My interest is in how details of performance affect the overall musical effect, and you need good resolution to hear that.
 
I realize your analogy may be imperfect, but let me just state an objection to it, and then you can clarify if necessary. The font doesn't have a large effect on how you perceive a story other than making it easier to read, but the details of a musical performance have EVERYTHING to do with how it is perceived. For example, small changes in the overlap of successive piano notes (that is, changes in legato) can make a performance come alive, but you can only hear that overlap if you can hear the relatively quiet signal of the prior (damped) note ringing along with the attack of the new note.

 
Which is a limitation of human hearing. Masking means that humans often can't hear quiet sounds immediately before much louder ones, depending on how much more quiet the first signal is, relative to the second, and how close they are to each other. 
 
  A word of warning: in my experience this forum is not a good place to discuss potential shortcomings of blind tests, just as the rest of headfi isn't a great place to talk about the usefulness of blind tests. It's a very partisan environment. If you want to see how offensive people find asking the wrong kind of question about blind tests, check out this thread that I started a while ago, in which I asked a similar question:
 
http://www.head-fi.org/t/809738/could-unconscious-auditory-processing-complicate-the-picture-of-what-is-and-what-is-not-audible-over-the-long-term 
 
I had to qualify my question about a dozen times and explain every 3 or so posts that I wasn't attacking or trying to replace blind tests, that I was just trying to explore the boundaries of blind tests' usefulness. After several pages I had to rename the thread to something less incendiary than "another flaw of blind tests?" at which point people stopped paying attention to it.

 
It does get tiresome. You still haven't figured out why people were annoyed, just FYI. For example, I would quote something you said, respond to it directly like, line by line response, and you would reply that I wasn't even reading what you were saying. You can't say X=Y then get mad at people for saying, "No it doesn't" with the claim that it's not about Y and X. 
 
But again, if you're trying to describe a flaw of blind testing it has to be a flaw inherent to blind testing. If you are criticizing the design of a particular blind test, that's fine, but you aren't criticizing blind testing, just a flawed implementation of it.
 
Jun 27, 2016 at 10:19 AM Post #14 of 126
  (3) It seems to me that realities, practicalities, the need to run many tests for statistically valid results, and listener psychology would contribute to put pressure on shortening the time a listener spends with any one trial. What's curious is that from my perspective as a professional orchestral musician, we would never evaluate a player on a very short excerpt. We don't have forever to spend with them, but we know that the impression that music makes on a listener takes time to form.

 
Didn't some orchestras start using blind auditions for exactly the same reason we call for blind testing: to remove biases not associated with the actual sound something can make? That's at the core of all this. Quick switching is just a method that has proven useful within this context to detect subtle differences. We're talking things like a dB here and there in some frequency range, not the difference between two performers…
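To put "a dB here and there" into numbers, here is a rough sketch using the standard decibel definitions (amplitude ratio = 10^(dB/20), power ratio = 10^(dB/10)). The helper names are made up for illustration.

```python
def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level difference in dB to a voltage/amplitude ratio."""
    return 10 ** (db / 20)


def db_to_power_ratio(db: float) -> float:
    """Convert a level difference in dB to a power ratio."""
    return 10 ** (db / 10)


if __name__ == "__main__":
    # Small level differences of the kind quick switching is used to detect.
    for db in (0.1, 0.5, 1.0, 3.0):
        amp = db_to_amplitude_ratio(db)
        power = db_to_power_ratio(db)
        print(f"{db:>4} dB -> amplitude x{amp:.4f}, power x{power:.4f}")
```

A 1 dB difference works out to roughly a 12% change in amplitude, which is small next to the difference between two performers.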
 
Quote:
  For example, small changes in the overlap of successive piano notes (that is, changes in legato) can make a performance come alive, but you can only hear that overlap if you can hear the relatively quiet signal of the prior (damped) note ringing along with the attack of the new note.

 
And people would probably easily hear the difference in a testing environment. Quick switching is a tool when doing an ABX test, not a requirement.
 
Quote:
  A nice addendum to "just not useful" would be "in my opinion". Theories can be developed with limited evidence. Einstein said "Physical concepts are free creations of the human mind, and are not, however it may seem, uniquely determined by the external world. In our endeavor to understand reality we are somewhat like a man trying to understand the mechanism of a closed watch". Many of his theories were not directly validated by experimental evidence for decades after he published them but he knew that this did not make them useless. Both abstract thought and experiments are tools that can be misused. If you are afraid of misusing your capacity for abstract thought there is no need to participate in or derail discussions that you aren't comfortable with.

 
What we're really trying to get at is a *standard* of evidence. There's a big difference between limited *good* evidence and a mountain of subjective testimonials.
 

 
Jun 27, 2016 at 11:21 AM Post #15 of 126
You have a point in that there are self-imposed time limits even if a test is not officially timed, but there are some blind tests that have taken place over very extended periods, and it seems that instantaneous abx switching is generally just as good as extended tests for spotting differences.

A word of warning: in my experience this forum is not a good place to discuss potential shortcomings of blind tests, just as the rest of headfi isn't a great place to talk about the usefulness of blind tests. It's a very partisan environment. If you want to see how offensive people find asking the wrong kind of question about blind tests, check out this thread that I started a while ago, in which I asked a similar question:

http://www.head-fi.org/t/809738/could-unconscious-auditory-processing-complicate-the-picture-of-what-is-and-what-is-not-audible-over-the-long-term 

I had to qualify my question about a dozen times and explain every 3 or so posts that I wasn't attacking or trying to replace blind tests, that I was just trying to explore the boundaries of blind tests' usefulness. After several pages I had to rename the thread to something less incendiary than "another flaw of blind tests?" at which point people stopped paying attention to it. I learned that sound science is generally not very friendly to theorizing. They want hard evidence and they may imply that you are stupid if you ask questions that are not immediately falsifiable.

But it was your fault and nobody else's that your original thread title and first few posts were total bollocks, I can't imagine how these could have been any further away from what you claimed you wanted to discuss. Bluntly, you were being disingenuous in subsequently claiming you weren't criticising blind testing, it was only the "heat" you got that made you backtrack, so spare us the guff, your silence on that thread since speaks volumes, there are questions on that thread you haven't answered. I guess the magic disappeared for you once the thread ceased to be a blind test kicking contest.

And nobody was interested in a discussion involving more "could it be's" than several seasons of Ancient Aliens. I wonder why?
 
