Hello Clutz,
Quote:
Originally Posted by Clutz
Are either of you statisticians?
Not me.
Quote:
Originally Posted by Clutz
How will wavoman's set up produce more false negatives than yours?
As a reminder, the argument was about Wavoman's idea of gathering people in a meeting where two products are to be compared, and asking each one whether they prefer A or B.
Example 1: everyone can easily hear the difference between A and B, but 50% of the listeners prefer A and 50% prefer B.
In my setup, everyone takes an ABX test (for example) and passes.
In Wavoman's setup, 50% of the listeners say that they prefer A, and 50% say that they prefer B. A 50/50 preference split is indistinguishable from coin flipping. Result: no difference is detected, even though everyone hears one.
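To make the contrast concrete, here is a minimal sketch, assuming a 16-trial ABX per listener and a 100-person preference vote (both numbers are my assumptions, as is the use of scipy):

```python
# A sketch of Example 1, with assumed numbers: 16-trial ABX per listener,
# 100 voters in the preference poll.
from scipy.stats import binomtest

# Individual ABX test: a listener who always hears the difference scores
# 16/16; the chance of doing that by guessing is 0.5**16.
abx = binomtest(16, 16, p=0.5, alternative='greater')
print(f"one listener's ABX p-value: {abx.pvalue:.1e}")  # ~1.5e-05, significant

# Group preference vote: 50 prefer A, 50 prefer B. A sign test cannot
# distinguish this from coin flipping, even though everyone hears a difference.
vote = binomtest(50, 100, p=0.5, alternative='two-sided')
print(f"group vote p-value: {vote.pvalue:.2f}")  # 1.00, no difference detected
```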
Example 2: no one can hear a difference except one listener, who prefers B.
In my setup, all listeners fail the test except that one, whose individual result is strong enough to give a statistically significant positive (provided it survives correction for the number of listeners tested; see the sketch below).
In Wavoman's setup, all answers are random except the last one, which says "B is better". One vote lost among random votes: no significant difference is found.
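Here is a minimal sketch of that case, assuming 10 listeners each taking a 20-trial ABX (my numbers) and a Bonferroni correction for testing several listeners:

```python
# A sketch of Example 2, with assumed numbers: 10 listeners, 20-trial ABX.
# One passing listener makes the global result significant only if their
# p-value survives a multiple-comparisons correction: p < alpha / N.
from scipy.stats import binomtest

N, alpha = 10, 0.05
p_single = binomtest(19, 20, p=0.5, alternative='greater').pvalue  # 19/20 correct
print(f"individual p-value:   {p_single:.1e}")          # ~2.0e-05
print(f"Bonferroni threshold: {alpha / N}")             # 0.005
print(f"globally significant: {p_single < alpha / N}")  # True

# In Wavoman's setup, the same listener contributes a single "B" vote
# among 9 random votes, which can never reach significance.
```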
Example 3: everyone can hear the difference, and everyone prefers B, but nobody hears it reliably every time, except one listener who is trained in blind testing.
In my setup, all listeners fail to achieve an individually statistically significant result except that one, which is enough for the global result to be statistically significant.
In Wavoman's setup, most answers are unreliable because of the listeners' lack of training, and the trained listener gives only one answer, which can never be statistically significant on its own.
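A sketch of this "hears it, but not every time" case, assuming untrained listeners answer correctly 70% of the time and the trained listener 95% of the time, over a 16-trial ABX (all of these figures are my assumptions for illustration):

```python
# A sketch of Example 3, with assumed per-trial accuracies of 0.70
# (untrained) and 0.95 (trained) on a 16-trial ABX.
from scipy.stats import binom

n, crit = 16, 13  # 13/16 correct: chance probability ~0.011, i.e. p < 0.05
print(f"chance of passing by luck: {binom.sf(crit - 1, n, 0.5):.3f}")

for label, p in [("untrained (70%)", 0.70), ("trained (95%)", 0.95)]:
    power = binom.sf(crit - 1, n, p)  # P(at least 13 correct)
    print(f"{label} listener passes with probability {power:.2f}")
# ~0.25 vs ~0.99: only the trained listener reliably turns a real,
# audible difference into a statistically significant result.
```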
Quote:
Originally Posted by Clutz
People are either going to hear a difference- and report it- or not hear a difference.
No, Wavoman said that the question would be "which one do you prefer?", not "is there a difference?".
Quote:
Originally Posted by Clutz
Averaged over a large enough randomly selected population, that sort of noise doesn't really matter
The problem is that in real life, finding one listener takes about one or two years, so testing a statistically significant sample of listeners would take hundreds of years.
That's why I attach so much importance to finding a perfect, trained listener and an unquestionable blind setup, in order to succeed from the very first test, with the very first listener. The sketch below shows why one good listener can be enough.
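As a rough illustration, here is a sketch that estimates how many ABX trials a single listener needs before a real ability reliably shows up as p < 0.05; the detection rates and the 0.95 power target are my assumptions:

```python
# A sketch of the trial count one listener needs, under assumed
# per-trial detection rates.
from scipy.stats import binom

def pass_criterion(n, alpha=0.05):
    """Smallest score k such that P(X >= k) <= alpha under pure guessing."""
    for k in range(n + 1):
        if binom.sf(k - 1, n, 0.5) <= alpha:
            return k

def trials_needed(p_true, alpha=0.05, power=0.95):
    """Smallest trial count at which the listener passes with prob >= power."""
    for n in range(5, 201):
        k = pass_criterion(n, alpha)
        if binom.sf(k - 1, n, p_true) >= power:
            return n, k

print(trials_needed(0.95))  # (11, 9): a well-trained listener needs ~11 trials
print(trials_needed(0.70))  # a shaky listener needs on the order of 60 trials
```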
Quote:
Originally Posted by Clutz
First of all, it seems to me that there are two distinctly different questions here. One is "On average, can a population tell the difference between cable A and Cable B?". [...] The second question is "Can individuals detect a difference between cable A and Cable B?".
That's very true! I only deal with the second question.
Quote:
Originally Posted by Clutz
For a moment, let's imagine there are differences between two cables. 50% of the population cannot hear the difference between the cables, but 50% of the population can.
That's a fair starting point, but in reality, it turns out that when facing blind tests, many listeners start with poor performance, then get much better after some training.
Sean Olive wrote a paper about this phenomenon:
Audio Musings by Sean Olive: Part 2 - Differences in Performances of Trained Versus Untrained Listeners
I ran a small experiment about this; you can read the account here:
http://www.head-fi.org/forums/f133/e...24/index3.html
These data give a good example of a real-life situation. I think it is interesting to apply them to the proposed protocols.
I thus disagree with your idea, Haloxt, that the listeners should not know what they are listening to. In my experiment, without knowledge of what to listen for, 2 listeners out of 7 passed the ABX test, while with knowledge of the difference, 4 out of 7 succeeded! Information about what to listen for doubled the number of listeners capable of hearing the difference.
I stand with Sean Olive here: the more the listeners are trained, the better the significance of the results.
And, need I recall it? We are dealing with extremely small effects in this matter.
The results of my experiment also make me think that the swindles idea might be a bad one, Wavoman. For example, in my experiment, if I had removed listeners as soon as they made a mistake, 6 listeners out of 7 would have been rejected, while with training, 4 out of 7 were capable of producing significant results on their own. I would be losing 75% of valuable listeners this way.
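As a back-of-the-envelope check, here is a sketch of that "out at the first mistake" rule, assuming a genuinely good listener answers correctly 90% of the time (an assumed figure):

```python
# A sketch of elimination at the first mistake, assuming a good listener
# with 90% per-trial accuracy.
p_correct = 0.9
for trials in (5, 10, 16):
    print(f"P(no mistake in {trials} trials) = {p_correct ** trials:.2f}")
# 0.59, 0.35, 0.19: most capable listeners get thrown out by bad luck alone,
# whereas an ABX pass criterion (e.g. 13/16) tolerates a few mistakes.
```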