Originally Posted by nick_charles
So where do you place the threshold: 80%, 75%, 55%? If you want 75% or less, you really do need a big sample to rule out random guessing. Some of the old AES filter tests used scores out of 20 as a yardstick and treated 15/20 as good enough, though nobody managed that. 8/10 is too weak, especially with really large numbers of subjects, where you would expect some to get it by random guessing, in the same way that if you have 100 variables you will find significant correlations between some of them by chance... To express a preference you must perceive a difference, so are you saying that being able to tell a difference 1 time in 10 is sufficient?
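To put numbers on those thresholds, here is a minimal sketch of how often a pure guesser reaches a given score. It assumes each trial is a two-way choice with a 50% chance of guessing right (as in ABX, where X is either A or B) and that trials are independent. The function names are just for illustration.

```python
from math import comb

def p_at_least(k: int, n: int, p: float = 0.5) -> float:
    """P(k or more correct out of n) for a guesser who is right with probability p per trial."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def p_any_subject_passes(k: int, n: int, subjects: int, p: float = 0.5) -> float:
    """P(at least one of `subjects` independent pure guessers reaches k/n)."""
    return 1 - (1 - p_at_least(k, n, p)) ** subjects

print(f"{p_at_least(8, 10):.4f}")                  # → 0.0547
print(f"{p_at_least(15, 20):.4f}")                 # → 0.0207
print(f"{p_any_subject_passes(8, 10, 100):.3f}")   # → 0.996
```

Under these assumptions a single guesser hits 8/10 about 5% of the time, but with 100 guessers it is almost certain that at least one of them does, which is the multiple-comparisons point above.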
I think the idea is to grab the 8/10 guy and keep testing. That will smoke out whether he is the lucky guesser that chance predicts or the real deal. I would even do that with the 7/10 guys.
Since we have no love for the null hypothesis, I am indeed suggesting that we can set the threshold low, say 60%, but really only to pick people to continue with. Test them further, lots and lots and lots.
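The screen-then-retest idea can be sketched as a two-stage procedure. This is a toy simulation under my own assumptions: each listener is modeled as answering correctly with a fixed probability, the screen is 6/10 (60%), and survivors are retested with 400 trials judged by a one-sided binomial test at alpha = 0.01. The listener names and all parameter values are hypothetical.

```python
import random
from math import comb

def binom_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more correct by guessing."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def run_trials(n, p_correct, rng):
    """Simulated number correct out of n for a listener who is right with probability p_correct."""
    return sum(rng.random() < p_correct for _ in range(n))

def two_stage(listeners, screen_n=10, screen_k=6, retest_n=400, alpha=0.01, seed=1):
    """Stage 1: loose screen (screen_k/screen_n). Stage 2: long retest with a strict binomial test."""
    rng = random.Random(seed)
    passed, confirmed = [], []
    for name, p in listeners.items():
        if run_trials(screen_n, p, rng) < screen_k:
            continue  # dropped at the screen
        passed.append(name)
        k = run_trials(retest_n, p, rng)
        if binom_tail(k, retest_n) < alpha:  # one-sided: better than guessing
            confirmed.append(name)
    return passed, confirmed

# Hypothetical panel: 50 pure guessers plus one listener who really hears it 70% of the time.
panel = {f"guesser_{i}": 0.5 for i in range(50)}
panel["listener_x"] = 0.7
passed, confirmed = two_stage(panel)
print(len(passed), "passed the screen; confirmed:", confirmed)
```

A 6/10 screen lets through roughly 38% of pure guessers (386/1024), which is acceptable here because the screen only decides who gets retested; the long retest does the real filtering.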
And yes, if I report no difference 900 out of 1000 times, but prefer hi-res to Red Book 100 out of 1000 times, and never prefer Red Book to hi-res... then that says something very meaningful: the difference is small and easy to miss, but real. That's why I want to use A vs. B, with A and B assigned randomly, sometimes identical, and sometimes with ringers... and ask the listener only to pick one of these four responses each time:
- I hear no difference
- I hear a difference but have no preference
- I prefer A to B
- I prefer B to A
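One way to check the "100 preferences one way, none the other" example is a two-sided sign test on the preference responses alone, setting aside the "no difference" and "no preference" answers as ties. This is my choice of test, not something specified above, and it assumes trials are independent and that, with no real difference, each stated preference is equally likely to go either way.

```python
from math import comb

def sign_test_two_sided(prefer_x, prefer_y):
    """Two-sided exact sign test; ties (no difference / no preference) are excluded beforehand."""
    n = prefer_x + prefer_y
    k = max(prefer_x, prefer_y)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2**n  # P(X >= k) under p = 0.5
    return min(1.0, 2 * tail)

# Example from the post: 1000 trials, 900 "no difference", 100 "prefer hi-res", 0 "prefer Red Book".
p = sign_test_two_sided(100, 0)
print(f"{p:.1e}")  # → 1.6e-30
```

Even though 90% of the answers are "no difference", a 100-to-0 split among the stated preferences is far too lopsided to be coin-flipping, which matches the claim that a difference can be small and easy to miss yet real.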
This is very different from, and much better than, A/B/X testing. Boy, will we learn a lot when we make A and B the same (without telling the listener, of course), or make A 24/96 and B a 64 kbps MP3, all the while telling the listener we are randomly assigning one type of signal to A and another to B.
We don't even tell him what we are testing! It could be resolution, cables, CD players... he doesn't see and doesn't know! And he thinks the two things we are randomly assigning to A and B stay the same. Ha! This is testing. And the analysis is very revealing, one subject at a time.
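The blind A vs. B protocol described above (random assignment, hidden "same" trials, and ringers) could be organized roughly like this. It is a sketch under my own assumptions: the trial types, source labels, and function names are hypothetical, and playback is left out entirely; the code only builds a shuffled schedule and tallies one subject's responses by what was actually played.

```python
import random
from collections import Counter
from dataclasses import dataclass

@dataclass
class Trial:
    kind: str  # "test", "same" (A and B identical) or "ringer" (deliberately obvious difference)
    a: str     # source actually played as A; hidden from the listener
    b: str     # source actually played as B; hidden from the listener

def random_order(pair, rng):
    """Randomly decide which source of the pair plays as A and which as B."""
    return pair if rng.random() < 0.5 else (pair[1], pair[0])

def make_schedule(n_test, n_same, n_ringer, test_pair, ringer_pair, seed=None):
    rng = random.Random(seed)
    trials = [Trial("test", *random_order(test_pair, rng)) for _ in range(n_test)]
    for _ in range(n_same):
        s = rng.choice(test_pair)
        trials.append(Trial("same", s, s))
    trials += [Trial("ringer", *random_order(ringer_pair, rng)) for _ in range(n_ringer)]
    rng.shuffle(trials)  # interleave so the listener cannot predict the trial type
    return trials

# The four allowed answers, as hypothetical string constants.
NO_DIFF, DIFF_NO_PREF, PREFER_A, PREFER_B = (
    "no difference", "difference, no preference", "prefer A", "prefer B")

def tally(trials, responses):
    """Count one subject's responses per trial type, translating A/B preferences into the hidden source."""
    if len(trials) != len(responses):
        raise ValueError("need exactly one response per trial")
    counts = {}
    for t, r in zip(trials, responses):
        if r == PREFER_A:
            r = f"prefer {t.a}"
        elif r == PREFER_B:
            r = f"prefer {t.b}"
        counts.setdefault(t.kind, Counter())[r] += 1
    return counts

schedule = make_schedule(20, 5, 5, ("hi-res", "Red Book"), ("24/96", "64 kbps MP3"), seed=42)
print(tally(schedule, [NO_DIFF] * len(schedule)))
```

Reading one subject's tally: a preference on a "same" trial is a false alarm, "no difference" on a ringer trial suggests the subject (or the setup) is not discriminating, and only once both look sane do the preferences on the "test" trials carry much weight.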