wavoman
Quote:
Originally Posted by Pio2001
... most common error ... is to present individual results, then pick among them the most significant and presenting them as successful, because they are above the threshold. ... the question of averaging across the subjects was a very difficult one. On one hand, if we sum the results of everybody, we can prove a difference otherwise unseen, thanks to the statistical weight of all the answers, but on the other hand, since the listeners are rather untrained, the probability is high that one or two listeners score well and not the others. Summing the answers leads then to a failure while these listeners actually hear the difference ... That's why I rather first define a target probability of false positive, ... Then I require for any listener that he scores a modest success, like 6/6, in order to be allowed to proceed to the real ABX ...
Picking the top scorers and claiming significance is of course the most elementary of mistakes, we agree! (Usually called "the selection fallacy"). But it is also the foundation of more sophisticated "play the winner" designs, which I see you are using ... via your pre-tests!
I think that the selection fallacy is so dreaded that people stay away from "play the winner", which is a shame since it is so powerful.
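To put a rough number on the selection fallacy, here is a minimal sketch that computes how likely it is that at least one pure guesser in a group clears a per-listener pass mark. The function names are made up for illustration, and the group size (20), trial count (16) and pass mark (12/16) are assumptions, not figures from any test in this thread.

```python
from math import comb


def tail_prob(k_min: int, n_trials: int) -> float:
    """P(at least k_min correct out of n_trials) for a listener guessing at 50/50."""
    hits = sum(comb(n_trials, k) for k in range(k_min, n_trials + 1))
    return hits / 2 ** n_trials


def prob_any_guesser_passes(n_listeners: int, k_min: int, n_trials: int) -> float:
    """P(at least one of n_listeners independent guessers reaches k_min correct)."""
    p_single = tail_prob(k_min, n_trials)
    return 1.0 - (1.0 - p_single) ** n_listeners


# Hypothetical numbers: 20 listeners, 16 trials each, pass mark 12/16.
p_one = tail_prob(12, 16)
p_any = prob_any_guesser_passes(20, 12, 16)
print(f"one guesser passes:        {p_one:.4f}")
print(f"at least one of 20 passes: {p_any:.4f}")
```

With these assumed numbers a single guesser passes less than 4% of the time, yet the chance that somebody in a room of 20 guessers passes is better than even. That is why a top score only earns a listener a fresh randomized run, not a claim of significance.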
If you added swindle comparisons, your protocols would become even stronger. The standard A/B/X question does not cater to that, but the simple "difference" or "preference" question (I like "preference" better) does, and the answers quickly weed out the weak subjects.
I see no reason to ever compute a statistical threshold on the group total result ... it has no meaning. These tests are replicated "sample size one" designs, where each subject is a block.
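A small sketch of why the group total can mislead, using exact one-sided binomial p-values per subject versus the pooled sum. The scores are invented for illustration (one listener at 16/16, nine at 8/16), and binom_p_value is a hypothetical helper name.

```python
from math import comb


def binom_p_value(correct: int, n_trials: int) -> float:
    """One-sided exact p-value: P(X >= correct) when every answer is a 50/50 guess."""
    return sum(comb(n_trials, k) for k in range(correct, n_trials + 1)) / 2 ** n_trials


# Hypothetical scores out of 16: one listener hears the difference, nine guess.
scores = [16, 8, 8, 8, 8, 8, 8, 8, 8, 8]
n_trials = 16

# Each subject is treated as their own block ...
per_subject = [binom_p_value(s, n_trials) for s in scores]
# ... versus lumping all 160 answers into one total.
pooled = binom_p_value(sum(scores), n_trials * len(scores))

print(f"best individual p-value: {min(per_subject):.6f}")
print(f"pooled p-value (88/160): {pooled:.3f}")
```

In this made-up case the pooled total of 88/160 is nowhere near conventional significance, even though one block contains a perfect score. The per-subject view flags that listener as a candidate for a fresh run, which is the point where the selection fallacy has to be guarded against.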
BTW, there is no need to be so precise with your false positive ("guessing") probabilities ... as long as you go forward with additional randomized tests on the selected subjects, the threshold that lets subjects go on to the next stage does not need to be based on a probability calculation.
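Here is a rough simulation of that last point, under stated assumptions: every simulated subject is a pure guesser, screening uses 6 trials (echoing the 6/6 example quoted above), and the confirmation run is a hypothetical 16 trials with a 12/16 pass mark. All function names are invented for this sketch. It estimates the false positive rate of the confirmation run among the subjects who got through screening, for several screening thresholds.

```python
import random
from math import comb


def nominal_false_positive(k_min: int, n_trials: int) -> float:
    """Exact P(a guesser scores >= k_min of n_trials)."""
    return sum(comb(n_trials, k) for k in range(k_min, n_trials + 1)) / 2 ** n_trials


def guess_score(n_trials: int, rng: random.Random) -> int:
    """Number of correct answers from n_trials independent coin-flip guesses."""
    return bin(rng.getrandbits(n_trials)).count("1")


def stage2_false_positive_rate(screen_min, screen_trials, confirm_min,
                               confirm_trials, n_guessers, seed=0):
    """Among guessers who pass screening, the fraction who also pass a fresh run."""
    rng = random.Random(seed)
    selected = passed = 0
    for _ in range(n_guessers):
        if guess_score(screen_trials, rng) >= screen_min:
            selected += 1
            # The confirmation run uses new, independently randomized trials.
            if guess_score(confirm_trials, rng) >= confirm_min:
                passed += 1
    return passed / selected if selected else float("nan")


nominal = nominal_false_positive(12, 16)
print(f"nominal confirmation false positive rate: {nominal:.4f}")
for screen_min in (3, 5, 6):
    rate = stage2_false_positive_rate(screen_min, 6, 12, 16, n_guessers=200_000)
    print(f"screen at {screen_min}/6 -> simulated rate among selected: {rate:.4f}")
```

Because the confirmation trials are independent of the screening trials, the simulated rate should hover around the nominal value whichever screening threshold is used. The screening threshold only changes how many subjects you carry forward, so it can be picked for convenience.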