Why do you claim 384 test runs are necessary? Can't a p-value of less than 0.05 be obtained with far fewer tests?
Edited by Jaywalk3r - 2/15/12 at 10:19am
A single experiment can, I think, if you're measuring something objective. But if you're sampling a population, I think you need 384.
Maybe I'm wrong, but I don't think you can do AB testing that way. Or at least not come out with something you could hang your hat on.
Let's say you have setup A and setup B. You double-blind test it yourself and determine A is better. That information is useless on its own. Now if you get 384 people to double-blind test it and 70% say A is better than B, then you can say 70% ± 5% of people will find A better than B.
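For context, the 384 figure matches the textbook sample-size formula for estimating a proportion to within ±5% at 95% confidence, under the worst case p = 0.5. That's my assumption about where the number comes from; the post doesn't derive it. A quick sketch:

```python
# Minimum sample size for estimating a proportion with margin of error e
# at ~95% confidence (z = 1.96), worst case p = 0.5:
#   n = z^2 * p * (1 - p) / e^2
# This is my guess at the source of the "384" figure, not stated in the thread.
import math

def sample_size(margin=0.05, z=1.96, p=0.5):
    """Smallest n giving +-margin on a proportion estimate at confidence z."""
    return math.ceil(z * z * p * (1.0 - p) / (margin * margin))

n = sample_size()  # 1.96^2 * 0.25 / 0.05^2 = 384.16, ceiling 385
```

The raw value is 384.16, so it is often quoted loosely as "about 384" (or rounded up to 385). The margin shrinks if the true proportion is far from 50/50.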
I don't know if you can prove A sounds better than B, because what counts as "better"? I think statistical sampling is a more accurate way to test in audio.
Unless you're talking about measuring the actual waveforms, which is already done. That doesn't tell you how it will sound, or at least that's what people will argue.
I see where you're coming from now. You're talking about cases where there is a known more accurate reference, like testing lossy vs. lossless. I'm talking about something subjective, like whether amp A is better than amp B. You would use different methods, I would think.
I still don't like ABX testing. Different doesn't equal better.
Personally, I would love to see double-blind testing with a large number of participants. It's just very hard to set up.
Edit: I think you're looking for http://en.wikipedia.org/wiki/Confidence_interval.
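That linked article covers the interval behind the "70% ± 5%" claim earlier. A sketch using the normal-approximation (Wald) interval, with the 0.70 and 384 values taken from the earlier post as an illustration:

```python
# Normal-approximation ("Wald") 95% confidence interval for an observed
# proportion. Using the earlier example: 70% of 384 listeners prefer A.
import math

def wald_ci(p_hat, n, z=1.96):
    """Approximate 95% CI for an observed proportion p_hat from n trials."""
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half, p_hat + half

lo, hi = wald_ci(0.70, 384)  # roughly (0.654, 0.746), i.e. 70% +- ~4.6%
```

Note the half-width comes out slightly under ±5% here, because ±5% is the worst case at p = 0.5; at 70% the interval is a bit tighter.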
But your ABX testing only tells us that you, personally, can tell the difference. To generalize it to me or someone else, you need more people, a lot more. The only way to show that 75% of people can tell the difference is to have more people run the same tests. And it still doesn't tell us which will sound better.
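To make that individual-versus-population distinction concrete: a single listener's ABX run is typically scored with a one-sided binomial test against guessing (p = 0.5), which is how a p-value under 0.05 can come from far fewer than 384 trials. The 16-trial numbers below are my own illustration, not from the thread:

```python
# One-sided binomial test for a single listener's ABX session.
# Null hypothesis: the listener is guessing (p = 0.5 per trial).
from math import comb

def abx_p_value(correct, trials):
    """P(getting >= `correct` right out of `trials` by pure guessing)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

p = abx_p_value(12, 16)  # ~0.038: significant at 0.05 with only 16 trials
```

So 12/16 is enough to show one listener hears a difference; the ~384-person sample is only needed when you want to estimate what fraction of the population hears it, and neither test says anything about which setup sounds better.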