Head-Fi.org › Forums › Equipment Forums › Sound Science › Seeking your thoughts: Nousaine T. "To Tweak or Not to Tweak?" Stereo Review, June 1998. 79-81

Seeking your thoughts: Nousaine T. "To Tweak or Not to Tweak?" Stereo Review, June 1998. 79-81

post #1 of 12
Thread Starter 
http://www.nousaine.com/pdfs/To%20Tweak%20or%20Not.pdf

Many of you have probably read this. This was a reasonably well-done blind listening test that compared a high-end amplifier with premium interconnects/cables to a system with a cheaper amplifier and generic interconnects/cables. It was not published in a peer-reviewed journal, but everything is explained in detail, including the reasoning behind the methodology and the analysis. The surprise is that no difference was detected, and that the article was published in a for-profit magazine that receives revenue from advertisements!

I'm pretty sure that the majority of people here would have expected an audible difference. My question is: how do you interpret this study?
post #2 of 12
Quote:
Originally Posted by SmellyGas
I'm pretty sure that the majority of people here would have expected an audible difference. My question is: how do you interpret this study?
It's always good to do experiments, but this study had only ten trials so Type II error was probably very high. Let p be the probability that a participant answers correctly. For any p > 0.5 there is a real audible difference. Let's guess that for one of the participants p was as high as 0.75. Type II error would probably be close to 1.0... meaning the test is almost guaranteed to give a null result.

Maybe Pio2001 can calculate the actual numbers.
post #3 of 12
Thread Starter 
Quote:
Originally Posted by mike1127
It's always good to do experiments, but this study had only ten trials so Type II error was probably very high. Let p be the probability that a participant answers correctly. For any p > 0.5 there is a real audible difference. Let's guess that for one of the participants p was as high as 0.75. Type II error would probably be close to 1.0... meaning the test is almost guaranteed to give a null result.

Maybe Pio2001 can calculate the actual numbers.
Using generic cheesy cables and budget amps vs. exotic cables and expensive amps, shouldn't you expect listeners to be correct fairly often? Say, 70% of the time? 80%? What do you think?

Gimme a reasonable % and I'll give you the type II error.
post #4 of 12
Quote:
Originally Posted by SmellyGas
Using generic cheesy cables and budget amps vs. exotic cables and expensive amps, shouldn't you expect listeners to be correct fairly often? Say, 70% of the time? 80%? What do you think?

Gimme a reasonable % and I'll give you the type II error.
Let's say 75%. What is type II error for ten trials at a significance level of 5%?
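A minimal sketch of that calculation, assuming a one-sided exact binomial test, ten independent trials, and the same probability p of a correct answer on every trial (the function names are illustrative, and the article's own pass/fail criterion may differ). At a nominal 5% level, a listener needs 9 of 10 correct (the actual alpha is about 1.1%, because 8 of 10 would give about 5.5%), so with p = 0.75 the type II error comes out near 0.76: high, though somewhat below the "close to 1.0" guess.

```python
from math import comb

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); the sum is empty (0) when k > n."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def critical_count(n, alpha):
    """Fewest correct answers out of n that pure guessing (p = 0.5) reaches with probability <= alpha."""
    for k in range(n + 2):
        if binom_sf(k, n, 0.5) <= alpha:
            return k

def type_ii_error(n, p_true, alpha):
    """Chance a listener who is right with probability p_true still falls short of the critical count."""
    return 1 - binom_sf(critical_count(n, alpha), n, p_true)

k = critical_count(10, 0.05)
print(k, round(binom_sf(k, 10, 0.5), 4))       # 9 0.0107
print(round(type_ii_error(10, 0.75, 0.05), 3))  # 0.756
```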
post #5 of 12
Quote:
Originally Posted by SmellyGas
Using generic cheesy cables and budget amps vs. exotic cables and expensive amps,
I don't think they changed the amps. It was only "tweak" type things.

EDIT: oops, you were right. Strange they considered amps a "tweak." Of course I find it unbelievable that anyone would think all amps sound the same. That's not to say I reject objective evidence or experiments. It's just that my own experience with amplifiers is so vivid that it is almost beyond belief that they don't sound different.
post #6 of 12
Thread Starter 
Quote:
Originally Posted by mike1127
Let's say 75%. What is type II error for ten trials at a significance level of 5%?
Nousaine's experiment had 10 trials each for 7 listeners (7 x 10 = 70). In addition, listener A requested 6 more trials, so there were actually 76 trials in total.

There is a long equation that gives a near-exact beta (the type II error probability), but in the interest of time (namely, my own), I can tell you this: for 50 trials with an alpha of 0.033 (a 3.3% chance of type I error), and assuming that differences among the cables/amps would let listeners reach at least 70% accuracy, the type II error probability (or "beta," the chance that a real difference goes undetected) is only 14%. If you feel that these cables/amps should have allowed listeners to be correct 80% of the time, then beta drops to 0.25%. Since there were actually 76 trials instead of 50, the type II error probability is even lower than 14%/0.25%. Finally, if the significance level (alpha) is raised to 5%, these beta values drop further still.

So, in other words, the study was sufficiently powered to detect audible differences large enough to let listeners identify them correctly 70% of the time, assuming an alpha < 0.05 (pretty standard). The chance of type II error is considerably less than 14%, a figure that was calculated with a smaller sample size and a smaller alpha.
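The figures in this post can be checked with an exact binomial calculation. This is a sketch, not the article's own analysis: it treats the pooled trials as one binomial sample in which every trial has the same probability p_true of a correct answer (that pooling is itself an assumption), and it uses a one-sided test against guessing (p = 0.5). The helper names are illustrative.

```python
from math import comb

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); the sum is empty (0) when k > n."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def critical_count(n, alpha):
    """Fewest correct answers out of n that pure guessing (p = 0.5) reaches with probability <= alpha."""
    for k in range(n + 2):
        if binom_sf(k, n, 0.5) <= alpha:
            return k

def type_ii_error(n, p_true, alpha):
    """Chance of falling short of the critical count when each trial is correct with probability p_true."""
    return 1 - binom_sf(critical_count(n, alpha), n, p_true)

# 50 trials at alpha = 0.033 (as in the post), then all 76 pooled trials at alpha = 0.05
for n, alpha in [(50, 0.033), (76, 0.05)]:
    k = critical_count(n, alpha)
    actual_alpha = binom_sf(k, n, 0.5)
    for p_true in (0.7, 0.8):
        beta = type_ii_error(n, p_true, alpha)
        print(f"n={n}: need {k}/{n} correct (alpha={actual_alpha:.4f}); "
              f"p={p_true}: beta={beta:.4f}")
```

For 50 trials, a score of 32 or more is needed (actual alpha about 0.0325), and beta lands at roughly 0.14 for p = 0.7 and 0.0025 for p = 0.8. The 76-trial rows come out lower, as the post says they should.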
post #7 of 12
I like this line at the end of the article: "These bias mechanisms are a part of the human condition, and we can't tune them out with good intentions".

Here is another article featuring a look at "do all amplifiers sound the same?".
http://www.bruce.coppola.name/audio/Amp_Sound.pdf
post #8 of 12
Quote:
Originally Posted by JadeEast
I like this line at the end of the article: "These bias mechanisms are a part of the human condition, and we can't tune them out with good intentions".

Here is another article featuring a look at "do all amplifiers sound the same?".
http://www.bruce.coppola.name/audio/Amp_Sound.pdf
This is my 2nd favourite blind test article of all time. You can quibble about methods all you like, but if a $220 receiver cannot be immediately distinguished from $12,000 of amping, then at least one of them is surely overpriced.....
post #9 of 12
Thread Starter 
Quote:
Originally Posted by nick_charles
This is my 2nd favourite blind test article of all time. You can quibble about methods all you like, but if a $220 receiver cannot be immediately distinguished from $12,000 of amping, then at least one of them is surely overpriced.....
I still get a laugh every time I see the photo of that P.O.S. Pioneer amp w/5-band equalizer next to the expensive high-end stuff. This was a very good study. It included skeptics and believers, all of whom heard differences when not blinded, a large n of 772 trials, good subgroup analysis by listener and amplifier pair, and it let listeners choose between an ABX box and cable-swapping. My main criticism is that they didn't calculate the power/beta of the trials, but just eyeballing the numbers, it is definitely sufficient to detect small differences. Furthermore, they didn't do a subgroup analysis on just the believers, who might have been the group most likely to find a difference, since they were the most motivated and the most convinced that differences exist. Okay, I'm bored, so I'll analyze the believers only: 163/324 = 50.3% correct. So not even the listeners who were convinced that amplifiers sound different could tell the amps apart.
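A quick way to put the believers-only score in context is a one-sided exact binomial test against guessing. This sketch assumes the 324 trials are independent and share a common probability of a correct answer; it is an illustration, not the analysis the article's authors ran, and the function name is hypothetical.

```python
from math import comb

def binom_p_value_one_sided(correct, trials):
    """Exact P(X >= correct) for X ~ Binomial(trials, 0.5), i.e. pure guessing."""
    hits = sum(comb(trials, i) for i in range(correct, trials + 1))
    return hits / 2**trials  # integer arithmetic until this final division

correct, trials = 163, 324
print(f"{correct / trials:.1%} correct")  # 50.3% correct
print(f"one-sided p = {binom_p_value_one_sided(correct, trials):.3f}")
```

The p-value comes out at roughly 0.48, meaning a score this high is what guessing alone produces about half the time.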
post #10 of 12
Quote:
Originally Posted by SmellyGas
So in other words, the study size was sufficiently powered to detect audible differences large enough to allow listeners to correctly identify them 70% of the time, assuming an alpha <0.05 (pretty standard). The chance of type II error is considerably less than 14%, which was calculated based on a smaller sample size with smaller alpha.
I'm not sure it is right to pool the listeners. I suspect that in any test, a lot of the listeners are on the wrong track completely and they are doing no better than chance (p=0.5). There may be a few listeners who are more on track, with p=0.75. I wanted to know the Type II error for an individual listener.
post #11 of 12
To the OP, very nice read. JadeEast, nice link as well.
post #12 of 12
Thread Starter 
Quote:
Originally Posted by mike1127
I'm not sure it is right to pool the listeners. I suspect that in any test, a lot of the listeners are on the wrong track completely and they are doing no better than chance (p=0.5). There may be a few listeners who are more on track, with p=0.75. I wanted to know the Type II error for an individual listener.
I don't know what you mean by listeners who are "on track." Regardless, it would be meaningless and arbitrary to calculate the probability of type II error for just one listener and his 10 trials. Of COURSE the chance of type II error will be high if you only look at 10 trials. That is why the study designers had 70+ trials, which gives a very LOW probability of type II error.

But let's say, for the sake of argument, that you DID calculate the probability of type II error with just 10 trials, and that it's really high: 80%. To calculate the probability of a type II error in all 7 different listeners (10 trials each), assuming their results are independent, you would take 0.8^7 ≈ 21%. Thus, the type II error probability of the entire experiment, no matter how you swing it, is LOW.
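The 0.8^7 step multiplies seven per-listener miss probabilities together. That is only valid if the listeners' results are independent and each listener actually has a real difference to detect; both are assumptions, and the 80% figure is the hypothetical value from the post. A two-line check of the arithmetic:

```python
beta_per_listener = 0.8  # hypothetical per-listener type II error (assumption from the post)
listeners = 7

# Chance that every listener misses a real difference, assuming independent listeners
beta_all = beta_per_listener ** listeners
print(f"{beta_all:.1%}")  # 21.0%
```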

Finally, using 7 listeners instead of 1 increases the generalizability of the study.