Quote:
Originally Posted by nick_charles
Yep, you could do some preliminary DBTs such as level differences, distortion and so on and use that to weed out the merely average listeners..
Yes, that's a good idea. It would be useful in public tests. But in private tests, it depends on whether our friends are OK with being left out, especially when they claim to hear the difference.
Quote:
Originally Posted by wavoman
1. Many experiments reported in threads here -- and many discussions here -- do in fact suggest averaging results across the sample subjects. I am not tilting at windmills. And most experiments I have read do not re-test the winners.
Some experiments do indeed use incorrect statistical analysis. The most common error (in the Prism paper about the audibility of CD pressing quality, for example) is to present individual results, then pick the most significant among them and present them as successful because they are above the threshold.
Actually, if the threshold is one chance in 20 of getting a false positive, and you test 20 subjects, and one of them is positive... well, you see what I mean.
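To put a number on it, here is a quick sketch (plain Python, my own illustration, not from any of the papers discussed) of how likely it is that at least one of 20 purely guessing subjects clears a 1-in-20 threshold:

```python
# Chance that at least one of 20 at-chance subjects passes a 5% threshold.
p_threshold = 0.05   # per-subject false-positive probability
n_subjects = 20

p_at_least_one = 1 - (1 - p_threshold) ** n_subjects
print(f"{p_at_least_one:.2f}")  # ~0.64: one "positive" subject is no surprise
```

With 20 subjects, a single "significant" individual result is roughly what pure chance predicts.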
The strangest example I've seen was an AES presentation about high-definition digital audio. The authors used such a complicated and obscure statistical model that they ended up proving statistically that group A scored better than group B! Which could be seen immediately by reading their scores... both of which were way below the significance threshold!
Quote:
Originally Posted by wavoman
2. Standard A/B/X protocol does not allow switching/listening over and over before making a choice. X is picked at random from A or B in each trial. Subjects answer, and the next trial starts. I think this is a terrible protocol.
In this case, I agree. When I set up an ABX test, I let the subjects have complete freedom in the choice of what they want to listen to: in each trial, A, B and X can be freely listened to, as many times as they want, in the order they like, for as long as they need.
About correcting previous answers, it depends on the protocol. In a protocol where the solutions are only revealed after all the trials are done, I let people go back and change their previous answers. This is especially useful if someone discovers, during the test, a detail that makes the correct source easy to identify: they can start over with much more confidence.
However, I personally prefer, by far, having my answer checked immediately after each trial. I need to know if I am right or wrong, so that if I am wrong, I can compare A and B more carefully, in order not to make any more mistakes.
This way of testing imposes two requirements: I can't correct my previous answers, since the solutions have already been revealed to me, and the total number of trials must be decided in advance and respected, in order to avoid "probability picking".
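"Probability picking" is what statisticians call optional stopping. A small simulation (my own sketch, not part of any protocol described here) shows how checking the running score after every trial, and stopping as soon as it looks significant, inflates a pure guesser's false-positive rate:

```python
import random
from math import comb

def binom_p(k, n):
    """One-sided p-value: P(>= k correct out of n) under pure guessing."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

def guesser_passes(max_trials=15, alpha=0.05, peek=True):
    correct = 0
    for n in range(1, max_trials + 1):
        correct += random.random() < 0.5       # the guesser flips a coin
        if peek and binom_p(correct, n) <= alpha:
            return True                        # stops on a lucky streak
    return binom_p(correct, max_trials) <= alpha

runs = 100_000
print(sum(guesser_passes(peek=True) for _ in range(runs)) / runs)   # ~0.08, above alpha
print(sum(guesser_passes(peek=False) for _ in range(runs)) / runs)  # ~0.02, below alpha
```

Fixing the number of trials in advance keeps the guesser's passing probability at or below the threshold; peeking pushes it well above.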
If a listener seems more at ease with other protocols, like A/B preference, I see no problem with it, as long as there is randomization and we can estimate the probability of a false positive.
For me, the test must disturb the listener as little as possible. Otherwise, the protocol always gets blamed in case of failure.
In the tests I have run with some forumers, the question of averaging across the subjects was a very difficult one. On the one hand, if we sum everybody's results, we can prove a difference otherwise unseen, thanks to the statistical weight of all the answers combined. On the other hand, since the listeners are rather untrained, there is a high probability that one or two listeners score well and the others don't. Summing the answers then leads to a failure, even though those listeners actually hear the difference.
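To illustrate the dilution (with numbers of my own choosing, not from any of our actual tests): a single listener scoring 14/15 is individually very significant, but pooled with three at-chance listeners, the group result can fail a 1% threshold:

```python
from math import comb

def binom_p(k, n):
    """One-sided p-value: P(>= k correct out of n) under pure guessing."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

print(binom_p(14, 15))              # one good listener alone: ~0.0005
print(binom_p(14 + 8 + 8 + 8, 60))  # pooled with three 8/15 guessers: ~0.026
```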
That's why I'd rather first define a target probability of false positive, according to the hypothesis under test: for example 1% for amplifiers, or 0.1% for interconnects, because "extraordinary claims need extraordinary evidence".
Then I take the maximum number of listeners and divide my target probability by this number. This gives me, roughly, the individual threshold such that the probability that at least one of them (to be precise, exactly one of them, but it's nearly the same) scores a false positive matches my target.
Then I estimate the minimum number of trials needed so that this per-listener threshold can still be met with one error from the listener. I thus give each of them the right to make at most one mistake.
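A sketch of that computation (my own rendering of the procedure just described, with illustrative numbers): with a 1% overall target shared among, say, 5 listeners, each listener is tested at 0.2%, and we search for the smallest trial count at which "at most one error" is still significant:

```python
from math import comb

def binom_p(k, n):
    """One-sided p-value: P(>= k correct out of n) under pure guessing."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

target = 0.01                # overall false-positive probability (amplifiers)
listeners = 5                # assumed number of listeners, for illustration
alpha = target / listeners   # per-listener threshold

# Smallest n where n-1 correct answers out of n (one mistake) still passes.
n = 1
while binom_p(n - 1, n) > alpha:
    n += 1
print(n, binom_p(n - 1, n))  # 13 trials: 12/13 gives p ~ 0.0017 <= 0.002
```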
Then there is still a risk of biasing the statistics by repeating ABXes over and over until the right score is met. So I require any listener to first score a modest success, like 6/6, in order to be allowed to proceed to the real ABX. In practice, at our last meeting in Paris, we agreed instead that the 15-trial ABX would be divided into two parts: the first 7 trials, then the rest. If there is more than one mistake after 7 trials, the test ends, and another listener, group of listeners, or device under test can proceed. If there are fewer than two errors, the ABX goes on until the 15 trials are done.
I did not check whether it was as good as my method at avoiding repetition bias, but I was not the only one organising the test.
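For what it's worth, a guesser's chance of passing that two-stage design is easy to compute, if we assume the final criterion is at most one error over all 15 trials (the post above doesn't fix the final score, so that criterion, matching the "one mistake at most" rule, is my assumption):

```python
from math import comb

# Fixed design: at most one error in 15 trials.
p_fixed = sum(comb(15, k) for k in (14, 15)) / 2 ** 15

# Two-stage design: continue past trial 7 only with at most one error so far,
# then still require at most one error overall. Enumerate a guesser's paths.
p_two_stage = 0.0
for e1 in (0, 1):                  # errors in the first 7 trials
    p1 = comb(7, e1) / 2 ** 7
    for e2 in range(2 - e1):       # errors still allowed in the last 8 trials
        p_two_stage += p1 * comb(8, e2) / 2 ** 8

print(p_fixed, p_two_stage)  # both 16/32768 ~ 0.00049
```

Under that assumption, the early gate doesn't change the false-positive probability at all, since any run that can still pass overall necessarily had at most one error in its first 7 trials; it only saves time on hopeless runs. By itself, though, it doesn't address the bias from repeating the whole procedure.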