Let's Prove The Null Hypothesis

wavoman · Feb 26, 2009 at 2:59 AM

Quote:

Originally Posted by bobsmith
It's a shame you have decided to run the test using digital cables...by testing digital cables, I think you are giving your well designed test the least "impact" possible.

I agree 100%. We are doing this on purpose. It is just the first in a series of tests. We hope to show "no difference" between the $500+ digital cable and the cheap one. That will debunk a myth, although indeed a myth few here believe in (but some do).

Gotta walk before you can run. I wanted to test the principle of swindles, and 4-way choices, and estimate my likelihood model, etc.

Think of this as training wheels. I have a nice trivial-to-carry well-regarded DAC not doing anything, so testing the CDP ==> DAC connection seemed an easy thing to do at the meet, and I had the uber expensive S/PDIF cable too.

It will be hard for high-end digital cable makers to refute this.

wavoman · Feb 26, 2009 at 3:12 AM

Quote:

Originally Posted by ILikeMusic
Say you take an already pre-qualified group (such as one might presume the attendees at a Head-Fi meet to be) and they all end up using response number four. In that case there appear to be two possibilities... one, that they do detect a difference but somehow still can't identify which is which, or two, that they are embarrassed to admit to the tester that they can't tell any difference and so simply resort to answer number four. In this way providing a 'way out' (being allowed to answer in the affirmative but still not being required to really demonstrate anything) would seem to serve to enhance response bias rather than eliminate it (?)

And, if all respondents use answer number four then viewers of the results on both 'sides' will claim foul... either that 'the respondents said that they could hear a difference but you didn't believe them!' or 'none of the respondents proved anything because they couldn't clearly identify one from the other.'

Seems as if you are setting yourself up for the very problems you are trying to prevent..?

You are thinking exactly the way I am ... but follow me to a different conclusion if I can persuade you.

I am giving a way for the response bias -- which comes from tension felt by the Subjects (hereafter "Ss") -- to be channeled to a "less harmful" reply. Think about the same tension during A/B/X testing -- forced to pick "same" or "different", and not being sure, Ss give us polluted answers, and all our statistical tests are wounded, fatally IMO.

I want to pick up the signal if there is one. If everyone answers "difference, but no preference", then we learn very little, EXCEPT for my swindles, which are the most important new wrinkle. Some of my tests present A and A, and B and B -- two identical samples. Now we stratify the Ss by the answers they give here (where we know the truth) -- if there are Ss who say "hey, no difference" reliably on the swindle tests, then the fact they give us response #4 when there is a difference will be a fascinating and well-supported conclusion. On the other hand if the #4 responses are given during the swindles, we know the whole listening experiment with these Ss is a waste.
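The stratification step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the response labels, the 0.8 cutoff, and the function name `stratify_by_swindles` are hypothetical and not part of the protocol as posted.

```python
from collections import defaultdict

# Hypothetical label for the "no difference" answer; the thread does not
# fix the exact wording or numbering of responses 1-3.
NO_DIFFERENCE = "no difference"

def stratify_by_swindles(trials, min_swindle_accuracy=0.8):
    """Split subjects by how often they say 'no difference' on swindle trials.

    trials: iterable of (subject, is_swindle, response) tuples, where
    is_swindle is True when both samples were physically identical.
    The 0.8 cutoff is an assumed value, not one taken from the thread.
    Returns (reliable, unreliable) sets of subject names.
    """
    swindle_total = defaultdict(int)
    swindle_correct = defaultdict(int)
    for subject, is_swindle, response in trials:
        if not is_swindle:
            continue  # only swindles have a known right answer
        swindle_total[subject] += 1
        if response == NO_DIFFERENCE:
            swindle_correct[subject] += 1

    reliable, unreliable = set(), set()
    for subject, total in swindle_total.items():
        accuracy = swindle_correct[subject] / total
        if accuracy >= min_swindle_accuracy:
            reliable.add(subject)
        else:
            unreliable.add(subject)
    return reliable, unreliable

# Made-up example data: S1 resists the swindles, S2 does not.
trials = [
    ("S1", True, "no difference"),
    ("S1", True, "no difference"),
    ("S1", False, "difference, no preference"),
    ("S2", True, "difference, no preference"),
    ("S2", True, "prefer A"),
    ("S2", False, "difference, no preference"),
]
reliable, unreliable = stratify_by_swindles(trials)
print(reliable, unreliable)
```

Only the answers from the reliable group on the real A-versus-B trials would then be read as evidence; a #4 from S1 means something, while a #4 from S2 does not.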

See how this plays out?

Ilike, I like the way you think (to turn a phrase) -- you are right on top of the real issues here.

nick_charles · Feb 26, 2009 at 4:26 AM

Quote:

Originally Posted by wavoman
We hope to show "no difference" between the $500+ digital cable and the cheap one.

It will be hard for high-end digital cable makers to refute this.

Hmmm, let me think about this....if I were a high-end digital cable maker how might I answer that........

how about

1) Your listeners are deaf, badly trained, biased
2) The kit is not good enough, there are variations on this

2a) The distortion/jitter/noise/whatever overwhelms the differences
2b) The system is not revealing enough

3) The test protocol/setting is flawed

4) The test samples are not good enough

5) It wasn't our cable

I am sure I have missed a few...

Nightmare · Feb 26, 2009 at 4:37 AM

Quote:

Originally Posted by wavoman
I am giving a way for the response bias -- which comes from tension felt by the Subjects (hereafter "Ss") -- to be channeled to a "less harmful" reply. Think about the same tension during A/B/X testing -- forced to pick "same" or "different", and not being sure, Ss give us polluted answers, and all our statistical tests are wounded, fatally IMO.

I want to pick up the signal if there is one. If everyone answers "difference, but no preference", then we learn very little, EXCEPT for my swindles, which are the most important new wrinkle. Some of my tests present A and A, and B and B -- two identical samples. Now we stratify the Ss by the answers they give here (where we know the truth) -- if there are Ss who say "hey, no difference" reliably on the swindle tests, then the fact they give us response #4 when there is a difference will be a fascinating and well-supported conclusion. On the other hand if the #4 responses are given during the swindles, we know the whole listening experiment with these Ss is a waste.

Doesn't the requirement for a statistically significant number of correct responses deal with polluted answers? I mean, aren't polluted answers the very reason for requiring 12 correct responses in a 16-trial test? 11/16 is a negative result, just as if it were 0/16. Let 'em guess all they want, the odds that they'll be right 12 out of 16 times are so long as to be considered practically impossible.
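The guessing odds behind the 12-of-16 threshold can be worked out directly. A minimal sketch, assuming every trial is an independent 50/50 guess; the helper name `p_at_least` is made up for illustration.

```python
from math import comb

def p_at_least(k, n, p=0.5):
    """Probability of at least k successes in n independent trials,
    each succeeding with probability p (binomial upper tail)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p12 = p_at_least(12, 16)  # chance a pure guesser scores 12 or more of 16
p11 = p_at_least(11, 16)  # same for 11 or more of 16

print(f"P(>=12 of 16 by guessing) = {p12:.4f}")  # 0.0384
print(f"P(>=11 of 16 by guessing) = {p11:.4f}")  # 0.1051
```

Under these assumptions a pure guesser reaches 12 of 16 about 3.8% of the time, roughly 1 run in 26. That is just under the conventional 5% cutoff, which is why 12 passes and 11 does not.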

Add me to that camp that says an ABX requires only two possible responses:

X is A
X is B

Even after reading your justification, I can't see why you'd care about preferences when you're only testing for detectable differences. Of course for a swindle test, you'd need to add a third "A is B" response. Not that I see much utility in swindle tests, other than humiliating the test subjects.

I will say that testing digital cables is a good idea from a level-matching standpoint. That being the most important and difficult aspect of audio testing.

b0dhi · Feb 26, 2009 at 9:58 AM

Quote:

Originally Posted by Nightmare
Add me to that camp that says an ABX requires only two possible responses:

X is A

X is B

Even after reading your justification, I can't see why you'd care about preferences when you're only testing for detectable differences.

How do you know that a hypothetical difference in the sound won't result in a larger subconscious response than a conscious one? It's well established that the subconscious mind perceives things that the conscious mind typically does not. It's conceivable a similar effect can occur with audio as it does with other senses.

Wavoman's test better addresses this possibility than the standard ABX test, which depends strongly on conscious awareness of the difference, rather than on a causative effect induced by a hypothetical difference.

Pio2001 · Feb 26, 2009 at 12:54 PM

Quote:

Originally Posted by upstateguy
I would be willing to look at any compiled results at this point, never mind hard cold evidence.

Hello Upstateguy. I have not had the time to read the discussion yet, but here is the compilation of all the blind listening tests that I know of:

Post-it: Annuaire des tests ABX

It is in French, and thus features more French tests than tests in other non-English languages (except for all the Spanish matrix-hifi tests).

It is up to date, and I shall translate it into English in the next few weeks.

bobsmith · Feb 26, 2009 at 3:32 PM

Quote:

Originally Posted by upstateguy
3- A body of evidence to support the proposition that all Amps do not sound the same.
...

I'm looking for any evidence, collected by any methodology.

Thanks to Pio2001's link, I came across this article regarding amps that seems to be a pretty solid set of results for the proposition that at least some tube amps sound different than some solid state amps.

http://www.soundandvisionmag.com/ass...rInterface.pdf

ILikeMusic · Feb 26, 2009 at 4:27 PM

Quote:

If everyone answers "difference, but no preference", then we learn very little, EXCEPT for my swindles, which are the most important new wrinkle. Some of my tests present A and A, and B and B -- two identical samples. Now we stratify the Ss by the answers they give here (where we know the truth) -- if there are Ss who say "hey, no difference" reliably on the swindle tests, then the fact they give us response #4 when there is a difference will be a fascinating and well-supported conclusion.

Thanks for the clarification, and what you say might be so if you do enough tests to rule out guessing. But I'm still not sure about what problem you're really trying to solve with the additional complexity. How does response bias come into play in a simple A/B/X test, meaning how can a subject be sidetracked into trying to please the interviewer when he has no idea which answer is correct? About all he could do is perhaps be forced into guessing instead of revealing that he can't tell a difference, but enough repetition will sort that out... just as would be required in your procedure as well.

I guess I don't understand how the answers in an A/B/X test will necessarily be any more polluted than one using swindles and other such mechanisms, and it seems to me the idea that asking for a simple A/B choice might provide too much 'pressure' or ignore subconscious impressions sounds like a mechanism designed to obfuscate the results. I mean really, if a difference is that excruciatingly small then wouldn't it be very difficult to get any results up out of the noise?

Quote:

On the other hand if the #4 responses are given during the swindles, we know the whole listening experiment with these Ss is a waste

Might it also mean that the experiment itself is good but the subjects simply can't tell a difference?


terriblepaulz · Feb 26, 2009 at 7:15 PM

Quote:

Originally Posted by Nightmare
Not that I see much utility in swindle tests, other than humiliating the test subjects.

If those test subjects paid more than $25-$30 for a digital cable, is that such a bad thing?

Nightmare · Feb 26, 2009 at 9:48 PM

Never said it was a bad thing, paulz.

Just not very scientifically useful.

Quote:

Originally Posted by b0dhi
How do you know that a hypothetical difference in the sound won't result in a larger subconscious response than a conscious one? It's well established that the subconscious mind perceives things that the conscious mind typically does not. It's conceivable a similar effect can occur with audio as it does with other senses.

Wavoman's test better addresses this possibility than the standard ABX test, which depends strongly on conscious awareness of the difference, rather than on a causative effect induced by a hypothetical difference.

How is it possible for a test subject to report a subconscious preference? In order to respond "prefer A/B" or "difference, no preference," a subject needs to be conscious of a difference. If the difference is only subconscious, the response will be "no difference," and the test will show a negative result. In a standard two-response ABX, a merely subconscious difference will give rise to random responses, and also show a negative result. (The lesson either way is that manufacturers will be able to claim a subconscious difference no matter what, because there's no way to prove or disprove it.)

"Difference, no preference" is a safe and ego-soothing response for subjects to give if they don't actually hear a difference. My concern is that wavoman's protocol may invite a glut of these responses, which will teach us nothing.

b0dhi · Feb 27, 2009 at 5:56 AM

Quote:

Originally Posted by Nightmare
How is it possible for a test subject to report a subconscious preference? In order to respond "prefer A/B" or "difference, no preference," a subject needs to be conscious of a difference.

Say that one of the samples has an effect on them, maybe increasing dopamine levels (just as an example of a potentially subconscious response). Note at this point that asking the listener for a difference is specifically asking them to compare the sound of the two samples.

So, say that the subject is not able to consciously tell the two samples apart in terms of sound. To the "difference" question they would honestly have to answer "no difference". To the "preference" question they could, due to the effect it has on them, still answer A or B, even though they can't actually hear a difference. They are inferring a difference due to the reaction it has on them.

Also, (though this doesn't apply to the test where there's a "no preference" option) even if the subject can't consciously tell a difference in the samples and can't even consciously tell a difference in their own reaction, they could still - without knowing so - gravitate toward one that has a subconscious effect on them, and that affinity might be statistically evident and significant.

Of course, all of this depends on the precise aim of the experiment. IMO an ABX test does not aim to expose the same effects as a preference based test, although both are valid tests in their own right.

Nightmare · Feb 27, 2009 at 6:50 AM

Ah, well put, b0dhi. I see what you're talking about, now. The rigid science part of my brain still says, "if a subject can report a subconscious preference for A or B, they can also report a preference/non-preference for X." But knowing that subjects are human, perhaps a simple preference protocol would be suitable when testing for subconscious effects.

Quote:

Originally Posted by b0dhi
Of course, all of this depends on the precise aim of the experiment. IMO an ABX test does not aim to expose the same effects as a preference based test, although both are valid tests in their own right.

Yeah, we were talking past each other a bit. So, what are wavoman's precise aims?

ILikeMusic · Feb 27, 2009 at 3:41 PM

Well if the proposition is 'Can high-end cables increase dopamine levels?' then the answer to that question is probably yes.

wavoman · Feb 28, 2009 at 5:32 AM

Quote:

Originally Posted by Nightmare
..."Difference, no preference" is a safe and ego-soothing response for subjects to give if they don't actually hear a difference. My concern is that wavoman's protocol may invite a glut of these responses, which will teach us nothing.

Agreed ... but subjects who give this answer all the time, or most of the time, when presented with the swindle will have smoked themselves out and be eliminated.

BTW, all results will be confidential and the swindles will not embarrass anyone. Privately people will want to know "how they did", and we simply say "you were not able to distinguish these two cables".

The whole point of this experiment is to find out if there are people who can tell the difference between the two cables -- especially in light of the fact that most of us believe there is no difference between in-spec digital cables transmitting S/PDIF, and that spending $500 on such cables is not smart.

Statistical tests and the chance/guessing effect play no real role here. The binomial probability test is a red herring, and although used in most of the published AES articles, is really meaningless here. Averaging the results of the sample group is pointless. Someone by chance may get a great score -- you have to test him again. So let's do this directly -- focus on the subjects who demonstrate they might be golden ears, or at least can resist the social pressure to find a difference (by seeing what they say when there is no difference -- the swindles).

If this effect exists -- if high-end digital cables sound better to someone -- then there is that someone. And he or she will be able to hear this difference almost all the time. Not some of the time, or "better than chance" ... most of the time, nearly all the time in fact, or there is no effect. There are no random elements here -- one piece of music, played over and over, through both cables.

I can either read the line on the eye chart, or I can't. OK, sometimes I miss a letter or two, it's fuzzy, that's why the eye doc does not expect you to get them all -- but only on the transition line. So it is with hearing. There might be someone who can hear a difference right on the edge of his perception ... but if this effect exists, then there is someone who can hear it (nearly) all the time.

I want to find that person, and estimate how rare that person is in the population. Play-the-winner experiments do just that. No statistical formulas will be necessary when we happen upon Mr. Goodears, and he aces the test, over-and-over. OK OK, we will use probability just to prove guessing is most likely impossible, but that's a trivial calculation.
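The "trivial calculation" mentioned above might look something like this. It is a sketch under assumptions that are not in the post: 20 subjects, sessions of 16 trials, a pass mark of 12, and independent 50/50 guessing. All function names are hypothetical.

```python
from math import comb

def p_pass_by_guessing(k, n, p=0.5):
    """Chance one pure guesser gets at least k of n trials right."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def p_any_guesser_passes(num_subjects, k, n, p=0.5):
    """Chance that at least one of num_subjects independent guessers passes."""
    return 1 - (1 - p_pass_by_guessing(k, n, p)) ** num_subjects

def p_guesser_survives(rounds, k, n, p=0.5):
    """Chance a single guesser passes `rounds` independent sessions in a row."""
    return p_pass_by_guessing(k, n, p) ** rounds

# Assumed numbers: 20 subjects, 16 trials per session, pass mark 12.
first_round = p_any_guesser_passes(20, 12, 16)
after_retests = p_guesser_survives(3, 12, 16)

print(f"Some guesser passes round one: {first_round:.2f}")
print(f"One guesser passes three rounds running: {after_retests:.6f}")
```

With these made-up numbers, there is a better than even chance that somebody in a room of 20 guessers posts a "great score" in the first round, which is the reason for testing the winner again. Passing three sessions in a row by luck is far less likely, on the order of 1 in 17,000.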

Large sample sizes do not eliminate the pollution of the masses, in fact they exacerbate it. You must track individual results, not pool them.
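A toy example of why pooling can bury an individual result. The panel here is invented for illustration: one listener who gets all 16 trials right and 19 listeners who each land on exactly 8 of 16, an idealized stand-in for guessing. The names and numbers are assumptions, not data from any test.

```python
from math import comb

def p_at_least(k, n, p=0.5):
    """Binomial upper tail: chance of at least k of n by 50/50 guessing."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

TRIALS = 16  # trials per subject (assumed)

# Invented panel: one perfect scorer plus 19 subjects at chance level.
scores = {"golden_ear": 16}
scores.update({f"guesser_{i}": 8 for i in range(1, 20)})

# Pooled analysis: lump every trial from every subject together.
pooled_correct = sum(scores.values())
pooled_trials = TRIALS * len(scores)
pooled_p = p_at_least(pooled_correct, pooled_trials)

# Individual analysis: look at the best subject on their own.
best_subject = max(scores, key=scores.get)
individual_p = p_at_least(scores[best_subject], TRIALS)

print(f"Pooled: {pooled_correct}/{pooled_trials}, p = {pooled_p:.3f}")
print(f"{best_subject}: {scores[best_subject]}/{TRIALS}, p = {individual_p:.6f}")
```

In this made-up panel the pooled score is 168 of 320, which is nowhere near significant, while the one perfect scorer has odds of 1 in 65,536 of doing that by guessing. Adding more guessers to the pool only dilutes that one result further.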

Finally, let's think about A/B/X vs my 1-2-3-4 "prefer" testing.

1-2-3-4 is a refinement (in the technical sense of that word, i.e., it preserves the original choices but presents them in finer detail) of "difference vs no-difference". I am happy with either doing difference vs no-difference testing, or preference testing, but I believe that 1-2-3-4 will induce less response bias than "difference vs no-difference". (I have not proven this experimentally; I do have a mathematical model that backs me up, but the model has not been verified.) As I said though, I'm OK with "difference vs no-difference" as long as we have swindles.

But A/B/X is NOT a difference vs no-difference test. Not close. A/B/X plays A, plays B, then picks one at random and asks "is it A or B". What the hell does that have to do with preferring A over B, or detecting a difference?

It is unnatural, and to my mind so stupid as to make me question why people keep doing it over and over -- answer: because they can do it double blind with the A/B/X hardware, but who cares. IMO, A/B/X has set back the science of human panel-based audio perception testing and damaged it nearly totally. Think hard, people! Why A/B/X? Why? If you want to know if two things sound different, play them and ask. Yes, this can only be done single blind, but double blind testing is not needed if the subjects are out-of-communication with the testers, especially if the testers are unbiased. With medicines, you HAVE TO do double blind, since the doctors are involved in the treatment. Not the case with audio. Think for yourself -- this is all obvious if you start from scratch with an open mind and don't say "DBT" and "A/B/X" just 'cause everyone else does. Think.

It's not worth arguing over "difference / no-difference" vs "prefer?" testing. I like "prefer?" since I think it reduces response bias, but I can't prove that. If you want to do "difference / no-difference", I'm not against you ... as long as you use some swindles to help stratify the population ... heck, I won't even fight you if you don't use swindles as long as you don't pool the results, as long as you look for winners and then repeat the test with them, etc.

But A/B/X? Nope. Unnatural, and the wrong question is being tested.

upstateguy · Feb 28, 2009 at 7:09 AM

Quote:

Originally Posted by bobsmith
Thanks to Pio2001's link, I came across this article regarding amps that seems to be a pretty solid set of results for the proposition that at least some tube amps sound different than some solid state amps.

http://www.soundandvisionmag.com/ass...rInterface.pdf

Hi bobsmith

IMHO it reads like typical subjectivist rhetoric, but it's something.

USG
