Blind cable testing: initial report

mike1127 · Jul 20, 2009 at 4:32 AM

Quote:

Originally Posted by wavoman
Rather, there are two answers for each trial -- the observation is a vector. The two elements of the vector are NOT independent.

What's the reasoning they aren't independent? No information or state from sub-trials 1 & 2 gets carried to sub-trials 3 & 4.

wavoman · Jul 20, 2009 at 4:36 AM

Quote:

Originally Posted by mike1127
I don't give separate answers to "what is the third" or "what is the fourth". The hope is that the back-to-back contrast makes it easier to identify ABBA or ABAB.

Initially each trial involved only one answer to a binary question: ABBA or ABAB.

We agree -- this is just like A/B/X. One binary answer.

Quote:

Originally Posted by mike1127
Here's the reasoning that each trial involved two answers ...

Of course you do. We agree. But the two answers are not independent.

Quote:

Originally Posted by mike1127
So I consider myself 7/8.

That's not right -- see my previous post.

Quote:

Originally Posted by mike1127
It seems to me your idea of "superior" or "inferior" protocols has to do with our confidence in hearing differences under certain conditions, but nothing to do with the question "Can they be heard under ideal conditions?"

I believe the "two-choice with swindle protocol" is better (a) because the statistical test will have more power for a given number of trials [due to the swindles], and (b) for the reason you say (avoiding response bias).

I agree that if we could create "ideal conditions" it would matter a lot less what tests we did -- we would learn the truth quickly using nearly any protocol. You are right about that.

wavoman · Jul 20, 2009 at 4:39 AM

The lack of independence is between the answers, not between the first two and the second two listenings.

Whether or not you can tell which is the good or bad cable is very related to whether or not you can tell which two samples match.

Think this way: if the effect of the cables was so dramatic that everyone could instantly match them, they could also instantly tell you which was which. Now think this way: if the cables were really identical, then nobody could answer either question correctly.

You see -- your answers are NOT independent of each other. You have 4 trials, each with two answers that are highly related.

[to be continued ... gotta get some sleep ... sorry]

mike1127 · Jul 20, 2009 at 4:40 AM

Quote:

Originally Posted by wavoman
Bad protocol, really bad protocol.

DO NOT flip to the other letter for the next week. That UNBLINDS the test ... you know that you have a different cable.

It does not unblind the test. Each track has one answer with a 50/50 chance of guessing correctly. Knowing a cable changed does not change that fact.

Quote:

Each week, pull another random selection of A or B for the 8 tracks.

Do this for several weeks.

NOW you have a protocol!

Sometimes you will be listening to the same cable on the same track -- but you won't know -- so your answers will be most revealing.

I agree with you, if we had unlimited time, but we don't.

The current protocol is fine in the sense that it can yield statistically significant results. Your ideas are more about collecting additional information. I would describe it by saying you want to gather more practical significance, but your ideas have no effect on statistical significance.

EDIT-- I thought some more about what you are saying, and I think I see the value.

One of the problems we face in audio testing is that a person's expectations can overwhelm a true perception. So if I get the idea that this week's cable is A, that might cause me to expect next week will be B, which may prejudice my perception.

Also, when I say your idea has no effect on statistical significance I don't mean it doesn't increase the number of independent trials, which of course it does.
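
For reference, the weekly re-randomization wavoman proposes could be sketched roughly as below. This is only an illustration under stated assumptions: the function names, the 4-week length, and the seed are hypothetical and not part of either protocol discussed in the thread. The point is that each week's draw is independent, so the same cable can come up twice on a track without the listener knowing.

```python
import random

def weekly_assignments(num_weeks, num_tracks=8, seed=None):
    """Draw an independent random cable (A or B) for every track in every week."""
    rng = random.Random(seed)
    return [[rng.choice("AB") for _ in range(num_tracks)] for _ in range(num_weeks)]

def repeated_pairs(schedule):
    """Count (track, consecutive-week) pairs where the same cable came up twice."""
    return sum(
        1
        for prev, curr in zip(schedule, schedule[1:])
        for a, b in zip(prev, curr)
        if a == b
    )

# Hypothetical run: 4 weeks, 8 tracks; the helper keeps this schedule hidden.
schedule = weekly_assignments(num_weeks=4, seed=2009)
for week, cables in enumerate(schedule, start=1):
    print(f"week {week}: {''.join(cables)}")
print("same-cable repeats:", repeated_pairs(schedule))
```

With a forced flip every week the repeat count would always be zero, which is exactly the information the listener could exploit.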

mike1127 · Jul 20, 2009 at 4:43 AM

Quote:

Originally Posted by wavoman
The lack of independence is between the answers, not between the first two and the second two listenings.

Whether or not you can tell which is the good or bad cable is very related to whether or not you can tell which two samples match.

Think this way: if the effect of the cables was so dramatic that everyone could instantly match them, they could also instantly tell you which was which. Now think this way: if the cables were really identical, then nobody could answer either question correctly.

You see -- your answers are NOT independent of each other. You have 4 trials, each with two answers that are highly related.

[to be continued ... gotta get some sleep ... sorry]

(note -- this post edited from what I originally posted after I figured out what you were saying)

I think your reasoning is wrong. I'm pretty sure that the definition of "independence" in statistics has to do with the null hypothesis, which is that the subject is essentially a "fair coin toss." When we say there is no dependence, we mean that under the null hypothesis we have 2**N equally likely results for an N-trial test. And that is true in my test.

The null hypothesis is that I can't tell the difference between the cables. In that case, the answers to my two questions are independent. Each one is a coin toss.

To reject the null hypothesis is to reject the idea that the tests are N independent coin tosses.

If the test is properly designed to look at sound alone, then the alternative hypothesis is that I'm picking up on the sound. There is no way for the two answers (identity and ordering) to be dependent on each other unless I'm hearing them. In which case, null hypothesis is rejected.

TStewart422 · Jul 20, 2009 at 11:58 AM

You can't change the test and include the original results in the conclusion. That's not how the Scientific Method works... you're not the government, you can't change the rules after the game begins and expect to count what happened before the rule change...

wavoman · Jul 20, 2009 at 2:19 PM

Under the null hypothesis, if you are just guessing independently then, well, you are just guessing independently, so you are correct. In this narrow technical sense -- which is irrelevant to the problem at hand -- your significance test for n=8 is valid, but it is still all wrong: you do not have the power against alternatives that n=8 implies.

Your test would be rejected by every single statistical journal -- you are really way wrong here.

n=4 if you do the analysis correctly. Two related measurements. n=4 for both.
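
To make the n=4 versus n=8 point concrete, here is a small sketch comparing two extreme cases. It assumes something nobody in the thread has measured: in the second case both answers of a trial are driven by a single decision, so they are right or wrong together. Real listeners presumably fall somewhere in between; the names `independent` and `linked` are just labels for this illustration.

```python
from fractions import Fraction
from itertools import product

def false_positive_rate(outcomes, threshold):
    """Fraction of equally likely outcomes with at least `threshold` correct answers."""
    hits = sum(1 for outcome in outcomes if sum(outcome) >= threshold)
    return Fraction(hits, len(outcomes))

# Case 1: eight answers, each an independent fair coin (1 = correct).
independent = list(product((0, 1), repeat=8))

# Case 2 (assumed extreme): four coin flips, each copied to both answers of a trial.
linked = [
    tuple(bit for bit in flips for _ in range(2))
    for flips in product((0, 1), repeat=4)
]

p_independent = false_positive_rate(independent, 7)
p_linked = false_positive_rate(linked, 7)
print(p_independent, float(p_independent))  # 9/256 0.03515625
print(p_linked, float(p_linked))            # 1/16 0.0625
```

If the answers are fully linked, "at least 7 of 8" can only happen as 8 of 8, which is the same event as 4 of 4, so the eight answers carry no more evidence than four.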

And you absolutely unblind the test in an essential way if on the second trial you know it has been flipped. I think you agree with me now on this.

wavoman · Jul 20, 2009 at 2:28 PM

Quote:

Originally Posted by mike1127
your ideas have no effect on statistical significance....Also, when I say your idea has no effect on statistical significance I don't mean it doesn't increase the number of independent trials, which of course it does.

In the very narrow technical (and not particularly important) sense of "significance" my protocol is no better than yours, but who cares. The real issue here is power. My ideas introduce swindles, where we compare A to A but you don't know that. In this case it is an absolute fact that the null hypothesis is true, which gives my protocol incredible discriminatory ability.

Significance testing is in general highly flawed and a poor way to make inferences and advance knowledge, unless power is high -- you too often stay with the null hypothesis when you should not.

The fastest way to settle all this is to give the golden ear boys tests that include swindles -- that'll smoke 'em out fast when they describe the difference between two identical cables!

wavoman · Jul 20, 2009 at 3:11 PM

More to think about -- I'll bet that even under the null hypothesis, in the real world, people's answers to your two questions are not independent. Why? -- because of response bias. You are a deep thinker, trying hard in this test. You might answer independently (although in reality I think you make one aural decision and it colors both your answers). Your average subject however just shoots from the hip.

So: make A and B the same cable. Give the test to a lot of people. Half will say ABAB, and half will say ABBA. That's your first question. But I bet, if we look at the ABAB responders, more than half will say "A=Good". Should be half, but response bias rears its ugly head. Or maybe less than half, who knows ... but people are influenced by the order in subtle ways we don't understand.

I am going to go further and claim there is even a potential for response bias in the first question. Here is my prediction, based on nothing but my imagination, of what would happen if A and B were identical:

More than half will say ABBA. Since the cables are the same, they hear the second B as the same as the first B, and say so. "A" is too far back in the memory. I also think this group will claim "B is good" more than half the time. It's just natural to do so.

Of the group that says "ABAB", more than half will say "A is good". They have some bias towards the first sample, they are trying to game you, whatever.

We are going to find out someday.

mike1127 · Jul 20, 2009 at 10:53 PM

Quote:

Originally Posted by wavoman
More to think about -- I'll bet that even under the null hypothesis, in the real world, people's answers to your two questions are not independent. Why? -- because of response bias.

Wrong. That's why the order of presentation is randomized. It eliminates response bias in the null hypothesis.

And in my trials, the orders of the first two sub-trials and of the last two sub-trials are randomized independently. That is one reason the answers are independent.

Pio2001 · Jul 20, 2009 at 10:59 PM

Hello,

Quote:

Originally Posted by mike1127
So in this analysis, we have done 8 trials, and I have gotten 7 right. This reaches a significance level of (8 choose 1) / (2 ^ 8) or 3%.

So I have already succeeded.

Two remarks here.

First, 3% is a success for you. Fine, and OK with me.
But it would not be a success for me. After all the experiments and tests I have done with interconnects, I have seen so much evidence in favor of the null hypothesis that 0.1% is the most I would accept before changing my mind.
After all, I have already seen someone get a 0.2% probability of false success while listening to... nothing! Just hitting the keys at random in an ABX program while not wearing the headphones!
Direct link : Blind test challenge - Hydrogenaudio Forums
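
As a rough check on these thresholds, the sketch below computes the chance of reaching a given score by pure guessing, and how many consecutive correct answers a 0.1% criterion would demand. The function names are made up for this illustration. Note that the usual one-sided figure for 7/8 counts 8/8 as well, which gives 9/256 (about 3.5%) rather than 8/256.

```python
from math import comb

def tail_probability(n, k):
    """P(at least k correct out of n) when every answer is a fair coin toss."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

def min_perfect_trials(alpha):
    """Smallest n for which an all-correct run has probability <= alpha under guessing."""
    n = 1
    while tail_probability(n, n) > alpha:
        n += 1
    return n

print(tail_probability(8, 7))     # 0.03515625
print(min_perfect_trials(0.05))   # 5
print(min_perfect_trials(0.001))  # 10
```

So a 0.1% standard needs at least 10 binary trials with no mistakes at all, and more if any errors are to be tolerated.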

Second, as Wavoman says, your score is not really 7/8. And you gave the right explanation yourself:

Quote:

Originally Posted by mike1127
But this is a post-hoc analysis and carries some danger. For example, there is one thing that is a bit arbitrary. Why am I considering the answer about the ordering of the first two sub-trials so important that it is independent from my answer about the second two sub-trials, and why am I considering this post-hoc (i.e. it wasn't in the original protocol directions)? Because of my theory that I am most sensitive in the first or second listen, and because I was very confident about my answers. This does not convince you, of course.

Correct: the two problems are
- Why would the "fresh ears" identification be better than the "trained ears" one? If the theory opposite to yours is true, that hearing ability is better after some repetitions, then you have 6/8, not 7/8, because during the very last trials you mistook both the sequence and the identification. Maybe you are right. Maybe the fresh-ears listening was the good one. But this has not been the object of a double blind test so far. You are just assuming it.
- Why add the second score post hoc? If you go on with 50 more similar trials and get all sequences correct but all identifications wrong, what would you conclude? 50/50 correct, or 50/100 correct? You give yourself the choice between two possible ways of computing the score, and obviously pick the one with the best result. That multiplies the probability of false success by roughly 2.
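
The "roughly 2" can be checked with a short calculation. This sketch makes a simplifying assumption that does not hold exactly here: it treats the two ways of scoring as independent tests with the same false-positive rate (the 7-of-8 figure is used purely as an example). In the real protocol the two scores overlap, so the true factor lies somewhere between 1 and 2.

```python
from fractions import Fraction
from math import comb

def tail_probability(n, k):
    """Exact P(at least k correct out of n) under pure guessing."""
    return Fraction(sum(comb(n, i) for i in range(k, n + 1)), 2**n)

def best_of_two(p):
    """Chance that at least one of two independent scorings reaches significance."""
    return 1 - (1 - p) ** 2

p_single = tail_probability(8, 7)   # one scoring method, fixed in advance
p_either = best_of_two(p_single)    # pick whichever of two methods looks better
print(float(p_single))              # 0.03515625
print(float(p_either))              # about 0.069
print(float(p_either / p_single))   # 1.96484375
```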

Quote:

Originally Posted by mike1127
This test takes two days, a week apart. I pick 8 test tracks. On both Day 1 and Day 2 I listen to the 8 tracks. However, for each track there is a random assignment of cable A (the cheap cable) or cable B (the expensive cable).

As I listen to each track, I write down my impressions of it, and try to assign a score (from 1 to 10) to various aspects of the sound, like the highs, the dynamics, etc.

[...]

After Day 2 is over, I compare notes. For each track, I see if I rated it more highly with the good cable.

That's a good idea. But you need to clearly define before the test what exactly will be considered as a success, and what exactly will be considered as a failure.
Otherwise, there will always be a possibility to mess with the different tracks and characteristics, pick the ones that make a good score and say "these ones were the most revealing, the other ones can be discarded".

You must not do this "post hoc".

However, in order to make the test easier, you can do it after all listenings are over, but before your friend gives you the right answers. This way, you can still discard the tracks or characteristics that didn't seem significant to you, and you won't know whether doing so will make your score go higher or lower.
It is better, in this case, that you make this choice without your friend looking. Non-verbal cues can be powerful.

Dane · Jul 20, 2009 at 11:10 PM

May I suggest that you know beforehand what A is and what B is. I see no reason why the identity of A and B is hidden from you; it just adds unnecessary complications in interpreting the results.

I'm not completely sure about this, but I think that in ABX tests the subject is free to familiarize himself with A, B and X for as long as he reasonably wants; A and B are known, and all he has to do is tell whether X=A or X=B.

I don't see why you also randomize the identity of A and B; you only need to randomize whether you hear ABAB or ABBA.

mike1127 · Jul 20, 2009 at 11:11 PM

Quote:

Originally Posted by Pio2001
That's a good idea. But you need to clearly define before the test what exactly will be considered as a success, and what exactly will be considered as a failure.
Otherwise, there will always be a possibility to mess with the different tracks and characteristics, pick the ones that make a good score and say "these ones were the most revealing, the other ones can be discarded".

You must not do this "post hoc".

However, in order to make the test easier, you can do it after all listenings are over, but before your friend gives you the right answers. This way, you can still discard the tracks or characteristics that didn't seem significant to you, and you won't know whether doing so will make your score go higher or lower.
It is better, in this case, that you make this choice without your friend looking. Non-verbal cues can be powerful.

Hi Pio,

Thanks for your always-helpful suggestions. I understand that the standard of proof for someone else is different than for myself. Part of the reason for this test is to convince myself that something is going on sonically with cables... not to "prove" it beyond a shadow of a doubt to anyone, including myself... but to act as a guide for setting up my system.

If I am confident during the tests about what I am hearing, and I don't feel like I am guessing, that is convincing to me. I understand that it does not convince anyone else, because you can never know what I was experiencing first-hand.

In this second test, I do think that I will discard some of the results from the 8 tracks, when I am not sure. I am using different music for each track, and it is likely that some music is more revealing than other music. So at the end of the test, before I discuss the results or even say anything about the test to my helper, I will go through my notes. If, for track N, the differences in my notes are clear, I will make a guess. If not, I will discard track N from the test.
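
A sketch of how that scoring could work once the answers are revealed is below. Everything in it is hypothetical: the guesses, the answer key, and the choice to mark a discarded track as `None` are made-up illustration, not results. The one firm rule it encodes is that discards are fixed before the key is seen, so only the kept tracks count as trials.

```python
from math import comb

def score_kept_tracks(guesses, answer_key):
    """Score only tracks that were kept (guess is not None) before unblinding.

    Returns (correct, kept, p_value), where p_value is the chance of doing
    at least this well on the kept tracks by guessing 50/50.
    """
    kept = {track: g for track, g in guesses.items() if g is not None}
    correct = sum(1 for track, g in kept.items() if answer_key[track] == g)
    n = len(kept)
    p_value = sum(comb(n, i) for i in range(correct, n + 1)) / 2**n
    return correct, n, p_value

# Hypothetical notes: tracks 3 and 6 discarded as unclear before seeing the key.
guesses = {1: "A", 2: "B", 3: None, 4: "B", 5: "A", 6: None, 7: "A", 8: "B"}
answer_key = {1: "A", 2: "B", 3: "A", 4: "B", 5: "B", 6: "A", 7: "A", 8: "B"}

print(score_kept_tracks(guesses, answer_key))  # (5, 6, 0.109375)
```

The cost of discarding shows up directly: with fewer kept tracks, a given number of mistakes leaves a larger p-value.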

Thanks,
Mike

mike1127 · Jul 20, 2009 at 11:13 PM

Quote:

Originally Posted by Dane
May I suggest that you know beforehand what A is and what B is. I see no reason why the identity of A and B is hidden from you; it just adds unnecessary complications in interpreting the results.

I'm not completely sure about this, but I think that in ABX tests the subject is free to familiarize himself with A, B and X for as long as he reasonably wants; A and B are known, and all he has to do is tell whether X=A or X=B.

I think that's a fault of ABX testing. The reason to hide the identity of A and B is to reduce the chance that I impose bias on listening to them.

It's funny--objectivists say that we impose bias on a listening experience, which I agree with. Now let's call ABX a "measurement instrument." You are introducing "noise" into the "measurement" when you allow the listener to know the identity of A and B, by the objectivists' own reasoning.

Dane · Jul 20, 2009 at 11:19 PM

Quote:

Originally Posted by mike1127
I think that's a fault of ABX testing. The reason to hide the identity of A and B is to reduce the chance that I impose bias on listening to them.

Oh, you mean that you would always hear the "rat" cable before the "snake" cable if A and B were fixed and known. Yeah, that would add bias. Good point (if that was indeed what you meant).
