Blind cable testing: initial report
Jul 22, 2009 at 1:27 AM Post #106 of 128
Quote:

Originally Posted by wavoman
I did not intend for my post to be condescending, only angry, and I apologize again. Condescension is not part of my nature, or if it is, I want to kill it. Anger is different, that I can live with. Although I repeat: one should not email or post when angry. Let me add an "IMHO" in case that last statement appears condescending!


wavoman, as we mentioned in PM, we'll try to get to the bottom of this, which may require explanation in a Word doc (to typeset math) rather than an all-text forum.

I got frustrated, and so did several others here I think, because you were just telling us we were wrong. Your answers "did not add up" to me... things were so different from what we were saying and what we expected it almost seemed like you were talking to some phantom of your imagination rather than myself. I now understand that human factors statistics IS a radically different field and I'm starting to grasp some of the reasons. We will see what more we can learn.
 
Jul 22, 2009 at 6:04 AM Post #107 of 128
Quote:

Originally Posted by mike1127
..it almost seemed like you were talking to some phantom of your imagination rather than myself...


Yeah, I'm a crummy teacher, I know that, fair criticism. I did not go into academia for this very reason. I often have arguments with myself. The sad thing is I lose those arguments.
 
Jul 22, 2009 at 8:32 PM Post #108 of 128
Quote:

Originally Posted by sohels
I shall attempt to make this more explicit: In general statistical usage, correlation refers to the departure of two random variables from independence. Hence, if the variables are independent then the correlation must be zero - but as demonstrated above, there exists a strong positive correlation between the answers to questions one and two - and thus they cannot be independent.

Is this correct?



This doesn't seem right to me. By the same reasoning there exists a correlation between the answers of an ABX or any forced-choice test, hence every ABX or forced-choice test is currently being analyzed in the wrong way.

And, the "answers" are not the "variables." There's only one variable we are modeling, which is the probability I give the right answer. The first two answers are not "two variables." There is only one variable, so no need to deal with correlation between variables.

The right answers to the first two questions are randomized independently.
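To make "randomized independently" concrete, here is a minimal sketch of how one ABAB/ABBA trial could be set up with two separate coin tosses. This is only one reading of the protocol described in this thread: the function name `make_trial`, the dictionary keys, and the seed are made up for illustration, and R and C are the cable labels used later in the discussion.

```python
import random

def make_trial(rng):
    """Build one trial; the two correct answers come from two separate coin tosses."""
    first = rng.choice(["R", "C"])            # coin 1: which cable is played first
    second = "C" if first == "R" else "R"     # the other cable is played second
    repeat = rng.choice([True, False])        # coin 2: repeat or reverse the pair
    last_pair = [first, second] if repeat else [second, first]
    return {
        "sequence": [first, second] + last_pair,          # four presentations
        "order_answer": "ABAB" if repeat else "ABBA",     # answer to the ordering question
        "identity_answer": first,                         # answer to "which cable came first?"
    }

rng = random.Random(2009)                     # fixed seed only so the sketch is repeatable
trials = [make_trial(rng) for _ in range(4)]
for t in trials:
    print(t["sequence"], t["order_answer"], t["identity_answer"])
```

Because the two coins are tossed separately, the correct answers are independent of each other by construction. Whether the listener's responses are independent is a different question.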
 
Jul 23, 2009 at 1:50 AM Post #109 of 128
Mike, that's not really correct.

The things we measure are observations on a random variable, and we are trying to infer something about the underlying distribution of that random variable. Now when we say "independence" in this context, we really do mean whether or not the observations, the ANSWERS, form an independent sample, one trial to the next.

That is what we mean in this context. This is standard statistical usage -- are we making independent observations of this random variable, that is the question.

Almost always in trials we are -- independence holds. Each answer is an independent answer to a test, independent of the prior answers to the prior tests.

See what we mean? (The observations really are variables, honest!)

Now if you ask TWO questions in ONE test, you run into a problem where the answers to those TWO questions on this ONE trial might not be independent. Well, they might be, as you correctly point out, if the subject is simply guessing (and you also point out, quite correctly, that if the null hypothesis is exactly true, then the subject is always guessing).

But the issues are more subtle, as I will (I hope) eventually convince you when I write you my whitepaper. The fact that the "two answers to one test" are not independent when the null hypothesis is false affects the power of the test, and affects the degree to which we believe the non-null ... and when you say n=8 it makes statisticians think of 8 independent observations and all that implies for power, but you don't actually have that.
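A toy calculation may help show why the two answers within one trial can be independent under guessing yet correlated when there is a real effect. The model here is an assumption made purely for illustration and does not come from anyone's post: on each trial the listener truly hears the difference with probability `h` and then gets both sub-answers right, and otherwise guesses each sub-answer with an independent fair coin.

```python
from fractions import Fraction

def sub_answer_correlation(h):
    """Correlation between the two sub-answers of one trial under the toy model.

    h is the assumed probability of truly hearing the difference (must be < 1).
    """
    h = Fraction(h)
    p_one = h + (1 - h) * Fraction(1, 2)     # P(a given sub-answer is right)
    p_both = h + (1 - h) * Fraction(1, 4)    # P(both sub-answers are right)
    cov = p_both - p_one * p_one             # covariance of the two indicators
    var = p_one * (1 - p_one)                # each indicator has the same variance
    return cov / var

for h in (0, Fraction(1, 4), Fraction(1, 2)):
    print(h, sub_answer_correlation(h))
```

In this toy model the correlation works out to h/(1+h): zero when the listener is purely guessing (h = 0) and positive as soon as there is any real effect. It says nothing about how large the correlation is in a real listening test.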

Yes this is an incomplete discussion, and your probability calculations under the null are correct, but as I have said, the issues are elsewhere. I believe I will eventually convince you of this (or talk to a statistician where you work) but I do need to write a lot more words.

Hey, I have an idea! Read the entry in Wikipedia on:

Independence (probability theory)

And you will see, right at the start, the idea of independence of observations, with great examples. Then it goes on to talk about independence between variables in general, which is the concept you were thinking about.

I think if you read that you will be convinced.

Added: Mike, one more thing. When sohels says the two answers are not independent, he is talking about the two answers given on the one trial -- answers to your two questions ("is it A" and "is A good or bad"). He is not talking about independence from one trial to the next (which we hope we have -- some people will argue even that -- you know, memory effect, etc.). A/B/X tests do not ask two questions each trial, only one -- so this issue does not come up in A/B/X. Got it?
 
Jul 23, 2009 at 2:09 AM Post #110 of 128
Quote:

Originally Posted by wavoman

Almost always in trials we are -- independence holds. Each answer is an independent answer to a test, independent of the prior answers to the prior tests.



What is your feeling, then, about computer-based ABX tests which set up N tests, each of which uses the same sound sample? Those N answers, by the same reasoning, are not independent of each other, yet I believe they are commonly treated as if they were.

In fact I'm not clear on what is necessary to make the N answers independent of each other. Use different music for each one?

The problem I'm having is that there is no conceptual difference between the two answers with "one trial" and the four answers to the "four trials" that I did. I think I used some of the same music for those four trials. If I am truly hearing the cable, then both of these are true:
  1. I'm more likely to get both "sub-answers" right within one trial.
  2. I'm more likely to get the other trials correct.

What makes sub-answers dependent that does NOT make trials dependent?

In fact, there is very little conceptual difference between asking the ordering of the cables and asking the identity. Both questions can only be answered reliably by detecting and remembering sound qualities. If sound qualities cannot be detected or remembered, then neither question can be answered.

There is only one reason to separate the questions. That is to allow me to give the right answer to the ordering in the case that sound qualities can be detected but not abstracted.

What I mean is, suppose that the cables R and C sound very different. R is brighter, let's say. That's an abstract quality which is evident with any music. It shouldn't be too hard to identify R.

Now let's suppose R and C sound different, but that difference depends on the music being used. R might sound "warmer" with some music, and "smoother" with other music. It's impossible, then, to pick the identity of the cable by some feature common to all music.

As it turned out, the sonic qualities I detected were abstractable, so I answered each of the 8 questions by identifying (in my mind) the specific cable being used.

So there's no conceptual difference between 8 or 4 trials.

Something doesn't compute.
 
Jul 23, 2009 at 3:09 AM Post #111 of 128
mike -- you are hip to the problems now!

Yes, it is possible that the sequential trials of A/B/X are not independent of each other. This has been discussed in the stats lit somewhere (can't find the reference). This can hurt the analysis, but a little bit of serial correlation one trial to the next might be OK, it often is, in the sense of not wrecking the math too much.

But it is a cause of concern, you are correct. Hey, I hate A/B/X for dozens of reasons.

Then you make many more very wise points -- maybe playing the same music wrecks independence! How can I guarantee independence anyway?

You can't. You hope. Your argument that, well, your 4 is like your 8 skewers you even worse ... it means you have less than 4 observations! You are shifting to my side now.

All we are saying is this: your two questions seem to us, if there is an effect, to be highly correlated, so you have less power than one would assume when reading n=8.

It's just never right to smash two different observational responses together. This is called "conflation", and it is to be avoided in sensory testing. We might ask "which is saltier" and "which do you like better", but we don't pool the answers.

We just don't want you to present this as 8 identical trials. It is 4 trials, with two questions. Present it that way, with the scores for each. The null hypothesis calculation is the same, you got that right, but you rejected the null so the issue is the weight of the evidence, and two 3-for-4's on two different but correlated questions is just not the same as 6-for-8 on one question, in anyone's book. Forget statistics, just think like a lay person here. Think logically ... not the same, yes?
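For readers who want the chance-level numbers behind that comparison, here is a small sketch of the binomial tail calculation under pure guessing (p = 0.5). The scores 6-for-8 and 3-for-4 are taken from the comparison in this post; the helper name `tail_prob` is made up for illustration.

```python
from math import comb

def tail_prob(k, n, p=0.5):
    """P(at least k correct out of n) if every answer is an independent guess."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(tail_prob(6, 8))   # 6 or more right out of 8 -> 0.14453125
print(tail_prob(3, 4))   # 3 or more right out of 4 -> 0.3125
```

These figures only describe what guessing would produce. They do not address how much weight the pooled result deserves once the two questions are correlated, which is the point under dispute.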

You make excellent arguments that leave me saying -- hey, just ask the order question. The other is too tough. By asking the order question you are doing exactly what A/B/X does -- try to identify X. But you do it better -- more relaxed, at home, and after you play X you also play the other, which is neat.

And you are beating the test!
 
Jul 23, 2009 at 7:22 AM Post #112 of 128
Quote:

Originally Posted by wavoman
You want to conclude "can I hear a difference". So test "can I hear a difference". Simple, no?


Quote:

Originally Posted by mike1127
However, it was impossible for me not to have a judgment about the sonic characteristics of each cable. I just couldn't avoid it.

So in each trial, the original type of answer was ABBA or ABAB. Note that is equivalent to asking the ordering of the last two sub-trials.

The second answer is: in the first presentation, which is A and which is B?



I keep coming back to this. mike1127 is right in making himself more comfortable. But for statistical purposes, perhaps we should only consider the first answer - which it seems is necessary and sufficient to test the null hypothesis.

Why make it more complicated, especially since we are uncertain about the second answer's contribution to the strength/power of the conclusion? Yes, this makes it an ABX test more or less - but one where he perceives greater comfort and control, while we are no worse off in terms of the validity of his results.
 
Jul 23, 2009 at 7:36 AM Post #113 of 128
Quote:

Originally Posted by wavoman
You make excellent arguments that leave me saying -- hey, just ask the order question. The other is too tough. By asking the order question you are doing exactly what A/B/X does -- try to identify X. But you do it better -- more relaxed, at home, and after you play X you also play the other, which is neat.


Exactly. Wonder how I missed this before making my post!
 
Jul 23, 2009 at 7:52 AM Post #114 of 128
Quote:

Originally Posted by wavoman
You make excellent arguments that leave me saying -- hey, just ask the order question. The other is too tough. By asking the order question you are doing exactly what A/B/X does -- try to identify X. But you do it better -- more relaxed, at home, and after you play X you also play the other, which is neat.


Again, thanks for your encouragement. I wonder if you don't know what a typical computer-based or comparator-box-based ABX test does. You can listen to A and B as many times as you want. And listen to X as many times as you want. My ABAB/ABBA protocol was designed to (1) limit the number of cable swaps necessary (because we deliberately aren't using a box and it's a lot of work) (2) hide the identity of A and B from me (so bias wouldn't come into play). It actually involves fewer comparisons than a typical ABX test.

It seems to me an obvious failing of the typical ABX test that the testee knows the identity of A and B during the "sighted portion" of the test.

By the objectivists' own reasoning, knowing the identity of something can skew your perception, so this risks that the person's perception of A and B will be totally skewed, rendering them incapable of guessing X.
 
Jul 24, 2009 at 1:06 AM Post #115 of 128
Quote:

Originally Posted by wavoman
In the very narrow technical (and not particularly important) sense of "significance" my protocol is no better than yours, but who cares. The real issue here is power. My ideas introduce swindles, where we compare A to A but you don't know that. In this case it is an absolute fact that the null hypothesis is true, which gives my protocol incredible discriminatory ability.

Significance testing is in general highly flawed and a poor way to make inferences and advance knowledge, unless power is high -- you too often stay with the null hypothesis when you should not.

The fastest way to settle all this is to give the golden ear boys tests that include swindles -- that'll smoke 'em out fast when they describe the difference between two identical cables!



I think this is a good and important idea - include some amount of AAX and / or BBX into the mix. If you are randomly picking the cables and X, then this does not actually even really complicate the protocol. Just always toss the coin to know which cable to use.
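As a rough sketch of "just always toss the coin", each slot in a presentation could get its own independent toss, so AA and BB pairings (the swindles) show up naturally alongside AB and BA. The function name `pick_presentation`, the returned keys, and the seed are hypothetical, and this is only one possible reading of the idea.

```python
import random

def pick_presentation(rng):
    """Toss a separate coin for each slot; identical reference cables make a swindle."""
    first = rng.choice(["A", "B"])     # coin toss for the first reference cable
    second = rng.choice(["A", "B"])    # independent toss for the second reference
    x = rng.choice(["A", "B"])         # independent toss for X
    return {"first": first, "second": second, "x": x, "swindle": first == second}

rng = random.Random(116)               # fixed seed only so the sketch is repeatable
schedule = [pick_presentation(rng) for _ in range(200)]
swindles = sum(p["swindle"] for p in schedule)
print(f"{swindles} of {len(schedule)} presentations are swindles")
```

With fair coins about half of the presentations end up as swindles. A different share would need a biased toss, which is a design choice this sketch does not try to settle.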
 
Jul 24, 2009 at 2:50 AM Post #116 of 128
mike1127 -- I meant the question is the same between your protocol and A/B/X, that's all. Well, one of your questions, that is. Since the fourth choice is forced for you, it is "like A/B/X with a second listen". That's all I was trying to claim.

I totally agree that your blinding the identities of A and B from the outset is far better. You are right as rain about the response bias problem that classical A/B/X introduces.

Remember that A/B/X was invented to allow you to do blind tests on your own, or true double blind tests -- there was no way to hide the initial identities of A and B under these conditions, and A/B/X is pretty damn clever given these limitations.

But you have realized -- as I did too -- that single blind will work here, as long as the listener is totally isolated from the experiment leader to avoid tipping (Clever Hans).

I like your protocol. Even better would be: listen to A and B as long as you like without knowing the identities, then tell us if they are different (or tell us your preference). Of course this means your buddy has to move in with you, so it is not practical. But, ah, if we could build two cables that appeared identical but were not really ... then you could do this all by yourself. The perfect blind test.

As you know, I am pitching to do this.
 
Aug 9, 2009 at 5:56 AM Post #118 of 128
I find this fascinating, but I do not understand why it is under "sound science" since there is nothing remotely scientific about it. As I understand it, you compared (blind) two interconnects, and they exhibited a noticeable difference in frequency response. You used this difference to defeat a "straw man" who claims that all cables are indistinguishable. However, "objectivists" have never made that claim; rather, they claim that competently made cables with no significant measurable difference will sound the same. You seem to have compared 2 cables, at least one of which exhibits a measurable defect, and you were able to identify the cables by their sonic signature. This is not surprising, since it has been demonstrated more than once that several very expensive cables exhibit such gross defects in their R/L/C characteristics that even with normal source- and load-impedances they would act as low-pass filters. Some may find the loss of high frequencies to sound "better".

However, what is interesting is not your ability to hear the difference between 2 cables, one of which exhibited a gross frequency response aberration, but your ability to convince yourself, once you knew the identities of the cables (which could be differentiated by anybody without severe hearing loss), that the one which clearly was losing information was somehow superior.

In other words, your initial test allowed you to identify a frequency response difference between two cables, but then you inferred other differences based only on your knowledge of which cable was expensive, even though the expensive cable clearly was rolling off high frequencies.
 
Aug 14, 2009 at 10:13 PM Post #119 of 128
Wow. Although this was a very thorough and well thought-out test, it's not going to bury the whole cable argument. People perceive things differently and some can say that they can hear differences while others can't.

Great read though.
 
