Scientific Listening Experiment: Can we tell the difference? Let's find out.

Dec 16, 2004 at 6:51 AM Post #46 of 70
Quote:

you're grossly confusing the testing of accepting the test hypothesis vs rejecting the null hypothesis


And you think my reply is meaningless?
 
Dec 16, 2004 at 7:16 AM Post #47 of 70
Quote:

Originally Posted by Publius
Ross, you're grossly confusing the testing of accepting the test hypothesis vs rejecting the null hypothesis, as well as the scale of the test we're proposing. Your reply is meaningless.



I have to support Ross on this one. Obviously, there's no way the test he is proposing can be done as a practical matter, but the assumption behind it (as set forth in the first two paragraphs of his post) is, IMO, absolutely correct.
 
Dec 16, 2004 at 2:17 PM Post #48 of 70
The more I read you guys' posts about the nature of how we listen, the more I tend to agree. I'm the same way.

I'm convinced the method would work, if only we could devise the right format. I think we need to narrow the scope. One headphone, one recable vs stock cable.

Of course, then we're not really proving that "cables" change the sound. Just that one particular cable on one particular headphone changes the sound. I'm slowly going back to my original opinion that I posted in the scientific method thread. Science is rather out of place here.
 
Dec 16, 2004 at 6:09 PM Post #49 of 70
Quote:

Originally Posted by TWIFOSP
No, not at all. The MSA is a means to an end. It's not part of the study. Repeat after me: It's not part of the actual study. The MSA has absolutely zero to do with cables. But with a completed MSA on the people attempting to detect a change in sound by changing cables, we can actually trust that there is or is not a change in cables.


Here, we're going to have to disagree. In order for variability in an MSA, as you propose, to have any effect whatsoever on interpretation of data in the actual study, there needs to be some sort of proven correlation between the response measured in the MSA and the response that you're measuring in the actual experiment. Does the variability in responding to known differences in amplification reflect the ability to discern differences between cables? Who knows, since the ability to discern differences in cables hasn't been quantified yet. Differences in amplification have better face validity than other possible indices of variability in your test sample, such as height (which will give you a measure of variability in your sample that should be completely irrelevant to anything, but might not be...do tall people have better hearing?). Until the apparent face validity of the correlation between the MSA and the test variable has been tested, though, use of any other variable as an initial screen provides a lot less information than you think.
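To make the validity question concrete, here is a minimal sketch of checking whether an MSA-style screening score actually tracks performance on the variable of interest. The numbers and the names (`amp_screen_score`, `cable_test_score`) are hypothetical, purely for illustration:

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical per-listener scores (fraction of trials correct).
# amp_screen_score: how well each listener detected known amplifier differences (the MSA).
# cable_test_score: how well the same listener did on the actual cable trials.
amp_screen_score = np.array([0.95, 0.80, 0.60, 0.90, 0.55, 0.70, 0.85, 0.65])
cable_test_score = np.array([0.60, 0.55, 0.50, 0.65, 0.45, 0.50, 0.70, 0.50])

r, p_value = pearsonr(amp_screen_score, cable_test_score)
print(f"correlation r = {r:.2f}, p = {p_value:.3f}")

# Only if r is substantial (and holds up in a larger sample) does screening on the
# amplifier task tell you anything about a listener's sensitivity to cable changes.
```

Until something like that correlation has been established, the screen is measuring listener variability on a task whose relevance to the cable question is unknown.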
 
Dec 16, 2004 at 6:15 PM Post #50 of 70
Quote:

Originally Posted by TWIFOSP
The more I read you guys' posts about the nature of how we listen, the more I tend to agree. I'm the same way.

I'm convinced the method would work, if only we could devise the right format. I think we need to narrow the scope. One headphone, one recable vs stock cable.

Of course, then we're not really proving that "cables" change the sound. Just that one particular cable on one particular headphone changes the sound. I'm slowly going back to my original opinion that I posted in the scientific method thread. Science is rather out of place here.



I do think a simpler test is better than a complicated one, and I also think that proof that one cable on one particular headphone changes the sound would not be insignificant, as many in the objectivist camp do not believe that any cable (assuming proper construction) can change the sound of a headphone.
 
Dec 16, 2004 at 6:44 PM Post #51 of 70
Quote:

Originally Posted by PhilS
I have to support Ross on this one. Obviously, there's no way the test he is proposing can be done as a practical matter, but the assumption behind it (as set forth in the first two paragraphs of his post) is, IMO, absolutely correct.


It depends on exactly what you're testing. What I interpreted Ross's comment to mean was that even a positive test result would not really be applicable to the appreciation of better cables in real-world listening, so obviously a negative test result would matter even less. But we're explicitly not looking for differences in preference or emotion; we're looking for differences in audibility. It's understood that the more subtle effects you hypothesize are going to go unnoticed.

It's like testing for mold when somebody comes in screaming "the test is useless! how can it possibly tell if it's affecting my health?" Of course it's not testing for emotional or behavioral effects. But that's just completely beside the point, and smells of an attempt to just kill the conversation.

Ross's comments about requiring thousands of listeners only make sense in one of two contexts: either the effect we're looking for is very, very small (perhaps detectable by 1% of the population), or we're really trying to do a noninferiority test, which could tell us once and for all whether cables in fact make no difference. If the cable change causes a gross difference in listening quality, we only need at most one or two dozen listeners to verify that, not a few thousand, and a listening period of a few weeks rather than several months.

I hope nobody looking at this test gets the impression that we'll be testing anything like that. We won't be able to say anything conclusive under perhaps p<0.7 (hearing a difference perhaps half the time), and there's no way we can make any statement that the effect doesn't exist. All we should be doing is running a small, solid blind test and putting another data point on the wall when we're done.
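To put rough numbers on that, here is a minimal sketch of a one-sided binomial power calculation. The effect sizes (90% vs. 55% per-trial hit rates) are my own assumptions, not part of anyone's proposal:

```python
import numpy as np
from scipy.stats import binom

def power(n_trials, p_true, alpha=0.05, chance=0.5):
    """Power of a one-sided binomial test of 'better than guessing'.

    Reject guessing when the number of correct answers k satisfies
    P(X >= k | p = chance) <= alpha; return the probability of that
    happening when the true per-trial hit rate is p_true."""
    ks = np.arange(n_trials + 1)
    tail_under_chance = 1.0 - binom.cdf(ks - 1, n_trials, chance)  # P(X >= k) under guessing
    k_crit = ks[tail_under_chance <= alpha][0]                     # critical count
    return 1.0 - binom.cdf(k_crit - 1, n_trials, p_true)           # power

print(round(power(20, 0.90), 3))    # gross difference: ~20 trials already gives high power
print(round(power(20, 0.55), 3))    # subtle difference: power barely above alpha
print(round(power(1000, 0.55), 3))  # subtle difference: on the order of a thousand trials
```

The point is only that the required scale of the test swings wildly with how audible the difference actually is, which is why the two readings of Ross's numbers lead to such different designs.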
 
Dec 16, 2004 at 6:50 PM Post #52 of 70
Two more things.

First, why is everybody so worried about the test not really meaning anything even if a positive result is obtained, because it's such-and-such different from normal listening? No positive result has ever been obtained for a blind cable test under commonly accepted testing procedures. Any sort of half-decent positive result is going to be big news for everybody, and a stepping stone to better understanding the difference in cables in more thorough tests. In fact, I'd say we should probably make the test as lopsided as possible - the worst cable vs the best cable; try to do anything to make the switching time as short as possible, up to and including using two different headphones; uncompensated amplifier outputs if we can find them, to try to drive the system into oscillation. Anything to get a result in at least one configuration.

Second, I have no professional experience whatsoever in statistics, so take everything mathematical I say with a grain of salt.
 
Dec 16, 2004 at 7:03 PM Post #53 of 70
Quote:

Originally Posted by Publius
Two more things.

First, why is everybody so worried about the test not really meaning anything even if a positive result is obtained, because it's such-and-such different from normal listening? No positive result has ever been obtained for a blind cable test under commonly accepted testing procedures. Any sort of half-decent positive result is going to be big news for everybody, and a stepping stone to better understanding the difference in cables in more thorough tests.



Maybe I'm missing something, but I think the concern is that the test conditions being proposed do not mirror the way people listen in the real world, that the test would likely suggest no audible differences (i.e., yield a negative result), and that this will be another flawed blind cable test that the "objectivists" will crow about and the "subjectivists" will argue was flawed. I think one of the purposes of this thread is to see if we can figure out a way to eliminate any problems with the test so "both sides" will accept its validity (at least to some extent). The problem is (from my perception) that it's easy to design a blind test that suggests no differences (see, e.g., many of the previous faulty tests you refer to), and very difficult (perhaps impossible) to design one that mirrors real world listening conditions and that at the same time is practical.
 
Dec 16, 2004 at 7:15 PM Post #55 of 70
Just because I don't like seeing statistical terminology butchered.
"p < n" for some event, where n is some value, denotes that the likelihood that the event occurs just by random chance is less than n.
For example, if person X says "I could detect differences between cable A and cable B with p < 0.1," that would mean there is less than a 10% likelihood that the results of X's test could have occurred just by chance, assuming that there is in fact no audible difference between A and B. In other words, the smaller n is, the more likely it is that the two cables actually sound different.
"p > n" on the other hand doesn't mean anything.
 
Dec 16, 2004 at 7:18 PM Post #56 of 70
Quote:

First, why is everybody so worried about the test not really meaning anything even if a positive result is obtained, because it's such-and-such different from normal listening? No positive result has ever been obtained for a blind cable test under commonly accepted testing procedures.


I don't think anyone is "worried", and in fact I would expect that there will be people who are reliably able to tell the difference. However, first, the experiment you have described does not represent "commonly accepted testing procedures". In a true blind experiment, e.g. one in which patients test an experimental drug against a placebo, the subjects do not know whether they have the placebo or the drug (and often don't know they are in an experiment). Here, knowing you are in an experiment, knowing you are being tested, and knowing when a change has been made invalidates it as a reliable experiment. The mere fact that you know you are being tested and the mere fact that you know when a change has been made will independently skew the results.

So, feel free to conduct the experiment, but understand that what you are testing is a specific proposition that has no real world application. That proposition is: can a small group of subjects with some hi fi experience, in an unfamiliar setting, on unfamiliar equipment, in artificial conditions, in a short period of time, determine that there is a difference between two components, when the point of change (or absence of change) has been identified.

As I said, I would expect that some people would be able to tell the difference reliably (I rather immodestly think I would, for example), but I wouldn't be too surprised if no one can reliably tell the difference, because the conditions are just too artificial for the reasons given above. If nothing else, the pressure of knowing that your "audiophile credentials" are being tested will skew the results, and make it more difficult to discern differences. It's like having to pee in front of the doctor - I can pee without problem every day of the week (in fact, I'm pretty good at it!), but taking your willy out in public makes it decidedly shy. What your experiment is doing is asking each subject to take his willy out in public (if you'll excuse the disgusting imagery), so it should come as no surprise that it is very difficult to perform in those conditions.

Even if you get a positive result, it will be a small statistical blip which tends to indicate a reasonable probability that some subjects have been able to detect the difference in a statistically significant number of cases. It won't rule out pure chance, and I can guarantee that the old ABX proponents and cable sceptics will just ignore the results, refuse to accept them on the grounds that they weren't verified or the experiment was flawed, or claim that the results are pure luck (or find some other excuse).

If, as is also possible, the results are negative, then for the reasons given above, that is not really surprising, and will demonstrate nothing because the proposition you are testing has no application in the real world.

In other words, experiment to your heart's content, but don't imagine that the results you produce will satisfy anyone and, to be honest, nor should they.
 
Dec 16, 2004 at 7:38 PM Post #57 of 70
Quote:

Originally Posted by kyrie
Just because I don't like seeing statistical terminology butchered.
"p < n" for some event, where n is some value, denotes that the likelihood that the event occurs just by random chance is less than n.
For example, if person X says "I could detect differences between cable A and cable B with p < 0.1," that would mean there is less than a 10% likelihood that the results of X's test could have occurred just by chance, assuming that there is in fact no audible difference between A and B. In other words, the smaller n is, the more likely it is that the two cables actually sound different.
"p > n" on the other hand doesn't mean anything.



Thanks. I think I'm getting there. (I do wish to understand the stats here.) Sorry to perturb anyone.


JF
 
Dec 16, 2004 at 8:16 PM Post #58 of 70
Quote:

Just because I don't like seeing statistical terminology butchered.
"p < n" for some event, where n is some value, denotes that the likelihood that the event occurs just by random chance is less than n.
For example, if person X says "I could detect differences between cable A and cable B with p < 0.1," would mean that there is less than a 10% likelihood that the results of X's test could have occurred just by chance assuming that there are in fact no audible difference between A and B. In other words, the smaller n is, the more likely it is that the two cables actually sound different.
"p > n" on the other hand doesn't mean anything.


I think you have been confused. There are two commonly used meanings of p, one for statistical significance and one for probability.

You are using the p-value of a statistical test, or in other words the statistical significance of some value that was measured.

The other meaning of p, in probability theory, is the likelihood of something happening, e.g. the probability (p) of someone hearing a difference in headphones is 80%.

People might be freely alternating between these two (which would be incorrect), but it should be pointed out that they are not necessarily using your definition.
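A small sketch of the distinction, with made-up numbers: the first p below is a probability describing the listener, the second is a p-value describing one set of test results, and the two are not interchangeable:

```python
from scipy.stats import binom

# Meaning 1: p as a plain probability, a property of the listener.
p_detect = 0.80  # "the probability of someone hearing a difference is 80%"

# Meaning 2: p as a p-value, a property of one particular set of results.
n_trials, n_correct = 16, 12
p_value = 1.0 - binom.cdf(n_correct - 1, n_trials, 0.5)  # chance of >= 12/16 by guessing
print(f"p_detect = {p_detect}, p_value = {p_value:.3f}")
# p_detect is fixed by the listener; the p-value changes with every new run of trials.
```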

edit...
Sorry to confuse you more, John.
 
Dec 16, 2004 at 8:28 PM Post #59 of 70
Quote:

Originally Posted by Hirsch
Here, we're going to have to disagree. In order for variability in an MSA, as you propose, to have any effect whatsoever on interpretation of data in the actual study, there needs to be some sort of proven correlation between the response measured in the MSA and the response that you're measuring in the actual experiment.




No, actually there doesn't. You're merely recording the measuring system's result.

So say you're in the call auditing business. Say you've got 10 call auditors who are supposed to follow a form on things to audit during a sales call. You can of course use real sales calls to perform your MSA to see how accurate each auditor is and how close they are to their peers. You could also use an entirely different call type and a completely different form, so long as each person is given the same form and same call. You are merely measuring their ability to measure, and to measure as a group, and nothing else.

Another example is, say you're using a ruler to measure the size of bricks. You've got several people. Some people use the edge of the ruler to start, and some use the 0 hash mark. If you want to perform a measurement system analysis on these "brick quality testers" you don't have to use bricks. You can use anything you slap a ruler on.

So why won't we swap cables in the MSA? Because the entire point is to see if we can even detect a change with cables. If the result is that we can't detect a change with cables, can we automatically conclude that cables make no change? Not really. What if the measuring system we used to detect the change can't detect any change whatsoever? By creating a baseline of variance and measurement ability, we know exactly what kinds of changes can be detected, and therefore will know how sensitive a measuring system is required to detect a cable change.

Back to the call auditing example. If you take your auditors and give them calls you KNOW have faults, and every single auditor passes the call, or doesn't check the appropriate box, you know your auditors are incapable of detecting the fault with the call (or are improperly trained to do so). You can also see how consistent they are by giving them the same faults twice with a different call. If they rate it differently, you know that there is a level of variability in the accuracy of detection.

I really think you keep mistaking this process for something else, like a control test. This is a measurement test. We are simply testing and collecting data on the ability of our testers' ears and listening experience (the measuring device) to detect an unknown variable.
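For what it's worth, a minimal sketch of what that kind of MSA on listeners might look like. The numbers are hypothetical, and the known differences could be level or amplifier changes, nothing to do with cables:

```python
# Hypothetical MSA data: each listener judged 10 trials containing a KNOWN audible
# difference and 10 control trials with no difference at all.
hits_on_known_diff = {"A": 9, "B": 8, "C": 5, "D": 10}  # "heard it" out of 10 real changes
false_alarms       = {"A": 2, "B": 1, "C": 5, "D": 0}   # "heard it" out of 10 null trials

for listener in hits_on_known_diff:
    hit_rate = hits_on_known_diff[listener] / 10
    fa_rate = false_alarms[listener] / 10
    # A usable "measuring device" detects known changes far more often than it
    # reports changes that aren't there; the 0.5 cutoff here is arbitrary.
    usable = hit_rate - fa_rate >= 0.5
    print(f"{listener}: hit {hit_rate:.0%}, false alarm {fa_rate:.0%}, "
          f"{'keep' if usable else 'screen out'}")
```

Listeners who can't reliably detect changes we already know are audible tell us nothing either way when they fail to detect a cable change, which is the whole reason for baselining the measurement system first.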
 
