Head-Fi.org › Forums › Equipment Forums › Sound Science › Propose your protocol here

Propose your protocol here

post #1 of 18
Thread Starter 
Since we have several people on this forum who are knowledgeable about statistics, it might be interesting to propose our ideal protocol for testing audibility of subtle differences.

Since I'm in the middle of a single-blind test right now, it might also be interesting to propose how to continue it.

Here's what I've done so far. I picked eight test tracks. I chose two cables to test: a Radio Shack and a Cardas ($3 vs. $650). I put ten little squares in a box, 5 of them labeled "Cardas" and 5 labeled "Radio Shack". My helper took a list of the test tracks, then for each track drew a square to indicate the cable to be used for that track.

Why didn't I have him flip a coin for each track? (Wavoman talks about swindles. This is the same issue for me.) I think a test needs contrast. I think I'm better at determining relative quality differences than absolute quality differences. I wanted to have at least three trials with each cable so the contrast would be there. If I wasn't too sure with track #3, but then track #4 was so good it reminded me what the Cardas sounds like, then I can answer with more confidence that #3 was the Radio Shack.

So we carried out this test today. I listened to the eight tracks and took notes on each. I didn't actually commit myself to "Radio Shack" or "Cardas," but I wrote down quality ratings for various factors, such as "smoothness=5" and "microdynamics=8".

Where to go from here?

My original idea was to repeat the eight tracks next week with the cable choice flipped for each track. I would then give the ordering for each track: was it C in week #1 and R in week #2, or vice versa? That gives a test with 8 binary answers, which can be tested for significance against a null hypothesis with n=8.
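As a minimal sketch, assuming each of the n answers is an independent 50/50 guess under the null hypothesis (an approximation here, since the box draw makes the cable assignments slightly dependent on one another), the one-sided p-value is the upper tail of a binomial distribution. The function name `binomial_p_value` is illustrative.

```python
from math import comb

def binomial_p_value(correct: int, n: int, p: float = 0.5) -> float:
    """One-sided p-value: P(X >= correct) for X ~ Binomial(n, p).

    Under the null hypothesis the listener is guessing, so each of the
    n binary answers is right with probability p = 0.5.
    """
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(correct, n + 1))

# Examples: 7 of 8 correct, and 12 of 16 correct
print(binomial_p_value(7, 8))    # 9/256, about 0.035
print(binomial_p_value(12, 16))  # 2517/65536, about 0.038
```

With n=8, a perfect score gives p = 1/256 and 7 of 8 gives about 0.035; at n=16, 12 correct gives about 0.038.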

We could then repeat this over the following two weeks to get a total of n=16.

However, another possibility would be for my helper to set up the cable choices for week #2 by drawing from the box again. I would then give my answers as "no preference," "prefer week #1," "prefer week #2," etc.

I don't know how to analyze that kind of test, though, especially for significance testing. I would like a test to convince myself that the cables matter, and maybe convince a few other cable skeptics too (I count myself among them). What size N do we need to reach a nice level of significance?
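One way to approach the "what N?" question, assuming the simpler case where every trial reduces to a right/wrong answer with a 50% guessing rate (the three-way "no preference / week #1 / week #2" format would need its own null model), is to tabulate the smallest score that reaches a chosen significance level for each N. The 0.05 level, the list of N values, and the function names are arbitrary choices for illustration.

```python
from math import comb

def upper_tail(c: int, n: int, p: float = 0.5) -> float:
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))

def passing_score(n: int, alpha: float = 0.05):
    """Smallest number correct out of n whose one-sided p-value is <= alpha,
    or None if even a perfect score is not significant."""
    for c in range(n + 1):
        if upper_tail(c, n) <= alpha:
            return c
    return None

for n in (4, 5, 8, 10, 12, 16, 20):
    print(n, passing_score(n))
```

For example, 8 trials need 7 correct and 16 trials need 12 correct; with 4 or fewer trials, not even a perfect score reaches 0.05.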

I also am concerned about contrast. I would like to hear, say, track #3 with both cables. In case I'm kinda iffy about it... I'm thinking, "Well it was so-so on week 1 and maybe a touch better on week 2..." I would be unsure what to say. Maybe I preferred week 2, or maybe it was hardly a change and the significance just seems magnified to me because I'm not so sensitive to absolute differences.
post #2 of 18
Hello Mike,
Your protocol is mathematically quite difficult to handle.

The possible lists are not equiprobable. There are 2 chances in 9 of getting one of the 56 possible lists of 5 Cardas / 3 Radio Shack, 5 chances in 9 of getting one of the 70 possible lists of 4 Cardas / 4 Radio Shack, and 2 chances in 9 of getting one of the 56 possible lists of 3 Cardas / 5 Radio Shack.

The problem is to calculate the probability that a list taken from the 182 possible ones has at least as many matches with the operator's list as yours does.
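These weights can be checked exactly. The sketch below assumes the usual randomization-test framing: the listener's answer list is held fixed, and the operator's list is the random quantity, drawn as 8 of the 10 squares without replacement. Each ordered list with k Cardas squares gets probability P(k)/C(8,k), and the p-value sums these over every list that scores at least as well. It enumerates all 2^8 = 256 patterns; only the 182 with 3 to 5 Cardas squares get non-zero probability. Function names and the example answer strings are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb

N_CARDAS, N_RADIO_SHACK, N_DRAWN = 5, 5, 8  # 10 squares in the box, 8 drawn

def list_probability(cables: tuple) -> Fraction:
    """Exact probability that the operator's draws produce this ordered list."""
    k = cables.count("C")
    # Hypergeometric probability of drawing k Cardas squares in total...
    p_k = Fraction(comb(N_CARDAS, k) * comb(N_RADIO_SHACK, N_DRAWN - k),
                   comb(N_CARDAS + N_RADIO_SHACK, N_DRAWN))
    # ...spread evenly over the C(8, k) orderings with that composition.
    return p_k / comb(N_DRAWN, k)

def exact_p_value(answers: str, score: int) -> Fraction:
    """P(operator's list matches `answers` in >= `score` positions)."""
    total = Fraction(0)
    for cables in product("CR", repeat=N_DRAWN):
        matches = sum(a == c for a, c in zip(answers, cables))
        if matches >= score:
            total += list_probability(cables)
    return total

print(exact_p_value("CCCCRRRR", 8))  # 1/126
print(exact_p_value("CCCCCRRR", 8))  # 1/252
```

A perfect 8/8 is less surprising for a 4/4 answer list (1/126) than for a 5/3 one (1/252), because 4/4 operator lists are the most likely composition.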

I'll post my protocol later, as soon as I have some time.
post #3 of 18
Thread Starter 
I figure that if my protocol gets ridiculously complicated mathematically, I will compute the P-value by a Monte Carlo method. In other words, if at the end of this test I get 7/8 correct, then I will write a program to run thousands of simulated experiments (in which the test subject is guessing, as dictated by the null hypothesis) and see what fraction result in a score of 7 or better.
post #4 of 18
Yes, this is a correct method. You'll have to simulate the 10 little squares, and the drawing of 8 of them.

Don't forget to add the probability of getting your score to the probabilities of getting even better scores.
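A minimal sketch of that Monte Carlo check, assuming the listener's 8 answers are held fixed and the operator's list is redrawn from the box (5 Cardas and 5 Radio Shack squares, 8 drawn without replacement) in each simulated experiment. The answer string "CCCCRRRR", the seed, and the function names are hypothetical.

```python
import random

def draw_operator_list(rng: random.Random) -> list:
    """Simulate the box: 5 Cardas and 5 Radio Shack squares, 8 drawn in order."""
    squares = ["C"] * 5 + ["R"] * 5
    return rng.sample(squares, 8)  # sampling without replacement

def monte_carlo_p_value(answers: str, score: int,
                        trials: int = 100_000, seed: int = 1) -> float:
    """Fraction of simulated experiments in which a guessing listener's
    fixed answers match the operator's list at least `score` times."""
    rng = random.Random(seed)
    at_least = 0
    for _ in range(trials):
        cables = draw_operator_list(rng)
        matches = sum(a == c for a, c in zip(answers, cables))
        if matches >= score:  # counts this score and every better one
            at_least += 1
    return at_least / trials

print(monte_carlo_p_value("CCCCRRRR", 7))
```

The `>=` comparison is what sums the probability of the observed score with those of better scores; with enough trials the estimate should approach the exact tail probability obtained by enumerating all the lists.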

I still don't have time to post my own protocol. But if you happen to read French, I've just posted a detailed description of how I perform my own tests here: homecinema-fr.com • Voir le sujet - Le débat objectivistes/subjectivistes (View topic: The objectivist/subjectivist debate)
post #5 of 18
I have a protocol. Take several people who strongly believe that they can hear differences between cables. They should be well known to the audience you wish to convince (e.g., Head-Fi). They should be the most vocal.

Then invite them to a test. These people can use their own equipment (e.g. their own speakers, their own favorite cables, their own living room, their own CD collection), whatever they think they need to hear the difference between cables.

After it's all set up, switch their favorite cable with a Radio Shack cable, unblinded. If they DO hear a difference, then continue. If they do NOT hear a difference, you're done. Pack your bags and find another person.

Now, since they can hear a difference using their own conditions (i.e., room, CDs, equipment, and length of testing), the ONLY thing you change now is their knowledge of which cable is which.

Do exactly the same thing that allowed them to hear the difference previously. Switch the cables when the listener wants them switched, just as before, except place a blanket over the cables or cover them up so the listener can't see them.

Then do a predetermined number of trials, chosen to give the desired alpha and beta. Then tally up the score.
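A minimal sketch of choosing that number of trials, assuming each trial is a right/wrong answer with a 50% guessing rate. The alpha of 0.05 and beta of 0.20 are conventional choices, and the listener's true hit rate of 0.8 is purely hypothetical; in practice it is the number you have to guess in advance. Function names are illustrative.

```python
from math import comb

def upper_tail(c: int, n: int, p: float) -> float:
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))

def trials_needed(alpha: float = 0.05, beta: float = 0.20,
                  p_true: float = 0.80, max_n: int = 100):
    """Smallest number of trials n, and pass mark c, such that
    - a pure guesser (p = 0.5) passes with probability <= alpha, and
    - a listener whose true hit rate is p_true fails with probability <= beta.
    Returns (n, c), or None if no n <= max_n works."""
    for n in range(1, max_n + 1):
        # Least strict pass mark that still controls the false-positive rate.
        c = next((m for m in range(n + 1) if upper_tail(m, n, 0.5) <= alpha), None)
        if c is not None and upper_tail(c, n, p_true) >= 1 - beta:
            return n, c
    return None

print(trials_needed())  # (18, 13) under these assumptions
```

Under these assumed numbers the sketch asks for 18 trials with a pass mark of 13. Because the binomial is discrete, power does not rise smoothly with n (here 17 trials give less power than 16), so it is worth checking the chosen n directly.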

Then repeat this experiment with everyone else who "believes." Definitely send an invitation to Noel Lee to participate.
post #6 of 18
Quote:
Originally Posted by SmellyGas
Then repeat this experiment with everyone else who "believes." Definitely send an invitation to Noel Lee to participate.
One refinement I would suggest: ask the believers to put up $50 of their own money, more if they are really vocal.

If they succeed, they get their money back and big kudos; if not, use the $50 for expenses...
post #7 of 18
Thread Starter 
Quote:
Originally Posted by SmellyGas

After it's all set up, switch their favorite cable with a Radio Shack cable, unblinded.
I think this is a bad idea.

During the sighted portion of the test, the listeners need to perceive the true audible qualities of the devices under test (DUTs) so they can have a true reference during the blind portion of the test.

You are letting them know what the cable is, so expectation bias may interfere with getting their senses calibrated.

That's why my blind test protocols don't involve the direct identification of the cable, and don't have a "sighted portion"---I can listen to A and B, but I don't know which is a cheap component and which is expensive.
post #8 of 18
Quote:
Originally Posted by mike1127
I think this is a bad idea.

During the sighted portion of the test, the listeners need to perceive the true audible qualities of the devices under test (DUTs) so they can have a true reference during the blind portion of the test.

You are letting them know what the cable is, so expectation bias may interfere with getting their senses calibrated.

That's why my blind test protocols don't involve the direct identification of the cable, and don't have a "sighted portion"---I can listen to A and B, but I don't know which is a cheap component and which is expensive.
I don't follow your explanation at all. A sighted portion is absolutely necessary. It acts as a positive control. A positive control demonstrates that your test is actually capable of showing what it needs to. Without a positive control, a listener who is unable to differentiate between two cables could object and say that the equipment/room/whatever masked his ability to hear differences between the cables. However, if he WAS able to hear differences before blinding (positive control) but WASN'T able to hear them after blinding (with nothing else about the conditions changed), then we can say with great confidence that, once bias was removed (listener knowledge of cable identity), the differences between the tested cables were not audible to the listener.

It would be a perfectly legitimate objection for a listener to say "the system/setup I listened to did not have sufficient resolution for me to hear the difference between cables. Cables have audible differences. I just couldn't hear them on this stupid system." It sounds ridiculous, but I know people will say this. HOWEVER, if you demonstrate to the listener that he can hear differences between cables (before all you do is cover up the identity of them), he cannot make this objection!
post #9 of 18
Thread Starter 
Quote:
Originally Posted by SmellyGas
I don't follow your explanation at all. A sighted portion is absolutely necessary. It acts as a positive control. A positive control demonstrates that your test is actually capable of showing what it needs to. Without a positive control, a listener who is unable to differentiate between two cables could object and say that the equipment/room/whatever masked his ability to hear differences between the cables.
What you aren't testing for, where you lack discriminatory ability, is for people who thought they heard a difference in factors M, N, O, P during the sighted portion, yet those were entirely attributable to expectation bias; while the real audible differences, Q, R and S, aren't known to the listeners before going into the test, so they listen for the wrong things.


I think Wavoman has a better idea (his test with swindles, see any of his threads).

In my own blind tests, I sometimes keep the identity of A and B hidden from me. I don't use a comparator box, so to keep the number of trials down, I have used a test I call ABBA/ABAB. A and B are randomly assigned to the expensive and cheap components through a coin toss by my assistant. The assistant then chooses ABBA or ABAB randomly, and I am given four presentations; I have to guess which order.
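A minimal simulation of one ABBA/ABAB trial as described, assuming both of the assistant's random choices are fair. It shows why each trial is a single binary answer with a 50% guessing rate: the label randomization hides which component is expensive, and the listener only has to decide whether the third presentation repeats the second (ABBA) or the first (ABAB). Function names and seeds are hypothetical.

```python
import random

def abba_abab_trial(rng: random.Random):
    """One trial: the assistant randomizes both the labels and the order."""
    # Random choice 1 (coin toss): which component is called "A"
    if rng.random() < 0.5:
        labels = {"A": "expensive", "B": "cheap"}
    else:
        labels = {"A": "cheap", "B": "expensive"}
    # Random choice 2: which of the two orders is presented
    order = rng.choice(["ABBA", "ABAB"])
    presentations = [labels[slot] for slot in order]
    return order, presentations

def guessing_hit_rate(n_trials: int = 20_000, seed: int = 7) -> float:
    """Hit rate of a listener who guesses the order at random."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        order, _ = abba_abab_trial(rng)
        guess = rng.choice(["ABBA", "ABAB"])
        hits += (guess == order)
    return hits / n_trials

order, presentations = abba_abab_trial(random.Random(0))
print(order, presentations)
print(guessing_hit_rate())  # close to 0.5
```

Because the chance rate per trial is 1/2, a series of these trials can be scored with the same binomial tail as any other right/wrong test.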

There is no question that people can hallucinate differences which aren't there, or attribute good qualities to a cheap component. I've done it myself. Therefore any protocol which permits the listener to hallucinate differences during the sighted portion has major problems, at best. My test doesn't eliminate this, but at least it keeps me from expecting the expensive component to be the best one.

Quote:
HOWEVER, if you demonstrate to the listener that he can hear differences between cables (before all you do is cover up the identity of them), he cannot make this objection!
I don't really care whether they "make an objection," I care about the truth.
post #10 of 18
"smoothness" and "microdynamics" mean little if undefined and without a calibrated benchmark (a system arbitrarily defined as "5" for example).
post #11 of 18
Quote:
Originally Posted by mike1127
What you aren't testing for, where you lack discriminatory ability, is for people who thought they heard a difference in factors M, N, O, P during the sighted portion, yet those were entirely attributable to expectation bias; while the real audible differences, Q, R and S, aren't known to the listeners before going into the test, so they listen for the wrong things.
That still doesn't make sense. If during the sighted test people perceive MNOP differences (which aren't actually present and are a result of the placebo effect), there is absolutely nothing stopping them from hearing QRS differences (true audible differences in cables, should they exist) during the blinded portion. Listeners are not robots! You don't program them to hear just MNOP differences and nothing else afterwards. If you blind them and there are now true ABC or DEF audible differences, they will report them.

Quote:
There is no question that people can hallucinate differences which aren't there, or attribute good qualities to a cheap component. I've done it myself. Therefore any protocol which permits the listener to hallucinate differences during the sighted portion has major problems, at best. My test doesn't eliminate this, but at least it keeps me from expecting the expensive component to be the best one.
What you really mean is: if we are only interested in finding true differences between cables, we cannot permit any methodology that allows hallucinated differences to be counted as true differences. This is why we do BLIND listening tests. Blinding prevents listener hallucinations (based on the placebo effect) from affecting the proper identification of truly audible differences.

A completely separate issue is whether our test apparatus (speakers, amp, CDs, etc., and of course the listener) is even capable of detecting differences between cables. If a blinded listener can differentiate between two cables better than chance, then who cares whether he can differentiate them unblinded? However, if a blinded listener can't tell the difference, then he would simply claim that the test apparatus is flawed and lacks the resolution to let him differentiate anyway (i.e., claim a false negative). Once again, this is why it is imperative that you have a positive control: some way of verifying that the listener is able to discern differences between cables. The concept of a positive control is very basic and fundamental to experimental design. Perhaps you could read about it? If you have a better suggestion for a positive control, I'd like to know.

Your typical "believer" (one who believes cables sound different) has his cable hooked up in the living room, and he knows its identity (after all, he bought the cable). It is under these circumstances that he has convinced himself that his cable affects sound quality. Therefore, it is only natural that he be allowed to compare his cable to a cheapo Radio Shack cable under identical conditions. The only thing we change is the BLINDING and nothing else.

Quote:
I don't really care whether they "make an objection," I care about the truth.
Part of good experimental design is anticipating the objections your audience and reviewers will have and addressing them in your methodology or analysis before the objections can even be made. If you don't, a simple objection like the common one I pointed out will cause a large group of people to dismiss your results. If you're not interested in designing a good experiment that will pass muster among people who are capable of interpreting your experiment, then why do one? Just do your own test at home, using whatever methods you want, forget about statistics, and come to your own conclusion!
post #12 of 18
Thread Starter 
Maybe I need to know more about how each trial is conducted. What makes up one trial?

Do you pick one of the cables and ask the listener to identify it? Do you carry out a kind of ABX test in which the listener can ask to switch between the known cables and the unknown cable as much as he likes?
post #13 of 18
Thread Starter 
Let me make my issues clearer.

Say we have cables A and B which have an audible difference under the right conditions. "Conditions" include the system and room environment, as you have mentioned. But conditions also include the way the test subject uses his attention. This includes both conscious choices about what to pay attention to, and unconscious influences such as expectation bias.

For example, if a person is very influenced by unconscious factors, they won't be able to hear the real differences. If cable A is audibly brighter than B under the right conditions, but the test subject has a strong unconscious expectation that cable B is brighter, then that subject won't be able to perceive the real differences, because he will likely hallucinate that cable B is brighter.

Call the set of real differences between A and B by the name R, and call the hallucinated differences H.

When you do this "positive control," which is to ask the subject to indicate he can hear a difference under sighted conditions, you have not demonstrated that the differences R are audible in that particular system or room. The test subject could be responding to the hallucinated differences H.

Let me make a comment about ABX testing. To reliably identify X, the subject needs to be able to perceive the real differences R. If the subject is strongly influenced by the hallucinated differences H, this will impair their ability to identify X.

If the subject knows the identity of A and B (for example that A is a very expensive component and B is a cheap one) it seems to me that they will be strongly influenced by the hallucinated differences.

My interest in designing test protocols is at a different level than yours. In your test, you let the subjects use their attention any way they like, and assume that if there is a real difference they will find it. I'm skeptical of this approach because I think it is so easy for people to hallucinate differences. I think that many component differences between amps, DACs and cables are real, but I'm not sure that audiophiles always identify the real differences when they do sighted testing.

My interest, then, is in finding conditions that minimize the influence of H and maximize R.

I've made some progress in this area with regard to my own listening process. I'm in the middle of a single-blind cable test (using a friend as the volunteer to change the cable) that is taking several weeks. I think I have found some key factors that minimize H. At the end of the test, if I do well, I will report my findings. If I do no better than chance, I will seriously reconsider much of what I think I know about audio. I give this test a lot of weight because the conditions of the test feel so right to me---it feels like H is minimized.
post #14 of 18
Quote:
Originally Posted by mike1127
Let me make my issues clearer.

For example, if a person is very influenced by unconscious factors, they won't be able to hear the real differences.
This is not necessarily true, and I am unaware of any evidence that requires it to be true.

Quote:
If cable A is audibly brighter than B under the right conditions, but the test subject has a strong unconscious expectation that cable B is brighter, then that subject won't be able to perceive the real differences, because he will likely hallucinate that cable B is brighter.

Again, this is not necessarily true. A subject who expects cable B to be brighter, yet hears that cable A is brighter, could certainly say "Well, I was convinced B was brighter, but A is definitely brighter to me." If differences among cables are as great as people claim, then they should have NO PROBLEM detecting them, REGARDLESS of what expectations they had going in.

Quote:
Call the set of real differences between A and B by the name R, and call the hallucinated differences H.

When you do this "positive control," which is to ask the subject to indicate he can hear a difference under sighted conditions, you have not demonstrated that the differences R are audible in that particular system or room. The test subject could be responding to the hallucinated differences H.
Sure. If the subject indicates he can hear a difference under sighted conditions, he demonstrates that he has perceived R *AND/OR* H, to use your example. On the other hand, if the subject hears NO difference in the sighted test, then he has demonstrated that he can perceive NEITHER R *NOR* H. If a listener can perceive NEITHER R NOR H, then he is useless to the study, because we need people who can perceive R. Make sense? This is why we need the sighted test as a positive control.

The purpose of the BLINDED condition is to test if R is present. If a blinded listener can reliably hear a difference between two cables, he has detected R (but NOT H) by definition. If the BLINDED listener cannot reliably hear a difference, then he cannot detect R. Since the blinded listener passed the positive control and was thus able to hear R AND/OR H, but proved unable to hear R, we can conclude that he is only able to hear H, the hallucinated differences.

It's pretty simple and straightforward.

Quote:
My interest in designing test protocols is at a different level than yours. In your test, you let the subjects use their attention any way they like, and assume that if there is a real difference they will find it. I'm skeptical of this approach because I think it is so easy for people to hallucinate differences.
Your argument is that knowledge of which cable is which during a listening test will invoke hallucinated perceived differences that will later mask the listener's ability to identify REAL differences during the blinded test. I personally see very little merit in this statement.

Quote:
I've made some progress in this area with regard to my own listening process. I'm in the middle of a single-blind cable test (using a friend as the volunteer to change the cable) that is taking several weeks. I think I have found some key factors that minimize H. At the end of the test, if I do well, I will report my findings. If I do no better than chance, I will seriously reconsider much of what I think I know about audio. I give this test a lot of weight because the conditions of the test feel so right to me---it feels like H is minimized.
To be frank, if it takes you weeks to hear differences between cables, then chances are, you wouldn't notice the difference in a 1 hour listening session. I don't know about you, but I don't listen to music more than an hour a day, maybe two if I'm bored. If during my usual listening session I wouldn't hear the difference between cheapo cable A and expensive cable B (because it takes me hours or weeks to be able to tell reliably), why on earth would I spend more money on expensive cable B?????
post #15 of 18
Thread Starter 
Quote:
Originally Posted by SmellyGas

Your argument is that knowledge of which cable is which during a listening test will invoke hallucinated perceived differences that will later mask the listener's ability to identify REAL differences during the blinded test. I personally see very little merit in this statement.
If placebo medicines can convince people they are better, regardless of the result of any objective test of their health, then I see merit in my statement. The person thinks they are better. They obviously cannot perceive their real physical state with complete clarity. Yet we know at other times people can perceive their physical state with some degree of accuracy. So, yes, I think hallucinated differences can mask real differences.

[Note: the most recent Skeptic magazine discussed placebo---to date, our best evidence is that placebo changes nothing physically measurable and only changes a person's perception of their health---note that some experiments have shown changes in the brain's pain centers in response to placebo but that is more in the domain of a person's perception.]

Quote:
To be frank, if it takes you weeks to hear differences between cables, then chances are, you wouldn't notice the difference in a 1 hour listening session. I don't know about you, but I don't listen to music more than an hour a day, maybe two if I'm bored. If during my usual listening session I wouldn't hear the difference between cheapo cable A and expensive cable B (because it takes me hours or weeks to be able to tell reliably), why on earth would I spend more money on expensive cable B?????
No, it doesn't take weeks to notice. The test follows this protocol:
  • The cables compared were a Rat Shack IC and a Cardas IC. The source is a Naim CD5X CD player, the IC runs to a headphone amp (the DNA Sonett), and the headphone is a K601.
  • Several days before the first test day, I picked 8 musical selections. I listened to them and noted interesting aspects: things I particularly enjoyed or thought my system did well. I made a spreadsheet: selections down the left side, and interesting aspects across each row.
  • For several evenings I listened to the test tracks exactly once each evening using the good setup.
  • On test Day #1, my helper randomly chose an assignment of Rat Shack or Cardas cable for each track. I listened to each track with the cable he had chosen. (I left the room between tracks so he could change the cable. The system was hidden behind a sheet.) I listened to each track once, usually for less than three minutes. I wrote down my observations.
  • Test Day #2 happened a week later; we couldn't do it sooner for scheduling reasons. Otherwise, the procedure was the same.
  • Test Day #3 is tomorrow.
  • Depending on my confidence, I may ask for a fourth test day and go back to some earlier configurations (i.e., "Please set up day #1, track #3 for me.") Again a week will probably go by just for scheduling reasons.
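For the replay step, the helper needs a private record of every day's assignment. A minimal sketch of such a log, assuming each track's cable is picked by an independent fair coin toss (a simplification; the box draw is a different procedure) and using hypothetical names throughout. It lets a request like "day #1, track #3" be looked up later and lets the listener's answers be scored against the record.

```python
import random

CABLES = ("Rat Shack", "Cardas")

def make_day_assignment(n_tracks: int, rng: random.Random) -> dict:
    """Hypothetical helper: pick a cable for each track by a fair coin toss."""
    return {track: rng.choice(CABLES) for track in range(1, n_tracks + 1)}

class AssignmentLog:
    """Helper's private record, keyed by (day, track); never shown to the listener."""

    def __init__(self):
        self._log = {}

    def record_day(self, day: int, assignment: dict) -> None:
        for track, cable in assignment.items():
            self._log[(day, track)] = cable

    def replay(self, day: int, track: int) -> str:
        """Cable to set up for a request like 'day #1, track #3'."""
        return self._log[(day, track)]

    def score(self, day: int, answers: dict) -> int:
        """Number of the listener's answers (track -> cable) that match the log."""
        return sum(self._log[(day, t)] == cable for t, cable in answers.items())

rng = random.Random(2024)
log = AssignmentLog()
for day in (1, 2, 3):
    log.record_day(day, make_day_assignment(8, rng))
print(log.replay(1, 3))
```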

The theory I'm testing is this:
  • Cable differences are very significant.
  • People are most sensitive to the differences when they listen with "fresh ears," which means listening to fresh music (something you haven't heard in a while) and listening in non-fatiguing situations. Note that using your stereo for hours every day does not preclude having fresh ears, as long as you keep picking new selections.
  • A person can also know the sound of their system well through long-term use. Under those conditions, a person readily detects changes to the system---but most readily on the first presentation. (Rapidly alternating between the "usual" setup and "new" setup will quickly make everything sound the same, in this theory.)

I'm guessing you will tell me that my test is unlikely to yield a non-null result because people are most sensitive to differences under quick-switch conditions. Well, I guess we'll see.