Since we have several people on this forum who are knowledgeable about statistics, it might be interesting to propose our ideal protocol for testing audibility of subtle differences.
Since I have a single-blind test in progress right now, it might also be interesting to propose how to continue it.
Here's what I've done so far. I picked eight test tracks. I chose two cables to test: a Radio Shack and a Cardas. ($3 vs. $650). I put ten little squares in a box, 5 of them labeled "Cardas" and 5 labeled "radio shack". My helper took a list of the test tracks, then for each track drew a square to indicate the cable to be used for that track.
Why didn't I have him flip a coin for each track? (Wavoman talks about swindles. This is the same issue for me.) I think a test needs contrast. I think I'm better at determining relative quality differences than absolute quality differences. I wanted to have at least three trials with each cable so the contrast would be there. If I wasn't too sure with track #3, but then track #4 was so good it reminded me what the Cardas sounds like, then I can answer with more confidence that #3 was the Radio Shack.
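For anyone who wants to check the box scheme, here is a minimal Python sketch. The helper name draw_assignment is made up for illustration, and the only inputs are the numbers above (ten squares, five per cable, eight tracks). Since only two squares stay in the box after eight draws, each cable has to show up at least three times, which coin flips can't promise.

```python
import random
from itertools import combinations
from math import comb

CABLES = ["Cardas"] * 5 + ["Radio Shack"] * 5  # the ten squares in the box
N_TRACKS = 8

def draw_assignment(rng):
    """Draw one square per track, without replacement (hypothetical helper)."""
    return rng.sample(CABLES, N_TRACKS)

# Exhaustive check: try every pair of squares that could be left in the box
# and record the fewest Cardas trials that can result.
min_cardas = min(
    sum(1 for i in range(10) if i not in left and CABLES[i] == "Cardas")
    for left in combinations(range(10), 2)
)

# For comparison: chance that 8 fair coin flips give fewer than 3 trials
# of one cable or the other (0, 1, or 2 heads, or 0, 1, or 2 tails).
p_coin_lopsided = 2 * sum(comb(8, k) for k in range(3)) / 2**8

print("fewest Cardas trials possible with the box:", min_cardas)
print("chance a coin-flip schedule has <3 of one cable:", round(p_coin_lopsided, 3))
print("one example schedule:", draw_assignment(random.Random()))
```

By symmetry the same minimum holds for the Radio Shack. With coin flips, a schedule with fewer than three trials of one cable would come up roughly 29% of the time.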
So we carried out this test today. I listened to the eight tracks and took notes on each. I didn't actually commit myself to "Radio Shack" or "Cardas," but I wrote down quality ratings on various factors, like "smoothness=5," "microdynamics=8", etc.
Where to go from here?
My original idea was to repeat the eight tracks next week with the cable choices flipped for each track. I would then give the ordering for each track: was it C on week #1 and R on week #2, or vice versa? That gives a test with 8 binary answers, which can be tested for significance against a null hypothesis with n=8.
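As a rough sketch of what "significance with n=8" would look like, here is the usual one-sided binomial calculation in Python. The function name tail_prob is just for illustration. It assumes each track is an independent 50/50 guess under the null hypothesis. The box draw is not quite that (the schedule is forced to be roughly balanced), so treat the numbers as approximate and possibly a bit optimistic.

```python
from math import comb

def tail_prob(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more correct by guessing."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Table of one-sided p-values for every score from "half right" to "all right".
for n in (8, 16):
    for k in range(n // 2, n + 1):
        print(f"n={n:2d}  correct>={k:2d}  p={tail_prob(k, n):.4f}")
```

With n=8, getting 7 of 8 orderings right gives a one-sided p of about 0.035, while 6 of 8 gives about 0.145. With n=16, 12 correct gives about 0.038 and 11 correct gives about 0.105.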
We could then repeat this again on the next two weeks to get a total of n=16.
However, another possibility would be for my helper to set up the cable choices on week #2 by drawing from the box again. I would then give my answers as "no preference", "prefer week #1," "prefer week #2," etc.
I don't know how to analyze that kind of test, though, especially for significance testing. I would like a test to convince myself that the cables matter, and maybe convince a few other cable skeptics too (I count myself among them). What size N do we need to reach a nice level of significance?
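One conventional option for preference answers like these is a sign test, sketched below in Python. Everything here is an assumption for illustration, not something I've settled on: each answer is recoded as "C" (preferred the week with the Cardas), "R" (preferred the week with the Radio Shack), or None (no preference, or the same cable happened to be drawn both weeks), the None answers are dropped, and the test is one-sided in favor of the Cardas. The function names and the example answers are made up. The second half looks at the "what size N" question by asking how often a listener with an assumed true hit rate would reach p <= 0.05.

```python
from math import comb

def tail_prob(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def sign_test(answers):
    """Sign test on 'C' / 'R' / None answers; None answers are dropped.

    Returns (number preferring C, number of decided answers, one-sided p).
    """
    decided = [a for a in answers if a is not None]
    n = len(decided)
    k = decided.count("C")
    return k, n, tail_prob(k, n)

def critical_count(n, alpha=0.05):
    """Smallest score k with P(X >= k | guessing) <= alpha, or None if no score works."""
    for k in range(n + 1):
        if tail_prob(k, n) <= alpha:
            return k
    return None

def power(n, p_true, alpha=0.05):
    """Chance of reaching significance if the true per-trial hit rate is p_true."""
    k = critical_count(n, alpha)
    return 0.0 if k is None else tail_prob(k, n, p_true)

# Made-up example answers for eight tracks (not real results).
example = ["C", "C", None, "C", "R", "C", "C", "C"]
print("sign test (k, n, p):", sign_test(example))

# How N trades off against the assumed true hit rate (hypothetical rates).
for n in (8, 16, 32, 64):
    row = "  ".join(f"p_true={p}: {power(n, p):.2f}" for p in (0.7, 0.8, 0.9))
    print(f"N={n:2d}  need>={critical_count(n)}  power  {row}")
```

The required N depends on how reliably the difference is actually heard, which nobody knows in advance. As one data point, if I could really pick the Cardas 90% of the time, N=8 would reach significance about 81% of the time. For a two-sided question ("is there any audible difference, in either direction?") a common rough approach is to double the one-sided p.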
I also am concerned about contrast. I would like to hear, say, track #3 with both cables. In case I'm kinda iffy about it... I'm thinking, "Well it was so-so on week 1 and maybe a touch better on week 2..." I would be unsure what to say. Maybe I preferred week 2, or maybe it was hardly a change and the significance just seems magnified to me because I'm not so sensitive to absolute differences.