Quote:
Originally Posted by Hirsch
Now throw in another kicker. For some trials, use a positive control, where you keep the cables the same, but make some sort of auditory change that is known to be just above a perceptual threshold. For these trials, you'd better see hits. That gives you your measure of quality control built right into your experimental design.
Now we're talking. This is precisely what I want to do.
How do you propose we make some sort of auditory change that is known to be just above the perceptual threshold? What defines the threshold? Is it different for every person? How do we know who can hear it and who can't? Do we pick an arbitrary change? If so, how can we separate out who should have heard it from who simply can't hear the change? If listeners don't key in, they'll say "not a cable change" not because they detected the change and decided it wasn't a cable change, but because they didn't detect any change at all.
If you're designing this experiment and you choose changes that are barely above your own perceptual threshold, then in the end you still don't know whether your testers were better or worse listeners than you. What's worse, you also never quantified how much of a change a cable swap really is.
The answer? MSA (Measurement Systems Analysis).
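On the "is the threshold different for every person" question: one standard psychoacoustics approach is to calibrate each listener individually with an adaptive staircase before the cable trials. Here's a minimal sketch of a 1-up/2-down staircase (the transformed up-down method); the listener here is a hypothetical hard-threshold model standing in for a real person hearing real level changes, and all the parameter values are made up for illustration.

```python
def simulate_listener(true_threshold_db):
    """Hypothetical listener model: detects any change at or above threshold."""
    return lambda change_db: change_db >= true_threshold_db

def staircase(detects, start_db=6.0, step_db=0.5, reversals_needed=8):
    """1-up/2-down staircase: step down after two consecutive hits, step up
    after a miss. It converges near the ~70.7%-correct point; the threshold
    estimate is the average level at the reversal points."""
    level, hits, reversals, direction = start_db, 0, [], -1
    while len(reversals) < reversals_needed:
        if detects(level):
            hits += 1
            if hits == 2:                  # two in a row -> make it harder
                hits = 0
                if direction == +1:        # we were going up: record reversal
                    reversals.append(level)
                direction = -1
                level -= step_db
        else:                              # miss -> make it easier
            hits = 0
            if direction == -1:            # we were going down: record reversal
                reversals.append(level)
            direction = +1
            level += step_db
    return sum(reversals) / len(reversals)

est = staircase(simulate_listener(2.0))   # per-listener threshold estimate, in dB
```

Run that once per tester and you have a per-person threshold to build the positive controls around, instead of guessing from your own ears.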
And yes, I thought the post-hoc analysis would include an ANOVA across listeners, cables, and the other changes, to see which are statistically different by variation or by variable. Probably using Tukey-Kramer for the pairwise comparisons.
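For the ANOVA piece, the F-statistic itself is small enough to show by hand. Below is a pure-Python one-way ANOVA on made-up detection scores (the groups could be listeners, or cables); the Tukey-Kramer follow-up needs the studentized range distribution, which in practice you'd hand off to a stats package rather than code yourself.

```python
def one_way_anova_F(groups):
    """One-way ANOVA: F = (between-group mean square) / (within-group mean square)."""
    k = len(groups)                            # number of groups
    n = sum(len(g) for g in groups)            # total observations
    grand = sum(sum(g) for g in groups) / n    # grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical scores for three listeners -- purely illustrative numbers
scores = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
F = one_way_anova_F(scores)
```

A large F says at least one group differs; Tukey-Kramer then tells you which pairs differ while controlling the family-wise error rate across all the comparisons.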
With the MSA data, we could also construct a stepwise, full-factorial fit model that lets us draw conclusions about: how much change (X) does a cable really introduce, and what kind of hearing is required to detect a change of size X? We could, of course, take wild stabs in the dark at how significant a given change in setup really is. Or we could perform the MSA, whereby we introduce known changes and see how many of our listeners are capable of detecting each one, then use the results to devise a scale of how significant those changes are. Once we run the cable experiment, we can use the model to stack-rank how significant the cable changes were (assuming they are detected at all) and what kind of hearing you need to detect them.
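Generating the full-factorial run plan for that MSA is a one-liner with itertools. The factor names and levels below are hypothetical placeholders, not a proposed protocol:

```python
from itertools import product

# Hypothetical factors and levels for the MSA trials
factors = {
    "listener": ["A", "B", "C"],
    "change":   ["none", "cable", "+1dB level", "EQ tweak"],
    "session":  [1, 2],
}

# Every combination of every level: 3 * 4 * 2 = 24 runs
runs = [dict(zip(factors, combo)) for combo in product(*factors.values())]
```

You'd randomize the run order before presenting anything to the listeners, so run sequence doesn't confound the change being tested.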
I don't know about you, but I'd like to come away from this with a bit more data than "in our test group, X number of people could detect a cable change Y% of the time." If I read experiment results that concluded that, and only that, I'd start asking questions like: "How well do the testers detect changes other than cables?" "Are they capable of detecting minor changes in the first place?" Etc., etc.
Maybe I'm being overzealous with the attempt, but I do not want a simple binary conclusion that says only whether testers can or cannot hear a change introduced by cables.
If I could design the uber experiment, I'd have more testers, they'd spend more time with the equipment, we'd have a wider variety of equipment, and they'd give us opinionated feedback such as "better," "worse," or "the same," plus an attempt to rate each change on a scale of 1 to 10. That would let us construct several models giving us an idea of what an "uber rig" might contain, and what kinds of listening habits gravitate toward different uber rigs. I suspect there'd be more than one, depending on your tastes, but in theory the population would fall into groupings with only a few uber rigs. That would take entirely too much time and entirely too many people to overcome the bounds of subjective opinion of "better" and "worse." But it'd still be neat.
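The "groupings of the population" part is a clustering problem: listeners with similar rating profiles group together, and each group's center describes its uber rig. A toy sketch with a tiny hand-rolled k-means (k = 2); the rating vectors and starting centroids are invented for illustration.

```python
def kmeans(points, centroids, iters=10):
    """Minimal k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points. Repeats a fixed number
    of iterations; fine for a toy example."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        centroids = [tuple(sum(c) / len(c) for c in zip(*cl)) if cl else cen
                     for cl, cen in zip(clusters, centroids)]
    return centroids, clusters

# Hypothetical data: each row is one listener's 1-10 ratings of three rigs
ratings = [(9, 2, 3), (8, 3, 2), (2, 9, 8), (1, 8, 9)]
centers, groups = kmeans(ratings, centroids=[(9, 2, 3), (1, 8, 9)])
```

With real data you'd also have to pick k (how many uber rigs exist), which is exactly the "more than one depending on your tastes" question.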