Along with strictly psychological issues like cognitive bias, I find that many audiophiles seem to be either unwilling or unable to understand certain fine nuances of meaning.
And this seems to be at the root of many of the apparent disagreements in this particular forum.
In many of the tests I've read about, the protocols used are seriously limited, or actually flawed.
And, in MANY cases, people try to use the results of tests to bolster their arguments when the results themselves simply fail to do so.
For example, you could test a few dozen people, or even a few hundred, and conclude that "peanuts are not toxic to humans".
(The reality is that peanuts are perfectly safe for most people, cause slight allergic reactions in some people, and are quite lethally toxic to a very few.)
As an example of the problem... to pick a subject that's a personal favorite of mine...
"Is there an audible difference between high-resolution and CD resolution audio files?"
This seems to be a very simple question, which suggests that a simple answer might in fact be found....
HOWEVER, actually designing a protocol to test it properly is rather complex, and the details of such a test will depend on your goals.
For example, let's say my goal is to determine: "whether most people can tell the difference between 16/44.1k and 24/96k files."
(This is the sort of thing I would want to know if I sold high-res files, or a player that plays them, or even published a magazine that reviews them.)
To test it, I would probably make up a group of ten files... composed of a random mix of the two resolutions...
Then I would pick a random sample of 100 people, ask each to rate which files they thought were high-resolution, and then correlate the results.
A good correlation between their guesses and the actual sample rates of the individual files would indicate that "most of them noticed a difference".
And a poor correlation (approaching random) would indicate that most of my test subjects hadn't been able to tell the difference.
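(For anyone who likes to see the mechanics, here's a minimal sketch in Python of how that scoring might be tallied. Everything in it is hypothetical; the "responses" are just simulated guesses standing in for real listeners' answers.)

```python
import random

NUM_FILES = 10
NUM_SUBJECTS = 100

# Hypothetical ground truth: which of the ten files are actually 24/96 (True)
# and which are 16/44.1 (False).
truth = [random.choice([True, False]) for _ in range(NUM_FILES)]

# Hypothetical responses: each subject labels every file as "high-res" (True) or not.
# Here they're simulated as pure guesses; in a real test these would be the
# listeners' actual answers.
responses = [[random.choice([True, False]) for _ in range(NUM_FILES)]
             for _ in range(NUM_SUBJECTS)]

# Score each subject: how many of their ten labels match the ground truth.
scores = [sum(guess == actual for guess, actual in zip(subject, truth))
          for subject in responses]

total_correct = sum(scores)  # out of 100 x 10 = 1000 guesses
print(f"Total correct: {total_correct} / {NUM_SUBJECTS * NUM_FILES}")
print(f"Overall hit rate: {total_correct / (NUM_SUBJECTS * NUM_FILES):.1%}")
print(f"Best individual score: {max(scores)} / {NUM_FILES}")
```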
However, you have to be EXTREMELY CAREFUL to avoid "reading things into the results that aren't there".
Let's just assume, for the sake of our discussion, that, out of 1000 total guesses, 516 were correct and 484 were wrong (we required each person to pick one or the other).
From an "overall statistical view" our results would seem to show that "the majority of people cannot tell the difference most of the time".
(516/1000 is well within random variation for a sample of that size.)
And, so, if our goal was "to find out whether the majority of people could tell the difference" then we have probably got a usable result (the result being "no").
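(To put an actual number on "well within random variation": a simple two-sided binomial test, sketched below with SciPy, puts our hypothetical 516-out-of-1000 result at a p-value of roughly 0.3, nowhere near any conventional threshold for significance.)

```python
from scipy.stats import binomtest  # requires SciPy 1.7 or later

# Our hypothetical tally: 516 correct out of 1000 forced-choice guesses.
result = binomtest(k=516, n=1000, p=0.5, alternative='two-sided')

print(f"Observed hit rate: {516 / 1000:.1%}")
print(f"Two-sided p-value vs. pure guessing: {result.pvalue:.3f}")
# Prints a p-value around 0.3, i.e. a 516/1000 split is entirely consistent
# with every subject guessing at random.
```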
HOWEVER, have we proven that "there is no audible difference"?
Not at all.
What if it turned out that, out of our 100 participants, five of them (that's 5% of our sample) were right 90% of the time?
We would then have a strong positive correlation for a specific subset of our test sample.
If even one person could tell with 90% accuracy, then we have a pretty good case to claim that "at least some people can probably tell the difference with good reliability"...
And, if five people guessed with 90% accuracy, we would have an even better case to claim that "a significant minority of people seem able to tell the difference with good reliability"...
(However, notice that, if we'd only looked at the overall number, we would have missed that significant portion of the result.)
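(Here's a quick sanity check of what "by chance" means here, assuming, as in our hypothetical test, that each listener makes ten independent 50/50 guesses. The chance of any single guesser hitting 9-of-10 or better is about 1%, so among 100 pure guessers we'd expect roughly one such "lucky" high scorer, not five.)

```python
from scipy.stats import binom

# Chance that a single pure guesser gets 9 or more of 10 files right:
# sf(8) = P(X >= 9) for X ~ Binomial(n=10, p=0.5), which is 11/1024.
p_lucky = binom.sf(8, 10, 0.5)

# Expected number of such "lucky" 90%+ scorers among 100 guessing subjects.
print(f"P(a guesser scores >= 9/10): {p_lucky:.4f}")               # about 0.011
print(f"Expected lucky scorers out of 100: {100 * p_lucky:.2f}")   # about 1.1
```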
Now, which result is "the correct one"?
The answer is: BOTH OF THEM.
However, which result is more useful to you or me may depend on what we want to use it for.
- If I'm setting up a new "easy listening broadcast radio station" for the general public, I would probably conclude that "most of my intended audience won't notice the difference".
- BUT, if I were setting up "a new audiophile radio station for discerning listeners", I might decide that many of the members of my target audience would be among the 5% who notice.
- So, if the goal of my study was: "To find out if MOST people can hear the difference", I would have a result: NO.
- BUT, if the goal of my study was: "To find out if there was an audible difference", I would also have a result: YES (because I have several test subjects who scored very highly).
- AND, if I personally am trying to decide whether it's worth buying high-resolution files, then that result would be somewhat inconclusive.
(If that's the case then my best bet is to take the test myself.)
Note also that our results would absolutely suggest further study.
After all, there is some statistical probability that, by random chance, some of my subjects will score far better than the random average.
(Flipping a coin and getting ten heads in a row by random chance is extremely unlikely, about 1 in 1,024, but the odds aren't actually 0%.)
Statistics tell us that, if only one person guessed with 90% accuracy, there's a good chance it was just luck; but, if five people guessed with 90% accuracy, the odds that all of them did so purely by chance are far lower.
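(To put rough numbers on that, under the same assumptions as before, 100 subjects making ten 50/50 guesses each, here's a sketch of the comparison: one 90%+ scorer showing up by luck is quite likely; five of them showing up by luck is not.)

```python
from scipy.stats import binom

n_subjects = 100
p_lucky = binom.sf(8, 10, 0.5)   # P(a guesser scores >= 9/10), about 0.011 as above

# Probability that AT LEAST ONE of the 100 guessing subjects hits 90%+ by luck.
p_one_or_more = 1 - binom.pmf(0, n_subjects, p_lucky)

# Probability that AT LEAST FIVE of them do.
p_five_or_more = binom.sf(4, n_subjects, p_lucky)

print(f"P(>= 1 lucky 90%+ scorer):  {p_one_or_more:.2f}")    # roughly 2 in 3
print(f"P(>= 5 lucky 90%+ scorers): {p_five_or_more:.4f}")   # well under 1%
```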
We've also only determined our results with certain music samples, certain associated equipment, and under certain test conditions.
We always need to consider how our test conditions relate to real world usage conditions:
- perhaps, with a different set of speakers, the results would be the same - and perhaps not.
- perhaps we would get different results with speakers than with headphones.
- perhaps the results would be different with different types of music.
- perhaps, if my new audiophile radio station is going to be dedicated to 70's and 80's era rock music, it might be a good idea to run a test specifically with those.
- and, if I'm hoping to attract 50- and 60-year-old listeners, perhaps I should be more interested in what they hear than in what high-school students notice.
- (and perhaps people would score better after being "trained" by hearing both versions of all the files first)
As a "science oriented discussion area" it would be really nice if people would be, well, more detailed and scientific about both their claims and their conclusions