Quote:
Originally Posted by wavoman
The statistical reasoning presented by some posters here, while correct, is based on classic significance testing, which begins with the assumption (the "null hypothesis") that there is no difference between the two systems and will not move from that hypothesis unless there is major-league evidence in the other direction. It also assumes a homogeneous population vis-a-vis the ability to detect a difference.
Not appropriate here.
I think this is still the way it should be. If an inventor produces a new system and declares it "better", you really do want to see some hard evidence, and you want to be cautious at the very least. So the inventor has to provide evidence that system B is better than system A. Since "better" is difficult to quantify (and preference can go against technical superiority; see CD vs. LP), we often settle for testing for different. If the two are not different, then B cannot be better than A.
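To make "testing for different" concrete: the usual way to analyse a forced-choice run like ABX is an exact binomial test against the 50% guessing null. A minimal sketch in Python (the 12-of-16 figures below are purely illustrative, not data from any of the papers discussed):

```python
from math import comb

def binomial_p_value(correct, trials, p=0.5):
    """One-sided exact p-value: chance of `correct` or more hits under guessing."""
    return sum(comb(trials, k) * p**k * (1 - p)**(trials - k)
               for k in range(correct, trials + 1))

# Example: 12 correct out of 16 ABX trials
p = binomial_p_value(12, 16)
print(round(p, 4))  # just under the conventional 0.05 threshold
```

Only when a result like this clears the threshold do you get to reject "no difference" and move on to asking which system is preferred.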
As for the distribution of discriminatory abilities, this is a straw man. Look at the most interesting AES listening-test papers: they all describe their listening populations, and often (Benjamin and Gannon, Ashihara et al., Meyer and Moran, Blech and Yang***) they deliberately choose subjects whose abilities should be better than average. In short, they often go out of their way to give system B as strong a chance as possible.
Quote:
Think like this: suppose there is in fact a small difference, not always obvious, but some of the time A sounds better than B to some individuals (and never the other way).
This is a hypothesis that you must go out and find evidence for before going further. If you can find evidence for this then you can proceed.
Quote:
With this hypothesis, people who really can hear a difference some of the time, and would like therefore to own A instead of B, still post results that would seem to only be chance.
If the data does not support the model, then the model has to be revisited.
Quote:
The way out of the bind is to isolate the subjects who show a preference for A over B (not significant, but in the right direction), and re-test them.
This is called cherry-picking. I have done this myself, and I have an ongoing debate with one of my committee members about how valid it is; in the sub-field of the academic community in which I am nominally active, it is a semi-common practice. You can justify it, but you really have to have a strong case to do it. I do not think it is a valid approach here.
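To see why screening subjects "in the right direction" needs care, here is a toy simulation (all numbers assumed for illustration: 200 subjects, 16 trials each, and crucially nobody has any real ability). Selection alone hands you a "promising" group, and only a fresh, independent retest reveals they are at chance:

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

def abx_run(trials=16, p_correct=0.5):
    """Correct-answer count for one subject; p_correct=0.5 means pure guessing."""
    return sum(random.random() < p_correct for _ in range(trials))

# 200 subjects, none of whom can actually hear a difference
scores = [abx_run() for _ in range(200)]

# Screen: keep anyone at 12/16 or better ("preference in the right direction")
promising = [s for s in scores if s >= 12]
print("screened in by luck:", len(promising))

# Independent retest of the screened group: scores fall back toward 8/16
if promising:
    retest = [abx_run() for _ in promising]
    print("mean retest score:", sum(retest) / len(retest))
```

The retest itself is legitimate if it is analysed on its own; the trap is treating the screening run as evidence, or quietly repeating the cycle until someone passes.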
Quote:
Moreover the published tests are not well done, they don't simulate real listening, they ask for difficult (A/B/X) instead of realistic (A>B) comparisons, etc. etc. ... all discussed in other threads.
First establish a difference, then worry about preference; without a difference, preference is meaningless. In any case, I and many others have successfully used ABX testing to show discrimination of real differences between things like codecs, file formats, distortion levels, volume levels, frequencies, and so on. ABX testing can be really sensitive, and there are positive results out there if you look for them.
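That said, sensitivity depends heavily on trial count: a short ABX run can easily miss a real but modest ability. A quick power sketch, assuming a listener who genuinely hears the difference 70% of the time (an illustrative figure, not taken from any of the papers above):

```python
from math import comb

def p_value(correct, n):
    """One-sided exact binomial p-value under the 50% guessing null."""
    return sum(comb(n, k) for k in range(correct, n + 1)) / 2**n

def power(n, p_true=0.7, alpha=0.05):
    """Chance that a listener with true hit rate p_true reaches p < alpha in n trials."""
    # Smallest hit count that would be declared significant at level alpha
    crit = next(c for c in range(n + 1) if p_value(c, n) <= alpha)
    return sum(comb(n, k) * p_true**k * (1 - p_true)**(n - k)
               for k in range(crit, n + 1))

for n in (16, 40, 80):
    print(n, "trials -> power", round(power(n), 2))
# With only 16 trials, power is well under 50%: a genuine 70% listener
# fails to reach significance more often than not.
```

So a null result from a short test is weak evidence of no difference, while a positive result from a properly run test is strong evidence of one; that asymmetry is exactly why "establish a difference first" is the right order.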
The reason ABX testing gets a bad press in some quarters is that it flies in the face of accepted audiophile wisdom. It is uncomfortable to have evidence that contradicts a given world view.