(Note: I wrote this post late at night, so numerous edits were necessary today to clarify it. You might want to make sure you are reading the new version. Yes, I too sometimes write terrible grammar!)
Royalcrown,
Perhaps I have given the mistaken impression that I think measurements are totally useless or have no basis in reality. A lot of your reply here doesn't really address my point(s), but perhaps I have not made my point(s) clear.
To me, the question is: how well do measurements correlate with perceived sound? A related question is: how complete are our models of audio devices; that is, how significant is unmodeled behavior? You say that measurements are useful for deciding if two devices sound similar, and that using measurements to decide how something will sound subjectively is the wrong use of measurements.
But I never expected a measurement to tell me "what a speaker sounds like" in an aesthetic sense. However, there is some relationship between distortion measurements and sound quality. A device with high distortion---I mean really high distortion---always sounds bad. A device with less distortion will sound better (better fidelity to the original). The question is: how far can you take this relationship? Are there any measurements that are closely correlated with sound quality? Consider measurement X: suppose a device with 1% X distortion sounds accurate, 2% sounds so-so, and 3% sounds bad. That would be a close correlation.
If a particular set of measurements is so complete that it leaves no significant unmodeled behavior, then we can use that set to determine if two devices are identical.
If you survey audio engineers, you will find a range of opinions about measurements. Some engineers say they are very useful, and will say they've found a way to relate almost any perception to a measurement. (In other words, if I report that a speaker sounds "warm" they will say, "Yup, it has a 250 Hz bump.") Other engineers say that measurements are nearly useless except for diagnosing gross types of distortion.
Why such a range of opinions? Probably because people are listening for different things. In the general "area" I move in---that is, the type of equipment and music I use---measurements have not proven to be very useful. So I downplay the usefulness of measurements.
This is not to say measurements are useless in principle. It's just that in my "area" we don't have any good ones that really correspond to sound fidelity in a more refined way than detecting gross distortion.
Quote:
Originally Posted by Royalcrown
Quote:
Originally Posted by mike1127
Perhaps this example will make it clearer. If two paragraphs receive the same Flesch-Kincaid score, does that tell you the two paragraphs contain the same text?
|
Of course not - but that doesn't mean that the measurements aren't useful. The text may differ, but Flesch-Kincaid is a metric of readability. This is a perfect example of how missing the purpose of the measurements leads to an incorrect conclusion. The Flesch-Kincaid metric wasn't designed to tell whether or not two texts are identical; it was designed to assess readability, which it does just fine. Likewise, audio measurements aren't designed to tell you what sounds good; that's probably not possible in the first instance. But that doesn't mean that audio measurements should be disregarded altogether - that's throwing out the baby with the bathwater.
|
I was addressing the question of whether audio measurements are good for determining whether two devices are audibly the same. You raised this point yourself:
Quote:
Originally Posted by Royalcrown on measurements
They exist to determine whether or not a device is beyond audible limits or to determine differences between components (i.e. whether or not a difference exists, not how that difference will perceptually manifest itself).
|
First of all, Flesch-Kincaid is a measure of comprehensibility, not readability. I would like to see a fifth grader prove their comprehension of that paragraph by repeating it back in their own words.
The issue is not "what measurements are for." The issue is that any measurement of anything, whether it is Flesch-Kincaid or frequency response, is a kind of lossy compression. It takes a rich source of information and compresses it into a few numbers. That's why Flesch-Kincaid cannot tell you if two paragraphs are the same: so much information has been lost. The same is true of audio measurements.
Quote:
Perhaps I should rephrase my question: do you have any substantiation that this unmodeled behavior matters? Perhaps theoretically there's some behavior we haven't measured, but there's no proof that said behavior actually matters when it comes to audio. |
This question is at the heart of the matter. However, I think it's silly to presume that our measurements are complete and that "proof" is needed to show they aren't. It should be the other way around. Our measurements are grossly simplified representations of complex behavior. No proof is needed for that. That is the definition of a measurement.
Consider codec research. Codec researchers have models of the ear/brain that predict whether the distortion induced by lossy compression is likely to be audible, and how audible. Through experimentation (both listening tests and a developing understanding of neurology), those models have been refined.
If the models are good enough, there should come a time when codec researchers can predict the results of listening tests without having to do them. For example, if someone comes up with a new lossy compression algorithm, we should be able to apply a measurement to it and determine how it would fare in listening tests. (Because you are interested in determining if two devices sound the same, consider that equivalent to asking if a codec introduces audible distortion.)
If we are going to say our models are complete, then this should be true of any new codec... particularly if it's a radically new type of algorithm. No matter how different it is from what we've seen before, we should be able to analyze it with existing measurements and predict any listening results with perfect accuracy.
But I think most scientists would never say that you should stop running the listening experiments. There may come a time (or it may have come already) when so much success has occurred that it doesn't seem necessary to run the tests, but I'm sure that scientists would want to keep running some tests, especially with new ideas.
I am not a codec researcher, so we would really have to get one to comment on the question: how complete are our models of the ear/brain's response to lossy codecs? You asked: "Have we ever predicted two things to sound the same, and yet they prove to be distinguishable in a double-blind test?" I would guess this situation comes up frequently in codec research. If it didn't, the research would be over! We would know everything there is to know about codecs.
Now, codecs are a specialized area of knowledge. A particular type of device. If you are going to say our measurements are complete as a whole, that would apply to all audio devices... amplifiers, DACs, headphones, etc. (*)
It seems a very, very bold claim that we have a way to completely characterize the behavior of all these devices. Consider what a brilliant accomplishment it would be to achieve this just for codecs... and codecs are probably THE easiest type of device to run ABX tests on.
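Since ABX tests keep coming up: the usual way to read an ABX score is to ask how likely that score would be if the listener were purely guessing, with a 50% chance per trial. A minimal Python sketch of that binomial calculation follows; the function name `abx_p_value` is hypothetical, and the 0.05 threshold mentioned below is a common convention, not a law.

```python
from math import comb

def abx_p_value(correct, trials):
    """Probability of scoring at least `correct` out of `trials` by pure guessing (p = 0.5 per trial)."""
    if not 0 <= correct <= trials:
        raise ValueError("need 0 <= correct <= trials")
    # Count every outcome with `correct` or more right answers, out of 2**trials equally likely ones.
    favorable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favorable / 2 ** trials

print(f"{abx_p_value(12, 16):.4f}")  # 0.0384
```

So 12 correct out of 16 would happen by chance only about 4% of the time, which is why a result like that is usually taken as evidence that a difference was actually heard.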
Because I'm not an audio engineer, I don't have a lot of field data. EDIT: let me qualify my resume a bit here. I took a class in college in which I designed speakers and worked with an audio engineer. We did a lot of measurements, and none of them correlated with sound quality, especially not frequency response or harmonic distortion. After college I worked for a company that serviced audio test equipment, and I spent some time at Harman, so I was able to observe, in an indirect way, their process for evaluating speakers through measurements. This was under Floyd Toole, supposedly the king of scientific/subjective evaluation. And their speakers sounded terrible. Something was wrong with their approach.
Later I discovered that LP is a higher-resolution medium than CD. There are no measurements to explain that. I have some guesses of my own, though... I think the impulse response of a system may be key, and LP and CD have very different impulse responses (because CD has a brick-wall filter at 22.05 kHz). But there is no single number that demonstrates less distortion in analog. I take this to mean we haven't found the right measurement.
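For readers unfamiliar with the brick-wall point: an ideal low-pass filter with its cutoff at 22.05 kHz has a sinc-shaped impulse response that rings symmetrically, before the main peak as well as after it. The Python sketch below evaluates that idealized response on a fine time grid. It is a textbook idealization (real CD players use finite digital filters that only approximate it, often linear-phase, which keeps the symmetric pre-ringing), it says nothing about LP playback, and it makes no claim about audibility. The names and the 352.8 kHz viewing grid are illustrative assumptions.

```python
import math

FC = 22_050.0          # cutoff frequency in Hz (half of the CD sample rate, 44.1 kHz)
VIEW_RATE = 352_800.0  # fine time grid (8 x 44.1 kHz), used only to inspect the shape

def brickwall_impulse(t):
    """Impulse response of an ideal low-pass with cutoff FC: h(t) = 2*FC*sinc(2*FC*t).

    This is the idealized, non-causal textbook filter, not a model of any real player.
    """
    x = 2 * FC * t
    if x == 0:
        return 2 * FC
    return 2 * FC * math.sin(math.pi * x) / (math.pi * x)

times = [n / VIEW_RATE for n in range(-64, 65)]  # about +/- 0.18 ms around the peak
h = [brickwall_impulse(t) for t in times]
peak = brickwall_impulse(0.0)

# "Pre-ringing": lobes before the main lobe's first zero crossing at t = -1/(2*FC).
pre = [v for t, v in zip(times, h) if t <= -1 / (2 * FC)]
pre_ringing = max(abs(v) for v in pre) / peak

print(f"largest pre-ringing lobe: {pre_ringing:.2f} of the peak")  # 0.21
```

The largest lobe ahead of the main peak comes out at roughly a fifth of the peak height, mirrored by the same ringing after it.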
(EDIT - clarified this terrible sentence!) Part of my opinion comes from this fact: the good pieces I own are made by companies who don't rely exclusively on measurements and put subjective evaluation as a high priority; while companies who emphasize measurements over all else make terrible-sounding stuff (to my ears).
(*) EDIT: to say we can completely characterize a device doesn't mean that we can predict how that device will sound subjectively in all circumstances, but it means there are no significant unmodeled effects, so that two devices can be compared for audible equality with confidence.