AES 2012 paper: "Relationship between Perception and Measurement of Headphone Sound Quality"
Oct 18, 2012 at 6:24 AM Thread Starter Post #1 of 135

JMS

An interesting paper from Sean Olive of Harman (AKG's parent company) is set to be presented at AES 2012 in a couple of weeks and should be of great interest to folks in this forum:
 
http://www.aes.org/events/133/broadcast/?ID=3181
------------------------
P10-1 The Relationship between Perception and Measurement of Headphone Sound Quality
Sean Olive, Harman International - Northridge, CA, USA; Todd Welti, Harman International - Northridge, CA, USA
Double-blind listening tests were performed on six popular circumaural headphones to study the relationship between their perceived sound quality and their acoustical performance. In terms of overall sound quality, the most preferred headphones were perceived to have the most neutral spectral balance with the lowest coloration. When measured on an acoustic coupler, the most preferred headphones produced the smoothest and flattest amplitude response, a response that deviates from the current IEC recommended diffuse-field calibration. The results provide further evidence that the IEC 60268-7 headphone calibration is not optimal for achieving the best sound quality.
------------------------
 
I'm hoping this will shake up everything we know about headphones and start the path towards better headphone science.
 
Oct 18, 2012 at 8:40 AM Post #2 of 135
Quote:
 
I'm hoping this will shake up everything we know about headphones and start the path towards better headphone science.

 
"When measured on an acoustic coupler, the most preferred headphones produced the smoothest and flattest amplitude response"
 
Sounds to me like this is confirming what we already know.
 
I will be interested to see the paper, certainly.
 
Oct 18, 2012 at 9:04 AM Post #3 of 135
The unexpected part is made explicit in the next sentence: "...the IEC 60268-7 headphone calibration is not optimal for achieving the best sound quality."
 
Oct 18, 2012 at 9:12 AM Post #4 of 135
To clarify: it's currently assumed that a target/compensation curve (the IEC 60268-7 diffuse-field calibration, or others) should be applied to a headphone's raw frequency response before interpreting it; Innerfidelity.com's measurements are an example. This paper suggests that target isn't optimal.
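 
For anyone unfamiliar with what "applying a compensation curve before interpretation" looks like in practice, here's a minimal sketch with entirely made-up numbers (not from the paper, and not InnerFidelity's actual data or target): the raw coupler measurement is compensated by subtracting the chosen target curve, and a headphone that matched the target exactly would plot as a flat line at 0 dB.
 
[code]
# Minimal sketch (hypothetical values): compensating a raw coupler measurement
# against a target curve before interpreting it.
import numpy as np

freqs = np.array([20, 100, 1000, 3000, 10000])               # Hz, illustrative points only
raw_db = np.array([4.0, 2.5, 0.0, 8.0, 2.0])                 # made-up raw coupler response, dB
diffuse_field_target = np.array([0.0, 0.0, 0.0, 9.0, 4.0])   # rough stand-in for a DF target, dB

# Compensated response: the measurement's deviation from the chosen target.
# A headphone that hit the target exactly would read 0 dB everywhere.
compensated_db = raw_db - diffuse_field_target

for f, c in zip(freqs, compensated_db):
    print(f"{f:>6} Hz: {c:+.1f} dB relative to target")
[/code]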
 
Oct 18, 2012 at 10:20 AM Post #5 of 135
Ah - I see. Sorry, I misunderstood what you meant. 
 
I will be especially curious to see their methodology documentation on this. 
 
Oct 18, 2012 at 9:40 PM Post #6 of 135
It doesn't say how many people took the test; I guess we'll have to wait for the full paper.
 
However, given that preferences across a population tend to follow a normal distribution, I'm pretty sure there's a mean preferred frequency response.
It would be good to know whether it's flat or not!
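 
Just to illustrate that idea with simulated listeners (not any real preference data): if each person's preferred response is the population mean plus some individual variation, averaging the individual curves estimates that mean, and you can check how far it sits from flat. Everything below is assumed purely for the sake of the example.
 
[code]
# Minimal sketch (simulated data): estimating the mean preferred response
# across a population and measuring its deviation from flat.
import numpy as np

rng = np.random.default_rng(0)
freqs = np.logspace(np.log10(20), np.log10(20000), 31)    # 20 Hz to 20 kHz

n_listeners = 200
# Hypothetical population-mean preference: a gentle bass tilt, i.e. not flat.
true_mean = 3.0 * np.exp(-freqs / 300.0)                   # dB
# Each listener's preferred curve = population mean + normally distributed variation.
preferences = true_mean + rng.normal(0.0, 2.0, size=(n_listeners, len(freqs)))

estimated_mean = preferences.mean(axis=0)
print(f"Max deviation of the mean preference from flat: {np.abs(estimated_mean).max():.1f} dB")
[/code]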
 
Oct 18, 2012 at 10:20 PM Post #7 of 135
His previous research shows that pretty much everybody prefers a flat response, at least for speakers, and in the kind of tests that are run.  You can see some results on his blog at least:
http://seanolive.blogspot.com/
 
So you can use whoever you want to listen, mostly.  And for higher statistical power and greater repeatability, it's probably better to use trained listeners, as suggested above.  Then again, that apparently doesn't matter much, so it's not a big deal.  If the results are statistically significant, then it should be okay regardless.
 
The point here seems to be that the standard (well, IEC standard) diffuse-field equalization for headphones is not what people prefer, so another definition or equalization of "flat" is better than that one.
 
Oct 18, 2012 at 10:31 PM Post #8 of 135
It's interesting for a few reasons. Important audio measurements were originally derived by listening, not the other way around, yet now it's measurements that trump listening for some odd reason. This is an example of a past listening-derived measurement turning out not to be optimal. This measurement is a bit more complex than, say, a distortion threshold, though that will also vary by frequency. There's always something the static measurements don't cover when music is playing and everything they measure (and everything they don't) is happening at once. Your ear is a great way to put it all together.
 
Oct 18, 2012 at 10:34 PM Post #9 of 135
Quote:
His previous research shows that pretty much everybody prefers a flat response, at least for speakers, and in the kind of tests that are run.  You can see some results on his blog at least:
http://seanolive.blogspot.com/
 
So you can use whoever you want to listen, mostly.  And for higher statistical power and greater repeatability, it's probably better to use trained listeners, as suggested above.  Then again, that apparently doesn't matter much, so it's not a big deal.  If the results are statistically significant, then it should be okay regardless.
 

I agree with this. There are always exceptions, but I do believe the majority of folks will get it right if it's presented to them in a meaningful fashion.
 
Oct 19, 2012 at 1:04 PM Post #11 of 135
Quote:
Originally Posted by mikeaj
 
And for higher statistical power, greater repeatability, it's probably better to use trained listeners, as suggested above.

It definitely is:
 
[attached chart: F-statistic for each listener group, shown as a percentage of the trained group's F-statistic]
 

 
Oct 19, 2012 at 1:26 PM Post #12 of 135
I like charts like that! They look so official.
 
Oct 19, 2012 at 9:39 PM Post #13 of 135
Is there really that much of a gap? I would've imagined anyone besides the "Trained" group would have to be deaf or something.
 
Oct 19, 2012 at 10:18 PM Post #14 of 135
When the authors present some kind of new or non-standard metric and plot it**, you need to know the context and what it means—information that is in the paper but not on that graph.  What's a 25 on that scale mean, anyway?  You see this kind of thing all the time in research papers.  
 
**that's not even to mention the number of papers with new metrics that are pretty much garbage and don't mean anything significant.
 
What's shown is each group's F-statistic expressed as a percentage of the trained group's F-statistic (so, by definition, the trained group gets 100).  For those who need a quick primer: when doing the ANalysis Of VAriance (ANOVA), a higher F-statistic means the result is more statistically significant.  The statistic is a ratio between, very roughly speaking, (1) how far the responses deviate from the mean due to the treatment effect (e.g., ratings of sound quality when you're switching between one speaker and another) and (2) how far the responses deviate from the mean due to just randomness and chance.  Suffice it to say, the difference is huge, but an intuitive reading of 30 vs. 100 on that graph may not map directly onto the actual data being shown.
 
More or less, what it's saying is that people's responses to the same stimulus are inconsistent; with training, they become significantly more consistent.
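 
For anyone who wants to poke at this, here's a minimal sketch with simulated ratings (not the actual listening-test data) showing why more consistent listeners produce a larger F-statistic: both groups prefer headphone A over B by the same margin on average, but the trained group's ratings are less noisy, so more of the variance is explained by the headphone and less by chance. The noise levels and group size are assumptions picked just for illustration.
 
[code]
# Minimal sketch (simulated ratings): one-way ANOVA F-statistic for a
# consistent ("trained") group vs. a noisy ("untrained") group.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(1)

def f_stat(rating_noise_sd, n=20):
    # Hypothetical mean preference ratings: headphone A = 7, headphone B = 5.
    a = rng.normal(7.0, rating_noise_sd, n)
    b = rng.normal(5.0, rating_noise_sd, n)
    return f_oneway(a, b).statistic   # larger F = treatment effect dominates the noise

trained_f = f_stat(rating_noise_sd=0.8)    # consistent raters
untrained_f = f_stat(rating_noise_sd=2.5)  # noisy raters

print(f"trained F   = {trained_f:.1f}  (100% by definition)")
print(f"untrained F = {untrained_f:.1f}  ({100 * untrained_f / trained_f:.0f}% of trained)")
[/code]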
 
