AES2013: Listener Preferences for Different Headphone Target Response Curves

JMS · Jul 20, 2013 at 3:08 AM

Sean Olive, Todd Welti, and Elisabeth McMullin of Harman have recently published another intriguing paper on subjective headphone sound quality. Following their previous paper "Relationship between Perception and Measurement of Headphone Sound Quality", discussed in this thread, they have run more blind listening tests, this time on listener preference for different headphone frequency response curves.

I'll summarize the main points here.

Setup:

The Sennheiser HD518 and Audeze LCD-2 rev 2 headphones were chosen for this test, based on consistency of seal, low distortion, and extended frequency response.
Measurements were made using a GRAS 43AG ear and cheek simulator, mounted on a Styrofoam manikin.
Listeners were asked to rate each headphone after equalization to a number of different target response curves.
Two of these curves, RR_G and RR1_G, were created by measuring the in-room response at the listening seat of Harman's 7-channel, 4-subwoofer listening room. The RR1_G curve was based on an in-room response that, when measured using a regular microphone, is basically a straight line that slopes downwards with frequency. Such a response was favored in subjective experiments in loudspeaker listening. The RR_G curve corresponds to a straight horizontal line, except with a bass boost. For reference, here is the measured RR1_G curve (black line) as measured on the Audeze.
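To make "equalization to a target response curve" concrete, here is a minimal sketch of the arithmetic as I understand it: the EQ gain at each frequency is simply the target level minus the measured level, both in dB and both taken at the same measurement point. The function names, the -1 dB/octave slope, and every number below are my own made-up placeholders, not values from the paper.

```python
import math

def tilt_target_db(freq_hz, slope_db_per_octave=-1.0, ref_hz=1000.0):
    """Level (dB re ref_hz) of a straight line that slopes with log frequency.

    Illustrates the shape of a downward-sloping in-room curve as seen by a
    regular microphone. The slope value is an assumption for illustration.
    """
    return slope_db_per_octave * math.log2(freq_hz / ref_hz)

def eq_correction_db(measured_db, target_db):
    """Per-frequency gain (dB) that moves the measured response onto the target."""
    return [t - m for m, t in zip(measured_db, target_db)]

# Hypothetical ear-simulator data: headphone response and target at the same
# frequencies. These are NOT measurements of any real headphone or of RR1_G.
freqs_hz = [125.0, 250.0, 500.0, 1000.0, 2000.0, 4000.0, 8000.0]
measured_db = [2.0, 1.0, 0.0, 0.0, 3.0, 8.0, 2.0]
target_db = [4.0, 2.0, 0.5, 0.0, 5.0, 10.0, 4.0]

correction_db = eq_correction_db(measured_db, target_db)
for f, g in zip(freqs_hz, correction_db):
    print(f"{f:7.0f} Hz: {g:+.1f} dB")
```

Note that the straight-line shape applies to the regular-microphone view of the room. The same sound measured at an ear simulator picks up the ear's own resonances, so the headphone target itself is not a straight line.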

Results:

Tests using the Audeze LCD-2 showed that the RR1_G response was by far the most preferred, over RR_G, no-EQ, diffuse field, and free field curves.
Tests using Sennheiser HD518, which did not include RR1_G, showed that the RR_G response was by far the most preferred.
Though the headphones were tested in separate tests so I'm not sure how comparable scores are across tests, I do note that HD518 equalized to RR1_G had a higher rating (5.85 vs 4.08) than an unequalized LCD-2 (!).
From the paper's Conclusion: "....a headphone target response curve that approximates the in-room response of a calibrated loudspeaker produces a higher quality of sound reproduction [than diffuse-field and free-field] that listeners prefer."

Big big kudos to the researchers, as well as Harman for publishing this research. Audeze and Sennheiser take note! Harman just published some free research very pertinent to your interests!

JMS · Jul 20, 2013 at 3:19 AM

My own questions are:

No details were given on how the RR1_G curve was derived from the normal in-room response:
- Was it a steady-state measurement using GRAS 43AG?
- Did the Audeze's equalized response match it exactly?
- Why average 7 channels individually, versus playing them simultaneously, interference and all, if our own ears do hear the same artifacts?
The results of this paper don't depend too much on how the curve was derived, but it'd be good to know for those who want to measure a personalized version of RR1_G using their own heads. I tried doing this in a previous thread, but did not measure nearly as drastic a bump at 2-8khz as the image above.
How comparable are the scores across the LCD-2 test and the HD518 test? Since most of the listeners overlapped between the two tests, I would think the scores should be pretty comparable. However, in this test, an unequalized LCD-2 scored a 4.08, versus 6.55 in the 2012 paper, which also uses an "11-point scale". Is this due to the pool of listeners being different, or because now the listeners have better baselines for comparison?

Tonmeister2008 · Jul 20, 2013 at 7:41 PM

Quote:

jms said:
My own questions are:

No details were given on how the RR1_G curve was derived from the normal in-room response:

Was it a steady-state measurement using GRAS 43AG?

Did the Audeze's equalized response match it exactly?

Why average 7 channels individually, versus playing them simultaneously, interference and all, if our own ears do hear the same artifacts?

The results of this paper don't depend too much on how the curve was derived, but it'd be good to know for those who want to measure a personalized version of RR1_G using their own heads. I tried doing this in a previous thread, but did not measure nearly as drastic a bump at 2-8khz as the image above.

How comparable are the scores across the LCD-2 test and the HD518 test? Since most of the listeners overlapped between the two tests, I would think the scores should be pretty comparable. However, in this test, an unequalized LCD-2 scored a 4.08, versus 6.55 in the 2012 paper, which also uses an "11-point scale". Is this due to the pool of listeners being different, or because now the listeners have better baselines for comparison?

To answer your questions:

1.
(a) Yes, all of the headphone and in-room loudspeaker frequency response measurements were steady-state. The in-room loudspeaker responses for RR1_G and RR_G are based on measurements of the LSR 6332 + subs made with a 43AG flush-mounted to a head. Details are in the paper.
(b) Although we didn't show the 43AG measurements of the Audeze equalized to different targets in the paper, I showed them in my PPT presentation, which I will soon post in my blog. The equalized response matches the RR1_G and RR_G targets very closely up to 10 kHz.
(c) We measured each channel of the 7.4 loudspeaker system separately at the listening location, rather than simultaneously, so we understand how they contribute to what is seen at the ear drum. It also gives us the ability to weight each channel separately for further research. It turns out that our system is so well calibrated that the in-room frequency response of each channel is almost identical at the primary listening seat (also in my PPT presentation). The arrival times are also near identical. Below 100 Hz we use 4 x bass-managed subs calibrated to minimize response variations from seat to seat using destructive interference to cancel odd-order length and width room modes. So destructive interference between channels is minimal. Moreover, the amount of interference among channels while listening to music will totally depend on the phase relationships of the different channels and how correlated they are. Our ears are relatively insensitive to phase effects at higher frequencies. Otherwise, we would go nuts listening to reflections in rooms.
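As a rough illustration of the point in (c) about phase and correlation (a textbook sketch, not something taken from the paper): for two equal-level channels carrying the same signal, the summed level depends entirely on their phase difference, while two uncorrelated channels simply add in power. The function names below are hypothetical.

```python
import cmath
import math

def coherent_sum_db(phase_deg):
    """Level of two equal, fully correlated signals summed with a phase offset,
    in dB relative to one channel alone."""
    mag = abs(1 + cmath.exp(1j * math.radians(phase_deg)))
    # Treat a numerically zero magnitude as complete cancellation.
    return 20 * math.log10(mag) if mag > 1e-12 else float("-inf")

def incoherent_sum_db(n_channels=2):
    """Level of n equal, uncorrelated signals (power sum), in dB re one channel."""
    return 10 * math.log10(n_channels)

for phase in (0, 90, 120, 180):
    print(f"{phase:3d} deg: {coherent_sum_db(phase):+.2f} dB")
print(f"uncorrelated pair: {incoherent_sum_db(2):+.2f} dB")
```

In phase, the pair sums to about +6 dB; at 180 degrees it cancels completely; an uncorrelated pair lands at about +3 dB regardless of phase.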

2. You generally can't compare preference ratings of a headphone or speaker across two listening tests unless you are comparing the same stimuli under the same test conditions and/or you have some common anchors in both tests to minimize scaling biases. Therefore, these tests produce relative ratings -- not absolute scores.

In the 2012 paper we compared the LCD-2 to 5 different models of circumaural headphones (Audeze, AKG, Bose, Beats, V-Moda). In this 2013 paper we are comparing an unequalized LCD-2 to 4-5 different equalized target responses including free-field, diffuse-field variations, and 2 different in-room targets. These are much different contexts, with different listeners, with a different range of sound qualities. Therefore, you would not expect the exact same ratings for the same headphone due to the differences in context and test conditions.

Hope that answers your questions.

JMS · Jul 20, 2013 at 10:12 PM

Thanks for the helpful response. This answers my questions. Looking forward to the PPT.

With regards to (2), I agree that without anchors and identical conditions the scores aren't directly comparable. Still, in your loudspeaker paper "Regression Model for Predicting Loudspeaker Preference", results from 13 separate tests were included. It was noted that it was not ideal to compare across tests, but there was still enough signal to derive a highly predictive model.

JMS · Jul 23, 2013 at 11:26 AM

Couple more questions for Tonmeister:

You mentioned that channels were measured independently for the paper because comb filtering in the upper frequencies is ignored by our ears. However, that wouldn't apply to the 1-2 kHz stereo crosstalk dip, which accounts for an audible difference between loudspeaker and headphone listening. I would thus think RR1_G sounds "forward", like music normally does when played back on headphones. Have you considered accounting for this dip in the target curve?
If we aren't trying to reproduce stereo's recessed in-room sound anyway, what if instead we approximated the sound of two loudspeakers pointed directly at the ears from a distance away, i.e. the HRTFs at +/- 90 degree azimuth? The motivation is that it'd approximate the sound of having speakers strapped to the head. The AKG K1000, which I haven't heard myself, looked to approximate this effect to good reviews. If you can comment on how the +/-90 graphs would look in your measurement setup, that would be great.
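For anyone wondering where a crosstalk dip in that region comes from, here is a back-of-the-envelope sketch under simplifying assumptions of my own: a center-panned signal, speakers at +/- 30 degrees, a rigid spherical head of 8.75 cm radius (Woodworth's approximation for the interaural delay), and a single frequency-independent gain for the far-speaker path. The 0.7 gain is a made-up placeholder; a real head shadow varies with frequency.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, room-temperature air

def woodworth_itd_s(azimuth_deg, head_radius_m=0.0875):
    """Interaural time difference for a rigid sphere (Woodworth's formula)."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / SPEED_OF_SOUND) * (theta + math.sin(theta))

def crosstalk_level_db(freq_hz, delay_s, far_gain=0.7):
    """Level at one ear for a center-panned signal: near-speaker sound plus a
    delayed, attenuated copy from the far speaker, in dB re near speaker alone."""
    total = 1 + far_gain * cmath.exp(-2j * math.pi * freq_hz * delay_s)
    return 20 * math.log10(abs(total))

delay = woodworth_itd_s(30.0)   # extra path delay to the far ear
first_dip_hz = 1 / (2 * delay)  # half-period delay gives the first cancellation

print(f"delay: {delay * 1e3:.3f} ms, first dip near {first_dip_hz:.0f} Hz")
print(f"depth at dip: {crosstalk_level_db(first_dip_hz, delay):.1f} dB")
```

With these assumptions the first dip lands a little below 2 kHz. Because the far-speaker sound arrives attenuated, it is a dip rather than a full null.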

Thanks again!

NA Blur · Aug 20, 2013 at 12:25 PM

Most listeners interpret louder bass and louder volumes as better sounding, but rarely supply objective measurements as to why. If we were all informed at an early age that neutral was best then over time we would expect neutral to sound better than something with more bass.

It really depends on which group you fall in. A lot of audiophiles want to be in the more analytical group, so they know that what they are listening to is how the artist / mix really sounds. Adding color may make music more enjoyable (a good thing), but it may be adding something the artist or producer did not want in the track to begin with.

This test should be done with sound engineers too to show that a neutral preference may also be achieved.

Interesting nonetheless.
