Blind testing headphones
May 6, 2012 at 2:59 PM Thread Starter Post #1 of 16

anetode (Headphoneus Supremus; joined Oct 8, 2008; 2,050 posts; 312 likes)
Headphones are unique components in that they give away their identity even after the listener is blindfolded. Unlike level-matched equipment and speakers hidden behind acoustically transparent drapery, it's fairly easy to tell what headphone you're wearing if you are at all familiar with how it looks or feels. Even the most comfortable headphones, like the HD800, betray a specific contact area.

After coming back to this thought experiment a couple of times, I have some rather extreme "solutions" for setting up a rig that would mask most differences. The first certainty is that it isn't going to be comfortable; in fact, it is the specific comfort of each headphone that has to be masked from tactile perception. The easiest, least elegant solution would be a sort of skin-tight hood that applies equal pressure to the areas around the scalp, coupled with an adjustable weight mounted on top of the headphones. While no tester would tolerate this setup for more than a few minutes, it compensates for a couple of important factors: the clamping force of the different headphones is equalized, and so are their total mass and balance, because the center of gravity is shifted upwards.
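The mass-equalizing part of the rig is simple arithmetic. A minimal sketch, assuming each headphone has been weighed and that the adjustable weight should bring every model up to the heaviest one; the model names and gram values below are hypothetical placeholders, and the sketch does not address clamping force or the exact center of gravity.

```python
def counterweights(masses_g):
    """Grams to add to each headphone so all of them match the heaviest one.

    `masses_g` maps a (hypothetical) model name to its measured mass in grams.
    """
    heaviest = max(masses_g.values())
    return {name: heaviest - mass for name, mass in masses_g.items()}


if __name__ == "__main__":
    # Hypothetical masses, not real specifications.
    masses = {"headphone_a": 330.0, "headphone_b": 260.0, "headphone_c": 395.0}
    print(counterweights(masses))
    # {'headphone_a': 65.0, 'headphone_b': 135.0, 'headphone_c': 0.0}
```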

Further, something must be done about the most delicate sensory organ: the ears, with their plethora of nerve endings, cartilaginous flexibility and so on. This is where I propose to take a page from tattoo artists and use a topical anesthetic (a eutectic mixture of lidocaine and prilocaine, which is liquid at body temperature) to numb the epidermis without affecting the deeper hearing mechanisms of the ear (the cilia). The temporary numbness will be reinforced by the sensation of the same weight, from the weighted headphones and hood, pressing down on the top of the skull.

So there's our tester, wearing something that resembles a medieval torture device, blindfolded and partially anesthetized. Admittedly it is an absurd solution, but it could be done and would offer some otherwise unobtainable test results. Even then, the value of any results would be compromised by possible interference with the seal/fit of the headphones, as well as by any psychological effect of the not-quite-pleasant physical sensations of numbness and head crushing.

Maybe there's a less insane way to go about this, and if so, I would love to hear some suggestions.
 
May 6, 2012 at 7:13 PM Post #2 of 16
OK, there is another way.
 
You make an artificial head (and ears), record a reference piece through different earphones at the position that corresponds to the eardrum (probably not too difficult with an electret capsule; they're about the right size), and then compare the recordings.
 
Easy.
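Before anyone ABXes the dummy-head recordings, they would normally be level-matched so that a plain loudness difference doesn't give the game away. A minimal sketch, assuming each recording is a list of float samples from the eardrum-position capsule; matching broadband RMS is a simplification (headphones with different frequency responses can still differ in perceived loudness), and the function names are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def matching_gain_db(reference, other):
    """Gain in dB that brings `other` to the same RMS level as `reference`."""
    return 20 * math.log10(rms(reference) / rms(other))


def apply_gain(samples, gain_db):
    """Scale samples by a gain given in dB."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]


if __name__ == "__main__":
    recording_a = [0.5, -0.5, 0.5, -0.5]
    recording_b = [0.25, -0.25, 0.25, -0.25]  # same shape, about 6 dB quieter
    gain = matching_gain_db(recording_a, recording_b)
    print(round(gain, 2))  # 6.02
    matched_b = apply_gain(recording_b, gain)
```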
 
w
 
May 6, 2012 at 7:39 PM Post #3 of 16
Yes, there are a number of great dummy-head setups for comparing response. What I was hoping for was to allow an observer to perform an ABX trial more or less in situ.
 
May 6, 2012 at 8:24 PM Post #4 of 16
The observer ABXs the recordings. The purpose of ABXing is to determine whether there is an audible difference. If there's an audible difference in the recordings then the difference is demonstrated.
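Whether the observer "hears a difference" in an ABX run is usually judged against chance. A minimal sketch of the standard one-sided binomial calculation, assuming independent trials with a 50% chance of guessing X correctly on each; the 12-of-16 score is only an illustration.

```python
from math import comb


def abx_p_value(correct, trials):
    """Probability of getting at least `correct` of `trials` right by pure guessing (p = 0.5)."""
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials


if __name__ == "__main__":
    print(round(abx_p_value(12, 16), 4))  # 0.0384
```

A small value (conventionally below 0.05) suggests the listener was not just guessing; it says nothing about which recording sounded better.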
 
It's true that a recording process might mask differences that are audible, but this is unlikely; a similar process has revealed differences between, for example, DACs. Certainly, indistinguishable recordings of different earphones would have to be made before an argument could begin.
 
It all depends on what you are trying to demonstrate. Unlike in the case of amplifiers, few people would expect different earphones to sound exactly identical because of their different physical configurations, and I've never seen anyone claiming that they do.
 
w
 
May 6, 2012 at 8:32 PM Post #5 of 16
Normally, headphones are not blind tested because different models are pretty much always capable of producing discernible differences. The exception would be if you said you'd swapped the cable on one for some uber-expensive silver cable, and even there I'm doubtful.
 
May 7, 2012 at 5:45 AM Post #6 of 16
Admittedly the ABX would have narrow implications; it would be most useful for closely related headphones, like the Senn 558/598 or the Stax 202/404 or 404/507. The goal there would be to determine whether the upcharge is justified by any difference in sound. Also, headphones with similar frequency response curves could be compared to determine whether close measurements make it harder to tell apart two different kinds of headphone (e.g. closed/open).

Then there's the whole pure blind testing portion, which would help determine what sound signature an individual prefers. It would be interesting to see how brand fanaticism fares without visual cues, say, comparing the sexy Ultrasone Edition 8 to the plastastic Monoprice 8323.
 
May 11, 2012 at 12:56 AM Post #7 of 16
This kind of testing *is* performed with speakers, at least by Harman, if not by others. The questions asked are a bit different than you would see with a cable ABX: instead of asking "does zipcord sound different than Monster Cable," they're more interested in "which speaker design do listeners prefer." They take a couple of speakers and let people compare them blind, then compare those results to the measured/objective data for the speakers and draw conclusions.

There's actually a very recent publication that came out of this methodology:
http://seanolive.blogspot.com/2012/05/more-evidence-that-kids-even-japanese.html

But there are also some older Toole presentations from AES where he mentions the same thing. What generally comes out of these tests is that listeners tend to prefer "flat and accurate speakers with good on-axis response"; Toole actually showed a neat "dog bone" graph along those lines at AES a few years ago (basically, any speaker/room within that range is "ideal"). Of course individual preferences vary, but there are trends that we can certainly note.

Of course, with speakers vs. headphones you've got this big gorilla in the room: headphones bring their own acoustics but have to deal with the listener's HRTF, while speakers bring their own resonances and have to live with the acoustics of the room they're installed in. With the latter, you can adjust and modify the environment to change the acoustics entirely. You can go all the way to eliminating all echoes and outside noise (it's expensive, but it can be done). You can also modify the environment to enhance the characteristics of a given speaker. Thus far, though, I'm not aware of any head-fier who has undergone plastic surgery to improve the fidelity of their cans (by changing their HRTF), but I know there's lots of modding done to the cans themselves to deal with resonance.

I know that Ultrasone published that (absurdly hard to understand) paper a number of years ago about listener preference and S-LOGIC (Tyll talks about it in his Edition 10 review: http://www.innerfidelity.com/content/ultrasone-edition-10). Offhand, I'm not aware of anything else that deals with listener preference and headphone design. I would assume "flat and accurate" is probably on the table, and based on what a lot of people regard as "good" or "best" headphones, I think that would be easily supported. But it's hard to say. There are also a whole lot of people who like headphones that measure far from flat or introduce "weird" distortions (usually resonances), and I'm somewhat hesitant to write them all off as "misguided."
 
May 12, 2012 at 3:22 PM Post #8 of 16
I'm a fan of Olive's blog and Harman's speaker turntable was definitely an inspiration for this thread.

You bring up an interesting point in that many people might not prefer flat and accurate in headphones. It's been my experience that a large contingent of Head-Fi users find that headphones which measure flat on the standard HRTF compensation curves sound too bright. That's why it would be interesting to find out whether there's a specific "hi-fi" sound signature that may be preferable. Off the top of my head I think it would be +3 dB below 100 Hz and -6 dB between 1 kHz and 10 kHz compared to, say, flat on InnerFidelity's reference.
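The off-the-cuff target above can be written down as an offset applied to a reference curve. A minimal sketch, assuming the reference is a dict of frequency (Hz) to level (dB) and using hard band edges exactly as stated (+3 dB below 100 Hz, -6 dB from 1 kHz to 10 kHz); a real target would use smooth shelving transitions, and the function names are hypothetical.

```python
def proposed_offset_db(freq_hz):
    """Offset in dB from the rough guess: +3 dB below 100 Hz, -6 dB from 1 kHz to 10 kHz."""
    if freq_hz < 100:
        return 3.0
    if 1_000 <= freq_hz <= 10_000:
        return -6.0
    return 0.0


def shifted_target(reference_db):
    """Apply the offset to a reference curve given as {frequency_hz: level_db}."""
    return {f: level + proposed_offset_db(f) for f, level in reference_db.items()}


if __name__ == "__main__":
    flat = {50: 0.0, 500: 0.0, 3_000: 0.0, 15_000: 0.0}
    print(shifted_target(flat))  # {50: 3.0, 500: 0.0, 3000: -6.0, 15000: 0.0}
```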
 
May 12, 2012 at 3:43 PM Post #9 of 16
Quote:
I'm a fan of Olive's blog and Harman's speaker turntable was definitely an inspiration for this thread.

You bring up an interesting point in that many people might not prefer flat and accurate in headphones. It's been my experience that a large contingent of Head-Fi users find that headphones which measure flat on the standard HRTF compensation curves sound too bright. That's why it would be interesting to find out whether there's a specific "hi-fi" sound signature that may be preferable. Off the top of my head I think it would be +3 dB below 100 Hz and -6 dB between 1 kHz and 10 kHz compared to, say, flat on InnerFidelity's reference.


If I remember right, Tyll actually alluded to this in one review. I think the "correct" rendition of "flat and accurate" is along the lines of the ESP/950 or K701 (flat from 20 Hz to 1 kHz, then progressively rolled off to 10 kHz and over a cliff by 20 kHz). However, based on the impressions on Head-Fi, these two are not universally loved. I'm a bit hesitant to say what bias has so completely obscured this "flat and accurate" sound (and there are other models that measure similarly enough, like the SR-009 and HD 650, which are still not universally accepted). It would be super cool if someone measured the R10 and L3000 as frames of reference; they get a lot of love, and I'm curious whether they look similar to the 009/950/701/etc. but with different resonances. My suspicion is that there's something to the resonance bit as well as the ideal FR. Just like with speakers: you can have super flat speakers, but if the acoustics of the room muck it up, it's no good. However, I don't think it's as simple as "replicate ideal loopback."
 
May 15, 2012 at 10:43 PM Post #10 of 16
The best way to ABX-compare headphones blind is to capture each headphone's transfer function using a HATS (head and torso simulator) and reproduce the captured sample binaurally with a calibrated headphone. (Harman uses the same principle for binaural room scanning.) This should effectively eliminate all of the error-inducing mediating variables. If you just swap headphones physically, as Tyll did with his break-in self blind test, the result could be quite misleading: considering the role ear cushions play in headphone acoustics, you can never be sure whether the discernible sonic difference comes from the broken-in driver or from the collapsed cushions.
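In magnitude-only terms, the capture-and-replay idea boils down to an EQ that makes the replay headphone measure like the headphone under test on the same HATS. A minimal sketch, assuming both responses were measured at the same frequencies and stored as dicts of Hz to dB; it ignores phase, it is not Harman's actual processing chain (which the post does not describe), and the numbers are hypothetical.

```python
def correction_db(target_on_hats, replay_on_hats):
    """Per-frequency EQ (dB) that turns the replay headphone's response into the target's."""
    return {f: target_on_hats[f] - replay_on_hats[f] for f in target_on_hats}


def equalize(replay_on_hats, correction):
    """Response of the replay headphone after the correction EQ is applied."""
    return {f: replay_on_hats[f] + correction[f] for f in replay_on_hats}


if __name__ == "__main__":
    # Hypothetical HATS measurements in dB at a few frequencies.
    headphone_under_test = {100: 2.0, 1_000: 0.0, 8_000: -4.0}
    replay_headphone = {100: -1.0, 1_000: 0.0, 8_000: 3.0}
    eq = correction_db(headphone_under_test, replay_headphone)
    print(eq)  # {100: 3.0, 1000: 0.0, 8000: -7.0}
```

In practice the boost would usually be limited where the replay headphone has deep notches, since inverting a notch asks for a lot of gain.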
 
May 16, 2012 at 9:24 AM Post #11 of 16
Quote:
The best way to ABX-compare headphones blinded is to capture headphones' transfer function using HATS, and binaurally reproduce the captured sample with a calibrated headphone.

 
This method is still not quite perfect. First, the HATS has different HRTFs than the listener, and even the relative response of two headphones can be different on a dummy vs. a real head. Although if the only difference between the headphones is the drivers (e.g. DT880 250 vs. 600 ohms), then this is less of an issue. If the reproduction has a significantly non-flat frequency response compared to actually wearing the headphones, then that can also alter the perception of any differences. Finally, the captured sound is still susceptible to differences due to positioning on the HATS (unless the sound is recorded many times to avoid a consistent bias) and sample (e.g. ear pad) variation.
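The positioning concern can be reduced by reseating the headphone on the HATS several times and averaging. A minimal sketch, assuming each reseat is a dict of Hz to dB measured at the same frequencies; averaging in dB (rather than in power) is a choice made here for simplicity, and the per-band spread shows where fit matters most. The data and function name are hypothetical.

```python
from statistics import mean, stdev


def average_reseats(reseats):
    """Return {frequency: (mean_db, stdev_db)} across two or more reseats."""
    freqs = reseats[0].keys()
    summary = {}
    for f in freqs:
        levels = [r[f] for r in reseats]
        summary[f] = (mean(levels), stdev(levels))
    return summary


if __name__ == "__main__":
    # Hypothetical reseats; the 8 kHz band is far more position-sensitive here.
    reseats = [
        {100: 1.0, 8_000: -2.0},
        {100: 1.0, 8_000: 4.0},
        {100: 1.0, 8_000: 1.0},
    ]
    print(average_reseats(reseats))
```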
 
May 10, 2013 at 7:34 PM Post #12 of 16
Quote:
I'm a fan of Olive's blog and Harman's speaker turntable was definitely an inspiration for this thread.

You bring up an interesting point in that many people might not prefer flat and accurate in headphones. It's been my experience that a large contingent of Head-Fi users find that headphones which measure flat on the standard HRTF compensation curves sound too bright. That's why it would be interesting to find out whether there's a specific "hi-fi" sound signature that may be preferable. Off the top of my head I think it would be +3 dB below 100 Hz and -6 dB between 1 kHz and 10 kHz compared to, say, flat on InnerFidelity's reference.

 
We recently looked into listener preferences for different headphones equalized to different target responses. The diffuse-field and free-field options were indeed perceived by listeners as sounding too thin and bright. The most preferred headphone target response was based on binaural measurements of a loudspeaker calibrated in our reference listening room, which is neither diffuse nor free-field, and has a certain amount of low frequency room gain.
 
 
See http://www.aes.org/events/134/papers/?ID=3474
 
P10-3 Listener Preferences for Different Headphone Target Response Curves
Sean Olive, Harman International - Northridge, CA, USA; Todd Welti, Harman International - Northridge, CA, USA; Elisabeth McMullin, Harman International - Northridge, CA, USA
There is little consensus among headphone manufacturers on the preferred headphone target frequency response required to produce optimal sound quality for reproduction of stereo recordings. To explore this topic further we conducted two double-blind listening tests in which trained listeners rated their preferences for eight different headphone target frequency responses reproduced using two different models of headphones. The target curves included the diffuse-field and free-field curves in ISO 11904-2, a modified diffuse-field target recommended by Lorho, the unequalized headphone, and a new target response based on acoustical measurements of a calibrated loudspeaker system in a listening room. For both headphones the new target based on an in-room loudspeaker response was the most preferred target response curve. 
Convention Paper 8867

 
May 10, 2013 at 8:41 PM Post #13 of 16
The problem with any blind ABX is that there is no way to eliminate random sample variation. So whatever difference the participants either hear or quantifiably measure, how do they rule out the effects of sample variation across a production population?
 
May 11, 2013 at 8:15 AM Post #14 of 16
I don't see how this is a problem as long as the headphones tested have hugely different frequency response curves compared to sample variation differences, which I'd argue is the case with most headphones.
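One way to put numbers on that argument is to compare the gap between two models with the gap between two units of the same model. A minimal sketch, assuming all curves are dicts of Hz to dB measured at the same frequencies; an unweighted RMS over bands is a crude stand-in for audibility, and the curves and function name are hypothetical.

```python
import math


def rms_difference_db(curve_a, curve_b):
    """RMS of the per-frequency dB difference between two response curves."""
    diffs = [curve_a[f] - curve_b[f] for f in curve_a]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


if __name__ == "__main__":
    # Hypothetical measurements at three frequencies.
    model_x_unit_1 = {100: 0.0, 1_000: 0.0, 8_000: -3.0}
    model_x_unit_2 = {100: 0.5, 1_000: 0.0, 8_000: -2.5}
    model_y_unit_1 = {100: 4.0, 1_000: 0.0, 8_000: 3.0}
    between_models = rms_difference_db(model_x_unit_1, model_y_unit_1)
    between_units = rms_difference_db(model_x_unit_1, model_x_unit_2)
    print(between_models > between_units)  # True
```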
 
But nothing stops you from doing a blind test with different samples (for example old vs. new) from the same model. That would actually be more of a proper blind test than using different headphones.
 
You can even go one step further and use just one headphone with different equalization curves, which seems to be roughly what Mr. Olive did.
 
May 11, 2013 at 8:18 AM Post #15 of 16
There's a difference between a double-blind test and an ABX test.  A double-blind test is defined as a test method where neither the administrators nor the test subjects are aware of the choices or control group.  The ABX test is a kind of double-blind test, but not the only kind. It may be that in this case ABX testing isn't quite what we're after anyway.  An ABX test is designed to identify the presence of a difference between two samples, not to evaluate preferences.  
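The double-blind part of an ABX run mostly comes down to who knows the hidden assignment of X. A minimal sketch, assuming playback is driven by software that draws and stores the schedule, so neither the listener nor the administrator knows it until scoring; Python's `random` module is adequate for this but is not cryptographically secure, and the function names are hypothetical.

```python
import random


def abx_schedule(trials, seed=None):
    """For each trial, randomly decide whether the hidden X is A or B."""
    rng = random.Random(seed)
    return [rng.choice("AB") for _ in range(trials)]


def score(schedule, answers):
    """Count trials where the listener's answer matches the hidden X."""
    return sum(1 for truth, guess in zip(schedule, answers) if truth == guess)


if __name__ == "__main__":
    hidden = abx_schedule(16)
    listener_answers = ["A"] * 16  # placeholder answers
    print(score(hidden, listener_answers), "of", len(hidden))
```

As the post says, this only tests whether a difference is detectable; a preference study would collect ratings instead of A/B identifications.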
 
The earlier Olive/Welti paper from October 2012 (AES 8744), "The Relationship between Perception and Measurement of Headphone Sound Quality" included the results of double-blind (but not ABX) testing of headphones.  The goal was to collect data as to preference in several categories, obviously sound quality, but also including things like comfort.
 
There's little question that there are rather obvious differences between just about any two headphones, and most are significant enough to easily identify, so applying an ABX test seems a little unnecessary.  However, identifying an average listener preference, now that's interesting!
 
That paper is available through the AES.  And, of course, we're all wondering exactly which headphone was "HP-1"! 
 
Unfortunately the latest Olive, Welti & McMullin paper (it's actually 8866, not 8867) is not available on the AES site yet. Hopefully soon, though.
 
