The State of the Flagships
Dec 22, 2013 at 2:38 PM Thread Starter Post #1 of 138

anetode

SanjWatsuki recently completed a survey of flagship headphone metrics replete with a grading system.
 
It's definitely a good idea to demand higher standards from flagship headphone manufacturers; however, I think the survey could use some fine-tuning.
 
I'll go through a quick take of each of his criteria:
 
Let’s define what a flagship headphone should achieve.
  • Bass linearity of +-5 dB from 20hz to 100hz.
    • Bass linearity is difficult to achieve, but it should be done for a flagship.
    • Even reference-grade dynamics tend to have a bass roll off. If the roll-off occurs below 40hz then even -10db could be acceptable. Foster dynamics (Ol' Denon DX000 & THXXX) go for the opposite mentality of adding a bass boost. There should be a criterion which determines the maximum extent of such a boost in relation to the midrange (200hz+). If the bass bleeds over too much then the effect can be pretty nasty.
  • 100dB distortion should not exceed 0.8% beyond the sub-bass frequencies.
    • No flagship should have any distortion that comes close to audibility.
    • OK, arguable threshold through the midrange, less so for treble.
  • 100dB distortion should not exceed 1% at 30hz.
    • No flagship should have a messy bass due to distortion.
    • A bit harsh on dynamics, maybe 3%?
  • Frequency response curve should be very smooth with any resonances being very minor -- no major dips.
    • A smooth frequency response suggests few resonances. Few resonances suggests a well engineered diaphragm and enclosure.
    • Please add a definition of "major". Consider where the peaks/dips occur as well as the decay time of the resonances.
  • Very small to absolutely no dip at 70hz-150hz.
    • Those resonances are caused by an interaction of the headphone cushion and your face.   
    • Large bumps in the frequency response at those ranges suggests a poorly engineered headphone pad or that they didn’t do their testing with the human flesh as a variable.
    • These bumps are usually ~3db, are they that important? If you are accounting for the human flesh variable then comfort should be of paramount importance, even at the cost of such a bump.
  • Air-level treble should be no more than -15dB relative to the mid-range.
    • A flagship should have excellent treble extension. Extreme treble roll-off should only exist on non-kilobuck headphones.
    • Requires greater specificity and accounting for treble peaks (& the associated listening fatigue)
  • No "wiggle" in the impedance graph.
    • Wiggle suggests a poorly balanced voice coil in a dynamic headphone.
    • Get that out of here if you’re a flagship.
    • If the wiggle is not accompanied by a problematic resonance or major frequency response wiggle then who cares?
  • Nearly perfect channel balance.
    • If I’m paying over $1000 for a headphone, I expect the channel matching to be as good as it gets.
    • Yes. 1db through the bass & midrange, 3db through treble.
  • The headphone is open or semi-open. --- Arbitrary.
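
For what it's worth, the first criterion is easy to turn into a pass/fail check once you pick a reference level. Here is a rough Python sketch, not anything taken from the survey itself. The function name `bass_linearity`, the choice to measure deviation against the level at the top of the band, and the sample numbers are all my own assumptions.

```python
def bass_linearity(points, lo_hz=20.0, hi_hz=100.0, tol_db=5.0):
    """Check whether a response stays within +/- tol_db over lo_hz..hi_hz.

    points: iterable of (frequency_hz, level_db) pairs from a measurement.
    Assumption: deviation is taken relative to the measured point closest
    to hi_hz; the original criterion does not say what the reference is.
    Returns (passes, worst_deviation_db).
    """
    band = [(f, db) for f, db in points if lo_hz <= f <= hi_hz]
    if not band:
        raise ValueError("no measurement points inside the band")
    # Reference level: the measured point nearest the top of the band.
    ref_db = min(band, key=lambda p: abs(p[0] - hi_hz))[1]
    # Largest deviation from the reference, keeping its sign.
    worst = max((db - ref_db for _, db in band), key=abs)
    return abs(worst) <= tol_db, worst


# Made-up data for a dynamic headphone with a sub-bass roll-off.
rolled_off = [(20, -8.0), (30, -4.0), (40, -2.0), (60, -1.0),
              (80, 0.0), (100, 0.0), (200, 0.0)]

print(bass_linearity(rolled_off))               # strict +/-5 dB from 20 Hz
print(bass_linearity(rolled_off, tol_db=10.0))  # relaxed to 10 dB
print(bass_linearity(rolled_off, lo_hz=40.0))   # only judge 40 Hz and up
```

The last two calls are the kind of relaxation I'm arguing for: either allow a deeper roll-off or only hold the headphone to the tight tolerance above 40hz. A bass boost would need a second check against the midrange level, which this sketch doesn't attempt.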
 
Dec 22, 2013 at 2:49 PM Post #2 of 138
Yes. These types of attempts are a way forward.
 
I'd like to see time domain stuff (square wave edges, impulse response, CSD) also.
 
I think there are closed headphones around now that beat a lot of open ones. 
 
Impedance graph wiggles caused by problems around the drivers go up to 3kHz in my experience.
Sometimes the wiggles come from designers putting pressure on the driver to make it flatter in some regions. IOW, acoustic compensations to control the natural resonances may show up in impedance plots as bumps but actually flatten the audio.
 
Dec 22, 2013 at 3:15 PM Post #3 of 138
+1 very good read. thought it was interesting how SanjWatsuki's "A-class" flagship headphones are all planar magnetics or electrostatics + the HD800.
 
edit: curious where the original was posted.
 
Dec 22, 2013 at 3:25 PM Post #4 of 138
Dec 22, 2013 at 4:16 PM Post #5 of 138
 
It's probably worth going into the intent behind the metrics.
 
My goal was to try and give points for what I believed to be a headphone rising to a technical challenge, rather than an absolute measurement of how good it sounds. Reproducing the lowest frequencies for dynamics is an example of one, and reproducing high frequencies for planar magnetics is another.  You could create a headphone that would score very highly on this system without being a good headphone, for example. It's also worth noting that I came up with the metrics in a few minutes with arbitrary numbers responding to someone on this forum, actually. I just took the concept and ran with it for a few hours and out came this document.
 
This explains a few of the metrics. For example, the headphone cushion one -- if a company isn't willing to put time into the development of the cushion and its effects, then it shouldn't be a flagship. That technical challenge should be solved. Voice coil weirdness, likewise, should be solved. Even if the overall effect is minimal, I would expect that the company putting out the flagship to have given the headphone that much thought. When you're in the kilobuck range, I believe near perfection should be assumed and not giving that would be a massive red flag about possible other engineering failures in the headphone.
 
The criticisms about the numberless metrics are completely valid. A lot of them (#4, #5, #7, and #8) really should have some sort of concrete definition. I didn't feel confident enough that I had a model that I could drop into #4, so I didn't. There are so few failures of #5 (i.e. DT48) that I wasn't sure where to put that threshold; I just thought "I'll play that one by ear." #7 likewise should have a concrete definition, as should #8. My biggest fault here is that I really don't know where to put those thresholds.
 
Frequency response analysis is hard. So much research is still going into it that I really don't feel safe making a strong comparison against any model yet. We have the Lorho modified DF, we have the Olive-Welti curve, there's an analysis from 1995 that proposed another curve, Sennheiser seems to have their own curve that almost all of their headphones match, as does Beyerdynamic, etc. For this reason I tried to analyze the FR as minimally as I could, even though it is probably the most important metric of them all here.
 
As for #9, I feel like openness is actually important and not arbitrary. Research is still going on as to why open headphones perform better even with matched frequency response curves, but it's a fact that they score better. If we compare open vs. open and closed vs. closed, it's a bit arbitrary, but I feel there is a technical advantage that may not be possible to fix when comparing open to closed. I feel like on a theoretical level you cannot make a closed headphone sound as good as an open headphone with the knowledge we have right now (at the flagship level).
 
----------------
 
Those criticisms in mind, let's open source the metrics to try and improve them.
 
1. The numbers originally given were arbitrary and I came up with them on the spot. +-5 dB just sounded nice at the time. Is a significant roll-off of the frequencies you feel, rather than hear, acceptable? I still feel like this should be relatively tight because most of the newer dynamic flagships appear to be trying to get great extension into the lowest frequencies -- I can't really think of a really high end dynamic flagship that doesn't shoot for this except for maybe the Grado HP1000. 
 
2. Once again, an arbitrary number was chosen on the spot. Let's open source the % and possibly break it up by treble and mid-range. Given more data, I wouldn't mind expanding this to tackle additional orders of distortion, but that data isn't comprehensive enough right now. Most headphones that don't have issues with odd resonances in the mid-range tend to fail this on the upper treble, so a split could make sense.
 
3. I feel like loosening this threshold is a good idea. Hearing a relatively low amount of distortion in those frequencies is incredibly tough -- harder than the rest of the frequency band. A greater separation makes sense.
 
4. It seems like the simplest way to analyze the FR for significant dips/peaks would be to have a reference curve to compare against. I hesitate to do that because of how in flux I think the research on FR curves is right now, but I can't think of a better solution off the top of my head. Anetode has suggested an analysis similar to Rin Choi's.
 
5. I still think this is a good one to keep around. We need to give this metric a concrete number, though. -3dB sounds as fine to me as any.
 
6. Not sure if we should tackle treble peaks in this one -- that sounds more like the domain of #4. I was mostly just trying to give credit for if a headphone managed to give good treble extension. Does anyone have a better concrete metric in mind for measuring treble extension?
 
7. I think this one is a good one to keep around but, like #5, we need to have a concrete measurement. It'd probably have to be by % of the impedance at 1khz or something. We'd also probably have to look at the other graphs to determine if the bump is actually a voice coil issue or if it was intentional or if it is just a housing resonance. Tyll's article is a great introduction to what we're trying to measure.
 
8. I agree that we need to have concrete measurements on this one. Anetode's suggestion of 1dB in bass and mid-range and 3dB in the treble seems like a reasonable ballpark figure. I think we need to be a bit more exacting, though. I think we encounter measurement difficulties past 10khz, so we should probably only look up to 10khz. I feel like some of the IF measurements suffer from a poor seal, which also makes those ones hard to compare against. I also feel like this one SHOULD have unit-to-unit variation taken into account, but the data doesn't exist yet for most headphones.
 
9. Someone raised a good point about how a closed headphone shouldn't be penalized for being closed. There are closed headphones which can sound better than open headphones, after all. I can't think of an elegant way to credit a closed headphone for that, though, and I don't personally think it exists at the flagship level yet. Does anyone have any ideas on this one?
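
To make #8 a bit more concrete, here is a rough Python sketch of how the channel balance check could work with the ballpark figures above: 1dB through bass and mid-range, 3dB through treble, nothing judged past 10khz. Treat it as a starting point rather than the actual method. The function name `channel_balance` is made up, and the `split_hz` boundary between mid-range and treble is a placeholder I picked because nobody has proposed one yet.

```python
def channel_balance(left, right, split_hz=2000.0, max_hz=10000.0,
                    low_tol_db=1.0, high_tol_db=3.0):
    """List the frequencies where left/right mismatch exceeds tolerance.

    left, right: lists of (frequency_hz, level_db) pairs measured at the
    same frequencies, in the same order.
    split_hz: hypothetical boundary between mid-range and treble.
    max_hz: points above this are ignored (measurements get unreliable).
    Returns a list of (frequency_hz, mismatch_db) for every failing point.
    """
    failures = []
    for (f_left, db_left), (f_right, db_right) in zip(left, right):
        if f_left != f_right:
            raise ValueError("channels must be sampled at the same frequencies")
        if f_left > max_hz:
            continue  # skip the region we agreed not to judge
        tol_db = low_tol_db if f_left < split_hz else high_tol_db
        mismatch_db = abs(db_left - db_right)
        if mismatch_db > tol_db:
            failures.append((f_left, mismatch_db))
    return failures


# Made-up measurements for one headphone.
left = [(100, 0.0), (1000, 0.5), (5000, 2.0), (12000, 9.0)]
right = [(100, 0.5), (1000, 2.0), (5000, 0.0), (12000, 0.0)]

print(channel_balance(left, right))  # only the 1 kHz point fails here
```

An empty list means the pair passes. This does nothing about seal problems or unit-to-unit variation, so it only makes sense on measurements we already trust.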
 
Dec 22, 2013 at 8:35 PM Post #6 of 138
I'd like to see time domain stuff (square wave edges, impulse response, CSD) also.
 
I think there are closed headphones around now that beat a lot of open ones. 
 

These two are exactly what I was thinking when I read the criteria in the googledoc...
 
By the way, just noticed this line in the googledoc slide 14 (Why this metric #9):
1. Crossfeed from one channel is leaking into the other, creating a more natural sound. (open-back)
No, open-back headphones won't magically give you crossfeed.
 
Dec 23, 2013 at 7:38 AM Post #9 of 138
moarrrrr!!!!!!!!!!!!!!!!
 
to me it's not about throwing our gear that didn't get an A+ in the bin and it's certainly not a definitive classification. but work like this is dearly needed in the audio world and needs to be famous so it can evolve and be refined. but also to say to manufacturers that they will not get away with nicely warped rubbish like they always did (hi ultrasone and beats!).
I'm not too sure about the actual ratings, but he pointed out most of the weak points of each gear and just that is a nice read.
 
whining about how it's impossible to base audio quality on technical values alone won't get us anywhere, so let's try instead to make it one step closer to working. that's called progress and the audio world sure is in need of progress. everyday I browse through photo and audio websites and everyday I'm amazed at how medieval the amateur audio discussions look.
learning anything is super hard and full of misleading intel, we can't even agree on what's neutral!!??!!**!!! and we kind of try to go around it as if it was ok to not even set the most basic references once and for all.
 
-so let's launch a rocket into space, how heavy did you say it was?
-oh I'd say it's on the heavier side of weight. a lot heavier than my car, I've ABed both so I know.
-ok so let's use an above average quantity of the new fuel stuff. I heard it's a good combo and the flames get green.
 
this is an amazing world.
 
 
Sanji thank you very much for the effort.
and indeed thank you tyll for bringing measurements and always looking on the bright side of life ^_^.
 
Dec 23, 2013 at 1:39 PM Post #11 of 138
Brilliant! More reviews and comparisons like this with objective measurements compared to subjective "I feel like..." arguments. Kudos to the author!
 
Dec 23, 2013 at 7:27 PM Post #12 of 138
That's an interesting analysis of objective measurements. I enjoyed reading it and it pretty much matches what I heard from the headphones, or read about the ones I didn't hear (Grado and Ultrasone usually get bad reviews; the data corroborate that).
 
The HE-5LE would be interesting here. It might fall short of the HE-500 in terms of linearity, but the distortion figures are lower.
This is exactly how I heard them: cleaner and slightly brighter sound than the lush and treble smooth HE-500. Very similar to the HE-6 in some ways, but closer to the HE-500 technically.
 
If not for its linearity, the HE-5LE is as good as the HE-500 and might even get an A vs. a B+.
 
Dec 24, 2013 at 1:51 AM Post #13 of 138
Btw, shouldn't this thread be moved into the headphone equipment forum?
 
Dec 25, 2013 at 11:38 AM Post #15 of 138
I believe that while this comparison is good spirited, it is not really useful and ends up being really skewed.

For instance, neglecting the THD, which I have always found too inconsistent to be reliable, the hd800 and the t1 would be equally good using your own criteria and THIS data:

http://graphs.headphone.com/graphCompare.php?graphType=-2&graphID[]=2033&graphID[]=4061&scale=30

In fact, this graph makes the HD800 appear "peakier". It doesn't look like an A vs. D case any more!


Your idea is good, but if the measurements are all over the place you can't really draw any conclusions. In addition, I find the criteria to be insufficient (where is a REAL test for linearity other than the THD?) and the quantification of the attributes rather simplistic (basically a binary system!).

I hope this otherwise nice contribution is taken as a proof of concept and not as a real-world usable conclusion. Sustaining harsh and definitive comments such as the ones the author makes regarding the HP companies with this document is rather silly, even if they may be true.


PS: don't any of you dare bring those waterfall plots to the HD800's aid; the author himself discredited them in his own article due to the inconsistent measurements :wink:
 
