Headphone Measurements: The New Standard, Part 1
Sep 2, 2020 at 3:06 PM Post #47 of 88
Adding to what has been written above, our individual brain and consciousness are an interpretation of reality.

When it comes down to evaluation, hearing is an art and not subject to a true absolute, and is a reason music as art is a human characteristic.

Somewhat related to that, this illustration might help.
Human Hearing system.PNG

It appears to me that what the 5128 is trying to do is to improve the physics of measuring. That means a redesign of the pinnae/concha/ear canal, as well as creating a new acoustic coupler that better matches the human ear impedances etc. That should, in theory, allow us to better measure devices like headphones and capture the signals as if they were captured at the ear drum of your average human being. (That, btw, has always been the goal of this equipment - the 5128 is just the latest attempt at evolution in this domain.)

However, once we have measured the signals, we have to interpret them. We have to connect those physical stimuli with the subjective experiences. That's where psychoacoustics comes into play. The straightforward example of course is that two equal-amplitude tones - one at 100 Hz and one at 4000 Hz - are not perceived the same way. And things become way more complicated when you take into account more complex stimuli like speech and music and the temporal aspects of the human hearing sensation.
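
To put a rough number on the two-tone example, here is a minimal sketch using the standard A-weighting curve as a stand-in for frequency-dependent sensitivity. That choice is my assumption: A-weighting is only a coarse approximation of equal-loudness behaviour at low listening levels, not a loudness model. The function name is made up for illustration; the constants are the usual analytic A-weighting constants.

```python
import math

def a_weighting_db(freq_hz: float) -> float:
    """A-weighting gain in dB at freq_hz, using the common analytic form."""
    f2 = freq_hz ** 2
    numerator = (12194.0 ** 2) * f2 ** 2
    denominator = (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    # The +2.00 dB offset makes the curve read about 0 dB at 1 kHz.
    return 20.0 * math.log10(numerator / denominator) + 2.00

# Two tones of equal physical amplitude, plus the 1 kHz reference point.
for f in (100.0, 1000.0, 4000.0):
    print(f"{f:6.0f} Hz: {a_weighting_db(f):+.1f} dB")
```

With this weighting the 100 Hz tone comes out roughly 19 dB down relative to 1 kHz, while the 4000 Hz tone comes out about 1 dB up, so two physically equal tones end up about 20 dB apart on this (very simplified) scale.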

Now add a sprinkle/spoonful/truckload of psychology and you end up with the final subjective experience.

Anyway, @jude, where is Part 2!? :grin:
 
Sep 2, 2020 at 7:28 PM Post #48 of 88
Mr. Jacob, accurate measurement is crucial to understanding. Even the chaotic world of quantum mechanics could only be reached after careful measurement and its interpretation coupled with imagination.

I appreciate the work you are doing and the excellent way you present it, thank you.
 
Sep 10, 2020 at 12:39 PM Post #49 of 88
Headphone Measurements: The New Standard, Part 1

By Jude Mansilla
Head-Fi.org​

There is a new, substantially more human-like standard for measuring headphones. And as I will show you shortly, this new standard is a very important step toward more realistic, more meaningful headphone measurements. When it comes to headphone measurements, one of the biggest challenges we've seen in our community is correlation between subjective and objective analysis. I will show you, then, examples of how this more accurate hearing simulator will help close the gap between the headphones as we hear them and the headphone measurements we look at.

At the ALMA* annual conference in 2018, I saw one of Brüel & Kjær's earliest public presentations of the research and development behind their new High-frequency Head and Torso Simulator (HATS) Type 5128. I was sitting among many acoustical engineers in the audience who were bowled over by what Vince Rey (from Brüel & Kjær) had just shown us, which was the answer to the following question: How would you propose improving upon the 40-year-old standard of human hearing simulation to meet measurement needs that have evolved beyond that standard over the past several decades (and that will continue to evolve)?

The answer, simply put, was start over from scratch. And to do that meant finally establishing an entirely new standard that more accurately simulates human hearing across the full audio range. As we'll see later, accomplishing this was fantastically complex, taking over ten years, and involving technologies and techniques that were not available 40 years ago.


* ALMA is the Association of Loudspeaker Manufacturing & Acoustics, which is now known as Audio & Loudspeaker Technologies International (ALTI).




HeadphoneMeasurements_ANewStandard_PartI_01c1.jpg
Fig.1: Brüel & Kjær High-frequency Head and Torso Simulator (HATS) Type 5128.


When measuring headphones – whether around-ear/on-ear (AE/OE) or in-ear (IE) – we use a human hearing simulator, often simply called an "ear simulator." The typical ear simulators used in our industry are based on an international standard called IEC 60318-4. Because the IEC 60318-4 standard used to be called IEC 711 (and then IEC 60711), ear simulators based on this standard are often nicknamed simply "711 simulators," "711 couplers," or even just "711."

What many do not know is that the 711 standard is, again, 40 years old. What many also don't know is that these 711 ear simulators only simulate human hearing from 100 Hz to 10 kHz [1]. (As we'll see later, even that specified range of simulation was subject to improvement.) Below 100 Hz and above 10 kHz you can use a 711 ear simulator as an acoustic coupler, but you are not simulating average human response in those ranges outside the standard. (See Fig.2 below.)

Human-Hearing-Range-and-711-standard.png
Fig.2: Human hearing simulation range of IEC 60711/60318-4 ("711") compared to the full audio range.


40 years ago, that range was perfectly acceptable, as the focus for this type of measurement then was testing hearing aids and telecom devices. However, in the decades since the establishment of 711, the need to obtain realistic measurements across the full audio range (20 Hz to 20 kHz) has increased. This need has been particularly apparent in our segment of the audio industry, in which consumers have high expectations when it comes to audio quality from premium headphones – very high expectations.

In the last four decades, what has also evolved are the technologies and techniques available to characterize human hearing over the full audio range. While I'll discuss in some detail how Brüel & Kjær was able to characterize human hearing over the entire audio band to eventually develop the 5128, their methods were succinctly summarized by them in a video titled "Evolution of Hearing Simulation – Part 2." If you haven't already watched that video, please do so now, as the information in it will come in handy as we move further in this discussion.




The Brüel & Kjær 5128 arrived at Head-Fi HQ this past winter, and we started measuring with it immediately, to get to know this new measurement fixture and the results from it, and to try to determine what an entirely new, more human-like standard of hearing simulation represents. Even very early on, key findings and observations started to take shape:

  • The 5128 represents a very important step toward closing the gap between the measurements we look at and the headphones as we hear them.

  • While I expected most of the differences to be above 8 kHz, with many headphones I also found measurement differences (between the old standard and the 5128) frequently occurred throughout the audio band, including all the way down to the bottom of our hearing range. (We will explore the reasons why later.)

  • The 5128 forced me to reconsider many of the hundreds of measurements we had done here at Head-Fi HQ since 2015. (I'll be showing examples in our discussion.)

  • Measuring with the 5128 led me to a hypothesis that may have a role in explaining at least some of the differences between the Harman AE/OE (Around-Ear/On-Ear) Target and the Harman IE (In-Ear) Target. (More on this later, too.)

When it comes to correlation between subjective and objective analysis, one example of a headphone I've found to sound quite different from its 711 measurements (found online) is the Westone W60 universal-fit in-ear monitor. (In each earpiece the W60 uses six balanced armature drivers with a 3-way passive crossover.) Here are the 711 measurements of the Westone W60 I found online:


Westone-W60-FR-711-01.jpg
Fig.4: Westone W60 frequency response measurement using a 711 ear simulator, example 1 of 4
Westone-W60-FR-711-02.jpg
Fig.5: Westone W60 frequency response measurement using a 711 ear simulator, example 2 of 4
Westone-W60-FR-711-03.jpg
Fig.6: Westone W60 frequency response measurement using a 711 ear simulator, example 3 of 4
Westone-W60-FR-711-04.jpg
Fig.7: Westone W60 frequency response measurement using a 711 ear simulator, example 4 of 4


Here are the Westone W60 "711" frequency response measurements from Figs. 4 to 7 shown together:


Westone-W60---four-711s.jpg
Fig.8: Westone W60 "711" frequency response measurements from Figs. 4 to 7 shown together (normalized at 1 kHz)
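
As an aside on what "normalized at 1 kHz" means in practice, here is a minimal sketch of one way to do it: find each curve's level at 1 kHz (interpolating on a log-frequency axis if 1 kHz is not a measured point) and subtract it from the whole curve. The function names and the sample numbers are hypothetical and are not taken from the measurements shown above.

```python
import math
from bisect import bisect_left

def level_at(freqs_hz, levels_db, target_hz):
    """Interpolate the level (dB) at target_hz, linearly in log-frequency."""
    if not freqs_hz[0] <= target_hz <= freqs_hz[-1]:
        raise ValueError("target frequency outside measured range")
    i = bisect_left(freqs_hz, target_hz)
    if freqs_hz[i] == target_hz:
        return levels_db[i]
    x0, x1 = math.log10(freqs_hz[i - 1]), math.log10(freqs_hz[i])
    t = (math.log10(target_hz) - x0) / (x1 - x0)
    return levels_db[i - 1] + t * (levels_db[i] - levels_db[i - 1])

def normalize_at(freqs_hz, levels_db, ref_hz=1000.0):
    """Shift a curve so that it reads 0 dB at ref_hz."""
    ref = level_at(freqs_hz, levels_db, ref_hz)
    return [level - ref for level in levels_db]

# Hypothetical curve; frequencies must be sorted in ascending order.
freqs = [20.0, 100.0, 500.0, 2000.0, 8000.0, 20000.0]
levels = [95.0, 96.0, 94.0, 92.0, 104.0, 80.0]
print(normalize_at(freqs, levels))
```

Normalizing this way removes differences in absolute playback level between sources, so that only the shape of each curve is being compared.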


If you had not heard the Westone W60 before seeing these measurements, what you see (in each of the measurements in Figs. 4 to 7 above) would suggest that above 5 kHz, the W60's tonal balance is largely characterized by the 10 to 18 decibel peak centered at 8 kHz to 10 kHz, as well as the peak's effects through its lower and higher peripheries. This is due to a resonance that actually is part of the 711 standard.

Is that resonance, however, part of 711's human hearing simulation range? No, it is not. Because these resonant peaks will often shift down and appear within 711's specified range of simulated human hearing when measuring headphones [2], I can understand how this might be a bit confusing. Let us examine this, then, by looking at some key parts of the 711 standard (emphasis by me):


Above 10 kHz, the device does not simulate a human ear, but can be used as an acoustic coupler at additional frequencies up to 16 kHz. Below 100 Hz, the device has not been verified to simulate a human ear but can be used as an acoustic coupler at additional frequencies down to 20 Hz [1].​


Again, it might appear that the resonance in Figs.4 to 8 – because it appears at or below 10 kHz in these four measurements – is within the simulation range. However, let's look at another part of the standard:


The length of the principal cavity shall be such as to produce a half-wavelength resonance of the sound pressure at (13.5 ± 1.5) kHz [1].​


Measured at the 711 simulator's reference plane, this resonance is outside the 711 standard's human simulation range, its purpose being to help specify the physical geometry of the principal cavity (the primary volume). It is very important to note a couple of things about this part of the specification, though:

  • Depending on the headphone being measured (and how it's coupled to the 711 ear simulator), this resonance can shift down to frequencies well within 711's specified human simulation range.

  • The IEC 60318-4 (711) standard does not specify the magnitude of this peak, which may be why you see it at four different levels in the four different measurements in Figs. 4 to 8, which can add to measurement uncertainty.

So, while that resonance is not part of a 711 coupler's human simulation, it nevertheless largely defines much of the treble region of the Westone W60 measurements in Figs. 4 to 8. I feel very confident that of those who have heard the Westone W60, most would agree those previous measurements do not correlate well with subjective impressions. And the W60 is just one of many examples of headphones for which I could not previously reconcile what I was hearing with the measurements I was seeing.
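
For a feel of the numbers behind that half-wavelength resonance, here is a minimal sketch using the textbook relation f = c / (2L) for an idealized uniform tube. The 343 m/s speed of sound (air at roughly 20 °C) and the 17 mm effective length are my assumptions for illustration; this is not the standard's dimensional specification, and a real coupler with a headphone attached is more complicated than a plain tube.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly 20 degrees C

def half_wave_length_mm(resonance_hz, c=SPEED_OF_SOUND_M_S):
    """Tube length (mm) whose half-wavelength resonance falls at resonance_hz."""
    return c / (2.0 * resonance_hz) * 1000.0

def half_wave_resonance_hz(length_mm, c=SPEED_OF_SOUND_M_S):
    """Half-wavelength resonance (Hz) of an idealized tube of length_mm."""
    return c / (2.0 * length_mm / 1000.0)

# Lengths implied by the (13.5 +/- 1.5) kHz tolerance band.
for f_hz in (12000.0, 13500.0, 15000.0):
    print(f"{f_hz / 1000:.1f} kHz -> {half_wave_length_mm(f_hz):.2f} mm")

# Hypothetical: a longer effective acoustic length lowers the resonance.
print(f"17.0 mm -> {half_wave_resonance_hz(17.0) / 1000:.2f} kHz")
```

In this simple model, anything that lengthens the effective acoustic path moves the peak lower in frequency, which is one plausible way to picture how the resonance can end up at or below 10 kHz in a headphone measurement.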

Now let's look at measurements of the Westone W60 using the Brüel & Kjær 5128. Though we measured with two different ear tips (stock silicone and stock foam tips), the silicone tips are what comes installed on the W60 out of the box.


Westone-W60-FR-5128.jpg
Fig.9: Head-Fi-measured Westone W60 frequency response using the Brüel & Kjær Type 5128 HATS


Here (below) is the Westone W60 measurement from the Brüel & Kjær 5128 compared to the previously shown "711" W60 measurements:


Westone-W60---5128-(silicone-tips)-compared-to-four-711s-FINAL.jpg
Fig.10: Head-Fi-measured Westone W60 frequency response using the Brüel & Kjær Type 5128 HATS compared to the previously shown "711" W60 measurements.


Following is a comparison of the W60 measured on the 5128 compared to the shaded-in measurement range of the 711 measurements shown together (below):


Westone-W60---5128-(silicone-tips)-compared-to-four-711s-RANGE-FINAL.jpg
Fig.11: Head-Fi-measured Westone W60 frequency response using the Brüel & Kjær Type 5128 HATS compared to the measurement range from the four previously shown "711" W60 measurements.
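
A shaded "measurement range" like the one in Fig.11 can be thought of as the per-frequency minimum and maximum across the individual curves. Here is a minimal sketch of that idea, assuming all curves are sampled at the same frequencies and already normalized. The function name and the numbers are hypothetical; this is not necessarily how the figure itself was produced.

```python
def measurement_range(curves_db):
    """Per-frequency (lowest, highest) level across several aligned curves."""
    if len({len(curve) for curve in curves_db}) != 1:
        raise ValueError("all curves must have the same number of points")
    lows = [min(points) for points in zip(*curves_db)]
    highs = [max(points) for points in zip(*curves_db)]
    return lows, highs

# Four hypothetical normalized curves sampled at the same three frequencies.
curves = [
    [6.0, 0.0, 12.0],
    [3.0, 0.0, 18.0],
    [8.0, 0.0, 10.0],
    [4.0, 0.0, 15.0],
]
lows, highs = measurement_range(curves)
widths = [hi - lo for lo, hi in zip(lows, highs)]
print(lows, highs, widths)
```

The width of the band at each frequency gives a quick read on how much the individual measurements disagree there.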


Notice also that differences between the 711 and the 5128 measurements of the W60 are not limited to the treble range but appear throughout most of the audio band. Compared to the 5128, all of the 711 measurements in Figs. 4 to 8 show more bass, with two of the four showing what could reasonably be called substantially higher bass levels versus the 5128. Again, if you've listened to the W60, I think there's a very strong chance the measurements of it from the 5128 would correlate much more closely with what you heard versus any of the 711 measurements shown.

We will discuss in Part 2 what causes these noteworthy measurement differences, and we will also look at research that is completely separate from the work behind the 5128 that helps corroborate its findings.

The Brüel & Kjær 5128 was developed to finally simulate average adult human hearing across the full audio range (20 Hz to 20 kHz) for the first time, based on research and development using more modern technologies and techniques developed in the past four decades, and taking 12 years to complete. Some of the names you'll see behind the 5128 research are also names you'll see in the references of the IEC 60318-4 standard. Brüel & Kjær was also instrumental in establishing the 711 standard, having contributed to its development, and being the first with a commercially available IEC 711 simulator with the Brüel & Kjær Type 4157 (which is still commonly in use today).

While, again, we'll discuss that research and development in greater detail in another post, the Brüel & Kjær High-frequency Head and Torso Simulator (HATS) Type 5128 (and the Type 4620 ear simulators within it) involves very important developments versus all previous such standards. To summarize those changes simply for now:

  • The anatomical ear canal used in the 5128 is based on average human geometry, from the canal entrance all the way to the eardrum. This allows for more human-like response-shaping, resonances, and damping characteristics [3] [6]. This canal even includes a more realistic smooth soft-to-hard transition (simulating the transition to the bonier condition nearer the eardrum) to help reproduce the correct damping; and an angled coupler attachment to simulate a human's slanted eardrum [3] [4].
    • In contrast, the 711 ear simulator's canal geometry is commonly realized as a 7.5mm × 22mm metal tube, with a half-inch microphone terminating one end of that tube, perpendicular to the tube's axis.
  • The 5128's middle ear simulator uses more precise, thorough wideband impedance modelling for more detailed, more accurate characterization of the complex acoustical loading of the human ear [5] [6].
    • Compared to the 711 simulator's two-branch coupler, the 5128 uses a far more complex four-branch eardrum for more accurate, more detailed, more human-like frequency, resonance, and damping simulation [7].

    • The 5128 uses a newly developed prepolarized microphone with a diaphragm that better simulates the dimensions of the human eardrum [7].
      • The microphone's diaphragm is also at the front of the microphone/coupler assembly. This allows the diaphragm to terminate the canal at an incline, simulating the slant of the tympanic membrane in relation to the canal, as in a real human ear [3] [4].

A new era of headphone measurement arrived with the advent of the Brüel & Kjær 5128. While the 711 standard will no doubt continue with the inertia of a 40-year-old industry standard for the foreseeable future, engineers and enthusiasts will increasingly seek out the new standard for more representative, more meaningful absolute measurements of headphones.

We will continue the discussion of the Brüel & Kjær 5128 in Part 2 of this series soon, including a closer look at the associated research and development, as well as, of course, more measurements. We will also look at separate, corroborating research, and discuss measurement observations with the 5128 along with a corresponding hypothesis that may help explain some of the key differences between the Harman AE/OE Target and the Harman IE Target.


References

[1] IEC 60318-4 Ed. 1.0 (2010) Simulators of human head and ear – Part 4: Occluded-ear simulator for the measurement of earphones coupled to the ear by means of ear inserts (International Electrotechnical Commission).

[2] Wille, M. (2017). High Resolution Ear Simulator. GRAS Sound & Vibration white paper.

[3] Darkner, S., Sommer, S., Baandrup, A. O., Thomsen, C., & Jønsson, S. An Average of the Human Ear Canal: Recovering Acoustic Properties with Shape Analysis. Cornell Univ. Libr., ArXiv e-prints (2018).

[4] Staab, W. April 2013. Tympanic Membrane – Anatomical Influence on Hearing Aid Fittings. Hearing Health & Technology Matters. https://hearinghealthmatters.org/wa...anatomical-influence-on-hearing-aid-fittings/

[5] S. Jønsson, A. Schuhmacher, H. Ingerslev Jørgensen, Wideband impedance measurement techniques in small complex cavities such as ear simulators and the human ear canal, ArXiv e-prints (2018).

[6] S. Jønsson, A. Schuhmacher, H. Ingerslev Jørgensen, Wideband impedance measurement in the human ear canal; In vivo study on 32 subjects, ArXiv e-prints (2018).

[7] Brüel & Kjær, Design of the new Bruel & Kjaer High Frequency Head and Torso Simulator (HATS) type 5128, presented by Vince Rey at the 2018 ALMA International Symposium & Expo (AISE).

Graphs are irrelevant... where is soundstage? Timbre? Tone? space between the notes, in a graph?
 
Sep 11, 2020 at 9:21 AM Post #51 of 88
Great now all I need is minidsp 2 or rubber ear add-ons for IEMs.
 
Oct 1, 2020 at 5:17 PM Post #52 of 88
I wouldn't characterize the inner ear as psychoacoustics. The inner ear is transferring vibration into electrical signals.


 
Oct 2, 2020 at 12:39 AM Post #53 of 88
@Mr.Jacob This is neuroacoustics. The cochlea packs hair cells tuned to different frequencies, which transmit to the brain through the auditory nerve bundle. Psychoacoustics actually occur in the brain.
 
Oct 15, 2020 at 11:33 AM Post #54 of 88
Headphone Measurements: The New Standard, Part 1

This is very interesting and quite an accomplishment.

Of course, the limitations of the 711 coupler are known and well documented. Alternatives with a more accurate HF performance have existed for some time (the G.R.A.S. 43BA-2 is over 5 years old).

Of course, one does NEED to know when one sees instrument ghosts caused by issues in the instrumentation, and to mentally "dial them out" and/or explain them properly when publishing such measurements. If this is not done (in an honest and transparent way), then the published measurements will give wrong impressions.

The issue with standards for testing is that they are out of date the moment they are printed (or manufactured), BUT they are universal and thus ensure that the results of tests done fairly and correctly are comparable.

Setting a new standard is always fraught with problems. Machiavelli commented:

“It ought to be remembered that there is nothing more difficult to take in hand, more perilous to conduct, or more uncertain in its success, than to take the lead in the introduction of a new order of things. Because the innovator has for enemies all those who have done well under the old conditions, and lukewarm defenders in those who may do well under the new.”

Those who have paid for their 711 coupler and a 43BA-2 will resist investing a lot of money again in new tools that may be superseded more or less quickly.

It will be interesting to see how various headphones and IEMs look when re-measured on the new system AND especially how consistent repeated measurements are on this system - every measurement tool I have at hand right now for headphones and IEMs measures quite differently after each removal and re-placing of the item being tested, even with the greatest care taken.

A toolset that gives much better consistency while having some known and documented problems in the measured response would be worth more to me than a toolset that simply avoids showing some imaginary peaks and incorrect LF response that I know about AND that is used by everyone else as well.

Thor
 
Oct 18, 2020 at 6:13 PM Post #55 of 88
It will be interesting to see [...] how consistent repeated measurements are on this system - every measurement tool I have at hand right now for Headphones and IEM's measure quite different after each removal and re-placing the item being tested even with the greatest care taken.

A toolset that gives much better consistency [...] would be worth more to me [...].
As I understand it, the improved accuracy of the B&K 5128 is a very interesting innovation.

Let us consider an excellent measurement tool (whether it is the 5128 or some future system).
So, for any fixed headphone placement, the measurements are precise and accurate.

When the consistency of measurements is poor after each removal and re-placing then I do not see what would be wrong with it.
This would then simply reflect the reality that placing the headphones a few millimeters top/down/left/right leads to a different sound.

But maybe I understood something wrong, in which case I look forward to your feedback.
 
Oct 19, 2020 at 6:07 AM Post #56 of 88
Hello,

As I understand it, the improved accuracy of the B&K 5128 is a very interesting innovation.

On this we both agree. But is this achievement relevant and meaningful to those who develop, review and test hearables? At first blush, maybe.

We need to be clear that a system like a HATS has multiple user groups.

Actually, what is "HATS"?
HATS (Head and Torso Simulator) - applicable standards
Head IEC60959, ANSI S3.36-1985
Ear pinna IEC60268-7 / JEITA RC-8140

It is a standardised system for acoustic measurement in consideration of reflection and diffraction of sound by the human head and torso.

Main uses:
Sound quality/acoustic field evaluation
Hearing sensation/research in architectural acoustics
Various research
Sound pressure measurement of Headphones/Earphones
Evaluation of noise-canceling headphones

The classic users of a HATS are acoustic scientists researching the interaction of sound, room and the "average human" (#1), and acousticians evaluating acoustics. Their need is accuracy, but as they normally do not place hearables on the HATS to test them, their "accuracy" is different from that of other groups, who do exactly that.

Due to the lack of more appropriate test systems, further groups may use a HATS for different purposes.

One group is the developers of hearables. They need to know how a device tests objectively. They may tune the device by ear, with focus groups etc., or just draw a line on a chart and say "we want this response", but they eventually need to test, to make sure there are no gross problems. This group will usually NOT use the measurement as a specific qualification of the "goodness" of the device being tested, but to assure that desired design goals have been met. This group is aware of limitations and problems in the test system and will know how to compensate for them. To this group it matters little if their test setup shows false spikes and resonances and needs a range of adjustments to make the results meaningful, as long as these issues are documented and known. What this group needs most is consistency and repeatability. Repeating the same process should give identical results within reasonable margins of error. I fall into this group, and currently none of the systems at hand that I have had a chance to test achieves anything I'd call reasonable consistency. I can use my experience to "see past" these differences, but they are annoying and troubling.

Another group is those who review and test hearables and share their results with the general public. A reviewer in this group would ideally slap a pair of hearables on the test fixture, press a button and get a graph that shows how this specific hearable deviates from the platonically ideal hearable (#2), and could then wax lyrical about how the graph confirms his listening impressions of "muddy and overblown bass", "piercing treble knives" etc. Then customers would be able to choose (presumably) the device measuring closest to the platonically ideal hearable; which in reality no doubt some customers will find too bright, some too dull, some as having too much bass and some as having too little bass, while some customers will feel it's ok as is and others won't care as long as it plays their favourite music. For this reviewer group the most important factor is "portability" of measurements, meaning that if an item is measured by reviewers A, B, X, Y and Z, the measurements should match very closely, even across different test systems. Judging by the various tests, even from those using exactly the same setup from the same manufacturer, this is currently not achieved even remotely.

Let us consider an excellent measurement tool (whether it is the 5128 or some future system).
So, for any fixed headphone placement, the measurements are precise and accurate.

When the consistency of measurements is poor after each removal and re-placing then I do not see what would be wrong with it.
This would then simply reflect the reality that placing the headphones a few millimeters top/down/left/right leads to a different sound.

Well, this is where the rub lies. I place headphones on my head, take them off, replace them, and they sound rather the same to me, despite no doubt some placement inaccuracy, maybe more than there would be on a HATS, where I am very careful. Yet it is quite easy to tell different headphones apart sonically. So how can we claim a given test device is "accurate" or "more accurate than X/Y/Z" if minor position changes make huge differences in what is measured, BUT NOT in what is heard? It seems then there are aspects to which this "accuracy" fails to extend.

I have since read the thread about the 5128 at that other site I rarely go to because of the gross cargo cult science there. From this it seems this HATS has the same issues regarding placement as others, which is not too surprising. But I feel that this makes this (or any) HATS a less than ideal tool to measure hearables, plus, there is no need to have a simulated head and torso for testing hearables, can't deny it looks cool and very "scientific" though.

Honestly, personally I'll probably prefer an IEC60318 desktop coupler (plus 711 & 43B.A. to cover the different ranges) over any HATS, as it has much better repeatability (it still could be better), though it also has a range of known resonances etc., which make a raw graph meaningless and require one to apply suitable corrections, often using manual "pen damping". But I try to find out if devices work as intended and I am aware of the limitations and issues of this set-up, so it does not trouble me the same way the inconsistency of a HATS does.

If instead I were publishing reviews, I might find the need to radically alter the graphs to go from what is measured to what should be expected both bothersome and questionable. If my readers know each graph I publish has been manually corrected, will they believe me that the graphs are honest and accurate representations? Or will they believe a graph from a HATS that is known to be accurate even if no three measurements of the same hearable match on that device? Ok, so we take 10 or 20 measurements on the HATS and average. You do it, have fun!

While the 5128 HATS can measure with high accuracy, it is hard if not impossible to get the same exact result twice, when testing hearables. So if Tester X and Tester Y test the exact same set of hearables on the same 5128 HATS and rest of system, I'd wager they get results that are different, possibly dramatically so even if they take great care. And then the question becomes which of the two different results is the "correct one" and who needs to try harder at his "measurement gong fu". In my experience that result comes down to a popularity/shouting contest, which is hardly scientific. To me this means that the apparent accuracy of the 5128 is an illusion, in the actual context of testing hearables, no matter how accurate the HATS is in other applications and no matter how accurate the human hearing is simulated. The problem here is also not the 5128 itself, but rather the misappropriation of a tool with a clearly defined purpose for a different purpose without updating the tool for this purpose to assure consistency.
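
For what it's worth, the averaging over repeated reseats, and a number for how much they disagree, could be sketched like this. It is a minimal illustration only: the function name and the data are hypothetical, the mean is a plain arithmetic mean of dB values, and the sample standard deviation per frequency is just one possible repeatability figure.

```python
from statistics import mean, stdev

def reseat_statistics(reseats_db):
    """Per-frequency mean and sample standard deviation across reseats.

    reseats_db holds one list of dB levels per reseat, all at the same
    frequencies. At least two reseats are needed for a standard deviation.
    """
    columns = list(zip(*reseats_db))
    averages = [mean(col) for col in columns]  # arithmetic mean of dB values
    spreads = [stdev(col) for col in columns]  # larger = less repeatable
    return averages, spreads

# Three hypothetical reseats of the same earphone at four frequencies.
reseats = [
    [5.0, 0.0, 9.0, -3.0],
    [3.5, 0.0, 12.0, -8.0],
    [6.0, 0.0, 7.5, -1.0],
]
averages, spreads = reseat_statistics(reseats)
for avg, sd in zip(averages, spreads):
    print(f"mean {avg:+.2f} dB, spread {sd:.2f} dB")
```

If Tester X and Tester Y each published the spread alongside the averaged curve, one could at least see whether their two results differ by more than either tester's own reseat-to-reseat variation.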

A good comparison is measuring loudspeakers. Measuring loudspeakers is quite a challenge already. But with fairly basic and inexpensive tools and a little experience we get excellent consistency and repeatability of measured results done the same way (1m distance, tweeter axis, calibrated omni microphone, anechoic or windowed pseudo-anechoic), even between testers using very different microphones, software, room setups etc. This suggests that we have an appropriate test method for a very limited data set, which further has very limited significance to the actual objective and subjective performance of said loudspeaker in your own living room. But AT LEAST we all get something very similar, for better and worse. One could now work onwards from there to refine the match between the anechoic measurements and the in-room objective and subjective results using extensions of the same methodology, which has been done by Harman's Floyd Toole and some others but is still very much in its infancy.

So, perhaps for hearable testing B&K could develop a new system based on the accurate acoustic system of the 5128, but combined with a test fixture that, as an additional design goal, places hearables in a specific position that is repeatable and achievable by anyone who follows the instructions diligently. This would combine accuracy with consistency/repeatability and make sure everyone measuring the same item gets a reasonably close AND accurate result. Somehow I feel this would answer the purpose of a "Hearables Test System" much better than the use of the 5128 HATS. This in turn may allow the community to make progress reconciling what we measure and what we hear, and to make better products for consumers as a result, instead of arguing whose curve is more right.

Thor

#1 - The "normal" or "average" is a null set that nothing and nobody ever belongs to. It is an abstract idea of something that should somehow fit most in a non-null set. If the "normal" is based on a very small sample size from a few tests done by interested parties, it may even be far from what would suit the majority of people well enough. For example, G.R.A.S. claims KEMAR is preferable to other HATS because of the very large sample size and thus statistical significance underpinning their manikin.

#2 - The problem with ideals is that in the real world they do not actually match any individual, but an abstract "average" based on more or less diligent research of often unknown (statistical) significance, and thus may or may not be suited to a given individual; see also #1.
 
Oct 20, 2020 at 5:34 PM Post #59 of 88
Thank you for your extensive reply.
I place headphones on my head, take them off, replace them, they sound rather the same to me, despite no doubt some placement inaccuracy, maybe more than I would do on a HATS where I am very careful. Yet it is quite easy to tell the different headphones apart sonically. So how can we claim a given test device is "accurate" or "more accurate than X/Y/Z" if minor position changes make huge differences in what is measured, BUT NOT what is heard? It seems then there are aspects to which this "accuracy" fails to extend.
I only recently started to look into audio measurements.
If the statement in the above quote is correct (and I have at the moment no reason to doubt it) then this is a very fundamental problem:
Small variations in placement change the measurements significantly but do not lead to audible differences for the listener.
This would lead to the question: Why make measurements of headphones at all?
Even if a protocol to place the headphone is part of the standard (as suggested in the post above) then I wonder what would be the point of it.
This would be a standardised single instance of a spectrum of possible measurements for a headphone.
I do not see how this can be useful in assessing or comparing headphones.

Some weeks ago I thought that the fundamental question in comparing headphones is as follows:
Given two reasonably decent headphones (i.e. no extreme frequency response across the audible range and sufficiently low distortion), is it possible to equalize one headphone so that it sounds like the other headphone?
(Background for this question: I was inclined to think that a Stax SR-009s sounds fundamentally different to a, say, Sennheiser HD 800S. So they would not differ only by frequency response, but the 009s sounds 'somehow' much more detailed.)
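As a rough illustration of what "equalize one headphone to the other" would mean on the measurement side, here is a minimal sketch. It assumes both headphones were measured on the same rig at the same frequencies, and that a simple per-frequency gain is all the EQ does. The function name, the boost limit and the numbers are made up for illustration, not taken from any real measurement. It only matches magnitude response, so it cannot settle whether the two would then actually sound alike.

```python
def eq_correction(source_db, target_db, max_boost_db=6.0):
    """Per-frequency gain (dB) that moves the source curve onto the target.

    source_db, target_db: levels in dB at the same frequency points.
    max_boost_db: hypothetical safety limit on boost; cuts are not limited.
    """
    if len(source_db) != len(target_db):
        raise ValueError("curves must share the same frequency grid")
    gains = []
    for source, target in zip(source_db, target_db):
        gain = target - source                   # dB needed to reach the target
        gains.append(min(gain, max_boost_db))    # cap large boosts
    return gains


# Hypothetical levels at 100 Hz, 1 kHz and 8 kHz
source = [90.0, 94.0, 85.0]
target = [93.0, 94.0, 95.0]
gains = eq_correction(source, target)
print(gains)  # the 10 dB boost requested at 8 kHz is capped at 6 dB
```

The cap is there because a large boost into a measured dip is a common way to run out of headroom; the 6 dB value is an arbitrary choice for the sketch.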

But now @Thorsten Loesch's post makes me wonder why we have headphone measurements at all.
 
Oct 21, 2020 at 1:06 AM Post #60 of 88
You aren't really posing a dilemma - the solution to measurement variances is to repeat the measurements multiple times and take an AVERAGE of the data, something which you have no doubt done in high school science experiments.
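A minimal sketch of that averaging, assuming each reseat gives levels in dB at the same frequency points and that a plain point-by-point average of the dB values is good enough (averaging in the power domain is another option). The function name and the numbers are made up for illustration. The per-frequency spread is included because it shows where placement matters most.

```python
from statistics import mean, stdev


def average_response(runs):
    """Average repeated frequency-response runs point by point.

    runs: list of runs, each a list of levels in dB on a shared frequency grid.
    Returns (mean_db, spread_db), one entry per frequency point.
    """
    if len(runs) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    n_points = len(runs[0])
    if any(len(run) != n_points for run in runs):
        raise ValueError("all runs must share the same frequency grid")
    columns = list(zip(*runs))                    # regroup by frequency point
    mean_db = [mean(col) for col in columns]
    spread_db = [stdev(col) for col in columns]   # sample standard deviation
    return mean_db, spread_db


# Hypothetical data: three reseats measured at 100 Hz, 1 kHz and 8 kHz
reseats = [
    [92.0, 94.0, 88.0],
    [91.0, 94.0, 84.0],
    [93.0, 94.0, 86.0],
]
avg, spread = average_response(reseats)
print(avg)     # averaged curve
print(spread)  # larger values mean more reseat-to-reseat variation
```

In this made-up data the 1 kHz point does not move at all between reseats while the 8 kHz point does, which is the kind of pattern the average alone would hide.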

Once you've built up a database of such measurements, you use it as a reference for how your own personal preferences align with the data. This is the part where you separate the objective data from subjective personal preferences.

For example, I know I don't like the sound signature of the HD800: to me its bass lacks impact and sounds hollow. I also know what its measurement looks like, since it is probably one of the most measured headphones out there at the moment, with a very flat frequency response. So in the future I stay away from headphones (or at least give them a low priority to demo/purchase) whose measurements are similar to the HD800's, because my personal preference is not for a flat frequency response, but for one with a slightly tilted bass response. On a personal level, this is how you home in on your own preference, using headphone measurements as a guide during the process.

This part is actually not rocket science, and it is why headphone measurements are meaningful. A person doesn't have infinite time and money to demo every headphone on the market, so some sort of tool to guide comparisons is very much needed. A human's subjective review is not a reliable reference, because one person's "right amount of bass" could be another person's "too much bass".
 