Dr. Hans W. Gierlich On The Future of Audio Quality Testing: More Than Just Frequency Response
May 8, 2023 at 12:39 PM Post #61 of 71
@jude - one of your best videos. I agree with many concepts that the Doc. is addressing in their processes. Really can't wait for something like a TWS xMEMSx that has been run through their system to give excellent results through DSP.
 
May 8, 2023 at 2:21 PM Post #62 of 71
Thank you @Mr.Jacob for the interesting research and your effort to bring that experience also to our headphones community, and to @jude for bringing this to our attention. :) After lots of bla bla in my previous posts, at last I found the time to watch at least your linked presentations. I have several questions.

Appreciate it. 🙏 And credit goes to Magnus and the rest of the team working under Dr. Gierlich for the excellent research spanning many years.
I am happy to answer thoughtful and genuine questions like yours as best as I can.


1. I can maybe guess the reason behind the smaller distortion coefficient in the simplified model here. My question is, how can inexperienced listeners who don't fully understand a subjectively vague concept like distortion make a rating about it, especially when the volume is low or the distortion manifests as an oddness in FR (assuming it is audible)? Does it even make sense for headphones?

Your link doesn't pin down a timestamp, but I'm assuming you are referring to the equation shown at ~9m40s? That goes back to early on in our research that showed the initial trends and relationships between the 3 dimensions and overall quality. The low distortion coefficient goes back to people generally being accepting of some distortion without grossly affecting their overall perception of the audio quality. The linear regression works to an extent, however when you think of evaluating audio quality, you can also see why it doesn't work.

As an example, take the perfect headphone in every way, except it has terrible distortion. In that case, no one would rate the overall quality as high. And the linear regression model fails.

And to be clear, we DON'T use that equation in the final version of MDAQS.
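
To make that failure mode concrete, here is a minimal sketch. The weights, the 1-to-5 scale, and the dimension names other than distortion are assumptions for illustration only (the thread only says the distortion coefficient was small), and the capped variant is just one hypothetical way to let a single bad dimension dominate. It is not what MDAQS does.

```python
# Hypothetical weights on a 1-5 scale; only "the distortion weight is small"
# comes from the discussion above. Everything else is assumed.
WEIGHTS = {"timbre": 0.6, "distortion": 0.1, "immersiveness": 0.3}


def linear_overall(scores):
    """Overall quality as a plain weighted sum of the three dimensions."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def capped_overall(scores, slack=1.0):
    """Illustrative alternative: the overall score may not exceed the
    worst dimension by more than `slack` points."""
    return min(linear_overall(scores), min(scores.values()) + slack)


# "Perfect in every way, except it has terrible distortion."
flawed = {"timbre": 5.0, "distortion": 1.0, "immersiveness": 5.0}

print(round(linear_overall(flawed), 2))  # still rates the headphone highly
print(round(capped_overall(flawed), 2))  # pulled down by the bad dimension
```

With a small distortion weight, the weighted sum barely notices the terrible distortion score, which is the mismatch with real listener ratings described above.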

Either way, the participants in the auditory tests were primed about what each of the categories represented, so they could comfortably make their judgements (to the best of their ability). And as I recall, all audio samples were scaled to the same loudness, so the effect of level is taken out of their judgement and the audio systems were playing back at "normal" operating conditions.
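
As a rough idea of what "scaled to the same loudness" can look like, here is a minimal sketch that matches clips by RMS level. This is an assumption on my part: the post does not say how loudness was matched, and a real listening test would more likely use a perceptual loudness measure (for example ITU-R BS.1770) than plain RMS.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def match_rms(samples, target_rms):
    """Return a copy of `samples` scaled so its RMS equals `target_rms`."""
    current = rms(samples)
    if current == 0.0:
        return list(samples)  # silence stays silence
    gain = target_rms / current
    return [s * gain for s in samples]


# Two made-up clips recorded at very different levels.
quiet_clip = [0.1, -0.1, 0.1, -0.1]
loud_clip = [0.5, -0.5, 0.5, -0.5]

TARGET = 0.2  # arbitrary common level for the test
quiet_matched = match_rms(quiet_clip, TARGET)
loud_matched = match_rms(loud_clip, TARGET)
```

After matching, both clips play back at the same level, so a listener's rating is not simply rewarding the louder one.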

2. Have you tried it also with experienced critical listeners and did you have similar subjective results, especially those that can relate the subjective experience to objective facts? Or did they just become part of the same data to train your DL network? Was there a study to evaluate them separately? I personally am not the same person I was only a few years ago, and what I would have rated positively back then might be a hard fail today.
Yes! In fact, we did have a panel of critical listeners in our study at one point.
And I'm told that they were extremely hesitant to use the 3 dimensions. They wanted many more dimensions to properly evaluate the audio quality.
However, once we strong-armed them into doing the listening test, their scores actually came out very similar to our panel of untrained listeners. They generally had the same ranking and preferences.

We made the decision to continue our auditory tests using untrained listeners, because one of our stated goals was to create an algorithm that would match a general consumer, as opposed to someone trained or specialized in specific fields of audio.

Lastly, your comment about how you have changed over the last few years is something to be aware of with any instrumental assessment models (whether for speech, echo or audio quality): they are a snapshot in time of the auditory test participants' preferences and, when done well, capture the zeitgeist well. However, years/decades from now, it's likely to be less accurate.

3. Is the intention to make an industry standardization? Can we, in the future, expect different manufacturers making "from the factory" evaluation and rating of their, for example, headphones? For the automotive industry this is of course a different story, as standardization in that industry is a must.
Good question!
Short answer: yes.
Slightly longer answer:
I've often used the automotive industry as an example because they are so standardized. It would be nice to shop for a vehicle that states top speed, (e)mpg, weight, 0-60, etc. AND audio quality, so you have some knowledge going into the test drive or shopping experience. Right now, car stereo evaluations are hard to judge.

In terms of headphones, much like the frequency response provides good insight and value into the audio playback performance, it would be wonderful to see audio quality scores associated with those.

However, all standardization is a democratic (and frankly can be very "political") process that can take years! So we decided to release MDAQS as is, and see how the market responds.

We obviously think there is value to the scores (3 dimensions + Overall Quality), and that they can help clarify certain industry claims.


4. There is a lot of resistance against quantified data (even if it has a foot in the subjective feedback) in this community. You can see traces of it also in this thread, full of hope that your work will prove that frequency response is meaningless. Unfortunately, many vendors, actively or passively, go along with those claims not to spread negative or unwanted (I use the term snake oil for it) information on their products. Do you think the community can be convinced without the honest efforts from the vendors? My guess is HeadFi is a small community in a much, much larger group of casual HP listeners around the world, so we don't really matter much. :p

Ha! The feedback from this community is appreciated! Even if it is an order of magnitude (or three orders) smaller than the general HP world.

And we're not shy about mentioning the merits of a frequency response. It's immensely useful to the Audio Product designers, and as consumers, we're getting better at interpreting them. So we're not here to discredit a frequency response measurement, but we are here to say that there is more to Audio Quality than what can be deduced from a frequency response plot. And more data, especially if presented clearly, can be helpful, and perhaps lead to more nuance and an informed perspective on the product audio performance.


Lastly, my opinion: I value the effort to bring this quantitative approach to our headphone community and hope that it will also honestly be supported by the vendors. I personally am not too convinced about a single-number evaluation rating, as I value, for example, timbre over everything else and would prefer to look at individual ratings, maybe in addition to the frequency response. But it is still a valuable contribution that supports us on the way to becoming more informed consumers.

Thanks!

Thanks @DarginMahkum.
I think we all wish we could hear things with our own ears before we buy, but since we can't, we have to rely on the reviews and measurements available to us. And MDAQS is another (cool) tool in our toolbelt. I see MDAQS as a great complement - or addition to - frequency response measurements, and hope the various industries/communities will be open to it.
 
May 24, 2023 at 9:10 AM Post #63 of 71
I find quite a lot of the comments (and questions/points asked in the video) hard to understand. For example, I can’t recall ever seeing specs for HPs, DACs, etc., which only had a FR graph. Don’t they usually also provide measurements of THD, impedance, dynamic range, self noise, sensitivity and/or other measurements? If so, why are so many saying ONLY freq response graphs?

The other thing I can’t get my head around is the apparent confusion between audio performance vs human perception. A DAC, amp or HPs for example obviously don’t have ear drums or anything beyond them, they do not have any human perception. If we’re measuring the performance of say a DAC, then surely we want to know the performance of that DAC, not the performance of a human listener’s perception, which is obviously an entirely different question?

Using the car analogy again; we typically see a quoted specification for its 0-100kph performance. Just like with a FR specification, it tells us quite a lot about the overall performance of the car but certainly not everything. Also just like a FR spec, it is an absolute objective measurement. A perceptual measurement of 0-100kph would be at least somewhat meaningless or just plain wrong for many consumers. Take for example a car with a 0-100kph time of 7 seconds. My mother is used to cars with a 0-100kph time of around 11 secs, which are never driven flat out, probably around 14secs is the most she’s used to, so she’d perceive a car doing it in 7secs as horrifically fast (ask me how I know, lol). Someone used to a fast car would perceive 7sec to just be normal and someone used to sports motorbikes or supercars would probably perceive it as painfully slow. When it comes to cars, pretty much all of us know and accept the difference between this objective absolute measurement and different people’s perception of it and that a measurement based on people’s perception would be vague and very possibly useless to us personally. If we want to know the 0-100kph performance of a car then that’s what we measure, not the brain/perception of someone driving it.

We intuitively know this about products, not only cars, but for some reason this doesn’t seem to be the case with most audiophiles and audio reproduction equipment. Why is this? My guess is years/decades of audiophile marketing deliberately trying to conflate equipment performance and human perception.
again....the goal of the audiophile is to recreate the illusion of the "original" recording/space....not to enhance or degrade it.
Why would that be “the goal of the audiophile” and how could it even be attempted?

The vast majority of music recordings do not have “an original recording space”, they were recorded in different spaces, recorded “dry” and/or generated at least partially electronically (therefore with no recording space) and then a variety of reverbs/delays/other effects applied to create a subjectively pleasing/enhanced sense of space. Even in those cases where there was a single “original recording/space” (most classical music recordings for example) it’s virtually always recorded with multiple mics in diverse locations and then mixed to enhance the original space (as opposed to the single location that would be experienced by an audience member/listener).

As there currently isn’t any way to satisfactorily undo any of this, isn’t your stated “goal of the audiophile” both impossible and undesirable? Isn’t this just another example of the result of years/decades of misleading marketing?
Science priests are not gonna like this...
Why would science “priests” not like an idea devised by a doctor of science and entirely based on the sciences of psychoacoustics and AI? My guess is the exact opposite, that they’ll like it infinitely more than the majority of audiophile ideas, which are mostly based on marketing that is contrary to the science/facts! This assumes it is actually based on science of course and not just marketing presented as science.

G
 
May 25, 2023 at 9:10 AM Post #64 of 71
again....the goal of the audiophile is to recreate the illusion of the "original" recording/space....not to enhance or degrade it.
Way back when, at least when I got into this hobby that was the target.
Why would that be “the goal of the audiophile” and how could it even be attempted?
Originally the ideal was a stereo recording without mixing or effects and then the holy grail for playback was a stereo system which then reproduced that recording as life like as possible. We've gotten pretty close.
The vast majority of music recordings do not have “an original recording space”, they were recorded in different spaces, recorded “dry” and/or generated at least partially electronically (therefore with no recording space) and then a variety of reverbs/delays/other effects applied to create a subjectively pleasing/enhanced sense of space. Even in those cases where there was a single “original recording/space” (most classical music recordings for example) it’s virtually always recorded with multiple mics in diverse locations and then mixed to enhance the original space (as opposed to the single location that would be experienced by an audience member/listener).........
Yes.. Times have changed and music/sound has been so manipulated that the art of creating music has indeed taken on another appearance altogether...and yes how does one playback that music accordingly, especially with headphones?
As there currently isn’t any way to satisfactorily undo any of this, isn’t your stated “goal of the audiophile” both impossible and undesirable? Isn’t this just another example of the result of years/decades of misleading marketing?
I myself was composing and producing "Electro-acoustic music", literally total sound creation and manipulation with no reference to any recording space for almost 40 years but my stereo setup still makes reference to simple acoustic stereo recordings as a starting point to be properly set-up...otherwise what is a reference except what subjectively sounds good and musically fulfilling to the individual?
So the question remains how do we objectively make measurements that can supply more aural information than just FR to better understand the psychoacoustics on the user of a given piece of equipment?

Also..99.9% of music lovers are not audiophiles...hopefully all audiophiles are music lovers, but sadly not always.
 
May 25, 2023 at 1:37 PM Post #65 of 71
Originally the ideal was a stereo recording without mixing or effects and then the holy grail for playback was a stereo system which then reproduced that recording as life like as possible. We've gotten pretty close.
But there was never a time when there was just a stereo recording without mixing or effects, so I still can’t see how that could have been “the ideal”. Sure, there were a few recordings done that way but an exceedingly tiny percentage. Multi-Mic’ing techniques for large classical music ensembles were developed in the 1950’s, as were multi-tracking and layering in pop/rock. By the mid 1960’s bands were multi-tracking, recording in drum booths, the live room, the stairwell and using plates, springs and analogue reverbs, there was no real acoustic space, just a mix of different acoustic spaces and acoustic/analogue effects.
Times have changed and music/sound has been so manipulated that the art of creating music has indeed taken on another appearance altogether...and yes how does one playback that music accordingly, especially with headphones?
Times did change, particularly in the late 1970’s with digital reverbs, then in the 1980’s with MIDI and samplers and then of course the computer sequencers and DAWS through to the 2000’s. So music has become far more manipulated in recent decades but it was already manipulated enough in the 1960’s to invalidate the concept of “the ideal” you described.
what is a reference except what subjectively sounds good and musically fulfilling to the individual?
Yes, that’s what it comes down to, although not exactly what sounds good to one individual but to several individuals, the engineers, the producer, musicians, etc.
… how does one playback that music accordingly, especially with headphones?
That’s a good question which science has been studying for many years. The answer is effectively a personalised HRTF + head tracking + good convolution reverb. Unfortunately, it’s not a practical solution for the average listener currently, although companies such as Apple are currently throwing a lot of resources at the issue.
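
For anyone wondering what the HRTF part amounts to in practice, here is a minimal sketch: the mono source is convolved with one impulse response per ear. The impulse responses below are made up for illustration (a real system would use measured, personalised ones and update them with head tracking), and the direct convolution loop is just the simplest way to show the idea.

```python
def convolve(signal, impulse_response):
    """Direct (time-domain) convolution of two lists of floats."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono source with a left-ear and a right-ear impulse response."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Made-up impulse responses for a source on the listener's left:
# the right ear hears it two samples later and at half the level.
HRIR_LEFT = [1.0, 0.0, 0.0]
HRIR_RIGHT = [0.0, 0.0, 0.5]

mono = [1.0, 0.5]
left, right = render_binaural(mono, HRIR_LEFT, HRIR_RIGHT)
```

The delay and level difference between the two outputs are the kind of cues that place the source outside the head. A convolution reverb would use the same operation with a room impulse response.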
So the question remains how do we objectively make measurements that can supply more aural information than just FR to better understand the psychoacoustics on the user of a given piece of equipment?
Again, it’s not “just FR” is it? Aren’t distortion, self noise and other measurements also usually published? And also again, if we want to know the performance of a piece of equipment don’t we want to measure the performance of that piece of equipment, rather than the psychoacoustics (perception) of a user of that equipment? EG. Isn’t the 0-100kph time of a car more useful than an average of how fast say my mother and a supercar driver feels/perceives it to be?

G
 
May 31, 2023 at 11:36 AM Post #66 of 71
Using the car analogy again; we typically see a quoted specification for its 0-100kph performance. Just like with a FR specification, it tells us quite a lot about the overall performance of the car but certainly not everything. Also just like a FR spec, it is an absolute objective measurement. A perceptual measurement of 0-100kph would be at least somewhat meaningless or just plain wrong for many consumers. Take for example a car with a 0-100kph time of 7 seconds. My mother is used to cars with a 0-100kph time of around 11 secs, which are never driven flat out, probably around 14secs is the most she’s used to, so she’d perceive a car doing it in 7secs as horrifically fast (ask me how I know, lol). Someone used to a fast car would perceive 7sec to just be normal and someone used to sports motorbikes or supercars would probably perceive it as painfully slow. When it comes to cars, pretty much all of us know and accept the difference between this objective absolute measurement and different people’s perception of it and that a measurement based on people’s perception would be vague and very possibly useless to us personally. If we want to know the 0-100kph performance of a car then that’s what we measure, not the brain/perception of someone driving it.

I don't think your analogy here is exactly spot on.
We're not trying to replace the 0-100kph measurement with a perceptual model.

We're trying to come up with a model that helps explain how the vehicle is to drive. The feeling you get behind the steering wheel.
0-100kph is one (rather important) parameter that tells us something about the vehicle performance. But it's not the only thing performance related, and certainly doesn't tell us everything about the feeling of driving the car.

Same goes for headphones. There are many objective measurements that audio engineers and headphone designers rely on (and sometimes publish for us to digest). You mention many of them. The question is what those objective measurements mean in terms of perceived quality. That can be difficult to extrapolate and balance. Especially for the general consumer.

Absolutely the objective measurements need to be stated, but we think it would be nice with a handy indicator telling us how the average listener would rate this product.
 
Jun 1, 2023 at 2:37 PM Post #67 of 71
We're trying to come up with a model that helps explain how the vehicle is to drive. The feeling you get behind the steering wheel.
0-100kph is one (rather important) parameter that tells us something about the vehicle performance. But it's not the only thing performance related, and certainly doesn't tell us everything about the feeling of driving the car.
Yes, I agree entirely. And another difference: many consumers would have a reasonable idea what a 0-100kph time of say 8 seconds would feel like, while relatively few would have any idea what a frequency response graph (plus noise, distortion and other specs) would sound like with headphones. Therefore:
Absolutely the objective measurements need to be stated, but we think it would be nice with a handy indicator telling us how the average listener would rate this product.
Again, yes, for the vast majority of consumers “it would be nice”. I’m not disagreeing with your concept but I do have concerns. There are a lot of parameters/variables involved, in addition to quite a lot of perception variability between individuals. So you’d need quite a decent sample size, and even then you would have an “indicator” that only applies to a percentage of consumers. Apart from the large body of work on HRTFs, the two main examples of objectively measuring an aural perception are probably Harman’s target curve and the ITU/ATSC/EBU loudness perception work (ITU-R BS.1770). The Harman curve took many years and revisions and is not applicable/entirely correct for a substantial percentage of consumers. Loudness perception/normalisation also took many years, and although it’s quite accurate for the vast majority of consumers, it is only one parameter!
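
To put a rough number on the sample-size concern, here is a minimal sketch of how the uncertainty of an average rating shrinks as the listening panel grows. The ratings are invented, and the 1.96 factor assumes a normal approximation that is crude for very small panels, so treat it as an illustration rather than a proper analysis.

```python
import math
import statistics


def mean_with_margin(ratings, z=1.96):
    """Mean rating plus an approximate 95% margin of error (normal approximation)."""
    mean = statistics.mean(ratings)
    margin = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, margin


# Invented 1-5 ratings of one product by a small panel.
small_panel = [2, 3, 4, 5, 3, 4]
# Same spread of opinions, ten times as many listeners.
large_panel = small_panel * 10

small_mean, small_margin = mean_with_margin(small_panel)
large_mean, large_margin = mean_with_margin(large_panel)
```

The mean is the same for both panels, but the margin of error only falls with the square root of the panel size, and that is for a single product and a single score. It does nothing about the spread between individuals, which is the second concern above.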

Up until 6 months ago or so, I would have said your goal is unachievable. It would simply take decades with so many parameters and tests/test subjects required to get useful results. Your use of current AI models is what I find intriguing though, it could in theory drastically decrease the years/decades that would otherwise make your concept impractical. Whether you can actually create a metric that is quite widely applicable for most/many consumers I have no idea, but I will be interested in what you come up with.

G
 
Jun 3, 2023 at 6:59 PM Post #68 of 71
I think the most refreshing thing about this is the reminder that, while there are many measurements that engineers, designers and listeners may rely on, the question remains what those measurements mean in terms of perceived quality. And it is a question. Who knows! It might even lead to a situation where we can once again talk about what we hear as well as what we measure!!
 
Jun 4, 2023 at 9:18 AM Post #69 of 71
the question remains what those measurements mean in terms of perceived quality. And it is a question. Who knows! It might even lead to a situation where we can once again talk about what we hear as well as what we measure!!
Don’t reviewers and most others here already “talk about what we [they] hear”? Unfortunately, that raises other questions though, for example what are we/they actually hearing vs what are we/they perceiving, as well as questions of personal preference. For example some seem to prefer a very wide soundstage, while others deliberately try to narrow the soundstage (with crossfeed), not to mention that two different people might perceive exactly the same soundstage differently. It all becomes very personal/individual when we talk about audio performance in terms of what we “hear” (hear/perceive/prefer) and it’s almost always going to be at least somewhat different to what we personally hear, perceive and prefer. A measurement on the other hand doesn’t present any of these issues, it’s the same whoever is listening to it.

G
 
Jun 4, 2023 at 12:32 PM Post #70 of 71
Don’t reviewers and most others here already “talk about what we [they] hear”? Unfortunately, that raises other questions though, for example what are we/they actually hearing vs what are we/they perceiving, as well as questions of personal preference. For example some seem to prefer a very wide soundstage, while others deliberately try to narrow the soundstage (with crossfeed), not to mention that two different people might perceive exactly the same soundstage differently. It all becomes very personal/individual when we talk about audio performance in terms of what we “hear” (hear/perceive/prefer) and it’s almost always going to be at least somewhat different to what we personally hear, perceive and prefer. A measurement on the other hand doesn’t present any of these issues, it’s the same whoever is listening to it.

G
I think this point system is not really for those that already know what they are looking for. We are a small minority (those that know what they are looking for and can objectively assess a particular device using various data) in another minority (HP hobbyists), which is in another minority (HP users), which is in a larger group of average music consumers. I guess the HP hobbyists are maybe just a mere 2-3% of total HP users. Even if a HP had these data on the package out of the factory, I wouldn't use it to make a decision for my next buy, as I know what I am looking for. For the rest, I think it might be a useful indication, so that they can make a more informed buy.
 
Jun 4, 2023 at 6:40 PM Post #71 of 71
Don’t reviewers and most others here already “talk about what we [they] hear”? Unfortunately, that raises other questions though, for example what are we/they actually hearing vs what are we/they perceiving, as well as questions of personal preference. For example some seem to prefer a very wide soundstage, while others deliberately try to narrow the soundstage (with crossfeed), not to mention that two different people might perceive exactly the same soundstage differently. It all becomes very personal/individual when we talk about audio performance in terms of what we “hear” (hear/perceive/prefer) and it’s almost always going to be at least somewhat different to what we personally hear, perceive and prefer. A measurement on the other hand doesn’t present any of these issues, it’s the same whoever is listening to it.

G
The point, I think, is to talk about what we measure and what we hear.
 