ANC Is More Complicated Than It Sounds: Advanced ANC Headphone Measurements
Jun 12, 2020 at 8:19 PM Post #31 of 77
Jacob and Jude,

Thanks for the video, that was surely fun and educational! @jude, maybe you could talk to Jacob about using loudness metrics when aligning two headphone measurements (I haven't forgotten our conversation in Tokyo some time back ;-p )

Here's an idea. Instead of making sarcastic comments, why not explain your position with reasoned argument?

I think Jacob did a fantastic job of conveying the reason for existence and the gist of the various psycho-acoustic metrics used in the present study. He did, for instance, mention that ANC overall noise reduction based on just a steady-state random (pink) noise excitation is far from sufficient to characterize perceived real-world performance from a customer standpoint.

At the end of the day, you're still relying on an FFT analyzer to process the data and output various metrics, but I totally appreciate the approach being used: we've got all these measurement processes existing for testing telecom equipment like cell phones, so why not use them for characterizing headphone ANC performance?

It totally makes sense to me that this provides a more useful set of metrics than basic overall passive/active attenuation for pink noise... (where basic FFT + time-averaging functionality suffices...).
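For anyone curious what that basic "FFT + time averaging" attenuation number looks like in practice, here's a minimal numpy sketch. The function name and framing parameters are my own illustration, not anything from the talk or from any test standard:

```python
import numpy as np

def overall_attenuation_db(outside, inside, fs, nfft=4096):
    """Broadband attenuation from time-averaged FFT power spectra.

    outside: mic signal of the reference noise field (no headphone)
    inside:  mic signal at the eardrum with the headphone/ANC engaged
    Both are 1-D arrays at sample rate fs. Hypothetical helper for
    illustration only -- real test rigs do a lot more than this.
    """
    def avg_power(x):
        # Average the squared magnitude over successive Hann-windowed frames
        win = np.hanning(nfft)
        n_frames = len(x) // nfft
        frames = x[:n_frames * nfft].reshape(n_frames, nfft) * win
        return np.mean(np.abs(np.fft.rfft(frames, axis=1)) ** 2, axis=0)

    p_out = avg_power(outside)
    p_in = avg_power(inside)
    # Overall (unweighted) attenuation in dB across the whole band
    return 10 * np.log10(np.sum(p_out) / np.sum(p_in))
```

Note this collapses everything to one unweighted number, which is exactly the limitation the psychoacoustic metrics in the talk try to address.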

That‘s my read on it fwiw,

cheers,
arnaud
 
Jun 12, 2020 at 8:20 PM Post #32 of 77
can I try?

the video going through my brain didn't come out with the message that FFT analysis was wrong. just like the soup anecdote didn't try to say that temperature measurements are bad or wrong. I'm sure a few audiophiles will love to hear it that way, but it's not what they are saying.
@Mr.Jacob only argued that an FFT analysis wasn't always enough to predict perceived noises and intelligibility, or whatever subjective aspect of perception a given approach was focused on modeling. which isn't much of a surprise given all the peculiarities/flaws of human perception or how complicated human auditory masking really is once we get beyond a simple two-tone signal.
in the end he's only making a case that they needed to rely more on psychoacoustics. nothing audacious. so I think @csglinux might have obsessed over FFT from hearing it mentioned so many times. sorry buddy but I'm not with you on that one. I kind of get what you're trying to say, plus one anecdote doesn't show how well FFT analysis could(or couldn't) be used to correlate with perception at a statistical level. but he had a specific question about the discrepancies between subjective impressions and measurements of attenuation, so it's natural that he would focus on those and present a probably rarer case where the impressions of attenuation and the measured attenuation completely switch sides.



ps: I remember @Mr.Jacob from a RMAF video 4 or 5 years ago when I was just starting to get interested in headphone/IEM measurements. it was just the right amount of popularization for me at the time and I loved it.
 
Jun 13, 2020 at 2:21 AM Post #33 of 77
Amazing talk! Thanks so much for uploading it.

Things I don't quite get - how does the surround system that reproduces real-world environmental noises record the original sound? It sounds like the ideal test bench would be full sound field synthesis - which is incredibly hard to record and reproduce. It's not binaural, but it's capturing the original sound field and then reproducing it. I'm skeptical that a simple array of floor loudspeakers, in a single plane, in an anechoic or quasi-anechoic room would be able to reproduce the full X, Y, Z coordinates of sound. I also didn't see a subwoofer there - how are the low frequencies reproduced?

I can't see any surround system, say a theater 7.1.4 system, being able to reproduce those sounds. I myself (for my own testing) do a similar thing, playing background noise on a 7.1.4 theater setup that has an f3 of 6 Hz. It sounds convincing - but not when you A/B it against a binaural recording you make yourself. That contains all the localisation information - but of course that's easy for us, with our brain picking up two mic inputs from our ears.

Of course it was just a brief video not one on the background noise reproduction system - so more details would be very interesting!

What I like about this approach is that there's an objective criterion for weighting a white noise attenuation curve. Some frequencies are more important than others for the perception of noise. Over-ears like the WHXM3 and Bose 700 cancel far more 20-30 Hz noise than something like the AirPods Pro or WI-1000X, but on my ears the latter - perceptually - cancel more noise. Things like sex and age matter here too - women are more sensitive to certain frequencies than men. It's a complex psycho-acoustic model if you factor in these additional variables.

Fit is important here too - one thing that's rarely taken into account. Sex matters as well - women typically have longer hair, so seals are worse. When I grew my hair long it ruined the seal for my over-ears. A simple way I test for this is binaural mics in my own ears with white noise playing on my 7.1.4 setup - I can then objectively test the attenuation of my over-ears on my own head.

But we are four generations into ANC headphones and it's not actually improving much anymore. A few dB here and there is easily forgotten about. I think we're at the state of the art now and waiting for new silicon to increase the higher-frequency attenuation. Apple could change the game here - they have the speed in their chips to push it to higher ranges. The APPs actually ANC out higher frequencies with foam ear tips than with silicone ones. See here for proof: (turn on English subs) - so the upcoming over-ears might really change things. The YouTuber tested passive attenuation with silicone and foam to discount that as the source of the improvement. With foam tips the AirPods Pro are by far my best ANC headphone.
 
Jun 13, 2020 at 3:44 AM Post #34 of 77
So at 21:40 in the video Jacob talks about 3QUEST, which is a measurement based on subjective data. This is one example of the use of psychology and psychoacoustic evaluations to derive data.
I'm not denying any of that, though I think you're slightly stretching the definition of "derive".
What's being done here are measurements (using microphones at the dummy's eardrums), and then subjective post-processing applied to that data to come up with a metric. MOS metrics are somebody's opinion, not fundamental theory.
The impulse response and the FFT are powerful things, they tell us many things about a particular event.

But they do not tell us anything related to any non-linear events presented to the human brain.
Here is an example of what an FFT can't tell you. The FFT tells you that there is comb filtering, but it does not tell you why the sound image is so wide, or where exactly the sound image is. sample
Listen to stereo speakers, and on headphones.
@pfzar - spatial perception simply requires two FFTs, one for each ear, with the phase shifts representing the timing differences our brains need to process spatial perception. This just boils down to what we do with the data - not because there's anything intrinsically wrong with, or missing from, the data itself.
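As a concrete illustration of that claim, here's a small numpy sketch (my own, purely illustrative) showing that the timing cue survives the transform: the cross-spectrum of the two ear signals encodes the interaural time difference in its phase, and inverting it recovers the lag directly.

```python
import numpy as np

def interaural_time_difference(left, right, fs):
    """Estimate the interaural time difference (ITD) from two ear signals.

    Illustrates that spatial cues survive the FFT: the phase of the
    cross-spectrum between the two ears encodes the arrival-time
    difference. A sketch only; real localization models also use level
    differences and spectral (HRTF) cues.
    """
    n = len(left) + len(right)        # zero-pad to avoid circular wrap-around
    L = np.fft.rfft(left, n)
    R = np.fft.rfft(right, n)
    # Cross-correlation computed via the cross-spectrum R * conj(L):
    # its phase slope *is* the time shift between the two ears.
    xcorr = np.fft.irfft(R * np.conj(L), n)
    lag = int(np.argmax(np.abs(xcorr)))
    if lag > n // 2:
        lag -= n                      # map large indices to negative lags
    return lag / fs                   # positive => sound reaches left ear first
```

This is the "what we do with the data" point: nothing is lost, but somebody has to actually extract and interpret the cue.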

Castle, you are right. I heard FFT getting bashed a lot. Jacob says an FFT analyzer is inherently linear. Sure, but that's not a flaw - that's just a correct, uncolored representation of reality. There's nothing to stop anybody wearing different colored sunglasses, should they choose to do so...

What @arnaud is saying has to be true:
At the end of the day, you‘re still relying on an FFT analyzer to process the data and output various metrics...
I strongly suspect so. Still using an FFT, but doing something subjective to its data to boil an ANC headphone down to one number (!=OASPL) that you can easily put on a scale with a bar graph.

It would be interesting to know if what's being done to obtain this metric is anything more than just a weighting function. Jacob mentions transient effects such as an A/C unit switching on/off, but I would think the best ANC candidate for on or off transient perception would simply be the one that had the largest SPL reduction in the A/C frequencies that the human ear is most sensitive to. Minimizing the perception of that particular sound would logically seem to minimize the perception of its starting and stopping. In other words, even for such transients, it doesn't seem we'd need anything more than a weighting function - and that would be most easily applied in the frequency domain.
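A toy numpy example of what such a weighting function could look like - using plain IEC 61672 A-weighting as a stand-in for whatever perceptual weighting the actual metric uses, so treat the band list and the choice of curve as illustrative assumptions:

```python
import numpy as np

def a_weight_db(f):
    """IEC 61672 A-weighting curve in dB at frequency f (Hz)."""
    f2 = np.asarray(f, dtype=float) ** 2
    ra = (12194.0**2 * f2**2) / (
        (f2 + 20.6**2)
        * np.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
        * (f2 + 12194.0**2)
    )
    return 20 * np.log10(ra) + 2.0

def weighted_attenuation_db(freqs, attenuation_db):
    """Collapse an attenuation curve to one number via a weighting function.

    Weight each band's residual (post-ANC) noise power by how sensitive
    the ear is there, then express the weighted residual as a single dB
    figure. Sketch only - not the 3QUEST/SII procedure.
    """
    w = 10 ** (a_weight_db(freqs) / 10)                   # perceptual weights
    residual = 10 ** (-np.asarray(attenuation_db) / 10)   # power left after ANC
    return -10 * np.log10(np.sum(w * residual) / np.sum(w))
```

With a weighting like this, a headphone that only cancels deep bass scores worse than one that cancels in the midrange where the ear is most sensitive - which is exactly the WHXM3-vs-AirPods-Pro effect described earlier in the thread.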
 
Jun 13, 2020 at 3:57 AM Post #35 of 77
It would be interesting to know if what's being done to obtain this metric is anything more than just a weighting function. Jacob mentions transient effects such as an A/C unit switching on/off, but I would think the best ANC candidate for on or off transient perception would simply be the one that had the largest SPL reduction in the A/C frequencies that the human ear is most sensitive to. Minimizing the perception of that particular sound would logically seem to minimize the perception of its starting and stopping. In other words, even for such transients, it doesn't seem we'd need anything more than a weighting function - and that would be most easily applied in the frequency domain.
It's probably worth reading about the standards, and I hope I am not wrong, but I think these metrics (SIL, STI, SII) do take into account both the amount and the time-varying characteristics of the background noise + target signal.

@Jacob did mention masking effects and how our brain can adjust for them depending on how transitory the noise is - hence why a steady-state pink noise test is not useless, but not far from it. Besides the perception aspect, you also need to recall we're measuring a fundamentally non-linear system (adaptive noise control adjusts itself on the fly with feedback/feedforward techniques), so I suppose this also plays a role in perceived performance for non-stationary backgrounds like those tested here...

arnaud
 
Jun 13, 2020 at 5:21 AM Post #36 of 77
Things I don't quite get - how does the surround system that reproduces real world environmental noises record the original sound? [...] I'm skeptical that a simple array of floor loudspeakers, in a single plane, in an anechoic or quasi-anechoic room would be able to reproduce the full X, Y, Z coordinates of sound. I also didn't see a subwoofer there - how are the low frequencies reproduced?

I noticed differences in elevation both on the speakers and on the apparatus we see him put on the dummy head. So my very wild guess was that this "crown" had mics at the bottom, and that they used those to capture a sound field, not at the ear, but in the vicinity of a head where the HRTF hasn't yet altered the sound too much. And then later on they'd use that again to calibrate the speakers so that the signal in the vicinity of the head would be similar?
I have no idea if that's right or not. It came to me from seeing one picture, so it's probably not^_^.
 
Jun 13, 2020 at 9:31 AM Post #37 of 77
Here is what it doesn't tell me. It doesn't tell me where the sound is perceived. It doesn't tell me why one HRTF is better than another. It doesn't tell me why people get localization mismatches. Etc., etc.

We can record the HRTF via two microphones and an FFT, which does give me a simple answer of time of arrival and level difference for the ipsilateral and contralateral ears, given a single event. But we as humans don't listen to single events. We are not an FFT analyzer. And the point is just that.



spatial perception simply requires two FFTs, one for each ear, with the phase shifts representing the timing differences our brains need to process spatial perception. This just boils down to what we do with the data - not because there's anything intrinsically wrong with, or missing from, the data itself.
 
Jun 13, 2020 at 9:53 AM Post #38 of 77
The benefit of the information given by Jude and Mr. Jacob is to make more data available to form a better understanding.

I am surprised by comments that seek - and in some cases demand - an absolute judgement; it is an absurd demand for a relative conclusion.

The least measurable component - and the place where all the data is assembled - is our individual consciousness. For most here, music is the common destination for our audio equipment, and yet music itself is subjective.

Music theory might explain differences, but it will never show a definitive value or an absolute.
 
Jun 13, 2020 at 10:45 AM Post #39 of 77
Hey @csglinux, thanks for watching!
I'm happy you had the time to do so, and that it sparked some commentary.

I feel you might be reading too much into my perspective on fft analyzers. Others in the thread (pfzar, arnaud, castleofargh) have already pretty much nailed it, but I'd like to answer you directly. I have no problem with an fft analyzer or using it to gather data. However, where I feel like the fft analysis is "wrong" (and even that is way too strong a word - "potentially misleading" is better), is when the engineer/designer/analyst/consumer(?) attributes too much meaning to the fft analysis as it relates to human perception.

And that's where we feel psychoacoustics (and psychology) can play a role in helping to explain how some things are perceived better (or worse), despite the results given using an fft analysis.

As you latch onto, fft analyzers are (hopefully) very linear, and allow us to capture the data in an uncolored manner. That's great! And super important!
But how do we make sense of that data? The human ear behaves very differently - both spectrally and temporally - from any analyzer.

SII and Zwicker Loudness are both older metrics - but they are internationally standardized and common, so we wanted to start there and see if they could shed a little light on the issue. Both of those metrics operate only in the frequency domain. They (like a traditional fft) will take a time sample x seconds long and convert it to the frequency domain, where they then apply well-documented frequency-based effects found in the human hearing system (weighting curves, masking effects, etc). They are by no means a full replication of the human hearing system.
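Purely as an illustration of the kind of frequency-side effect being described - not the actual standardized spreading functions of SII or Zwicker loudness - here is a toy sketch of upward masking across critical bands (the slope value is an arbitrary assumption):

```python
import numpy as np

def masked_band_levels(levels_db, upward_slope_db=10.0):
    """Toy frequency-domain masking: loud bands mask higher bands.

    levels_db: per-critical-band levels in dB, ordered low to high.
    A band contributes nothing if it sits below the masking threshold
    spread upward from louder lower bands, decaying by `upward_slope_db`
    dB per band. Illustrative only - real models use asymmetric,
    level-dependent spreading functions.
    """
    levels = np.asarray(levels_db, dtype=float)
    masked = levels.copy()
    threshold = -np.inf
    for i, lvl in enumerate(levels):
        if lvl < threshold:
            masked[i] = -np.inf       # fully masked: inaudible band
        # threshold for the next band: the strongest masker so far, decayed
        threshold = max(threshold, lvl) - upward_slope_db
    return masked
```

The point of even this crude version: two noise residuals with identical total energy can differ audibly, because energy hiding under a masker doesn't count.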

Moore-Glasberg is a relatively new hearing model for time-varying sounds, and the Relative Approach (which we used) is similarly designed around emulating the human hearing system in both the frequency and time domains. That gets us one step further, and both are tools that engineers can use to better gauge the effectiveness of their ANC algorithms or headphone designs. The 3QUEST MOS metric then uses RA to help score/rate the signal quality.

And of course, we are still miles away from perfectly representing the human ear's function, but in this particular case of evaluating ANC headphones, we feel there's really good correlation between the subjective reviews/impressions and the psychoacoustic data.

I hope that helps clarify it.
 
Jun 13, 2020 at 11:14 AM Post #40 of 77
Things I don't quite get - how does the surround system that reproduces real world environmental noises record the original sound? [...] I'm skeptical that a simple array of floor loudspeakers, in a single plane, in an anechoic or quasi-anechoic room would be able to reproduce the full X, Y, Z coordinates of sound. I also didn't see a subwoofer there - how are the low frequencies reproduced?


Hey @johnn29 - thanks! And great discussion topic!

We didn't dive too much in to the details of the background noise reproduction system in the video, but I'm happy to share more here.
@castleofargh is spot on. For you guys (and others who are interested), the system complies with ETSI TS 103 224: https://www.etsi.org/deliver/etsi_ts/103200_103299/103224/01.01.01_60/ts_103224v010101p.pdf (ETSI standards are free to download - so go nuts! :metal:)

That standard does get into the math of matrix inversion etc., so I won't do that here. But the short story is:
- we use the 8mic array on the manikin to record noise events and environments
- we then go back to the lab and use the same 8mic array and manikin to equalize the playback
- the equalization process identifies the impulse response from each loudspeaker to each mic (8x8 matrix of IR)
- apply fancy math to figure out which delays and FIR/IIR filters need to be applied to each speaker channel to obtain the same frequency and phase output at each of the mics in the lab (which means room effects and speaker/amp/cabling issues are accounted for)
- once you hit "play" on any of your 8ch noise sources, they are then "reassembled" at the mic locations.
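The "fancy math" step above can be sketched as a regularized inversion of the speaker-to-mic transfer matrix, done independently per frequency bin. This is my own illustrative reading of the idea, not the actual ETSI TS 103 224 procedure (which also handles delays, causality, and practical filter design):

```python
import numpy as np

def mpns_eq_filters(H, target, reg=1e-6):
    """Per-frequency-bin inversion of the speaker-to-mic transfer matrix.

    H:      (n_bins, n_mics, n_speakers) measured transfer functions
            (FFT of the impulse responses) from each loudspeaker to each
            microphone of the array.
    target: (n_bins, n_mics) desired spectra at the mics (the original
            8-channel field recording).
    Returns (n_bins, n_speakers) drive spectra. Regularized least-squares
    sketch only; the function name is hypothetical.
    """
    n_bins, n_mics, n_spk = H.shape
    drives = np.empty((n_bins, n_spk), dtype=complex)
    eye = np.eye(n_spk)
    for k in range(n_bins):
        Hk = H[k]
        # Tikhonov-regularized pseudo-inverse: (H^H H + reg*I)^-1 H^H t,
        # so ill-conditioned bins don't demand absurd speaker drive levels
        A = Hk.conj().T @ Hk + reg * eye
        drives[k] = np.linalg.solve(A, Hk.conj().T @ target[k])
    return drives
```

Once the drive spectra exist, playing them through the same room/speakers reassembles the recorded field at the mic positions - which is why room and amp/cabling effects come out in the wash.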

A little bit of background:
the original 8mic/8speaker MPNS (Multi-Point Noise Simulation) system was designed for mobile phone testing and used a mic array that was slightly right-side biased - for better or for worse, that's how mobile phones get tested: on the right side of the manikin head.
[Attached image: MSA_I.jpg]

That works well for mobile phones, because their mics are all located along the mobile phone plane.
The standard requires that the spectrum is equalized from 50 Hz to 20 kHz. Decent two-way bookshelf monitors (Klipsch R-51M) can get there, with the EQ portion then compensating a bit. Magnitude and phase of the complex coherence have to exceed 0.9 and stay within +-10 deg in a narrower frequency range - but that's enough to "trick" a mobile phone.

The goal of the standard was to create a method and system that doesn't require a million-dollar anechoic room and a 128-loudspeaker ambisonic system. And they succeeded. As pointed out, there is some vertical separation of both the mics and the loudspeakers, which helps the system more easily recreate some of the vertical noises. But, as also mentioned in the video, this wasn't designed for our subjective enjoyment, nor is it fully 3D-capable. It's fun! But there are limitations! :relaxed:

Since the original ETSI TS 103 224 was written, there's been an update that allows for more flexible MPNS systems. One that we use for headphone testing does use a different 8mic array, which focuses the eq spot closely around the ears (close to where the ANC mics are located in the headphone cups).
[Attached image: HMS+MSAII_sm.jpg]

Now we also typically use an 8.1 speaker system for better low-frequency playback capabilities. And with the more flexible MPNS system, I have seen fantastic coherence numbers out to 20 kHz. It's really impressive.
(for reference, the MPNS system also allows you to simply do binaural recordings and play them back, which we can also do using our 8+1 speaker setup. That would give you an even better sensation if you are right in the sweet spot. :beyersmile:)

Thanks again for the interesting question. I hope my response was on point. But I'm well aware I might have triggered a series of follow-up questions... :wink:!
 
Jun 13, 2020 at 11:59 AM Post #41 of 77
Here is what it doesn't tell me. It doesn't tell me where the sound is perceived. It doesn't tell me why one HRTF is better than another. It doesn't tell me why people get localization mismatches. Etc., etc.

No, but it would tell you all of the above if you only knew how to interpret the data, because you haven't lost anything in switching to the frequency domain. Your argument ("it" doesn't tell me...) would apply equally to a time-domain signal. So your issue is measurement in general, not specifically FFT.

However, where I feel like the fft analysis is "wrong" (and even that is way too strong a word - "potentially misleading" is better)

Not just better. A critical distinction.
Imagine we distributed some heavy industrial equipment with a user manual translated into Russian. That might be potentially misleading. But the machinery isn't defective - it's just not being properly used and understood :wink:

Other than that though - nice video. Interesting work.
 
Jun 13, 2020 at 12:49 PM Post #42 of 77
Seriously, do you know how?
If it's so easy, please explain how.
No, but it would tell you all of the above if you only knew how to interpret the data, because you haven't lost anything in switching to the frequency domain. Your argument ("it" doesn't tell me...) would apply equally to a time-domain signal. So your issue is measurement in general, not specifically FFT.
 
Jun 13, 2020 at 12:57 PM Post #43 of 77
Seriously, do you know how?
If it's so easy, please explain how.
I think you're smart enough to know full well that's not what I'm suggesting. I'm simply saying that switching to the frequency domain puts you in no worse a situation than you were in the time domain. The problem is our limited understanding of the world, not an intrinsic flaw or lack of information in the data.
 
Jun 13, 2020 at 1:03 PM Post #44 of 77
Precisely. But you do come off as if you know everything.

no implied sarcasm.
 
