Recording Impulse Responses for Speaker Virtualization
Feb 4, 2024 at 11:09 AM Post #1,801 of 1,816
I don't believe that THD is very meaningful for subjective experience (until the values explode), but I suspect some simple testing errors if someone came to the conclusion that IEMs at large have lower distortion than full-size headphones. It's probable that the measurements got drowned in ambient noise and only showed that the IEMs have better isolation.
I've had some single dynamic drivers that measured quite well, but I would still argue that most IEMs have rather poor distortion figures, more so if we consider balanced armatures.
I'd say for high-end headphones, THD is more a matter of "measurebating". I mean, who wouldn't want to see a multitone measurement that looks as clean as this?

2024-01-31 - Meze Elite hybrid Rs V3_1 2 - 1_24 octave pink 94 dBA 4M FFT.jpg

Figure 1: Meze Elite right driver with personal "V3_1 PEQ". Balanced output. 1/24 octave pink spectrum multitone, approximately 94 dBA (simulating the opening of Mahler's Symphony No. 5).

2024-01-31 - Arya Stealth OS Ls - 1_24 octave pink 94 dBA 4M FFT.jpg

Figure 2: Arya Stealth right driver. Balanced output. 1/24 octave pink spectrum multitone, approximately 94 dBA. The HE1000se is no better here.
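For anyone wanting to replicate this kind of test, here's a minimal Python sketch of generating a 1/24-octave-spaced multitone with a pink spectrum envelope (my assumptions: a 48 kHz sample rate and tone levels falling 3 dB/octave; REW's own multitone generator may weight things differently):

import numpy as np

fs, dur = 48000, 4.0                      # assumed sample rate and length
t = np.arange(int(fs * dur)) / fs
rng = np.random.default_rng(0)

sig = np.zeros_like(t)
f = 20.0
while f < 20000:
    amp = 1.0 / np.sqrt(f)                # tone levels fall ~3 dB/octave (pink envelope)
    phase = rng.uniform(0, 2 * np.pi)     # random phases keep the crest factor sane
    sig += amp * np.sin(2 * np.pi * f * t + phase)
    f *= 2 ** (1 / 24)                    # 1/24-octave spacing

sig /= np.abs(sig).max()                  # normalize to full scale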

As for IEMs, the limitation I find is that you can't measure your ears' actual FR for a given IEM (good luck fitting a probe mic), nor have the mic in the same position for the IEM as when taking the in-ear speaker response measurement.

I think I had read that IEMs can give lower THD for the price, e.g. https://www.audiosciencereview.com/forum/index.php?threads/7hz-x-crinacle-zero-2-iem-review.50534/. As for the above, a curiosity I recently came across is how a headphone EQed to the same target, with very similar single-tone harmonic distortion above, say, 1 kHz compared to my EQed Meze Elite, can still have worse multitone performance.

2024-02-04 - ATH-M50xBT Rs - 1_24 octave pink 94 dBA 4M FFT.jpg

Figure 3: Audio-Technica ATH-M50xBT matched to personal "V3_1 PEQ". Unbalanced output. 1/24 octave pink spectrum multitone, approximately 94 dBA.
 
Feb 4, 2024 at 7:48 PM Post #1,802 of 1,816
As for IEMs, the limitation I find is that you can't measure your ears' actual FR for a given IEM (good luck fitting a probe mic), nor have the mic in the same position for the IEM as when taking the in-ear speaker response measurement.
Yes, that's right. However, if you apply a personal calibration for the right tip and the right response for the ear microphone's position, that problem is solved.
Using the response of an Etymotic measured on the 5128 as the reference, I have successfully converted all the IEMs I own into personal PEQs.

1707093265742.png


For example, this measurement is of the SeeAudio Yume Ultra (based on the 5128).
And if you correct this to my FL/FR (to be precise, the response from a deeply inserted ear microphone), you get this response.
In other words, if you set the volume, put a microphone deep in your ear, and measure that IEM, that's the response you get.
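In code terms, here is how I understand that procedure; a rough Python sketch (the file names and CSV format are made up):

import numpy as np

def load_fr(path, f_axis):
    # Load a "freq,dB" CSV and resample onto a common log-frequency axis.
    f, db = np.loadtxt(path, delimiter=",", unpack=True)
    return np.interp(np.log(f_axis), np.log(f), db)

f_axis  = np.geomspace(20, 20000, 256)
ref_rig = load_fr("reference_iem_on_5128.csv", f_axis)   # e.g. the Etymotic
ref_ear = load_fr("reference_iem_in_my_ear.csv", f_axis)
cal     = ref_ear - ref_rig        # personal calibration curve, in dB

new_rig = load_fr("yume_ultra_on_5128.csv", f_axis)
new_est = new_rig + cal            # estimated response at my own ear, ready for PEQ fitting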

I don't believe that THD is very meaningful for subjective experience (until the values explode), but I suspect some simple testing errors if someone came to the conclusion that IEMs at large have lower distortion than full-size headphones. It's probable that the measurements got drowned in ambient noise and only showed that the IEMs have better isolation.
I've had some single dynamic drivers that measured quite well, but I would still argue that most IEMs have rather poor distortion figures, more so if we consider balanced armatures.

I also don't think THD has much influence on the subjective listening experience. I think everyone misunderstands what I said. (Is it because I use a translator?)
I don't think of headphones and IEMs as individual objects. To me, headphones and IEMs are just playback devices whose role is to play my impulse files (thank you for the files from jakk's Impulcifer).
In this sense, I usually listen through a Topping L70 on high gain at quite high volume (with any IEM or any headphone).
But only some IEMs could fully reproduce my preferred volume, my preferred in-room targets, and the responses I recorded and generated.
It doesn't affect my audiovisual experience much, but IEMs/headphones with poor qualities have dissatisfied me,
because they didn't properly do the job of the reproduction device that I wanted and demanded.
So I look at this as one of the audition test items for a device. It's not about finding a better receiver; it's a reference for excluding receivers that might dissatisfy me.

Yes, of course there are many headphones, not just IEMs, that are excellent at playback. But the most important thing with BRIRs is that the feeling of something covering your ears hurts your immersion. (This is subjective; don't get me wrong.)
So, considering many things, some people's choices, mine included, had to flow toward IEMs.

I've shared and recommended my brief experience with IEMs. Don't get me wrong, everyone: headphones are great too.
 
Feb 4, 2024 at 8:12 PM Post #1,803 of 1,816
When worn correctly, playing the sine sweep while touching the IEM still gives uniform results, but finding the right IEM is another matter.

KakaoTalk_20240123_180009654.jpg

KakaoTalk_20240123_180009654_01.jpg

KakaoTalk_20240123_180009654_02.jpg

KakaoTalk_20240123_180009654_03.jpg


KakaoTalk_20240127_172226244.jpg


Some IEMs aren't pictured, but honestly, I'm not happy with every IEM either.
Ironically, the only IEM that satisfied me with sound reproduction that doesn't collapse even at extremely loud volumes was the cheap 7Hz ZERO:2. (The Zero RED was fine, but it didn't fit me properly.)
Especially in the case of Apple's EarPods, there are some unstable factors, because the response really varies a lot depending on ear canal size and fit.
But nonetheless, I'm getting a more uniform listening experience than with the HD800S. I'm also thinking about custom IEMs, but I'll take more time to consider that.
 
Mar 5, 2024 at 2:44 AM Post #1,805 of 1,816
Someone here should start making open ear canal microphones and selling them, using as small a capsule as possible with some kind of fitting or clamp to keep the capsule in place at the center of the ear canal opening. Currently the biggest problem with Impulcifer is the headphone compensation, because closed ear canal microphones prevent the ear drum from loading the headphones, and therefore the headphones won't have the same frequency response as when worn normally.
(1) I had been concerned about this when implementing the free-field EQ for my Earfish HRTF. Assuming consistent in-ear microphone placement, one could probably match the frequency response reaching the microphones through some headphones quite accurately with that received from the speakers. My concern was that upon removing the in-ear mics, the transfer function of the section between the microphones' position and the ear drum would differ between the speaker measurements and those from the headphones, or what I suppose is their rough equivalent, the 90-degree-incidence HRTF. Are open canal or at-eardrum probe microphone measurements the only way to accurately resolve this compensation discrepancy? Ideally, I would get by with just one pair of blocked canal microphones, as those facilitate playing louder test signals for my in-ear headphone distortion measurement use case (where getting clone couplers for a dummy head wouldn't be economical or useful for my circumstances, and where getting measurements for my own ears is of foremost importance for EQ).

(2) I have been impressed with XingYu's results, whereby their HRTFs effectively look pretty close to mine, but with more ear gain thanks to the deeper mic insertion. Does deeper mic insertion partly mitigate the headphone compensation discrepancy issue?

(3) Does Impulcifer (provided that the original HRIRs were measured with proper sample synchronization and hence accurate phase response), when compensated for one's headphones with minimum-phase EQ, reproduce the original HRIRs' phase responses (within the limitations of headphone phase responses)? I had mentioned in my Earfish findings that plugins such as SPARTA AmbiBIN and SPARTA Binauraliser NF differ in their phase response implementations, such that the frequency response at a given ear when playing in-phase sounds from both virtual channels simultaneously differs, possibly causing perceived tonal colourations between different HRTF plugins and versus the speaker reference. Would Impulcifer be impervious to such phase error colourations? E.g., measuring a sine sweep at one ear for both virtual channels playing in phase would yield a very similar magnitude response as when measuring the same with two speakers. This spring, I have been planning to try taking HRTF measurements at around 16 m distance to see if this enables a more convincing projection of the imaging of symphonic music. With Earfish, I had always felt that the imaging, though coherent, was limited to being cast from the wall in front of me due to the original 1.5 m measurement distance.

(4) On the other hand, for the primary purpose of low-noise on-head headphone distortion measurements, other than cost and finding one with a small enough capsule and cable diameter, what are the concerns with mounting ready-made XLR lavalier/lapel microphones within DIY ear impressions to improve comfort, noise isolation, and consistency of fit? So far, it seems easier to find lavalier mics that "directly" accept 48 V phantom power so that I don't have to deal with RØDE's adapters; per https://www.audiosciencereview.com/...ones-with-motu-m2-and-rew.49384/#post-1783062 (post #6), I found that the VXLR Pro has lower midrange noise but greater third-order harmonic distortion in the bass to lower midrange, while the VXLR+ is better for said distortion but has more lower midrange noise. Worst case, those XLR lavaliers basically have those kinds of adapters built in and will encounter similar distortion or noise problems.
 
Mar 5, 2024 at 8:50 AM Post #1,807 of 1,816
Whatever method I tried, I always ended up with what the Realiser A16 calls ManLoud. It's been discussed here at one point. The principle is that you EQ manually to have near-equal perceived loudness for a bunch of frequencies, then the standard equal loudness contour (inverted) is used to retune the entire thing and compensate for our natural changes in sensitivity. Hopefully, you then perceive neutral.
This is the best option IMO, with the huge caveat that you still need good hearing; otherwise your equal loudness contour might not look anything like the average graphs and the result could be quite horrible.
For one reason or another, I suck badly at that exercise at normal to loud listening levels, but I get great results doing the EQ at a barely audible level. Not sure why. Obviously, if you do that, you also need a compensation for that kind of listening level, not the equal loudness graph for 80 dB SPL.

I explained this in slightly more detail here: https://www.head-fi.org/threads/neutralizer-freqency-loudness.965205/#post-17183247
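If it helps, here's a rough Python sketch of the arithmetic as I understand it (not Smyth's actual algorithm, and the contour numbers below are placeholders, not real ISO 226 data):

import numpy as np

freqs       = np.array([  31,   63,  125,  250,  500, 1000, 2000, 4000, 8000])
# dB you had to add per band so each tone matched the loudness of 1 kHz:
user_boost  = np.array([22.0, 15.0,  9.0,  5.0,  2.0,  0.0, -1.0, -4.0,  6.0])
# standard equal-loudness contour re 1 kHz at the same level (placeholders):
std_contour = np.array([20.0, 14.0,  8.0,  4.0,  2.0,  0.0, -2.0, -5.0,  8.0])

# If your hearing matched the standard contour and the system were flat,
# user_boost would equal std_contour; the leftover is system error to fix.
correction = user_boost - std_contour
for f, c in zip(freqs, correction):
    print(f"{f:>5} Hz: {c:+.1f} dB")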


The issue with trying to compensate for the ear canal objectively is that we lack data. How long is your own ear canal? Is there no impact at all when the mic closes it, or is there some gain at some frequencies for a shorter canal/tube that also needs to be compensated in the final tuning?


About far-away measurements: I discussed this recently somewhere else. I redid measurements coming back home after 6 months (not really; I came back briefly for Xmas), thinking mine from near-field speakers at near-field distance were not the best idea for spatial simulation. I measured in someone's big living room, and it sounded great, there! Back at my desk in my much smaller bedroom, with the wall not half a meter behind the computer screen, it sounded like way too much reverb, and I ended up going back to my intimate PRIRs. Again, it's just me, but I struggle with big differences between my listening room and the one recorded.
 
Mar 5, 2024 at 10:54 AM Post #1,808 of 1,816
Whatever method I tried, I always ended up with what the Realiser A16 calls ManLoud. It's been discussed here at one point. The principle is that you EQ manually to have near-equal perceived loudness for a bunch of frequencies, then the standard equal loudness contour (inverted) is used to retune the entire thing and compensate for our natural changes in sensitivity. Hopefully, you then perceive neutral.
This is the best option IMO, with the huge caveat that you still need good hearing; otherwise your equal loudness contour might not look anything like the average graphs and the result could be quite horrible.
For one reason or another, I suck badly at that exercise at normal to loud listening levels, but I get great results doing the EQ at a barely audible level. Not sure why. Obviously, if you do that, you also need a compensation for that kind of listening level, not the equal loudness graph for 80 dB SPL.

I explained this in slightly more detail here: https://www.head-fi.org/threads/neutralizer-freqency-loudness.965205/#post-17183247


The issue with trying to compensate for the ear canal objectively is that we lack data. How long is your own ear canal? Is there no impact at all when the mic closes it, or is there some gain at some frequencies for a shorter canal/tube that also needs to be compensated in the final tuning?


About far-away measurements: I discussed this recently somewhere else. I redid measurements coming back home after 6 months (not really; I came back briefly for Xmas), thinking mine from near-field speakers at near-field distance were not the best idea for spatial simulation. I measured in someone's big living room, and it sounded great, there! Back at my desk in my much smaller bedroom, with the wall not half a meter behind the computer screen, it sounded like way too much reverb, and I ended up going back to my intimate PRIRs. Again, it's just me, but I struggle with big differences between my listening room and the one recorded.
Thank you for reminding me of the equal loudness (or threshold of hearing) approach for producing compensations. For a threshold-based procedure, I am guessing one would need to: first use the calibrated in-ear mics to establish the loudness reference at a frequency of stable loudness; then, after removing the mics, record the relative thresholds in dBFS at various reference frequencies (I suppose I could capture that with an Equalizer APO variable-band GraphicEQ); then, with the headphones, use the in-ear mics to match the reference loudness, remove the mics, apply the speaker's threshold EQ, and use the same procedure to generate a separate threshold EQ on top of that for the headphone, which on its own becomes the correct compensation curve.
  1. Would such a compensation curve, correctly matched at the threshold of hearing, automatically translate to other listening levels? That is, assuming transducer frequency responses largely do not vary within the usual range of loudnesses, and that at a given loudness the at-eardrum response is effectively the same within limits, should the relative changes in the equal loudness contour at other listening levels not differ between the speaker and headphone methods of imparting that frequency response? It would be unfortunate if differences in transducer coupling or incidence of sound rendered static headphone compensations accurate only around certain loudnesses, or required more dynamic compensation.
  2. And are these canal resonance discrepancies generally "low Q"? At least, depictions of the canal transfer function on its own seem to suggest that one or two peaking filters (two if the canal resonance's peak frequency differs) could suffice, and I've found in practice that finer-grained EQ discrepancies are barely perceptible. Even then, I've found with most classical recordings that after my blocked canal measurement PEQ match and its respective ear gain levels, I rarely find much benefit in adjusting a 3 kHz peaking filter or the 2 kHz and 4 kHz levels, whether or not I had really just gotten used to an excessive or still insufficient ear gain level. But as mentioned in my previous post, the SPARTA HRTF plugins' phase response rendering may already be colouring sounds shared between the channels.
Regarding long-distance measurements, I am assuming the case of conducting them in a large backyard, gym, or even a field, perhaps with the aid of strategically placed acoustic panels, provided that there is a way to window the response and smooth out the compensation (I have been doing manual PEQ so far), hopefully mitigating the capture of reverb.
 
Mar 5, 2024 at 2:20 PM Post #1,809 of 1,816
Thank you for reminding me of the equal loudness (or threshold of hearing) approach for producing compensations. For a threshold-based procedure, I am guessing one would need to: first use the calibrated in-ear mics to establish the loudness reference at a frequency of stable loudness; then, after removing the mics, record the relative thresholds in dBFS at various reference frequencies (I suppose I could capture that with an Equalizer APO variable-band GraphicEQ); then, with the headphones, use the in-ear mics to match the reference loudness, remove the mics, apply the speaker's threshold EQ, and use the same procedure to generate a separate threshold EQ on top of that for the headphone, which on its own becomes the correct compensation curve.
Yes, it would be good to have at least some idea of the SPL for the tones or sweep you'll use. In my case, as I stick with barely audible, I just lower the playback level and EQ. That's easier, but of course I'm not sure at all about the specific levels. In the past, doing this by hand, I would use either the hearing threshold curve or the 20 dB one, which are not drastically different; plus, I always end up also EQing the subs to my liking anyway.
Would such a compensation curve, correctly matched at the threshold of hearing, automatically translate to other listening levels?
I have no idea. I want to say it should, as that's statistically valid. But if we bother with all this, it's usually because the statistical models have let us down already.
I imagine the biggest question is at what level your own acoustic reflex gets triggered. Ideally, I would suggest doing this at normal listening levels so as to reduce the risk of variation in general, but as I said, my own attempts have been hilariously bad.
I think your general concern brings us back to the unavoidable issue of our having different sensitivity at different SPLs. To listen quietly at night, I use an extra V-shaped EQ so I can still feel like I'm getting some bass and treble.
Transducers usually stay about the same, but I have seen IEMs having rather noticeable and measurable changes, mostly in the bass at low SPL. So it's a reasonable concern, and I guess I'm just lucky to have ended up with something that feels right enough. But again, I tend to go with personal taste for the low end, so maybe that's how I solve the transducer matter?
And are these canal resonance discrepancies generally "low Q"? At least, depictions of the canal transfer function on its own seem to suggest that one or two peaking filters (two if the canal resonance's peak frequency differs) could suffice, and I've found in practice that finer-grained EQ discrepancies are barely perceptible. Even then, I've found with most classical recordings that after my blocked canal measurement PEQ match and its respective ear gain levels, I rarely find much benefit in adjusting a 3 kHz peaking filter or the 2 kHz and 4 kHz levels, whether or not I had really just gotten used to an excessive or still insufficient ear gain level. But as mentioned in my previous post, the SPARTA HRTF plugins' phase response rendering may already be colouring sounds shared between the channels.
I know what you know from the famous ear gain graph at 45° or whatever. It seems like a wide enough gain. I basically never use sharp EQ for manual adjustments; at low frequencies it's almost never needed, and at higher frequencies, I don't trust that it won't shift just enough to cause a catastrophe. I know that at large I tend to spend more time fine-tuning 1-2 kHz, with 3 kHz really not being a significant frequency for me (well, if it's horribly off, I will notice). Is it because it's not that much of a bother with music? Because I'm used to the wrong amount of gain with headphones and IEMs? Or because I have a rather big head and a longer than usual ear canal? IDK.
I do go mad if 4-5 kHz isn't just right, and I don't need test tones or measurements for that; just about any song does it. But I don't think it has anything to do with ear canal gain.
I'd have to go look into old hard drives to see if I can find the EQ I used back then for that specific purpose. Now it's all done in the A16, and I guess I'd have to measure the output with and without that correction to really find out what I'm using and what exact curve they use (it offers 80 and 20 dB, so it's probably the standard curve at those SPLs). I won't say I'll do it for sure, because I'm super lazy these days, but at the same time I'm somewhat curious, and it's not that hard to plug the output into an ADC to look at it. So I'll go for a solid, probably, maybe, some day.
Regarding long-distance measurements, I am assuming the case of conducting them in a large backyard, gym, or even a field, perhaps with the aid of strategically placed acoustic panels, provided that there is a way to window the response and smooth out the compensation (I have been doing manual PEQ so far), hopefully mitigating the capture of reverb.
A field certainly takes care of reverb from walls and ceiling :smile_cat:. And with uneven ground, it might even deal with most of the ground's reverb. But will you still feel that distance if you don't have some reverb to help imagine a big space? I'm not sure the simple high-frequency attenuation over distance is going to do it; we're already not that good at estimating distances by ear when all cues are present.
Then again, I remember that blind dude making click noises with his mouth and telling what was in front of him in impressive detail.
Try it and tell us how it is for you. If I remove reverb, I tend to get the distance of whatever visual cue I have (speakers on my desk, the wall, a TV). I seem to prioritize sight even more than the average guy.
 
Mar 5, 2024 at 9:50 PM Post #1,811 of 1,816
I highly doubt that it is possible to do good measurements at distances larger than 5 m.
A Korean user in this thread said he measured at even greater distances, and there seems to be no significant difference after about 4 m. In addition, noise floor management becomes very difficult; it requires too much volume outdoors (10-20 m).
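For a sense of the level problem: free-field SPL falls about 6 dB per doubling of distance (inverse square law), so a quick Python check shows how much headroom a distant sweep eats:

import numpy as np

for d in (1, 2, 4, 8, 16):
    print(f"{d:>2} m: {-20 * np.log10(d):5.1f} dB relative to 1 m")
# 16 m sits about 24 dB below the 1 m level, which comes straight out of your SNR.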
(2) I have been impressed with XingYu's results, whereby their HRTFs effectively look pretty close to mine, but with more ear gain thanks to the deeper mic insertion. Does deeper mic insertion partly mitigate the headphone compensation discrepancy issue?
I think deep insertion may or may not help.
I have seen a lot of other users' files, including Korean users', and they gave me many corrections. Having seen various people's HRTFs, I found one thing in common that is not easily explained in words.
If done well, a deep insertion depth is better than a shallow one, but it becomes more unstable (even when measured on your own ears, it becomes very unstable).
Not only that, but it also depends on how well it is sealed inside.
If it was not properly sealed even when inserted deep, I saw that some high-frequency resonance was not properly captured and the recording came out more degraded.
And the biggest feature of those files was that no matter how you calibrate them, it's hard to calibrate them close to a normal recording.
What's interesting is that when the highs were recorded properly (1 to 8 kHz), whatever processing I applied to the file all worked fine: Octa subwoofer synthesis, BacchXTC synthesis, inverse phase, upmixing, artificial reflections, etc.
But the files that had issues with the highs, no matter how the highs were synthesized and raised, didn't sound like speakers at all and were very strange. XTC didn't work well either.
Slightly off the original topic, but insertion depth does not increase the accuracy of the FL/FR files and headphones (it is rather more likely to be unstable); correct wearing was more important.
And headphones themselves are very unstable devices. I'm already enjoying this with IEMs, and they reproduce more accurately than the HD800S that used to be my reference.
I recommend trying with an IEM and using it as a reference.
Of course, the shape of the microphones' FL/FR response and the 711/5128 response of the IEM do not match, so you need to listen to the 2-8 kHz region and correct it to some extent.
I made my own calibration curve, and now, whatever IEM I use, I apply my calibration curve to the earphone's graph, make it minimum phase, save it as headphones.wav, and apply it; it works almost perfectly with nearly no correction required. Additionally, even when I calibrated it a little with EQ, it was within about plus or minus 1 dB.
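For illustration, a minimal Python sketch of that step (the file names, CSV format, and writing the same filter to both channels of headphones.wav are my assumptions):

import numpy as np
import soundfile as sf

fs, n_fft = 48000, 65536
f_meas, db = np.loadtxt("iem_plus_my_calibration.csv", delimiter=",", unpack=True)

f = np.fft.rfftfreq(n_fft, 1 / fs)
mag = 10 ** (np.interp(f, f_meas, db) / 20)
mag = np.maximum(mag, 1e-6)                # keep log() finite

# Real-cepstrum construction of a minimum-phase spectrum from |H|:
cep = np.fft.irfft(np.log(mag), n_fft)
cep[1:n_fft // 2] *= 2                     # fold the anticausal part forward
cep[n_fft // 2 + 1:] = 0
h = np.fft.irfft(np.exp(np.fft.rfft(cep, n_fft)), n_fft)

sf.write("headphones.wav", np.column_stack([h, h]), fs)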

Btw,
There is not only the simple FR matching and inconsistency problem, but also a discrepancy problem that comes from the impulses' own timing.
For example, FL-L and FR-R must start at zero at the same time, no matter what.
However, other users' files that I thought sounded strange or incorrect when I heard them didn't match from the impulse timing onward.
Those files sound very strange when you hear them. They're unnatural.
This can be calibrated by forcibly adjusting the Ref timing in REW.
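For those doing this outside REW, a minimal numpy sketch of the same timing check (the file name and channel layout are assumptions, and the peak position stands in for the true impulse onset):

import numpy as np
import soundfile as sf

brir, fs = sf.read("hrir.wav")             # assume ch 0 = FL-L, ch 1 = FR-R
onsets = [int(np.argmax(np.abs(brir[:, ch]))) for ch in (0, 1)]
delta = onsets[0] - onsets[1]
print(f"FL-L vs FR-R peak offset: {delta} samples ({1000 * delta / fs:.3f} ms)")

if delta:                                  # roll the later channel into alignment
    late = 0 if delta > 0 else 1
    brir[:, late] = np.roll(brir[:, late], -abs(delta))
    sf.write("hrir_aligned.wav", brir, fs)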
And even if either the FL/FR file or headphones.wav is recorded incorrectly (even if Impulcifer's headphone treatment is weird), you can listen to it and recalibrate it correctly.
Recording is important, but post-correction is also important.
If, after listening to your recording, you don't like something, or it feels strange, incompatible, or inconsistent, play the sine sweep.
First check that it runs smoothly from approximately 600 Hz to 10 kHz, then apply EQ that restores your original reference by alternating between the sine sweep, pink noise, and general music listening.
Even the HD800S, which I thought was correct in my experience, was not perfect compared to IEMs.
 
Mar 19, 2024 at 5:36 AM Post #1,812 of 1,816
I would like to report on my latest findings with trying out other binaural decoders and conducting hearing threshold EQ to compensate for the lack of canal gain in blocked canal measurements. This is the main development after https://www.head-fi.org/threads/rec...r-speaker-virtualization.890719/post-17951999 (post #1,789).

The measurements below show the three left ear measurements at their actual levels relative to one another.

Finding a binaural decoder with correct phase implementation:

As I had mentioned before, per the figure below, when taking in-ear measurements, the combined response at an ear from playing both channels in phase should have roughly the same amount of ear gain between the single and combined channel cases:

2024-03-19 - Genelec 8341A indoor measurements.jpg

Figure 1: Indoor Genelec 8341A measurements; 1/12 octave smoothing. Throughout, "R 30 L" means "left ear with the head rotated 30 degrees right from the currently playing channel (the left channel; nomenclature kept from when I did single-channel outdoor measurements)", "L 30 L" means "left ear with the head rotated 30 degrees left from the currently playing channel (the right channel)", and "LR 30 L" means "left ear with both 30-degree channels playing together in phase", from hereon called the "combined response". As a comment, these were the best (smoothed) indoor in-ear measurements of my Genelecs that I have taken; I had somehow struggled before to get results this close to my original outdoor measurements and the resulting EQ profiles.
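A quick numpy sketch of that sanity check (the arrays are hypothetical): with sample-synchronized measurements, the combined response is the complex, phase-aware sum of the two single-channel transfer functions.

import numpy as np

def combined_db(H_left_channel, H_right_channel):
    # Complex FFTs of one ear's response to each channel, measured in sync.
    H_sum = H_left_channel + H_right_channel    # coherent sum preserves phase
    return 20 * np.log10(np.abs(H_sum))

# Summing the dB magnitudes instead (an incoherent sum) would hide exactly
# the phase interaction that these decoders implement differently.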

Since late October, I had been happily using SPARTA AmbiBIN with the Reaper DAW and my personal HRTF, which, when EQed to match my outdoor approximate free-field response, sounded absolutely wonderful, particularly rendering orchestral strings with exquisite texture and woodwinds and brass with vividness. Unfortunately, I had always known per Figures 5 and 7 shown later on that due to how that binaural decoder renders the phase responses, the combined response was being relaxed in a likely incorrect way.

SPARTA BinauraliserNF has a similar issue, but relaxes 1.4 kHz a bit more while incurring more 3 kHz to 5 kHz energy than AmbiBIN, making it sound noticeably brighter for centered sounds:

2024-03-19 - BinauraliserNF transfer functions.jpg

Figure 2: MOTU M2 loopback measurements of SPARTA BinauraliserNF transfer functions for my personal HRTF.

A few months ago, I checked out APL Virtuoso, which was perhaps slightly better (it needed the diffuse-field, SPARTA-compatible version of my SOFA file), but it still incurred a non-ideal midrange relaxation and a slight ear gain relaxation relative to the bass and lower midrange, both with my SOFA file and with the three presets. There was no option to completely disable the simulation of room reflections (I wanted to simulate the clarity of anechoic listening), and rotations were colouring the sound a lot more than in AmbiBIN or BinauraliserNF, whether or not this was mainly due to an incompatibility of my SOFA file with this decoder.

2024-03-19 - APL Virtuoso transfer functions.jpg

Figure 3: MOTU M2 loopback measurements of APL Virtuoso transfer functions for my personal HRTF.

I lately tried out the IEM BinauralDecoder, which unfortunately doesn't support custom SOFA files, and IEM AdaptiveBinauralDecoder failed to generate a preset from my SOFA files. This decoder has a variation on the previous two's issues:

2024-03-19 - IEM transfer functions.jpg

Figure 4: MOTU M2 loopback measurements of IEM BinauralDecoder transfer functions for the default HRTF.

I then tried out COMPASS Binaural, which was finally promising in regard to the combined response, well, when accidentally mismatching the ambisonic orders. It was a pain working with the settings, and the settings closest to how the other three rendered my HRTF (it needed the non-SPARTA version of my SOFA files) incurred troublesome bass shelves that could be tuned for the single-channel response but got messed up in the combined response. This plugin also refused to play non-distorted/non-clipping sound for me.

I finally tried out CroPaC-Binaural, which was exactly what I was looking for, as you can see below, compared against AmbiBIN:

2024-03-19 - AmbiBIN and CroPaC transfer functions.jpg

Figure 5: MOTU M2 loopback measurements of AmbiBIN (brighter traces) and CroPaC (darker traces) transfer function magnitude responses for my personal HRTF; both needed the SPARTA version of my file; both decoders likewise didn't show any differences between the diffuse-field and non-diffuse-field versions of my SOFA files, so I opted to use the latter. For these comparison measurements, I used the yaw and pitch settings that centered the images at my preferred (slightly elevated) height for listening to classical orchestral works: AmbiBIN had a yaw of -6.50 and pitch of 10, while CroPaC had a yaw of -6.23 (-6.24 incurs a jump in the imaging per the measurement resolution or errors and interpolation algorithm) and pitch of 18. CroPaC's "diffuse to direct balance" was set to be fully direct, and the covariance matrix averaging coefficient was set to 0.50.

2024-03-19 - AmbiBIN and CroPaC phase responses.jpg

Figure 6: MOTU M2 loopback measurements of AmbiBIN (brighter traces) and CroPaC (darker traces) transfer function phase responses for my personal HRTF. The "CroPaC L 30 L" measurement has some ripples due to some slight sample desynchronization or other midway through the measurement.

2024-03-19 - AmbiBIN and CroPaC in-ear FRs.jpg

Figure 7: AmbiBIN (brighter traces) and CroPaC (darker traces) in-ear free-field-EQed magnitude responses for my personal HRTF.

Initial listening impressions:

CroPaC compared to AmbiBIN needed the pitch set higher to achieve the desired vertical centering. On top of a 12 dB boost, I set AmbiBIN to 3 dB and CroPaC to 4.07 dB to normalize the levels around the bass of the combined response. CroPaC certainly sounded brighter, effectively too bright, as opposed to achieving the level of vividness that I thought AmbiBIN was still failing to achieve. For my favourite tracks, like Daniel Harding's or Blomstedt's Brahms Symphony No. 3 or 4 or Boulez' Mahler Symphony No. 5, switching from CroPaC to AmbiBIN brought back the "realness" and spaciousness I had loved about the AmbiBIN sound, whereby AmbiBIN's combined response may have been smoothing the sound while allowing strings and whatnot to sound exquisitely vivid and textured. On the other hand, centered instruments and vocals sounded more vivid with CroPaC, so I figured it might just be a matter of waiting for "mental burn-in" to settle in. The issue was that my Genelecs definitely did not sound this bright, technically sounding somewhere in between, and pink noise through both channels sounded closer to AmbiBIN's combined response.

In regard to A/Bing, I was initially holding ALT to switch between my different Reaper tracks, incurring buffer delays that made it hard to evaluate whether CroPaC was incurring some distortions or echo artifacts, or if it was merely tonally accentuating certain dynamic "wavering" details in some piano tracks or strings. The breakthrough was when I learned that I could keep both Reaper tracks in record mode and hold CTRL + ALT when soloing each track, allowing for seamless volume-matched switching. This reassured me that CroPaC wasn't actually incurring any "technical" deficiencies compared to AmbiBIN (e.g. in going from 7th-order ambisonics to 1st-order; I would later come to find CroPaC's head-tracking to sound plenty sufficient, though maybe with more jumps in tonality at certain orientations) and that any spatial differences were tonal in nature, it simply sounding like switching between EQ profiles, except that each binaural decoder could EQ each part of the stereo pan differently (CroPaC would ideally minimize those colourations). Chinese orchestra sounded great, vocals sounded vivid, and your typical "audiophile music" compilation video on YouTube imaged quite nicely. I had also initially thought that some instruments like a piano in the mid left were imaging more inward, or that bass was being pulled inward, or that the image wasn't as wide as in AmbiBIN, but this was fixed by switching from the non-SPARTA SOFA file I was measuring with COMPASS back to the SPARTA version.

One issue with CroPaC is that sharp transients, such as claps or certain metronome tracks, incur "tweew" artifacts, and it accentuates some "woo" sounds on particularly fast metronome tracks. This is fortunately not noticeable for drum or plucked transients, among others. I later found this to be diminished upon switching the covariance matrix averaging coefficient from 0 to a nonzero value; I chose 0.50. The effects of that slider weren't visible in transfer function measurements, but with pink noise, you notice that when switching between direct and diffuse balance (at fully diffuse balance, you hear quieter pink noise or music imaging around the back of and a bit above your head), the higher the coefficient, the longer it takes for "stored" pink noise energy to "decay". This unfortunately didn't completely eliminate the "tweew" artifacts.

Another issue was that a centered guitar instrument, at least its treble content, within Rodrigo y Gabriela's "Oblivion", was seemingly disappearing, and the center guitar(s) sounded congested. I later found that it may rather have been AmbiBIN that was erroneously pulling some treble content toward the middle, but AmbiBIN on that track still seemed to sound closer to what I vividly and sharply heard through the Genelecs. This had me thinking that some "weirdly" mixed instruments could aggravate edge cases in CroPaC like sharp transients did.

Calibration using threshold of hearing curves:

For those not in the know: since equal loudness contours should theoretically be constant, you can EQ a reference transducer (e.g. speakers) to one of your equal loudness contours (using your threshold of hearing curve can be more accurate or consistent). If you then, while keeping that EQ enabled, EQ the target transducer (e.g. your headphones, in my case after applying my DSP and existing free-field EQ based on blocked canal measurements) to the same equal loudness contour, disabling (in effect subtracting) the reference transducer's equal loudness EQ will match the at-eardrum response of your target transducer to that imparted by the reference transducer. Basically, if R + E1 = C and T + E1 + E2 = C = R + E1, then T + E2 = R.
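A toy numeric check of that algebra in Python (the dB values are made up, one frequency at a time):

R = 6.0              # speaker's at-eardrum response
E1 = -2.0            # threshold EQ found for the speaker
C = R + E1           # your threshold-of-hearing contour value here: 4.0

T = 9.0              # headphone's at-eardrum response (unknown in practice)
E2 = C - (T + E1)    # threshold EQ found on top of E1 for the headphone

assert abs((T + E2) - R) < 1e-9   # headphone + compensation matches the speaker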

My first attempt was done with both Genelec 8341As playing simultaneously, driven by REW's tone generator; I needed to set them to the minimum volume before the sound completely cuts off in order to approach my threshold for 100 Hz. I unfortunately had no access to a proper audiology app that could automate proper stochastic sampling, so I had to spend a fair bit of time finding the point where I perceive "maximum doubt" regarding the audibility of the tone. Here I simply set my reference frequencies within Equalizer APO's variable-band GraphicEQ and adjusted the levels discretely, sometimes using a separate digital preamp knob for more gradual checks or "sanity checks". My very first attempt was not ideal insofar as I had made an assumption about the "quietest sound"; for the following attempts, I established the criterion that the threshold has been reached when it takes a second or two of neural temporal summation or whatever to fade the tone into believably reliable perceptibility, and the tone "clearly" disappears upon removing the stimulus. Sometimes I would close my eyes and click the play button rapidly until I was convinced that I did not remember the state, but I came to favour that "temporal summation threshold" approach. My PC fan noise and tinnitus would have certainly interfered with this, particularly affecting 3 kHz and 9.5 kHz and up.
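For concreteness, those thresholds lived in the config as an Equalizer APO variable-band GraphicEQ line, something like the following (the values are illustrative only, not my actual thresholds):

Preamp: -6 dB
GraphicEQ: 100 12.0; 200 7.5; 400 4.0; 700 1.5; 1000 0; 2000 -1.5; 3000 -3.0; 4000 -2.0; 8000 5.0; 16000 9.0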

Below are the five trials: the first three displayed are the single-channel attempts with the left channel and the left ear (and technically, inherently, also the crossfeed and room imaging artifacts perceived by my right ear; the headphone reference would also have the crossfeed instituted by the binaural decoder); the fourth and fifth are the two-channel, both-ears attempts, though the fourth one was technically the one I did first:

2024-03-19_01-14-36 - Threshold EQs.png

Figure 8: Genelec 8341A threshold of hearing contours.


2024-03-19_01-14-36 - Threshold compensation EQs.png

Figure 9: Threshold of hearing compensation EQs for CroPaC and my free-field EQ for the Meze Elite.

2024-03-19_01-20-09 - Theshold compensatoin EQ - average out of 5.png

Figure 10: Final threshold of hearing compensation EQ averaged from all five attempts. I had been dismayed by the differences among the compensation EQs, but they did all seem generally similar, particularly between 700 Hz and 2 kHz, the main source of error having been in the 3 kHz to 4 kHz thresholds. 4 kHz effectively being EQed up was somewhat expected insofar as I had EQed 4 kHz down a bit while fixing some local channel imbalances.

What was clear from this exercise was that there was definitely a discrepancy between the perceived at-eardrum response from speakers at 30-degree incidence and the result of using blocked canal measurements to match that response for a headphone transducer at 90-degree incidence, among other coupling matters; the 90-degree incidence, with the in-ear mics removed, was imparting around 8 dB more ear gain than the actual 30-degree speaker response. Likewise, there was virtually no difference in overall ear gain compensation level between the two-channel and single-channel cases, such that AmbiBIN's brightening of the flanks of the image certainly could not be correct, regardless of how spacious and vivid it sounded.

2024-03-19 - AmbiBIN and CroPaC in-ear FRs - compensated.jpg

Figure 11: AmbiBIN (brighter traces) and ear-gain-compensated CroPaC (darker traces) in-ear free-field-EQed magnitude responses for my personal HRTF. Notice how CroPaC's combined response is now rather similar to AmbiBIN's, as though AmbiBIN had implemented the correct combined response accounting for the ear gain discrepancy all along, just that it becomes brighter toward the flanks of the image.

2024-03-19 - CroPaC R 30 L versus indoor Genelec measurement.jpg

Figure 12: Uncompensated and compensated CroPaC "R 30 L" measurements compared to the indoor Genelec measurement. Again, that indoor Genelec measurement was one especially good case from another day; it happens to match quite nicely here, though of course, the darker trace was what was needed for them to actually sound the same.

2024-03-19 - CroPaC L 30 L versus indoor Genelec measurement.jpg

Figure 13: Uncompensated and compensated CroPaC "L 30 L" measurements compared to the indoor Genelec measurement. We also see a pretty decent match, though a higher-resolution SOFA would probably have done away with the 3 kHz dip; likewise, there was not supposed to be a 7.5 kHz null, though my outdoor measurements did at least show a sliver of one that my generated SOFA seems to accentuate.

2024-03-19 - CroPaC LR 30 L versus indoor Genelec measurement.jpg

Figure 14: Uncompensated and compensated CroPaC "LR 30 L" measurements compared to the indoor Genelec measurement. Again, we have a pretty decent match, though the smoothing may have shallowed out the narrower real nulls.

2024-03-19 - CroPaC R 30 L versus Meze Elite hybrid V3_1 PEQ and HE1000se stock.jpg

Figure 15: Compensated CroPaC "R 30 L" and "LR 30 L" measurements compared to my Meze Elite hybrid pads V3.1 neutral PEQ and the HE1000se with stock pads.

I usually used AmbiBIN with a bass shelf, partly because I had to lower the volume to accommodate the brighter effective tuning. Now, with compensated CroPaC, it feels like there is plenty of bass as is. This otherwise shows that despite my past claims of "neutral speakers being a lot brighter than you think", "neutral" headphones or EQs do in fact come close to the actual in-ear response; it was the blocked canal measurements that incurred an EQing discrepancy between the open speaker and closed headphone loading.

Listening impressions:

With this, the sound of pink noise was now much better matched between CroPaC and the Genelecs for the left channel, the right channel, and both playing together. This also brought strings and others to a level of "dullness" closer to how my Meze Elite V3 neutral PEQ (V3.1 adds more 6 kHz to approach the HE1000se and my in-ear speaker measurements) sounded after I had gotten used to AmbiBIN. The vividness of AmbiBIN could be restored by turning the volume up and setting AmbiBIN to be 5 dB quieter when A/Bing, yielding the same string texture and treble forwardness while making AmbiBIN sound thinner and CroPaC sound fuller, almost too full. When disabling the compensation EQ and making CroPaC play 2 dB quieter than AmbiBIN to get the same perceived brightness, it ends up being AmbiBIN that sounds fuller, the normalization essentially happening relative to AmbiBIN's relaxed upper midrange and lower treble. I have generally had the feeling that for practical music listening, at least in my untreated room, the Genelecs tonally sound somewhere between AmbiBIN and CroPaC.

Imaging correction shenanigans:

One thing other than tonal consistency throughout the stereo pan is the imaging coherence of all frequencies throughout that pan; that is, for pink noise or a given instrument, you should not have different frequency bands from the same source imaging from different locations or shifting around as you pan that source across the stereo field. In this case, I had early on done the A/B with AmbiBIN's pitch set to 0 so that the center image was mostly coherent, coming to realize that this was causing the treble to "droop" toward the flanks. For vocals, the pitch of 0 may sound better and more coherent, but for orchestral music, the whole image would clearly seem to sit low by around 10 degrees. A pitch of 10 degrees became the generally preferred setting, probably consistent with what I was indirectly setting when using head-tracking, though this clearly caused the upper components of vocals to image high; when using JS: Volume/Pan Smoother v5 in Reaper (I set the Pan Law to 4 dB for reasonable volume consistency) to pan from side to side, the upper midrange and treble trace an arc going up and back down while the bass and lower midrange stay level.

CroPaC, on the other hand, was pleasingly coherent throughout the stereo field, but there came a point where I had set the pitch to 16 to match AmbiBIN's image elevation for orchestral music, and I came to notice annoying cases where centered bass and other content imaged too low; a side-to-side pan of pink noise now had the sub-bass and some upper midrange content trace an arc opposite to what I was encountering with AmbiBIN, which felt more annoying than AmbiBIN's imaging flaws. Some centered instruments could thus sound vertically smeared. (The positional imaging errors of non-DSPed or purely EQed headphones are their own can of worms, including things imaging from above or behind my head that definitely shouldn't, though some may perceive this as enjoyable "holography" or "3D soundstage" despite its probably being objectively completely wrong compared to proper stereo imaging.)

My first idea was to use a plugin to extract the common and hence centered content between the channels, low-pass filter it, then play it through a raised central and equidistant "woofer" configured within AmbiRoomSim... I soon learned that plain JS: Mid/Side Encoder would not work for this, but I fortunately came across https://bertomaudio.com/phantom-center.html, which can be had for free (I was willing to give them $5 if it did what I wanted it to do). I was eventually able to route the channels such that when playing with the volume panner and isolating the channels, the low-passed pink noise or test signal would be loudest through the "woofer" when centered and taper toward the flanks, while when listening to the left or right channel in isolation, JS: Mid/Side Encoder was indeed cancelling out the centered content from the flanking channels in order to facilitate volume normalization of this upmix. Unfortunately, that added "woofer" was still bound by the same imaging issue. I also tried adding some peaking filters to try to "lift" some midrange frequencies that were imaging too low, but this incurred phase issues or just wasn't working. Earlier on, whenever this approach seemed like it was working, it was probably "placebo" or inconsistent imaging perception. Likewise, there were times where it seemed as though the "problem" wasn't showing up in my reference recordings anymore, only for it to eventually turn up again while switching from AmbiBIN, my perceiving the abrupt introduction of the problematic low midrange image.
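For illustration, the naive version of that center extraction in Python (a plain mid/side split with a low-passed center feed; as noted above, plain mid/side alone didn't cut it in practice, which is why I ended up with Bertom's plugin):

import numpy as np
from scipy.signal import butter, sosfilt

def center_woofer_upmix(left, right, fs, fc=200.0):
    # Naive phantom-center split: the mid signal estimates centered content.
    mid = 0.5 * (left + right)
    sos = butter(4, fc, btype="low", fs=fs, output="sos")
    woofer = sosfilt(sos, mid)                 # low-passed feed for the raised "woofer"
    return left - woofer, right - woofer, woofer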

My next absurd idea was to have two instances of AmbiRoomSim on the same track and use ReaEQ to implement crappy crossovers between the CroPaC midrange and treble and the AmbiBIN "woofers"... Filter tuning and integration was a pain, if not impossible, in this context, and the frequency bands of AmbiRoomSim's and CroPaC's midrange imaging flaws may have overlapped. I also tried using AmbiBIN to implement the center woofer from the previous approach, but this still left issues with midrange content imaging too low.

I finally realized that this imaging issue was only present in CroPaC when the pitch was between 14 and 17, this probably being an actual error in my HRTF measurement, if not an error in the interpolation implementation. A pitch of 13 yielded a centered and coherent image for vocals or other tracks that I prefer more vertically centered, while a pitch of 18 raised the image to around the levels I experience at the fourth to seventh rows of a concert hall. If I set my head-tracked image height appropriately, I should be able to avoid that problematic pitch range most of the time.

Final results and concerns:

So now I'm back to a tonality closer to how I used to hear music before receiving my SOFA files. The AmbiBIN sound may still appeal with its vividness, perhaps more consistent with my preferred concert seating closer up within the orchestra level; compensated CroPaC otherwise sounds clearer and brings forth previously lacking fullness in some places, though when A/Bing against my Genelecs, the midrange in the center of the image may feel too full through CroPaC. That full midrange within CroPaC also tends to image closer, detracting from the sense of space and of the orchestra or instruments being out there in front of you. The Genelecs likewise continue to exhibit an air of vividness shared with live concert hall performances, probably related to the experience of room reflections, whereas AmbiBIN or CroPaC simply sound cleaner. The next steps are to acquire nicer and more comfortable in-ear mics, to take longer-distance HRTF measurements to see how that affects the imaging of chamber music and orchestral distance, and to see how Impulcifer compares for tonal consistency and image coherence across the stereo field.
 
Apr 1, 2024 at 1:20 PM Post #1,815 of 1,816
I'm looking at it now. Seems like it can use Atmos audio content.
After some testing, I can confirm that Dolby Atmos content can be converted to some other channel-based format like PCM. This way, objects will be hard-coded into channels.

That's awesome news. I can't test it for a few days myself, but I came across it on Reddit. Does it decode on the fly, or do you have to encode the audio separately?

Also he wrote this
“This also only works with e-ac3+joc not trueHD+”

@musicreo Did you try trueHD?
 
