I would like to report on my latest findings with trying out other binaural decoders and conducting hearing threshold EQ to compensate for the lack of canal gain in blocked canal measurements. This is the main development after
https://www.head-fi.org/threads/rec...r-speaker-virtualization.890719/post-17951999 (post #1,789).
The measurements below show the three left ear measurements at their actual levels relative to one another.
Finding a binaural decoder with correct phase implementation:
As I had mentioned before, per the figure below, when taking in-ear measurements, the combined response at an ear from playing both channels in phase should have roughly the same amount of ear gain between the single and combined channel cases:
Figure 1: Indoor Genelec 8341A measurements; 1/12 octave smoothing. Throughout, "R 30 L" means "left ear with the head rotated 30 degrees right from the currently playing channel (the left channel; nomenclature kept from when I did single-channel outdoor measurements)", "L 30 L" means "left ear with the head rotated 30 degrees left from the currently playing channel (the right channel)", and "LR 30 L" means "left ear with both 30-degree channels playing together in phase", from here on called the "combined response". As an aside, these were the best (smoothed) indoor in-ear measurements of my Genelecs I have taken, my having otherwise struggled to get indoor results this close to my original outdoor measurements and the resulting EQ profiles.
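To illustrate why a decoder's phase rendering matters for the combined response, here is a minimal single-frequency sketch. It assumes the level at one ear is simply the complex sum of the near-channel and far-channel arrivals; the function names and the example levels are hypothetical and not taken from my measurements.

```python
import cmath
import math

def db(gain):
    """Magnitude of a complex gain in dB."""
    return 20 * math.log10(abs(gain))

def combined_db(ipsi_db, contra_db, phase_diff_deg):
    """Level at one ear when both channels play the same signal in phase.

    ipsi_db, contra_db: single-channel magnitudes at this ear (dB).
    phase_diff_deg: phase difference between the two arrivals at this frequency.
    """
    ipsi = 10 ** (ipsi_db / 20)
    contra = cmath.rect(10 ** (contra_db / 20), math.radians(phase_diff_deg))
    return db(ipsi + contra)

# Hypothetical values: far channel 6 dB below the near channel at this ear.
for phase in (0, 90, 150):
    print(phase, round(combined_db(0.0, -6.0, phase), 2))
```

The same pair of single-channel magnitudes gives a different combined level depending on the phase difference, so two decoders with matching single-channel magnitude responses can still differ in the combined response.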
Since late October, I had been happily using SPARTA AmbiBIN with the Reaper DAW and my personal HRTF, which, when EQed to match my outdoor approximate free-field response, sounded absolutely wonderful, particularly rendering orchestral strings with exquisite texture and woodwinds and brass with vividness. Unfortunately, I had always known per Figures 4 and 6 shown later on that due to how that binaural decoder renders the phase responses, the combined response was being relaxed in a likely incorrect way.
SPARTA BinauraliserNF has a similar issue, but relaxes 1.4 kHz a bit more while incurring more 3 kHz to 5 kHz than AmbiBIN, making it sound noticeably brighter for centered sounds:
Figure 2: MOTU M2 loopback measurements of SPARTA BinauraliserNF transfer functions for my personal HRTF.
A few months ago, I checked out APL Virtuoso, which was perhaps slightly better (it needed the diffuse-field and SPARTA-compatible version of my SOFA file), but still incurred a non-ideal midrange relaxation and slight ear gain relaxation relative to the bass and lower midrange with my SOFA file as well as the three presets. There was no option to completely disable the simulation of room reflections (I wanted to simulate the clarity of anechoic listening), and rotations were colouring the sound a lot more than AmbiBIN or BinauraliserNF, whether or not this was mainly due to an incompatibility of my SOFA with this decoder.
Figure 3: MOTU M2 loopback measurements of APL Virtuoso transfer functions for my personal HRTF.
I lately tried out the IEM BinauralDecoder, which unfortunately doesn't support custom SOFA files, and the IEM AdaptiveBinauralDecoder was failing to generate a preset from my SOFA files. This decoder has a variation on the previous two's issues:
Figure 3: MOTU M2 loopback measurements of IEM BinauralDecoder transfer functions for the default HRTF.
I then tried out COMPASS Binaural, which was finally promising in regard to the combined response, but only when I accidentally mismatched the ambisonic orders. The settings were a pain to work with, and the settings closest to how the other three rendered my HRTF (it needed the non-SPARTA version of my SOFA files) were incurring troublesome bass shelves that could be tuned for the single-channel response but got messed up in the combined response. This plugin also refused to play sound without distortion or clipping for me.
I finally tried out CroPaC-Binaural, which was finally what I was looking for, as you can see below, compared against AmbiBIN:
Figure 4: MOTU M2 loopback measurements of AmbiBIN (brighter traces) and CroPaC (darker traces) transfer function magnitude responses for my personal HRTF; both needed the SPARTA version of my file; both decoders likewise didn't have any differences between the diffuse-field and non-diffuse-field versions of my SOFA files, so I opted to use the latter. For these comparison measurements, I used the yaw and pitch settings that centered the images at my preferred (slightly elevated) height for listening to classical orchestral works: AmbiBIN had a yaw of -6.50 and pitch of 10 while CroPaC had a yaw of -6.23 (-6.24 incurs a jump in the imaging per the measurement resolution or errors and interpolation algorithm) and pitch of 18. CroPaC's "diffuse to direct balance" was set to be fully direct, and the covariance matrix averaging coefficient was set to 0.50.
Figure 5: MOTU M2 loopback measurements of AmbiBIN (brighter traces) and CroPaC (darker traces) transfer function phase responses for my personal HRTF. The "CroPaC L 30 L" measurement has some ripples due to some slight sample desynchronization or other midway through the measurement.
Figure 6: AmbiBIN (brighter traces) and CroPaC (darker traces) in-ear free-field-EQed magnitude responses for my personal HRTF.
Initial listening impressions:
Compared to AmbiBIN, CroPaC needed the pitch to be set higher to achieve the desired vertical centering. On top of a 12 dB boost, I set AmbiBIN to 3 dB and CroPaC to 4.07 dB to normalize the levels about the bass of the combined response. CroPaC certainly sounded brighter, effectively too bright as opposed to achieving the level of vividness that I thought AmbiBIN was still failing to achieve. For my favourite tracks like Daniel Harding's or Blomstedt's Brahms Symphony No. 3 or 4 or Boulez's Mahler Symphony No. 5, switching from CroPaC to AmbiBIN brought back the "realness" and spaciousness I had loved about the AmbiBIN sound, whereby AmbiBIN's combined response may have been smoothing the sound while allowing strings and whatnot to sound exquisitely vivid and textured. On the other hand, centered instruments and vocals sounded more vivid with CroPaC, so I figured that it might just be a matter of waiting for "mental burn-in" to settle in. The issue was that my Genelecs definitely did not sound this bright, technically sounding somewhere in between, and pink noise through both channels sounded closer to the combined response of AmbiBIN.
In regard to A/Bing, I was initially holding ALT to switch between my different Reaper tracks, incurring buffer delays that made it hard to evaluate whether CroPaC was incurring some distortions or echo artifacts or if it was merely tonally accentuating certain dynamic "wavering" details in some piano tracks or strings. The breakthrough was when I learned that I could keep both Reaper tracks on record mode and hold CTRL + ALT when soloing each track, allowing for seamless volume-matched switching. This reassured me that CroPaC wasn't actually incurring any "technical" deficiencies compared to AmbiBIN (e.g. in going from 7th-order ambisonics to 1st-order; I would later come to find CroPaC's head-tracking to sound plenty sufficient, though maybe with more jumps in tonality at certain orientations) and that any spatial differences were tonal in nature. It simply sounded like switching between EQ profiles, except that each binaural decoder could EQ each part of the stereo pan differently (CroPaC would ideally minimize those colourations). Chinese orchestra sounded great, vocals sounded vivid, and your typical "audiophile music" compilation video on YouTube imaged quite nicely. I had also initially thought that some instruments like a piano in the mid left were imaging more inward, or that bass was being pulled inward, or that the image wasn't as wide as in AmbiBIN, but this was fixed by switching from the non-SPARTA SOFA file I was measuring with COMPASS back to the SPARTA version.
One issue with CroPaC is that sharp transients such as claps or certain metronome sound tracks incur "tweew" artifacts and accentuate some "woo" sounds for particularly fast metronome tracks. This is fortunately not noticeable for drum or plucked transients among others. I later found this to be diminished upon switching the covariance matrix averaging coefficient from 0 to a nonzero value, my choosing 0.50. The effects of that slider weren't visible in transfer function measurements, but with pink noise, you would notice that when switching between direct and diffuse balance (for fully diffuse balance, you hear quieter pink noise or music imaging around the back of and a bit above your head), the higher the coefficient, the longer it would take for "stored" pink noise energy to "decay". This unfortunately didn't completely eliminate the "tweew" artifacts.
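As a rough illustration of why a higher averaging coefficient would make "stored" energy take longer to decay, here is a sketch of a generic one-pole recursive average. This is only my assumption about what such a coefficient typically does; I have not checked how CroPaC actually implements its covariance matrix averaging, and the function names and frame counts are hypothetical.

```python
def smooth(values, coeff):
    """One-pole recursive average: state = coeff * state + (1 - coeff) * value."""
    state = 0.0
    out = []
    for value in values:
        state = coeff * state + (1.0 - coeff) * value
        out.append(state)
    return out

def frames_to_decay(coeff, floor=0.01):
    """Frames needed for the averaged energy to drop below `floor` once the input stops."""
    energy = [1.0] * 50 + [0.0] * 200  # steady input, then silence
    tail = smooth(energy, coeff)[50:]
    for index, value in enumerate(tail):
        if value < floor:
            return index + 1
    return None  # did not decay within the simulated tail

for coeff in (0.0, 0.5, 0.9):
    print(coeff, frames_to_decay(coeff))
```

With a coefficient of 0, the average forgets the input immediately, while larger coefficients hold on to past frames for longer, which is at least consistent with what I heard when switching the balance with pink noise.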
Another issue was that a centered guitar instrument, at least its treble content, within Rodrigo y Gabriela's "Oblivion", was seemingly disappearing and the center guitar(s) were sounding congested, my later finding that it may have rather been AmbiBIN that was erroneously pulling some treble content toward the middle, but AmbiBIN for that track still seemed to sound closer to what I vividly and sharply heard through the Genelecs. This had me thinking that some instruments if "weirdly" mixed could aggravate edge cases in CroPaC like sharp transients did.
Calibration using threshold of hearing curves:
For those not in the know: since your equal loudness contours should theoretically be constant, you can use one of them as a common reference between two transducers. First, EQ a reference transducer ("R"; e.g. speakers) to one of your equal loudness contours ("C"; in this case, using your threshold of hearing curve can be more accurate or consistent); call that EQ "EQ1". Then, while keeping EQ1 enabled, EQ the target transducer ("T"; e.g. your headphones, in my case after applying my DSP and existing free-field EQ based on blocked canal measurements) to the same equal loudness contour; call that second EQ profile "EQ2". Disabling (in effect subtracting) EQ1 will then match the at-eardrum response through your target transducer with that imparted by the reference transducer. Basically, working in dB, if R + EQ1 = C and T + EQ1 + EQ2 = C = R + EQ1, then T + EQ2 = R.
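The arithmetic above can be checked with a small sketch. All levels are in dB, and the numbers are made up purely for illustration (they are not my measured thresholds or responses); the helper name is hypothetical.

```python
def eq_to_reach(response_db, contour_db):
    """Per-frequency EQ gains (dB) such that response + EQ = contour."""
    return {f: contour_db[f] - response_db[f] for f in response_db}

# Hypothetical at-eardrum levels in dB, keyed by frequency in Hz.
C = {1000: 0.0, 3000: -4.0}  # equal loudness (threshold) contour
R = {1000: 2.0, 3000: 5.0}   # reference transducer (speakers)
T = {1000: 1.0, 3000: 9.0}   # target transducer (headphones with existing EQ)

EQ1 = eq_to_reach(R, C)                          # R + EQ1 = C
T_with_EQ1 = {f: T[f] + EQ1[f] for f in T}
EQ2 = eq_to_reach(T_with_EQ1, C)                 # T + EQ1 + EQ2 = C

# Disable EQ1 and keep only EQ2 on the target transducer.
matched = {f: T[f] + EQ2[f] for f in T}
print(matched == R)
```

Note that the contour C never needs to be known in absolute terms; it cancels out, which is why the threshold of hearing can serve as the shared reference.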
My first attempt was done with both Genelec 8341As playing simultaneously as driven by REW's tone generator, my needing to set them to the minimum volume before the sound completely cuts off in order to approach my threshold for 100 Hz. I unfortunately had no access to a proper audiology app that could automate the proper stochastic sampling, my thus having to spend a fair bit of time finding the point where I perceive "maximum doubt" regarding the audibility of the tone. Here I simply set my reference frequencies within Equalizer APO's variable-band GraphicEQ and adjusted the levels discretely, sometimes using a separate digital preamp knob for more gradual checks or "sanity checks". My very first attempt was not ideal insofar as I had made an assumption on the "quietest sound", my for the following attempts establishing the criterion that the threshold has been reached when it takes a second or two for neural temporal summation or whatever to fade the tone into believably reliable perceptibility and the tone "clearly" disappears upon removing the stimulus; sometimes I would close my eyes and click the play button rapidly until I was convinced that I did not remember the state, but I came to favour that "temporal summation threshold" approach. My PC fan noise and tinnitus would have certainly interfered with this, particularly affecting 3 kHz and 9.5 kHz and up.
Below are the five trials, the first three displayed being the single channel attempts with the left channel and the left ear (and technically inherently also the crossfeed and room imaging artifacts perceived by my right ear; the headphone reference would also have the crossfeed instituted by the binaural decoder), the fourth and fifth being the two-channel and both-ears attempts, though the fourth one was technically the one I did first:
Figure 7: Genelec 8341A threshold of hearing contours.
Figure 7: Threshold of hearing compensation EQs for CroPaC and my free-field EQ for the Meze Elite.
Figure 8: Final threshold of hearing compensation EQ averaged from all five attempts. I had been dismayed by the differences in the compensation EQs, but they did seem to all be generally similar, particularly between 700 Hz and 2 kHz, the main point of error having been in the 3 kHz to 4 kHz thresholds. 4 kHz effectively being EQed up was somewhat expected insofar as I had EQed 4 kHz down a bit while fixing some local channel imbalances.
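For anyone wanting to repeat the averaging, a minimal sketch is below. It assumes a plain arithmetic mean of the dB gains at each reference frequency plus the sample standard deviation as a rough indicator of where the trials disagreed; the trial values shown are hypothetical, not my actual five attempts.

```python
from statistics import mean, stdev

def average_eq(trials):
    """Average several compensation EQs (dicts of frequency in Hz -> gain in dB)."""
    freqs = sorted(trials[0])
    return {f: mean(trial[f] for trial in trials) for f in freqs}

def spread_eq(trials):
    """Per-frequency sample standard deviation across trials (dB)."""
    freqs = sorted(trials[0])
    return {f: stdev(trial[f] for trial in trials) for f in freqs}

# Hypothetical trials: consistent at 1 kHz, less so at 3 kHz.
trials = [
    {1000: -6.0, 3000: -8.0},
    {1000: -6.5, 3000: -5.0},
    {1000: -5.5, 3000: -11.0},
]
print(average_eq(trials))
print(spread_eq(trials))
```

A large spread at a given frequency would suggest repeating the threshold test there rather than trusting the average.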
What was clear from this exercise was that there was definitely a discrepancy between the perceived at-eardrum response from speakers at 30-degree incidence and the result of using blocked canal measurements to match that response for a headphone transducer at 90-degree incidence, among other coupling matters; with the in-ear mics removed, the 90-degree headphone incidence was imparting around 8 dB more ear gain than the actual 30-degree speaker response. Likewise, there was virtually no difference in overall ear gain compensation level between the two-channel and single-channel cases, such that AmbiBIN's brightening of the flanks of the image certainly could not be correct, regardless of how spacious and vivid it sounded.
Figure 9: AmbiBIN (brighter traces) and ear-gain-compensated CroPaC (darker traces) in-ear free-field-EQed magnitude responses for my personal HRTF. Notice how CroPaC's combined response is now rather similar to AmbiBIN's as though AmbiBIN had implemented the correct combined response accounting for the ear gain discrepancy all along, just that it becomes brighter toward the flanks of the image.
Figure 10: Uncompensated and compensated CroPaC "R 30 L" measurement compared to the indoor Genelec measurement. Again, that indoor Genelec measurement was one especially good case from another day, its happening to match quite nicely here, though of course, the darker trace was what was needed for them to actually sound the same.
Figure 11: Uncompensated and compensated CroPaC "L 30 L" measurement compared to the indoor Genelec measurement. We also see a pretty decent match, though a higher-resolution SOFA would have probably done away with the 3 kHz dip, and likewise, there was not supposed to be a 7.5 kHz null, though my outdoor measurements did at least have a sliver of such that my generated SOFA seemed to accentuate.
Figure 12: Uncompensated and compensated CroPaC "LR 30 L" measurement compared to the indoor Genelec measurement. Again, we have a pretty decent match, though the smoothing may have shallowed out the likewise narrower real nulls.
Figure 13: Compensated CroPaC "R 30 L" and "LR 30 L" measurements compared to my Meze Elite hybrid pads V3.1 neutral PEQ and HE1000se with stock pads.
I usually used AmbiBIN with a bass shelf partly per my having had to lower the volume to accommodate the brighter effective tuning. Now with compensated CroPaC, it feels like it has plenty of bass as is. This otherwise shows that despite my past claims of "neutral speakers being a lot brighter than you think", "neutral" headphones or EQs do in fact come close to the actual in-ear response, whereby it was the blocked canal measurements that incurred an EQing discrepancy between the open speaker and closed headphone loading.
Listening impressions:
With this, the sound of pink noise was now much better matched between CroPaC and the Genelecs for the left channel, right channel, and for both playing together. This also brought strings and others to a level of "dullness" closer to how my Meze Elite V3 neutral PEQ (V3.1 adds more 6 kHz to approach the HE1000se and my in-ear speaker measurements) sounded after having gotten used to AmbiBIN. The vividness of AmbiBIN could be restored by turning the volume up and setting AmbiBIN to be 5 dB quieter when A/Bing, yielding the same string texture and treble forwardness while making AmbiBIN sound thinner and CroPaC sound fuller, almost too full; when disabling the compensation EQ and making CroPaC play 2 dB quieter than AmbiBIN to have the same perceived brightness, it ends up being AmbiBIN that sounds fuller due to the normalization essentially happening relative to AmbiBIN's relaxed upper midrange and lower treble. I have generally had the feeling that for practical music listening at least in my untreated room, the Genelecs tonally sound somewhere in between AmbiBIN and CroPaC.
Imaging correction shenanigans:
One thing other than tonal consistency throughout the stereo pan is the imaging coherence of all the frequencies throughout that pan; that is, for pink noise or a given instrument, you should not have different frequency bands from that same source imaging from different locations or shifting around when you pan that source across the stereo field. In this case, I had early on done the A/B with AmbiBIN's pitch set to 0 so that the center image was mostly coherent, my coming to realize that this was causing the treble to "droop" toward the flanks. For vocals, the pitch of 0 may sound better and more coherent, but for orchestral music, the whole image would seem to clearly be low by around 10 degrees. A pitch of 10 degrees came to be the general preferred setting probably consistent with what I was indirectly setting when using head-tracking, though this clearly caused upper components of vocals to be imaged high; when using JS: Volume/Pan Smoother v5 in Reaper (I set the Pan Law to 4 dB for reasonable volume consistency) to pan from side to side, the upper midrange and treble trace an arc going up and back down while the bass and lower midrange stayed level.
CroPaC, on the other hand, was pleasingly coherent throughout the stereo field, but there came a point where I had set the pitch to 16 to match the image elevation of AmbiBIN for orchestral music, my coming to notice annoying cases where centered bass and others were imaging too low. A side-to-side pan of pink noise now had the sub-bass and some upper midrange content trace an opposite arc from what I was encountering with AmbiBIN, this feeling more annoying than AmbiBIN's imaging flaws. Some centered instruments could thus sound vertically smeared. (The positional imaging errors for non-DSPed or purely EQed headphones are their own can of worms, including things imaging from above or behind my head that definitely shouldn't, though some may perceive this as enjoyable "holography" or "3D soundstage" despite its probably being objectively completely wrong compared to proper stereo imaging.)
My first idea was to use a plugin to extract the common and hence centered content between the channels, low-pass filter that, then play it through a raised central and equidistant "woofer" configured within AmbiRoomSim... I soon learned that plain JS: Mid/Side Encoder would not work for this, my fortunately coming across
https://bertomaudio.com/phantom-center.html which can be had for free (I was willing to give them $5 if it did what I wanted it to do). I was eventually able to route the channels such that when playing with the volume panner and isolating the channels, the low-passed pink noise or test signal would be loudest through the "woofer" when centered and taper toward the flanks while when listening to the left or right channel in isolation, JS: Mid/Side Encoder was indeed cancelling out the centered content from the flanking channels in order to facilitate volume normalization of this upmix. Unfortunately, that "woofer" I added was still bound by the same imaging issue. I also tried adding some peaking filters to try to "lift" some midrange frequencies that were imaging too low, but this incurred some phase issues or just wasn't working. Earlier on, if ever this approach seemed like it was working, it was probably "placebo" or inconsistent imaging perception. Likewise, there were times where it seemed as though the "problem" wasn't showing up in my reference recordings anymore, its eventually turning up again while switching from AmbiBIN, one's perceiving the abrupt introduction of the problematic low midrange image.
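To show why plain mid/side encoding could not isolate the centered content on its own, here is a minimal sketch. It assumes the common convention of mid = (L + R) / 2 and side = (L - R) / 2; the exact scaling used by JS: Mid/Side Encoder may differ, and the sample values are made up.

```python
def mid_side(left, right):
    """Encode a stereo pair into mid and side signals (assumed 0.5 scaling)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

signal = [0.5, -0.25, 1.0, -1.0]
silence = [0.0] * len(signal)

# Centered source: identical in both channels.
center_mid, center_side = mid_side(signal, signal)

# Hard-left source: present only in the left channel.
left_mid, left_side = mid_side(signal, silence)

print(center_side)  # the side signal cancels the centered content
print(left_mid)     # the mid signal still contains the hard-left source
```

The side signal removes centered content, which is what made it useful for the flanking channels, but the mid signal still carries half of anything panned hard to one side, so it is not a "center only" feed. That is where a dedicated phantom center extractor comes in.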
My next absurd idea was to have two instances of AmbiRoomSim on the same track and use ReaEQ to implement crappy crossovers between the CroPaC midrange and treble and the AmbiBIN "woofers"... Filter tuning and integration was a pain if not impossible in this context, and the frequency bands of AmbiRoomSim's and CroPaC's midrange imaging flaws may have overlapped. I also tried using AmbiBIN to implement the center woofer from the previous approach, but this still left issues with midrange content imaging too low.
I finally realized that this imaging issue was only present on CroPaC when the pitch was between 14 and 17, this probably being an actual error in my HRTF measurement if not an error in the interpolation implementation. A pitch of 13 yielded a centered and coherent image for vocals or other tracks that I would prefer to be more vertically centered while a pitch of 18 raised the image to around the levels I experience at the fourth to seventh rows of a concert hall. If I set my head-tracked image height appropriately, I should be able to avoid that problematic pitch range most of the time.
Final results and concerns:
So now I'm back to a tonality closer to how I used to hear music before receiving my SOFA files. The AmbiBIN sound may still appeal with its vividness perhaps more consistent with my preferred concert seating closer up within the orchestra level, compensated CroPaC otherwise sounding clearer and bringing forth previously lacking fullness in some places, though when A/Bing against my Genelecs, the midrange in the center of the image may feel too full through CroPaC. That full midrange within CroPaC also tends to image closer, detracting from the sense of space and the orchestra or instruments being out there in front of you. The Genelecs may likewise continue to exhibit an air of vividness shared with live concert hall performances, probably related to the experience of room reflections, whereas AmbiBIN or CroPaC will simply sound cleaner. The next steps are to acquire nicer and more comfortable in-ear mics, acquire longer-distance HRTF measurements to see how that affects the imaging of chamber music and orchestral distance, and see how Impulcifer compares for tonal consistency and image coherence across the stereo field.