Wait. Are you judging this from listening to the streamed sample tracks via your internet browser, or did you download the actual files and listen to those?
So which recordings are, IYO, closer to sounding "real" and "unprocessed"? Mapleshade? Others?
Regarding Chesky's recordings, for the most part I don't like the "room" or space that they have been recorded in (it's too "big" and "echoey"). For my tastes, they are usually a bit too reverberant, with too much of the "room ambiance" captured. I prefer a bit more intimate, or maybe "close-mic'd", sound with just enough of the "room" included to identify the space.
That extra reverb or room ambience may be accurate to the actual space the performance was recorded in, but IMO, when you are listening to a live performance, your brain can somewhat "filter out" a lot of that reverb information to concentrate on the more intimate details (maybe because of actual visual cues?). But I find it more difficult to "filter" the room reverb/echo when listening to a recording.
However, for most of Chesky's spatial/imaging demonstration tracks, that room ambiance or "space" is necessary to the demonstration. Maybe it's just the mics he uses, the mic preamps, or just the placement of the mics? Or maybe (when listening with headphones) the HRTF of the dummy head that was used is just very different from your own? Mark (immersifi) may be able to expand on that aspect.
Just my .02, though I may be crazy, LOL.
EDIT: Look for the gearslutz link that Mark (immersifi) provided in Post #106.
OK, now on to the HRTF bit...
First of all, it is fair and accurate to say that the HRTF of any mannequin head out there is different from the HRTF of any human. Differences in morphology, and the shape of each individual's ears, play a significant role.
I don't know if it still exists, but once upon a time, there was an ISO standards committee that dealt with this rather political-meets-technical issue, that is...
what ear shape is the 'most correct' ear shape?
It's an honest question, but here's the thing...if I am not mistaken, several of the members on the committee had ties to, or worked for, vendors of mannequin head microphones, and pretty much all of these have established and esteemed businesses which service the NVH / Sound Quality arenas, but also the recording sectors. I say "political" because deciding on one "most correct" ear shape has ramifications in terms of perceived technical competence. As you might imagine, mannequin head microphones, which tend to be rather expensive, are sold with perceptual accuracy in mind. Thus, whatever "most correct" ear shape resulted from said work is bound to have specific elements of "Company A's" implementation, and perhaps elements of "Company B's", and so on. If any one existing shape were to be chosen as the correct shape, you can see how the other mannequin head + ear designs would immediately be 'wrong'. Considering the expense of these mannequin head mics, there's a lot riding on such a decision, thus, the political aspects of the process. My guess is that out of necessity and decorum, whatever ear shape is deemed 'most correct' will be a neutral design (i.e. not borrowing strongly from any one particular manufacturer's design parameters).
So, this still doesn't answer the question about the differences in HRTF between mannequin and end user; this is the point of such an undertaking.
The thing is, most mannequin heads have silicone ears that are removable, so should such a 'most correct' ear shape ever result, then in theory, new molds could be made to allow the new reference shape to be incorporated into the existing 'skull' of the mannequin head. Mind you, this would automatically mean that the equalization(s) for the head(s) would need to be re-done, because changes in the morphology of the ear would affect the HRTF. Just how much this would affect a given, existing mannequin head mic of course comes down to how similar the 'new' is as compared to the 'old'. Winners and losers will result, but my guess is that each will still lay claim to the overt import, or lack thereof, depending upon in which camp each company finds itself.
While equalization (as discussed in the GS post) can and will affect the timbre, the overall geometry plays a huge role in the HRTF. That is, the directivity as a function of frequency and angle. I suspect that those individuals whose ears are similar in shape to an existing shape would find the biggest changes with a new 'standard' ear shape - even if the issues surrounding EQ were corrected.
Now, as far as how we listen to binaural...that is another huge topic. However, I'll try to hit the highlights.
Moeller and others have suggested that for binaural, the 'best' headphone choice is likely to be one that has the best acoustical impedance match to free-air (the normal condition for your ear - it 'sees' the acoustic impedance of the air). There are other parameters as well (most notably the PDR, or pressure division ratio) that are related to the acoustic impedance of the headphones - and we've not even talked about how the headphones are equalized.
So what, in theory, should be the most like free air? I would argue electrostatic headphones, or open-back dynamic phones having similar acoustical impedance. The way I see it, the 'ideal' headphone for binaural would be one that basically allows your ear to sense as though it is not encumbered in any way, which is why I say that open-back (i.e. very, very low noise reduction (or isolation if you prefer)) would seem to be the logical choice.
EDIT: Likewise, these are the same boundary conditions under which a mannequin head microphone operates; the ears of the mannequin head 'see' free-air.
From an impedance perspective, and from power transfer principles, the way in which one maximizes power transfer from source to load is to have the load be real and equal to the source resistance (if the source is real). However, when the source is complex (real and imaginary components), then maximum power is delivered when the load impedance is the complex conjugate of the source impedance. A slight diversion here - those familiar with active noise cancellation understand that in essence, you minimize the noise from the source (an exhaust pipe, HVAC duct, or whatever) by minimizing the real portion of the impedance. This is because power can never be dissipated into an imaginary load - and in this case, we're talking about acoustic power. So, what an active silencer does (and yes, this includes active headphones) is to minimize the real portion...or maximize the imaginary portion in a relative sense, thereby minimizing the transfer (source --> receiver) of acoustic power.
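To put rough numbers on the conjugate-match idea, here is a minimal sketch using the electrical analog (a voltage source with a complex source impedance driving a load). The function name and the impedance values are my own illustrative assumptions, not measurements of any headphone or ear; the acoustic case follows the same algebra with pressure and volume velocity in place of voltage and current.

```python
def load_power(v_rms, z_source, z_load):
    """Average power dissipated in z_load, driven by v_rms through z_source."""
    current = v_rms / (z_source + z_load)    # complex RMS current in the loop
    return abs(current) ** 2 * z_load.real   # only the real part dissipates power


# Hypothetical values, chosen only to make the arithmetic easy to follow.
z_source = complex(8.0, 6.0)
v_rms = 1.0

candidates = {
    "conjugate match": z_source.conjugate(),  # 8 - 6j
    "real load only": complex(8.0, 0.0),
    "equal to source": z_source,              # 8 + 6j
    "purely reactive": complex(0.0, 5.0),     # no real part at all
}

for name, z_load in candidates.items():
    print(f"{name:16s} {load_power(v_rms, z_source, z_load):.5f}")
```

With these made-up values the conjugate-matched load receives the most power, and the purely reactive load receives none, which is the same principle the active silencer exploits in reverse.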
So, it would seem obvious that the headphone / ear interface is the opposite of the example shown above - we want to maximize the power transfer of the original signal as presented by the headphones. Again, headphones having an impedance that is close to "FEC" (Free-air Equivalent Coupling) would present the most 'natural' boundary conditions for the ear, and in theory would maximize the power transfer as a function of frequency. In essence, assuming the frequency response of the headphones is 'neutral' (I know, a relative term) then this would mean not altering the timbre of what's being presented to the listener's ears.
Having said that, I have heard many binaural recordings that sound very good on closed-back headphones - in no way do they sound similar to an open-back set of phones, but it sure would be interesting to break this down further (and it's probably already been done) by matching the EQ of the closed versus open pair. In this way, we'd only be comparing the acoustic impedance that is presented to each ear (because the frequency response would be the same). However, even changes in the size of the radiator will play a role, as it's conceivable that though they radiate the same spectral power and shape, differences in the surface area between the two might play a role as well, mainly because you are so incredibly close to the ear, which is now of a similar dimension to the very transducer that you are using to reproduce the pressure response (sound).
EDIT: Another thing to consider when discussing headphone types (open versus closed back) is the concept of perceived dynamic range. Let's consider a 'typical' home environment in which one might listen.
Let's suppose that you are listening to an a cappella piece with very high dynamic range. It's not uncommon for such pieces to have 40 or even 60 dB of dynamic range (I know this because I have seen this in some of my binaural a cappella recordings). Now, let's think about the fact that open-back headphones by their very nature have very low isolation (noise reduction)...during the quiet passages, you may not hear the quietest parts of the recording because of the poor isolation of the headphones. Paradoxically, these are the headphones that, in theory, are best suited for binaural (i.e. those closest to FEC behavior). So, in order to hear the quietest passages, you turn the gain up to get above the noise floor...problem solved, right?
Well, here's the thing...our hearing mechanism is non-linear; this goes back to the work done by Fletcher and Munson all those years ago (and probably earlier, though their work is most often referenced). So, by turning the gain up, you also alter the timbre, because the ear will now better hear the bottom end than when the gain was lower. This has all kinds of ramifications from a psychoacoustic standpoint (masking, upward / downward spread of masking...and on and on and on).
Now think about a binaural recording made of a band having significantly lower dynamic range - maybe 10 dB. In this case, the background noise is probably less important when using open-back phones than it was for the high dynamic range piece.
Now let's flip it again... let's go back to the closed-back headphones. Since these are far from FEC behavior, one would think that they are the worst thing possible for binaural. However, they offer much better isolation than do open-back variants. So now, our high dynamic range recording is experienced with more correct dynamic range - but at the expense of being (from an acoustic impedance perspective) nowhere near the FEC criterion.
An interesting problem, to be certain.
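Here is a back-of-the-envelope sketch of that trade-off. Every number in it (peak level, room noise, isolation figures) is a hypothetical value I picked for illustration, and it is a crude broadband comparison that ignores the frequency dependence of both isolation and masking.

```python
def quiet_passage_margin_db(peak_db, dynamic_range_db, ambient_db, isolation_db):
    """How far (dB) the quietest passage sits above the room noise at the ear."""
    quietest_db = peak_db - dynamic_range_db   # level of the softest passage
    noise_at_ear_db = ambient_db - isolation_db  # room noise after the earcup
    return quietest_db - noise_at_ear_db


def gain_needed_db(margin_db):
    """Extra playback gain needed to lift the quietest passage up to the noise."""
    return max(0.0, -margin_db)


# Hypothetical listening conditions (assumptions, not measurements).
PEAK_DB = 85.0            # loudest passage at the ear
AMBIENT_DB = 40.0         # quiet-ish living room
OPEN_ISOLATION_DB = 2.0   # open-back: almost no isolation
CLOSED_ISOLATION_DB = 20.0

for label, dyn_range in (("a cappella", 60.0), ("band", 10.0)):
    for phones, isolation in (("open", OPEN_ISOLATION_DB), ("closed", CLOSED_ISOLATION_DB)):
        margin = quiet_passage_margin_db(PEAK_DB, dyn_range, AMBIENT_DB, isolation)
        extra = gain_needed_db(margin)
        print(f"{label:10s} {phones:6s} margin {margin:+6.1f} dB, extra gain {extra:4.1f} dB")
```

Under these assumptions the 60 dB piece on open-back phones sits 13 dB below the room noise, so you would have to turn it up by that much (pushing the peaks to 98 dB), while the 10 dB piece has plenty of margin on either type.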
As an aside, my 'other' job involves managing a Sound Quality and Noise and Vibration Group for a Tier 1 automotive supplier. As such, we do a fair amount of listening studies on product sounds. This is also common in the OEMs' Noise and Vibration Groups. The thing is, open-back headphones are almost always used in conjunction with binaural signals. However, care is always taken to present the sounds at 1:1 loudness as they were recorded. That is, if a sound is 65 dB(L) in situ, it will be presented to the listeners of the sounds likewise at 65 dB(L) (or alternately, some value of loudness in sones). So, in research, care is always taken to present the levels at their actual loudness levels - something almost never done (unless by coincidence) during casual listening.
So we have a few things going on:
Limited dynamic range of the sounds
Sounds of a loudness level that are often easily heard in real life
A 'treated' room in which the listening studies take place
Companies spend a lot of money to build these listening spaces. Often times they are on isolated flooring / foundations, have walls that afford tremendous transmission loss, and walls that are treated to absorb nearly all sounds. You can see that this is a very expensive proposition, but again, in the interest of perception of product sounds, they really must be presented at 1:1 loudness. Thus, such a space solves the issue of the poor isolation of open-back headphones...but just think what's required to make that true.
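As a rough illustration of what 1:1 presentation means in practice, here is a minimal sketch. The function names and the measured playback level are hypothetical. The sone conversion uses the common rule of thumb that loudness doubles for every 10 phon above 40 phon; note that it takes loudness level in phon, which equals dB SPL only for a 1 kHz tone, so applying it directly to a broadband dB(L) reading is a simplification on my part.

```python
def playback_gain_db(target_level_db, measured_level_db):
    """Gain change (dB) needed so playback matches the level recorded in situ."""
    return target_level_db - measured_level_db


def phon_to_sone(phon):
    """Rule-of-thumb conversion: 40 phon = 1 sone, doubling every 10 phon."""
    if phon < 40:
        raise ValueError("the doubling rule is only used here at 40 phon and above")
    return 2 ** ((phon - 40) / 10)


# Hypothetical example: the sound was 65 dB in situ, but the uncalibrated
# playback chain measures 71.5 dB at the listener's ear.
in_situ_db = 65.0
measured_db = 71.5

print(f"trim playback by {playback_gain_db(in_situ_db, measured_db):+.1f} dB")
print(f"65 phon is roughly {phon_to_sone(65.0):.2f} sones")
```

In this made-up case the playback would be trimmed by 6.5 dB, and a 65 phon sound works out to roughly 5.66 sones.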
There are SO MANY aspects to this whole puzzle, and again, my feeling is that for the simple aesthetic purposes, much of this is moot. In the end, what matters is whether or not you like how it sounds. However, in the research world, the import of this and the need for better accuracy is paramount, especially when localization studies and virtual environments are undertaken.
Anyway, that's a sort of NON Reader's Digest version of just some of the concerns surrounding (no pun intended) why you perceive certain binaural recordings made with mannequin head X, and auditioned with headphone Y, to be so different from one another. Mind you, this sort of thing is, as I mentioned, the topic of much research, but to carry this out with the requisite repeatability and reproducibility requires an appropriate lab, with all the appropriate gear - a hefty investment indeed.
EDIT: I just found this on the web - the graphic on the home page speaks volumes as to the complexity of the HRTF, and why it is unlike any traditional microphone patterns (cardioid, omni, figure-8 (velocity), hypercardioid, etc.) that are typically used in two-microphone stereo, or stereo mixes made from multichannel recordings. Yes, how one configures a given pair of mics in two-microphone stereo plays a huge role in the imaging of the end product (as do many other factors), but just take a look at the surface plot of this particular HRTF; you will see that an HRTF is a function of frequency and azimuth (elevation) but also rotation (not shown in this plot):
http://www.ais.riec.tohoku.ac.jp/lab/db-hrtf/index.html
Some interesting links within this as well. I hope the graphic helps to make the HRTF concept a bit less nebulous and easier to visualize.
Mark