Are binaural recordings higher quality than normal ones?

immersifi · Mar 22, 2015 at 10:32 AM

Uli87 said:

"Wow thanks for all this info. I found you also posting on gearslutz talking to a guy about Holophnics and Zuccarelli when doing a google search.

I found a really cool video on BRIR on youtube https://www.youtube.com/watch?v=bmOvHjSBlnc.

These sort of technologies are already implemented into recording and mixing programs like protools? Because I know of some pretty awesome music that was recorded on smaller budgets in small home studios that also manage to capture pretty great depth. I mean like when you pan left and right while mixing, is there a way to actually add depth on a z axis?"

[sorry for the 'butchered' quote - I meant to quote you, but typed my reply first...and without coffee...Now, my reply]

First, you're most welcome.

Second, Yes.

ADDED : In the BRIR video cited by Uli87, the mannequin head shown right around the 1:50-mark looks like it is the Neumann KU-100, which is a well-known diffuse field-equalized mannequin mic - it's also the same type that is used in all of my (immersifi recording services) work.

If you canvass the 'web, you will find that people like me (acousticians, recording engineers, etc.) often post impulse responses - they may be monaural, they may be stereo, they may be binaural, etc. I can't recall the links at the moment, but basically, there are sites that tell you the format and the type of mic used (so you can de-convolve the mic's frequency response if so desired).

The thing is this... the realism of the reverb generated relies upon a few things, but most notably the quality of the impulse response, and there are several things that govern the accuracy of that part of the puzzle - all of that is way beyond the scope of this post. However, a good place to start is to review the lectures of Dottore Angelo Farina (University of Parma). There are so many issues surrounding the 'accuracy' of an impulse response. My point here is that while there are free impulse responses all over the web, like most things...'caveat emptor' applies.

As far as obtaining the IR (whether BRIR or others), there are a few typical means of gathering it, and each has its pros and cons; some are better suited to noisy spaces (MLS-based (circular cross-correlation)), others are better suited to de-convolving the non-linearity of the speaker used to generate the impulse (Farina's method of the chirp and de-convolution based on a time-reversed kernel), and so on. See, the challenge with all IRs is that you are often up against dynamic range constraints, extraneous noises, speaker non-linearities and such, and so there are things that need to be considered. Deltas are the desired source, but actually making a true delta function is a bit tricky. However, as Farina has pointed out, if your main area of interest is the HF portion of the IR, then remarkably, firecrackers are pretty repeatable (not line-on-line, but not bad) as ersatz delta functions. Popping balloons is another way to get there, but the size, shape, and internal pressure of the balloon affect its spectral composition when it bursts.
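For anyone who wants to see the chirp-and-time-reversed-kernel idea in code, here is a minimal sketch. It is not Farina's own implementation; the sample rate, sweep band, sweep length, and the two-tap "room" are all made-up values for illustration, and the speaker is assumed to be perfectly linear. It is written in plain Python so it runs without dependencies; for real measurements an FFT-based convolution (for example SciPy's `scipy.signal.fftconvolve`) would be the practical choice.

```python
import math

def exp_sweep(f1, f2, n, fs):
    """Exponential sine sweep from f1 to f2 Hz, n samples long."""
    rate = math.log(f2 / f1)
    dur = n / fs
    return [math.sin(2 * math.pi * f1 * dur / rate
                     * (math.exp(i * rate / n) - 1.0))
            for i in range(n)]

def inverse_kernel(sweep, f1, f2):
    """Time-reversed sweep with an envelope that falls 6 dB per octave."""
    n = len(sweep)
    rate = math.log(f2 / f1)
    return [sweep[n - 1 - i] * math.exp(-i * rate / n) for i in range(n)]

def convolve(a, b):
    """Direct (slow) linear convolution, fine for short signals."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

# Illustrative values only (assumptions, not from any real measurement).
FS, N, F1, F2 = 8000, 800, 100.0, 3500.0
sweep = exp_sweep(F1, F2, N, FS)
kernel = inverse_kernel(sweep, F1, F2)

# Hypothetical "room": direct sound at sample 5, one reflection at sample 40.
room = [0.0] * 41
room[5], room[40] = 1.0, 0.5

recorded = convolve(sweep, room)          # what the mic would capture
deconvolved = convolve(recorded, kernel)  # linear IR starts at index N - 1
estimate = deconvolved[N - 1:]

peak = max(range(len(estimate)), key=lambda k: abs(estimate[k]))
print("direct sound found at sample", peak)
print("reflection / direct ratio:", round(estimate[40] / estimate[peak], 2))
```

The recovered IR is band-limited to the sweep range, so each tap shows up as a narrow pulse rather than a perfect spike. With a real (non-linear) speaker, the harmonic distortion products would land ahead of index `N - 1`, which is the part of the method that lets you window them away.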

As far as the elevation question you pose...clearly in VR, if they have a database of IRs as a function of X, Y, and Z (and of rotation about those axes - particularly critical for BRIR), they can interpolate to the next valid IR, provided there is a wealth of measured points.

That is, it's impossible to have every single IR as a function of mannequin head mic location and orientation. However, if 'n' is sufficiently large, then a convolution algorithm can sort of interpolate between the individual IRs as the listener turns his or her head. Likewise, the more distinct the IRs are, the more of them you need to make the interpolation not noticeable in real time. On the other hand, if you had a truly diffuse field (a sanctioned reverb chamber), then the IRs would not be all that different from location to location within the reverb chamber. Likewise for free-field conditions - in true free-field conditions (no reflections), everything would simply come down to magnitude and orientation with respect to the source. So...those two (free and diffuse) represent the ends of the continuum where things are well-defined, but in reality, all other spaces in which we live, breathe, and experience music occupy a point on that continuum, and thus, the IRs as a function of source-to-receiver distance, orientation and the like are more and more disparate.
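To make the "interpolate between the individual IRs" idea concrete, here is a rough sketch for a single ear and a single axis of rotation. It simply crossfades the two nearest measured IRs by angle, which is the crudest possible scheme and an assumption on my part; real head-tracked renderers may time-align the IRs or interpolate in other domains. The function name and the dictionary layout are hypothetical.

```python
import bisect

def interpolate_ir(measured, azimuth):
    """Linearly crossfade between the two nearest measured IRs.

    measured: dict mapping azimuth in degrees -> IR (equal-length lists).
    azimuth:  requested head angle in degrees (wraps around at 360).
    """
    angles = sorted(measured)
    azimuth %= 360.0
    idx = bisect.bisect_right(angles, azimuth)
    lo = angles[idx - 1]             # idx == 0 wraps to the last angle
    hi = angles[idx % len(angles)]   # idx == len wraps to the first angle
    span = (hi - lo) % 360.0
    if span == 0.0:                  # only one measurement available
        return list(measured[lo])
    w = ((azimuth - lo) % 360.0) / span
    return [(1.0 - w) * a + w * b
            for a, b in zip(measured[lo], measured[hi])]

# Toy two-sample "IRs" at four head angles (made-up numbers).
grid = {
    0.0:   [1.0, 0.0],
    90.0:  [0.0, 1.0],
    180.0: [0.5, 0.5],
    270.0: [0.2, 0.8],
}
print(interpolate_ir(grid, 45.0))   # halfway between the 0 and 90 degree IRs
print(interpolate_ir(grid, 315.0))  # halfway between 270 and 0, across the wrap
```

The denser the measurement grid, the smaller the difference between neighbouring IRs, and the less audible a simple crossfade like this becomes.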

As far as hearing how convolution using BRIRs can be used to design the acoustics of spaces...you should check out this demo from ODEON - seriously, this diversion is wholly worth taking...some very cool demos here, and at the core is the concept of the BRIR.

http://www.odeon.dk/auralisation

This code is written around ray-tracing and models of a space (virtual or real) in which the material properties are described (for each and every surface), and then BRIRs are used so that dry sounds, recorded in an anechoic chamber, can be convolved with the BRIR, itself being the result of a simulation of the ray-tracing and the location in the virtual space.
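The last step described above (dry sound convolved with a BRIR) is simple enough to sketch. This is only an illustration of the convolution itself, not of how ODEON does it internally; the function names and the tiny example signals are made up, and a real tool would use fast FFT-based convolution on full-length recordings.

```python
def convolve(signal, ir):
    """Direct linear convolution of a signal with an impulse response."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += x * h
    return out

def binauralize(dry, brir_left, brir_right):
    """Render a dry mono signal through a BRIR pair, one IR per ear."""
    return convolve(dry, brir_left), convolve(dry, brir_right)

# Toy data: a two-sample "anechoic" signal and very short made-up BRIRs.
dry = [1.0, 0.5]
brir_left = [1.0, 0.0, 0.3]        # direct sound, then a weak reflection
brir_right = [0.0, 0.6, 0.0, 0.2]  # arrives later and quieter at the far ear

left, right = binauralize(dry, brir_left, brir_right)
print(left)
print(right)
```

The interaural delay and level difference between the two outputs come entirely from the BRIR pair, which is why the quality of the IRs matters so much.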

Great stuff.

Another similar bit of code is CATT.

Mark (immersifi)

immersifi · Mar 22, 2015 at 4:20 PM

uli87 said:
Wow thanks for all this info. I found you also posting on gearslutz talking to a guy about Holophonics and Zuccarelli when doing a google search.

I found a really cool video on BRIR on youtube https://www.youtube.com/watch?v=bmOvHjSBlnc.

These sort of technologies are already implemented into recording and mixing programs like protools? Because I know of some pretty awesome music that was recorded on smaller budgets in small home studios that also manage to capture pretty great depth. I mean like when you pan left and right while mixing, is there a way to actually add depth on a z axis?

In my 'other' reply to this question, I mentioned the various means by which one could obtain an impulse response (whether monaural, stereo, binaural, or any other approach), but this paper, which I failed to mention (sorry) prior to this is most assuredly a great read. They compare a few different stimuli as well as the measurement side of things. Someone else on this thread probably referenced this paper already, so apologies for the double-post. However, this paper really should be trotted-out every now and again, as there's a fair amount of detail describing the methodology. This paper is a good place to get a handle on maximum length sequences, how they work, but also what their strengths and weaknesses are - likewise for the other approaches discussed.

http://www.montefiore.ulg.ac.be/~stan/ArticleJAES.pdf

And I also found this which is a pretty nice condensed version of MLS:

http://www.libinst.com/mlsmeas.htm
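For a feel of how the MLS approach works, here is a small sketch, assuming an ideal noiseless, linear, time-invariant system driven periodically by the sequence. The register order, the tap positions, and the test impulse response are my own illustrative choices, not taken from either link. The key property is that the circular autocorrelation of a +/-1 MLS of length N is N at lag zero and -1 everywhere else, so circular cross-correlation of output with input gives back the IR (plus a small DC offset that can be removed).

```python
def mls(order=7, taps=(7, 6)):
    """Maximum length sequence of +/-1 values from a Fibonacci LFSR.

    The default taps are assumed valid for order 7 only; other orders
    need their own tap positions.
    """
    state = [1] * order
    seq = []
    for _ in range(2 ** order - 1):
        seq.append(1.0 if state[-1] else -1.0)
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]
    return seq

def circular_convolve(x, h):
    """Steady-state response of IR h to the periodic excitation x."""
    n = len(x)
    return [sum(h[j] * x[(i - j) % n] for j in range(len(h)))
            for i in range(n)]

def circular_xcorr(y, x):
    """Circular cross-correlation of output y with input x."""
    n = len(x)
    return [sum(y[(i + k) % n] * x[i] for i in range(n))
            for k in range(n)]

def recover_ir(y, x):
    """Estimate the IR; adding sum(r) removes the MLS DC offset."""
    n = len(x)
    r = circular_xcorr(y, x)
    dc = sum(r)
    return [(rk + dc) / (n + 1) for rk in r]

x = mls()
h_true = [0.0, 1.0, 0.0, 0.5, -0.25]   # made-up impulse response
y = circular_convolve(x, h_true)
h_est = recover_ir(y, x)
print([round(v, 6) for v in h_est[:6]])
```

In a real room the averaging over many periods is what buys the noise immunity, and any non-linearity in the speaker gets smeared across the recovered IR as spurious spikes, which is one of the weaknesses of the method.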

I hope this helps all who peruse the links.

Thanks,

Mark (immersifi)

bigshot · Mar 22, 2015 at 7:38 PM

Similar things are being done in multichannel sound. There is a film called Leviathan from 2012 that was directed by two Harvard "sensory ethnologists". It's ostensibly a documentary about the North American fishing industry, but it really isn't about that at all. The film is completely immersive and indescribable. It's like no other movie ever made. But if you see it on a big screen with a good surround sound system, you feel like you're being shot into outer space.

immersifi · Mar 22, 2015 at 9:24 PM

bigshot said:
Similar things are being done in multichannel sound. There is a film called Leviathan from 2012 that was directed by two Harvard "sensory ethnologists". It's ostensibly a documentary about the North American fishing industry, but it really isn't about that at all. The film is completely immersive and indescribable. It's like no other movie ever made. But if you see it on a big screen with a good surround sound system, you feel like you're being shot into outer space.

Yes, both approaches are aimed at immersive results, but approach the problem from different angles, and with differing constraints.

It is true to say that convolution is at the very heart of all impulse-response based shaping as well as synthesized soundscapes. Like anything, either approach can be used as pure and stand-alone (i.e. a live mannequin head-based recording and a live n.1 surround recording), or the elements of each approach can be used to different ends in each medium. Quite often (most often I would say), in film the goal of n.1 is not necessarily to reproduce a realistic sound field - it is to create an effect and to use the sound to further the story. This is kind of where the sound designers come into the picture, as they look at - and rightly so - all of the techniques possible as an artist would look at brushes.

In this respect, it's the same as it has always been in music production - there are those artists / engineers / producers who strive for a "verite" type sound, and there are others who are willing to use every effect and plug-in available in their DAW or equipment rack. Neither approach is correct, and neither approach is wrong. It all comes down to what you like, no more, and no less.

immersifi · Mar 25, 2015 at 9:33 AM

brhfl said:
A binaural recording can be just as well- or poorly-made as any other. The difference is simply in the mic technique - one that imitates a human head. This in itself goes a long way to reproducing an accurate sense of space and all, and in general a lot of folks seem to prefer recordings made with simpler techniques like this, or other single-point methods. It's worth noting that the mics used are pretty expensive, so you're probably not dealing with a bunch of amateurs slapping together a binaural recording. All in all, there are reasons that these will tend to sound good - but one could still royally screw up a binaural recording if one wanted!

I just wanted to say 'thanks' for this post. It's succinct and accurate. As someone who has worked in and with binaural, signal processing / DSP, simulation, sound quality, and NVH for 25+ years, it's refreshing to see someone get past so much of the 'noise' that seems to permeate the web when it comes to binaural, and actually get it right.

Thanks again for this technically-correct and 'efficient' post.

Mark (immersifi)

immersifi · Mar 30, 2015 at 1:57 PM

Here's something cool...

Asante Hunter (from the U.K.) is a cat I met online some time ago. Long story short, he ended up using a binaural recording that I made of a thunderstorm as part of a musical track. It was just released. Check it out...

http://www.hungertv.com/feature/premiere-asante-hunter-home/

"Lyrics of homesickness are layered over gentle bass and the sound of torrential rain before symphonic strings and a female counterpart arrive to transport a lost and lonely Hunter to the place that he’s been searching for, “Home”."

Immersifi contributed the "torrential rain" portion of this track - a binaural recording of a thunderstorm that took place in the suburban Detroit area that Asante then used as an artistic and aesthetic touch.

Check it out...remember...use headphones!

bigshot · Mar 30, 2015 at 2:10 PM

Reminds me of the old Mystic Moods Orchestra records!

uli87 · Mar 31, 2015 at 2:18 AM

Very cool! Very layered sound.. love it, especially when the thunder kicks in.

AstralStorm · Apr 2, 2015 at 11:57 AM

rrod said:
I think headphones desire more to be speakers than the other way around; speakers mimic our natural interaction with music more realistically. To me it makes more sense to record for multi-channel, and use HRTFs to get to a binaural setting. A properly mixed surround recording will be naturally transformed by our anatomy, whilst binaural recordings have to approximate that via "average" dummy heads.

Stereo speakers are not quite near the target, in fact they smear localization cues quite a bit.
Wavefront synthesis (and complementary large microphone array recordings) is where it's at - if you have enough speakers, and you need a lot of them.

Headphones can be much, much closer to reality than stereo or 5.1 or even cinema 11+ channel surround setups.

RRod · Apr 2, 2015 at 12:18 PM

astralstorm said:
Stereo speakers are not quite near the target, in fact they smear localization cues quite a bit.
Wavefront synthesis (and complementary large microphone array recordings) is where it's at - if you have enough speakers, and you need a lot of them.

Headphones can be much, much closer to reality than stereo or 5.1 or even cinema 11+ channel surround setups.

Well I wouldn't consider stereo much of a surround localization format, having only one axis. I'd be perfectly willing to try out the best that hrir/brir has to offer against a well-calibrated 11+ setup, especially if someone set it up in my house for free and then left it there.

Kidding aside, I am stoked that headphone localization might get some kind of bump from the standardization protocols that are being laid out. It's really great technology, even if it doesn't thump your chest like good speakers.

bigshot · Apr 2, 2015 at 7:20 PM

astralstorm said:
Stereo speakers are not quite near the target, in fact they smear localization cues quite a bit.

Stereo speakers add *real* localization cues because of the room they are in and the distance from the listener.

Stereo in headphones is like a straight line through your head- one dimensional. Stereo with speakers creates a vertical soundstage plane in front of you- two dimensional. 5.1 adds a horizontal plane through the room and behind you 2 dimensional sound times two. Dolby Atmos adds vertical information creating a dome of sound over you- a true three dimensional sound field.

Assuming high fidelity sound reproduction, the more defined the sound field, the more realistic the sound is. Dolby Atmos is the most realistic system in home audio.

uli87 · Apr 3, 2015 at 2:56 AM

Most stereo recordings through good headphones do convey a sense of depth - 3D layering. I have never tried this Dolby Atmos thing, so I couldn't compare.

AstralStorm · Apr 3, 2015 at 7:50 AM

bigshot said:
Stereo speakers add *real* localization cues because of the room they are in and the distance from the listener.

Stereo in headphones is like a straight line through your head- one dimensional. Stereo with speakers creates a vertical soundstage plane in front of you- two dimensional. 5.1 adds a horizontal plane through the room and behind you 2 dimensional sound times two. Dolby Atmos adds vertical information creating a dome of sound over you- a true three dimensional sound field.

Assuming high fidelity sound reproduction, the more defined the sound field, the more realistic the sound is. Dolby Atmos is the most realistic system in home audio.

Can you render a very near source reliably using any number of speakers? Try it, it is a very hard problem. Rendering farther sources is also doubtful.
Attempt a source set directly to your left, or to your right. In contrast, using headphones I can localize the sources anywhere I want.

In a reverberant space, speakers also anchor localization into discrete points - they can only reproduce frontal half-plane somewhat precisely in an anechoic chamber.
As for multispeaker systems - you're ignoring all issues of crosstalk, speaker bandwidth, speaker placement and sometimes room treatment.

Dolby Atmos in the wild is not actually that much better, mostly because it is a lossy downconverted version, generally mastered by ear by some engineers on a good reference setup into a few select formats. Now guess which formats will be supported and what kind of results you'll get. My bet is on {5,7}.1.{1,2,4}.

Actual true movie theatre quality object-oriented Atmos is in fact easier to decode for headphones than on any number of speakers or "virtual speakers" constructed by reflecting sound off ceiling.

Here is some actual information sans marketing hype: http://www.audioholics.com/loudspeaker-design/hrtf-and-elevated-sound-dolby-atmos
So yes, Atmos uses partial HRTF for the top channel due to deficiencies in the speaker setup, especially in the reflected version.

Generating signal for "Home Atmos" setup from a binaural recording using a known measurement head is definitely possible, but quite a chunk of DSP. True Atmos is even harder.
Quite a few labs are working on systems for sound source detection and tracking, incl. Dolby itself and Fraunhofer. It is much easier to use a microphone array for such recording than a dummy head.

--
Related: AES paper on results using generalized HRTFs. Verdict: Not Good. http://webs.psi.uminho.pt/lvp/publications/Mendonca_et_al_20120_JAES.pdf
Here's a very interesting way to personalize them using essentially a 3D scan: http://gamma.cs.unc.edu/HRTF/docs/PHRTFpaper_final.pdf

bigshot · Apr 3, 2015 at 12:53 PM

astralstorm said:
Can you render a very near source reliably using any number of speakers? Try it, it is a very hard problem.

I was listening to an Elton John SACD the other day, and there was a song (can't remember which one though) where the guitar solo started in the center channel and moved forward from the back wall to the center of the room a few feet in front of me. It was manipulating the phantom center of the mains and rears to locate the sound right in front of the listening position. There was an effect like that in the Steven Wilson remix of Anthony Phillips' The Geese and the Ghost too.

I don't hear that often in 5.1 mixes, but it is possible.

miceblue · Apr 13, 2015 at 1:22 AM

I didn't bother to read the thread, but I'll respond with my own personal experience. Some albums are indeed really great and make use of the binaural recording technique. Ottmar Liebert's "Up Close" album is still my favourite binaural album ever.

Not from his album, but an AWESOME binaural recording nonetheless.

[video]https://www.youtube.com/watch?v=ecOrBqQAuXg[/video]

Jamey Haddad, Lenny White, and Mark Sherman's "Explorations In Space And Time" album comes in a close second place from my experience with binaural albums.

My favourite track:

[video]https://www.youtube.com/watch?v=GMXYig-PizQ[/video]

........and then you get to other binaural albums like Chesky's binaural albums. I don't have anything against Chesky and his work, but his binaural albums are pretty disappointing to me. Basically you have like 1 or 2 instruments on the left side of the recording space, another instrument on the right, and maybe a singer in the center. That doesn't make use of the binaural recording technology in my opinion and it really doesn't sound much different from a typical stereo recording; it's pretty underwhelming to say the least. Some of his works are better, when they're recorded in a church and you can hear the reverberations, for example, but most of his work is like the former.

Pearl Jam's "Binaural" album was a disaster in my honest opinion, and that's coming from a local Seattle resident. XD

I personally own a pair of inexpensive, relatively speaking, in-ear binaural microphones. The recordings can be a hit or miss sometimes since it doesn't seem to completely capture how I hear things around me, especially behind and in front of my head, but it's much closer than what I would imagine a stereo recording to sound like. Its power source also makes a big difference in terms of noise floor. My USB stereo dongle sucks in that regard and my TASCAM iPhone recorder dongle is much better.
