A layman multimedia guide to Immersive Sound for the technically minded (Immersive Audio and Holophony)

    True, I do care. It might appear from my posts that I object to or have a personal bias against anything more than stereo. That appearance is entirely incorrect: I completed my first multi-channel (5.1) commercial project in 1998, I've worked extensively (though not exclusively) in 5.1 and other surround formats ever since, I far prefer working in surround to working in stereo and I've always been a strong advocate for it. So, what I've been saying is based purely on the realities and practicalities (both logistical and artistic) of music creation and is NOT due to any bias/preference for stereo (or against multi-channel formats). I'll stick with surround/5.1 for the moment because although it doesn't qualify as an immersive format (as "immersive" is currently marketed), it is more immersive than stereo. When working with a surround format we've got two fundamental choices for how we employ it: 1. To have just a stereo soundfield for all the sound sources (as we would with a stereo mix) and then use the additional (surround) channels purely for spatial information (echoes/reverb) or 2. To actually place sound sources in the surround channels and fully employ the entire surround soundfield.

    1. I would argue that #1 is more a sort of spatially enhanced stereo than true surround. Nevertheless, I have heard some excellent examples of music recordings produced this way, mainly classical ensemble (inc. orchestra) recordings. Even though the surround spatial information is not an accurate preservation but is judiciously mixed from widely spaced mono and/or stereo mic positions, it can be convincing or at least pleasing, but it is moderately subtle. You are not intended to be consciously aware of this surround spatial information, just to feel a somewhat more real/immersive experience. This use of surround obviously doesn't really work (or doesn't work very well) for non-acoustic, popular music genres because there is no one coherent space to record, and we need isolation of the sound sources from each other and from the spatial information in order to process those sound sources. We could in theory just artificially generate most/all the surround spatial information using reverb plugins/processors but I say "in theory" because it's only just becoming possible to achieve this and even then, it's only possible to manipulate certain aspects of the reverb (early reflections but not reverb tails for example). Going a step further and adding height information is in its infancy and is functionally very basic, typically just mono or stereo height info. Achieving height spatial info in an acoustic music recording is more practical as we would just require some additional mics placed in a high position. There are some caveats though: firstly, we're multiplying the potential for phase interactions affecting the perceived frequency response and secondly, even with 5.1 we're talking about surround spatial info which is already at a level not intended to be consciously heard, and height info is even lower in level. So even if it's mixed exceptionally well and the consumer playback system/environment is good enough to represent it, the potential improvement should be relatively tiny.
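    The "phase interactions" caveat can be illustrated with a little arithmetic. What follows is only a minimal sketch, assuming an idealised case: one source reaching two mics whose path lengths differ by a hypothetical 0.5 m, with the second mic summed in at a chosen gain. The function names and all the numbers are illustrative assumptions, not anything taken from a real session.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def comb_gain_db(freq_hz, path_diff_m, gain=1.0):
    """Level in dB (relative to the first mic alone) of mic 1 plus a
    delayed copy of the same source from mic 2, scaled by `gain`."""
    delay_s = path_diff_m / SPEED_OF_SOUND
    phase = 2 * math.pi * freq_hz * delay_s
    # Squared magnitude of 1 + gain * e^(-j * phase)
    mag_sq = 1 + gain ** 2 + 2 * gain * math.cos(phase)
    mag = math.sqrt(max(mag_sq, 0.0))
    return 20 * math.log10(max(mag, 1e-12))  # clamp to avoid log(0)


def first_notch_hz(path_diff_m):
    """Lowest frequency at which the two arrivals are in antiphase."""
    return SPEED_OF_SOUND / (2 * path_diff_m)


if __name__ == "__main__":
    notch = first_notch_hz(0.5)
    for gain in (1.0, 0.5, 0.25):
        print(f"extra mic gain {gain}: level at {notch:.0f} Hz = "
              f"{comb_gain_db(notch, 0.5, gain):.1f} dB")
```

    In this toy model the notch gets shallower as the extra mic is mixed lower, which fits the point that height info sits at a low level, but every additional mic adds another set of peaks and notches to manage.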

    2. This is where music and sound really diverge. In real life it's not only acceptable to have sound sources behind and coming from almost every angle but it's actually expected; however, this is NOT the case with music. Indeed, a rock band (and all the rock sub-genres; metal, punk, commercial pop, thrash, etc.) is based on a drumkit, lead guitar, bass guitar and vocal, typically with added backing vocals and/or keyboard. It is NOT co-incidental that this is a perfect number of sound sources to create width and depth in a stereo mix! Much more than this and we'd have a cluttered/messy stereo image with poor separation, and much less than this and we'd have too much separation and a non-cohesive mix or a mix which doesn't take full advantage of the stereo soundfield. The second of these problems is exactly what we run into when trying to expand our soundfield from stereo to 5.1: our popular music genres have evolved for stereo, not for 5.1, and trying to mix a rock band (or rock band related ensemble) in 5.1 results in too much separation and a non-cohesive mix. Some people are going to like this of course but to most it just sounds somewhat wrong/gimmicky and that's why 5.1 has never really taken off for music! And I disagree with your statement "How many playback environments are enabled to play multichannel? Very few." I remember seeing industry figures from around 2009 stating that over 30 million consumer 5.1 systems had been sold; that's certainly a big enough user base to target commercially and I would assume that number is significantly higher today. In fact, almost all network TV in the more developed countries is made and distributed in 5.1, not stereo, but the opposite is still true for music.
    Although some genres could lend themselves to 5.1 (EDM and related genres, as mentioned previously), what we really need is some new genres specifically designed for 5.1, just as the rock/pop band genres were specifically designed for stereo, but after 25 years of 5.1, still no one has managed to come up with anything compelling enough to drive significant sales and incentivise the music industry. And of course, what you're talking about is beyond 5.1, an additional dimension/plane (height) when no one has really figured out how to effectively take advantage of the dimension/plane we've already got with 5.1!

    When I read that I remember the start of Thriller, when someone walks across the room. Or Beat It, when just a moment before the Eddie Van Halen solo someone knocks on the door and opens it very roughly, as if he (Eddie) were entering the room to play. Of course I like those songs even when I play them on Bluetooth speakers. And I didn’t like your sample even with those police sirens, window crashing, telephone ring, shots, airplane and knocking door effects. ^_^

    I just heard them with my crosstalk cancellation pillow and in lossless, and yes, those effects are cool.

    Listening to the Thriller effect, I wonder if it was recorded with a binaural head.
    Thank you very much for this post. I couldn’t agree more with you regarding the problems you describe in items 1 and 2.

    You made me “happy, happy, happy, happy! Happy, happy, happy, happy!”
    There have only been a few Atmos music mixes, and I've only heard 5.1 mixdowns of two of them. They are spectacular, with sound objects moving through the center of the room. I'm convinced that Atmos is the way to incorporate the vertical dimension into sound. The trick is to come up with a standard for home listening, and to convince home builders to build Atmos-ready rooms in houses. Then buying speakers would be plug and play. It wouldn't be difficult. It's just a matter of running a few more wires through the walls and installing mounting brackets, perhaps with some basic room treatment built in. It will probably happen because of home theater, and music will go along for the ride, just like with 5.1.
    Gonna threadjack just a bit: If one were buying a house that didn't have a dedicated room available, what should one look for in terms of a room to convert (say for 4-6 people to watch/listen)?
  6. bigshot
    When I was house hunting, I was looking for a good sized room with high ceiling, a roughly symmetrical layout, and good acoustics. For video projection the ability to darken the room during the day is important too. Flexibility in construction helps too. Things like being able to run wires in the walls, ceiling and floor. I didn't find that in my room and it ended up costing me a couple thousand dollars. Once you find the room, you can figure out what that particular room needs.
    I'm not sure why, I didn't really say anything different to what I've said previously. I'm still questioning the benefits of immersive sound for music, still pointing out the practical limitations and that it's unlikely the mainstream music industry will move beyond stereo in the foreseeable future.

    I'd agree with bigshot but will expand the answer somewhat. Your question is one of those "how long is a piece of string" questions. It depends on what you want to end up with and how much time and money you're willing to invest. What you really want, you almost certainly won't find, because a series of adjacent rectangular/cuboid rooms is the easiest and most efficient use of the space available within the footprint of a house but that's not good for acoustics. This can be overcome to a significant degree but what you should look for in a room will depend on how far you are willing/able to go in order to overcome the issues. Obviously you're going to need a room big enough for 4-6 people to comfortably sit and listen without being right next to the speakers, but if you're willing/able to try and overcome the issues then you want a much bigger room because you're going to lose a good 3-4 feet or so from the width, maybe 6 feet or so from the length and a couple from the height. It doesn't need to cost the earth to achieve a good/very good environment if you've got the tools, time and moderate DIY skills, as essentially all you'll need is widely available, fairly cheap materials. Obviously, you'd end up with a dedicated room but I'm not sure from your post if that's what you're ultimately after:

    If it is, then there are a few things to look out for in addition to just a bigger room, if possible: 1. An entrance/exit from the room very roughly central to the side walls. A door in the front or back wall will cause problems which can't be solved, more so in the front wall than the back. A door right in a corner is also a problem, again a front corner being worse than a back corner. 2. Same with any windows, although best is no windows at all. 3. A floor and ceiling which you can drill into and which provide solid/secure attachment points. 4. You'd ideally want an isolated room, one which doesn't share a wall/floor/ceiling with a neighbour for example. There's little you can do to stop bass frequencies passing through floors or walls; even with some serious heavy construction you can only somewhat mitigate this problem. It would be a shame to put a lot of effort into a room and then only be able to use its potential on certain occasions/times. BTW, there's a sub-forum on the AVS forum dedicated to DIY home theatre builds, containing a wealth of useful information.

    If that's not what you're after, then you don't need a bigger room. The location of doors and windows mentioned above would be nice but not essential. Walls (rather than floor and ceiling) to which you can attach things would be good to look out for, as there could still be a few things you could do to improve the acoustics without upsetting the wife or making the room "dedicated".

    Actually a dedicated entertainment room would certainly be something I'd be looking for if we did buy a new house. The house would be detached with a decent distance to a neighbor, so isolation won't be as much of an issue. Mainly I'd like it to be for our 4 person nuclear family, but allowing up to 6-8 would be occasionally nice for visitors even if the acoustics aren't perfect for the invaders. Combining your two posts gets me a good idea of what I was asking for: are there room attributes that will minimize wall-smashing. I'll dig more once we actually have a plan, but we're semi-looking right now so it's good to have things in mind.

    The last thing I'll ask here b4 diving into other forums is: does an unfinished basement make things any easier or less costly than finding a room?
    Concrete walls and floors are difficult to work with. You would have to finish the walls or it would echo like a train station. Basements also have low ceilings that can create challenges. You can work with anything, but some things are easier than others. I built my room in a large living room. Overhead beams gave me places to hide the screen behind so it doesn't feel like a theater all the time. And I had a wet bar that I used to store all the equipment in. The most difficult thing to hide is the speakers. But there are good compact speakers that could be incorporated into a living room situation without drawing too much attention to themselves. You just want to make sure the furniture placement in the room favors the acoustics. I did that by shifting stuff around and testing for a week.
    Just because you said you care. It is just a feeling. Nothing rational to support my hypothesis or to discard your assertion that:

    But since you quoted me as if I could have misunderstood your previous post as being some kind of concession towards my hypothesis, I will expand by saying that the following true assertion:

    Does seem less compatible with the hypothesis that millions of consumers did or will place 5.1 or more real speakers in their living rooms with the same audiophile dedication/devotion of @bigshot:

    Than compatible with my hypothesis of corporations providing content with 3 axis mixing that you criticize here:


    If I had to rank all that playback equipment in order of ease/practicality of adoption by consumers, I would rank them in the following order:

    1. beamforming phased array of transducers;
    2. transparent xtc algorithms (still expensive to be relevant);
    3. personalized externalization with headphones or personalized binauralization (still expensive and without HRTF acquired with biometry to be relevant);
    4. Atmos/Dolby/Auro/Ambisonics set of multiple speakers (not practical and still expensive to be relevant).

    On one hand, beamforming phased arrays of transducers or transparent xtc algorithms working together with personalized binauralization in playback environments are compatible with stereo (though better results are only possible with mixes that provide realistic ITD and ILD), 5.1 multichannel, Atmos/Dolby/Auro/Ambisonics, binaural synthesis and binaural recordings content:

    p.s.: Afaik, if a personal room impulse response (PRIR) was measured with the A8, one would not need any further calibration in the Yarra, and using the A8 head tracking can improve the spatial rendering.

    To add elevation to that rig you would only need binaural content (binaural head recordings or binaural synthesis) with two channel PRIR or Atmos/Dolby/Auro/Ambisonics with a Realiser A16 and a multichannel PRIR. It is still a hassle to measure a PRIR.

    Do you really think that no one will eventually market a soundbar that combines beamforming, the acquisition of HRTF through biometrics and head tracking?

    So beamforming phased arrays of transducers or transparent xtc algorithms working together with personalized binauralization in playback environments are compatible with stereo (though better results are only possible with mixes that provide realistic ITD and ILD), 5.1 multichannel, Atmos/Dolby/Auro/Ambisonics, binaural synthesis and binaural recordings content.

    On the other hand, the converse is not true: 5.1 playback environments are not compatible with Atmos/Dolby/Auro/Ambisonics and binaural synthesis, which are formats better suited for movies, gaming and VR, in other words, content that is mixed with height/elevation information.

    One could also say that immersive sound and Ambisonics didn’t catch on because they need multiple real speakers, and that binaural didn’t catch on because there weren't DSP and algorithms available for transparent xtc or personalized externalization with headphones.

    So I still fail to understand how in the “foreseeable future” 5.1 (or more speakers) playback equipment will have more penetration than a beamforming phased array soundbar with personalized binauralization.

    I also fail to understand how those 30 million consumer 5.1 systems that had been sold could also have found room and placement at @bigshot's quality standard.

    I also have not yet read any mixing engineer accusing @bigshot of being “driven by a myth” or being “anything like the typical consumer”.

    Since surround/5.1 content is compatible with beamforming phased array of transducers or transparent xtc algorithms working together with personalized binauralization in playback environments, I sincerely wish that you really could do that, that you really could stick to surround/5.1.

    But I wish that you could also mix stereo with realistic ILD and ITD.

    However, as you already proved in this thread, mixing stereo with realistic ILD and ITD is unlikely/improbable to happen.

    Also, @pinnahertz once said:

    And people buying more surround/5.1 than Atmos/DTS:X/Auro musical content is also a hypothesis that seems to contradict @bigshot's hypothesis.

    In scenarios where consumers only have access to stereo without realistic ILD and ITD and to surround/5.1 musical content, playing musical content on beamforming phased arrays of transducers or transparent xtc playback equipment will likely cause a cognitive dissonance, in which they will blame the technology itself, not knowing how the content was really mixed:

    I wish that, instead of accusing me of being driven by a myth or being anything but the typical consumer, you could follow the examples of corporations providing content with realistic ILD/ITD or height information to decrease that cognitive dissonance. To reinforce: Chesky Records (which is helping Professor Choueiri by providing more binaural content); Netflix (which is streaming movies with Atmos encoding); Google, YouTube and Facebook (which allow first order ambisonics uploads that are downmixed to binaural) and Universal/Within (well, afaik, it seems to be some kind of downmixing to binaural).

    The reason there isn't enough multichannel music has more to do with format than it does with the ability of multichannel to improve sound quality. SACD shot itself in the foot by not taking advantage of the primary benefits of the format. The players didn't get deep into the market. Blu-ray players did, but people expect video with a video format. DGG recently released a Blu-ray audio set of Beethoven symphonies with Bernstein in Vienna. I didn't buy it because I knew the concerts had been televised. No video, no sale for me. I expect other people feel the same. Likewise, ELO released a Blu-ray of their retrospective concert, but the sound was just stereo. It seems to be a dud because of that.

    There's a wealth of multichannel music on Blu-ray of live concerts. That stuff sells well. So the trick is to combine pictures with the surround sound. Atmos won't catch on until they figure out how to standardize it and incorporate it neatly into normal living rooms.
    The post above came from the thread "I got a few acoustic panels - Where should I put them?"

    It was written in the context of stereo reproduction, but I think it was so well written that it would also be useful for comparing binaural audio through two loudspeakers with beamforming phased arrays of transducers.

    Binaural audio through two loudspeakers without crosstalk cancellation won’t render a 3D sound field.
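    To make the crosstalk point concrete: with two loudspeakers, each ear hears both speakers, so the left binaural signal leaks into the right ear and vice versa. Below is a minimal single-frequency sketch of the textbook matrix-inversion idea, assuming a hypothetical symmetric setup where the far-ear path is 0.6 times the near-ear path and arrives 0.25 ms later. These numbers and function names are made up for illustration; this is not the BACCH filter design or the Yarra's algorithm, neither of which is described in this thread, and a real system would work across all frequencies with measured responses and regularised inversion.

```python
import cmath
import math


def plant(freq_hz, contra_gain=0.6, contra_delay_s=0.00025):
    """Hypothetical speaker-to-ear transfer matrix at one frequency.
    Rows are ears (left, right); columns are speakers (left, right)."""
    x = contra_gain * cmath.exp(-2j * cmath.pi * freq_hz * contra_delay_s)
    return [[1 + 0j, x], [x, 1 + 0j]]


def invert_2x2(m):
    """Plain 2x2 inverse; this plays the role of the cancellation filter."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul_2x2(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def crosstalk_db(m):
    """Level of the left signal at the right ear, relative to its level
    at the left ear (clamped so a perfect null does not hit log(0))."""
    return 20 * math.log10(max(abs(m[1][0]), 1e-15) / abs(m[0][0]))


if __name__ == "__main__":
    h = plant(1000.0)
    print("crosstalk without cancellation:", round(crosstalk_db(h), 1), "dB")
    print("crosstalk with ideal inversion:",
          round(crosstalk_db(matmul_2x2(h, invert_2x2(h))), 1), "dB")
```

    The ideal inversion only holds for the exact head position and room the matrix was modelled for, which is one way to see why the sweet spot and head tracking matter so much in this discussion.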

    Not all beamforming phased arrays of transducers are intended to convolve an HRTF in the digital domain and then cancel crosstalk as the Yarra does.

    In fact, as Scott Wilkinson and Peter Otto mention in the video linked in the first post of this thread, Yamaha has been working with an alternative approach that relies on reflective surfaces to project all desired channels and filter the listener’s HRTF acoustically:

    Since some height channels are reflected off the ceiling and it has three horizontal arrays of transducers, I would expect high vertical and horizontal directivity in the horizontal channels.

    So how does the Yarra differ from the Yamaha?

    It convolves a generic HRTF and avoids acoustic crosstalk. So instead of projecting several channels, it projects two highly vertical beams that fill the parasagittal planes that pass through the listener’s ears. I don’t know if such beams can also have high vertical directivity since there is only one horizontal array of transducers.

    So to render a 3D sound field the Yarra first needs to convolve a generic HRTF (though adjustable to some extent) in the digital domain and not acoustically as the Yamaha does. The Yarra would work better convolving a high density personal HRTF and maybe better yet with head tracking.

    So both units interact differently with acoustic room characteristics. The Yamaha needs to be placed in specific positions and absolutely relies on reflective walls. The Yarra is more flexible because it does not need any reflective wall, but it has the same restrictions regarding back wall reflections and may need controlled ceiling and floor reflections.

    Afaik, neither unit enhances the room's bass response to deal with standing waves and bass overhang.

    The Bacch processor instead works to cancel crosstalk from two ordinary speakers. Although it will work better with speakers with high directivity and a room with controlled early reflections, it measures PRIR to filter not only such speaker and room components:

    The PRIR measurement also makes it possible to track the listener's head with an infrared camera, adjusting the algorithms accordingly, and, in theory (I don’t know if it is done in fact), to enhance the room's bass response to deal with standing waves and bass overhang.

    Does it work with multichannel? If you use the output from a Realiser, probably yes.

    Does it work for multiple users? If you add a beamforming phased array of transducers, yes, it will project beams for each user.

    #1 is very similar to the Illusonic approach explained in the video linked in the first post of this thread, which extracts direct sound, early reflections and diffuse field to upmix stereo to multichannel.

    So each of those technologies (stereo; beamforming phased array of transducers - for projection of multiple channels or for crosstalk avoidance; and crosstalk cancellation filters) deals with room acoustics in very different ways, but one concept seems to arise and it was already superbly explained by @gregorio.

    It is not a matter of all or nothing, but adjusting the levels between direct sound, early reflections and diffuse field to achieve the desired perception.

    Every time the listener is out of the sweet spot those levels get corrupted and the spatial effect disappears. And both the Yarra and the Bacch filter allow those levels to be restored anywhere in the room for multiple users using several beams.

    So I would like to wholeheartedly thank @gregorio for such an outstanding post.

    Which one of those technologies is best? IDK. We will have to wait for accessible prices to experiment with all of them.
    1. Obviously, the more people you wish to accommodate, the bigger the room you'll need and the bigger the room, the more expensive it becomes. There's a larger surface area to treat, which is more time and a bit more money but you'll also need bigger amp/speakers to handle the increased room volume.

    2. I'll slightly disagree with bigshot here: While it's entirely possible to find a finished room of the right general dimensions, you're not going to find a room with the right properties acoustically. Depending on how far you're willing/able to go, you'll probably be covering all those finished walls and ceilings anyway or even possibly removing that "finishing", depending on what that finishing is, so it's a waste paying the extra for a finished room. BTW, by "covering" I'm not talking about sticking some acoustic panels on the existing walls but building stud walls in front of the existing walls. This allows you to get rid of the main acoustic issue of a cuboid room, the parallel surfaces, and in the process improve isolation, deal with some of the bass build-up issues, achieve better isolation between the speakers, etc. This all sounds like an onerous, expensive task and while it can take time, it's not particularly expensive; say some metal frame, a bunch of standard construction plywood sheets and/or gypsum boards, several rolls of rockwool, some cheap softwood to build panels, etc. So, we're talking hundreds or low 4 figures if you're DIY'ing. On the forum I mentioned previously you'll find detailed step by step instructions posted by numerous home theatre self builders. Taking this route would obviously result in a dedicated room and it really wouldn't matter much if it were just bare concrete to start with, which is why my advice is slightly different from bigshot's, who I believe is thinking more along the lines of a multi-function room and minimal treatment. However, I do agree entirely with bigshot about the typically low ceilings found in basements, which is a serious problem with no practical solution. Bear in mind that as you increase the desired square footage to accommodate more people, you really need to maintain the ratio with height. A 10' ceiling might be OK in a small room for just a couple of people but be a significant problem in a bigger room.
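    One common way to get a feel for the parallel-surface and bass build-up issue in a cuboid room is to list the axial room modes, f = n * c / (2 * L), for each dimension and see where they pile up. The sketch below is only a rough illustration with hypothetical room dimensions in feet and an assumed speed of sound of 1125 ft/s; it ignores tangential and oblique modes and says nothing about how any particular treatment performs.

```python
SPEED_OF_SOUND_FT = 1125.0  # ft/s, assumed approximate value at room temperature


def axial_modes(dimension_ft, count=4):
    """First `count` axial mode frequencies (Hz) between two parallel
    surfaces that are `dimension_ft` apart."""
    return [n * SPEED_OF_SOUND_FT / (2 * dimension_ft)
            for n in range(1, count + 1)]


def coincident_modes(dims_ft, count=4, tol_hz=2.0):
    """Pairs of neighbouring modes from different dimensions that fall
    within `tol_hz` of each other, i.e. likely bass build-up spots."""
    modes = sorted((freq, name)
                   for name, dim in dims_ft.items()
                   for freq in axial_modes(dim, count))
    return [(a, b) for a, b in zip(modes, modes[1:])
            if b[0] - a[0] <= tol_hz and a[1] != b[1]]


if __name__ == "__main__":
    # Hypothetical room: dimensions that are simple multiples of each other
    room = {"length": 20.0, "width": 15.0, "height": 10.0}
    for low, high in coincident_modes(room):
        print(f"{low[1]} and {high[1]} modes coincide near {low[0]:.1f} Hz")
```

    In this made-up 20 x 15 x 10 ft room the length and height modes land on the same frequencies, which is the kind of stacking that a poor ratio between floor dimensions and ceiling height tends to produce.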

    I thought I made that clear. There is a big difference between sound (film/TV sound) and music. With sound we expect it to be all around us, including above us. Aircraft and birds fly overhead, leaves rustle overhead, rain falls from overhead, footsteps in the apartment/room upstairs are overhead, etc. With music, a drumkit never flies overhead, the lead guitar is never behind you, etc. There's no realistic or practical justification for it beyond relatively subtle room reflections (with acoustic music genres) and no one has come up with any really compelling artistic justification for it.

    1. But you can't have both! Forget the format (stereo, 5.1 or say Atmos), you can't have both a "mix" and realistic ITD. As soon as you mix realistic ITD sound/mic sources the timing differences between the mics/sources interact and you lose the realistic ITD. Binaural and ambisonics only provide realistic ITD because they are NOT mixed! I have played around with converters (from standard stereo mixes to binaural) and the results are unpredictable; it works moderately well on some mixes and poorly on others.

    1a. I absolutely do not want to provide content with realistic ITD as some corporations are experimenting with. I do not want to create an accurate documentary of the sound of only acoustic music performances from a single listener perspective (or even user definable perspectives), I want to create art! I want to manipulate consumers' feelings, involve them emotionally in the storytelling, not simply titillate their listening experience with a sensory novelty, and I don't want there ONLY to be acoustic music performances, I want all the non-acoustic music genres too! Pretty much all the artists I know feel exactly the same way, even the acoustic music performers! Your question itself is evidence of being driven by a myth, unless you really are saying that you only want documentaries of only acoustic music performances?!

    I don't think that hypothesis stacks up. Of the tens of millions of 5.1 consumers only a tiny fraction will have systems like/comparable to bigshot. Most of them probably have a few hundred bucks worth of small, cheap satellite speakers + sub, which are all poorly positioned and only somewhat compensated for with mediocre calibration functions. Soundbars/beamforming technology *might* one day be a good solution but today are in practice just a convenient way of getting something a bit more dimensional than stereo, depending on where you're sitting.

    I am sure not all mixing applications are created equal.

    Have you tried Professor Choueiri Bacch-dSP?

    That is NOT what I want.

    I would like to hear musicians in front of me just like you described here:

    But in a slightly different manner.

    Firstly, I want a cello at the height I would usually expect it to be, i.e., below my eye line. I want the singer at the height I would usually expect them to be, i.e., standing up. I want the cajon at the height I would usually expect it to be, i.e., next to the floor. I don't mind if the cello was recorded in the bathroom and each instrument in different recording rooms... If you think such kind of "mild" elevation can detract from creativity or the elevation will be imprecise because of HRTF mismatch, then just mix them at zero elevation and azimuth at your will...

    Secondly, I also don't want the musicians:

    I would rather want the "soundstage" to be outside the region between the speakers, because some users (if not the majority) just cannot place frontal speakers two meters apart.

    So why not get your spot microphone streams and mix them with the Bacch-dSP app into left and right front channels and apply the following filter?

    You may then:

    As I see it, that procedure could make your mixes compatible with current and future listening environments, because it synthesizes coherent ITD according to the azimuth you choose at your artistic will.

    But you are clearly and expressly stating/asserting/maintaining that such type of mixing is impossible!
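    To show what "synthesizing ITD according to a chosen azimuth" can mean in its simplest form, here is a minimal sketch using the Woodworth spherical-head approximation, ITD = (a / c) * (theta + sin(theta)). It assumes a head radius of 0.0875 m, a 48 kHz sample rate and whole-sample delays, and it ignores ILD and HRTF filtering entirely. It is not how the Bacch-dSP app works (that is not described in this thread); the function names are illustrative only.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average head radius
SPEED_OF_SOUND = 343.0   # m/s, assumed


def woodworth_itd(azimuth_deg):
    """Interaural time difference in seconds for a far source at the given
    azimuth (0 = straight ahead, +90 = fully right); for |azimuth| <= 90."""
    theta = math.radians(abs(azimuth_deg))
    itd = HEAD_RADIUS_M / SPEED_OF_SOUND * (theta + math.sin(theta))
    return math.copysign(itd, azimuth_deg)


def pan_with_itd(mono, azimuth_deg, sample_rate=48000):
    """Return (left, right) sample lists where the ear farther from the
    source receives the signal later by the rounded ITD in samples."""
    delay = round(abs(woodworth_itd(azimuth_deg)) * sample_rate)
    delayed = [0.0] * delay + list(mono)
    direct = list(mono) + [0.0] * delay
    # Positive azimuth: source on the right, so the left ear is delayed
    return (delayed, direct) if azimuth_deg >= 0 else (direct, delayed)


if __name__ == "__main__":
    for az in (0, 30, 60, 90):
        print(f"azimuth {az:>2} deg -> ITD {woodworth_itd(az) * 1e6:.0f} us")
```

    Whether delays like these survive once many such sources are summed and played over loudspeakers is exactly the point being argued in this thread; the sketch only shows the per-source arithmetic.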

    Remember, this is the science forum. You must test the hypothesis before you can rule it out.

    Given your assertiveness, apparently you did test such a hypothesis.

    I just don't understand why then Professor Choueiri insists that it is a possible path... He is a professor of applied physics at Princeton University and he is the one who is lecturing that new technologies will allow people to be truly fooled by audio... Apparently he is also driven by the same myth...

    I see why my language may lead to a misunderstanding. Instead of "realistic ITD", I should have used "synthetic ITD, coherent with our spatial expectations".

    And why would a mastering engineer want to make current mixes compatible with future listening environments?

    Because, as you said:

    And, IMHO, as I said in post #57 above, those new technologies (beamforming phased arrays of transducers, crosstalk cancellation filters, head tracking, personalized HRTF and headphone externalization) seem disruptive precisely because they relax the constraints related to placing speakers in a living room, sitting at only one precise sweet spot and the requirement to adequately treat/enhance the acoustics of such a room, which are the two most fundamental problems for achieving high quality.

    As @pinnahertz said:

    But, as @bigshot said:

    At least I am sure you are not going to hard pan streams to one channel when mixing stereo and that's, well, "half the battle" to make stereo mixing compatible with those new technologies...

    It is not only soundbars with beamforming technology. It is more than just that.

    You said before:

    I would not call phased arrays of transducers, the Bacch filter and headtracking "rudimentary", but adding personalized HRTF acquired with biometry to the former (not possible yet without heavy computing power, but research is very close to such achievement by comparing biometric data and HRTF databases for close enough samples) will lead them to a higher level of maturity.

    And if, since we are in head-fi, you want to go for headphones:

    As @pinnahertz said:

    p.s.: It does not seem fair the way you quoted my "why?":

    Although you inserted your own text in quotation marks, you didn't indicate with "(...)" that my text was omitted. That took my "why?" completely out of context. There is no problem with asking rhetorical questions, but please refrain from quoting a different rhetorical question (my "why?") as if it meant what you wanted to rhetorically ask.
    From what I'm told, the biggest problem with sound quality in 5.1 systems isn't the equipment. It's the baby sleeping upstairs or the wife who isn't interested in what you're screening or the neighbor who shares a common wall with your apartment. I'm lucky. My listening room is in an attached guest house in the back yard. I can listen to whatever I want whenever I want. The speakers and amps are easy. The room is the trick.
