A layman multimedia guide to Immersive Sound for the technically minded (Immersive Audio and Holophony)

bigshot · Dec 27, 2017 at 11:56 AM

Immersive sound in recordings is no more "real" than stereo. It just adds another dimension to the illusion. Believing that the best sound is recorded by capturing it realistically is like believing rabbits magically come out of a magician's hat. Just because something appears to be like that on the surface, that doesn't mean that the end result was achieved that way.

jgazal · Dec 27, 2017 at 12:36 PM

bigshot said:
Immersive sound in recordings is no more "real" than stereo. It just adds another dimension to the illusion.

Said that way, without the word myth, I partially agree, but limited to speakers.

bigshot said:
Believing that the best sound is recorded by capturing it realistically is like believing rabbits magically come out of a magician's hat. Just because something appears to be like that on the surface, that doesn't mean that the end result was achieved that way.

I don’t see why an eigenmike microphones would capture higher distortion than spot microphones.

The problem is reproducing with speakers in a room.

I also don’t see why eigenmikes, high density personalized HRTF with headtracking and headphones transducers, would render higher distortion than spot microphones mixed in stereo and played with acoustic crosstalk.

Don’t you think that some illusions, better, emulations, have lower distortion than other illusions?

bigshot · Dec 27, 2017

It isn't a matter of distortion, it's a matter of flexibility. For sound to sound good, it has to be organized. You have to layer it with contrasts in timbre and volume all woven together into something that sounds balanced. Recorded sound rarely is balanced right from the mike, and the more instruments you have, the harder it is to capture balanced sound all in one fell swoop. Mixes are built and constructed. Elements are recorded with as much isolation from other instruments and placement as possible. This allows the maximum flexibility in the mix for creating and overall balance that reveals all the contrasting layers of sound clearly. If you try and capture all the instruments and all of the spacial information at one time, you end up too often with mush. And if you mike instruments with specific spatial perspectives, you can't layer instruments because all those different spatial perspectives might not jibe.

It's the same as a painter painting a scene. He constructs a composition and a balance of light and shade and hue on the canvas. He doesn't just photograph a scene. This gives him the flexibility to create shapes that flow into one another and allows him to highlight the aspects of the composition that he wants to be the focus.

A painter *makes* pictures, he doesn't capture images. A sound engineer *makes* a mix, he doesn't simply record it. Once it's done, it is perfect. More perfect than reality and if the painter or engineer does a really good job, it can seem more real than reality too.

jgazal · Dec 27, 2017

bigshot said:
It isn't a matter of distortion, it's a matter of flexibility. For sound to sound good, it has to be organized. You have to layer it with contrasts in timbre and volume all woven together into something that sounds balanced. Recorded sound rarely is balanced right from the mike, and the more instruments you have, the harder it is to capture balanced sound all in one fell swoop. Mixes are built and constructed. Elements are recorded with as much isolation from other instruments and placement as possible. This allows the maximum flexibility in the mix for creating and overall balance that reveals all the contrasting layers of sound clearly. If you try and capture all the instruments and all of the spacial information at one time, you end up too often with mush. And if you mike instruments with specific spatial perspectives, you can't layer instruments because all those different spatial perspectives might not jibe.

It's the same as a painter painting a scene. He constructs a composition and a balance of light and shade and hue on the canvas. He doesn't just photograph a scene. This gives him the flexibility to create shapes that flow into one another and allows him to highlight the aspects of the composition that he wants to be the focus.

A painter *makes* pictures, he doesn't capture images. A sound engineer *makes* a mix, he doesn't simply record it. Once it's done, it is perfect. More perfect than reality and if the painter or engineer does a really good job, it can seem more real than reality too.

I see it. I wouldn’t mind if the spatial information were synthesized, as long as some proximity and elevation is conveyed, even if such effects are not coincident with the recorded voices or instruments.

I still fail to understand why the sound from several soundfield microphones or from spot microphones cannot be mixed in Ambisonics.

When I was trying to describe to you the concept of proximity, you wrote the following about Atmos and Ambisonics (last paragraph):

bigshot said:
Atmos is an object based system. You take an individual track (like a specific musical instrument) as a discrete channel and the system processes it to place it anywhere within the three dimensional sound field. The channel doesn't necessarily relate to a specific speaker, rather it's plotted to groups of speakers that represent a specific point in space. The size and definition of the sound field is governed by the size of the room and the number of speakers in the installation, but the mix is the same regardless of the number of speakers. The more speakers you have, the more precise the placement in space. It's like being in a rectangular cubic grid of sound.

Then they can add distance cues in three dimensions- say for instance a plot of how reflections, reverberation and decay work in a Gothic cathedral- and they can wrap that ambient envelope around the objects. That creates scale. The rectangular cube of sound defined by the size of the listening room is now able to create an environment of any size, shape or acoustic. Again, the precision is based on the number of speakers, and the mix is the same for a small installation as a big one.

Now if that isn't holographic sound, I don't know what is. I'm sure all the research into crosstalk, phase and all that stuff can be a part of the processing of Atmos as well. It's a lot easier to get two speakers to mesh to create a coherent phantom center than it is to get a cube covered with 64 speakers to create coherent phantom centers between every possible pair of speakers. I'm sure there is a lot of room tuning going on to make it work in the real world. But since the placement and ambience is completely object based, rather than baked into the individual channels, it allows for infinite flexibility. You could theoretically take a Rolling Stones album recorded in Atmos and have them performing in an arena acoustic on stage a hundred yards from you, then change a DSP and have them standing in a recording studio six feet in front of you. I would bet that eventually, they could even create alternate mixes using the same recorded channels to make the instruments move around you in a circle or fly over your head. Since it's object based, the placement can be anything they want.

Someone once mentioned to me that the order of magnitude of improvements to the quality of a sound field is related to the doubling of channels. So stereo is an order of magnitude better than mono, quad is an order of magnitude better than that, 8 channel, 16, 32, 64, etc. It's based on evenly spaced channels defining the four walls. You have to have each of the four walls equal to create a sound field that is significantly better.

I don’t know if Atmos enabled decoding processors can handle only objects instead of relying in a more and less comprehensive bed, but if I understood what you wrote correctly, rendering in 3-axis does not detract the “more perfect than reality” mixing approach.

So now think about this procedure:

Bruce Swedien: Recording Michael Jackson

(...)

'Rock With You' is also an excellent showcase for another of Swedien's creative live‑room production techniques. Each of the backing-vocal lines was first double‑tracked with a close mic, then Jackson moved a couple of steps back from the mic for another pass, while Swedien increased the preamp gain to match his level with the previous takes. Finally, an even more distant pass was captured using a Blumlein stereo pair, again matched for level. The result: an increased density of early reflections, which creates a natural depth and width to the soundfield.

Early reflections were also an important part of the lead vocal sound on Jackson's later records from Bad onwards, where the singer was set up on Swedien's aforementioned drum riser to amplify the sound of his dancing, and then surrounded by Tube Traps (the common studio nickname for ASC's tubular Studio Traps). Not only did this approach create a dense and controllable pattern of early reflections to support the singing and dancing sounds, but it also kept the sound at the mic much more consistent as Jackson moved while dancing. "The Tube Trap, to me, is one of the greatest things since sliced bread,” he enthuses. "Michael loved my Tube Traps — he was fascinated with them. We would try all sorts of different setups with the Tube Traps to get a soundfield that was really interesting. They save a lot of time.”

What would happen if instead of a Blumlein stereo pair, an eigenmike were used?

Would it be possible to mix those three vocal lines with Ambisonics to preserve the spatial information and at the same time the “right balance” Swedien was looking for?

P.s.: can you imagine an artist singing several takes for each of those three vocal lines instead of using synthetic off the shelf reverb? Weren’t they perfectionists?

bigshot · Dec 27, 2017

jgazal said:
I still fail to understand why the sound from several soundfield microphones or from spot microphones cannot be mixed in Ambisonics.

Different instruments are miked at different distances and in different perspectives. You might mike an acoustic guitar very close to the strings. A drum might be miked from above at a distance. A piano might be miked from the perspective of the pianist, or it might be miked in the body of the piano, or it might be miked from a distance with the lid open. If you take one ambient signature and overlay it over another completely different one, and another different one, and another... all those spatial cues will muddle together. You won't read anything as dimensional any more, it'll just sound thick and confused. That's why most musical instruments are recorded dry in mono and then are placed in position in the mix. The spatial cues (reverb, echo, etc.) are added to the group to create a unified ambience after everything is recorded and brought into the mix. The only other option is to mike the whole group in position and in the proper ambience with a stereo microphone pair simulating the head of a listener. But that approach leaves you no flexibility to mix. You have to get everything perfectly balanced from beginning to end live along with the performance. That is very difficult and once you've recorded it, there's no going back to finesse things or make corrections.

In the vocal example you mention, they are basically tracking the lead vocal and doubles normally, and they're just using the paired mike for a light bed of ambience. That works fine because all of the main vocals are tracked in mono. There's just one low bed that has the ambience in it. In case you aren't familiar with the process, doubled vocals are often tracked so tightly you don't realize there's more than one voice involved. The singer sings the lead twice hitting the exact same space and consonants for the lyrics. It beefs up the sound of the lead vocals. Michael Jackson and Elton John albums have a lot of this. It doesn't sound the same as a choir. And it isn't the same as reverb either. They probably added reverb on top of the doubles and ambience to thicken it all up.

gregorio · Dec 28, 2017 at 3:54 AM

jgazal said:
I don’t see why an eigenmike microphones would capture higher distortion than spot microphones.

An eigenmic or soundfield mics would not capture higher distortion, quite the opposite in fact, they would capture much higher fidelity than spot mics and that's PRECISELY why they are useless!! Didn't you read my post, particularly 4b? Listen to the drum kit at 7:40 on the virtual tour video you posted, listen carefully to the instruments in the drum kit, the kick drum and the snare drum for example. Now listen to the album you mentioned, Thriller. Does the kick and snare in the video sound even remotely like the kick and snare in Thriller? Pick something else: Motorhead, Prince, Seal, Sade, The Prodigy, Eminem, Coldplay or in fact pretty much anyone from the last 40 or so years, do the drumkits sound remotely like the accurately captured drumkit in your video? I find it unbelievable that you can appreciate the relatively subtle, immersive qualities of 3-axis sound reproduction while being completely oblivious to the massive difference between how a drumkit actually sounds in real life and how it ends up sounding in commercial music.

jgazal said:
I prefer to think I am driven by a more or less legitimate desire to reproduce immersive audio

How can a desire to reproduce something which never existed, be legitimate? Your "legitimate desire" is a desire which necessitates effectively killing or at least massively damaging pretty much all modern popular music genres. To me, that's about as far from "legitimate" as I can imagine!

jgazal said:
Don’t you think that some illusions, better, emulations, have lower distortion than other illusions?

What has that got to do with anything? There is almost NO emulation going on here! You've posted a recording of a real drumkit and that's obviously NOT what we're emulating. Do you want to hear an emulation of a pathetic string "twang" captured with high fidelity 3-axis spatial information or do you want to hear that pathetic twang distorted completely beyond recognition, so it better matches what you think an electric guitar should sound like? A number of famous artists would struggle to sing "Twinkle, Twinkle Little Star" decently, why would we want to emulate that? Etc., etc.!

jgazal said:
I wouldn’t mind if the spatial information were synthesized, as long as some proximity and elevation is conveyed

You "wouldn't mind" something which doesn't exist? The ability to synthesise spatial information even in just 5.1 is pretty basic and the technology for what you "wouldn't mind" doesn't yet exist.

jgazal said:
I feel sad when you say I need to get away from a myth.

Me too but unfortunately, that's the reality here. You continue to miss the point that bigshot and I are trying to explain to you, that it's ALL an illusion. You seem determined to interpret this as meaning that it's actually all real, except for the illusion of stereo. That the musicians are creating real performances on real instruments which we're accurately recording and then creating a stereo illusion from those recordings. The reality is: The instruments in real life sound little or nothing like we want them to, there is no real performance and therefore, how can we accurately record something which never existed? When we say it's ALL an illusion, we don't just mean an illusion of stereo, we mean the performance and the music itself is an illusion and we CANNOT create that illusion if we attempted to record and "preserve" 3-axis spatial information!

I'm not sure how to break you out of the myth you appear trapped in. I'll try one more way; have you seen this short video? Click the link, watch it all the way through and then answer this question: How could we record/preserve and reproduce the 3-axis spatial information of the "Faa"??

G

jgazal · Dec 29, 2017

gregorio said:
An eigenmic or soundfield mics would not capture higher distortion, quite the opposite in fact, they would capture much higher fidelity than spot mics and that's PRECISELY why they are useless!! Didn't you read my post, particularly 4b? Listen to the drum kit at 7:40 on the virtual tour video you posted, listen carefully to the instruments in the drum kit, the kick drum and the snare drum for example. Now listen to the album you mentioned, Thriller. Does the kick and snare in the video sound even remotely like the kick and snare in Thriller? Pick something else: Motorhead, Prince, Seal, Sade, The Prodigy, Eminem, Coldplay or in fact pretty much anyone from the last 40 or so years, do the drumkits sound remotely like the accurately captured drumkit in your video? I find it unbelievable that you can appreciate the relatively subtle, immersive qualities of 3-axis sound reproduction while being completely oblivious to the massive difference between how a drumkit actually sounds in real life and how it ends up sounding in commercial music.

How can a desire to reproduce something which never existed, be legitimate? Your "legitimate desire" is a desire which necessitates effectively killing or at least massively damaging pretty much all modern popular music genres. To me, that's about as far from "legitimate" as I can imagine!

What has that got to do with anything? There is almost NO emulation going on here! You've posted a recording of a real drumkit and that's obviously NOT what we're emulating. Do you want to hear an emulation of a pathetic string "twang" captured with high fidelity 3-axis spatial information or do you want to hear that pathetic twang distorted completely beyond recognition, so it better matches what you think an electric guitar should sound like? A number of famous artists would struggle to sing "Twinkle, Twinkle Little Star" decently, why would we want to emulate that? Etc., etc.!

You "wouldn't mind" something which doesn't exist? The ability to synthesise spatial information even in just 5.1 is pretty basic and the technology for what you "wouldn't mind" doesn't yet exist.

Me too but unfortunately, that's the reality here. You continue to miss the point that bigshot and I are trying to explain to you, that it's ALL an illusion. You seem determined to interpret this as meaning that it's actually all real, except for the illusion of stereo. That the musicians are creating real performances on real instruments which we're accurately recording and then creating a stereo illusion from those recordings. The reality is: The instruments in real life sound little or nothing like we want them to, there is no real performance and therefore, how can we accurately record something which never existed? When we say it's ALL an illusion, we don't just mean an illusion of stereo, we mean the performance and the music itself is an illusion and we CANNOT create that illusion if we attempted to record and "preserve" 3-axis spatial information!

I'm not sure how to break you out of the myth you appear trapped in. I'll try one more way; have you seen this short video?

Click the link, watch it all the way through and then answer this question: How could we record/preserve and reproduce the 3-axis spatial information of the "Faa"??

G

Out of the “FA” does not seem the same as “out of the blue”.

Although info coming from the visual cortex can override info from the auditory cortex when all gets processed, perhaps in Broca’s area, this must not be a loosely or arbitrary “illusion”, as our brains were suffering from a “bizarre” disorder.

Speech is essential in evolutionary aspects. If you are trying to work together with other humans, with incipient language, you must get the information right. That’s why our vision override the auditory ambiguity in the particular example you mentioned.

When you see someone else lips pronouncing FA what is the chance he is trying to pronounce BA? So that “illusion” is fact highly correlated with reality.

I wouldn’t use the word “illusion” in a surrealist meaning of “more perfect than reality” and extrapolate the specific McGurk effect to the way our brain solves all possible ambiguities between vision and audition.

A precise perception of sound source location is also essential in evolutionary aspects.

So in that court, when the jury heard two loudspeakers in front of them, they might be with eyes wide open and nevertheless their brain probably processed Michael Jackson’s as he were right in the middle between speakers. It does not matter if your visual cortex delivers contradictory information that you are in a court room with nobody in that virtual spot.

IMHO, you must know what ambiguity you brain is trying to solve and which cue will prevail in each case.

Why this experiment to quantity errors of elevation with higher order ambisonics by the BBC engineering team used speech as a test signal?

jgazal said:
Localisation of Elevated Sources in Higher-Order Ambisonics
Paul Power, Chris Dunn, Bill Davies, Jos Hirst
BRITISH BROADCASTING CORPORATION

Click to expand...

So I also wouldn’t say the way our brain processes sound is uncorrelated with reality. IMHO it is actually highly correlated.

You must have heard reflections in large arenas. You know there is no sound source in the reflecting walls but still you perceive the sound as coming from the reflecting wall.

You must have also watched Professor Choueiri videos above.

He also describes the evolutionary aspects in the way our brain solves the head movement ambiguity when playing back sounds with headphones. And here we have ambiguities between sound cues.

In another instance, Professor Stephen Smyth also describes the ambiguity between a PRIR from a large room and the listener room size. It does collapse the externalization because sound cues are still altered dynamically with head tracking, but interesting enough, some users had described a sensation that speakers sound nearer than they were actually measured. I have asked if we could use a gear 360 and a gear VR to retrain our brains, but I received no answer yet.

Some say that our hearing is more precise in the horizontal plane. When seeing straight, we may perceive the elevation of sound sources in a loosely way, but as soon as the sound catch our attention we tilt our head and our the transverse plane that cuts your head now isn’t coincident anymore with the horizontal plane and you perceive that elevated sound source with more precision. The Realizer now allows elevation head tracking.

That said, I want now to describe two of my highest esteemed musical memories.

The first was an rehearsal of my cousin’s band. He is a drummer and the drum was not amplified since, obviously, it was loud enough. Then I heard them playing Hotel California. Interesting enough, Eagles has one of the best selling albums of all time. And you are right, I never heard the drum in the same way I heard that day (but that might be just my feeling).

So even being a real drum, I felt connected emotionally with that bass line and that music. Perhaps as emotionally connected as I am when hearing “The way you make me feel”.

The second was a marriage in which there was a band with all instruments amplified. There was also an saxophonist with a tenor saxophone (and an spot wireless microphone) playing around the tables.

I had never heard a saxophone playing around you with recorded music until recently. I hope the Realiser A16 and the Chesky record above can emulate that in a similar way.

Nevertheless, I do understand and respect your work and particularly the creative value added by recording and mixing engineers. I am sure certain bass lines sound better after mixing than when they were recorded.

But I have been reading your post many times and I still feel odd when I read the part about the vocals. It looked like the creative value of recording/mixing engineers would be somehow intrinsically and qualitative better than the creative value from musicians or artists performers.

People used to say that Rod Stewart had the “wrong” type of voice and nevertheless he is very successful, even when he sings in unplugged MTV shows.

What someone gifted with musical sensibility but who is not proficient in performing with acoustic instruments our with his own voice would do?

Maybe electronic music with synthesizers?

Would such genre be compatible with 3-axis mixing?

Believe or not, when you search for mixing Dolby Atmos for music, this is one video you will find:

You can find more on development of Atmos mixing with this particular genre here: https://www.dolby.com/us/en/technologies/music/dolby-atmos.html#3

Well, I don’t feel such system can convey proximity for so many people in such a large listening area (the same challenges with movie theaters), but the concept of mixing synthetic sounds in 3-axis remains the same.

So if I understood right you are saying that (a) 3-axis mixing is a bad choice or even prohibited choice for any music genre or any type of musical event.

And if, again, I understood right, you may also be saying, that (b) acoustic virtual reality is also a myth or an utopia given the complexity involved to render 3d sound-fields.

I naively thought that the creative value added by recording/mixing engineers and 3-axis mixing could be harmonized.

You are an experienced audio professional and I am, well, just a regular guy. So I will trust in good faith that your assertions (a) and (b) hold always, in any circumstances, true.

But tonight, when you lay up in your bed and put your head up on your pillow please pay attention to your feelings.

And since this is the science forum, please come back tomorrow, because I would like to know, respectfully, if you still feel okay when you advocate for everybody to dismiss, a priori, music mixing in 3-axis, in any circumstances.

If then you still tell me I am utterly wrong and that I am definitely driven by a myth, I will delete all my posts in this thread, in respect to your work and knowledge and because I don’t want people embarking in this supposed dead-end line of research driven by the same myth or utopia.

And since I mentioned “The way you make me feel”, I will confess that I felt deeply sad about your post. It is really shaking when someone put at stake your beliefs, isn’t it?

gregorio · Dec 29, 2017

jgazal said:
[1] Well, I don’t feel such system can convey proximity for so many people in such a large listening area (the same challenges with movie theaters), but the concept of mixing synthetic sounds in 3-axis remains the same.
[2] So if I understood right you are saying that (a) 3-axis mixing is a bad choice or even prohibited choice for any music genre or any type of musical event.
And if, again, I understood right, you may also be saying, that (b) acoustic virtual reality is also a myth or an utopia given the complexity involved to render 3d sound-fields.
[3] I naively thought that the creative value added by recording/mixing engineers and 3-axis mixing could be harmonized. ... Nevertheless, I do understand and respect your work and particularly the creative value added by recording and mixing engineers. [3a] I am sure certain bass lines sound better after mixing than when they were recorded.
[4] I would like to know, respectfully, if you still feel okay when you advocate for everybody to dismiss, a priori, music mixing in 3-axis, in any circumstances.
[5] If then you still tell me I am utterly wrong and that I am definitely driven by a myth ...

1. No, the challenge with movie theatres is somewhat different. In a cinema the audience is stationary, all oriented in the same direction, their position relative to the speakers is constrained and the acoustics are somewhat standardised from cinema to cinema. This is not the case with night clubs or many/most live gigs. Mixes designed for playback in clubs and live mixes at gigs tend to be rather mono with maybe just a few effects taking advantage of the stereo soundfield because a significant portion of the audience are not going to be positioned correctly to perceive the stereo effect. Dolby Atmos reduces the reliance on the stereo soundfield by providing significantly more that just two point sources but it is still reliant on stereophony to a degree. I think it's unlikely that Dolby Atmos will become a standard in clubs and even more unlikely for live gigs, although we may well see it appearing in the biggest clubs, from some of the most successful artists in certain popular music genres, EDM for example.

2a. Hang on, you're talking about something rather different now. Before, you were talking about "preserving the 3-axis spatial information" and I explained that it is impossible to record and preserve that spatial information because we don't have one coherent acoustic space to start with (but a number of different ones) and because the different processing required to all the instruments/sounds in every popular music genre would not be possible if we did try to record and preserve the spatial information. However, that's a significantly different proposition from saying (for example): Let's make a bunch of multi-tracked mono recordings, with relatively little spatial information, process those tracks individually how we want and then place them in a 3-axis soundfield. If we did this, we would obviously be recording and preserving little/nothing of the spatial information, we would be creating/manufacturing new and entirely different spatial information and, we are certainly not talking about emulating any sort of real 3-axis soundfield here but of creating a hopefully aesthetically pleasing soundfield (from a combination of mono, stereo and multi-channel spatial effects). Additionally, all this applies to the majority of music products (the various popular music genres) not to niche music genres such as say classical music, which is typically entirely acoustic, where we would have a single coherent acoustic space to start with and where relatively little processing of the instruments is required/desired. However, we still have some issues even in these circumstances which preclude (or rather, restrict us from) simply recording/preserving the 3-axis spatial information.
2b. No, I am not saying acoustic virtual reality is a myth! I'm not sure where you've got that from? I am saying that because with popular music genres there is no "reality" to start with, then logically it's obviously impossible to emulate a reality which never existed. So, we cannot have a virtual reality of popular music, although we could in theory have a sort of "virtual non-reality" or "virtual surreality" but it's not clear how we could achieve even that in practice without musical compromises and avoiding it being no more than just a cheesy gimmick (as with some early stereo popular music mixes).

3. To be honest, your questions, conclusions and statements indicate that you have relatively little understanding of our work. We do not "add value" ... putting a chassis, wheels and suspension on a car does not "add value" to a car because without a chassis, wheels and suspension you don't have a car in the first place, just an incomplete pile of car parts! Engineering is an intrinsic part of the creation of all popular music genres, not an added value. For example ...
3a. Not necessarily, in fact quite often a bass line could sound worse after mixing! This is because making the "bass lines sound better" is not the goal of mixing, making the bass line work better within the mix is the goal and that might mean making the bass line sound worse. In fact, it's important to teach student engineers not to try and do too much work to a bass (or any other) line in solo mode. Again though, it's not just a case of say a bass guitarist playing a line and then we change it after the fact during mixing, it's much more of an interactive process. What the bass player plays, how he/she plays it and what sound they produce will be informed by how it's going to be mixed and as that's rarely precisely known, this often means significantly changing what was played, even to the point overdubbing or completely re-recording it. I'm not talking about the latest technology here but about how it's been done for over 40 years!

4. How could I "still feel OK" with that when it's NOT what I've advocated in the first place?

5. Clearly you are wrong and driven by myth as far music is concerned, even acoustic music genres, although to a lesser degree. You are also somewhat wrong and driven by myth as far as most commercial sound in general is concerned. What you've presented here is not "a layman guide to immersive sound" but an hypothesis of what theoretically might occur in the future but it's a distant "might" because apparently without realising it, you're not just talking about technicalities of sound reproduction but a huge change in the art underlying music, a change to something new, as yet undiscovered and at the cost of abandoning the art we currently have and have had. If we look back in history, we see that the change from mono to stereo occurred gradually but once there was a decent installed user base of stereo then the popular music genres evolved to take advantage of it, even to the point of becoming reliant on it. Then we got 5.1 about 25 years ago and have had a decent installed user base for about 15 years or so but beyond a relatively few experimental albums, we've seen none of the huge music genre evolution to take advantage of 5.1 which we saw with the change from mono to stereo. Now you're talking about another big evolutionary step beyond 5.1, while the music itself hasn't even evolved beyond stereo yet and, shows no signs of doing so!

G

sander99 · Dec 29, 2017 at 10:51 PM

Jgazal, I strongly urge NOT to delete all posts here, because no matter if you are (completely or partly) driven by myth or not: this thread contains a nice collection of interesting pieces of information, and this whole discussion with these audio professionals is very informative as well (and has changed my own thinking about some things).
And also, maybe you already realise this yourself, I think there has been some miscommunication that is resolved somewhat in gregorio's last post.
Indeed Jgazal you have adapted to the idea that at least for most popular music there is no reality to be recreated, but you alternatively would like some "artificially created" 3d spatiality in the future. Gregorio indeed says that this is not impossible, just not likely to happen in the near future..And that for some genres, with purely acoustic live music, indeed there is a reality that you could wish for to be recreated (although there are some issues).

bigshot · Dec 30, 2017 at 12:44 AM

It's not just popular music. It's pretty much all recorded music of all types.

gregorio · Dec 30, 2017 at 3:26 AM

bigshot said:
It's not just popular music. It's pretty much all recorded music of all types.

True. I was trying to make the point that with popular genres the whole idea of preserving spatial information is nonsense. With acoustic genres, such as classical music, the idea itself isn't nonsense because we do have a coherent acoustic space and we could in theory preserve it, although in practice it's typically not desirable to do so. I have heard of engineers using an ambisonics array to record an orchestra but without exception (as far as I'm aware) not just an ambisonics array on it's own but mixed with other mic inputs and of course mixing with other mics will interact with and damage/destroy the delicate timing between the capsules of the ambisonic array required for ambisonics to work. As far as I'm aware, when an ambisonics array is used, the output channels are converted into 5.1 or 7.1 to enable mixing with the other mics without too many phase issues but of course this conversion looses the 3rd axis (height information). I've been involved in the recording of orchestras and other classical/acoustic ensembles numerous times and used a wide variety of mic'ing patterns and combinations of mic'ing patterns but I've never been involved in a session where an ambisonics setup was part of the pattern, so I'm only going on here-say from the odd engineer I've spoken to who has used an ambisonics setup rather than personal experience/knowledge.

G

jgazal · Dec 30, 2017

Heads-up:

Spatial Awareness: Inside the world of immersive sound design

(...)

Parnell says that the BBC R&D audio team has developed an object-based approach to immersive audio, where each sound source is treated as an ‘object’ and manipulated into a 3D position and rendered in a binaural mix with the relevant Head-Related Transfer Function (HRTF) filter depending where a source is positioned. “This is a very fast-moving area, particularly with regard to audio tools for VR, and there are multiple DAWs and plugins that support 3D panning and/or binaural rendering,” he adds.

“For the Proms I used IRCAM’s Panoramix console, which is designed for live 3D music production mixing and can render simultaneous outputs in stereo, Ambisonics, multiple loudspeaker, and binaural formats, enabling your choice of SOFA file (Spatially Oriented Format for Acoustics is an industry standard file format for HRTF sets) to be used in binaural.” Parnell says that the binaural Proms trial has highlighted the benefit that immersive audio can bring to listening to acoustic music, particularly where there is a spatial element to the performance. Concert hall venues are particularly suitable venues for immersive audio, the acoustics being architecturally designed to fill the hall with sound. Using immersive audio technology brings this experience right into the home (or pockets) of the listener. The BBC have gathered some statistics from those who accessed the live binaural stream on BBC Taste where a majority felt that the results were “like being there in person” while a similar majority believed that Radio 3 should broadcast more binaural sound.

“I worked on a project called Cinime a few years ago with Chris,” says Jungle’s Boardman. “This was an interactive advert portal/platform for cinema that allowed users to interact with content on the big screen via the small screen on their smart phones.” Turner says that Jungle are currently working on a project that highlights the lasting effects on children when a parent commits suicide. “It’s a difficult piece to get right and less is proving to be more but even subtle immersive movement within the music and recording the narrators in binaural audio is proving incredibly powerful.” Although immersive audio has long been a feature of both audio and video applications, its use in the field of recorded music has been limited and sporadic - however, Jungle think that is about to change. “We’re extremely excited about the prospect of the music industry getting on board the spatial bandwagon,” says Boardman.

“At the moment music presented as immersive audio isn’t common. Its use is generally driven by game engines interactively in real-time and according to game play, while music and non-diegetic audio is usually replayed in plain old stereo”. Boardman feels that music, most of all, can benefit from immersive audio techniques and if presented in this format, will ultimately allow the listener to better connect with the performance. “This is how we hear music live,” he says. “Immersive audio will enable the end-user to customise how they listen to music. In actual fact it will allow them inside the music and allow their ears and brains to choose what and how to listen.” Universal Music Group (UMG) appears to agree and are working with the company Within to create an app that can deliver immersive audio to consumers.

(...)

Limited to proprietary app, but still an intent with recorded music:

WITHIN AND UNIVERSAL MUSIC GROUP TO BRING PREMIUM IMMERSIVE EXPERIENCES TO MUSIC FANS

LOS ANGELES, October 30, 2017 – VR/AR entertainment and technology company WITHIN and Universal Music Group (UMG), the world leader in music-based entertainment, today announced a new strategic alliance to create and develop augmented reality and virtual reality music experiences featuring artists from UMG’s roster.

WITHIN and UMG will work together to create multiple immersive experiences that will be distributed on WITHIN’s app. In addition to providing consumers with premium experiences, the partnership also aims to expand and integrate augmented and virtual reality across the creation, production, marketing and promotion of new musical tracks, from the recording studio to the release parties, concert stages and beyond.

“Music is one of the most uniquely transformative mediums of human expression; combining it with immersive AR and VR experiences creates a new artform exponentially more powerful than the sum of its parts,” said Chris Milk, Co-Founder and CEO of WITHIN. “This partnership allows us the incredible opportunity to work with top artists at UMG to create ever more meaningful and expressive immersive music experiences.”

“We are huge admirers of Chris’s innovative and creative work in music and VR, as well as the premium experiences WITHIN offers to music fans,” said Michele Anthony, Executive Vice President at UMG. “Working with our labels and artists, UMG has produced numerous VR experiences and this agreement will help evolve our strategy. Together, UMG and WITHIN will push the boundaries of how audiences experience music and create new ways for artists to forge deeper connections with their fans.”

WITHIN is the premier destination for immersive stories and experiences. The company currently offers the best immersive experiences from renowned creators, including two music experiences in partnership with UMG artists. Earlier this year, WITHIN created an interactive, colorfully animated VR music experience for The Chemical Brothers’ song “Under Neon Lights” featuring St. Vincent. Using a new technology called WebVR, it is accessible through all major web browsers. Last year, the company also distributed the exclusive worldwide premiere of “KIDS” by OneRepublic on its app.

About WITHIN

WITHIN is the premier destination for innovative, entertaining, and informative story-based virtual and augmented reality. It brings together the best immersive experiences from the world’s finest VR creators—from gripping tales set in worlds of pure imagination to documentaries taking you further inside the news than ever before. WITHIN supports all major headsets, including Oculus Rift, Samsung Gear VR, HTC Vive, Sony PlayStation VR, and Google Daydream. To get started experiencing WITHIN’s ever-growing roster of experiences, just download the app for iPhone or Android.

WITHIN was founded by award-winning filmmaker Chris Milk and renowned technologist Aaron Koblin with the goal of exploring and expanding the potential of immersive storytelling. WITHIN collaborates with companies including 21st Century Fox, Oculus, Google, Apple, NBC Universal, Lytro, The New York Times, Vice Media, and the United Nations, as well as artists including U2, The Chemical Brothers and OKGO to bring to life and distribute premium immersive stories in a variety of genres.

About Universal Music Group

Universal Music Group (UMG) is the world leader in music-based entertainment, with a broad array of businesses engaged in recorded music, music publishing, merchandising and audiovisual content in more than 60 countries. Featuring the most comprehensive catalog of recordings and songs across every musical genre, UMG identifies and develops artists and produces and distributes the most critically acclaimed and commercially successful music in the world. Committed to artistry, innovation and entrepreneurship, UMG fosters the development of services, platforms and business models in order to broaden artistic and commercial opportunities for our artists and create new experiences for fans. Universal Music Group is a Vivendi company. Find out more at: http://www.universalmusic.com.

Date: October 30, 2017

If anyone has a Gear VR available:

gregorio · Dec 30, 2017 at 6:47 AM

jgazal said:
Heads-up:
... Limited to proprietary app, but still an intent with recorded music:

It seems to me that you're still putting two and two together and coming up with six! Let's take this statement for example: "Boardman feels that music, most of all, can benefit from immersive audio techniques and if presented in this format, will ultimately allow the listener to better connect with the performance. “This is how we hear music live,” he says." - This raises two points:

1. "Better connecting with the performance" is the very last thing we want in most cases! Who would want to connect with a performance of the drumkit one day, a performance of the guitar another day in a different acoustic space, a performance of the singer another day in another acoustic space, etc. And, even if the consumer did want that, the artists wouldn't allow them to have it. What the consumer wants, albeit maybe without realising that's what they want, is an artificial, highly processed illusion of all those performances occurring at the same time. So, just the fact he's talking about "the performance" means he's talking about purely acoustic music genres because that's the only time we actually have a performance.

2. It's not clear exactly what he means here but just taking the statement at face value; that is very clearly NOT how we hear music live! What we perceive at a live performance is an experience which is a combination of what we hear, what we see and indeed all the senses, plus what we expect and are feeling. It's obviously impossible to recreate that "with recorded music", we're generally going to need a great deal more than just recorded music, we're also going to need at least recorded visuals as well. What we would need to achieve is for the consumers' brains to react to the recorded material the same way as their brains would react in the real life situation and achieving that is a moving goal post. For example, there was an early theatrical film which was nothing more than an unedited shot, lasting a couple of minutes, taken by a camera placed right next to a train track as a train was approaching and due to the camera position it looked like the train was coming straight at you. It was obviously in black and white, in 2D and there was no sound. Nevertheless, some early audiences ran screaming from the cinema, believing they were about to be hit by the oncoming train. Some people are more easily fooled by an illusion than others but I doubt anyone today would be fooled by that early example. The same is true of sound and at every step in the technology, even the very first step, there have been those who thought it sounded just like being there, initially quite a few but once the initial surprise/impact of the new technology wore off, then fewer and fewer. We also have to consider that immersive sound, binaural and ambisonics for example, are not new, they've been around for 40 years or so and yet have never caught on. With the current push towards VR we're likely to see more development and use of immersive sound but: 1. We're not talking about just recorded music/sound here, we're talking about new combined audio/visual media formats. 2. It would only work for certain types of music and certain types of storytelling. 3. The technology is improving but the tools are still fairly rudimentary and we're still a fair way from being able to totally convince everyone. 4. The music industry is going through a prolonged period of falling revenue, this economic backdrop dictates a trend towards music products which cost less to produce and therefore employs less skilled labour, lower cost equipment and facilities, allows less time to experiment and requires product completion in less time. All of which is of course counter productive to high quality, new format recordings.

We're effectively going round in circles now, I've stated most of the above before and it's not making any difference because you have an unshakable belief in what you feel is most important, of what you think is possible/practical and, as with many strongly held beliefs of audiophiles, you're reinforcing that belief with advertising/marketing statements and the odd speculative experiment.

G

jgazal · Jan 3, 2018

gregorio said:
It seems to me that you're still putting two and two together and coming up with six!

(...)

We're effectively going round in circles now, I've stated most of the above before and it's not making any difference because you have an unshakable belief in what you feel is most important, of what you think is possible/practical and, as with many strongly held beliefs of audiophiles, you're reinforcing that belief with advertising/marketing statements and the odd speculative experiment.

G

We have now approximately 2000 page views in this thread, which is in the sound science forum of a headphone oriented enthusiasts site. That’s really an irrelevant number of viewers compared to the number of audiophiles, not to mention the whole universe of people consuming music. From such irrelevant number of viewers in this thread, very few will actually read my posts, because they are too long and analytical. Therefore, whatever misconception I may have written here, you can rest assured that it will not change the fate of musical industry or musical culture. And the first post has a disclaimer about potential misconceptions.

So if you take your time to answer, is because you care.

If you didn’t care, you would ignore me and let me simple write wrong things.

I appreciate your consideration for this community and to a smaller extent for me. I say smaller extent because you write as I were committing a crime.

So in retribution to your consideration, let me join your efforts to advocate your own point of view.

You probably didn’t watched the making of because I edited my last post while you were writing. If you watch that video you will see that actually there wasn’t any coincident microphone.

While the camera solves the optical parallax problem that @pinnahertz wisely exposed here, there will be still a cognitive ambiguity if only a coincident pair was used.

That’s because you can rotate a sound field with Ambisonics as the listener rotates his/her head around its axis, but you cannot calculate the sound field the listener would hear as him/she moves his/her head to another x, y or z axis coordinates. That would be really annoying with the main vocalist.

This ambiguity in sound field perspective could only be solved using objects and real time binaural synthesis or using an array of eigenmics:

jgazal said:
Models for evaluating navigational techniques for higher-order ambisonics - Joseph G. Tylka and Edgar Y. Choueiri
Virtual navigation of three-dimensional higher-order ambisonics sound fields (i.e., sound fields that have been decomposed into spherical harmonics) enables a listener to explore an acoustic space and experience a spatially-accurate perception of the sound field. Applications of sound field navigation may be found in virtual-reality reproductions of real-world spaces. For example, to reproduce an orchestral performance in virtual reality, navigation of an acoustic recording of the performance may yield superior spatial and tonal fidelity compared to that produced through acoustic simulation of the performance. Navigation of acoustic recordings may also be preferable when reproducing real-world spaces for which computer modeling of complex wave-phenomena and room characteristics may be too computationally intensive for real-time playback and interaction.
Recently, several navigational techniques for higher-order ambisonics have been developed, all of which may degrade localization information and induce spectral coloration. The severity of such penalties needs to be investigated and quantified in order to both compare existing navigational techniques and develop novel ones. Although subjective testing is the most direct method of evaluating and comparing navigational techniques, such tests are often lengthy and costly, which motivates the use of objective metrics that enable quick assessments of navigational techniques.

Click to expand...

Or to a lesser extent:

teacher1000 said:
I listened to the A16 today, and like anyone who has heard it I was thoroughly impressed. One small piece of information Stephen told me that I don't remember reading anywhere is that if you are using the head-tracker it will compensate if you sit off-axis (ie the sound will still appear to be coming from the speakers' actual positions). This will be useful for anyone who intends to use it with a second person.

But there is also the possibility of opting for a traditional mixing because, as you say, it sounds better intrinsically better. I never mixed anything in stereo or Ambisonics or binaural synthesis, so let me strengthen that I trust you in this particular aspect. So readers of this thread be aware of potential future cognitive dissonance if you were building-up expectations reading my posts!

Now that I contributed to your point of view, let me try to explain why stating the hypothesis of compatibility of musical mixing with 3 axis mixing (that you are strongly criticizing as to say what I quoted above and that I am driven by a myth) may not be harmful for the musical industry.

So in the making of we see they are probably using spot microphones on main vocal and chorus (objects) and spaced microphones for ambience (bed).

I would guess that their app mix those tracks on the fly using using binaural synthesis or DTS:X for headphones or Dolby Atmos for headphones, which is probably an Atmos or DTS objects and bed rendered in multichannel and the downmixed to binaural using a generic HRTF.

Adjusting the binaural synthesis or the objects on the fly with the head tracking input will help with externalization. But there will be still a mismatch in HRTFs.

How to use speakers instead of headphones?

If they are using Atmos or DTS:X then you would only need an compatible receiver and speakers. How many playback environments are enabled to play multichannel? Very few. And to play Atmos or DTS? Fewer.

If you have an Comhear yarra soundbar (beamforming phased array of transducers) or Theoretica Bacch4Mac (xtc) you can output the headphone binaural signal. But there will be still a mismatch in HRTFs. How many playback environments are enabled to play multichannel? Fewer than fewer.

Actually, how many listeners really care about plain vanilla regular stereo? As I see it, most listeners use lifestyle Bluetooth speakers, headphones without PRIR externalization or legacy stereo equipment. They don’t want the hassle to set up multichannel environments. Of course there are dissident voices:

bigshot said:
And of course Episode IV is multichannel audio. There was a record label that recorded the rolls royce of phonographs on the stage of a theater with great acoustics. It was encoded in matrixed multichannel sound and I'm told the results were amazing. I've never heard any of these because the matrix format is a dinosaur today, but the only way to really capture the sound of an acoustic phonograph at its best would be to do it in multichannel.

That’s why beamforming phased array of transducers, that are easier than stereo to adopt, might be disruptive, as Peter Otto says in one of the videos linked in the first post of this thread. But it lacks HRTF personalization (but you are able to alter some parameters) and head tracking.

So, as @castleofargh (he has a brilliant synthetic language while I am tediously analytical and the more you write the more you err...), summed up:

jgazal said:
jgazal said:

castleofargh said:

all in all it's only mysterious because we're lacking the tools to look at your head and say "you need that sound", but the mechanisms for the most part are well understood and modeled with success by a few smart people.

Click to expand...

Click to expand...

So again I have to agree with you that the technology is not fully mature. But Genelec is releasing In 2018 a software to capture biometrics from portraits and searching HRTF databases for a close enough match. Sony was sponsoring Professor Choueiri 3D3A labs so I would expect something on that front also. All this is mentioned in the first post of this thread.

And since I don’t know if you already mixed in Ambisonics or binaural synthesis (3 axis mixing), and played back in such environments (beamforming phased array transducers, Bacch xtc with two transducers or a crossfeed free PRIR), in my last post I just wanted to show any experiments in this regard to check if they really fail or, for still unknown techniques, they turn out well.

So even if, remotely, I am not completely driven by a myth, you don’t have to worry about the way music is currently mixed.

If mixing in 3 axis never reaches HT and audio environments and remains restricted to VR environments, you are okay and I own you and the head-fi community apologies. If it reaches the former environments and people want to consume it, you can probably use your stems and remaster for immersive environments, meaning stereo with real ILD and ITD or binaural synthesis or codecs relying in spherical integrals (you could use a “remastered for immersive sound” logo

).

Anyway, thank you very much for sharing all your knowledge. I could have not learned without your counter arguments.

castleofargh · Dec 31, 2017 at 2:26 AM

the ability to fool our brain is pretty much there IMO. we can think we're getting something "real" under the right circumstances with the right gear and material. visual cues would certainly help a lot. how real and identical to some original we'll get, that's a different can of worms.
the will to produce things that way is also a different matter, and just like with binaural, in the end only a small percentage of people are interested in trying to make a fac simile of the original. I know the audiophile world is always full of "the sound like we're with the artist", but most sound engineers I've talked to or seen interviews of, have a clear tendency to try and make a nice sound instead of a replica of the original.
it would certainly be very limiting if the position and ways to record were forced for a simulation. and same thing for mixing and mastering, those would be gone in favor of some fancy software doing its thing alone.

I mean we're satisfied with stereo panning being a gain setting per channel for most of the sounds ever released. it's clear that at least those productions aren't ready for any sort of realistic 3D. no matter how lax we can be on what we call "real".
I'm almost as enthusiastic as you are, thinking about what we could do. we all here share at the very least that idea that hifi warriors should focus more on those areas instead of some silly jitter at -125dB. I have less of a desire to replicate real events because I think they're usually not that great. what makes a live event special is often not the sound quality in my experience.
but one thing is sure, if we get tools, someone will use them! I don't think music has much to gain from full 3D, if only because we do expect musicians to have their feet on the ground most of the time, so there is that. but it doesn't mean some genres can't develop out of it or that we can't improve on the fidelity of what we already have(might require room treatment though).
and I'm confident PDD or whatever his new name is, would have loved some more advanced technics knowing the listeners would also have different playback systems, to do stuff like that

that song made me open my door to the wind so many times when I first got the album. I lived in a house which happened to have the door located where they knock in the song ^_^. real, nope. but fooled, oh yes!

Latest Thread Images

Featured Sponsor Listings

A layman multimedia guide to Immersive Sound for the technically minded (Immersive Audio and Holophony)

bigshot

Headphoneus Supremus

jgazal

500+ Head-Fier

bigshot

Headphoneus Supremus

jgazal

500+ Head-Fier

bigshot

Headphoneus Supremus

gregorio

Headphoneus Supremus

jgazal

500+ Head-Fier

gregorio

Headphoneus Supremus

sander99

Headphoneus Supremus

bigshot

Headphoneus Supremus

gregorio

Headphoneus Supremus

jgazal

500+ Head-Fier

gregorio

Headphoneus Supremus

jgazal

500+ Head-Fier

castleofargh

Sound Science Forum Moderator

Users who are viewing this thread