A layman's multimedia guide to Immersive Sound for the technically minded (Immersive Audio and Holophony)
Dec 10, 2017 at 8:26 PM Post #16 of 220
To be clear, when I said "because it is typically impossible when using a mixed and mastered album", I was thinking about how albums are usually made: most sources in mono, and no real intent to get the perceived positions as they happened while recording (if the musicians even played at the same time). Even with that intent, replicating a live event is already a massive challenge; when the production didn't even aim for it, as on almost all of my stereo library, there is little point in targeting the live event as the right reference.
Now, if all the recording, mixing and mastering was done for the specific purpose of getting the spatial cues of the live event right, then of course as a listener I would wish for my playback system to be able to use that reference instead, as it wouldn't be out of reach.
We make do with what we're given ^_^
 
Dec 11, 2017 at 1:16 PM Post #17 of 220
Trying to capture a live performance from a specific single point is extremely limiting. You're locked into the balances, with no ability to add or subtract elements and no way to creatively sculpt the sound. An approach like that is fine for a scientist doing an experiment, but it's no way to create art. A painter doesn't just record what he sees; he interprets it and organizes it to express his own ideas and point of view. That isn't the same as pointing a Polaroid camera at something and clicking the shutter. It's the same with music. You can point a mike at someone and get a pretty accurate representation of what he sounds like, complete with information about the room he's standing in and the distance from the microphone. But that isn't what a sound mix is all about. In a mix, you're creating an optimized and organized sound that expresses the ideas and point of view of the creator. It's the difference between a painting and a Polaroid.

The goal of immersive sound isn't to create sound that sounds more real. Binaural does that to a certain extent and no one really cares. 3D movies have been promoted and have failed a couple of times now. People don't want "more real"; they want "more expressive". An immersive sound field should be something unique and original. It should give us sound we can't hear in real life, not drag sound down to reality. My interest in multichannel audio is in how it can create sound environments from scratch. In the past, engineers tried to do minimalist mixes to keep things sounding clean and natural. But as technology advances, we can make very complex mixes that still sound clean and natural if we organize them properly. That's what I want to learn about. These RPO SACDs are doing something I didn't think was possible: creating dimensional, realistic sound of an orchestra in a concert hall without recording in a concert hall. I'm trying to figure out how they did that, because if you can synthesize the sophisticated ambience of a concert hall, you can synthesize any ambience you want. You could even create ambiences no one has ever heard before.

Imagine a symphony orchestra mixed like a Pink Floyd album... Debussy's La Mer where the sound surges through the room and crests into waves in three dimensions, or a Mozart piano concerto where the piano dances through the instrumentation of the orchestra in space, or a Wagner opera that takes John Culshaw's ideas of soundstage into the third dimension. Stokowski and Disney had ideas like this in 1940 with Fantasia, but the technology was clunky and required a live human operator. I'd love to see surround sound and Atmos do things like this: not capturing realistic sound that exists, but creating realistic sound that doesn't exist.

Here is a photo of a recording session at CTS Studios in London where these RPO discs were recorded.

[image: CTS-studio.jpg]


These recordings sound like they're in a huge hall, not a recording studio. They had 45 microphones on the band and a mixing board capable of 96 channels. That was state of the art in 1995 when these recordings were made, but it's now becoming common in studios all over the world. Thinking about the possibilities of being able to take a huge orchestra apart into bits and reassemble them in three-dimensional space is exciting to me. The idea of the performance and the individual sound objects being separate from the space the sound inhabits opens up a ton of new possibilities. Being able to play an object based recording and switch environments and mixes on the fly completely blows my mind. I can't wait to be able to do that.
 
Dec 11, 2017 at 4:38 PM Post #18 of 220
@bigshot
I agree completely with your last post.
Why not mix all those spot-miked instruments into Ambisonics? Is that possible?
AFAIK, you could preserve spatial information that is usually lost, like elevation.
Then you could play back such a synthetic, lifelike soundfield with headphones.
I think mixing in Ambisonics may be preferable to binaural synthesis, because there is just one HRTF convolution stage.
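A rough sketch of what I mean, in Python, assuming a traditional first-order B-format stream and a simple virtual-speaker binaural decode (the decoder matrix and the HRIR sets are placeholders I've made up, not any particular library's API): however many sources are mixed into the soundfield, the HRTF convolution count is fixed by the decoder.

```python
import numpy as np
from scipy.signal import fftconvolve

def encode_source(mono, azimuth, elevation):
    """Encode a mono source into traditional first-order B-format (W, X, Y, Z).
    Angles in radians."""
    return np.stack([
        mono / np.sqrt(2.0),                         # W (omni, -3 dB)
        mono * np.cos(azimuth) * np.cos(elevation),  # X (front-back)
        mono * np.sin(azimuth) * np.cos(elevation),  # Y (left-right)
        mono * np.sin(elevation),                    # Z (up-down)
    ])

def binaural_decode(bformat, decoder, hrirs_left, hrirs_right):
    """decoder: (n_speakers, 4) gain matrix for a virtual loudspeaker layout;
    hrirs_*: one impulse response per virtual speaker, equal lengths.
    However many sources were encoded, the HRTF convolution count stays
    2 * n_speakers -- the fixed cost referred to above."""
    feeds = decoder @ bformat   # virtual speaker feeds: (n_speakers, n_samples)
    left = sum(fftconvolve(f, h) for f, h in zip(feeds, hrirs_left))
    right = sum(fftconvolve(f, h) for f, h in zip(feeds, hrirs_right))
    return left, right
```

Multiple sources just sum in B-format, e.g. bformat = encode_source(a, 0.0, 0.0) + encode_source(b, np.pi / 4, 0.3), without adding any convolutions.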
 
Dec 11, 2017 at 5:03 PM Post #19 of 220
The problem with recording from a specific perspective is that it's difficult to adjust the perspective afterwards if you want to. If something is miked too close or too far, you can't process it in the mix to work in any other perspective. That's why most music is recorded at a small distance, and distance cues are then synthesized to create the overall perspective. I don't know a lot about Ambisonics, but I would imagine the disconnect of mixing several instruments all in different overlapping perspectives would probably negate the value of it.
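As a crude illustration of that distance-cue synthesis, here is a sketch in Python. The 1/r gain, travel-time delay and wet/dry curve are illustrative choices, and room_ir stands in for whatever room impulse response the reverb uses; real mixes do this with far more nuance.

```python
import numpy as np
from scipy.signal import fftconvolve

C, SR = 343.0, 48000  # speed of sound (m/s), sample rate (Hz)

def push_back(dry, room_ir, distance_m):
    """Fake a close-miked source at a distance: attenuate the direct sound
    (roughly 1/r), delay it by the travel time, and raise the wet/dry mix."""
    delay = int(distance_m / C * SR)
    n = len(dry) + delay
    direct = np.zeros(n)
    direct[delay:] = dry / max(distance_m, 1.0)   # level drops with distance
    wet = fftconvolve(dry, room_ir)               # room reverb tail
    wet = np.pad(wet, (0, max(0, n - len(wet))))[:n]
    mix = min(distance_m / 20.0, 0.9)             # more reverb further away
    return (1.0 - mix) * direct + mix * wet
```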
 
Dec 13, 2017 at 7:00 AM Post #20 of 220
@jgazal bigshot has tried to explain but you don't seem to be getting the implications of what he's saying. You agree with him but then continue as if you don't or rather, as if he'd not posted anything. Let's take an example, your quote of Choueiri: "For serious music listening of music recorded in real acoustic spaces ...". Much, if not everything, you've stated and quoted is effectively based on this but it's contradictory and inapplicable. This statement eliminates the vast majority of recorded music, because the vast majority of music is not recorded in a real acoustic space. It also eliminates pretty much all other music because, although it is recorded in a real acoustic space, due to the way it's recorded and mixed it ends up not being a real acoustic space. So, the obvious question with "serious listening of music" would be: serious listening of what music? Choueiri has eliminated pretty much all commercial music recordings! I'm not saying Choueiri's work is definitely worthless nonsense; it *might* potentially have some influence on future developments, it might not have any influence and just be research which expands scientific knowledge, or it might actually be relatively worthless. I don't know the current cutting edge of scientific knowledge and don't know if Choueiri is expanding it.

[1] There are certainly two drivers to keep the reference locked to the mastering room when we are dealing with music content. One is objective and the other, let's say, volitive: a) the consumer environment; b) preference for such a best seat in the audience. [c)] But there may be a third driver: apprehension that one single coincident microphone could detract from the creative intent of artists, producers and recording, mixing and mastering engineers.
2. I don't believe that shifting the reference can in any way detract from the creative intent ... Giving the user access to 360 degrees of freedom in the x, y and z axes of a sound-field may sound challenging, but it does not detract from the creativity. It may be just a new language. This analogy between cinema's limited angle of view and VR's freedom of view might be useful:

1a. There appears to be a general misconception in the audiophile world about what mastering is, what a mastering room is for and how it's used.

A mastering room is a room with (hopefully) superb acoustics and accuracy of playback. Incidentally, superb acoustics doesn't mean little/no acoustics. The bigger misconception is that the mastering engineer masters to this room, IE. That they are attempting to create a master which sounds great in the mastering room.

We need the accuracy of the mastering room to hear exactly what is going on with the mix and exactly what we're doing/applying but we are NOT trying to create a master which takes advantage of that accuracy. If we did, then we would be defeating the whole purpose of mastering in the first place!

In practice, a mastering room will have the most accurate speakers/acoustics possible but it will also have pretty much the worst speakers possible and it will have headphones too.

So when I see audiophiles suggesting that recreating the mastering room provides the highest fidelity playback, I want to ask: Which mastering room, the one with the great speakers or the one with the crappiest speakers, and what about the mastering engineer's ears and subjective opinion?

The last of these is the most important, because the master he/she creates is a compromise between the two! The thinking being: the mastering room with the great speakers is loosely representative of the best quality playback, the mastering room with the crappy speakers is loosely representative of the worst, and what virtually all consumers will experience is something in between these two scenarios.

Therefore, if we can create a master which works reasonably well on both, we've got a master applicable to most consumers. Of course, it's virtually impossible to create a master which works perfectly on both sets of speakers. Generally, the more perfectly the master works on the crappy speakers, the less perfectly it works on the great speakers and vice versa. In other words, the reference is NOT locked to the mastering room, it's not even locked to the effectively two vastly different mastering rooms, it's locked to the mastering engineer's subjective opinion of somewhere between the two, which is in turn informed by the likely listening circumstances of the target consumers and modified by the client (artist, producer, record label).

The audiophile concept of recreating, or getting as close as possible to, the mastering room playback is most likely/almost certainly counterproductive!!

1b. This brings us back to my initial point, that even when we're recording in a real acoustic space, that's not what we're trying to create. We're NOT trying to recreate the best seat in the house! And the reason we're not trying to recreate the best seat in the house is because we're talking about an audience member, NOT a seat! In other words, the actual sound waves which would enter an audience member's ears are substantially different to what that audience member would perceive. What we hear is always a perception, a combination of our senses and our expectations.

This perception is potentially constantly changing as we decide, consciously and subconsciously, what to focus on and, just as importantly for recording/mixing purposes, what not to focus on! For example, in reality (the actual sound waves) the audience is making constant noise; however, as we're looking at (say) the orchestra, the brain will decide that constant noise is irrelevant/unimportant/an unwanted distraction and reduce its perceived level or even eliminate it entirely, unless something non-constant occurs (such as a loud cough, for example) or we consciously decide to override what our brain is doing by focusing our attention on that audience noise instead of the orchestra. This is only one of the tricks our perception is constantly playing in order to better hear and make sense of the world. Our hearing can also reduce some of the reflections, the number, duration and/or levels of those reflections, and it will even reduce parts of the orchestra itself, the parts we're not concentrating on. For example, if the lead violin gets a solo, our brain will match that sound with our eyes and the combination will reinforce the focus of attention on that violin and reduce everything else. On the face of it, this all appears to be pretty unnatural but of course it's actually the exact opposite: it's how our hearing has evolved to work and it's the only thing we've ever experienced as individuals, from even before we're born, so it actually sounds ENTIRELY natural.

There's an obvious problem here: with a music recording our sight is contradicting our hearing. We're seeing, say, our living room but hearing an orchestra in a concert hall, and there is little/no reinforcement effect between our sight and hearing which would result in the real-life scenario of our brain manipulating, subconsciously reducing and amplifying, the various elements of what is entering our ears in favour of other elements. Assuming we're talking about an actual acoustic event, such as a symphony concert for example, then what we're doing with the recording and mixing is NOT trying to capture and recreate the actual audio reality but create a sort of generalisation of what we would have perceived. This effectively means applying those reducing/amplifying brain manipulations to the recording itself, because our brain will not perform those manipulations when we're listening in our sitting rooms.

There is no algorithm for this, there's too many variables at play and ultimately it all comes down to the skill/technique, perception and subjective opinion of the engineers/producer. Also, this is in addition to any creative intent! For example, with our violin solo above, we might decide to make the lead violin a tiny bit louder and more present in the recording to emulate what we would likely have perceived had we been there (and our sight and hearing had combined to create this perception).
Artistically though, we might decide that what we would have perceived is still not quite right or could be subjectively better, maybe we would make the violin even louder or maybe quieter again or maybe tweak some other aspect of the sound.

1c. That one coincident pair not only severely limits what we can do artistically but even if it were a perfect coincident pair (which is impossible), it would still only be giving us a perfect recording of the sound waves entering the ear, not a recreation of what we would perceive!

2. I understand how you have arrived at that belief but it's false! It would have a massive effect on, and detraction from, creativity. Your analogy with cinema highlights this fact, although you don't seem to realise it. What you're talking about is not just a new language, it's a different language: a language which has no words or acceptable way of expressing most of the art/creativity of filmmaking but does provide some words/expressions which current filmmaking does not contain. What VR does is defined by its name, present a virtual REALITY, but reality is not what we're after!

Narrative filmmaking is ultimately all about storytelling; we read a book by an author, we watch a film by a filmmaker/s, and we are limited to the author's words in the order he/she placed them and to the frame and timeline the filmmaker/s present us with. VR presents the opportunity for the consumer to look and go where they want. This has an advantage and a disadvantage. The disadvantage is that the filmmakers no longer completely control exactly what you're seeing and hearing at any instant in time, and this means most of the subtle artistic storytelling tools evaporate or become chance. For example, a particular scene or even the whole story might depend on a simple gesture, a subtle facial expression, a subtle inflection, something going on in the background of a frame, something implied by a camera angle/movement or even a subtle sound. With VR we cannot use any of these and many other similar tools, because there's a fair chance the audience will miss most/all of it and it's almost certain they'll miss at least some of it, because they will be looking wherever they want instead of where they're intended to. We'd have to create a film where it wouldn't matter if all the subtle cues were missed, and that has massive implications for the complexity of the stories, character development and interaction, in fact pretty much every aspect of the art of modern narrative filmmaking.

In effect, the advantage of VR is that the consumer themselves effectively becomes the storyteller and this opens up a whole new set of interesting possibilities but, and this is also the disadvantage, the more control the consumer has, the less control the filmmakers have, and I'm effectively substituting my own storytelling abilities for those of the filmmakers. I don't know about anyone else but I go and watch a film because of the storytelling abilities of Spielberg, Nolan, Kubrick or Singer; my own storytelling abilities are pathetic in comparison! I'm not saying VR is therefore pathetic, I think it has a bright and interesting future, but as a different thing, as a different entertainment experience altogether, rather than as an evolution of and replacement for film.

The idea of the performance and the individual sound objects being separate from the space the sound inhabits opens up a ton of new possibilities. Being able to play an object based recording and switch environments and mixes on the fly completely blows my mind. I can't wait to be able to do that.

The idea would be fun to an extent but in practice you'd lose more than you gained. Some of your "guesses" about the SACDs you mentioned were quite a bit wide of the mark; the technology for what you're guessing is barely possible even today, let alone in the 1990s. I did some work at CTS by the way, in the '90s, so probably around the time those recordings were made. I knew some of the personnel there, one of them quite well, but that's going back a bit; CTS closed its doors for the last time about 15 or so years ago.

G
 
Dec 13, 2017 at 11:53 AM Post #21 of 220
I look at the photos of the room the orchestra recorded in and it doesn't reflect the sound on the SACDs at all. There is a depth in the sound of these recordings and a hall ambience with the back of the band in a different perspective than the front of the band. Assuming they're telling the truth about using 45 mikes, they must have done considerable isolation of instruments and groups of instruments and synthesizing of ambiences, factoring for the position of the instruments from front to back. I'm used to hearing ambiences where the whole band is the same basic degree of wet. But this is quite different.

I suspect that in stereo, these recordings wouldn't sound nearly as good. It takes multichannel to be able to sort out all the varying degrees of perspective. The liner notes only mention recording dates. I'm sure the multichannel mix was done fairly recently. That's where the uniqueness of these comes in.
 
Dec 13, 2017 at 7:11 PM Post #22 of 220
@bigshot, @gregorio and @pinnahertz, thank you very much for sharing your knowledge, experience and sensibility.

I don’t consider myself an audiophile. I don’t feel excited by recreating the mastering room... But I think I am always curious to know how things work. I have been captivated by human hearing and auditory perception. And the possibility of recreating a real sound field just got me overexcited.

There will always be moments in which people will consume music regardless of its spatial information. In fact, I do it that way more often than not.

Please do not think that I do not read every post you three carefully craft to settle my disquietude. I really didn’t forget what you said about the mixing creation before and I still believe you are completely right. I will quote it here, because I believe it is relevant if we want to understand the scope of Professor Choueiri's assertion that “many of well-made popular music recordings over the past two decades have been recorded and mastered by engineers who understand natural sound localization and construct mostly natural-like stereo images, albeit artificially, using realistic ILD and ITD values”:

I'm not really sure of the context of the statements you've quoted. But on the face of it, some/many appear to be nonsense.

The percentage of popular music recordings deliberately mixed with both ILD and ITD is tiny. At a guess, less than 1% and probably a lot less!

Popular music is always recorded as a collection of mono sound sources or of one or two stereo sources mixed with mono sound sources. The stereo-image is therefore constructed artificially and even some of those stereo sources are commonly artificial (stereo synth pads for example).

So, virtually without exception, popular music has a stereo image which is an artificial construct and then the question becomes, how do we construct it?

Well, it's a combination of tools, one of which is reverb. Some reverbs are mono in, stereo out; others are stereo in, stereo out. The latter can potentially provide a reasonably natural/realistic ITD relative to the mixed L/R position of the source channel/s feeding the reverb; the former cannot.

If we're talking about the L/R position of the individual source channels themselves though (rather than the reverb applied to those channels), then almost without exception that is accomplished purely with ILD (panning) and in fact, ITD is usually deliberately avoided, let alone realistic values calculated and applied!

The deliberate use of ITD for panning (more commonly called "psycho-acoustic panning" by the audio engineering community) is generally avoided for a few reasons:

1. It's far more time- and resource-consuming to set up initially and adjust later.

2. The mix is unlikely to have decent mono compatibility and

3. The resultant L/R position achieved by psycho-acoustic panning on a channel is very fragile/unreliable:

A. Any subsequent application of any delay based effects (chorusing, doubling, DD or reverb for example) to that channel will almost certainly change or completely destroy the L/R position.

B. It's far more sensitive (than ILD panning) to small changes/inaccuracies in speaker positioning, room acoustics and listener position.

C. I can't even imagine trying to create a mix where all the L/R positioning is achieved by psycho-acoustically panning individual channels, I don't know how you'd avoid a complete mess.​

The only exception I'm aware of is an old, rather obscure trick on those rare occasions where it's desired that the kick and/or bass guitar be positioned some place other than the centre or near centre, and psycho-acoustic panning may be employed to more evenly distribute the high energy levels between channels/speakers.
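To illustrate the difference between the two approaches, here is a minimal sketch in Python. The pan law is the standard constant-power one; the delay value in the ITD version is illustrative:

```python
import numpy as np

SR = 48000  # sample rate, Hz

def ild_pan(mono, pos):
    """Constant-power (ILD-only) pan; pos runs from -1 (hard left) to +1 (hard right).
    L and R stay time-aligned, so the mono sum is clean (reason 2 above)."""
    theta = (pos + 1.0) * np.pi / 4.0
    return mono * np.cos(theta), mono * np.sin(theta)

def itd_pan(mono, delay_ms):
    """'Psycho-acoustic' pan: identical levels, one channel delayed by a
    fraction of a millisecond. Summing to mono now comb-filters the signal
    (reason 2), and any later delay-based effect on the channel shifts or
    destroys the image (reason 3A)."""
    d = int(round(delay_ms * 1e-3 * SR))
    left = np.concatenate([mono, np.zeros(d)])
    right = np.concatenate([np.zeros(d), mono])
    return left, right
```

Delays of only a few tenths of a millisecond are enough to pull the image towards the earlier channel (the precedence effect), which is exactly why the positioning is so fragile once anything else touches the timing.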

All the above relates to popular music recordings, as quoted, it's not necessarily true of classical recordings.

G

Your line of thought is similar to @pinnahertz's, whom I also learned to admire through his posts. He also informed me that mixing and the vast majority of microphone arrangements rarely consider time differences. For instance:

1. All the spatial information there is to have?

Even modest familiarity with stereo microphone arrays should reveal the complete nonsense of that statement.

ORTF/XY: less than a hemisphere.

Coincident pair: no ITD.

M/S: no ITD.

The Decca Tree: scrambled ITD, and unique ILD, but capturing less than a hemisphere.

Spaced omnis: fully scrambled ITD and ILD.

Spot mic: mono, no 3D spatial information.

And those are the commonly used ones.

They all fall far short of capturing “all the spatial information there is to have”, but each is usable as an element for creating a believable mix.
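To put rough numbers on the ITD part of that list, a small far-field sketch in Python (the spacings are illustrative, except ORTF's standard 17 cm):

```python
import numpy as np

C = 343.0  # speed of sound in air, m/s

def pair_itd_ms(spacing_m, azimuth_deg):
    """Far-field inter-capsule time difference for a pair of microphones.
    Coincident arrays (XY, M/S) have zero spacing, hence no ITD, as listed."""
    return spacing_m * np.sin(np.radians(azimuth_deg)) / C * 1e3

print(pair_itd_ms(0.60, 30))  # spaced omnis, 60 cm apart: ~0.87 ms
print(pair_itd_ms(0.17, 30))  # ORTF capsule spacing, 17 cm: ~0.25 ms
print(pair_itd_ms(0.00, 30))  # coincident XY or M/S: 0.0 ms
```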

So that indicates that Professor Choueiri might be overestimating the percentage of recordings that would render lifelike soundstages with crosstalk cancellation.

The Smyth brothers, who have been in the music industry for a long time, are clearly more cautious. Even now they don't promote their product as a virtual reality device. They emphasize that it emulates real rooms and speakers, crosstalk included. It was the enthusiasts who wanted to find a way to simulate a room with crosstalk cancellation; the Smyths don't talk about it, don't recommend it and don't admonish against it.

Perhaps it is time for me to calm down and wait for things to follow their natural course. I am not a researcher nor someone who works in the music industry, so nothing that I could possibly do now will change how consumers will behave in the future. I also don't know if such technologies will have some influence on future consumer behavior.

I for one would certainly like to hear more music content mixed in Ambisonics or recorded with techniques that preserve spatial information in all three axes. I would love Spotify to follow Google's path and provide a route to stream Ambisonics.

But as @pinnahertz once said:

Thanks, but you are nothing like the typical consumer.

Perhaps then, in the future, there still won’t be soup for me.

In any case, I learned a lot by interacting with you three and others here in the Sound Science forum. Certainly as much as I learned from Professors Gerzon, Choueiri and the Smyth brothers.
 
Dec 14, 2017 at 9:17 AM Post #23 of 220
I look at the photos of the room the orchestra recorded in and it doesn't reflect the sound on the SACDs at all. There is a depth in the sound of these recordings and a hall ambience with the back of the band in a different perspective than the front of the band. Assuming they're telling the truth about using 45 mikes, they must have done considerable isolation of instruments and groups of instruments and synthesizing of ambiences, factoring for the position of the instruments from front to back. I'm used to hearing ambiences where the whole band is the same basic degree of wet. But this is quite different.

It's always a two-edged sword. Using more mics provides greater flexibility but also causes more timing/phase error between all those mics, and we've got relatively poor isolation between mics with an orchestra. It's a bit like multi-mic'ing a drum kit, except worse! We've got more mics interacting and less isolation between mics. With a drum kit we can create more isolation with editing and/or some other means and in the process make it slightly less natural sounding, which is not necessarily much of a problem as we're virtually always after a drum kit sound which is not entirely natural anyway, but that's not the case with an orch. While we can (and typically do) insert a delay on some of the many mics used in an orch recording, it's typically impractical to do so with all the mics. Not just impractical, impossible really, because there are so many mics with different timing/phase interactions between all of them, effectively in this case 45 x 45.

Additionally, the "synthesising of ambiences" in the 1990s was very significantly less sophisticated than it is today, and even today it's still relatively limited! There was no convolution reverb for example, only algorithmic verbs, and they were only stereo, not multi-channel/surround. There were certain tricks to get around this fact and certainly the guys at CTS would have known them, I'm sure better than me, but nevertheless what could be done was very limited, especially as we're talking about hardware units here, not the numerous, sophisticated, interlinked plugin instances possible today. I wasn't there for these recordings, I don't know exactly what they did or what, if anything, was subsequently done to those mixes/recordings to make them suitable for SACD release. I'm hesitant to say that what you're hearing is just a happy coincidence, and certainly it's not purely happy coincidence, but on the other hand, what you're ascribing to those 1990s recordings is still barely/imperfectly possible with today's technology.
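For a sense of scale, the basic per-mic delay compensation is simple enough (a sketch in Python, with illustrative numbers); it's the combinatorics of many open mics that make a complete solution impossible:

```python
C = 343.0   # speed of sound, m/s
SR = 48000  # sample rate, Hz

def spot_mic_delay_samples(extra_distance_m):
    """Delay to add to a spot mic so it lines up with the main array:
    roughly the extra travel time from the instrument to the mains."""
    return round(extra_distance_m / C * SR)

# A spot mic 8 m closer to its instrument than the main array arrives
# ~23 ms early, so it gets delayed by about 1120 samples at 48 kHz:
print(spot_mic_delay_samples(8.0))

# But with 45 open mics there are 45 * 44 / 2 = 990 pairwise timing/phase
# relationships, and every mic picks up every instrument to some degree;
# no single set of per-mic delays can satisfy all of them at once.
```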

[1] I really didn’t forget what you said about the mixing creation before and I still believe you are completely right. I will quote it here, because I believe it is relevant if we want to understand the scope of Professor Choueiri's assertion that “many of well-made popular music recordings over the past two decades have been recorded and mastered by engineers who understand natural sound localization and construct mostly natural-like stereo images, albeit artificially, using realistic ILD and ITD values”:
[2] Your line of thought is similar to @pinnahertz's, whom I also learned to admire through his posts. He also informed me that mixing and the vast majority of microphone arrangements rarely consider time differences.
[3] So that indicates that Professor Choueiri might be overestimating the percentage of recordings that would render lifelike soundstages with crosstalk cancellation.
[4] Perhaps it is time for me to calm down and wait for things to follow their natural course. I am not a researcher nor someone who works in the music industry, so nothing that I could possibly do now will change how consumers will behave in the future. I also don't know if such technologies will have some influence on future consumer behavior.
[4a] I for one would certainly like to hear more music content mixed in Ambisonics or recorded with techniques that preserve spatial information in all three axes.

1. The truth of that quote depends on how one defines "natural-like stereo images" and "realistic ILD and ITD values". In a loose sense you could justify this statement but, if we take the mix as a whole, then it's really not true. It's a cleverly worded statement which has far more in common with the more sophisticated marketing tactics than it does with the actual facts/science.

2. I'd broadly agree with pinnahertz. In practice, each of the many different mic arrangements has its advantages and disadvantages and there is no perfect solution, only a particular solution in a given circumstance which has fewer/lesser disadvantages than another solution. In the case of something as complex as an orchestra, for several decades we've lessened the disadvantages of the individual mic arrangements by combining multiple different mic arrangements, typically: closer mono mics + a close/coincident stereo pair or Decca Tree + a widely spaced stereo pair (outriggers) + some omni mono or paired room mics. This solution doesn't simply remove all the disadvantages and leave us with only the advantages of each of the individual mic arrangements; it only reduces rather than removes some of the disadvantages and makes some of the disadvantages worse, timing/phase for example. So, as with virtually every other given recording circumstance, it's a trade-off and a judgement call. There is no one right way to record an orchestra, just a subjectively better or worse way with a particular orch, on a particular day, with a particular technical goal and a particular aesthetic goal!

3. I don't know what Choueiri is doing so it's impossible to say. On the other hand, neither orchestral nor any other type of music is recorded and mixed the same way every time. In fact, even comparing just orchestral recordings with other orchestral recordings, they're very rarely recorded and mixed exactly the same way, so how can you compensate for that? Again though, what do you mean by "lifelike"? Quite a lot of popular music sounds "lifelike" and all orch recordings do (to a greater or lesser extent), even though in practice they're little or nothing like "lifelike".

4. There are lots of factors at play here beyond the purely "realistic", or even anything directly to do with sound quality. For example the technical and practical implementation, IE. The ease and cost of implementing it and its flexibility in terms of media types. And the marketing, an area where Ambisonics spectacularly failed.
4a. In addition to point 4, we've got the problems mentioned above and in my previous post: the artistic options and the perception options/manipulations. The more we preserve that spatial information, the fewer/lesser options we have for art and perception. With Ambisonics for example, it's very difficult to process it natively, so typically we convert to say 5.1, but then we don't have 3-axis information. And Ambisonics has the same basic disadvantages as a near-coincident stereo pair and is somewhat tricky to mix with other mic arrangements without losing the advantages of using Ambisonics in the first place. Again, there's no perfect solution here.

G
 
Dec 14, 2017 at 1:48 PM Post #24 of 220
These were recorded in the mid 1990s, but that doesn't mean they were mixed in the mid 1990s. These were originally released in stereo and later as SACDs. I wouldn't have thought that this multi-miked approach would work well, but these SACDs sound excellent, while sounding quite different from the typical classical music release. It's a unique approach.
 
Dec 15, 2017 at 7:59 AM Post #25 of 220
[1] These were recorded in the mid 1990s, but that doesn't mean they were mixed in the mid 1990s. These were originally released in stereo and later as SACDs. [2] I wouldn't have thought that this multi-miked approach would work well, but these SACDs sound excellent, while sounding quite different from the typical classical music release. [3] It's a unique approach.

1. Yes, they would almost certainly have been remixed, as SACD did not exist in the mid 1990s. Still, probably in the early 2000s rather than with the cutting-edge technology of today.
2. That sort of complex multi-mic'ing of an orchestra is common, standard practice even, from about the mid '80s. Fairly simple multi-mic'ing, say 5 or so mics, started in the late '50s as far as I'm aware, although mixed in real time.
3. Not at all, it can work very well, as I mentioned in my previous post. It only works poorly from the point of view of some absolute notion of acoustic accuracy, in a real space, in a particular location. Very large, 30+ mic setups have been common for a long time, particularly with film scoring which requires greater flexibility and most of CTS's orchestral recording work was for film scores, so it's not surprising they took that approach. It was not unique though, pretty much all the decent scoring stages were using that approach by the mid 1990's.

G
 
Dec 15, 2017 at 11:35 AM Post #26 of 220
Well, the mix is certainly unique. I don't think I've ever heard orchestral recordings that had as elaborate a manufactured ambience.
 
Dec 26, 2017 at 7:04 PM Post #28 of 220
It's even more shocking than that. Today, JS Bach is of course one of just a handful of the most widely known composers in history, and yet within about 40 years of his death he was almost completely unknown outside a small group of scholars and studying composers. Mendelssohn is widely credited with (re)popularising JS Bach with the general public. So, it's maybe not so shocking that numerous other talented composers and works vanished for good. Of course today we have recording technology, but of all the hundreds of great popular pieces of music of the last 40 years or so, I wonder how many will still be known in a couple of centuries?

G

Perhaps the book “The River of Consciousness” by Oliver Sacks, and particularly the chapter “Scotoma: Forgetting and Neglect in Science”, may give you peace of mind.

I was trying to let the things you wrote about the work of mastering engineers sink into my subconscious, and suddenly I came across this interesting case of Quincy Jones versus the Estate of Michael Jackson.

Some artists, producers and recording engineers were perfectionists and had the time and budget to pursue excellence:



Bruce Swedien: Recording Michael Jackson

(...)

'Rock With You' is also an excellent showcase for another of Swedien's creative live‑room production techniques. Each of the backing-vocal lines was first double‑tracked with a close mic, then Jackson moved a couple of steps back from the mic for another pass, while Swedien increased the preamp gain to match his level with the previous takes. Finally, an even more distant pass was captured using a Blumlein stereo pair, again matched for level. The result: an increased density of early reflections, which creates a natural depth and width to the soundfield.

Early reflections were also an important part of the lead vocal sound on Jackson's later records from Bad onwards, where the singer was set up on Swedien's aforementioned drum riser to amplify the sound of his dancing, and then surrounded by Tube Traps (the common studio nickname for ASC's tubular Studio Traps). Not only did this approach create a dense and controllable pattern of early reflections to support the singing and dancing sounds, but it also kept the sound at the mic much more consistent as Jackson moved while dancing. "The Tube Trap, to me, is one of the greatest things since sliced bread,” he enthuses. "Michael loved my Tube Traps — he was fascinated with them. We would try all sorts of different setups with the Tube Traps to get a soundfield that was really interesting. They save a lot of time.”

(...)

Talk of compression turns our conversation towards the mixdown process, and here Swedien is quick to point out that, as a colour‑sound synaesthete, he works in a world where there is a direct connection between sounds and colours. Although this mode of perception is rare, he's by no means the only musician who has been touched by it. Classical luminaries such as Liszt, Sibelius, Rimsky‑Korsakov, Messiaen and Ligeti have all correlated colours with keys, chords and timbres; and, beyond that field, musicians such as Duke Ellington, Leonard Bernstein, Billy Joel, Eddie Van Halen, Tori Amos and Aphex Twin have all shown evidence of synaesthesia, as have well‑known producers such as Geoff Emerick, Pharrell Williams, Rollo Armstrong... and Quincy Jones!

"I have synaesthesia, and Quincy does too,” confirms Swedien. "The low frequencies are represented by dark colours like black and purple, while high frequencies are bright colours such as silver and gold. When I listen to a mix I want to see all those colours. I mix in the control room with very low light level, because I think that the human being is primarily a visual animal, but the way the music hits us is purely an aural experience, so I try to minimise the visual aspect of what is affecting me while I mix, by keeping the control room rather dark. And, of course, I'll close my eyes for some of the time.”

(...)

Looking at Bruce Swedien's monitoring setup at his own West Viking Studio in Florida, I immediately spotted some familiar little cubic speakers sitting atop the meterbridge: a pair of Auratone 5Cs. Does he recommend them? He almost jumps out of his chair: "I love Auratones! You know what Quincy calls them? The Truth Speakers. There's no hype with an Auratone, and it's sad that you can't buy them any more. I knew the guy who made them out in San Diego, but he died a couple of years ago. If you see any Auratones on eBay, buy them! I have about three or four sets of them. Probably 80 percent of the mix is done on the Auratones, and then I'll have a final listen or two on the big speakers — I have Westlake speakers which I absolutely love, with special custom‑built power amps. I don't listen very loud on the Auratones; the SPL is maybe 85dB for the bulk of the mixing work. If I were to allow conversation to go on in the control room while I was mixing, it would be easy to do, but I hate distractions when I'm mixing, which is another reason why I usually ask everybody to leave.” Many engineers use a single Auratone in mono, but Swedien has no truck with that: "I hate mono, and I'm not a big fan of surround either. I love stereo. If you've got your crap together, and you know what you're doing, you can do as much with two‑channel stereo as most mixers can do with surround.”

(...)

I find the mentions of that kind of “manual analog reverb” amazing (very clever and creative), along with the mainstream monitors and, mainly, the synaesthesia, a neurological condition that Oliver Sacks also depicts in some of his books.

But 99% (maybe more) of mass music consumption nowadays happens in less-than-optimal playback environments that may not render such spatial perception:

Bernie Grundman wants to change the way you hear music — for the better

(...)

Yet for all his facility with the nuts and bolts of audio technology, Grundman, 73, insists that what he really deals in — the reason A-list producers and pop stars have been coming to him for decades — is feeling.

"Our object here is to make sure that these recordings connect emotionally with the listener," he said on a recent morning at his studio, where he's scheduled to present a seminar Monday as part of the month-long Red Bull Music Academy series that has also featured performances by St. Vincent and Ryoji Ikeda. "We want them to feel all the depth and the value of the music, that expression of the human experience."

(...)

But in an era when many music fans are abandoning physical formats for lower-quality digital streaming (and listening through crummy earbuds), Grundman's pricey mastering work strikes some acts and labels as an unnecessary expense.

"Budgets are low," he said, one result of a dramatic drop in record sales that began around 2000 with the advent of peer-to-peer software like Napster. "And because a lot of records only come out on iTunes or Spotify, these inferior formats, you're not going to hear the difference" between what Grundman does and what a computer plug-in can do.

"Say what I do for somebody is 30% better," he continued. "Well, when you put it through the coding device that does the digital compression [for streaming], that 30% is now only 10 or 15."

In its effort to reduce the size of a digital file, the compression "makes all the instruments sound like each other," he said. "They start to lose their individual integrity," which is precisely the thing Grundman says he's seeking to preserve.

"But some people don't mind that, because it doesn't distract them from the other things they're doing. Now we can work on our computer or look at our phone and not be distracted by the music." His face flashed a rueful expression.

"It's kind of sad."

(...)

I don’t agree with many aspects of Michael Fremer's following description, but probably the young jury he mentions still felt amazed by regular stereo's spatial suspension of disbelief:



What if Fremer's testimony was “expunged from the records” because the vast majority of consumers don’t listen in playback environments with such audiophile “dedication; devotion”?

In the end, Quincy Jones was awarded $9.4M in the Michael Jackson royalty trial.

I now clearly see what you meant before about the chain variables and subjective conceptions of the recording and mastering engineers:

1a. There appears to be a general misconception in the audiophile world about what mastering is, what a mastering room is for and how it's used.

A mastering room is a room with (hopefully) superb acoustics and accuracy of playback. Incidentally, superb acoustics doesn't mean little/no acoustics. The bigger misconception is that the mastering engineer masters to this room, IE. That they are attempting to create a master which sounds great in the mastering room.

We need the accuracy of the mastering room to hear exactly what is going on with the mix and exactly what we're doing/applying but we are NOT trying to create a master which takes advantage of that accuracy. If we did, then we would be defeating the whole purpose of mastering in the first place!

In practice, a mastering room will have the most accurate speakers/acoustics possible but it will also have pretty much the worst speakers possible and it will have headphones too.

So when I see audiophiles suggesting that recreating the mastering room provides the highest fidelity playback, I want to ask: Which mastering room, the one with the great speakers or the one with the crappiest speakers and what about the mastering engineer's ears and subjective opinion?

The last of these is the most important, because the master he/she creates is a compromise between the two! The thinking being: the mastering room with the great speakers is loosely representative of the best quality playback, the mastering room with the crappy speakers is loosely representative of the worst, and what virtually all consumers will experience is something in between these two scenarios.

Therefore, if we can create a master which works reasonably well on both, we've got a master applicable to most consumers. Of course, it's virtually impossible to create a master which works perfectly on both sets of speakers. Generally, the more perfectly the master works on the crappy speakers, the less perfectly it works on the great speakers and vice versa. In other words, the reference is NOT locked to the mastering room, it's not even locked to the effectively two vastly different mastering rooms, it's locked to the mastering engineer's subjective opinion of somewhere between the two, which is in turn informed by the likely listening circumstances of the target consumers and modified by the client (artist, producer, record label).

The audiophile concept of recreating, or getting as close as possible to, the mastering room playback is most likely/almost certainly counterproductive!!

1b. This brings us back to my initial point, that even when we're recording in a real acoustic space, that's not what we're trying to create. (...)

Assuming we're talking about an actual acoustic event, such as a symphony concert for example, then what we're doing with the recording and mixing is NOT trying to capture and recreate the actual audio reality but create a sort of generalisation of what we would have perceived. This effectively means applying those reducing/amplifying brain manipulations to the recording itself, because our brain will not perform those manipulations when we're listening in our sitting rooms.

There is no algorithm for this, there's too many variables at play and ultimately it all comes down to the skill/technique, perception and subjective opinion of the engineers/producer.

Also, this is in addition to any creative intent! For example, with our violin solo above, we might decide to make the lead violin a tiny bit louder and more present in the recording to emulate what we would likely have perceived had we been there (and our sight and hearing had combined to create this perception). Artistically though, we might decide that what we would have perceived is still not quite right or could be subjectively better, maybe we would make the violin even louder or maybe quieter again or maybe tweak some other aspect of the sound.

(...)

G

On the one hand, what stimulus do typical streaming consumers have to enjoy regular stereo spatial perception with mobile phones and headphones? Is regular stereo as surprising with headphones as it was with speakers to the young jury?

I know you are going to be disappointed with my insistence, but I still believe the preservation of 3-axis spatial information and headphone reproduction has the potential to be even more surprising.

And because such a 3-axis effect is so picky about the playback environment, it could stimulate the industry and consumers to seek suitable environments (simple HRTF acquisition, good-quality headphones with integrated head trackers and mass-produced DSP chips with HRTF and head-tracking algorithms), something at which regular stereo has failed.

On the other hand, I fear the limits of Ambisonics you have just described. I wanted to go deep into the caveats of Ambisonics and I discovered this really interesting case from Hamburg, Germany, that matches your experience:



So many variables and still so many things in common (hi-fi ATC monitors and electrostatic headphones alongside mainstream Aurasound and Yamaha NS-10M monitors, and an interesting microphone arrangement with a hybrid kind of Ambisonics mixing...), but still a cutting-edge virtual tour: Visiting Clouds Hill.

The Clouds Hill studio owner feels, as you do, that stereo will remain the standard for music. Perhaps because, as I said before, people still have a strong emotional connection with recorded music regardless of its 2-axis (and potentially 3-axis) spatial information.

This is a really difficult subject...

I tend to agree that music reproduction is not restricted to the realm of physics, science and technology, but is intricately merged with human nature, culture and the arts. And emotions are certainly more related to the latter than the former.

I really hope you recording/mastering engineers are able to work together with electronic engineers to reconcile art and 3-axis spatial information.

Anyway, much of the emotional connection is related to the listener's memories, and that depends on the family, friends and educational music repertoire, or the contemporary broadcast music they are exposed to, don't you think?

No matter how eclectic one may be, one just does not have the time and resources to be exposed to the genres of all times and all cultures (Yo-Yo Ma might be the exception that confirms the rule...). So a lot of them must be forgotten (as you said, vanished for good). I still find it amazing how certain songs can achieve world success. And Thriller was certainly one of the best examples.
 
Dec 27, 2017 at 7:00 AM Post #29 of 220
You've covered a lot of different and complex areas but I'll try to respond to some of it:

This is a really difficult subject...

Yes and no. On the one hand, if we try to objectively analyse exactly what is going on, we very quickly get into so much complexity, covering such a wide range of variables, that it ends up being meaningless nonsense. On the other hand, it's pretty simple: we have some rough mental image of what we think the piece should sound like before we start and then, through every step of the process, we just do what we think sounds good/right/better. The actual reality of creating/producing music is somewhere between these two extremes, although much more towards the latter than the former. And this is what makes the former so difficult/impossible: you are effectively trying to objectively analyse a combined multitude of subjective/artistic decisions and furthermore, you only have a very vague (and in some instances, incorrect) knowledge of the steps, tools and processes of creating a music product. All this inevitably results in at least some, if not all, conclusions being erroneous. BTW, I'm not aiming this at you personally, it's a general observation of the audiophile world.

I know you are going to be disappointed with my insistence, but I still believe the preservation of 3-axis spatial information and headphone reproduction has the potential to be even more surprising.

In some cases yes, it does have that potential, but it ENTIRELY depends on the goal! If the goal is a music product, then typically that completely precludes "the preservation of 3-axis spatial information"; it precludes the preservation even of 1- or 2-axis spatial information, let alone 3-axis, and indeed it precludes the preservation of a lot more besides just the spatial information! The Clouds Hill Studio virtual tour you posted is a good demonstration of this. Listening to the drum kit at about 7:40 raises a number of points:
1. That is (most likely) an accurate representation of what a drum kit in a room sounds like.
2. It is interesting (arguably surprising) from a purely sonic reproduction, listening experience point of view but ...
3. It's not interesting from a music point of view! The goal here is a virtual tour of a recording studio though, not a music product.
4. From a music point of view we have two (hopefully!) obvious problems, an artistic problem and a logistical problem:
4a. The logistical problem is: we have a drum kit in one room, a cello in another room, a piano in another, etc. It's entirely standard in all popular music genres to have all, some or most of the musicians in different acoustic spaces. Obviously, a song is not only a drum kit but all the instruments and vocals mixed together. So how is it possible to "preserve the 3-axis spatial information" of say 3 or more substantially different acoustic spaces all at the same time? Mixing say a small toilet acoustic space with a medium room acoustic space doesn't preserve both those acoustic spaces, it mixes them up into a single bizarre acoustic space which cannot exist in the real world, but as bizarre/wrong as that mixture should sound in theory, in practice it can sound good/pleasing. Typically though, making it sound good/pleasing means processing during mixing, IE. Changing and therefore NOT preserving that spatial information.
4b. How many pieces of music have you heard, from the last 30 or 40 years, that contain a drum kit which sounds like the (real!!) drum kit on the video? There's none I'm aware of! Through the use of processing, the drum kit sound on music products has evolved over many years, it's virtually always very significantly different to the (real) sound actually created in the recording room and indeed, the processing variations of the drum kit sound is one of the factors which actually define the different popular music genres. The huge gated reverb snare in the old rock ballads, the tight sampled snare in EDM, the small trashy plate-like snare in punk, the broader/bigger but not so wet snare of heavy metal, the list goes on and on for most music genres/sub-genres, and there's a similarly long list of different variations of how the kick is processed from genre to genre but in every case it means changing or very severely changing (not preserving) the original information, both the spatial information and the direct timing, frequency and amplitude information. In other words, to "preserve the information" and thereby limit ourselves to a real drum kit sound, effectively eliminates much of what actually defines different music genres actually sounding like different music genres!

In a sense, you are looking at all of this completely backwards! You are thinking of the music and then of the format employed to present that music to the consumer, and that leads you to logically wondering why that music cannot be presented in the format you want/would enjoy (an immersive format). The reason this is backwards is because the music does not come first, the format does! Everything revolves around the format: the composition, the arrangement, the performance, the recording, the editing, the mixing/processing, the mastering and of course, many of the mono/stereo tools for accomplishing these tasks. All of these things would have to change to fulfil your desire and that change would necessarily change the music genres themselves, to the point of eliminating many of them. You really have to get away from this old audiophile myth of the music existing and then it just being a process of accurately recording/preserving the sound waves created by that music!!!

G
 
Dec 27, 2017 at 10:04 AM Post #30 of 220
You really have to get away from this old audiophile myth of the music existing and then it just being a process of accurately recording/preserving the sound waves created by that music!!!

G

I feel sad when you say I need to get away from a myth.

I prefer to think I am driven by a more or less legitimate desire to reproduce immersive audio, although the first time I saw the Realiser A8 I thought only the externalization of stereo speakers, crosstalk included, was possible, and that was then the only thing I intended...

I wouldn't say Gerzon with Ambisonics, Choueiri with binaural through loudspeakers (XTC) or even Dolby with Atmos (if its algorithm relies on spherical harmonics or the Kirchhoff-Helmholtz integral) were driven by the same myth. Maybe they were/are just trying to popularize their formats?

I agree that for music, stereo seems easier.

You may question whether the height axis is worth all the hassle, particularly for music, but there will be formats for VR, that is for sure.

That's why I believe the studio owner's opinion is so sensible. Perhaps immersive audio may find more acceptance in live recordings, but the current stereo format will remain the standard.

Anyway, I have more doubts. Call me smilie :grin: instead of smellie! :joy:

In the Sound on Sound video above, the recording engineer says that, as soon as Ambisonics is compressed, the 3-axis sound field gets messed up.

Why do you think that happens?

Do you think the bandwidth necessary for HOA does not allow transparent bitrates? P.S.: I think first order is what web-based scripts allow.
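For reference, the channel count grows quadratically with order, which is the root of my bandwidth worry (a quick Python sketch; the 48 kHz / 24-bit figures are just for scale, before any lossy coding):

```python
def hoa_channels(order):
    """A full-sphere Ambisonics stream of a given order has (order + 1)^2 channels."""
    return (order + 1) ** 2

for order in (1, 2, 3, 4):
    n = hoa_channels(order)
    pcm_mbps = n * 48000 * 24 / 1e6   # raw PCM rate, before any codec
    print(f"order {order}: {n:2d} channels, {pcm_mbps:5.1f} Mbit/s")
# order 1:  4 channels,  4.6 Mbit/s ... order 4: 25 channels, 28.8 Mbit/s
```

My understanding, which could be wrong, is that lossy codecs which exploit inter-channel redundancy can disturb the amplitude and phase relationships between the components, relationships the decoder depends on, which might explain the mess the engineer describes.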

Or does the decoding itself have different latencies for each channel, with the asymmetrical delays ruining the rendering in a way similar to what you said about perfect coincident microphones not existing?

Do you believe the Eigenmike, Ambeo or SoundField mics are factory calibrated to deal with such placement, pick-up pattern and sensitivity asymmetries between capsules?
 
