A layman multimedia guide to Immersive Sound for the technically minded (Immersive Audio and Holophony)

jgazal · Jan 9, 2018 at 5:16 PM

Thank you, @Erik Garci. I thought it was ready to market. Too many open questions. I’ll wait.

Erik Garci · Jan 10, 2018

jgazal said:
I thought it was ready to market. Too many open questions. I’ll wait.

According to the TechHive video, it will come out this summer for around $150. They are using in-ear mics at the CES demo (similar to the Realiser) and claim that using photos will achieve an 85% match.

https://www.techhive.com/article/3246194/ces/creative-super-x-fi-headphone-audio-holography.html

http://www.hardwarezone.com.sg/feat...ew-super-x-fi-headphone-holography-technology

jgazal · Jan 11, 2018

Erik Garci said:
According to the TechHive video, it will come out this summer for around $150. They are using in-ear mics at the CES demo (similar to the Realiser) and claim that using photos will achieve an 85% match.

https://www.techhive.com/article/3246194/ces/creative-super-x-fi-headphone-audio-holography.html

http://www.hardwarezone.com.sg/feat...ew-super-x-fi-headphone-holography-technology

Convolution of personal binaural room impulse responses is not something new. See, for instance, this brilliant lecture from Bruce Land published in 2012 (Cornell):

But I didn’t read anything about head tracking with third party headphones and the only innovative feature, which is using biometrics to find a close matching HRTF was not demonstrated. How can we know 85% is close enough to acceptable performance?

The more I read about it, the more I get suspicious.

Nevertheless, I admit miniaturized dedicated chips are a huge step in the right direction.

jgazal · Jan 10, 2018

I’ve just edited the first post to add two brilliant lectures from Chris Brown (MITOpenCourseWare, Sensory Systems, Fall 2013, published on 2014).

I feel bad about animal experiments, but the lectures are so good that we can’t ignore them.

Cheers!

Erik Garci · Jan 12, 2018 at 11:41 AM

There is a new thread for the Creative Super X-Fi.

https://www.head-fi.org/threads/creative-super-x-fi-headphone-holography.869663/

jgazal · Jan 15, 2018

gregorio said:
I agree that the majority of the time we are trying to make it "better than real".

However, there are some music genres where this isn't the case, effectively where we're trying to make it better so that it does sound real.

Quite often in audiophile discussions the topic is brought around to the comparison of a live acoustic performance, such as orchestral music, with a recorded equivalent.

The problem here is quite different to the "better than [and not even directly concerned with] real" which is the case with the non-acoustic genres.

In the case of acoustic genres such as orchestral, I would re-word the part I've highlighted in bold to: "The result would often not appear to be entirely realistic or very exciting, because what we hear at an orchestral concert is not real in the first place!" - What actually enters our ears and what we perceive are two different things. Our brain will filter/reduce what it thinks is irrelevant, such as the constant noise floor of the audience for example, and increase the level of what it thinks is most important, such as what we are looking at (the instrument/s with the solo line for example).

This isn't "real" at all, although of course it feels entirely real. Clearly, even with a theoretically perfect capture system, all we're going to record is the real sound waves but when reproduced, the brain is generally not going to perceive those sound waves as it would in the live performance because the visual cues and other biases which informed that perception are entirely different.

So, the trend over the decades has been to create a orchestral music product which sounds realistic relative to human perception rather than just accurately capture the sound waves which would enter one's ears. To achieve this we use elaborate mic'ing setups which allows us to alter the relative levels of various parts of the orchestra in mixing (as our perception would in the live performance).

However, a consequence of this is messed-up timing, as sound wave arrival times are going to vary between all the different mics (which are necessarily in significantly different positions). This is an unavoidable trade-off, we're always going to get messed-up spatial information but with careful adjustment during mixing we can hopefully end up with a mix which is not perceived to be too spatially messed-up (even though it still is).

This "careful adjustment" is done mainly on speakers but is typically checked on HPs and further adjustments may be made if the illusion/perception of not being spatially messed-up is considered to be too negatively affected by HP presentation.

This brings me back to what I stated previously, that pretty much whatever we listen to and however we're listening to it (speakers, HPs, HPs with crossfeed, etc.) we've always got messed-up timing, "spatial distortion" or whatever else you want to call it.

PS. I know you're probably aware of all this already bigshot.

(...)

G

jgazal said:
Thought experiment:

Imagine that you record an orchestra with an eigenmic (32 capsules) placed at row A, seat 2, and you have and that you convolve the highest number possible of virtual speakers a high density HRTF. At row A, seat 3, there is a born blind listener. At row A, seat 1, there is a viewer with normal eyesight. Finally, at row B, seat 2 you have a listener that recently acquired blindness. Full audience.

Questions:

Are you saying that the viewer with normal eyesight would only perceive, with headphones playback, an soundfield identically to the one he/she heard live if, and only if, he/she uses a “perfect” virtual reality headset displaying images at where he/she were seated?

Are you saying that blind listeners cannot precisely locate sounds at the live event, for instance, identify where the soloist is playing?

Are you saying that only blind listeners would perceive, with headphones playback, an soundfield identically to the one he/she heard live?

Are you saying that accuracy to locate sounds (at least in the horizontal plane) differs from a blind listener and a blindfolded viewer who has normal eyesight?

Are you saying that blind listeners are not capable of sound selective attention (cocktail party effect)?

Do you think that the born blind listener and the listener that recently acquired blindness will achieve different sound location accuracy?

I agree that vision can in some circumstances override sound cues. I also agree that vision is normally the sense that allows to train your brain to locate sound sources with your ears and that you can retrain your brain if your vision does not match your sound cues.

But I don’t know if that is the only route to create a virtual soundfield map in your brain (or maybe is it a neural network physical simulacrum of a soundfield map?).

Someone that was born blinded can walk to his mother when she is calling “my angel”. Some are capable of echolocation. Some play blind soccer.

But I don’t know if all psychoacoustics processing phenomena are caused by visual and sound cues ambiguities.

Are you sure you can claim that?

pinnahertz said:
Perhaps we should be asking you the question: "Is gregorio actually saying any of that?" Or do you need a lesson in reading comprehension?

Please don't dumb-down the discussion by putting words in peoples mouths that they've never said or implied. I know it's tempting because the thread is so dumb already, but resist...resist....resist.

jgazal said:
This is what gregorio wrote:
So when I wrote about crosstalk cancellation filters, beaforming phased array of transducers and headphone externalization I wrote something that might theoretically occur and I was wrong.

But now he writes about acoustic music genres and the problem is mainly in visual cues?

So tell me, what is worst problem: “visual cues and other biases” “in the live performance” or acoustic crosstalk in the playback?

I am not trying to put words in his mouth. Sometimes the absurd argument is useful to express a mild idea.

What I am trying to say, respectfully, is that mixing without carefully considering ITD is a potential problem.

You say no because stereo acoustic crosstalk with speakers is ubiquitous, it happens in any “loudspeakers in a room” listening environment.

Fine, but you don’t have to rage at what I wrote.

Did I really dumb down the discussion?

I will refrain posting at all, then.

I have reading comprehension issues. And I am delusional.

pinnahertz said:
Somehow you missed his point and instead focused on the point that he correctly made regarding visual reinforcement of spatial hearing, but then took it out of his balanced context over to the ridiculous. Those "Are you saying..." questions were way out of context.

I'm not raging, I'm asking you to not blow things out of perportion or take minor points out of context. This is a challenged thread that needs no more confusion or interference.

I'm not saying you should refrain from posting. That's also a polar extreme. Just keep it real.

gregorio said:
(...)

I'm not even sure I can claim to understand your post, let alone making the claim you're asserting! From what I can tell though, pinnahertz is correct, I'm not necessarily claiming any of that and you seem to have missed the point of what I'm saying about how orchestral recordings are made and why they're made that way.

If we take your example, then:

1. I've got no idea how a person born blind will perceive a live orchestral performance.
2. I have a vague idea of how the recently blind person might be perceiving the performance.
3. I have a vague idea of the listener with normal eyesight will perceive the performance but not as vague as #2, I can make some generalised assumptions, which will apply much of the time to the average audience member. And, I've already explained those assumptions but let's be a bit more specific, let's take an example of a section of music in which say all the strings are playing an accompanying role to a prominent/solo part for the french horn section. Our brain will rapidly latch-on to this, our eyes and conscious attention will be drawn to the horns and brought more into focus, making the horn section clearer/louder relative to what we're not focusing on, the audience noise floor and to a lesser extent the strings for example. We're not really consciously aware of this effect, it sounds entirely natural/real because that's what our hearing is does all the time with all sound.

With a sound recording (even a theoretically perfect one), we're listening in our sitting room, we don't have the same biases affecting our perception, certainly not the same sight, so we're not likely to have the same perception/experience (of this effect) or not experience it as strongly, so what are we to do? Typically, we'd use another mic, placed appropriately near the horns so it picks up more of the horns relative to the strings and room acoustics and then, when mixing, bring this mic up a couple of dBs or so during this section of the music. This would make the horns very slightly louder and clearer than what our perfect recording would be but more in line with what our perception would do at the live event. The downside is that we're going to have a timing issue, the horn sound will arrive at our spot mic much earlier than it will arrive at our perfect mic setup (in row A, seat 2), maybe 20 milliseconds or more. So, we've seriously messed-up the timing (spatial distortion). Maybe the recording still sounds fine and we can leave it like that but almost certainly we'd apply some delay to the spot mic. Even if we applied the exact delay to that spot mic as the distance to the perfect mic setup would suggest (about 1ms per 1.1 feet), that would give us the correct arrival time but we'd still have spatial distortion because the early reflections and reverb from row A, seat 2 will be significantly different to the early reflections at the position of our spot mic. Interestingly though, applying the 1ms per 1.1 feet formula often doesn't work very well, or rather we use it as a starting point and adjust the amount of delay from there until it sounds right but what we end up with is therefore actually wrong by several/many milliseconds.

Now I'm sure your going to say something like, why introduce that mic and all that timing error/spatial distortion in exchange for just a small gain in perception? My answer to that would be: You seem to have a real thing about timing error, spatial information/distortion. I'm not saying it's unimportant, it is important and we (engineers) spend a considerable amount of our time adjusting and manipulating it but the absolute, perfect accuracy you seem to be craving simply isn't that important, the brain is quite easily deceived and constantly messes with that spacial information itself anyway, to increase clarity and various other reasons. Relatively speaking it's a good exchange, a significant improvement in perception/the listening experience for a relatively insignificant amount of spatial distortion. And that is why the recording and mixing of orchestras has evolved to using more and more mics, starting in the early 1950's.

(...)

G

pinnahertz said:
Of course you do! You are the master of made-up terminology! It's part of spatial hearing and localization. If it causes a problem you might term it "acoustic crosstalk", which is what we called it when working on "acoustic crosstalk cancellation", something the exact inverse of your cross-feed, and also interesting, sometimes desirable, and off-topic.
(...)
You've done NO LISTENING RESEARCH like this with your cross-feed, so you have not researched why some like it and some don't. Yet you repeatedly insist cross-feed is right, and a universal improvement. You have absolutely nothing on which to base this!

pinnahertz said:
Could you possibly avoid insulting people? (...)

1. You don't have the right 3D acoustic space or natural correlation on speakers either. It's all an illusion, and a deliberate one.
(...)
Having trouble with the concept of "reference"? Real life IS the reference. You can't have a reference for the reference. What would THAT be, sound in the vacuum of space?
That's part of how spatial hearing works, a tiny portion of HRTF. So?
Because we don't have a vast library of binaural recordings that take HRFT into account! Cross-feed doesn't address the full HRFT, hence it's hobbled.

I'm addressing you. The "others" are a very quiet minority.

pinnahertz said:
(...)

I agree in principle with everything except that cross-feed reproduces the "technical-to-psychoacoustic modification" of speakers. Not even close! To do that you'd have to introduce the correct HRTF and ambient acoustics of speakers in a room, similar to what the Smyth Realizer does. That's a very, very long way from your cross-feed! I like what the Realizer does, but it's impractical. I don't like cross-feed on most material, but do on some.

But, I have knowlege of it the same way, yet make different choices. The main difference here is I don't say your choice is wrong, it's your choice. I've also explained why I don't agree with your choice, but from a preference standpoint and a technical one. My choice is right for me, but you say I'm spatially ignorant, immature, unenlightened, spatially deaf, and a whole string of other insults. I don't even know what your point or purpose is anymore.

I agree my post was off-topic and the concept of “spatial distortion” adopted by 71 dB can cause confusion. I still don’t know if this post is going to be somehow useful to anyone, but since I decided to give it a shot perhaps this is the appropriate thread...

The objection I had with gregorio first reference to visual cues is exactly that it was not specific and thus allowed an ambiguous interpretation that he was considering visual cues as having more weight than, for instance, acoustic crosstalk at playback.

That’s why I appreciate that he further explained the good practices he adopts. I am not saying that such practices are wrong, but they may not be exclusive:

The cocktail party problem

(...)

Cocktail party solutions

The cocktail party problem is partially solved with perceptual mechanisms that allow the auditory system to estimate individual sound sources from mixtures.

(...)

Localization cues afforded by our two ears are another source of information — if a target sound has a different spatial location than distractor sounds, it tends to be easier to detect and understand. Visual cues to speech (as used in lip reading) also help improve intelligibility. Both location and visual cues may help in part by guiding attention to the relevant part of the auditory input, enabling listeners to suppress the parts of the mixture that do not belong to the target signal.

(...)

Sound segregation in music

Music provides spectacular examples of sound segregation in action — recordings often have more instruments or vocalists than can be counted, and in the best such examples, we feel like we can hear every one of them. Why does following a particular sound source in a piece of music often feel effortless? Unlike a naturally occurring auditory scene, music is often speci cally engineered to facilitate sound segregation. Recording engineers apply an extensive bag of tricks to make instruments audible in their mix, ltering them, for instance, so that they overlap less in frequency than they normally would, thus minimizing masking. The levels of different instruments are also carefully calibrated so that each does not overwhelm the others. Real-life cocktail parties unfortunately do not come with a sound engineer.

Sound segregation in music also no doubt bene ts from our familiarity with instrument sounds and musical structure — we often have well-de ned expectations, and this knowledge of what is likely to be present surely helps us distinguish instruments and voices.

Music also provides interesting examples where sound segregation is intentionally made dif cult for aesthetic effect. For instance, sometimes a producer may want to cause two instruments to perceptually fuse to create a new sort of sound. By carefully coordinating the onset and offset of two instrument sounds, our auditory system can be tricked into thinking the two sounds are part of the same thing.

I am most of the time very very impressed with bigshot, gregorio and pinnahertz descriptions, knowledge, skills and techniques, mainly if we consider the limits of standard stereo reproduction.

What I was trying to express with my way out of context questions is that we would need further listening research to know if such visual reinforcement of spatial hearing is really much worse than acoustic crosstalk at playback. What would happen if engineers had the freedom to expand the soundstage? I don’t know, I just think such hypothesis must be tested.

And different skills and deficiencies from listeners with different eyesight histories may provide new insights. See, for instance: Auditory Spatial Perception without vision.

If I were a scientist or if I worked in the industry and had time and resources, I would test two different recordings (a. one with two microphones spaced at the average of a human head diameter; and b. the other using the technique you mentioned) in two different listening scenarios (a. one with standard acoustic crosstalk; and b. the other other with tonal transparent crosstalk cancellation). No binaural recording, no rendering of elevation. I would choose untrained listeners (people who are anyway related to the entertainment industry or sensory research) that don’t know the interpreted piece/composition. I would choose three different groups of listeners (born blind, listeners that recently got visual impairment and listeners with normal eyesight). I would separate those groups into four subgroups (recording a, listening environment a; recording a, listening environment b; recording b, listening environment a; recording b, listening environment b). After the playback, I would ask them: have you noticed there was prominent/solo part for the french horn? Where they were located? Then finally I would compare their answers and try to find any possible patterns to conclude that the issue of visual reinforcement of spatial hearing is in fact worse than acoustic crosstalk at playback.

So when pinnahertz says “keep it real”, he is saying that, given the universe of listening environments, the number of people that have access to crosstalk cancellation filters, beamforming phased array of transducers and or headphone externalization devices is so small, my inquietude is irrelevant to recording and mixing engineers, who must concentrate their current efforts to make songs suit mainstream listening environments with acoustic crosstalk.

Although those questions may seem theoretical, the existence of such acoustic crosstalk free environments is an evidence that I am not talking of theoretical future technologies.

So I don’t have “a thing” with spatial distortion.

I’ve actually quoted two very real and practical cognitive dissonances of listeners (a) using a PRIR’s set to not add crossfeed with externalization filters and headphones playback and (b) crosstalk cancellation filters with two loudspeakers:

A. Crossfeed free PRIR in headphones listening environment and content with hard panned steams and no ITD

Erik Garci said:
A) I rank the realism as follows:
1. Most realistic: Moving your head even just slightly with head-tracking. Speakers seem to get farther from you instantly and fixed in space.
2. Keeping your head perfectly still (with or without head-tracking). Speakers seem to get farther from you gradually the longer you keep still.
3. Least realistic: Moving your head even just slightly without head-tracking. Speakers seem to get closer to you instantly and stuck your head.
I especially notice the improvement for sounds from the center speaker or phantom center.

B) I use the A8 for stereo, which works very well, and prefer it to regular headphone listening.

By the way, I recently created a PRIR for stereo sources that simulates perfect crosstalk cancelation. To create it, I measured just the center speaker, and fed both the left and right channel to that speaker, but the left ear only hears the left channel because I muted the mic for the right ear when it played the sweep tones for the left channel, and the right ear only hears the right channel because I muted the mic for the left ear when it played the sweep tones for the right channel. The result is a 180-degree sound field, and sounds in the center come from the simulated center speaker directly in front you, not from a phantom center between two speakers, so they do not have comb-filtering artifacts as they would from a phantom center.

Binaural recordings sound amazing with this PRIR and head tracking.

Erik Garci said:
Using the first PRIR, central sounds seem to be in front of you, and they move properly as you turn your head. However, far-left and far-right sounds stay about where they were. That is, they sound about the same as they did without a PRIR, and they don't move as you turn your head. In other words, far-left sounds stay stuck to your left ear, and far-right sounds stay stuck to your right ear. It's possible to shift the far-left and far-right sounds towards the front by using the Realiser's mix block, which can add a bit of the left signal to the front speaker for the right ear, and a bit of the right signal to the front speaker for the left ear.
(...)

Erik Garci said:
That would be much easier than manually muting the microphones during measurements, and just about any PRIR could be used.

Allowing fractional values would be even better, such as 0.5 (-6 dB) or 0.1 (-20 dB).

jgazal said:
I don’t know if my question really address the issue, but let’s wait for they answer:

@Mike Smyth, @Stephen Smyth, once I asked if it would be possible to implement an optional function that allows the user to experiment a playback mode in which the signals assigned to left side speakers are not played back at the right headphone driver and vice versa and the answer was yes.

I am sorry to bother once again, but I have just noticed that sometimes an instrument track is fully assigned to one channel and that could sound odd.

I’ve read in the Realiser A8 manual that one can blend channels in the mix block with 0.1 increments until full 1.0 mix.

But I just can’t figure out if such function is equivalent to adding less crossfeed than the crosstalk measured in the room the PRIR was acquired.

So in the end my new question is: would it be possible to mix individual channels or add lower dB of crossfeed than one would find in the real PRIR into the ipsolateral channels all at once, but with finer increments than 0.1?

Click to expand...

B. Stereo speakers with crosstalk cancellation filter and content with hard panned steams and no ITD

bmoura said:
The Bacch processor is designed to play binaural recordings, like the Chesky Binaural + series and the Binaural DSD Downloads you find on NativeDSD Music, over loudspeakers.
Different from the Smyth which is designed to take movies and surround sound music recordings, not made binaurally, and play them over Stereo headphones.

The Bacch is interesting. But the effects vary in the listening demo I heard at one of the audio shows.

In some cases, instruments appeared "outside" the left and right speakers (as intended).
But on other tracks, the instrumentalists came towards you (the trumpeter is out to get you) vs. across the front. (Not as intended).

Needs more development to my ears. Not yet "revolutionary".

To reinforce, when pinnahertz says “keep it real”, he also wants to say that, considering the universe of playback environments, the number of users that have access to crosstalk cancellation filters and headphone externalization filters is irrelevant, which makes my disquietude also an irrelevant issue. I agree. There is no problem at all with that approach. Smyth Research Realiser didn’t have an optional setting in which the user could choose not to add crossfeed (that simulates the acoustic crosstalk with the measured speakers in the measured room).

So as someone who is “nothing like the typical consumer”, one could mitigate those cognitive dissonances with three options:

a) asking the technology developers to provide finer channel mixing increments (see quote above);

b) asking politely recording and mixing engineers to release alternative masterings in which ILD and ITD are carefully taken into consideration for such playback environments;

c) not using the crosstalk cancellation filter or the PRIR set to not add crossfeed.

I have tried a and b routes, but up to now I only got Chesky binaural records and “c” option.

B route is what I am trying to figure out from the questions I’ve been doing to bigshot, gregorio and pinnahertz, and apparently the only answer I got from them is infeasibility or that it does not worth the effort.

Whenever I have a doubt, I try to ask you. I try to adopt a very neutral language, because everything is funny as long as it is happening to somebody else.

So I would also like to praise the succinct, polite, elegant and erudite way bigshot, a gentleman in the old fashioned way, adopted to claim that:

bigshot said:
It would be possible to capture a very realistic sound field and reproduce it. It would take a very specific kind of miking, and a custom speaker array that is precisely matched to it. It would be basically a "capture only" system. You couldn't edit or overdub or balance levels. The result would be realistic, but not very exciting. We hear realistic sound every moment of our lives. Recorded music is intended to be *better* than real... more organized, more balanced, more clear, more interesting sounding.

The problem isn't that realism is unattainable. It's that pursuing realism is a waste of a great deal of materials and effort for minimal returns. The first law of being an artist is to know how to use your medium to its strengths. Recording is no different.

Without using words and expressions that can trigger negative emotions, such as “delusional”, “audiophile myth”, “dumbing-down”, “lesson in reading comprehension” or “keep it real”, he made me concede that the engineer’s aim of creating better than real in popular genres is legitimate. Obviously that previous lectures from gregorio and pinnahertz about the way recording and mixing is done also fundamentally contributed to such education, but only with bigshot claim my reason could override my desire for absolute reference to a reality (that, as gregorio glouriously explained, does not exist in the first place for popular genres). There are no words to express my gratitude to all of you.

I still have hope that gregorio will be able to make his way of mixing compatible with Professor Choueiri binaural synthesis mixing app by creating virtual speakers in the content itself. I politely asked gregorio to investigate this route in post #59, but there is no answer yet. Sincerely, I don’t know if he has incentive or willingness to do that, because, unconsciously omitting the hypothesis of binaural synthesis in the way described in post #59, this is apparently his definitive response:

gregorio said:
That's the problem, there are extremely few binaural recordings. Binaural recording can only be employed for acoustic music genres and even then, it removes the possibility of mixing, of the art to enhance the perception/psycho-acoustics (as would occur during an actual performance).

G

I don’t know if he has actually already tested such techniques or he completely dominates the subject, allowing him to previously rule it out in the theoretical domain.

Thinking about an incentive for him to further investigate such route, I thought I could state this conundrum in a slightly different and more speculative way.

Currently there is a market in which acoustic crosstalk at playback and impractical PRIR and HRTF acquisition are the standard.

But several shifts in the entertainment industry business model were also driven by convenience (i.e. cd, mp3, streaming etc). So it doesn’t seems unreasonable to speculate that hardware manufacturers will sell more soundbars or stereo speakers (capable of crosstalk cancellation) than discrete multichannel speakers.

There is two ways to render height with soundbars: a) having discrete up firing transducers in the soundbar that rely on ceiling reflections or b) to use beaforming to reduce crosstalk and then convolve/convolute a PRIR or a HRTF. The first is strongly dependent of room characteristics and users may not be willing to deal with the hassle of optimal location. The second is strongly dependent of personalized BRIR or HRTF (compare, for intance, yarra + Realiser versus Yamaha YAS-207, that relies on DTS Virtual X).

There is only one way to imprecisely render height with two channels and two stereo speakers: playing binaural content with a crosstalk cancellation filter.

Now imagine a market in which PRIR and HRTF becomes automated by prompting users allow biometric capture and crosstalk cancellation is the mainstream.

Would releasing an specific “binaural synthesis master” (that allow mixing a drum in the cherished way gregorio described, but also conveying at least a 180 degree hemifield sounstage and proximity) represent a competitive advantage?

If you say yes, then your offer will match my propensity to buy your mastering instead of an standard stereo one.

And since we are all talking about preferences, I wish the demand for crosstalk cancellation filters, beamforming phased array of transducers and externalization playback equipment to exceed the demand for standard stereo and multichannel equipment.

It is not a “thing”, it is just a preference to: 1. avoid the cognitive dissonances above; 2. have at least 180 degree hemifield sounstage and 3. convey proximity.

I won’t die by not using a crossfeed free PRIR, because after the education I received about recording and mixing now I can grasp what it might be happening.

I still think it is a competitive advantage for recording and mastering engineers being free to convey at least a 180 degree hemifield sounstage and proximity using only two loudspeakers or a soundbar capable of beamforming. But that is just me.

Will you know how to use the above mentioned medium (crosstalk free environments and easy acquisition of personal HRTF) to its strengths if they eventually become mainstream?

bigshot · Jan 14, 2018 at 1:46 PM

The thing is, people are talking about solving the problem of spacial distortion... and I'm not convinced it even is a problem.

When I'm supervising a mix, I'm not even trying to make it sound realistic. I'm just trying to make it sound "good". To me, the difference between a good and a bad mix is all about balance and organization. I try to make everything clear... no element of the mix overpowering other elements, spaces carved out for sounds to exist in, a soundstage that fills the room, emphasis for elements that are supposed to be emphasized, no little distractions to draw you away from the focus... basically, just organization. I might mix for mono or stereo or 5.1, but it's no different. Organized sound is organized sound.

When I finish a mix, I usually check it on small speakers, just to make sure the balances still work with a limited frequency response, but I've never checked a mix for headphones, and I don't really see why I would have to check. If something is clearly organized for speakers, it'll still be clearly organized for headphones. If someone wants to listen to my mix with cross feed, great. The same goes for DSPs that rechannel it out to more speakers, or ones that add ambience to the room. None of that will alter my balances and organization. I'm not trying to create a specific aural perspective, so spatial distortion just doesn't matter. It isn't a problem, so it doesn't need any solution.

One of my pet peeves about audiophoolery is the tendency to latch on to problems that don't exist and to jump through hoops to solve problems that aren't really problems. We see that all the time around here- maybe my wall power is "dirty". Will a power conditioner fix that? I'm concerned about jitter. What kind of DAC do I need to buy to not have any jitter? I have cheap cables. Would ones made of sterling silver sound better?

We see this all the time- solutions to problems that don't exist. There are enough problems that are real problems. No need to waste time and effort on solving non-problems. There's nothing wrong with using signal processing. If you like the effect, great. You can even encourage others to try it and see if they like it. But you really don't have to think up a theory about why it might be necessary, especially when the theory doesn't even relate to an actual problem.

jgazal · Jan 16, 2018

IMHO, every testable hypothesis (falsifiability) has a potential worth.

It is just that we don’t know its potential worth a priori.

In the lectures linked in the first post of this thread Chris Brown asks about the implications of sensory research in people with cochlear implants.

So if one could separate pinnae filtering from head and torso filtering, a person with cochlear implant that computes head movements could, in theory, restore his/her skills to localize sounds in space.

What would be the worth of having cochlear implants and such DSP for someone with art sensibilities comparable to Beethoven who also lose his/her hearing?

IMHO, invaluable.

Wrong idea? Is the microphone in the entrance of ear canal? IDK

jgazal · Jan 17, 2018

bigshot said:
It's important to remember that when it comes to recorded music, the science is intended to serve the art, not the other way around.

And

bigshot said:
(...). The first law of being an artist is to know how to use your medium to its strengths. (...).

And your medium may change.

:L3000:

bigshot said:
Speakers produce a dimensional soundstage. Headphones don’t. You need space to have dimension. The room provides that, not the speakers.

bigshot said:
Recordings contain secondary depth cues... reverb, reflection of sound off studio walls, ambience, etc. But that is baked into the mix and is no different on headphones or speakers. The thing that adds real dimensional space is real dimensional space (i.e. your living room). Blending two channels together doesn't even qualify as a secondary depth cue. It's just blending two channels together.

Binaural is a gimmick for headphones. It isn't the way music is recorded or played back. And it isn't particularly dimensional. Certainly not in the sense that speaker soundstage is.

bigshot said:
Nope. A stereo recording can only contain secondary depth cues, not actual spatial information. For that you need space. Multichannel can do a better job of reproducing secondary depth cues in an immersive way, but it still isn’t the spatial information from the church recording venue. For actual spatial information people need to be able to move their head to locate objects in space. When you make a recording, all that information gets stripped off. Head movement on playback is dictated by the spatial placement of the speakers in the room. That is the space you are hearing, not the church’s space.

Secondary depth cues are great for adding a specific atmosphere once you have real depth. But they don’t convey actual space. The room is the space, not the recording. Speakers have space in a room. Headphones have no space because they’re clamped to your head. As we’ve said many times, recordings are not created to reproduce the space of the recording venue. With multichannel, the mixer is creating an artificial balance that will wrap around the room like wallpaper on the walls of the listening room itself.

Is it possible to monitor, only using headphones and avoiding speakers, the step in which you add reverberation?

In other words, do you get any error if you do not monitor such steps with speakers?

On more try: if an engineer monitor such step only using headphones and avoiding speakers, does he need to be careful to not overshoot?

Your assertives made me curious.

Even with head tracking, the intensity of reverb may be a fundamental incompatibility between mixes for speakers with acoustic crosstalk (or PRIR convolution, that account for the same acoustic crosstalk) versus mixes for speakers with acoustic crosstalk cancellation, beaforming phased array of transducers that avoid crosstalk or pure HRTF convolution without crossfeed.

Pure HRTF convolution may be not good then for current content. I wonder if finding a similar HRTF in Creative’s database of HRTFs would be only the first step in their DSP to then actually insert a RIR to also compensate that reverberation superposition... Who knows...

What is the result of Steve Reich content using, for intance, the crosstalk cancellation circuit @pinnahertz developed?

@bigshot, are you planning to demo the Realiser A16? Do you know anyone from you professional network that already uses the Realiser A8 for monitoring multichannel mixings? I am so curious to know your opinion!

bigshot · Jan 17, 2018 at 8:03 PM

I've never supervised any mix where the engineers used headphones. The talent uses beater cans for isolation from feedback, but that is the only time I've seen headphones in a studio. Everything is done on calibrated speakers, and checked on little speakers at the end. I'm sure that there are homemade DIY mixes for rock bands who do their own recording and engineering. They might use headphones to mix, but the results probably aren't very good.

Headphones have a tendency to suck up bass and emphasize treble. If you mix with cans, it might end up sounding boomy and muffled if you aren't careful. I'd also be concerned about the way reverbs that involve phase shifts would play in a room. In headphones you could easily set up cancellation issues and not even know it.

I seriously doubt that any engineer will use the Realizer for production. The goal is to mix to a standard. Using a very esoteric way to monitor your mix is dangerous.

jgazal · Jan 17, 2018 at 8:21 PM

I see.

AFAIK, the Realiser A8 with a PRIR from your mastering room and head tracking turned on will emulate the sound of your very own mastering room: https://www.soundonsound.com/reviews/smyth-research-realiser-a8.

The manual says you can cut or fade out the reverberation tale. But it also says that very short reverberation times may sound unnatural.

And that matches the results people reach when they mistakenly over damp listening rooms.

That’s why I was thinking pure HRTF convolution may lack a bit of reverberation and that’s a compatibility problem. Perhaps summing pure HRTF and a RIR from a reference room is one solution. I really don’t know. Just wondering...

bigshot · Jan 17, 2018 at 8:23 PM

I don't know why anyone would want to mix on a technology that doesn't match what people have in their homes. Why would an engineer want to simulate speakers in a room when he has speakers in his room already? It's like someone painting a portrait of a person on TV... why not just sit in front of the person and do the portrait?

jgazal · Jan 18, 2018

bigshot said:
I don't know why anyone would want to mix on a technology that doesn't match what people have in their homes.

Afaik, standard personal room impulse response convolution actually matches what people have in their homes.

It is an step “before” the crossfeed free PRIR I was suggesting for binaural synthesis content.

bigshot said:
Why would an engineer want to simulate speakers in a room when he has speakers in his room already? It's like someone painting a portrait of a person on TV... why not just sit in front of the person and do the portrait?

That analogy is quite interesting.

But, IMHO, it can be proposed the other way around.

Consider the following Artists:

In depth:

They probably dominate several physical supports, but digital is one more option.

So they know the fundamental principles of painting, but they go further and acquire familiarity with digital medium.

There are tools that they can only use within the digital medium.

But of course, as any medium, digital may give you pros and cons.

I am sure in that sound on sound article pros and cons of using the A8 dsp are mentioned.

You choose.

So the other way of asking your analogy is: why painting only with physical support (pencil/paper) or a (brush/acrylic paint/canvas)? Why not trying digital support?

IMHO, dsp may give recording/mastering engineers more artistic freedom and listeners a different perspective.

How can you/we know without trying?

bigshot · Jan 18, 2018 at 12:21 AM

When you produce sound it's all about control. How tightly can you control the sound so you stand a good chance that what you're hearing and approving is what your customers are hearing. You don't get esoteric. You don't even get technical. There is a separate crew whose job it is to keep the studio in spec. Once you set foot on the stage, all that has to be dummy proofed so you can just work. It seems that a lot of audiophiles don't understand the priorities and the workflow. It isn't about justifying your pet theories. It's about getting the artist's intent across. I can separate the pros from the duffers quickly in forums because some people spend all their time talking about details, and some people talk about their creative intents.

jgazal · Jan 18, 2018

bigshot said:
When you produce sound it's all about control. How tightly can you control the sound so you stand a good chance that what you're hearing and approving is what your customers are hearing. You don't get esoteric. You don't even get technical. There is a separate crew whose job it is to keep the studio in spec. Once you set foot on the stage, all that has to be dummy proofed so you can just work. It seems that a lot of audiophiles don't understand the priorities and the workflow. It isn't about justifying your pet theories. It's about getting the artist's intent across. I can separate the pros from the duffers quickly in forums because some people spend all their time talking about details, and some people talk about their creative intents.

Okay. I give up.

Latest Thread Images

Featured Sponsor Listings

A layman multimedia guide to Immersive Sound for the technically minded (Immersive Audio and Holophony)

jgazal

500+ Head-Fier

Erik Garci

100+ Head-Fier

jgazal

500+ Head-Fier

jgazal

500+ Head-Fier

Erik Garci

100+ Head-Fier

jgazal

500+ Head-Fier

bigshot

Headphoneus Supremus

jgazal

500+ Head-Fier

jgazal

500+ Head-Fier

bigshot

Headphoneus Supremus

jgazal

500+ Head-Fier

bigshot

Headphoneus Supremus

jgazal

500+ Head-Fier

bigshot

Headphoneus Supremus

jgazal

500+ Head-Fier

Users who are viewing this thread