jgazal
Thank you, @Erik Garci. I thought it was ready to market. Too many open questions. I’ll wait.
According to the TechHive video, it will come out this summer for around $150. They are using in-ear mics at the CES demo (similar to the Realiser) and claim that using photos will achieve an 85% match.
https://www.techhive.com/article/3246194/ces/creative-super-x-fi-headphone-audio-holography.html
http://www.hardwarezone.com.sg/feat...ew-super-x-fi-headphone-holography-technology
I agree that the majority of the time we are trying to make it "better than real".
However, there are some music genres where this isn't the case, effectively where we're trying to make it better so that it does sound real.
Quite often in audiophile discussions the topic is brought around to the comparison of a live acoustic performance, such as orchestral music, with a recorded equivalent.
The problem here is quite different to the "better than [and not even directly concerned with] real" which is the case with the non-acoustic genres.
In the case of acoustic genres such as orchestral, I would re-word the part I've highlighted in bold to: "The result would often not appear to be entirely realistic or very exciting, because what we hear at an orchestral concert is not real in the first place!" - What actually enters our ears and what we perceive are two different things. Our brain will filter/reduce what it thinks is irrelevant, such as the constant noise floor of the audience for example, and increase the level of what it thinks is most important, such as what we are looking at (the instrument/s with the solo line for example).
This isn't "real" at all, although of course it feels entirely real. Clearly, even with a theoretically perfect capture system, all we're going to record is the real sound waves but when reproduced, the brain is generally not going to perceive those sound waves as it would in the live performance because the visual cues and other biases which informed that perception are entirely different.
So, the trend over the decades has been to create an orchestral music product which sounds realistic relative to human perception, rather than one which just accurately captures the sound waves which would enter one's ears. To achieve this we use elaborate mic'ing setups which allow us to alter the relative levels of various parts of the orchestra in mixing (as our perception would in the live performance).
However, a consequence of this is messed-up timing, as sound wave arrival times are going to vary between all the different mics (which are necessarily in significantly different positions). This is an unavoidable trade-off: we're always going to get messed-up spatial information, but with careful adjustment during mixing we can hopefully end up with a mix which is not perceived to be too spatially messed-up (even though it still is).
This "careful adjustment" is done mainly on speakers but is typically checked on HPs and further adjustments may be made if the illusion/perception of not being spatially messed-up is considered to be too negatively affected by HP presentation.
This brings me back to what I stated previously, that pretty much whatever we listen to and however we're listening to it (speakers, HPs, HPs with crossfeed, etc.) we've always got messed-up timing, "spatial distortion" or whatever else you want to call it.
PS. I know you're probably aware of all this already bigshot.
(...)
G
Thought experiment:
Imagine that you record an orchestra with an Eigenmike (32 capsules) placed at row A, seat 2, and that you convolve the highest possible number of virtual speakers with a high-density HRTF. At row A, seat 3, there is a listener who was born blind. At row A, seat 1, there is a viewer with normal eyesight. Finally, at row B, seat 2, there is a listener who recently became blind. Full audience.
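(To make concrete what "convolving virtual speakers with a high-density HRTF" involves, here is a minimal Python sketch of virtual-loudspeaker binaural rendering. It assumes the Eigenmike recording has already been decoded into a set of virtual-loudspeaker feeds; the shapes, sample rate and random data are placeholders, not any particular product's processing.)

```python
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(speaker_feeds, hrirs):
    """Sum each virtual-loudspeaker feed convolved with the HRIR pair
    for that speaker's direction.

    speaker_feeds : (n_speakers, n_samples) signals decoded from the
                    spherical-mic recording (hypothetical input).
    hrirs         : (n_speakers, 2, hrir_len) left/right impulse responses,
                    one pair per virtual-speaker direction.
    Returns a (2, n_samples + hrir_len - 1) binaural signal.
    """
    n_spk, n_samp = speaker_feeds.shape
    out_len = n_samp + hrirs.shape[2] - 1
    binaural = np.zeros((2, out_len))
    for s in range(n_spk):
        for ear in range(2):
            binaural[ear] += fftconvolve(speaker_feeds[s], hrirs[s, ear])
    return binaural

# Toy usage with random data, just to show the shapes involved.
rng = np.random.default_rng(0)
feeds = rng.standard_normal((16, 48000))   # 16 virtual speakers, 1 s at 48 kHz
hrirs = rng.standard_normal((16, 2, 256))  # 256-tap HRIR pair per speaker
left_right = render_binaural(feeds, hrirs)
```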
Questions:
Are you saying that the viewer with normal eyesight would only perceive, with headphone playback, a soundfield identical to the one he/she heard live if, and only if, he/she uses a “perfect” virtual reality headset displaying images from where he/she was seated?
Are you saying that blind listeners cannot precisely locate sounds at the live event, for instance, identify where the soloist is playing?
Are you saying that only blind listeners would perceive, with headphone playback, a soundfield identical to the one they heard live?
Are you saying that the accuracy of locating sounds (at least in the horizontal plane) differs between a blind listener and a blindfolded viewer who has normal eyesight?
Are you saying that blind listeners are not capable of sound selective attention (cocktail party effect)?
Do you think that the born blind listener and the listener that recently acquired blindness will achieve different sound location accuracy?
I agree that vision can in some circumstances override sound cues. I also agree that vision is normally the sense that allows you to train your brain to locate sound sources with your ears, and that you can retrain your brain if your vision does not match your sound cues.
But I don’t know if that is the only route to create a virtual soundfield map in your brain (or is it maybe a neural-network physical simulacrum of a soundfield map?).
Someone who was born blind can walk to his mother when she calls “my angel”. Some are capable of echolocation. Some play blind soccer.
But I don’t know if all psychoacoustic processing phenomena are caused by ambiguities between visual and sound cues.
Are you sure you can claim that?
Perhaps we should be asking you the question: "Is gregorio actually saying any of that?" Or do you need a lesson in reading comprehension?
Please don't dumb down the discussion by putting words in people's mouths that they've never said or implied. I know it's tempting because the thread is so dumb already, but resist...resist....resist.
This is what gregorio wrote:
So when I wrote about crosstalk cancellation filters, beamforming phased arrays of transducers and headphone externalization, I wrote something that might theoretically occur, and I was wrong.
But now he writes about acoustic music genres and the problem is mainly in visual cues?
So tell me, which is the worse problem: “visual cues and other biases” “in the live performance”, or acoustic crosstalk in the playback?
I am not trying to put words in his mouth. Sometimes the absurd argument is useful to express a mild idea.
What I am trying to say, respectfully, is that mixing without carefully considering ITD is a potential problem.
You say no because stereo acoustic crosstalk with speakers is ubiquitous, it happens in any “loudspeakers in a room” listening environment.
Fine, but you don’t have to rage at what I wrote.
Did I really dumb down the discussion?
I will refrain from posting at all, then.
I have reading comprehension issues. And I am delusional.
Somehow you missed his point and instead focused on the point that he correctly made regarding visual reinforcement of spatial hearing, but then took it out of his balanced context over to the ridiculous. Those "Are you saying..." questions were way out of context.
I'm not raging, I'm asking you to not blow things out of proportion or take minor points out of context. This is a challenged thread that needs no more confusion or interference.
I'm not saying you should refrain from posting. That's also a polar extreme. Just keep it real.
(...)
I'm not even sure I can claim to understand your post, let alone make the claim you're asserting! From what I can tell though, pinnahertz is correct, I'm not necessarily claiming any of that and you seem to have missed the point of what I'm saying about how orchestral recordings are made and why they're made that way.
If we take your example, then:
1. I've got no idea how a person born blind will perceive a live orchestral performance.
2. I have a vague idea of how the recently blind person might be perceiving the performance.
3. I have a vague idea of how the listener with normal eyesight will perceive the performance, but not as vague as #2: I can make some generalised assumptions which will apply much of the time to the average audience member. I've already explained those assumptions, but let's be a bit more specific and take an example of a section of music in which, say, all the strings are playing an accompanying role to a prominent/solo part for the french horn section. Our brain will rapidly latch on to this; our eyes and conscious attention will be drawn to the horns and brought more into focus, making the horn section clearer/louder relative to what we're not focusing on, the audience noise floor and, to a lesser extent, the strings for example. We're not really consciously aware of this effect, it sounds entirely natural/real because that's what our hearing does all the time with all sound.
With a sound recording (even a theoretically perfect one), we're listening in our sitting room, we don't have the same biases affecting our perception, certainly not the same sight, so we're not likely to have the same perception/experience (of this effect) or not experience it as strongly, so what are we to do? Typically, we'd use another mic, placed appropriately near the horns so it picks up more of the horns relative to the strings and room acoustics and then, when mixing, bring this mic up a couple of dBs or so during this section of the music. This would make the horns very slightly louder and clearer than what our perfect recording would be but more in line with what our perception would do at the live event. The downside is that we're going to have a timing issue, the horn sound will arrive at our spot mic much earlier than it will arrive at our perfect mic setup (in row A, seat 2), maybe 20 milliseconds or more. So, we've seriously messed-up the timing (spatial distortion). Maybe the recording still sounds fine and we can leave it like that but almost certainly we'd apply some delay to the spot mic. Even if we applied the exact delay to that spot mic as the distance to the perfect mic setup would suggest (about 1ms per 1.1 feet), that would give us the correct arrival time but we'd still have spatial distortion because the early reflections and reverb from row A, seat 2 will be significantly different to the early reflections at the position of our spot mic. Interestingly though, applying the 1ms per 1.1 feet formula often doesn't work very well, or rather we use it as a starting point and adjust the amount of delay from there until it sounds right but what we end up with is therefore actually wrong by several/many milliseconds.
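(As a rough illustration of the delay arithmetic described above, the "about 1 ms per 1.1 feet" rule of thumb and the roughly 20 ms spot-mic lead, here is a small sketch; the distances are invented for the example.)

```python
# Rough arrival-time skew between a horn spot mic and the main array,
# using the ~1 ms per 1.1 ft rule of thumb from the post (speed of
# sound ~1130 ft/s). Distances are invented for illustration only.
SPEED_OF_SOUND_FT_PER_MS = 1.13  # ~1130 ft/s

def delay_ms(distance_ft):
    return distance_ft / SPEED_OF_SOUND_FT_PER_MS

horn_to_spot_mic_ft = 4.0    # spot mic hangs close to the horns
horn_to_main_array_ft = 28.0 # main array out at row A, seat 2

skew = delay_ms(horn_to_main_array_ft) - delay_ms(horn_to_spot_mic_ft)
print(f"Spot mic leads the main array by about {skew:.1f} ms")
# -> roughly 21 ms, the order of magnitude mentioned above; in practice
#    the delay dialled in during mixing is then adjusted by ear.
```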
Now I'm sure you're going to say something like: why introduce that mic and all that timing error/spatial distortion in exchange for just a small gain in perception? My answer to that would be: you seem to have a real thing about timing error and spatial information/distortion. I'm not saying it's unimportant, it is important and we (engineers) spend a considerable amount of our time adjusting and manipulating it, but the absolute, perfect accuracy you seem to be craving simply isn't that important; the brain is quite easily deceived and constantly messes with that spatial information itself anyway, to increase clarity and for various other reasons. Relatively speaking it's a good exchange, a significant improvement in perception/the listening experience for a relatively insignificant amount of spatial distortion. And that is why the recording and mixing of orchestras has evolved to using more and more mics, starting in the early 1950s.
(...)
G
Of course you do! You are the master of made-up terminology! It's part of spatial hearing and localization. If it causes a problem you might term it "acoustic crosstalk", which is what we called it when working on "acoustic crosstalk cancellation", something that is the exact inverse of your cross-feed, and also interesting, sometimes desirable, and off-topic.
(...)
You've done NO LISTENING RESEARCH like this with your cross-feed, so you have not researched why some like it and some don't. Yet you repeatedly insist cross-feed is right, and a universal improvement. You have absolutely nothing on which to base this!
Could you possibly avoid insulting people? (...)
1. You don't have the right 3D acoustic space or natural correlation on speakers either. It's all an illusion, and a deliberate one.
(...)
Having trouble with the concept of "reference"? Real life IS the reference. You can't have a reference for the reference. What would THAT be, sound in the vacuum of space?
That's part of how spatial hearing works, a tiny portion of HRTF. So?
Because we don't have a vast library of binaural recordings that take HRTF into account! Cross-feed doesn't address the full HRTF, hence it's hobbled.
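(For illustration of why a conventional crossfeed is only a crude subset of an HRTF, here is a minimal sketch of the usual recipe: the opposite channel is attenuated, slightly delayed and low-passed before being mixed in. The parameter values are arbitrary illustrative defaults, not any specific crossfeed product.)

```python
import numpy as np
from scipy.signal import butter, lfilter

def simple_crossfeed(left, right, fs=48000, atten_db=-8.0,
                     delay_ms=0.3, cutoff_hz=700.0):
    """Crude crossfeed: the opposite channel is attenuated, delayed by a
    fraction of a millisecond and low-passed, then summed in. This only
    mimics gross ILD/ITD below ~1 kHz; it carries none of the direction-
    dependent spectral detail a full HRTF (or a measured PRIR) would.
    All parameter values here are illustrative, not a standard."""
    gain = 10 ** (atten_db / 20)
    delay = int(round(delay_ms * 1e-3 * fs))
    b, a = butter(1, cutoff_hz / (fs / 2))   # gentle 1st-order low-pass

    def bleed(x):
        y = lfilter(b, a, x) * gain
        return np.concatenate([np.zeros(delay), y])[:len(x)]

    return left + bleed(right), right + bleed(left)

# Toy usage on noise, just to show the call.
rng = np.random.default_rng(1)
L = rng.standard_normal(48000)
R = rng.standard_normal(48000)
L_out, R_out = simple_crossfeed(L, R)
```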
I'm addressing you. The "others" are a very quiet minority.
(...)
I agree in principle with everything except that cross-feed reproduces the "technical-to-psychoacoustic modification" of speakers. Not even close! To do that you'd have to introduce the correct HRTF and ambient acoustics of speakers in a room, similar to what the Smyth Realiser does. That's a very, very long way from your cross-feed! I like what the Realiser does, but it's impractical. I don't like cross-feed on most material, but do on some.
But I have knowledge of it the same way, yet make different choices. The main difference here is I don't say your choice is wrong, it's your choice. I've also explained why I don't agree with your choice, but from a preference standpoint and a technical one. My choice is right for me, but you say I'm spatially ignorant, immature, unenlightened, spatially deaf, and a whole string of other insults. I don't even know what your point or purpose is anymore.
The cocktail party problem
(...)
Cocktail party solutions
The cocktail party problem is partially solved with perceptual mechanisms that allow the auditory system to estimate individual sound sources from mixtures.
(...)
Localization cues afforded by our two ears are another source of information — if a target sound has a different spatial location than distractor sounds, it tends to be easier to detect and understand. Visual cues to speech (as used in lip reading) also help improve intelligibility. Both location and visual cues may help in part by guiding attention to the relevant part of the auditory input, enabling listeners to suppress the parts of the mixture that do not belong to the target signal.
(...)
Sound segregation in music
Music provides spectacular examples of sound segregation in action — recordings often have more instruments or vocalists than can be counted, and in the best such examples, we feel like we can hear every one of them. Why does following a particular sound source in a piece of music often feel effortless? Unlike a naturally occurring auditory scene, music is often specifically engineered to facilitate sound segregation. Recording engineers apply an extensive bag of tricks to make instruments audible in their mix, filtering them, for instance, so that they overlap less in frequency than they normally would, thus minimizing masking. The levels of different instruments are also carefully calibrated so that each does not overwhelm the others. Real-life cocktail parties unfortunately do not come with a sound engineer.
Sound segregation in music also no doubt benefits from our familiarity with instrument sounds and musical structure — we often have well-defined expectations, and this knowledge of what is likely to be present surely helps us distinguish instruments and voices.
Music also provides interesting examples where sound segregation is intentionally made difficult for aesthetic effect. For instance, sometimes a producer may want to cause two instruments to perceptually fuse to create a new sort of sound. By carefully coordinating the onset and offset of two instrument sounds, our auditory system can be tricked into thinking the two sounds are part of the same thing.
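(The "filtering so they overlap less in frequency" trick in the excerpt above can be sketched as simple complementary EQ: notch one source in the band another needs to dominate. The centre frequency and Q below are arbitrary illustrative values, not a mixing recipe.)

```python
import numpy as np
from scipy.signal import iirnotch, lfilter

def duck_band(track, fs=48000, center_hz=3000.0, q=1.0):
    """Notch one track around a band another source needs, a toy version
    of the complementary-EQ trick described above."""
    b, a = iirnotch(center_hz, q, fs)
    return lfilter(b, a, track)

rng = np.random.default_rng(2)
accompaniment = rng.standard_normal(48000)
carved = duck_band(accompaniment)  # leaves ~3 kHz clearer for the lead part
```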
A) I rank the realism as follows:
1. Most realistic: Moving your head even just slightly with head-tracking. Speakers seem to get farther from you instantly and stay fixed in space.
2. Keeping your head perfectly still (with or without head-tracking). Speakers seem to get farther from you gradually the longer you keep still.
3. Least realistic: Moving your head even just slightly without head-tracking. Speakers seem to get closer to you instantly and stay stuck to your head.
I especially notice the improvement for sounds from the center speaker or phantom center.
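(A rough sketch of why head-tracking anchors the virtual speakers: the renderer counter-rotates the speaker azimuths by the tracked head yaw before selecting HRIRs, so the speakers stay fixed in room coordinates while your head moves. This is a toy illustration, not the Realiser's actual processing.)

```python
def speakers_relative_to_head(speaker_azimuths_deg, head_yaw_deg):
    """Counter-rotate the nominal speaker azimuths by the tracked head yaw,
    so the virtual speakers stay fixed in the room while the head moves."""
    return [((az - head_yaw_deg + 180) % 360) - 180 for az in speaker_azimuths_deg]

front_pair = [-30, 30]                            # nominal stereo speaker azimuths
print(speakers_relative_to_head(front_pair, 0))   # [-30, 30]  head straight ahead
print(speakers_relative_to_head(front_pair, 20))  # [-50, 10]  head turned right
```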
B) I use the A8 for stereo, which works very well, and prefer it to regular headphone listening.
By the way, I recently created a PRIR for stereo sources that simulates perfect crosstalk cancellation. To create it, I measured just the center speaker and fed both the left and right channels to that speaker. The left channel reaches only the left ear because I muted the mic for the right ear while it played the sweep tones for the left channel, and the right channel reaches only the right ear because I muted the mic for the left ear while it played the sweep tones for the right channel. The result is a 180-degree sound field, and sounds in the center come from the simulated center speaker directly in front of you, not from a phantom center between two speakers, so they do not have the comb-filtering artifacts they would have from a phantom center.
Binaural recordings sound amazing with this PRIR and head tracking.
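(In signal terms, playback through that "muted-mic" PRIR amounts to convolving each input channel only with the same-side ear's impulse response of the measured centre speaker, so no left signal ever reaches the right ear. A minimal sketch with placeholder impulse responses, not the Realiser's internals:)

```python
import numpy as np
from scipy.signal import fftconvolve

def render_no_crosstalk(left_ch, right_ch, center_ir_left_ear, center_ir_right_ear):
    """Render stereo through the 'muted-mic' PRIR described above:
    left channel -> left-ear IR of the centre speaker only,
    right channel -> right-ear IR only, so there is zero simulated
    acoustic crosstalk in the result."""
    left_ear = fftconvolve(left_ch, center_ir_left_ear)
    right_ear = fftconvolve(right_ch, center_ir_right_ear)
    return left_ear, right_ear

# Toy usage with placeholder impulse responses.
rng = np.random.default_rng(3)
L, R = rng.standard_normal(48000), rng.standard_normal(48000)
ir_L, ir_R = rng.standard_normal(8192), rng.standard_normal(8192)
out_L, out_R = render_no_crosstalk(L, R, ir_L, ir_R)
```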
Using the first PRIR, central sounds seem to be in front of you, and they move properly as you turn your head. However, far-left and far-right sounds stay about where they were. That is, they sound about the same as they did without a PRIR, and they don't move as you turn your head. In other words, far-left sounds stay stuck to your left ear, and far-right sounds stay stuck to your right ear. It's possible to shift the far-left and far-right sounds towards the front by using the Realiser's mix block, which can add a bit of the left signal to the front speaker for the right ear, and a bit of the right signal to the front speaker for the left ear.
(...)
That would be much easier than manually muting the microphones during measurements, and just about any PRIR could be used.
Allowing fractional values would be even better, such as 0.5 (-6 dB) or 0.1 (-20 dB).
I don’t know if my question really addresses the issue, but let’s wait for their answer:
@Mike Smyth, @Stephen Smyth, once I asked if it would be possible to implement an optional function that allows the user to experiment with a playback mode in which the signals assigned to left-side speakers are not played back at the right headphone driver and vice versa, and the answer was yes.
I am sorry to bother you once again, but I have just noticed that sometimes an instrument track is fully assigned to one channel, and that could sound odd.
I’ve read in the Realiser A8 manual that one can blend channels in the mix block in 0.1 increments up to a full 1.0 mix.
But I just can’t figure out if such a function is equivalent to adding less crossfeed than the crosstalk measured in the room in which the PRIR was acquired.
So in the end my new question is: would it be possible to mix individual channels, or to add crossfeed into the ipsilateral channels all at once at a lower level than one would find in the real PRIR, but with finer increments than 0.1?
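(What is being asked for here could be expressed as a simple 2x2 mix with a cross gain finer than the mix block's 0.1 steps; the dB figures quoted earlier (0.5 = -6 dB, 0.1 = -20 dB) fall out of the usual 20*log10 conversion. A hypothetical sketch of such a function, not an existing Realiser feature:)

```python
import numpy as np

def db_to_gain(db):
    return 10 ** (db / 20)

def mix_with_crossfeed(left, right, cross_gain):
    """Apply a 2x2 mix matrix: each output ear gets the full same-side
    channel plus the opposite channel scaled by cross_gain. cross_gain can
    be any value, e.g. 0.05 (about -26 dB), finer than 0.1 steps.
    Hypothetical sketch of the requested feature, not a Realiser function."""
    out_left = left + cross_gain * right
    out_right = right + cross_gain * left
    return out_left, out_right

print(round(20 * np.log10(0.5), 1))   # -6.0 dB, matching the earlier post
print(round(20 * np.log10(0.1), 1))   # -20.0 dB
rng = np.random.default_rng(4)
L, R = rng.standard_normal(48000), rng.standard_normal(48000)
L_out, R_out = mix_with_crossfeed(L, R, db_to_gain(-26))
```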
The Bacch processor is designed to play binaural recordings, like the Chesky Binaural + series and the Binaural DSD Downloads you find on NativeDSD Music, over loudspeakers.
This differs from the Smyth, which is designed to take movies and surround-sound music recordings, not made binaurally, and play them over stereo headphones.
The Bacch is interesting. But the effect varied across tracks in the listening demo I heard at one of the audio shows.
In some cases, instruments appeared "outside" the left and right speakers (as intended).
But on other tracks, the instrumentalists came towards you (the trumpeter is out to get you) vs. across the front. (Not as intended).
Needs more development to my ears. Not yet "revolutionary".
It would be possible to capture a very realistic sound field and reproduce it. It would take a very specific kind of miking, and a custom speaker array that is precisely matched to it. It would be basically a "capture only" system. You couldn't edit or overdub or balance levels. The result would be realistic, but not very exciting. We hear realistic sound every moment of our lives. Recorded music is intended to be *better* than real... more organized, more balanced, more clear, more interesting sounding.
The problem isn't that realism is unattainable. It's that pursuing realism is a waste of a great deal of materials and effort for minimal returns. The first law of being an artist is to know how to use your medium to its strengths. Recording is no different.
That's the problem, there are extremely few binaural recordings. Binaural recording can only be employed for acoustic music genres, and even then it removes the possibility of mixing, of using the art to enhance the perception/psycho-acoustics (as would occur during an actual performance).
G
It's important to remember that when it comes to recorded music, the science is intended to serve the art, not the other way around.
(...). The first law of being an artist is to know how to use your medium to its strengths. (...).
Speakers produce a dimensional soundstage. Headphones don’t. You need space to have dimension. The room provides that, not the speakers.
Recordings contain secondary depth cues... reverb, reflection of sound off studio walls, ambience, etc. But that is baked into the mix and is no different on headphones or speakers. The thing that adds real dimensional space is real dimensional space (i.e. your living room). Blending two channels together doesn't even qualify as a secondary depth cue. It's just blending two channels together.
Binaural is a gimmick for headphones. It isn't the way music is recorded or played back. And it isn't particularly dimensional. Certainly not in the sense that speaker soundstage is.
Nope. A stereo recording can only contain secondary depth cues, not actual spatial information. For that you need space. Multichannel can do a better job of reproducing secondary depth cues in an immersive way, but it still isn’t the spatial information from the church recording venue. For actual spatial information people need to be able to move their head to locate objects in space. When you make a recording, all that information gets stripped off. Head movement on playback is dictated by the spatial placement of the speakers in the room. That is the space you are hearing, not the church’s space.
Secondary depth cues are great for adding a specific atmosphere once you have real depth. But they don’t convey actual space. The room is the space, not the recording. Speakers have space in a room. Headphones have no space because they’re clamped to your head. As we’ve said many times, recordings are not created to reproduce the space of the recording venue. With multichannel, the mixer is creating an artificial balance that will wrap around the room like wallpaper on the walls of the listening room itself.
I don't know why anyone would want to mix on a technology that doesn't match what people have in their homes.
Why would an engineer want to simulate speakers in a room when he has speakers in his room already? It's like someone painting a portrait of a person on TV... why not just sit in front of the person and do the portrait?
When you produce sound it's all about control. How tightly can you control the sound so you stand a good chance that what you're hearing and approving is what your customers are hearing. You don't get esoteric. You don't even get technical. There is a separate crew whose job it is to keep the studio in spec. Once you set foot on the stage, all that has to be dummy proofed so you can just work. It seems that a lot of audiophiles don't understand the priorities and the workflow. It isn't about justifying your pet theories. It's about getting the artist's intent across. I can separate the pros from the duffers quickly in forums because some people spend all their time talking about details, and some people talk about their creative intents.