Is soundstage actually detrimental to spatial audio?
Oct 6, 2019 at 10:49 PM Post #121 of 162
This thread definitely reminds me of the Torvalds/Tanenbaum debate. If this is the first time something like this has happened in this sub-forum, it's historic.
 
Oct 7, 2019 at 12:06 AM Post #122 of 162
Looks like some folks here have forgotten to read the 'How to disagree' link in the Sound Science header... :darthsmile:
 
Oct 7, 2019 at 2:12 AM Post #123 of 162
Who reads that stuff up there? They put that there to punish us when this banishment group was new. They've gotten smarter since then; they just leave us alone and we work it out for ourselves. I'm not upset, and Gregorio is the Energizer Bunny of responding to this stuff. As long as 71 dB is happy, which he says he is, all is well.
 
Oct 7, 2019 at 4:28 AM Post #124 of 162
This is an intriguing thread and I'm happy I took the time to read it.

1. I agree that a room is necessary to create soundstage; note that this means an actual soundstage and not merely the "perception" of one.

2. I also agree that headphones can sound different spatially, as can speakers. For speakers this could perhaps be chalked up to dispersion characteristics (so the room interaction differs), while for headphones I am unsure why spatiality (or rather, the perception of it) varies even when equalized to the same or extremely similar responses; perhaps it has something to do with the small acoustic space's interaction with one's ears?

3. There have been attempts to get greater spatiality and even attempts to create an actual soundstage in headphones. The Stax SR-Sigma is an example of a headphone built with a small "room" to simulate what you would more typically hear from speakers. The Sigma does not account for quite a few things mentioned here, however, such as a soundstage independent of the listener's head; it is also not reflective enough to simulate a typical room, and the space is still small. There are also headphones that achieve a very slight degree of crossfeed, such as the AKG K1000 and the more recent RAAL SR1A, but this is insufficient for proper spatial information in headphones because you do not hear the room, only the reverb and the spatial cues embedded in the recording. Their crossfeed is also less than that of speakers (for obvious reasons).

4. My personal perception of headphones is that they can sound quite wide (the extreme ends of left and right can be perceived as quite far away) but utterly lack the depth of speakers in a room, even near-field. Even on binaural recordings the very center of the mix sounds at best to be a few centimetres in front of my face (and that's the absolute best-case scenario!). Anecdotally, I have been fooled by my headphones before, thinking a sound from something I was listening to came from an external source and vice versa. To me it seems obvious that while spatiality in headphones technically doesn't exist, our ears are quite easy to fool, and it would be possible to create a simulation of soundstage on headphones through proper delay and reverb simulating a room (which I think is partially what the A16 does?).

5. I wonder: if you put someone inside an anechoic chamber, outfitted him with a pair of completely open headphones (i.e. ones that do not affect external sound in any way), put his head in a vice, applied digital crossfeed equal to that which the speakers would have naturally, and then played back the same recording on both a pair of speakers and the headphones (equalized to the same response given the test subject's HRTF etc.), would the difference in staging then be imperceptible? There are no longer any spatial cues from the speakers themselves that do not exist in the headphones. To make it even fairer you could use a free-field headphone such as the K1000, MySphere or SR1A so there is no earcup interaction. This would be a very interesting thing to test, and I wonder if something similar has been done yet?

Lastly, I do agree that the majority of spatiality found in playback comes from the recording itself, especially so for headphones. A mono mix will sound quite flat no matter how good your room and speakers are, while a proper stereo mix can at least give the slightest (SLIGHT!) impression of depth on a headphone. Just as a personal (anecdotal) example, one of the greatest degrees of depth I've heard in a non-binaural recording on headphones is track 13, "Back to business", from the Violet Evergarden OST. There are a few very faint kicks between 30 and 40 seconds into the song that sound like they are happening just barely in front of me. This is miles better than most stereo recordings in terms of depth, but it still falls completely flat compared to properly set up speakers. Cheers!
 
Oct 7, 2019 at 6:10 AM Post #125 of 162
[1] "I have done this for 25 years and you haven't" -argument doesn't intimidate me anymore. Not anymore. I realized I can never build a healthy self-esteem if I let everything intimidate me.
[2] I have studied acoustics in the University. ... I have thought about these things alot.
[2a] I must have insight and I simply refuse the claim that don't know anything.
[3] What I lack is the burden of being married to the conventions of the recording/mixing business.
[3a] My knowledge is not nullified just because someone somewhere has recorded and mixed music for 25 years.
[4] Applying HRTF reduces ILD, just using significantly more detailed responses than what basic crossfeed does.
[5] The philosophy of crossfeed is: Better not do anything above 1 kHz, but why not fix things below 1 kHz since that's pretty easy and important to have natural levels of ILD?

1. That argument was NOT made to intimidate you; it was made to demonstrate a level of practical knowledge, experience and expertise that's valued, because the marketplace for professional engineers is highly competitive. Whether you let that intimidate you or not is entirely your issue. However, it should at least dissuade you from just making up nonsense about what engineers are and do, but it clearly hasn't in your case!

2. And what did your university teach you about having, say, 30 different acoustics at the same time? This is a fundamental issue that you just refuse to acknowledge! Some, or much, of what you learned about acoustics is invalid with regard to music recordings, because we're effectively dealing with fantasy "spatial information" that doesn't exist in the real world and doesn't follow the rules/theory of acoustics you learned.
2a. I'm not saying that you don't have "insight", I'm saying that some/much of that insight is invalid when you try to apply it to commercial music recordings!

3. You are NOT reading what I've written. I made it very clear that there is NOT one mic setup for recording an orchestra or other musicians, that the "conventions" are very broad and engineers are NOT married to them, as mic setups vary for every recording, even of the same thing and sometimes very significantly. Furthermore, the broad conventions we do have, exist because 70 years of global experience and competition broadly indicates what works in practice and what doesn't. You on the other hand are married to rules/conventions of acoustics that you learned in university which aren't even applicable to commercial music recordings!
3a. True, that's not the reason your knowledge is "nullified", the reason it's nullified is because it's either not applicable or you've simply made it up and it contradicts the facts/science anyway!!

4. No it does not! A "Head Related Transfer Function" is an ENTIRE SET of COMBINED parameters and ILD is ONLY ONE of those many parameters. You state you've studied and thought "a lot" about these things for many years, so how is it that you apparently don't know the basics of what HRTFs are?

5. Again, NO IT'S NOT, that's a philosophy you've simply made up, and it's false! It is not "pretty easy" to "fix things" below 1kHz and crossfeed doesn't even attempt to; it attempts to fix ONE thing at the expense of making the "other things" worse (which is why we've since developed HRTFs)! And it's NOT particularly important to have natural levels of ILD, because firstly we're not dealing with "natural" anyway and secondly, there are other parameters which are at least as important (which crossfeed makes worse), frequency response being just one example. How many times???
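As a concrete illustration of the distinction in points 4 and 5: an HRTF applies a whole set of direction-dependent cues at once (ILD, ITD and spectral shape together), whereas crossfeed applies one fixed adjustment. Below is a minimal sketch of HRTF-style rendering, assuming a measured head-related impulse response (HRIR) pair is available; the file names here are hypothetical:

```python
import numpy as np
from scipy.signal import fftconvolve

# Hypothetical files holding a measured HRIR pair for one direction.
# Each HRIR encodes ILD, ITD and spectral cues for that direction combined.
hrir_l = np.load("hrir_left.npy")    # source -> left ear impulse response
hrir_r = np.load("hrir_right.npy")   # source -> right ear impulse response

def render_binaural(mono_source):
    """Place a mono source at the direction the HRIR pair was measured for."""
    return (fftconvolve(mono_source, hrir_l),
            fftconvolve(mono_source, hrir_r))
```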

G
 
Oct 7, 2019 at 6:22 AM Post #126 of 162
Just focus on the facts.

Easier said than done, it seems. I consider myself a fact-based person, but I'm beginning to think there is a lot more at play here than just facts. First of all, the terminology around headphone spatiality is a mess, in the sense that we can't even agree on what words mean. I am not good at terminology, and headphone spatiality terminology isn't well established. We are arguing over things like what "natural spatiality" means. This discussion needs a willingness to understand what the other person is trying to say, and I am not sensing such a will. On the contrary, if your use of terminology isn't 100% exact you will be attacked for it, harshly! So it's no wonder the discussion has gone like this, with terminology this unsettled. Terminology may be established in your workplace or in your country, but that doesn't mean it's established universally.

If we put an artificial head in a typical room with a typical speaker setup and measure the impulse responses from the speakers to the artificial head at the typical listening spot, we can detect the typical level of acoustic crossfeed that happens when listening to speakers. To my knowledge the resulting maximum ILD is about 3 dB at low frequencies, rises to about 10 dB at 1 kHz, and continues rising up to about 30 dB at the highest frequencies. We get a max-ILD-versus-frequency curve.

What should we call headphone spatiality that doesn't much exceed this limit?
What should we call the perceived sound caused by headphone spatiality that does exceed this curve?

This is just one example of the terminology mess we have here. In my opinion we need established terminology for things like this, because this is where speaker listening differs drastically from headphone listening: speakers in a room regulate ILD so that it always stays within the max-ILD curve, whereas headphones do nothing of the sort and allow practically unlimited ILD.
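To put rough numbers on that curve, here is a minimal sketch built from the figures quoted above (about 3 dB at low frequencies, about 10 dB at 1 kHz, rising to about 30 dB at the top of the audio band). The anchor frequencies and the log-frequency interpolation are illustrative assumptions, not measured data:

```python
import numpy as np

# Illustrative max-ILD-versus-frequency curve from the figures above.
# Anchor points and log-frequency interpolation are assumptions.
def max_ild_db(freq_hz):
    anchors_hz = np.array([20.0, 200.0, 1000.0, 20000.0])
    anchors_db = np.array([3.0, 3.0, 10.0, 30.0])
    return np.interp(np.log10(freq_hz), np.log10(anchors_hz), anchors_db)

# Example: a hard-panned bass note with 12 dB of ILD far exceeds what
# speakers in a room would ever present at 100 Hz.
for f, ild in [(100.0, 12.0), (1000.0, 8.0), (8000.0, 25.0)]:
    limit = max_ild_db(f)
    status = "exceeds" if ild > limit else "within"
    print(f"{f:7.0f} Hz: ILD {ild:4.1f} dB vs room limit ~{limit:4.1f} dB ({status})")
```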
 
Oct 7, 2019 at 7:38 AM Post #127 of 162
1. I have let this intimidate me, but no longer. I am not saying much about what engineers are and do; I am talking about how stereophonic sound that clearly has spatiality suited to speaker listening should be modified to suit headphone listening. I am interested in learning about sound engineering, but it's difficult when you can't work in the business, because you learn by doing. I don't have the money to buy tons of mics to play with and learn, or even to purchase a proper DAW. Not everyone is lucky or talented enough, or has the connections, to find himself/herself in a studio learning by doing.

2. Nothing. The courses I took didn't touch these issues. My university taught things like how to encode speech using linear prediction algorithms, how the vocal tract works to produce speech, and how hearing works. It was very speech-communication oriented, with very little about music: just some basics about scales and instruments and how they generate the sound they do. I was taught nothing about how to record and produce music. Human hearing works how it works no matter how sound engineers produce music, which is why I think I am qualified to comment on things like ILD levels in headphone listening using my knowledge of human hearing. It's the same as when an ergonomics expert comments on a chair without knowing how chairs are manufactured. You are like a chair manufacturer telling a professor of ergonomics that he/she can't know which chairs are ergonomic because he/she doesn't know about chair manufacturing. I don't know much about recording music, but I can tell the ILD level is too high on headphones because I know how spatial hearing works and what the ILD levels should be. If your artistic intention is to go against spatial hearing and have 12 dB of ILD in the bass, then good luck getting that from speakers when the room acoustics regulate bass ILD down to about 3 dB. We are looking at the same scientific facts from different perspectives: you as a sound engineer and I as an acoustics engineer. You have to make sure what you do stays within the professional conventions and traditions of your business. I don't.

3. I believe you, because different recordings do have different spatiality.

3a. I dream of the day when people say my knowledge is applicable…

4. I do know very well what an HRTF is, and I know ILD is just one aspect of it. I have mentioned ITD and ISD several times, so clearly I know about the other parameters. Crossfeed is effectively an ILD scaler, which is why I talk about ILD most of the time. Crossfeed doesn't "mess up" the ITD; if it did, acoustic crossfeed with speakers would mess it up too. If anything, crossfeed clears up the temporal structure by creating delayed correlation between the ears; it kind of gives focus points. Personally I hear this as a much clearer stereo image where instruments sit in their own places instead of being scattered all over, but that's me. You hear things your way.

5. HRTF is clearly more advanced than crossfeed, but also more demanding. You need only a simple electronic circuit to do crossfeed, but you need high-speed signal processing to do HRTF (plus measurements!). I don't have the technology to do real-time HRTF convolution, but I do have the technology to do real-time crossfeed. It's a minimum-phase analog circuit, so there are no lip-sync issues. If you can use HRTF then great, but I'm just saying crossfeed is much better than nothing.

Crossfeed does affect frequency response, but only in the ballpark of 1 dB. That's nothing: an insignificant price for the massive benefits of crossfeed. Room plus speakers affect the response in the ballpark of 10 dB, and even in an acoustically very good room, 1 dB accuracy is a dream. When I got into crossfeed I studied the downsides and concluded they are absolutely minuscule compared to the benefits.
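For reference, here is a minimal digital sketch of the kind of crossfeed described here: low-pass and attenuate each channel, mix it into the other, and leave the top octaves largely untouched. The corner frequency, cross gain and first-order filter are illustrative assumptions, not any specific commercial circuit:

```python
import numpy as np
from scipy.signal import butter, lfilter

def simple_crossfeed(left, right, fs, corner_hz=700.0, cross_gain_db=-8.0):
    """Mix a low-passed, attenuated copy of each channel into the other.

    This shrinks ILD below the corner frequency, where crossfeed acts,
    while barely touching higher frequencies. Parameter values are
    illustrative only.
    """
    b, a = butter(1, corner_hz / (fs / 2))   # first-order low-pass
    g = 10 ** (cross_gain_db / 20)           # linear cross gain
    out_l = (left + g * lfilter(b, a, right)) / (1 + g)
    out_r = (right + g * lfilter(b, a, left)) / (1 + g)
    return out_l, out_r

# Example: a 200 Hz tone panned hard left acquires a quieter, low-passed
# image in the right channel, reducing the "unlimited" headphone ILD.
fs = 44100
t = np.arange(fs) / fs
out_l, out_r = simple_crossfeed(np.sin(2 * np.pi * 200 * t),
                                np.zeros(fs), fs)
```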
 
Oct 7, 2019 at 7:43 AM Post #128 of 162
5. I wonder: if you put someone inside an anechoic chamber, outfitted him with a pair of completely open headphones (i.e. ones that do not affect external sound in any way), put his head in a vice, applied digital crossfeed equal to that which the speakers would have naturally, and then played back the same recording on both a pair of speakers and the headphones (equalized to the same response given the test subject's HRTF etc.), would the difference in staging then be imperceptible? There are no longer any spatial cues from the speakers themselves that do not exist in the headphones. To make it even fairer you could use a free-field headphone such as the K1000, MySphere or SR1A so there is no earcup interaction. This would be a very interesting thing to test, and I wonder if something similar has been done yet?

nerdy concerns:
your initial conditions assume that the reference speaker playback also happens in the anechoic chamber (for those who, like me, didn't get it at first :sweat_smile:). it is your way of putting room reverb aside and focusing on the rest. with the actual same frequency response and crossfeed delay measured under those conditions, we can assume we would get pretty much the same sound apart from each device's own distortions (the linear ones could probably be copied pretty well if we used convolution instead of just measuring a frequency response at the ears). for a realistic situation, I would also suggest removing low frequencies from the test signals and sample tracks, because even in an anechoic chamber, the wedge- or pyramid-shaped absorbers on the walls need to be very long to deal with sub frequencies properly (I seem to remember a rule of thumb of wavelength/4 for the absorbing material's depth, but I could be completely wrong about this. anybody?). we'd also need to do that to remove most of the sub-bass vibration felt by the body with speakers, and it would make it easier to reproduce the sounds on headphones without putting pads on the ears that would let the listener know what's playing (maybe holding a K1000 over the listener's head would do that well). 3 birds, one high-pass filter.
not sure that would affect positioning, as subs are not what we rely on for that, but they would certainly be a clue about the room, give tactile bass, and reveal when the headphone is being used. hard to tell (for me at least ^_^) how much all those cues might affect the overall listening experience.

once we effectively get about the same sound, I'd assume the listeners would place sound sources in the same spots. under those conditions, I would expect that much, but I have to admit it's only my educated guess; that situation is too different from the stuff I've tried or read about.
 
Oct 7, 2019 at 8:29 AM Post #129 of 162

Yes. Because this was purely theoretical and wholly impractical (because of non-linear distortions, the need for headphones that are absolutely unobtrusive, and so on), I didn't go into super deep detail about exactly what would be needed. I'm sorry for not properly spelling out that the speakers would of course be in the very same anechoic chamber. I did also think about the condition for bass, because it is felt as much as heard, but I don't know how much feeling the bass in your body really affects staging, especially frontal staging. This was mostly just a... nerdy rant? Something like that :)
 
Oct 7, 2019 at 10:27 AM Post #131 of 162
Sorry, I am not a native English speaker and have no idea what the word vice means in this context (dictionary was not helpful either).
This is a vice:
[image: a bench vice]


"Put his head in a vice", in this context, means stabilize the head so directional auditory cues don't change. It is a joke. Stabilizing heads has to be done sometimes (I've done it), but not with a vice.
 
Oct 7, 2019 at 10:37 AM Post #132 of 162
Aha, earlier I googled 'vice' and only found 'bad habits'.
Now, translating back from Dutch, I found 'vise' with an s, so I assume that's the correct spelling.
 
Oct 7, 2019 at 12:33 PM Post #134 of 162
(I kind of think I'm remembering a rule of thumb with wavelength/4 for the absorbing material's length, but I could be completely wrong about this. anybody?)
I know a rule of thumb that goes something like: "If a sound wave travels down a tube longer than a quarter of its wavelength, it gets amplified." Could you be thinking of that?
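For what it's worth, the quarter-wavelength figure is easy to put numbers on. A minimal sketch, assuming the wavelength/4 rule of thumb discussed above and a speed of sound of 343 m/s in air:

```python
# Quarter-wavelength (absorber depth) at a few frequencies, assuming
# the lambda/4 rule of thumb and c = 343 m/s at room temperature.
SPEED_OF_SOUND = 343.0  # m/s

def quarter_wavelength_m(freq_hz: float) -> float:
    return SPEED_OF_SOUND / freq_hz / 4.0

for f in (20, 40, 80, 200, 1000):
    print(f"{f:5d} Hz -> lambda/4 = {quarter_wavelength_m(f):.2f} m")

# 20 Hz works out to ~4.3 m of wedge depth, which is why even anechoic
# chambers struggle with sub-bass and why high-passing the test signals
# (as suggested above) sidesteps the problem.
```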
 
Oct 7, 2019 at 1:20 PM Post #135 of 162
I've just ordered a vice from Amazon! Great post!

A mono mix will sound quite flat no matter how good your room and speakers are

That's not actually true. It sounds flat if you are playing it back on a normal stereo system with two speakers. But mono from a good-quality single speaker, in the right spot in a good room, can bring a huge improvement in spatiality. Imagine a piano on the stage of Carnegie Hall. It is a single point source of sound, but the sound radiates outwards, reflecting off the walls and bouncing back, creating tremendously rich depth cues. The same is true on a smaller scale in a living room.

I collect antique acoustic phonographs, and one of the instructions given to purchasers was to put the phonograph in the corner of the room, pointing toward the center, then sit slightly off axis to listen. This does several things: it uses the corner as an extension of the horn, extending the bass response a bit, and it projects the perceived sound source away from the phonograph toward the center of the room. The natural acoustics of the room also wrap around the dry recording (no reverb), creating an envelope of room depth cues that exactly match the ones you would hear if a live singer were standing in the corner performing. When you present the sound like this, the naturalness and presence are uncanny, even though you're playing sound with a steel needle that wasn't even recorded electrically. It is possible to give depth to a mono recording using room acoustics, but it takes a different approach than the sort of setup that favors stereo soundstage.

Easier said than done, it seems. I consider myself a fact-based person, but I'm beginning to think there is a lot more at play here than just facts.

My suggestion is that before you hit reply, you make an effort to understand what the other people are saying to you. Neither I nor Gregorio nor Castle is having the problems you are. Perhaps if you tried to understand what we were talking about and thought about the definitions of the terms we are using, you might not have this trouble communicating.

Here is a hint... Gregorio asked you to imagine "30 different acoustics at the same time". That is the crux of your problem. You're focusing on a single aspect at a time and ignoring the fact that the effect of a room on sound is incredibly complex. Add the artificial acoustics cobbled together in the mix of the music itself, and boiling all of that down to a single variable becomes impossible.

Say you had a nice mono recording of Horowitz playing Mussorgsky's Pictures at an Exhibition at Carnegie Hall, and you wanted to make it sound on your headphones exactly like you were in the audience at Carnegie Hall. How would you go about doing that? It would take a hell of a lot more than crossfeed. You would have trouble doing that even with the most sophisticated multichannel system. The envelope of the room is an extremely complex variable.

Even simplifying it from Carnegie Hall down to your own living room would be extremely difficult... maybe even more difficult, because achieving the proper scale of the reflections would require some pretty sophisticated computer processing and head tracking... you're back to the Smyth Realiser, not crossfeed.

Crossfeed is nice if you like it and it is set properly. But it doesn't restore spatial cues you would hear if you were listening to speakers in a room... not even close.
 
