Speaker vs. Headphone Soundstage / Positional cues / Imaging
Oct 12, 2011 at 6:29 PM Post #17 of 42


Quote:
Quote:
 
That did make sense and that was in line with how I understood depth / positioning to be perceived.  I think the crux of my question lies in the bolded portion above.
 
Why is that?  Wouldn't the relative positioning of width and depth change if you played the same recording on two loudspeakers facing each other?  (or am I wrong there?)
 
I'm basically equating the headphone experience to the experience of having two loudspeakers facing each other.
 

Headphones are indeed akin to sitting between two speakers.  :)
 
The primary difference between this and normal speaker positioning is the timing.  Consider two speakers 10' apart.  The imaging is great when you sit 10' away, centered between the speakers. 
 
As you walk forward, the distance between the speakers remains constant.  However, the distance between you and the speakers decreases.  The soundstage becomes more diffuse and may collapse, but things on the left still sound on the left.  The width is in the same place.  The same goes for depth, although this effect falls apart more quickly.
 
We are astoundingly sensitive to timing.  Sound travels roughly a foot per millisecond (1/1000 of a second).  Yet, with some experience, one can detect that speakers 10' apart have been moved a fraction of an inch. 
 
Keep in mind our ears do not care where the sounds actually come from.  All sound we experience enters our heads from the sides.  :)  It is through experience with spatial cues that we learn to localize sound. 
 
Binaural recordings work because the mics record sound just as our ears receive it, roughly 6" apart.  Played back on headphones this can be strikingly real.  Played back on speakers they are a mess.
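To put rough numbers on that timing sensitivity, here's a quick back-of-envelope sketch in Python (textbook values only: ~343 m/s speed of sound, ears roughly 6" apart; these figures are mine, not from the post above):

```python
# Back-of-envelope check of the timing claims above, using textbook values:
# how large are the interaural delays we localize with, and how much delay
# does moving a speaker a fraction of an inch introduce?
c = 343.0                  # speed of sound in m/s (~1.13 feet per millisecond)
ear_spacing = 0.1524       # roughly 6 inches, in meters

max_itd = ear_spacing / c  # worst-case interaural time difference (source at 90 degrees)
print(f"max interaural delay: {max_itd * 1e6:.0f} microseconds")   # ~444 us

move = 0.25 * 0.0254       # speaker moved a quarter of an inch, in meters
print(f"delay from a 1/4 inch speaker move: {move / c * 1e6:.0f} microseconds")  # ~19 us
# Trained listeners resolve interaural delays on the order of 10 us,
# so a quarter-inch move is plausibly near the edge of audibility.
```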
 
 
 
Oct 12, 2011 at 6:32 PM Post #18 of 42
Quote:
Not sure I understand your post.  I'm specifically talking about non-binaural recordings here.  
 


Just basically saying that certain headphones try to eliminate the ''in your head'' type of sound and flawed imaging by using special equalizations, derived in specific sound chambers using dummy heads with omnidirectional mics in the ears. These techniques were usually called ''free-field natural hearing'' or ''binaural hearing'' or so forth.

It has nothing to do with the type of recording. These headphones were basically designed as special monitoring tools, meant to sound close to speaker-like imaging.

I also mentioned crosstalk because speaker power amps drive each channel over its own pair of wires, so speakers don't suffer from the electrical crosstalk headphones do: most headphone cables use a shared ''ground/negative'' wire, while speakers always use separate positive and negative conductors. A lot of these specially equalized headphones are made with balanced connectors (balanced 1/4'' TRS, 3- or 4-pin XLR, DIN, etc.) to prevent crosstalk, because each channel gets a dedicated ground/negative conductor, which also explains the better stereo imaging.
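For a feel of the size of that shared-ground effect, here's a tiny sketch (the impedance values below are assumed round numbers, not measurements of any real headphone or cable):

```python
# Common-impedance crosstalk estimate for a shared return wire: one
# channel's return current develops a voltage across the shared wire,
# which the other channel's driver then sees. All values are assumed.
import math

r_driver = 32.0   # nominal headphone driver impedance, ohms (assumed)
r_return = 0.5    # resistance of the shared ground/negative wire, ohms (assumed)

crosstalk = r_return / (r_driver + r_return)              # voltage-divider ratio
print(f"crosstalk: {20 * math.log10(crosstalk):.1f} dB")  # about -36 dB
# With a dedicated return per channel (4-wire or balanced wiring) this
# common-impedance path disappears entirely.
```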
 
Oct 12, 2011 at 6:54 PM Post #19 of 42
 
 
Quote:
I also mentioned crosstalk because speaker power amps drive each channel over its own pair of wires, so speakers don't suffer from the electrical crosstalk headphones do: most headphone cables use a shared ''ground/negative'' wire, while speakers always use separate positive and negative conductors. A lot of these specially equalized headphones are made with balanced connectors (balanced 1/4'' TRS, 3- or 4-pin XLR, DIN, etc.) to prevent crosstalk, because each channel gets a dedicated ground/negative conductor, which also explains the better stereo imaging.


The supposed benefits of balanced drive for headphones don't lie here, I'm afraid. As for speakers being immune to cross-talk: think about how each speaker is heard by both ears vs. headphones, and tell me again that they don't suffer from crosstalk :wink:.
 
 
Oct 12, 2011 at 7:02 PM Post #20 of 42


Quote:
Quote:
Understood.  As for the question I asked of Wapiti, please take a crack at it.
 
And I'm glad everyone liked my drawing!  
  



I don't know, his second reply is just as brilliant as the first one. What more is there to add?
 
BTW, you can perceive depth from a single speaker placed in front of you (through the amount of reverberation in the recording), which does not quite tie up with the chart you drew.
 
Another example: using cross-talk cancellation techniques, you can reproduce some form of 3D sound field with just 2 speakers placed right next to each other in front of you (look up the ISVR work on auralization). I think I could come up with many more examples, but you see the point...
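For the curious, here is a minimal sketch of the cross-talk cancellation idea (my own toy model, not the ISVR filters: the contralateral path is approximated as an attenuated, delayed copy of the ipsilateral one, and the 2x2 path matrix is inverted per frequency bin):

```python
# Minimal crosstalk-cancellation sketch: model each contralateral ear path
# as an attenuated, delayed copy of the ipsilateral path, then invert the
# 2x2 path matrix per frequency bin so each ear receives only its channel.
import numpy as np

fs = 48000            # sample rate, Hz
g, tau = 0.7, 90e-6   # assumed contralateral gain and extra delay (seconds)

def crosstalk_cancel(left, right):
    n = len(left)
    f = np.fft.rfftfreq(n, 1 / fs)           # frequency of each FFT bin
    a = g * np.exp(-2j * np.pi * f * tau)    # contralateral transfer per bin
    L, R = np.fft.rfft(left), np.fft.rfft(right)
    det = 1 - a * a                          # determinant of [[1, a], [a, 1]]
    Lc = (L - a * R) / det                   # inverse of the symmetric 2x2
    Rc = (R - a * L) / det                   # path matrix, applied per bin
    return np.fft.irfft(Lc, n), np.fft.irfft(Rc, n)

# Usage: feed the processed signals to two closely spaced speakers; the
# acoustic paths then re-apply the path matrix, undoing the network.
left, right = np.random.randn(fs), np.random.randn(fs)   # placeholder audio
out_l, out_r = crosstalk_cancel(left, right)
```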
 
Oct 12, 2011 at 8:36 PM Post #21 of 42
 
 
The supposed benefits of balanced drive for headphones don't lie here, I'm afraid. As for speakers being immune to cross-talk: think about how each speaker is heard by both ears vs. headphones, and tell me again that they don't suffer from crosstalk :wink:.
 


I would be lying if I said they didn't. Even though each speaker output has its own return to help prevent it, that doesn't mean all crosstalk is eliminated. It's usually faults in the speaker's internal crossover design that still produce crosstalk when fed certain voltages, as the frequency response spikes and dips.
 
Oct 12, 2011 at 11:04 PM Post #22 of 42


Quote:
BTW, you can perceive depth from a single speaker placed in front of you (through the amount of reverberation in the recording), which does not quite tie up with the chart you draw.
 


Excellent point.  Relative volume, timbre, and the comparative presence of detail can all contribute to creating depth in mono.
 
A good mono recording is exceedingly satisfying.  Most of the Beatles' recordings were originally mixed to mono.  The sound is incredible.
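Those cues are simple enough to fake, which is a nice sanity check. Here's a toy sketch (my own illustration, with invented constants) of making a mono source sound "far": lower level, duller treble, and a lower direct-to-reverb ratio.

```python
# Toy mono "distance" renderer: farther sources get quieter (inverse
# distance), duller (one-pole low-pass), and carry proportionally more
# room reverb relative to the direct sound. All constants are invented.
import numpy as np
from scipy.signal import lfilter

fs = 48000

def place_at_distance(dry, reverb, r_meters):
    gain = 1.0 / max(r_meters, 1.0)          # inverse-distance level drop
    cutoff = 16000.0 / max(r_meters, 1.0)    # crude treble loss with distance
    a = np.exp(-2 * np.pi * cutoff / fs)     # one-pole low-pass coefficient
    direct = gain * lfilter([1 - a], [1, -a], dry)
    return direct + 0.1 * reverb             # room level stays roughly constant

dry = np.random.randn(fs)                    # 1 s of placeholder "instrument"
ir = np.random.randn(fs // 2) * np.exp(-np.linspace(0, 8, fs // 2))
reverb = np.convolve(dry, ir)[:len(dry)] * 0.02   # fake room tail
near = place_at_distance(dry, reverb, 2.0)
far = place_at_distance(dry, reverb, 20.0)   # quieter, duller, more reverberant
```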
 
Oct 13, 2011 at 12:32 AM Post #23 of 42
Quote:
All soundstaging, imaging, etc. is delicious artifice.  Our ears and brains interpret the sounds coming out of speakers as a representation of actual performers, based upon the cues we have learned from everyday experience.
 
I record orchestras and chamber ensembles.  Let's consider a simple microphone setup recording an orchestra: two mics, 15' in the air, a foot apart, cardioid capsules, the left mic pointing far left and the right far right.
 
Stereo positioning/imaging is a result of differences in timing and sound pressure.  A violin to the immediate left will be louder in the left mic, and its sound will arrive there sooner than at the right.  The sound of a timpani on the far left will arrive later still at the right mic.  The sound of the flutes in the middle arrives at both mics at the same time and at the same intensity.  When speakers reproduce these recorded sounds, the positional cues allow our brains to place the images in the soundstage from left to right.
 
Depth is slightly more complicated.  Sounds that are further away are quieter.  They also contain less treble energy (consider how a band at a distance sounds very bass heavy, thumping).  Further-away sounds are also accompanied by more sound of the room (close sounds contain little echo/reverberation; far sounds, much more room reverberation).  The sound of a violin in the front row is brighter, louder and has less room ambiance than a violin in back.  We also rely on our experience.  We know a trumpet will drown out a flute.  Thus our brains will interpret the trumpet as further back if we can still hear the flute.
 
Headphones have difficulty reproducing imaging and depth.  For example, our brains rely on each ear hearing both speakers.  As another example, the best imaging typically occurs when the speakers are the same distance apart and the listener is this same distance from each speaker.  This timing relationship is not maintained with headphones. 
 
The relative positioning of "width" and "depth" does not change with headphones.  They are just harder to discern, particularly with multi-tracked studio recordings, as there is no real-life counterpart - we do not know what is real.
 
I hope this makes sense.

 
Quote:
Headphones are indeed akin to sitting between two speakers.  :)
 
The primary difference between this and normal speaker positioning is the timing.  Consider two speakers 10' apart.  The imaging is great when you sit 10' away, centered between the speakers. 
 
As you walk forward, the distance between the speakers remains constant.  However, the distance between you and the speakers decreases.  The soundstage becomes more diffuse and may collapse, but things on the left still sound on the left.  The width is in the same place.  The same goes for depth, although this effect falls apart more quickly.
 
We are astoundingly sensitive to timing.  Sound travels roughly a foot per millisecond (1/1000 of a second).  Yet, with some experience, one can detect that speakers 10' apart have been moved a fraction of an inch. 
 
Keep in mind our ears do not care where the sounds actually come from.  All sound we experience enters our heads from the sides.  :)  It is through experience with spatial cues that we learn to localize sound. 
 
Binaural recordings work because the mics record sound just as our ears receive it, roughly 6" apart.  Played back on headphones this can be strikingly real.  Played back on speakers they are a mess. 


Wapiti, both your posts were astoundingly informative.  Thanks.  It took me a few passes to grasp everything, and to be honest I'm still not sure I've completely grasped it all.  I think Arnaud is right, I need to find a good book on the topic.  
 
After reading your second post I had to go back to your first one.  Now I'm able to understand how the width stays intact going from speakers to headphones and why "headphones [would] have difficulty reproducing imaging and depth."  
 
Oct 13, 2011 at 10:57 AM Post #24 of 42


Quote:
Quote:
 
Wapiti, both your posts were astoundingly informative.  Thanks.  It took me a few passes to grasp everything, and to be honest I'm still not sure I've completely grasped it all.  I think Arnaud is right, I need to find a good book on the topic.


Excellent!
 
It is fascinating stuff, and a good share of it is counterintuitive for most people.  Plus, even though the science is well established, a lot of it still appears to be magic.
 
 
Oct 13, 2011 at 9:13 PM Post #25 of 42
This very topic is what brought me to this place and higher-end headphones in general, actually...but I'm coming from a different perspective.
 
Most, if not all, of you are aiming to reproduce the experience of sitting in front of a live band or an orchestra from pre-recorded sources, most of them made with stereo loudspeakers in mind.
 
I'm aiming for more accurate and immersive imaging in games, with sound environments generated and changing on the fly, ideally not pre-mixed for any speaker or headphone configuration. "Where did that gunshot come from? Where did it land? Did those footsteps pass by on the floor above or below? How far away are they?"
 
For a long time, I thought I'd need a 5.1 or 7.1 speaker system for that...then I tried CMSS-3D Headphone and decided I didn't want loudspeakers and their hassles (particularly room-related ones) anymore, because I was getting exactly what I wanted using nothing more than a competent pair of stereo headphones. Sure, the sound's a bit muffled as a side effect (possibly a necessity for the aural location cues to work), but the improvement in imaging and immersion more than made up for it. (My only issue now is that newer games tend not to have a proper 3D sound environment for binaural techniques to work their best; instead they pre-mix to 7.1 at most and stereo at worst, which at best allows a 2D soundscape devoid of height, and at worst a one-dimensional soundscape spanning only left and right, with no depth.)
 
Now I'm curious as to how to improve these binaural implementations for even better positioning and sound quality, possibly to the point where the sound quality is unchanged other than that the user can suddenly hear exactly where everything is coming from. I've heard great things about the Smyth SVS Realiser here, but that requires a recording of the user sitting in an existing home theater system. Even then, the imaging would be inherently limited to 7.1 until speaker formats with even more loudspeakers strewn across the room become commonplace. Fine for movies, not so fine for games that should not be constrained to any fixed configuration of speakers. What would it take to get that sort of personalized HRTF with no physical, real-world reference to use? (If anything, the use of generic HRTFs might be the bottleneck more than anything else...)
 
Also, one more thing I should point out: I still don't have much of a sense of distance with headphones, certainly not as I would with speakers, but I do seem to have a greater sense of exact direction (to the point of knowing that a sound source is roughly 50 degrees to the right and 30 degrees upward, rather than just a simple "off to the front right") due to binaural techniques. I don't know if that's a side effect of knowing that the things generating sound are right next to my ears, or if it's something that just won't be resolved with a traditional headphone without a binaural technique that uses the listener's exact HRTF.
 
(By the way, that SR-007 Omega II article linked earlier was a great read.)
 
Oct 14, 2011 at 1:47 PM Post #26 of 42
I am surprised that games do not provide a good headphone mix offering what you describe.  The sounds in games are synthetic, so such processing is easily implemented.  Processing can make sounds swirl around your head, go up and down, etc.  Perhaps gaming with headphones is not as popular as I would guess it is.
 
Oct 15, 2011 at 3:29 AM Post #27 of 42
 
Quote:
I am surprised that games do not provide a good headphone mix offering what you describe.  The sounds in games are synthetic, so such processing is easily implemented.  Processing can make sounds swirl around your head, go up and down, etc.  Perhaps gaming with headphones is not as popular as I would guess it is.


As surprising as it seems, the impression I get is that sound is more of an afterthought in games these days; worse, game developers operate under the assumption that everyone plays games with a surround speaker setup.
 
However, in the older days of PC gaming, DirectSound3D and OpenAL presented sound by simply mapping out the location of each sound in a virtual 3D space. It was up to the sound device's driver to decide where and how to play back those sounds. This is why CMSS-3D Headphone is quite effective in such games: it directly processes each sound source in the virtual 3D space, not mixing them down to virtual speaker positions and then processing those! As flawed as its rendition of binaural audio may be, no other binaural filter I know of does this: not Dolby Headphone, and certainly not the Smyth Realiser.
 
The developers of XAudio2, FMOD, and so forth apparently think they're doing us a favor by pre-mixing everything to 7.1, stereo, or anything in between before it hits the sound device driver...and there's no binaural option in sight, because headphones are probably seen as totally incapable of surround sound with just two drivers (or they can't be bothered to develop a binaural filter of their own). For whatever reason, most PC games released nowadays use one of those software-driven APIs, presumably because they also work on consoles, and consoles tend to be the lead development platforms nowadays. On top of that, consoles are generally expected to be used in a living room with a big home theater speaker system...those who use Astro Mixamps and similar DAC/DSPs with headphones, as recommended here, are still a minority.
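To make the per-source vs. pre-mixed distinction concrete, here's a toy per-source binaural renderer (illustrative only; real systems use measured HRTFs rather than this crude sine-law delay-and-shadow model, and the constants here are my own assumptions): each source keeps its position all the way to the final render.

```python
# Toy per-source binaural panner: each source is rendered directly to the
# two ears with an interaural delay and a crude head-shadow attenuation,
# instead of being pre-mixed to virtual speaker positions first.
import numpy as np

fs = 48000
c = 343.0           # speed of sound, m/s
ear_spacing = 0.18  # assumed distance between the ears, m

def render_source(mono, azimuth_deg):
    """Binaural render of one mono source at a given azimuth
    (0 = front, positive = right). Sine-law ITD model, azimuth only."""
    itd = ear_spacing / c * np.sin(np.radians(azimuth_deg))
    shift = int(round(abs(itd) * fs))                  # interaural delay, samples
    delayed = np.concatenate([np.zeros(shift), mono])[: len(mono)]
    far = 0.7 * delayed                                # crude head shadow
    left, right = (far, mono) if itd > 0 else (mono, far)
    return np.stack([left, right])

# Sum the binaural renders of many sources; positions survive to the end.
gunshot, footsteps = np.random.randn(fs), np.random.randn(fs)  # placeholders
mix = render_source(gunshot, -50) + render_source(footsteps, 120)
```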
 
I still wish more games had true binaural implementations (which would be much, much easier if OpenAL were more prevalent among newer PC game releases), but the problem of everyone having a unique HRTF is still a barrier to mass adoption. There needs to be some level of tweakability that even a layman could understand.
 
Oct 15, 2011 at 4:41 AM Post #28 of 42
I have a feeling of deja-vu; didn't we discuss this in the Realiser thread? It looks to me like you're asking Sony to come up with an auralization headset as an extension to the next-generation PlayStation! Something that would use dummy-head HRTFs and process them in real time to generate a binaural headphone signal. The headset would be compensated (as you don't want to go through the pinna twice...). The difference with the Realiser is that the bank of HRTFs wouldn't be personalized (you'd otherwise need access to an acoustics laboratory with a full anechoic chamber to get them). I guess it misses the point (such artificial virtualizers have been sold before, including the plug-ins you mention, I guess), but I am not sure we can practically expect there will ever be a market for a product that requires you to take a flight to one of the few test cells in the world equipped with such a measurement system...
 
Practically speaking, the only route is probably to offer multiple banks of HRTFs; maybe some research could be done to identify population trends. Already, in the other thread there were mentions of neat decomposition techniques to interpolate HRTFs between measured azimuths / elevations (because you can't possibly measure them all). Similarly, given some gross dimensions of the ear lobe / head, you could come up with some improved (yet generic) HRTFs. I could see this thing going somewhere...
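As a crude illustration of the interpolation idea (placeholder data and a naive linear blend of time-domain responses; the real decomposition methods mentioned above align delays first and do far better):

```python
# Filling in unmeasured directions by blending the two nearest measured
# HRIRs. The HRIRs here are random placeholders. Note that naive
# time-domain blending smears the interaural delay, which is exactly why
# fancier decompositions exist.
import numpy as np

measured_az = np.array([0, 30, 60, 90])                    # degrees with data
hrirs = {az: np.random.randn(256) for az in measured_az}   # placeholder HRIRs

def hrir_at(azimuth):
    lo = measured_az[measured_az <= azimuth].max()         # nearest below
    hi = measured_az[measured_az >= azimuth].min()         # nearest above
    if lo == hi:
        return hrirs[lo]
    w = (azimuth - lo) / (hi - lo)                         # blend weight
    return (1 - w) * hrirs[lo] + w * hrirs[hi]

h45 = hrir_at(45)   # estimated response halfway between 30 and 60 degrees
```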
 
Oct 15, 2011 at 4:45 AM Post #29 of 42
Or, yet another option would be to remove room and speaker effects from the PRIR (by doing 2 measurements, one with you and one without, you can at least equalize out the speaker dynamics). If the speaker is placed much closer to the ear and the room is large enough / sufficiently acoustically treated (you need to capture only the direct field of the speaker plus the diffracted field around the head, well before the first room reflections), this may also be possible!
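A sketch of that two-measurement idea, assuming both measurements have already been reduced to impulse responses (the regularization constant is an arbitrary choice of mine):

```python
# Divide the response measured without the listener out of the response
# measured with the listener in place, leaving (approximately) the
# head-related part. Regularized division avoids blow-ups at spectral nulls.
import numpy as np

def remove_speaker_room(with_listener, without_listener, eps=1e-3):
    n = max(len(with_listener), len(without_listener))
    A = np.fft.rfft(with_listener, n)      # speaker x room x head
    B = np.fft.rfft(without_listener, n)   # speaker x room only
    H = A * np.conj(B) / (np.abs(B) ** 2 + eps)   # ~ A / B, regularized
    return np.fft.irfft(H, n)              # approximate head-related IR
```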
 
Oct 17, 2011 at 12:19 PM Post #30 of 42


Quote:
The primary difference between this and normal speaker positioning is the timing.  Consider two speakers 10' apart.  The imaging is great when you sit 10' away, centered between the speakers.   
As you walk forward, the distance between the speakers remains constant.  However, the distance between you and the speakers decreases.  The soundstage becomes more diffuse and may collapse, but things on the left still sound on the left.  The width is in the same place.  The same goes for depth, although this effect falls apart more quickly.
 
We are astoundingly sensitive to timing.  Sound travels roughly a foot per millisecond (1/1000 of a second).  Yet, with some experience, one can detect that speakers 10' apart have been moved a fraction of an inch. 
 

Hi,
Might I add a little here?  As has already been explained, the 3D image you perceive is a learnt response.  There are a few inferences that can be drawn from this situation. 
 
1.  In order to get a good perception of sound stage, the brain must have a 'reference' against which it can compare what is being heard in order to interpret the apparent location of the different sound sources.  For instance, if you were listening to a recording of a grand piano miked by a single stereo mic 5ft 7in off the floor, about 8ft away from the piano, how could the brain interpret what it was hearing without having some idea of what a grand piano should sound like at this distance?  Sadly, many recordings of pianos are miked at this distance.  When you hear all the mechanics of the piano (damper buzz, pedal thump, bushing clicks, body noises from the pianist), it is a dead giveaway.  At a reasonable distance a solo grand piano sounds very different and has little to no sound stage of its own; the sound stage consists almost entirely of the reverberant recording space.
 
2.  In order to get a good perception of the sound stage, the brain must receive sufficient information to guess the location of the various sound sources.  To illustrate this I will use an example from my youth.  For a short while I worked in a large open-plan office space where there were literally hundreds of phones.  When I started there, all the phones were dial phones with mechanical bells (yes, I am that old), and it was quite easy to determine the location of a ringing phone, even a distant one; indeed, you didn't even need to look up or move.  However, the office was converted to push-button phones with electronic ringers.  After this it was very difficult to determine the location of a ringing phone, often requiring you to turn your head this way or that, sometimes even getting up to get a different perspective.  The problem was that although the mechanical bell had a lot of quite complex information in its sound signature, information that changed with the relative position of the phone, the electronic ringer had very little information and thus gave virtually no 'tonal' clue to its location, requiring the much more primitive method of locating it by moving the head.
 
These two points will do for the moment, even though there are a number of other things that can be considered.
 
Does harmonic distortion compromise the sound stage by altering the tonal sound of musical instruments (despite the fact that mechanical musical instruments have partials that are often not truly harmonic to the fundamental)?
 
Does IMD further compromise the sound stage by adding 'hash' to the sound?
 
Does a small difference in FR cause issues with the imaging?
 
How can you have a good perception of sound stage without a reference for what the sound should be?
 
How long does it take to form a good reference, if one is required?  Does this have implications for some other things, such as burn-in?
 
How does all of this tie in with the claims made for improvement in sound stage due to seemingly unrelated changes to the system such as cables?
 
 
A couple of other interesting things. 
 
Large planar-diaphragm headphones such as electrostatics exhibit high-order modes, where they radiate sound from different parts of the diaphragm at different frequencies.  For instance, at certain frequencies much of the sound will be radiated from the outer ring of the diaphragm, whereas at others it will be radiated from the centre as expected (rectangular diaphragms are even worse, with dominant modes in the corners).  Since our ears, and especially the pinnae, are well within the near field of the diaphragm, do these modes alter the perception of sound stage through changes in the radiating position of the sound?
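For a feel of why this happens, the mode frequencies of an idealized tensioned rectangular membrane follow f_mn = (c_m/2) * sqrt((m/Lx)^2 + (n/Ly)^2); here is a sketch with invented tension and size numbers (not any real headphone's specs):

```python
# Mode frequencies of an ideal rectangular membrane: higher (m, n) modes
# concentrate motion in smaller patches and toward the corners, so the
# effective radiating position shifts with frequency. Numbers are invented.
import math

T, sigma = 1000.0, 0.01      # tension (N/m) and surface density (kg/m^2), assumed
Lx, Ly = 0.09, 0.07          # diaphragm dimensions (m), assumed
c_m = math.sqrt(T / sigma)   # transverse wave speed on the membrane

def f_mode(m, n):
    return (c_m / 2) * math.sqrt((m / Lx) ** 2 + (n / Ly) ** 2)

for m, n in [(1, 1), (2, 1), (1, 2), (3, 3)]:
    print(f"mode ({m},{n}): {f_mode(m, n):.0f} Hz")
```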
 
I was interested in the part of Wapiti's post quoted above.  I am not so sure about how sensitive our ears are to discrete changes in timing.  Yes, I could hear changes in the sound when I moved my speakers, although certainly not down to fractions of an inch.  However, I have mostly considered this a response to changes in the reverberant field (most of my speakers were dipole or bipolar radiators), as I could also perceive change when the angles were changed without any change in the path distances.
 
Is it possible to get a convincing sound stage from an electronically produced sound source such as a game?  Is there sufficient complex information available to be convinced by what you perceive?
 
'Tis indeed an interesting subject,
 
Regards,
Bob
 
