Headphone vs. Speaker sound-field discussion.
Aug 9, 2011 at 10:46 AM Thread Starter Post #1 of 22

googleli

Headphoneus Supremus
Frankly, I don't think any headphone stands a chance against speakers at producing a sound field that imitates live music. Many hi-fi enthusiasts go to a lot of orchestral performances and compare them with their speaker systems to see what they can achieve. Although I think no gear on earth can ever replace a true live performance, some speaker setups can get really close. With headphones it is just not possible - maybe not until the Smyth Realiser becomes a popular built-in feature for most headphones, and until all recordings come out in both a "normal" version and a binaural version.
 
Quote:
The problem I see with this is that there were people saying that, in the studio, music they recorded sounded exactly the same when they played it back through their HD-800s. Someone else said that the LCD-2s sounded like live music they'd heard or recorded (I forget which). Erm... it's not going to work, is it?
 
Aug 9, 2011 at 3:45 PM Post #2 of 22


 
Quote:
Frankly, I don't think any headphone stands a chance against speakers at producing a sound field that imitates live music. Many hi-fi enthusiasts go to a lot of orchestral performances and compare them with their speaker systems to see what they can achieve. Although I think no gear on earth can ever replace a true live performance, some speaker setups can get really close. With headphones it is just not possible - maybe not until the Smyth Realiser becomes a popular built-in feature for most headphones, and until all recordings come out in both a "normal" version and a binaural version.
 


 



I am always amazed at how few people, even on a headphone forum, are aware that loudspeaker listening creates phantom-channel artifacts because each ear hears both channels. Speaker image localization and soundfield are very poor for this reason compared to headphones. I agree that binaural recording played back over headphones is best, but I don't expect it ever to catch on.
 
 
Aug 12, 2011 at 2:41 PM Post #3 of 22


Quote:
 


I am always amazed at how few people, even on a headphone forum, are aware that loudspeaker listening creates phantom-channel artifacts because each ear hears both channels. Speaker image localization and soundfield are very poor for this reason compared to headphones. I agree that binaural recording played back over headphones is best, but I don't expect it ever to catch on.
 


And yet, when I play my Maggie setup, which cost half of my O2/BH, for people who come over, I have never heard anyone say the headphones sounded better (and that includes "regular" folk as well as audiophiles).
 
Call it what you will, good speakers will create a more realistic image in front of you, one that can't be achieved with headphones unless you have the Smyth Realiser.
 
 
Aug 12, 2011 at 2:45 PM Post #4 of 22


Quote:
Call it what you will, good speakers will create a more realistic image in front of you, one that can't be achieved with headphones unless you have the Smyth Realiser.
 


Speakers sound different from headphones. Given my requirements, I'll take the O2 over a Maggie, especially one that (awkwardly) blends quasi-ribbon and planar-magnetic drivers. The top-to-bottom cohesion is nowhere near as good as the Stax's.
 
Aug 12, 2011 at 5:11 PM Post #5 of 22
You still don't get it. There's nothing especially realistic about a speaker's sonic image other than that it is in front of you. Localization is poor. The Smyth gadget does not get around the limitations posed by the fact that each speaker is heard by both ears.
 
Aug 12, 2011 at 5:38 PM Post #6 of 22
Oh, don't worry, I get it :wink:
 
I've actually listened to my speaker setup and know what it sounds like; the room is very well treated (back, front, and sides), with no rack between the speakers and no TV. The imaging is more realistic (meaning localization is good). At a live orchestra or jazz concert, the sound doesn't present as small images placed at either ear. The image is in front of you: more distant sounds are muted and fuzzier, while closer sounds are clearer and louder. It is the same with image placement.
 
Aug 12, 2011 at 9:19 PM Post #7 of 22


Quote:
You still don't get it. There's nothing especially realistic about a speaker's sonic image other than that it is in front of you. Localization is poor. The Smyth gadget does not get around the limitations posed by the fact that each speaker is heard by both ears.



 
@edstrelow
 
Have a look at this microphone: http://www.soundfield.com/products/sps200.php. Then imagine a Head-Related Transfer Function (HRTF) algorithm that places its four channels at the right elevation and azimuth. Do you still think the 3D sound field is a phantom image?
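For anyone curious how that could work, here is a minimal sketch (Python; the decode gains follow one common textbook convention and the HRIR arrays are hypothetical data, not SoundField's or Smyth's actual processing) of rendering a first-order B-format signal, like the SPS200 delivers after conversion, to headphones: project W/X/Y/Z onto a few virtual speaker directions, then convolve each feed with that direction's HRIR pair.

```python
import numpy as np
from scipy.signal import fftconvolve

def bformat_to_binaural(W, X, Y, Z, virtual_speakers):
    """Render first-order B-format to two ears via virtual loudspeakers.

    virtual_speakers: list of (azimuth, elevation, hrir_left, hrir_right),
    angles in radians, HRIRs as 1-D numpy arrays of equal length.
    """
    left, right = 0.0, 0.0
    for az, el, hl, hr in virtual_speakers:
        # Project the sound field onto this speaker's direction
        # (a basic decode; gain conventions vary between decoders).
        feed = (W + X * np.cos(az) * np.cos(el)
                  + Y * np.sin(az) * np.cos(el)
                  + Z * np.sin(el))
        left = left + fftconvolve(feed, hl)    # this direction's left-ear IR
        right = right + fftconvolve(feed, hr)  # ...and its right-ear IR
    return left, right
```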
 
NHK is going further with 22.2 channels: http://www.nhk.or.jp/strl/english/aboutstrl/annual2009/en/r1-3-1.html. You can stack two Realisers for 16 channels. Both produce a 3D sound field... both for HRTF headphone playback.
 
Of course, there is no audio content available... Yet.
 
Aug 13, 2011 at 10:36 AM Post #8 of 22


Quote:
@edstrelow
 
Have a look at this microphone: http://www.soundfield.com/products/sps200.php. Then imagine a Head-Related Transfer Function (HRTF) algorithm that places its four channels at the right elevation and azimuth. Do you still think the 3D sound field is a phantom image?
 
NHK is going further with 22.2 channels: http://www.nhk.or.jp/strl/english/aboutstrl/annual2009/en/r1-3-1.html. You can stack two Realisers for 16 channels. Both produce a 3D sound field... both for HRTF headphone playback.
 
Of course, there is no audio content available... Yet.


Sorry, I can't follow your reasoning. If I understand correctly, you're showing 2 examples of recording techniques that pick up audio from different directions (more than 2), which you are then supposed to play back with a multi-speaker system (more than 2 as well). Do you mean that regular stereo recordings are based on the same principle, except with 2 speakers in front of you at, say, 30-45 degrees, and that there's nothing really wrong with that?
 
Where I am confused is that, regardless of the number of independent headings you pick up in the recording, you still face a challenge reproducing them in a regular listening environment. For example, I don't see how the 22.2 thing makes any sense when you consider that you hear a blend of the direct and reverberant fields. So, while you can use mic arrays and clever processing to obtain very directional signals, you can't possibly beam in the same way on playback (because nobody listens in an anechoic chamber).
 
But note that this has nothing to do with speaker-induced cross-talk. Actually, edstrelow, I am not sure I follow your point either. Please correct me where I am wrong, but isn't the cross-talk between each speaker and both ears a problem only when trying to replay binaural recordings? For instance, any instrument playing in front of you is heard by both ears, and there's nothing fundamentally wrong with using a pair (or more) of loudspeakers to realistically reproduce this. As I mentioned above, I believe what makes this elusive is that the room kills it all with lots of early/late reflections that pollute the imaging (and the tonality too, but that's another topic).
 
Quote:
Dear Arnaud, have you ever compared the prototype C32 with the SR009 which is commercially available? Do you think there is any difference?
I have been using the SR009 for quite a while, and it seems to be better than the O2 mkI in some aspects: soundstage, imaging, detail, transparency, but it lacks the darker tone and lush midrange of the O2 mkI. Compared with the SR Omega, the SR009 has a slightly smaller soundstage but more precise imaging and equal detail. The pace of the SR009 is more forward, or a little faster, than the SR Omega, O2 mkI, or O2 mkII. The tone of the SR009 is on the bright side like the O2 mkII, but better in other aspects. I do not use the SR009 much now, nor do some of my friends who have all of the headphones mentioned; we only use the SR Omega or HE90.
So I really wonder: is there any difference between the prototype and the commercial one? Ours do not have the channel imbalance or noise, and I listen to almost every kind of music that is listenable. Why is production for units shipped outside Japan so slow?


Dear Kiertijai,
No, I can really only compare the notes I took on the 2 occasions I listened to the C32 prototype (October 2010 and January 2011) with the more recent ones since owning the 009. But, as I recall, my impressions haven't changed at all since the beginning. When I listened to the C32 in January, I did feel that some not-so-good recordings did not come through well, and this was confirmed in my own system going through my library. As I recall, Stax said in a video interview that only small cosmetic changes were made between the C32 and the 009 (like the hinge connecting the earcup swivel to the headband). I don't remember the details, but you could look it up on the other site.
As for your preferring the original Omega to the 009, I guess this is only natural: you've found the one model that sounds best to you, and you have sampled a fairly large number of high-end headphones. I haven't had the chance to experience the original Omega; ignorance is bliss, as they say :wink:.
 
As for our pairs being OK, let's just say we've been knocking on wood given the large number of faulty pairs. As for the international orders still not being fulfilled, I have absolutely zero information. I get updates from the local store here, but they are not concerned about export units, since official sales channels have territorial restrictions.
 
 
Aug 13, 2011 at 8:39 PM Post #9 of 22
Not sure what the crosstalk thingy is, but the imaging produced by speakers is still way better than with any headphones or IEMs I have, and my speakers are only a fraction of the price of the 009. However, the source plays a big part here. There was a substantial improvement in 3D imaging when I upgraded from the D07 to the K01, most noticeably in the Y axis. I have a full-blown 7.1 home-theatre system but am still amazed at the 3D sound field just the two front speakers can produce via the Esoteric and Leben when I listen to stereo music. Almost every friend who listened asked me whether the voice came out of the center speaker when we tried vocals, and whether the surround and surround-back speakers were on when we listened to instrumental jazz, while only the left and right front speakers were playing.
 
Aug 14, 2011 at 8:03 AM Post #10 of 22
 
 
Quote:
Originally Posted by arnaud
 
Sorry, I can't follow your reasoning. If I understand correctly, you're showing 2 examples of recording techniques that pick up audio from different directions (more than 2), which you are then supposed to play back with a multi-speaker system (more than 2 as well). Do you mean that regular stereo recordings are based on the same principle, except with 2 speakers in front of you at, say, 30-45 degrees, and that there's nothing really wrong with that?
 
Where I am confused is that, regardless of the number of independent headings you pick up in the recording, you still face a challenge reproducing them in a regular listening environment. For example, I don't see how the 22.2 thing makes any sense when you consider that you hear a blend of the direct and reverberant fields. So, while you can use mic arrays and clever processing to obtain very directional signals, you can't possibly beam in the same way on playback (because nobody listens in an anechoic chamber).
 
But note that this has nothing to do with speaker-induced cross-talk. Actually, edstrelow, I am not sure I follow your point either. Please correct me where I am wrong, but isn't the cross-talk between each speaker and both ears a problem only when trying to replay binaural recordings? For instance, any instrument playing in front of you is heard by both ears, and there's nothing fundamentally wrong with using a pair (or more) of loudspeakers to realistically reproduce this. As I mentioned above, I believe what makes this elusive is that the room kills it all with lots of early/late reflections that pollute the imaging (and the tonality too, but that's another topic).
 

 
Indeed, my phrasing was not clear.

 
I was referring to both recording techniques being reproduced with headphones (so the listening room is out of the equation) and playback DSP (a head-related transfer function that transforms multichannel content into an idiosyncratic two-channel output; okay, the measured playback-room reference has its influence here, but the DSP is able to deal with that).
 
How are two ears able to sense sound sources in a 3D field? Suppose a single sound source (like a bird) sits on an imaginary sphere around the listener. Roughly: a) interaural delays explain horizontal displacement (azimuth cues); b) tonal modulation from the head, torso, and outer ear explains vertical displacement (elevation cues*); and c) reverberation explains source distance (whether that source sits on a nearer or farther imaginary sphere; in other words, a different radius).
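As a back-of-the-envelope illustration of cue (a), the classic Woodworth spherical-head approximation ties azimuth to the interaural delay (a textbook formula with an average head radius, not any product's algorithm):

```python
import numpy as np

def itd_woodworth(azimuth_rad, head_radius=0.0875, c=343.0):
    """Approximate interaural time difference (s) for a distant source.
    Azimuth 0 = straight ahead, +pi/2 = hard right (Woodworth model)."""
    return head_radius / c * (azimuth_rad + np.sin(azimuth_rad))

# A source 30 degrees to the right reaches the right ear ~0.26 ms earlier:
print(itd_woodworth(np.radians(30)))  # ~2.6e-4
```

Note how the cue collapses to zero at 0 degrees azimuth, which is exactly the front-localization weakness the footnote below describes.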
 
Is it possible to fix all those cues in mass-distributed audio content? I believe it is not. We have several problems: at least two of those cues are very idiosyncratic, and one is very room dependent. So an XY microphone pattern or the Neumann KU-100 is not an ideal solution.
 
With 2-, 5.1-, or 7.1-channel content you are able to reconstruct horizontal displacement (azimuth cues) through crosstalk in your listening room. As you pointed out, source distance is more problematic, given that your listening room imprints its own reverberation. You may add ambience to the recording with a Neumann KU-100, but this will not translate into precise elevation cues, which are, I believe, very idiosyncratic.
 
So the Realiser comes into the playback chain of regular 2-, 5.1-, or 7.1-channel content. You are able to capture your idiosyncratic azimuth and elevation cues and your ideal-reverberation listening room. Then a function transforms your multichannel audio stream into a two-channel headphone output. What do you have here? You will hear a very convincing out-of-the-head circle on the horizontal plane. Do you have a sphere? Do you have a 3D sound field? Nope.
 
Then you take that function and add some variables that allow you to place your virtual speakers (a fixed base that comes from your recorded content and feeds your HRTF computation). Believe it or not, the Realiser does that: it allows the user to change the azimuth and elevation of each virtual speaker (see Realiser A8 manual, page 27). Reverberation of the playback room and speaker proximity(!) can also be altered (see Realiser A8 manual, pages 55, 56). If the recorded content has two layers (the NHK example) or a 3D omnidirectional pattern (the SPS200), voilà, now you are able to place your virtual speakers in the right virtual spots and reproduce a 3D sound field.
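Conceptually (the textbook idea, not Smyth's actual implementation; the HRIR data is assumed to come from a personal measurement such as a PRIR), such a function just convolves each speaker feed with the HRIR pair measured for that speaker's direction and sums the results; moving a virtual speaker amounts to swapping in a different HRIR pair:

```python
import numpy as np
from scipy.signal import fftconvolve

def virtualize(channels, hrir_pairs):
    """channels: list of 1-D arrays, one per speaker feed (2, 5.1, 7.1...).
    hrir_pairs: matching list of (hrir_left, hrir_right) for each speaker's
    azimuth/elevation; equal-length feeds and HRIRs are assumed.
    Returns the 2-channel headphone signal."""
    left = sum(fftconvolve(ch, hl) for ch, (hl, hr) in zip(channels, hrir_pairs))
    right = sum(fftconvolve(ch, hr) for ch, (hl, hr) in zip(channels, hrir_pairs))
    return np.stack([left, right])
```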
 
But why does NHK need 22.2 channels instead of 4? When the target for your audio content is not only people with a DSP and headphone playback but also a movie-theater audience, such original tracks might be useful for reducing the influence of the sweet spot in the latter. In home theaters, fewer channels are needed. NHK mentions that:
 
 
Quote:
22.2-channel sound for homes
With the goal of introducing 22.2 multichannel sound into homes, we are advancing research on signal processing that will allow sound reproduction with fewer loudspeakers while maintaining the sound's spatial impression. In FY2009, we developed a method to automatically convert 22 channels into 8 channels, while maintaining sound pressure and directionality at the listening point. We also developed a method for reproducing 22-channel sound using only three forward speakers, by using the Head-Related Transfer Function, which represents the propagation characteristics of sound arriving at both ears from various directions. We also performed experiments to investigate the perception of the apparent sound source's elevation when reproducing sounds on loudspeakers and headphones. This research was in order to improve the spatial reproduction capabilities of the 22.2 multichannel headphone processor. We found that for sound coming from directly in front, the perceptual resolution of the sound's elevation was degraded through loudspeakers when the elevation angles exceeded 70 degrees or through headphones when the angle exceeded 40 degrees.

 

 
I was trying to say that while playing back regular two-channel content with speakers may add an artificial soundstage, playing back a 3D sound field, which I consider a faithful playback method, also relies on some kind of crosstalk (interaural cues). Two channels via speakers is an artificial reconstruction of reality, but it is acceptable at the current state of the art.
 
They are all very interesting technologies.
 
Gosh, we should start a new thread for this subject. Forgive me. 

 
*Azimuth and elevation cues are at their worst directly in front (0º azimuth). Unconsciously, we slightly turn our heads to feel the cues and identify the source localization at such spots. That's why some sort of gyroscope on the listener's head might be useful. The Realiser has head-tracking... Outstanding.
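The arithmetic the head-tracker enables is almost trivial, which is part of why it is so effective: keep each virtual source fixed in the room by subtracting the tracked head yaw before the HRIR lookup (a sketch with hypothetical names, angles in degrees):

```python
def world_stable_azimuth(source_az_deg, head_yaw_deg):
    """Azimuth to use for the HRIR lookup so the virtual source stays put
    in the room while the head turns; result wrapped to [-180, 180)."""
    return (source_az_deg - head_yaw_deg + 180.0) % 360.0 - 180.0
```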
 
 
 
Aug 14, 2011 at 9:12 AM Post #11 of 22
Very, very nice read, thank you!! It all clicked now that you put this back in the perspective of localization cues. I am trying to summarize the many points that were raised; hopefully you can clarify where I got it wrong:
 
1. Standard stereo recording / speaker reproduction only does a reasonably good job for horizontal positioning, and possibly distance if you have a properly treated room (so it does a reasonably good job for most music recordings).
2. Adding more channels does not help with elevation cues unless you physically record / reproduce sounds with some elevation (the NHK idea).
3. Sound reproduction with speakers in a standard listening environment is flawed because of the room signature (you would need to have the setup in an anechoic environment, and loudspeakers designed for such usage).
4. Binaural recordings are not much help either because elevation cues are idiosyncratic (i.e. you need personalized HRTFs to do this reasonably well, and even then there are still scenarios where head movement is useful to help localization, hence the need for tracking).
5. Binaural recordings are also difficult to deal with using loudspeakers because of the need for cross-talk cancellation (really only feasible in an anechoic environment).
6. Personalized HRTFs for a standard surround system (the Smyth virtualizer) are good for reproducing a surround-speaker experience with headphones, but this suffers from the same limitations as the speaker surround system itself (point 2 above).
7. Personalized HRTFs + some compensation of the speaker placement (to elevate them, as you mentioned for the Realiser - I assume it requires you to get a PRIR measured for a few elevations beforehand) go a long way toward improving the rendering of elevation, as long as you can feed them properly mastered source material (i.e. not the usual surround material).
 
Personally, in all this, I believe the future lies in DSP equalization of a few loudspeakers below the TV screen, with personalized HRTFs and cross-talk cancellation, rather than in having a zillion loudspeakers in your non-acoustically-treated living room... See the research from the ISVR, where they could process binaural recordings with only 2 loudspeakers in front of the listener (no need for additional speakers to perform the cross-talk cancellation in an anechoic environment).
 
The Realiser is on my radar; I just have to prioritize the purchases...
 
arnaud
 
PS: I agree we should take this to another thread, as it's probably the most boring discussion ever for many, and there's nothing specific to the 009 that's relevant to it :wink:. This probably belongs in the Sound Science section?
 
 
 
Aug 14, 2011 at 10:22 AM Post #12 of 22
Do not get me wrong: I would never dare to claim I can settle things in this field. No matter how technical my words are, I am not an expert, so this is just my view of the subject. Talking about these things as a consumer/user is easy; taking the HRTF theory and writing a mathematical function that really works is work for brilliant minds...
 
Nevertheless, I feel comfortable commenting on some points:
 
3. IMHO, an anechoic chamber is not a solution, and I am not talking about the economic cost of building one just for audio playback. A regular room has its signature, but a dry room is very annoying too. Some concert halls have not only reflective walls but also a movable ceiling (e.g. for chamber music: ceiling down, smaller room). If the room has symmetry, perfect. Golden-ratio dimensions would be ideal, but there are DSPs to deal with that as well.
5. I do not think an anechoic chamber cancels crosstalk. It would cancel the reflected crosstalk; you would still have direct crosstalk.
6. I would not say that the Realiser has an inherent "limitation". Two Realisers are able to work together for up to 16 channels. The bottleneck is the audio content available (there are only 2-, 5.1-, or 7.1-channel flat recordings).
7. I am not sure you need a new/different measurement for elevation with the Realiser A8. It seems to extract elevation cues from the original measurement. We would need to understand the HRTF better, but this is totally beyond my capabilities. Have a look at this paper; it may be useful to you, as you are an engineer with some familiarity with mathematics.
 
Currawong (a moderator) can move this subject to the relevant thread if needed.
 
 
 
Aug 14, 2011 at 11:25 AM Post #13 of 22
Even though binaural may not perfectly handle elevation cues because of individual head shapes, it's still an almost perfect technology for recreating sound if you have a good artificial head to record with. If you've ever heard example sound files with sounds moving around your head at different distances, you know that it works brilliantly. At the same time, it's not complex at all.
 
In the end, all the sounds we hear are combined in our two ear canals; hence two-channel audio is, in theory, all you ever need to perfectly recreate a sound experience. Capture these combined sounds in each ear canal of an artificial head, then reproduce them at the same point in a real person's ear canals, and you've recreated the sound experience perfectly, or almost perfectly. I guess IEMs are therefore even better suited to this than normal headphones, but normal ones come very close.
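A toy sketch of how far the delay and level cues alone get you (Python; no pinna filtering, so elevation and front/back stay ambiguous - the spectral detail an artificial head captures is what removes that ambiguity):

```python
import numpy as np

def crude_binaural_pan(mono, azimuth_rad, fs=44100, head_radius=0.0875, c=343.0):
    """Place a mono source to the left/right using ITD + a crude ILD only."""
    itd = head_radius / c * (azimuth_rad + np.sin(azimuth_rad))  # Woodworth
    d = int(round(abs(itd) * fs))               # interaural delay in samples
    far = np.pad(mono, (d, 0))[:len(mono)]      # delayed copy for the far ear
    far = far * (1.0 - 0.3 * abs(np.sin(azimuth_rad)))  # rough head shadow
    if azimuth_rad >= 0:                        # source on the right
        return far, mono                        # (left ear, right ear)
    return mono, far
```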
 
More recordings should be made with binaural technology (duh). It would be easier to use DSP to reproduce a binaural recording on speakers than to try to make an ordinary recording sound as if it had been made binaurally. At least with a binaural recording you know how the microphones were placed; with ordinary recordings you do not, and therefore cannot rebuild the spatial information well.
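For what it's worth, that speaker-playback DSP (cross-talk cancellation) is conceptually a 2x2 matrix inversion per frequency bin. A sketch under the usual simplifications (direct field only; the speaker-to-ear transfer functions are hypothetical inputs given as FFT bins):

```python
import numpy as np

def xtc_filters(H_iL, H_iR, H_cL, H_cR, eps=1e-3):
    """Cross-talk-cancellation filters for a 2-speaker setup.

    H_iL/H_iR: ipsilateral speaker-to-ear responses (left speaker to left
    ear, right speaker to right ear); H_cL/H_cR: contralateral leakage.
    Per bin the ears receive e = H @ s with H = [[H_iL, H_cL], [H_cR, H_iR]];
    feeding the speakers s = H^-1 @ b delivers the binaural program b to
    the ears. Valid for the direct field only - reflections are not cancelled."""
    det = H_iL * H_iR - H_cL * H_cR
    det = det + eps * np.max(np.abs(det))  # regularize near-singular bins
    return (H_iR / det, -H_cL / det,       # row for the left speaker feed
            -H_cR / det, H_iL / det)       # row for the right speaker feed
```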
 
 
Aug 14, 2011 at 7:18 PM Post #14 of 22

 
Quote:
3. IMHO, an anechoic chamber is not a solution, and I am not talking about the economic cost of building one just for audio playback. A regular room has its signature, but a dry room is very annoying too. Some concert halls have not only reflective walls but also a movable ceiling (e.g. for chamber music: ceiling down, smaller room). If the room has symmetry, perfect. Golden-ratio dimensions would be ideal, but there are DSPs to deal with that as well.

 
I think this really depends on the goal. Indeed, the right amount of reverberation and balance between the direct and reverberant fields is required in any hall, be it a classroom or a concert hall. But that is where the real event is happening, and the goal is to keep intelligibility and/or project the instruments' sound optimally for the majority of the audience. In the current discussion, however, we are referring to the artificial playback environment. The only reasons people tell you too much room treatment kills the sound of a speaker system are that 1) most speakers are designed to have a good/flat response in a standard living room (i.e. it's not just about their on-axis response but also their overall radiated sound power and directivity; most speakers are designed to account for some side-wall / back-wall / ceiling / floor reflections), and 2) as you mentioned, regular stereo recording and playback is flawed, so the artificial reverberation of a living room sometimes helps create slightly more natural sound reproduction (i.e. a bit wider soundstage at the expense of precision in placement). The golden ratio is more about the room's effect on the low-frequency response, trying to prevent coincident modes in any 2 directions (or worse, 3 directions for a cubic room!). Regardless of room shape the modes are there, but you can make them a bit less obtrusive that way.
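The coincident-mode point is easy to check with the standard rectangular-room formula f = (c/2) * sqrt((nx/Lx)^2 + (ny/Ly)^2 + (nz/Lz)^2) (idealized rigid walls, just to show the stacking):

```python
import numpy as np

def room_mode_hz(nx, ny, nz, Lx, Ly, Lz, c=343.0):
    """Natural frequency of mode (nx, ny, nz) in a rigid rectangular room."""
    return c / 2 * np.sqrt((nx / Lx) ** 2 + (ny / Ly) ** 2 + (nz / Lz) ** 2)

axial = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
# A 4 m cube stacks all three first-order axial modes at the same frequency:
print([round(room_mode_hz(*m, 4, 4, 4), 1) for m in axial])    # [42.9, 42.9, 42.9]
# A 4 x 5 x 2.5 m room spreads them out instead:
print([round(room_mode_hz(*m, 4, 5, 2.5), 1) for m in axial])  # [42.9, 34.3, 68.6]
```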
 
 
Quote:
5. I do not think an anechoic chamber cancels crosstalk. It would cancel the reflected crosstalk; you would still have direct crosstalk.



 
Sorry, a misunderstanding. What I mean is that with 2 sources (2 loudspeakers), you can really only optimally cancel cross-talk for a single propagating wave (i.e. the direct field). As soon as you get reflections, you fundamentally need additional sources to perform the left/right-ear cross-talk cancellation (see the ISVR website).

Quote:
6. I would not say that the Realiser has an inherent "limitation". Two Realisers are able to work together for up to 16 channels. The bottleneck is the audio content available (there are only 2-, 5.1-, or 7.1-channel flat recordings).



 
Agreed.
 

Quote:
7. I am not sure you need a new/different measurement for elevation with the Realiser A8. It seems to extract elevation cues from the original measurement. We would need to understand the HRTF better, but this is totally beyond my capabilities. Have a look at this paper; it may be useful to you, as you are an engineer with some familiarity with mathematics.



 
My only guess is that this works similarly to the implemented head tracking: you can derive ITD/ILD from a limited set of HRIRs and then use that to "interpolate" HRIRs for azimuths where no HRTF was recorded. But I would think you then need some basic HRIRs at various elevations to start with. More knowledgeable people (michgelsen?) could chip in. I will look at your reference paper, thanks!
 
Edit: I just flew over the paper really quickly, so I could be wrong, but my understanding is that these are just different ways of interpolating HRTFs (HRIRs) between headings. You do need some reference headings to interpolate between (like +/-45 degrees and 0 azimuth for the Realiser, or something like that). The paper talks about converting the HRIRs into simplified IIR filters with so-called poles and zeros, which then helps calculate intermediate HRIRs (it is easier to work in that space than with the raw time-domain data).
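A much cruder interpolation than the paper's pole/zero models, just to show the idea (time-align the two measured HRIRs, crossfade, then re-insert an interpolated onset delay so the ITD moves smoothly as well; the data here is hypothetical):

```python
import numpy as np

def interp_hrir(h0, h1, frac, d0, d1):
    """Blend two HRIRs measured at neighbouring azimuths.
    d0, d1: onset delays in samples; frac in [0, 1] (0 -> h0, 1 -> h1).
    np.roll wraps samples around, which is fine for a sketch; a real
    implementation would zero-pad instead."""
    a0, a1 = np.roll(h0, -d0), np.roll(h1, -d1)   # time-align the onsets
    blended = (1 - frac) * a0 + frac * a1         # crossfade the aligned IRs
    d = int(round((1 - frac) * d0 + frac * d1))   # interpolate the delay (ITD)
    return np.roll(blended, d)                    # re-insert the delay
```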



 
Aug 18, 2011 at 3:35 AM Post #15 of 22


Quote:
 
 
But note that this has nothing to do with speaker-induced cross-talk. Actually, edstrelow, I am not sure I follow your point either. Please correct me where I am wrong, but isn't the cross-talk between each speaker and both ears a problem only when trying to replay binaural recordings? For instance, any instrument playing in front of you is heard by both ears, and there's nothing fundamentally wrong with using a pair (or more) of loudspeakers to realistically reproduce this. As I mentioned above, I believe what makes this elusive is that the room kills it all with lots of early/late reflections that pollute the imaging (and the tonality too, but that's another topic).
 

As I said above,  I am always surprised that so few headphone listeners grasp the fundamental difference between speaker and headphone reproduction.  I seem to provide this explanation every few months.
 
Speakers create "phantom channels" whereby, for example, the left channel feeds the right ear and vice versa, with a slight time delay due to the extra travel time for the right signal to reach the left ear, etc. Thus you get 4 channels of sound from 2 speakers, two of them time-delayed, which interfere with the two correct signals that arrive at the ears first. These phantom channels are complete artifacts and hence unnatural, compared to headphones, which deliver pure left and right channel signals to the correct ears.
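To put rough numbers on this (a first-order sketch that ignores the room entirely; the ~0.26 ms delay corresponds to a +/-30 degree stereo pair and the shadow factor is illustrative): each ear receives its own speaker plus the opposite speaker, delayed and attenuated, and the two delayed terms are the phantom channels described above.

```python
import numpy as np

def speaker_ear_signals(left_ch, right_ch, fs=44100, itd_s=2.6e-4, shadow=0.7):
    """First-order ear signals for a standard +/-30 degree stereo pair."""
    d = int(round(itd_s * fs))                   # ~11 samples at 44.1 kHz
    def delayed(x):                              # extra path to the far ear
        return np.pad(x, (d, 0))[:len(x)]
    ear_left = left_ch + shadow * delayed(right_ch)   # phantom: delayed R in L ear
    ear_right = right_ch + shadow * delayed(left_ch)  # phantom: delayed L in R ear
    return ear_left, ear_right
```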
 
 
 
 
Several systems have been devised to reduce this crossfeed, such as Polk's now-discontinued SDA series speakers, and these produce much better localization than conventional speakers. However, they add a lot to the cost of a system, and the effect is somewhat limited to a listening sweet spot.
 
In simple terms, 2 channels of stereo through 2 speakers gives you 4 channels, 2 of them artifacts, or, if you will, 100% distortion. Dolby through 5 speakers gives you 10 channels, 5 of which are artifacts, etc.
 
This is not rocket science, just analysis of the stimuli being presented to the ears.
 
The fact that we tolerate speaker presentation says a lot about the forgiving nature of the auditory system. It can take a rubbishy set of stereo signals - and with 2 speakers you are getting an awful lot of rubbish - and hear it as vaguely localized sound, with much of the phantom-channel rubbish probably perceived as ambience.
 
Maggies, Quads, whatever - there is still a lot of rubbish in what they present to the ears compared to headphones. It may sometimes sound nice, but it doesn't give accurate spatial localization.

 
