Quote:
Originally Posted by rickcr42
I personally do not see any reason why not?
Okie dokie!
To limit the complexity of my answer, I am going to discuss how crossfeed is implemented in a headphone amp in order to mimic the acoustics of two speakers. I’ll start with rickcr42’s comment:
Quote:
You may like the way stereo is reproduced over headphones with your head in the way, but no way is the stereo image correct if it was mixed on loudspeakers. The requirements are so totally different you can not have a "happy medium" that works for both, so only a true binaural recording image is accurate with headphone listening.
Crossfeed is the attempt at "electronic compromise" where some left channel signal is allowed to bleed through to the right ear and right channel signal is allowed to bleed through to the left ear, just as would take place with loudspeakers in an open space.
Not perfect, but there is no perfect, only reduced compromises.
With the single comment that I think he also needs to include the short time delay added to the crossfeed signal, I do believe that he is spot on. It is legitimate to try to model two speakers because that is EXACTLY how the original sound engineer mixed the image and intended it to be heard.
If we want to discuss optimal audio playback, we’d have to discuss sound field microphones, 5.1-channel reproduction, etc. For the purposes of crossfeed we have to accept that it is intended to reproduce sound designed to be heard on two speakers.
First, let’s describe the problem; we’ll start with the right speaker only.
Sound from the right speaker makes it to the right ear, but it also makes it to the left ear, slightly attenuated and a short time later. Generally speaking, the right ear gets a pretty clear acoustic view of the right speaker, so to a first approximation the sound that the right ear hears is flat. Because the left ear is slightly shadowed by the head, it hears the sound at a slightly lower level (about 4dB down), and it hears it about 300 to 400 microseconds after the right ear. The left ear’s signal is also slightly EQed by the shadowing of the head. At low frequencies (under, say, 200Hz) the head is fairly small relative to the half wavelength (at approximately 1100 ft/sec, the half wavelength of 200Hz is about 2.8 feet).
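If you want to sanity-check those delay numbers yourself, the classic Woodworth spherical-head formula gets you in the right ballpark. This is a textbook approximation, not anything specific to our circuit, and the head radius and speed of sound below are typical generic values:

```python
import math

def woodworth_itd(angle_deg, head_radius_m=0.0875, c=343.0):
    """Woodworth's spherical-head estimate of the interaural time
    difference for a distant source at the given azimuth angle."""
    theta = math.radians(angle_deg)
    return (head_radius_m / c) * (theta + math.sin(theta))

# Stereo speakers usually sit around 30 degrees off center.
for angle in (30, 45, 90):
    print(f"{angle:3d} deg -> ITD ~ {woodworth_itd(angle) * 1e6:.0f} microseconds")
```

For a speaker 30 to 45 degrees off center this lands at roughly 260 to 380 microseconds, right in line with the 300-400 microsecond figure above.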
Imagine the piling of a pier. As long waves go by, the piling has little effect on the passing wave, but with short, high frequency waves the piling begins to interfere, and it casts a “shadow” of calmer water away from the direction of the wave. So it is with your head. Low frequency sounds are relatively undisturbed by your head, but as the frequency goes up, the head begins to cast an acoustic shadow, and the higher frequency sounds are attenuated as they reach the left ear. Now you have a frequency response curve at the left ear that is roughly flat until around 200Hz, at which point a roll-off begins. This continues until you get to something like 2000Hz.
At that point a strange phenomenon called “skin effect” begins to set in. Here the high frequency sound starts to find a way to stick to surfaces in little “bubbles”---I guess that’s the best way to describe it---and can propagate with less attenuation along the surface. So at about 2K, sound begins to travel along the surface of the face and make it around the corner to your left ear a little more easily. This results in a rising frequency response from 2K up to about 5K. At that point the half wavelength is about an inch long, the bumps on your face (nose, brow, cheekbones) begin to disturb propagation, and the EQ curve begins to roll off again, now for good. (This is actually advantageous, because once you get above 5K the distance between the ears is longer than the wavelength, and interaural time differences can no longer be discriminated by phase.)
To sum up: to a first approximation, with the right speaker, the right ear hears roughly flat, and the left ear hears the sound 300-400 microseconds later with a somewhat humpy EQ curve carrying about 4dB less energy than the right ear. Analog (not digital) crossfeed circuits try to mimic this phenomenon by providing a crossfeed signal with an EQ change and time delay.
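For those who think better in code than in circuits, here is a minimal digital sketch of that recipe. It is NOT the HeadRoom circuit: the one-pole lowpass is a crude stand-in for the head-shadow EQ (it ignores the 2K-5K “skin effect” hump described above), and every parameter value is merely illustrative:

```python
import numpy as np

def simple_crossfeed(left, right, fs=44100, delay_us=350,
                     cutoff_hz=700, level_db=-4.0):
    """Toy crossfeed: feed each channel to the opposite ear, delayed
    (ITD), low-passed (crude head-shadow EQ), and attenuated (IAD)."""
    delay = int(round(fs * delay_us * 1e-6))     # interaural delay in samples
    gain = 10.0 ** (level_db / 20.0)             # level difference as linear gain
    a = np.exp(-2.0 * np.pi * cutoff_hz / fs)    # one-pole lowpass coefficient

    def shadow(x):
        x = np.asarray(x, dtype=float)
        y = np.empty_like(x)
        state = 0.0
        for i, s in enumerate(x):                # y[n] = (1-a)x[n] + a*y[n-1]
            state = (1.0 - a) * s + a * state
            y[i] = state
        return gain * np.concatenate([np.zeros(delay), y[:len(y) - delay]])

    return left + shadow(right), right + shadow(left)
```

A hard-panned sound stays mostly in its own ear but shows up softly, late, and dulled in the other, just as it would from a speaker across the room.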
The first (known to me) attempt at this was done by Ben Bauer. He put together a completely passive circuit of inductors, capacitors, and resistors that was driven by a speaker amplifier and had to be used with 8-ohm headphones. A little ELI the ICE man might be in order here.
When you pass a sine wave through a capacitor, the voltage and current signals are shifted 90 degrees from each other. This is probably far too complicated to fully explain without doing a heck of a lot of work drawing a bunch of pictures for you, so I’ll simply say that ELI the ICE man means: ELI = voltage (E) is ahead of current (I) through an inductor (L), and ICE = current (I) is ahead of voltage (E) in a capacitor (C). Using inductors, capacitors, and resistors in somewhat complex networks, you can begin to form a time delay constructed from the various phase shifting properties of these reactive components. Ben Bauer built such a circuit for the crossfeed channel for headphones. (There was actually a commercial product from Altec (I believe) that was produced based on this circuit. It died quickly from lack of demand. There really wasn’t a need, given that you were still strapped to a power amp in your listening room, and Walkmans and office cubes hadn’t yet driven people to headphones en masse.)
By the time I became aware of all this and began to figure out how to build one, op-amp ICs were available and I was aware that all-pass phase delay filters existed. I figured it would be possible to revive the Bauer idea using contemporary parts and technology to create an analog crossfeed network. Anybody interested in the details of these types of circuits could simply get a comprehensive op-amp design book and look up these linear time delay topologies.
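As a taste of what those books describe: a single first-order op-amp all-pass stage has the transfer function H(s) = (1 - sRC)/(1 + sRC). It passes every frequency at full amplitude but shifts its phase, which gives a low frequency group delay of 2RC; cascade a few stages and you build up the few hundred microseconds needed. A sketch of the math, with illustrative part values (not HeadRoom’s):

```python
import math

R, C = 10e3, 15e-9            # illustrative part values, not HeadRoom's
tau = R * C                   # RC time constant

def allpass_phase(f_hz):
    """Phase of the first-order all-pass H(s) = (1-sRC)/(1+sRC).
    Magnitude is exactly 1 at every frequency; only phase shifts."""
    return -2.0 * math.atan(2.0 * math.pi * f_hz * tau)

print(f"low-frequency group delay ~ 2RC = {2 * tau * 1e6:.0f} microseconds")
for f in (100, 1000, 5000):
    print(f"{f:5d} Hz: phase shift {math.degrees(allpass_phase(f)):7.1f} deg")
```

Note the catch: the delay is only constant well below 1/(2πRC), which is exactly why these topologies take some care to design.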
It is worth noting that we (HeadRoom) do not provide the crossfeed signal at 4dB down, but rather at about 11dB down. We do this because the stronger the crossfeed signal is, the stronger the comb filter effects that change the EQ of the mono component of the stereo signal. To reduce this EQ change we lower the volume of the crossfeed channel so that it is just audible enough for your brain to get its cues. This is a fine balancing act. It has been commonly noted here that our crossfeed circuit does indeed provide a bit of a bass boost; it actually also creates a shallow notch at about 2K and a hump at 5K as it rolls off to about 30dB down at 20K. But if you read carefully above, the acoustics do demand exactly this slight complexity in the crossfeed channel. Sadly, we don’t get it on the crossfeed channel but in the mono component. Nonetheless, I’ve always felt that it had promise and was a legitimate approximation. We have done some tweaking of the crossfeed again in the latest release of modules, and we do feel that the EQ effects have been significantly reduced by these changes. Unnatural artifacts remain in the presentation, but it is my belief, through long experience and experimentation, that they are FAR less unnatural than listening to headphones without the natural crossfeed you hear on speakers.
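To put rough numbers on that trade-off: for the mono (center) component, each ear hears the direct signal plus a delayed copy at the crossfeed level, and the sum |1 + g·e^(-jωτ)| ripples between (1+g) and (1-g) as frequency sweeps. A little arithmetic (ignoring the crossfeed EQ) shows why 11dB down is so much kinder than 4dB down:

```python
import math

def mono_ripple_db(crossfeed_db):
    """Worst-case comb peaks and notches on the mono component when
    the direct signal sums with a delayed copy at the given level."""
    g = 10.0 ** (crossfeed_db / 20.0)
    return 20 * math.log10(1 + g), 20 * math.log10(1 - g)

for level in (-4.0, -11.0):
    peak, notch = mono_ripple_db(level)
    print(f"crossfeed {level:6.1f} dB: peaks {peak:+.1f} dB, notches {notch:+.1f} dB")
```

At the acoustically “correct” 4dB down, the notches approach 9dB deep; at 11dB down they shrink to under 3dB.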
As far as I know, the crossfeeds implemented by others (Meier, Xin, Moy) are all done with passive parts. This limits the total phase angle that can be achieved, and I think the delays are shorter than in the HeadRoom circuit. The Meier circuit, I believe, also assumes that the ear nearest the speaker should not be flat but should have a slightly elevated high frequency response. There is a good acoustic argument for this, but I took the approach that the direct channel for each ear should deviate as little as possible from the original signal.
So far, I have only talked about interaural time differences (ITD) and interaural amplitude differences (IAD) between the ears. They are the primary lateralization cues for listening, but there are others that are important as well. The folds and ridges that make up the outer ear (pinna) are positioned asymmetrically around the entrance to your ear canal in such a way that the reflected/direct sound path changes as a function of the approach direction of the sound, causing comb filter notches in the heard audio spectrum. These path lengths are so short that they really only come into play above 5kHz. But that’s dandy! Above, I mentioned that ITD cues begin to be meaningless above 5K. Mother Nature has provided yet another mechanism for sound localization in the high frequency range! Thanks, Mom. Because these folds and ridges are much more highly individual when compared to the relatively consistent size of people’s heads, it is much more difficult to make generalizations, but they remain important.

You can prove this to yourself by placing some putty in the rear of the concha ridge (the fairly well defined little cup around your ear canal). Then go for a walk in the park and try to forget that the putty is there (concentrating on it will cause you to compare visual cues to auditory cues, and your brain will rapidly begin to compensate). As you hear distinct sounds (like dog barks and car horns) and turn towards them, you will find that you are consistently and significantly in error in localizing the direction from which they come. Experiments have been done where researchers make molds of people’s ears, put small mikes in them, and then place them right next to each other. Listening to headphones fed from this mike setup, listeners get no ITD cues because the “ears” are so close together, but they are still able to locate sounds with fairly good accuracy as long as there is a significant high frequency component to the sound.
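To see why those short path lengths push the cues above 5kHz: a reflection off a pinna ridge arrives behind the direct sound by some extra path distance d, and the two copies cancel at odd multiples of c/2d. A quick check with pinna-scale numbers (purely illustrative, since real pinnae are far messier than a single reflection):

```python
c = 343.0                          # speed of sound in m/s
for d_mm in (10, 20, 30):          # illustrative extra path lengths
    d = d_mm / 1000.0
    notches = [(2 * k + 1) * c / (2 * d) for k in range(3)]
    print(f"{d_mm:2d} mm extra path -> notches near "
          + ", ".join(f"{f / 1000:.1f} kHz" for f in notches))
```

Even a 3cm detour around the pinna puts the first notch up near 6kHz, right where the ITD cues are giving out.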
Elevation cues are likewise almost purely generated by these pinna reflections. This is a much subtler process, which is why good vertical sound placement is much more difficult to achieve than horizontal imaging. I have always found that the HeadRoom crossfeed provides better image specificity left-to-right than up-and-down, and I always experience the sound moving up as well as forward when I turn the crossfeed on. We are not trying to solve this problem with our current analog design. If we were ever to do a digitally processed version of our crossfeed, we would likely begin to incorporate functions intended to mimic these pinna cues for high frequency left/right and up/down localization. Because of the highly individual nature of these cues, however, I am skeptical that we could find tweaks that work consistently across the population of users. My starting position would be that we should err on the side of audio fidelity in these cases.
There are those who have tried to accomplish these tweaks in the analog domain, but I will refer you to the library over at Headwize for further discourse on the topic. (Chu has done an OUTSTANDING job of accumulating a resource on this topic, and rather than poorly replicate it in a post here, I would strongly encourage your further reading at the Headwize Library. Great job, Chu!)
Though not really a localization cue, there is another phenomenon worth a short note: putting a mike on a stand in front of a speaker and measuring flat is not like putting an ear in front of the speaker. Because your ear is located in your head atop the large mass (in my case, anyway) of your body, your ears experience some bass buildup due to the interference of the shape and volume occupied by your torso. I have at times used this argument to say that some of the bass buildup of our crossfeed is actually desired. It’s a pretty weak argument, not because it’s wrong, but because the magnitude is not quite as great as what our circuit does. But it does have some merit.
It is by going into the digital domain that we can really begin to address these problems. Dolby Headphone and AKG’s IFA techniques are the best performers I have heard in this area.
Another note worth making is that ALL these synthetic cues are only about 1/10th as effective as they could be unless you are willing to put some kind of head movement tracking device on the headphones and change the cues with head movement. Study after study shows that it is not so much the various cues themselves, but how they change with the movement of your head relative to the sound source, that provides truly realistic audio localization. You will find many attempts at this among the more recent digital signal processed versions of headphone localization equipment. In the end, however, every one that I’ve heard has a hard time fooling Mother Nature, and the degradation from the processing (and the almost universally cheap implementation) leads to a loss of audio quality that I find unacceptable. I have no doubt, however, that designs that are both immersive in their psychoacoustic properties and satisfying in their fidelity are possible…and I’ll be after them like a dog on a bone in future products.