Quote:
Originally Posted by arnaud 
Sorry, I can't follow your reasoning. If I understand correctly, you're showing 2 examples of recording techniques to pick-up audio from different directions (more than 2), which you are then supposed to play back with a multi-speaker system (more than 2 also). Do you mean that regular stereo recordings are just based on the same principle, except based on 2 speakers in front of you at say 30-45 degrees and there's nothing really wrong with it?
Where I am confused is that, regardless of the number of independent headings you're picking up in the recording, you're still facing some challenge with reproducing this in a regular listening environment. For example, I don't see how the 22.2 thing makes any sense when you think that you get a blend from both direct and reverberant fields. So, while you can use mic arrays and clever processing to obtain very directional signals, you can't possibly beam in the same way once you play it back (because nobody listens in an anechoic chamber).
But note however that this has nothing to do with speaker induced cross-talk. Actually edstrelow, I am not sure I follow your point either. Please correct me where I am wrong, but isn't this cross-talk between each speaker and both ears a problem only when trying to replay binaural recordings? For instance, any instrument playing in front of you is heard by both ears and there's nothing fundamentally wrong with using a pair (or more) of loudspeakers to realistically reproduce this. As I mentioned above, I believe where this is elusive is that the room kills it all with lots of early / late reflections that pollute the imaging (and the tonality too but that's another topic).
My phrasing was not clear indeed.

I was referring to both recording techniques played back through headphones (so the listening room is out of the equation) plus a playback DSP (a head-related transfer function that transforms multichannel content into an individualized two-channel output; okay, the measured reference room has its influence here, but the DSP is able to deal with that).
How are two ears able to sense sound sources in a 3D field? Suppose a single sound source (like a bird) sits on an imaginary sphere around the listener. Roughly: a) inter-aural delays explain the horizontal displacement (azimuth cues); b) tonal modulation from the head, torso and outer ear explains the vertical displacement (elevation cues*); and c) reverberation explains the source distance (whether the source sits on a nearer or farther imaginary sphere, in other words, a different radius).
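To put a rough number on cue a), here is a minimal sketch using Woodworth's spherical-head approximation for the inter-aural time difference. The head radius and speed of sound are assumed typical values, not measurements of anybody's head, and the formula only covers a distant source between 0 and 90 degrees azimuth.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed typical head radius, not a measured value
SPEED_OF_SOUND = 343.0   # m/s in air at roughly 20 degrees C

def woodworth_itd(azimuth_deg):
    """Approximate inter-aural time difference in seconds.

    Spherical-head model for a far-field source.
    azimuth_deg: 0 = straight ahead, 90 = directly to one side.
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + math.sin(theta))

for az in (0, 30, 60, 90):
    print(f"{az:>2} deg -> {woodworth_itd(az) * 1e6:.0f} microseconds")
```

With these assumptions the delay is zero for a source straight ahead and grows to roughly 0.65 ms for a source directly to one side, so the azimuth cue is weakest right in front of the listener.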
Is it possible to fix all those cues in mass-distributed audio content? I believe it is not. We have several problems: at least two of those cues are very idiosyncratic and one is very room dependent. So neither an XY microphone pattern nor the Neumann KU-100 is an ideal solution.
With 2, 5.1 or 7.1 channel content you are able to reconstruct horizontal displacement (azimuth cues) through crosstalk in your listening room. As you pointed out, source distance is more problematic given that your listening room imprints its own reverberation. You may add ambience to the recording with a Neumann KU-100, but this will not translate into precise elevation cues, which are, I believe, very idiosyncratic.
So the Realiser comes into the playback chain of regular 2, 5.1 or 7.1 channel content. You are able to capture your own idiosyncratic azimuth and elevation cues and the reverberation of your ideal listening room. Then a function transforms your multichannel audio stream into a two-channel headphone output. What do you have here? You will hear a very convincing out-of-the-head circle on the horizontal plane. Do you have a sphere? Do you have a 3D sound field? Nope.
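For what it's worth, the "function" in question can be pictured as follows. This is only a sketch of the general idea (convolve each speaker feed with a left-ear and a right-ear impulse response measured for that speaker position, then sum per ear), not the Realiser's actual algorithm. The names render_binaural and hrir_pairs and the toy impulse responses are made up for illustration, and all feeds and all responses are assumed to have equal lengths.

```python
def convolve(signal, impulse_response):
    """Plain time-domain convolution of two lists of samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out

def render_binaural(speaker_feeds, hrir_pairs):
    """Mix N speaker feeds down to a (left, right) headphone signal.

    speaker_feeds: one list of samples per virtual speaker
    hrir_pairs:    one (left_ear_ir, right_ear_ir) tuple per virtual speaker
    """
    n_out = len(speaker_feeds[0]) + len(hrir_pairs[0][0]) - 1
    left = [0.0] * n_out
    right = [0.0] * n_out
    for feed, (ir_left, ir_right) in zip(speaker_feeds, hrir_pairs):
        for k, v in enumerate(convolve(feed, ir_left)):
            left[k] += v
        for k, v in enumerate(convolve(feed, ir_right)):
            right[k] += v
    return left, right

# Toy data: a click in the left speaker only. Each speaker reaches the
# near ear at once and the far ear two samples later at half level.
feeds = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
hrir_pairs = [
    ([1.0, 0.0, 0.0], [0.0, 0.0, 0.5]),  # left speaker
    ([0.0, 0.0, 0.5], [1.0, 0.0, 0.0]),  # right speaker
]
left, right = render_binaural(feeds, hrir_pairs)
print(left)
print(right)
```

In the toy run the click shows up in the left ear first and in the right ear later and quieter, which is the same speaker-to-both-ears crosstalk you get in a real room, only reproduced inside the headphones.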
Then you take that function and add some variables that allow you to place your virtual speakers (a fixed base that comes from your recorded content and feeds your HRTF computation). Believe it or not, the Realiser does that: it allows the user to change the azimuth and elevation of each virtual speaker (see Realiser A8 manual, page 27). The reverberation of the playback room and the speakers' proximity(!) can also be altered (see Realiser A8 manual, pages 55, 56). If the recorded content has two layers (NHK example) or a 3D omnidirectional pattern (SPS200), voilà, now you are able to place your virtual speakers in the right virtual spots and reproduce a 3D sound field.
But why does NHK need to fix 22.2 channels instead of 4? When the target for your audio content is not only people with a DSP and headphone playback but also a movie theater audience, such original tracks might be useful to reduce the influence of the sweet spot for the latter. In home theaters, fewer channels are needed. NHK mentions that:
Quote:
22.2-channel sound for homes
With the goal of introducing 22.2 multichannel sound into homes, we are advancing research on signal processing that will allow sound reproduction with fewer loudspeakers while maintaining the sound's spatial impression. In FY2009, we developed a method to automatically convert 22 channels into 8 channels, while maintaining sound pressure and directionality at the listening point. We also developed a method for reproducing 22-channel sound using only three forward speakers, by using the Head-Related Transfer Function, which represents the propagation characteristics of sound arriving at both ears from various directions. We also performed experiments to investigate the perception of the apparent sound source's elevation when reproducing sounds on loudspeakers and headphones. This research was in order to improve the spatial reproduction capabilities of the 22.2 multichannel headphone processor. We found that for sound coming from directly in front, the perceptual resolution of the sound's elevation was degraded through loudspeakers when the elevation angles exceeded 70 degrees or through headphones when the angle exceeded 40 degrees.
I was trying to say that while playing back regular two-channel content through speakers may add an artificial sound stage, playing back a 3D sound field, which I regard as a faithful playback method, also relies on some kind of crosstalk (inter-aural cues). Two channels via speakers is an artificial reconstruction of reality, but it is acceptable at the current state of the art.
They are all very interesting technologies.
Gosh, we should start a new thread for this subject. Forgive me.

*Azimuth and elevation cues are at their worst directly in front (0º azimuth). Unconsciously we turn our head slightly to pick up the cues and localize the source at such spots. That's why some sort of gyroscope on the listener's head might be useful. The Realiser has head tracking... Outstanding.
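As a rough illustration of what a head tracker feeds into the processing (a sketch of the geometry only, not how the Realiser implements it): the virtual speaker stays fixed in the room, so the angle used to pick the HRTF is the speaker azimuth minus the current head yaw. The function name and the sign convention (both angles positive in the same direction) are assumptions for the example.

```python
def relative_azimuth(speaker_azimuth_deg, head_yaw_deg):
    """Azimuth of a room-fixed virtual speaker as seen from the turned head.

    Both angles are measured in the same direction from the room's front.
    The result is wrapped into the range (-180, 180].
    """
    angle = (speaker_azimuth_deg - head_yaw_deg) % 360.0
    return angle - 360.0 if angle > 180.0 else angle

# A center speaker at 0 degrees while the head turns 10 degrees:
print(relative_azimuth(0.0, 10.0))
```

So a small head turn moves a dead-center source off the 0º spot relative to the ears, which brings the inter-aural cues back into play.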