I mean the ITD is not the same in your listening room as it is in the mixing room unless the rooms are identical.
The Interaural Time Difference is the same; the distance between your ears doesn’t change when you listen in a different room, but the sound entering your ears is obviously not the same.
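To put rough numbers on that, here’s a minimal Python sketch using the simple Woodworth spherical-head approximation; the 8.75cm head radius and 343m/s speed of sound are just typical assumed values:

```python
import numpy as np

# Rough ITD estimate from the Woodworth spherical-head model:
# ITD ~= (r / c) * (theta + sin(theta)), theta = source azimuth in radians.
def itd_woodworth(azimuth_deg, head_radius_m=0.0875, c=343.0):
    theta = np.radians(azimuth_deg)
    return (head_radius_m / c) * (theta + np.sin(theta))

for az in (0, 30, 60, 90):
    print(f"azimuth {az:2d} deg -> ITD ~ {itd_woodworth(az) * 1e6:5.0f} micro-secs")
# A source hard left/right gives roughly 650 micro-secs, set purely by head
# geometry; the room only changes what arrives, not the ear spacing itself.
```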
For example if your room is bigger and less treated acoustically, the reflections will come to your ears later and with different level/spectrum/angle than in the studio.
Yes of course, you’re going to get a significantly different set of reflections: more early reflections, more time delay between the ERs, more spectral interaction between the ERs and between the direct sounds, longer RT60, etc. In addition, there are going to be differences in the direct sound reproduced to start with; the different speakers in the two listening environments are going to have a somewhat different spectral/freq response and a somewhat different time domain response (group delay, etc.).
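Just to put an illustrative number on the “reflections will come later” point, here’s a deliberately over-simplified Python sketch of a single first lateral reflection, with the speaker and listener assumed to be the same distance from one side wall (the distances are made-up examples, not measurements of any real rooms):

```python
import math

# Image-source toy model: speaker and listener both sit wall_dist_m from a
# single side wall, direct_dist_m apart. The reflected path is the distance
# to the mirror image of the speaker, so the extra delay is easy to compute.
def first_reflection_delay_ms(direct_dist_m, wall_dist_m, c=343.0):
    reflected_path = math.hypot(direct_dist_m, 2.0 * wall_dist_m)
    return (reflected_path - direct_dist_m) / c * 1000.0

print(f"treated near-field setup (1.5m, wall 1m away): "
      f"{first_reflection_delay_ms(1.5, 1.0):.1f} ms after the direct sound")
print(f"bigger living room (3m, wall 3m away):         "
      f"{first_reflection_delay_ms(3.0, 3.0):.1f} ms after the direct sound")
```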
If that doesn't affect spatiality, I don't know what does.
Yes, of course it affects spatiality, but this is the spatiality of the sound entering the ears, and that sound then interacts with the ears: the pinnae, the skull, the body, etc.
One room gives nice bass, another room has good diffuse airy treble etc. This is the origin of my attitude of "omitting" factors.
If we could not hear the difference between those two different rooms/speakers/presentations, that would be a justification for omitting those factors. That is why your reason for omitting factors doesn’t make any sense to me: obviously we can hear the difference between the two different rooms/presentations.
The studio where I had the mixing course had near and far field speakers and the difference between them was quite dramatic!
Exactly, but this effectively contradicts your theory that “Not omitting them [those factors] can lead to extremely difficult and costly measures to control them for relatively minor benefits.” - You said above the difference “was quite dramatic!” but here you’re saying “relatively minor benefits”. Obviously we have a lot of factors involved here, not just the factors you omitted, but your rationale for omitting factors doesn’t correlate with your own observation, and that was in exactly the same room, just with a different speaker presentation.
I believe so too, hence the mentality of omitting these factors.
You are taking one example/property of human perception and applying it to a different context (headphone use). There are two problems with this, either of which on its own can invalidate your “mentality of omitting these factors”: firstly, you are ignoring other examples/properties of human perception when listening to speakers, and secondly, you don’t have any reliable evidence that the different context (headphone use) doesn’t affect any of these factors.
Floyd Toole (and others) didn’t only demonstrate that the brain can adapt over time/training to certain weaknesses in frequency response, he demonstrated a great deal more, such as the importance of time domain performance, off-axis and other speaker/room performance issues. A good practical example of this is the old Yamaha NS10 phenomenon (see this article and accompanying research). In this case we’ve effectively got the same room and even the same presentation (near field monitors); the only difference is the actual speakers. Why, when there were dozens of different near field monitors available, did virtually all commercial studios in the world have NS10s, and more interestingly in the context of this thread, why does it elicit more polarised opinion than pretty much any other “industry standard”? The freq response was poor compared to other near fields, but what set it apart was its time domain response, its group delay/impulse response.
I believe it is not much different with headphones without acoustic environment.
What basis, apart from your personal perception, do you have for that belief? In fact, your belief is contrary to a considerable amount of reliable evidence. In an acoustic environment we have speakers in a room and a considerable amount of resultant spatial/acoustic information, and all of this information correlates with our sense of sight: we see the speakers and the room, and the spatiality/acoustic information we hear obviously correlates with that. Furthermore, when we listen in such an environment, we don’t have our head clamped in a vice; it’s moving around at least slightly, and the resultant slight (or significant) changes in ITD and other factors reinforce the locations of what we’re seeing (and hearing). And lastly, reliable evidence demonstrates that sight significantly influences positional hearing/perception. We don’t have any of this with headphones (unless they have HRTFs, head tracking and a reverb applied). And continuing:
The phase difference is too small to create comb-filter effects.
How do you know? Firstly, phase differences as low as 100 micro-secs can cause audible comb-filter effects, although probably not in the lower freq band affected by crossfeed. Secondly, even if the phase difference is too small to create audible comb-filter effects in the lower band with coherent test signals, how do you know that’s the case with a stereo music mix, which is almost never phase coherent to start with and contains all sorts of direct mono and stereo sound sources, reflections and audio effects, some/many of which are already partly out of phase? Overlaying that and adding a further delay with crossfeed can indeed cause comb-filter effects or similar/related effects (Doppler, flanging, phasing, etc.), and although not extreme, I’ve certainly perceived such effects when using crossfeed.
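For anyone who wants to see why even a sub-millisecond delay still produces a comb-shaped response, here’s a minimal Python sketch of the magnitude response when a signal is summed with an attenuated, delayed copy of itself; the 300 micro-sec delay and the roughly -6dB crossfed level are purely illustrative assumptions, not the values of any particular crossfeed implementation:

```python
import numpy as np

# Magnitude of H(f) = 1 + a * exp(-j*2*pi*f*tau): summing a signal with an
# attenuated copy delayed by tau gives the classic comb-filter ripple.
def comb_response_db(freq_hz, tau_s, a):
    h = 1.0 + a * np.exp(-2j * np.pi * freq_hz * tau_s)
    return 20.0 * np.log10(np.abs(h))

tau = 300e-6   # assumed 300 micro-sec delay, the rough order of a crossfeed/interaural delay
a = 0.5        # assumed crossfed copy about 6dB down
for f in (200.0, 500.0, 1000.0, 1667.0, 2000.0):
    print(f"{f:6.0f} Hz : {comb_response_db(f, tau, a):+5.1f} dB")
# The first dip lands at 1/(2*tau) ~ 1.67kHz; because a < 1 the dips are only
# partial, and crossfeed's low-pass filtering shrinks them further up the band.
```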
In addition, we’re not just talking about the spectral side effects of crossfeed delay but also about positional perception, and reliable evidence demonstrates that differences of as little as 5 micro-secs can affect location perception. So, for example, a sound in the mix with say a 600Hz fundamental will have that fundamental crossfed and delayed, but its 2nd, 3rd and subsequent harmonics will not be, so you could perceive different spectral parts of the same sound to be in somewhat different locations.
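A trivial Python sketch of that example, assuming a hypothetical crossfeed low-pass at 700Hz (real implementations use different cutoffs and slopes), just to show how the fundamental and its harmonics end up being treated differently:

```python
# Hypothetical crossfeed cutoff purely for illustration; the point is only
# that content below it gets crossfed/delayed while content above it doesn't.
crossfeed_cutoff_hz = 700.0
fundamental_hz = 600.0

for n in range(1, 5):
    f = n * fundamental_hz
    treatment = "crossfed + delayed" if f < crossfeed_cutoff_hz else "largely unprocessed"
    print(f"harmonic {n}: {f:5.0f} Hz -> {treatment}")
```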
With headphones, we’ve got no visual reference to influence/correct our perception of location, and we’ve got the added complexity that the signals we’re listening to (stereo music mixes) are not spatially consistent/coherent internally and also do not correlate acoustically to our listening environment. The evidence indicates that under such conditions of limited sensory references (and those we do have conflict), plus the complex, confusing and contradictory aural cues within the music mixes themselves, our perception effectively has little to rely on and simply makes up whatever seems to make the most sense. This is why there is such a wide variety of individual responses to headphone listening, from “sounds just like being there” to “a bit strange but I still like it” to “this is nonsense and very annoying”. Research in this area is still relatively limited; so far it has mainly focused on understanding basic processes, such as location perception of single, simple test signals, not multiple complex sounds with different locations and different acoustic information all occurring simultaneously. But we can’t rationally just discount/ignore much of what we have discovered simply on the basis of one person’s perception, especially as it’s not representative of the majority.
G