MrHaelscheir
As a mere hobbyist with an interest in measurements and binaural head-tracking, I wish to clarify my assumptions about how stereo imaging works and what constitutes "objectively correct imaging", and hence how those assumptions guide the imaging evaluation method I describe below.
tl;dr: Is stereo in the ideal case supposed to image perfectly along a line between the channels, and if so, is panning pink noise a reasonable means for assessing the precision of that imaging?
Firstly, am I right to assume that in normal/traditional stereo mixing with volume pans and no special HRTF DSP or phase tricks, however much the microphones might capture timbral or loudness cues for distance, playing the stereo mix through speakers in a well-treated room (if not an anechoic chamber) from the sweet spot would cause sounds to image almost perfectly along a line between the two speakers, perhaps with slight height variations depending on one's HRTF or imperfections in one's sound localization faculties? That is, would it be conceivable, if not expected, for there to exist listeners for whom panned sources image purely on the line between the two channels, while they still hear the timbral distance cues independently of the perceived localization? (This matches my present understanding that surround sound and ambisonics are founded on adding channels to create more lines or planes in 3D space along which to place, if not "pan", sounds.) In effect, is "soundstage depth/layering/height" or "3D holography" merely an illusion of spatial properties not actually in the (traditional stereo) recording, created, technically erroneously, by room acoustics, happenstance HRTF interactions with said room or one's headphones, or an inability to separate tonal cues from one's main localization faculties?
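For concreteness, the kind of volume pan I have in mind can be sketched with a constant-power (sine/cosine) pan law, which is one common convention in mixing consoles and DAWs; the function name and parameterization below are my own illustration, not something from any particular recording or tool:

```python
import numpy as np

def pan_gains(p):
    """Constant-power pan law: p = 0 is hard left, p = 1 is hard right.
    Returns (left_gain, right_gain) with L^2 + R^2 = 1, so the summed
    power (and roughly the perceived loudness) stays constant as the
    phantom image moves along the line between the two channels."""
    theta = p * np.pi / 2
    return np.cos(theta), np.sin(theta)
```

At p = 0.5 both channels get a gain of about 0.707 (-3 dB), which is what places the phantom image at the center of the inter-channel line for a listener at the sweet spot.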
Now, one exception in my experience of imaging in classical recordings might be the off-stage trumpets in some Mahler and other works, where I don't know whether there was a special mixing trick, whether it came down to the miking, or a combination thereof; perhaps it was just a good timbral cue plus wide reverb, otherwise technically smeared across the 1D stereo line ahead of me with some height from bass cues. Some classical recordings seemed as though they had actually captured and imaged ceiling reflections or the height of the choir loft. Otherwise, I suspect it is good stereo miking and mixing that allows string sections in some recordings to image along realistically diffuse lines rather than being squished into points at the left and right.
Given this, is it fair, if not an existing practice, to use a volume pan of a pink noise signal between the left and right channels to assess a stereo system's imaging accuracy and coherence? Here, "accuracy" to me implies that a linear pan should produce the perception of a linear rate of motion of the pink noise source along a line from one channel to the other, with no variations in height. "Coherence" then refers to all frequencies within said pink noise being perceived as imaging from the same point. That is, in an anechoic chamber, I would expect all frequencies to image from the same single moving point during the pan, at least as far as one's HRTF and localization faculties allow (e.g. barring any physiological asymmetries or HRTF "edge cases" the brain hasn't adapted to, if that's even a thing). If one hears different noise bands shifting up or down, or lagging or leading, this produces what I call "imaging incoherence": for music, it could cause the same instrument or sound to image from multiple directions as different parts of its spectrum interact with your HRTF differently, or cause different parts of the mix to take on spatial height variations that are not necessarily intentional or accurate. From my experience, this can be caused by errors in the HRTF measurement or binaural decoder implementation (see https://www.head-fi.org/threads/rec...-virtualization.890719/page-121#post-18027627 (post #1,812)), by sometimes inescapable issues with how the headphones interact with your ears (e.g. without DSP correction, I almost always hear the treble imaging high), or, for real speakers, by room reflections skewing the image, such as causing some vocals to image higher and left of center whereas they image perfectly ahead through a headphone simulation of anechoic stereo speakers or in an actual treated room.
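The test signal I'm describing can be sketched as follows: mono pink noise, volume-panned linearly from hard left to hard right under a constant-power law. This is a minimal illustration with NumPy; the pink noise uses Paul Kellet's well-known IIR approximation, and the function names, duration, and sample rate are my own choices:

```python
import numpy as np

def pink_noise(n, seed=0):
    """Approximate pink (1/f) noise by IIR-filtering white noise
    (Paul Kellet's economy filter), normalized to +/-1 peak."""
    white = np.random.default_rng(seed).standard_normal(n)
    b0 = b1 = b2 = 0.0
    out = np.empty(n)
    for i, w in enumerate(white):
        b0 = 0.99765 * b0 + w * 0.0990460
        b1 = 0.96300 * b1 + w * 0.2965164
        b2 = 0.57000 * b2 + w * 1.0526913
        out[i] = b0 + b1 + b2 + w * 0.1848
    return out / np.max(np.abs(out))

def panned_pink_sweep(fs=48000, seconds=10.0):
    """Mono pink noise volume-panned linearly from hard left to hard
    right with a constant-power (sine/cosine) pan law.
    Returns a float array of shape (n_samples, 2): [left, right]."""
    n = int(fs * seconds)
    mono = pink_noise(n)
    pan = np.linspace(0.0, 1.0, n)      # 0 = hard left, 1 = hard right
    theta = pan * np.pi / 2
    left = mono * np.cos(theta)
    right = mono * np.sin(theta)
    return np.stack([left, right], axis=1)
```

Played over speakers from the sweet spot (or through a binaural speaker simulation), an "accurate" system per my definition would render this as a single point moving at a constant rate along the inter-channel line; any bands wandering in height, or lagging/leading the rest, would be the "incoherence" I describe.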
https://www.audiosciencereview.com/...out-headphone-measurements.18451/post-2016279 (post #1,278) documents how I hear pink noise pans, and hence the imaging of music, through nigh all my headphones regardless of shape, size, frequency response, cost, or reputation. This is the "terrible imaging imperfection" I hear through headphones, the same as I probably heard through the Stax SR-X9000 and Sennheiser HE-1 when I encountered them. Along the lines of https://www.head-fi.org/threads/can-poor-soundstage-and-imaging-be-fixed-digitally.949757/, personalized HRTF measurements and binaural head-tracking DSP allow me, with the click of a button, to switch to exquisitely coherent and linear imaging.