@sander99 wow. Here’s the deal, and on its face it is not that complicated, even if the details of actual measurement may be beyond reach at the moment. Many audio enthusiasts, myself included, claim we hear repeatable changes in the soundstage by changing just one component: a DAC, amplifier, phono cartridge, even cables. To objectivists, this is nonsense because these differences are not revealed in bench testing.
The question is this: can we devise a non-human measurement system (microphones, ADC, software) to identify in-room changes in sound localization in a stereophonic recording resulting from a swap of just one piece in the electronic reproduction chain ahead of the speakers, holding all other variables constant, including the room?
I think the researchers at MIT are pursuing a course of study that could provide the basis for this kind of measurement. What are others' thoughts about this specific question?
kn
I think @sander99's post brought up most of the relevant parameters.
Sound localization is fairly simple in design. It's our ears working as two microphones, and then the brain using differences it has managed to correlate with seen positions (because sight is what the brain trusts most, even though we think we're seeing one image, not upside down, without gaps where the optic nerves are, or a nose in the picture, so most def, true real reality ^_^), to infer that the next time those types of cues are in the sound, it means the sound is coming from over there.
The main audio elements are frequency response, both global (for elevation, or for big distances "losing" high frequencies along the way) and the differences between ears, along with interaural time delays.
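For what it's worth, those two interaural cues are straightforward to estimate from a two-channel capture. A minimal sketch assuming NumPy; the signal, delay, and gain values are made up purely for illustration:

```python
import numpy as np

def interaural_cues(left, right, fs):
    """Estimate the interaural time difference (ITD, via the cross-correlation
    peak) and interaural level difference (ILD, via RMS ratio) between the
    two channels of a capture. Positive ITD means the left channel lags."""
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)     # lag of left relative to right
    itd = lag / fs                               # in seconds
    rms = lambda x: np.sqrt(np.mean(x ** 2))
    ild = 20 * np.log10(rms(left) / rms(right))  # in dB, positive = left louder
    return itd, ild

# A source off to the left: the right channel arrives ~0.5 ms later, ~3 dB down.
fs = 48_000
rng = np.random.default_rng(0)
sig = rng.standard_normal(fs // 10)              # 100 ms noise burst
delay = 24                                       # 24 samples = 0.5 ms at 48 kHz
left = sig
right = 0.7 * np.concatenate([np.zeros(delay), sig[:-delay]])
itd, ild = interaural_cues(left, right, fs)      # itd ≈ -0.5 ms, ild ≈ +3.1 dB
```

A noise burst is used on purpose: with a pure tone, the cross-correlation has a peak every period, so the time-difference estimate would be ambiguous.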
We can add secondary cues coming from reverb, as that can help place something in a room or even help define the room itself. It's not very accurate, but we can still get a feeling about the general size or if maybe one wall is very close to the sound source. We might want to call that soundstage, or not. It's hard to even agree on what soundstage is on the forum.
Beyond that, the cues become either very minute and not considered relevant relative to the other cues (again, like what sander99 suggested), or they go beyond hearing entirely. Vision can affect what we hear, and in ways sneakier than my ultra-obvious scenario for defining positions in space. That's why blind tests are a must for listening tests, where listening means sound and only sound.
Then comes the issue of playback. Speakers and headphones do not offer stereo the way we hear during the day from single sound sources around us. And now the problem of measurement or prediction becomes really complex, because it's not about how humans locate sound, but how humans will locate sound that is partially incorrect, with added or removed cues. We do have some fairly consistent variables leading to fairly consistent subjective results in those scenarios, like panning on a stereo speaker system, which works surprisingly well for everybody (I think) even though the time delay is not at all what it should be. But other "unnatural" changes from playback systems can affect different people differently. One example would be how, for some listeners, there will never exist a perception of a sound source at some distance right in front of them when listening to headphones. To the side, that's possible, and while the perceived distance will change from person to person, it can exist. But for a select group, mono on headphones feels stuck inside the head, and that's it.
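As an aside, the level-only speaker panning mentioned above is usually described by the classic constant-power pan law. This toy version is just an illustration of the idea, not any particular console's or DAW's implementation:

```python
import math

def constant_power_pan(pan):
    """Constant-power pan law. pan in [-1, 1]: -1 = hard left, 0 = center,
    +1 = hard right. The squared gains always sum to 1, so total power
    (and roughly the perceived loudness) stays constant as the image moves."""
    theta = (pan + 1.0) * math.pi / 4.0   # map [-1, 1] onto [0, pi/2]
    return math.cos(theta), math.sin(theta)

l, r = constant_power_pan(0.0)   # center: both gains ~0.707, i.e. -3 dB each
```

Note that this fakes position with level differences alone; the interaural time delays a real off-axis source would produce are simply absent, which is exactly the point being made above.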
Then there are people like myself, who lose all sense of distance with headphones the moment they move their head.
Obviously, you cannot expect a machine to set a standard on what is not constant for all listeners. We need a model, and whichever model gets selected, it will not match the experience of some people. That's why we tend to say there is no way to measure soundstage: beyond being a made-up thing inside one's head, it is likely to be more or less significantly different for other listeners, for very many reasons.
For sound localization, under the condition that we know the listener and that he only has audio cues to work with (blind listening and no intel about the gear), we can absolutely measure his HRTF and get a model for where he should perceive sound A. And that works amazingly well in practice, so long as we're ready to spend the time to get all the required custom measurements for the listener and the playback rig. So in that specific way, we do manage to measure sound localization and, depending on definition, soundstage.
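The rendering half of that idea boils down to convolving a mono source with a per-ear head-related impulse response (HRIR). A minimal NumPy sketch, with toy two-tap HRIRs standing in for real per-listener, per-direction measurements:

```python
import numpy as np

def render_binaural(mono, hrir_left, hrir_right):
    """Place a mono source in space by convolving it with the left- and
    right-ear head-related impulse responses measured for one listener
    and one source direction."""
    return np.convolve(mono, hrir_left), np.convolve(mono, hrir_right)

# Toy HRIRs standing in for real measurements of a source off to the left:
# the right ear hears the sound later (a delayed tap) and quieter.
hrir_l = np.array([1.0])
hrir_r = np.zeros(25)
hrir_r[24] = 0.6                  # 24 samples = 0.5 ms at 48 kHz, ~4.4 dB down
rng = np.random.default_rng(1)
mono = rng.standard_normal(1000)  # stand-in for any mono signal
left, right = render_binaural(mono, hrir_l, hrir_r)
```

Real measured HRIRs are of course much longer and encode the full frequency shaping of the head, pinnae, and torso, not just one delay and one gain; the structure of the computation is the same.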
The general conclusion from work on exactly that is that to feel right, we "just" have to simulate something that feels right, i.e. copy the cues reaching the listener's ears, including the frequency and timing characteristics related to the shape and size of his head (and torso).
If you go ask such a programmed model how much the anti-jitter design of DAC XXX or the apodizing filter of DAC YYY affects soundstage, the model will probably respond that they aren't relevant. Because that stuff shouldn't affect the relevant localization cues in a meaningful way, it probably won't ever factor into the simulating model, the same way the color of your shirt that day won't factor into your perception of soundstage. That's not to say it cannot affect you and your impressions; it's just that it's not in the model.
If we wished for an exhaustive model, then we'd need to know how salty your last meals were, how your day went, how hot the water was when you cleaned your ears in the shower, how much you like DAC XXX, how convinced you are that jitter ruins soundstage, what color your shirt is, etc. There is virtually no end to the complexity of such a model, because there is nearly no end to what can influence a human brain. IMO that's the real issue here, along with different people simply being different people and getting a different experience from life in general.
Because otherwise, we have extremely impressive models made from recording sound at our ears, and just as impressive simulations of environments made by recording those environments with a "crown" of mics around our head. Very convincing, very realistic, and besides avoiding a crappy playback system, there is no real concern about something like a DAC. I'm sure that in sighted listening we can experience differences, and that even in a blind test we can find a bunch of DACs that, even properly volume matched, will have audible differences leading to a change in our overall interpretation of the spatial cues. It's not like anybody claimed that all DACs have sounded and will forever sound exactly the same. It's just that the differences between OK DACs used properly shouldn't create changes in interaural delays or frequency response, which are the most important cues for localization.

One DAC might have 0.3 dB more at 18 kHz on both channels than some other DAC, and I think that can be enough to change our experience of space and "soundstage". But if we go and conclude that one DAC has a better soundstage just because of that different feeling, I would argue that it's a personal opinion, and kind of a BS one when objectively defining the DACs. Because with my understanding of spatial cues, I get how it can cause a change, but I don't agree that it's an important or even relevant aspect, just like wearing a blue shirt. Just because our brain takes everything in and doesn't know when to keep information separated, it doesn't mean, IMO!!!!, that we should integrate all variables as important for sound localization or soundstage. Less so if we're considering measurements and spatial models. Because an endless list means too many variables to create a useful model, and IDK about you, but a model that's useless doesn't impress me much. ^_^
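For a sense of scale, a level difference in decibels maps to a linear amplitude ratio as 10^(dB/20), so the 0.3 dB bump mentioned above is only a few percent more amplitude:

```python
def db_to_amplitude_ratio(db):
    """Convert a level difference in dB to a linear amplitude ratio."""
    return 10.0 ** (db / 20.0)

ratio = db_to_amplitude_ratio(0.3)   # ≈ 1.035, about 3.5% more amplitude
```

By the same formula, a 6 dB difference is roughly a factor of two in amplitude.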
I'd rather stick to the fundamentals and keep in mind that, indeed, just about anything else, including a vast array of non-audio variables (like moving my head, seeing speakers in the room, watching the artist on TV or closing my eyes, or being hyped when listening to my new audio toy), has the potential to influence my subjective interpretation of an event. Including maybe some DACs. Without a proper listening test, I wouldn't discount a non-audio impact, or just one DAC being louder and tricking me into feeling a bigger stage or something. But I also have no evidence to say that some audio difference (outside of plain volume level) can't be audible between two specific DACs and alter my experience of the audio stage. I think both options are alive, so long as no controlled listening test demonstrates otherwise.