> Ok, so under ideal circumstances we would have listened to the finished track in the same studio with the same setup the mastering engineer used and compared that to the resulting track through the IEM + as similar an audio chain as possible to the original (I imagine the amp would have to be modified a bit to not obliterate the IEM, you know a hell of a lot more about that than I do).

No, that’s just another audiophile myth. It sounds logical on the face of it, but think about it for a moment and it’s nonsensical. Ask yourself what mastering is and why it exists. We have a finished mix from the recording studio(s), so why does it need mastering? If all we’re doing is exchanging the “ideal circumstances” of the recording (mixing) studio for those of the mastering studio, what would be the point of mastering, considering that consumers have the “ideal circumstances” of neither studio? The whole point of mastering is to take the mix that sounded as intended in the recording/mixing studio and alter it so it sounds as intended on the target consumers’ equipment.
> I don't think this is strictly necessary to yield usable information from a reviewer. While the ideal circumstances are not strictly platonic in nature, it gets pretty close practically because most of us don't have the time or inclination to do that. I think there are two practically feasible alternatives to this purist approach.
> 1: Set one transducer as a reference and compare everything else to that. This requires some investment from the audience, but it gives a common point of reference to work off of.
> 2: Use binaural recordings to judge the technical capabilities of the HPs/IEMs. Binaural recordings of everyday noises avoid the problem of artificial spatiality, so that's less to worry about.

Again, “this purist approach” isn’t a purist approach, it’s just an audiophile myth. But to answer your points anyway:
1. But that does not “give a common point of reference to work off”. In the case of speakers, room acoustics have far more effect on the sound reaching the listener’s ears than the performance of the speakers themselves. In the case of HPs/IEMs, individual fit, HRTF and perception have more effect.
2. How do “binaural recordings of everyday noises avoid the problem of artificial spatiality”? The spatiality of binaural recordings is (typically) defined by the timing, level and frequency variations caused by an artificial (dummy) head. So this will only provide a point of reference for those reviewers who happen to have an HRTF similar enough to the dummy head used for the recording.
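To make that dependence on the dummy head concrete, here is a minimal Python sketch of the classic Woodworth spherical-head approximation for the interaural time difference (the function name and the radius values are my own illustrative choices, not anything from this thread). The same source angle produces a measurably different timing cue for different head sizes, which is exactly why one dummy head cannot serve as a universal reference:

```python
import math

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, c=343.0):
    """Interaural time difference from Woodworth's spherical-head formula.

    The ~8.75 cm default is a commonly quoted average head radius; the
    point is that the timing cue scales with head geometry, so a
    dummy-head recording "bakes in" one particular listener.
    """
    theta = math.radians(azimuth_deg)
    return (head_radius_m / c) * (theta + math.sin(theta))

# A source 45 degrees off-centre, for three hypothetical head sizes:
for r in (0.075, 0.0875, 0.10):
    print(f"radius {r * 100:.2f} cm -> ITD {woodworth_itd(45, r) * 1e6:.0f} us")
```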
> I'm saying that people choose a set of tracks to use as testing material and stick to that material across their IEMs. That is enough IMO to sufficiently control the variables of production and provide useful opinions on the IEMs themselves.

That only controls the variability between that/those particular track(s), but it does not control “the variables of production”. The “variables of production” are obviously still there and there’s no reference for what the soundstage/spatiality should be. In addition, the HRTF and listening skills of each individual reviewer are also going to affect their perception, not to mention their biases (which they virtually never control for). Of course, it’s entirely up to you how much weight you give such opinions.
> I was about to type about panning not being just a level difference, but I guess it is if you are using speakers in a room.

“Panning” is just a relatively simple control which sends signal level to one or more channels. It does not know whether those channels correspond to speakers or earphones, so it’s the same regardless of whether you’re using speakers in a room or HPs/IEMs.
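A minimal Python sketch of a standard constant-power pan law illustrates the point (the function name and the use of NumPy are my own choices): the control is nothing but two gain multipliers, with no delay and no frequency-dependent filtering applied to either channel.

```python
import numpy as np

def constant_power_pan(mono, position):
    """Pan a mono signal between two channels.

    position: -1.0 = hard left, 0.0 = centre, +1.0 = hard right.
    Panning only scales the level sent to each channel; it applies
    no delay and no frequency-response change.
    """
    angle = (position + 1.0) * np.pi / 4.0   # map [-1, 1] -> [0, pi/2]
    left = np.cos(angle) * mono              # left-channel gain only
    right = np.sin(angle) * mono             # right-channel gain only
    return left, right
```

A centre-panned signal simply gets ~0.707 (-3 dB) of gain in each channel, so the summed power stays constant as the position moves; whether those channels feed speakers or IEM drivers is irrelevant to the control itself.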
> What I meant by varied filters is to account for FR changes due to distance and positioning. IIRC, higher frequencies decay faster than lower frequencies, so the further away you are from the source, the bassier the sound is.

Sure, but FR changes are only one of numerous variables that change with distance and positioning, for example: the number of initial reflections, the timing of those reflections, the direction of those reflections, the density and duration of the reverb and the overall volume, none of which are accounted for by filters. Furthermore, it’s the differing relative amounts of all these variables (including FR) that create depth in a recording. For example, will a mid-sized room reverb on a particular sound/instrument in the mix cause that sound/instrument to appear further away than another sound/instrument?
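To make the limitation explicit, here is a deliberately naive “distance” filter in Python (the function name and the cutoff mapping are my own arbitrary illustrative choices, not a physical model). It captures the level drop and the high-frequency loss the quote describes, and nothing else: no early reflections, no reverb density or duration, none of the other cues listed above.

```python
import numpy as np
from scipy.signal import butter, lfilter

def naive_distance_cue(signal, fs, distance_m):
    """Crude distance simulation: 1/r level attenuation plus a first-order
    low-pass standing in for high-frequency air absorption.

    The cutoff mapping below is made up purely for illustration. Note
    what is missing: early-reflection count/timing/direction and reverb
    density/duration, i.e. the variables a simple filter cannot capture.
    """
    gain = 1.0 / max(distance_m, 1.0)               # inverse-distance level drop
    cutoff_hz = max(2000.0, 20000.0 / distance_m)   # arbitrary HF roll-off mapping
    b, a = butter(1, cutoff_hz / (fs / 2.0), btype="low")
    return gain * lfilter(b, a, signal)
```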
> Height has its own effect (female vocals moving up or down depending on the 1.5kHz and 2.9kHz response for instance), and panning causing FR changes to each channel based on HRTF, etc. I guess audio engineers don't do this? Or is it applied by software?

There is no height in stereophonic recordings (2-channel stereo, 5.1 or 7.1) and, as mentioned, panning does not affect FR (or anything other than level), so obviously audio engineers don’t do this and neither does software. There are various reasons why height information might be perceived by a listener with stereophonic recordings (when there isn’t any), for example inappropriate ceiling reflections when using speakers, or simply a perceptual error (misinterpretation) when using HPs/IEMs. Why, for example, would an engineer mix a female vocal to move up or down, or to be at a different height/elevation from the rest of the band/ensemble? The exception to the above is potentially binaural recordings and the “immersive” formats (such as Dolby Atmos) that do have height information, which, if converted into binaural, may or may not include HRTF height information (applied by the encoding software).
> I think I understand what you are arguing here. The linear distortion introduced by IR won't matter for timing as much because IR primarily affects post impulse amplitude, so I accept your argument here.

Partially, yes. Differences in positioning/soundstage can be caused by differences in timing between the channels (ears/speakers), but how much of a time differential are we going to get between two nominally identical speakers/earphones? Isn’t the “distortion introduced by the IR” going to be virtually the same for both? Let’s say, for example, we’ve got a speaker/earphone that introduces a hypothetical 1ms delay; the other speaker/earphone in the pair will also add a hypothetical 1ms delay, so we have a timing differential of 0ms.
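A quick numerical check of that arithmetic in Python (the 1ms figure is the hypothetical from above; the noise signal and sample rate are my own illustrative choices). When both channels carry the same delay, the cross-correlation peak, i.e. the inter-channel timing cue, stays at zero lag:

```python
import numpy as np

fs = 48000
rng = np.random.default_rng(0)
x = rng.standard_normal(fs // 10)        # 100 ms of noise as the "source"
d = int(0.001 * fs)                      # hypothetical 1 ms transducer delay

left = np.concatenate([np.zeros(d), x])  # left earphone adds 1 ms
right = np.concatenate([np.zeros(d), x]) # the nominally identical right one adds the same 1 ms

# The lateral-position cue is the lag of the cross-correlation peak:
xcorr = np.correlate(left, right, mode="full")
lag_samples = int(np.argmax(xcorr)) - (len(left) - 1)
print(lag_samples / fs * 1000, "ms")     # 0.0 ms: the common delay cancels out
```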
> For the second part though, those variables apart from the technical performance of the IEM are being controlled for by the juxtapositional analysis if the reviewer is properly doing that, which I will concede is not always in evidence.

I strongly disagree. Reviewers cannot control many of the influential variables anyway, such as their HRTF for example, and they typically don’t even consider arguably the most influential variables (cognitive/perceptual biases), let alone control them. Additionally, extremely few appear to train their listening skills; they just seem to assume that enough expenditure on audiophile products, enough time listening to them and enough passion automatically means they have good/great listening skills. And lastly, very few have any sort of reasonable reference; their only reference is other consumer equipment/environments.
The point I’m trying to make is that, contrary to your assertion, there is no “quantifiable metric” that can be “directly linked” with soundstage perception. There are numerous variables, and reviewers/audiophiles typically ignore (or dismiss) many of the most influential ones, focusing instead on lesser variables or, commonly, on variables so tiny they don’t even affect the sound reproduced, let alone are audible.
G