What makes an IEM "technical"?
Dec 28, 2023 at 5:10 AM Post #16 of 36
Ok, so under ideal circumstances we would have listened to the finished track in the same studio with the same setup the mastering engineer used, and compared that to the resulting track through the IEM plus as similar an audio chain as possible to the original (I imagine the amp would have to be modified a bit to not obliterate the IEM; you know a hell of a lot more about that than I do).
No, that’s just another audiophile myth. It sounds logical on the face of it, but think about it for a moment and it’s nonsensical. Ask yourself what mastering is and why it exists. We have a finished mix from the recording studio/s, so why does this need mastering? If all we’re doing is exchanging the “ideal circumstances” of the recording (mixing) studio for the mastering studio, what would be the point of mastering, considering that consumers do not have the “ideal circumstances” of either the recording studio or the mastering studio? The whole point of mastering is to take the mix that sounded as intended in the recording/mixing studio and alter it so it sounds as intended on the target consumers’ equipment.
I don't think this is strictly necessary to yield usable information from a reviewer. While those ideal circumstances aren't strictly a platonic ideal, in practice they come close to one, because most of us have neither the time nor the inclination to recreate them. I think there are two practically feasible alternatives to this purist approach.

1: Set one transducer as a reference and compare everything else to that. This requires some investment from the audience, but it gives a common point of reference to work off of.
2: Use binaural recordings to judge the technical capabilities of the HPs/IEMs. Binaural recordings of everyday noises avoid the problem of artificial spatiality, so that's less to worry about.
Again, “this purist approach” isn’t a purist approach; it’s just an audiophile myth. But to answer your points anyway:
1. But that does not “give a common point of reference to work off”. In the case of speakers, room acoustics has far more effect on the sound reaching the listener’s ears than the performance of the speakers themselves. In the case of HPs/IEMs, individual fit, HRTF and perception have more effect.
2. How do “binaural recordings of everyday noises avoid the problem of artificial spatiality”? The spatiality of binaural recordings is defined (typically) by timing, level and frequency variations caused by an artificial (dummy) head. So this will only provide a point of reference for those reviewers who happen to have an HRTF similar enough to the dummy head used for the recording.
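To put a number on just one of those cues, here’s a minimal sketch of the interaural time difference using the classic Woodworth spherical-head approximation; the 8.75cm head radius is a textbook average, purely illustrative:

```python
import math

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, c=343.0):
    """Interaural time difference (seconds) for a far-field source,
    using the classic Woodworth spherical-head approximation."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / c) * (theta + math.sin(theta))

# A source 45 degrees off-centre arrives at the far ear roughly
# 0.38 ms later than at the near ear -- the kind of cue a dummy
# head bakes into a binaural recording.
print(f"{woodworth_itd(45) * 1000:.2f} ms")
```

A listener whose head differs from that radius gets systematically different cues, which is exactly the problem.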
I'm saying that people choose a set of tracks to use as testing material and stick to that material across their IEMs. That is enough IMO to sufficiently control the variables of production and provide useful opinions on the IEMs themselves.
That only controls the variability between that/those particular track/s but it does not control “the variables of production”. The “variables of production” are obviously still there and there’s no reference of what the soundstage/spatiality should be. In addition to this, the HRTF and listening skills of each individual reviewer are also going to affect their perception, not to mention their biases (as they virtually never control for these). Of course, it’s entirely up to you how much weight you give such opinions.
I was about to type about panning not being just a level difference, but I guess it is if you are using speakers in a room.
“Panning” is just a relatively simple control which sends signal level to one or more channels. It does not know whether those channels correspond to speakers or earphones, so it’s the same regardless of whether you’re using speakers in a room or HPs/IEMs.
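For illustration, a minimal sketch of a common constant-power pan law; DAWs offer several pan laws, but every one of them is just a pair of gains:

```python
import math

def constant_power_pan(sample, pan):
    """Pan a mono sample into stereo. `pan` runs from -1 (hard left)
    to +1 (hard right). Note it is purely a pair of gains: no timing,
    no filtering, and no knowledge of speakers vs. HPs/IEMs."""
    angle = (pan + 1.0) * math.pi / 4.0   # 0 .. pi/2
    return sample * math.cos(angle), sample * math.sin(angle)

print(constant_power_pan(1.0, 0.0))    # centre: ~0.707 in each channel
print(constant_power_pan(1.0, -1.0))   # hard left: (1.0, 0.0)
```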
What I meant by varied filters is to account for FR changes due to distance and positioning. IIRC, higher frequencies decay faster than lower frequencies, so the further away you are from the source, the bassier the sound is.
Sure, but FR changes are only one of numerous variables that change with distance and positioning, for example: the number of initial reflections, the timing of those reflections, the direction of those reflections, the density and duration of the reverb and the overall volume, none of which are accounted for by filters. Furthermore, it’s the differing relative amounts of all these variables (including FR) that create depth in a recording. For example, will, say, a mid-sized room reverb on a particular sound/instrument in the mix cause that sound/instrument to appear further away than another sound/instrument?
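As a rough sketch (toy numbers, purely illustrative) of how several of these cues have to change together with distance, while a static filter only models the FR part:

```python
import numpy as np

def distance_cues(x, distance_m):
    """Toy sketch: several cues change together with distance, while
    a static EQ filter only models the FR change (toy coefficients)."""
    # 1. FR: one-pole lowpass standing in for air absorption
    #    (highs die off faster with distance).
    a = float(np.exp(-distance_m / 50.0))   # 1.0 at 0 m = no filtering
    y = np.zeros(len(x))
    y[0] = x[0]
    for i in range(1, len(x)):
        y[i] = a * x[i] + (1.0 - a) * y[i - 1]

    # 2. Level: inverse-distance law against a 1 m reference.
    y *= 1.0 / max(distance_m, 1.0)

    # 3. Timing: hypothetical pre-delay of the first reflection,
    #    shrinking as the source moves away towards a wall.
    predelay_s = max(0.001, 0.02 - 0.002 * distance_m)

    # 4. Direct-to-reverb balance: the reverberant field stays roughly
    #    constant while the direct sound falls away.
    wet = min(0.9, distance_m / 10.0)
    return y, predelay_s, wet
```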
Height has its own effect (female vocals moving up or down depending on the 1.5kHz and 2.9kHz response for instance), and panning causing FR changes to each channel based on HRTF, etc. I guess audio engineers don't do this? Or is it applied by software?
There is no height in stereophonic recordings (2-channel stereo, 5.1 or 7.1) and, as mentioned, panning does not affect FR (or anything other than level), so obviously “audio engineers don’t do this” and neither does software. There are various reasons why height information might be perceived by a listener with stereophonic recordings (when there isn’t any), for example inappropriate ceiling reflections when using speakers, or simply a perceptual error (misinterpretation) when using HPs/IEMs. Why, for example, would an engineer mix a female vocal to move up or down, or be at a different height/elevation to the rest of the band/ensemble? The exceptions to the above are potentially binaural recordings and the “immersive” formats (such as Dolby Atmos) that do have height information, which if converted into binaural may or may not include HRTF height information (applied by the encoding software).
I think I understand what you are arguing here. The linear distortion introduced by IR won't matter for timing as much because IR primarily affects post impulse amplitude, so I accept your argument here.
Partially, yes. Differences in positioning/soundstage can be caused by differences in timing between the channels (ears/speakers), but how much of a time differential are we going to get between two nominally identical speakers/earphones? Isn’t the “distortion introduced by the IR” going to be virtually the same for both? Let’s say, for example, we’ve got a speaker/earphone that introduces a hypothetical 1ms delay; the other speaker/earphone in the pair will also add a hypothetical 1ms delay, so we have a timing differential of 0ms.
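Or as trivial arithmetic:

```python
# Any latency common to both transducers cancels out of the
# inter-channel timing cue; only the *difference* shifts the image.
left_delay_ms = 1.0    # hypothetical group delay, left earphone
right_delay_ms = 1.0   # its nominally identical partner
print(left_delay_ms - right_delay_ms)   # 0.0 -> no soundstage shift
```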
For the second part though, those variables apart from the technical performance of the IEM are being controlled for by the juxtapositional analysis if the reviewer is properly doing that, which I will concede is not always in evidence.
I strongly disagree. Reviewers cannot control many of the influential variables anyway, such as their HRTF for example, and they typically don’t even consider arguably the most influential variables (cognitive/perceptual biases), let alone control them. Additionally, extremely few appear to train their listening skills; they just seem to assume that enough expenditure on audiophile products, enough time listening to them and enough passion automatically means they have good/great listening skills. And lastly, very few have any sort of reasonable reference; their only reference is other consumer equipment/environments.

The point I’m trying to make is that, contrary to your assertion, there is no “quantifiable metric” that can be “directly linked” with soundstage perception. There are numerous variables, and reviewers/audiophiles typically ignore (or dismiss) many of the most influential ones and focus on lesser variables or, commonly, on variables so tiny they don’t even affect the reproduced sound, let alone produce an audible difference.

G
 
Dec 28, 2023 at 6:15 AM Post #17 of 36
No, that’s just another audiophile myth. It sounds logical on the face of it, but think about it for a moment and it’s nonsensical. Ask yourself what mastering is and why it exists. We have a finished mix from the recording studio/s, so why does this need mastering? If all we’re doing is exchanging the “ideal circumstances” of the recording (mixing) studio for the mastering studio, what would be the point of mastering, considering that consumers do not have the “ideal circumstances” of either the recording studio or the mastering studio? The whole point of mastering is to take the mix that sounded as intended in the recording/mixing studio and alter it so it sounds as intended on the target consumers’ equipment.
I think you misunderstood my argument. I agree with you on what you are saying here. What I am arguing is that the ideal scenario is to listen to the mastered track on the exact same equipment used to master it in exactly the same conditions and use that as the reference for subsequent comparisons. Once it's mastered, those variables of production no longer matter for a juxtapositional analysis, because that master stays exactly consistent across tests and is therefore controlled for the purpose of evaluation. This is not exactly practical, but ideals don't have to be.
Again, “this purist approach” isn’t a purist approach; it’s just an audiophile myth. But to answer your points anyway:
1. But that does not “give a common point of reference to work off”. In the case of speakers, room acoustics has far more effect on the sound reaching the listener’s ears than the performance of the speakers themselves. In the case of HPs/IEMs, individual fit, HRTF and perception have more effect.
2. How do “binaural recordings of everyday noises avoid the problem of artificial spatiality”? The spatiality of binaural recordings is defined (typically) by timing, level and frequency variations caused by an artificial (dummy) head. So this will only provide a point of reference for those reviewers who happen to have an HRTF similar enough to the dummy head used for the recording.
My wording was vague; I apologize. Reviewers typically choose a set of HPs/IEMs they favor as being the best, by whatever metrics they use, as the reference point, then compare everything else to that. As slapdash as some of them can be, they at least understand that apples should be compared to other apples, not oranges.

Binaural content isn't a perfect fit for everyone; that is not my argument. Perfect reproduction is not required for a juxtapositional analysis; a consistent point of reference is good enough. The biggest factor that interferes with this kind of analysis is fiscal in nature; this happens in the gun review space constantly and I'm sure it happens here too. If the reference is consistent and the person is honest about their impression, generally the information gleaned is low resolution but translatable.
That only controls the variability between that/those particular track/s but it does not control “the variables of production”. The “variables of production” are obviously still there and there’s no reference of what the soundstage/spatiality should be. In addition to this, the HRTF and listening skills of each individual reviewer are also going to affect their perception, not to mention their biases (as they virtually never control for these). Of course, it’s entirely up to you how much weight you give such opinions.
This is coming up consistently, and this is an epistemological error on your part, in my assessment. This argument presumes the master is changing per iteration, in which case production variables do come into play and I would agree. However, if the test track in use does not change, nothing about the mastering of that track changes, unless you are willing to start arguing that digital reproduction is unreliable?
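A minimal sketch of what I mean by the master being frozen; the file name is just a placeholder:

```python
import hashlib

def file_digest(path):
    """SHA-256 of a file's bytes. Two reads of the same master give
    the same digest: the production variables are frozen into the
    file and cannot drift between listening sessions."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical test track:
# assert file_digest("test_track.flac") == file_digest("test_track.flac")
```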
“Panning” is just a relatively simple control which sends signal level to one or more channels. It does not know whether those channels correspond to speakers or earphones, so it’s the same regardless of whether you’re using speakers in a room or HPs/IEMs.

Sure, but FR changes are only one of numerous variables that change with distance and positioning, for example: the number of initial reflections, the timing of those reflections, the direction of those reflections, the density and duration of the reverb and the overall volume, none of which are accounted for by filters. Furthermore, it’s the differing relative amounts of all these variables (including FR) that create depth in a recording. For example, will, say, a mid-sized room reverb on a particular sound/instrument in the mix cause that sound/instrument to appear further away than another sound/instrument?

There is no height in stereophonic recordings (2-channel stereo, 5.1 or 7.1) and, as mentioned, panning does not affect FR (or anything other than level), so obviously “audio engineers don’t do this” and neither does software. There are various reasons why height information might be perceived by a listener with stereophonic recordings (when there isn’t any), for example inappropriate ceiling reflections when using speakers, or simply a perceptual error (misinterpretation) when using HPs/IEMs. Why, for example, would an engineer mix a female vocal to move up or down, or be at a different height/elevation to the rest of the band/ensemble? The exceptions to the above are potentially binaural recordings and the “immersive” formats (such as Dolby Atmos) that do have height information, which if converted into binaural may or may not include HRTF height information (applied by the encoding software).
That makes sense. Thanks for the insight.
I strongly disagree. Reviewers cannot control many of the influential variables anyway, such as their HRTF for example, and they typically don’t even consider arguably the most influential variables (cognitive/perceptual biases), let alone control them. Additionally, extremely few appear to train their listening skills; they just seem to assume that enough expenditure on audiophile products, enough time listening to them and enough passion automatically means they have good/great listening skills. And lastly, very few have any sort of reasonable reference; their only reference is other consumer equipment/environments.
I won't attempt to argue on their behalf; it was a pain for me to try to understand and extract useful information from their content.

The one aspect that did help was reference sets, which allowed me to connect their lexicon to actual phenomena and find my way to my endgame without buying an unholy amount of IEMs. I still think this argument and its premise are the weakest of your contentions, because you are discounting the value of anchoring phenomena outside of practically unattainable knowledge; it's a platonic ideal and a bit dismissive of the vast calculative potential of the unconscious.
The point I’m trying to make is that, contrary to your assertion, there is no “quantifiable metric” that can be “directly linked” with soundstage perception. There are numerous variables, and reviewers/audiophiles typically ignore (or dismiss) many of the most influential ones and focus on lesser variables or, commonly, on variables so tiny they don’t even affect the reproduced sound, let alone produce an audible difference.

G
This puzzled me the first time you said it. Let's ignore reviewers and draw out this argument to its logical conclusion. Soundstage perception has no causal link to any quantifiable metric? Not even theoretically quantifiable metrics? So taken to the opposite extreme, we can take a horribly distorted set of transducers with an incomprehensible, even nonexistent, tuning strategy with horrible phase errors and an awfully long IR and said transducers will not affect spatial information in any way?

This position is incomprehensible to me given your expertise. The main goal of science is to map out causal relationships between phenomena and create functional field theories and subsequently models with utility. The fact that we are even able to have this conversation, let alone enjoy a hobby like audiophilia, is thanks to deciphering previously incomprehensible maps of causality. I think it is incontrovertible that causal relationships between the objective and subjective exist if we are able to achieve consistent results.
 
Dec 28, 2023 at 6:39 AM Post #18 of 36
Thanks for your explanation! I think this clears up my doubt very well. Based on my understanding, by the word "technical" people may be referring to things like the IEM's ability to reproduce fast sections of songs (e.g. fast EDM, double-kick sections in heavy metal), etc.
Good to know I could help.

Yeah, that part is right too, I think. IME, EDM is pretty good at highlighting a set's strengths and weaknesses because of how clean the synthetic sounds can be.

This track is one I use for evaluation. Distortion and impulse response become pretty obvious on clean bass drops and beeps like these.
 
Dec 28, 2023 at 8:56 AM Post #19 of 36
What I am arguing is that the ideal scenario is to listen to the mastered track on the exact same equipment used to master it in exactly the same conditions and use that as the reference for subsequent comparisons.
Which is somewhat different from what I’m arguing. Firstly, “the exact same equipment” is irrelevant; pretty much any consumer equipment (source, DAC, amp, cables, etc.) will provide the same result, the only exceptions being some pathological equipment (certain tube amps/DACs for example) and the transducers. However, “exactly the same conditions” is relevant, e.g. levels, room acoustics, listening position.
Once it's mastered, those variables of production no longer matter for a juxtapositional analysis, because that master stays exactly consistent across tests and is therefore controlled for the purpose of evaluation.
For a purely “juxtapositional analysis” that would be true, but I don’t recall ever seeing a purely “juxtapositional analysis”; reviews/impressions typically also include an “evaluation”. Ignoring the variables of perception, HRTFs, etc., for the sake of argument: a review typically won’t just state something like IEM “A” has a wider soundstage, more depth, separation, etc., than IEM “B”, it will also present an evaluation, for example that IEM “A” is therefore better than IEM “B”. In this case “those variables of production” do matter, because there is no reference. Maybe the actual soundstage should be narrower, with less depth and separation, and therefore IEM “B” is better than IEM “A” (because IEM “A” is over-hyping these qualities), which is the opposite of the assertion!
Reviewers typically choose a set of HPs/IEMs they favor as being the best, by whatever metrics they use, as the reference point, then compare everything else to that. As slapdash as some of them can be, they at least understand that apples should be compared to other apples, not oranges.
But that’s the thing: their reference point is not “a set of HPs/IEMs”, it’s a set of HPs/IEMs with their personal ears, HRTF, biases, preferences and listening skills. With the exception of the actual HPs/IEMs, all of those things are likely to be different for different consumers, and those last three things are likely to be different even for that same individual reviewer! So they’re not comparing apples to apples.
If the reference is consistent and the person is honest about their impression, generally the information gleaned is low resolution but translatable.
Unless they carefully control their listening tests (which they virtually never do), their reference is not consistent and, as mentioned above, their reference isn’t a reference; it’s an arbitrary perception of another consumer product. Additionally, they’re pretty much never “honest about their impression”. Not that they’re deliberately lying (although that’s not uncommon); just that they consistently make assertions about differences in sound that they heard, without ascertaining whether there actually were any differences in the sound or whether they actually heard them (as opposed to just imagining differences).
The one aspect that did help was reference sets, which allowed me to connect their lexicon to actual phenomena and find my way to my endgame without buying an unholy amount of IEMs.
TBH, that’s never allowed me “to connect their lexicon to actual phenomena”, because much of their lexicon is not related to actual phenomena; it’s related to imaginary properties, biases and perceptual errors which don’t actually exist.
Soundstage perception has no causal link to any quantifiable metric?
Correct. Soundstage perception is a perception, in the case of recording/reproducing audio recordings it’s a manufactured illusion that is influenced by numerous different variables and the relationship between them. There is no metric (system or standard of measurement) for this illusion.
Not even theoretically quantifiable metrics?
None that exist or that I can think of. One would need to completely deconstruct a mix, measure all the individual variables applied and then come up with some metric that encompasses them all and relates to our perception of the stereophonic illusion. Maybe AI will come up with something in the future, but I can’t see how this could be theoretically achieved currently.
So taken to the opposite extreme, we can take a horribly distorted set of transducers with an incomprehensible, even nonexistent, tuning strategy with horrible phase errors and an awfully long IR and said transducers will not affect spatial information in any way? This position is incomprehensible to me given your expertise.
But that is not my position! I have clearly stated that FR is one of numerous variables that can influence the illusion/perception of soundstage, and as distortion changes the FR, obviously significant amounts of distortion can (under certain conditions) change our perception of the soundstage.
The main goal of science is to map out causal relationships between phenomena and create functional field theories and subsequently models with utility.
True, and that’s what the field of psychoacoustics has been attempting for ~130 years, but it’s not yet complete. We only have usable models for some of the most basic, simplistic aural illusions/perceptions (such as loudness for example), which only have a few variables, not for complex illusions/perceptions such as soundstage that have numerous variables. Obviously we know the variables and can manipulate them; that’s what sound/music engineers do and how we manufacture the soundstage you perceive when reproducing our recordings. But it’s done by reference to our own subjective perception, through trial and error, and there’s no metric for it.
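For what it’s worth, loudness illustrates just how basic those usable models are; e.g. the standard rule-of-thumb phon-to-sone relationship is a single-variable power law:

```python
def sones(phons):
    """Stevens' rule of thumb: perceived loudness roughly doubles
    for every 10 phon increase above 40 phon (defined as 1 sone)."""
    return 2.0 ** ((phons - 40.0) / 10.0)

print(sones(60.0))   # ~4 sones: about 4x as loud as a 40 phon tone
```

Nothing remotely like that single-variable mapping exists for soundstage.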

G
 
Dec 28, 2023 at 9:44 AM Post #20 of 36
I think I'm starting to see where the disconnect is. This is also too abstract, so let's use a practical example.

I was where the OP is 1 year 7 months ago. I didn't really understand anything about how different IEMs were from HPs, so I started by looking at the portable audio section of Head-Fi for places to start. I started with the Aria, since that was the most widely praised and reviewed budget IEM. I then started watching reviewers, learned about squig.link and how to use it, and looked for information on how to interpret graphs. That led to the Harman IE target and its methodology, HRTF in general, and how to use an EQ in depth.

Reviewers were unintelligible at the start because they all used arbitrary words that I didn't have any reference for. To fix that, I listened to the Aria for a while and started mapping what reviewers were saying about it onto the FR graph of the Aria, using a diagram of how instruments line up with the audible spectrum. This did two things: I learned a bit about my preferences, and I started mapping out each reviewer's preferences on the Aria.

I worked my way up the budget ranges from there, adding to my understanding as I went along, and for the most part I began to be able to predict what reviewers would think about a set before I watched the video (except Zeos, that dude is all over the place lol). At this point, I knew what I wanted and bought my QDC V14, so now I'm just watching for anything actually revolutionary that piques my interest.

It's not all arbitrary and completely detached from reality; there is a method to the madness that is the human experience. I think you are very skeptical of human beings, and while I sympathize with that, people tell the truth a lot more than you think. Many times you have to ignore what they are saying to get to what they really mean.
 
Dec 28, 2023 at 11:50 AM Post #21 of 36
It's not all arbitrary and completely detached from reality; there is a method to the madness that is the human experience.
It’s not all completely arbitrary and detached from reality but it is more detached and arbitrary than most audiophiles seem to realise.
I think you are very skeptical of human beings, and while I sympathize with that, people tell the truth a lot more than you think.
Yes, I am very sceptical of human beings. Firstly, that’s a requirement of science and effectively why it exists: if we could just trust human beings’ opinions/assertions, we wouldn’t need science or the scientific method. Secondly, it’s not really about people telling the truth; people can tell the truth as they see/believe it but still be wrong. People are frequently misinformed, ignorant or fooled about all sorts of things, and this is especially common in the audiophile world because it’s so driven by marketing misinformation. Lastly, my whole working life has been dedicated to manipulating people’s emotions/perceptions through sound, first as a professional musician, then as a sound/music engineer. So I’m very well versed in how easily and comprehensively the human perception of sound/hearing can be manipulated and fooled, often without people having any idea they’re being fooled.

G
 
Dec 28, 2023 at 12:16 PM Post #22 of 36
What makes an IEM "technical"?


Dunno, perhaps a shrill treble? I generally move on to the graph and measurements if they are listed; if not, I don't bother reading any further, as flowery prose may fill the allocated word count for the article/video but is meaningless.
 
Dec 28, 2023 at 12:40 PM Post #23 of 36
@gregorio
I'll say this on people: there is a reason, rooted in cognitive psychology, that you listen to what people do before what they say. Humans make a lot more sense through that heuristic; they will shout the truth of what they think and believe through everything but their words. It takes a specific type of person, or a lot of training in self-deception, to change that, and thankfully most people are not that type and do not train that way.
 
Dec 28, 2023 at 1:02 PM Post #24 of 36
No, that’s just another audiophile myth. It sounds logical on the face of it, but think about it for a moment and it’s nonsensical. Ask yourself what mastering is and why it exists. We have a finished mix from the recording studio/s, so why does this need mastering? If all we’re doing is exchanging the “ideal circumstances” of the recording (mixing) studio for the mastering studio, what would be the point of mastering, considering that consumers do not have the “ideal circumstances” of either the recording studio or the mastering studio? The whole point of mastering is to take the mix that sounded as intended in the recording/mixing studio and alter it so it sounds as intended on the target consumers’ equipment.
I remember reading in an interview somewhere, which I will never be able to relocate, that laptop speakers are one of the most common ways people listen to music. I'm unsure if it was meant to be taken literally; even so, this unattributed quote communicates a general idea. Commercial digital plugins also allow the audio engineer to emulate what their track will sound like via the major streaming services.
 
Dec 29, 2023 at 2:06 AM Post #25 of 36
Commercial digital plugins also allow the audio engineer to emulate what their track will sound like via the major streaming services.
Not as far as I’m aware. This can be quite a large set of variables: the streaming music services all apply loudness normalisation, but there’s no standard, so they each apply different amounts. This can be defeated in some cases but not others, the playback software may also apply some sort of processing, and some/many of the services will apply variable bit rates depending on the service user’s connection quality. That’s quite a set of variables, any or all of which are liable/likely to change, and the services usually don’t publicise when they implement changes or exactly what those changes/settings/variables are, which would make it difficult for a plugin developer to stay current. I’m not saying there definitely isn’t such a plugin, just that it would be difficult to keep it current and I’m not aware of one.
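As a toy sketch of why: the normalisation targets below are illustrative placeholders (not any service’s published figures), and real normalisation uses LUFS per ITU-R BS.1770 (K-weighting plus gating), not the plain RMS used here:

```python
import numpy as np

# Illustrative placeholder targets -- the services neither standardise
# on one value nor publicise every change, which is exactly the problem.
TARGETS_DB = {"service_a": -14.0, "service_b": -16.0, "service_c": -11.0}

def rms_db(x):
    """Crude level estimate; real normalisation uses LUFS per
    ITU-R BS.1770 (K-weighting plus gating), not plain RMS."""
    return 20.0 * np.log10(np.sqrt(np.mean(np.square(x))) + 1e-12)

def normalise(x, target_db):
    return x * 10.0 ** ((target_db - rms_db(x)) / 20.0)

track = 0.25 * np.sin(2 * np.pi * 440 * np.arange(48000) / 48000.0)
for service, target in TARGETS_DB.items():
    print(service, round(rms_db(normalise(track, target)), 1))
```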

Playing back through laptop or even mobile phone speakers is not uncommon. A mastering engineer may use some crappy laptop speakers as a reference when creating a master for a streaming service, although that would be more for checking than for changing the master too much to accommodate such speakers. A mastering engineer may also check the mix with HPs/IEMs, as these are also commonly used with streaming services, especially if mixing to Dolby Atmos. However, there’s no guarantee of any of this; a mastering engineer may only use their main monitors, or it may vary according to what different masters the client orders.

G
 
Jan 5, 2024 at 12:01 PM Post #26 of 36
With things like sound imaging and the like, I think it's easier to draw the causal link. Width, layering, isolation of individual sounds in a track, and all that would at least be influenced by, if not defined by, the transducer's THD levels and impulse response, would it not? The original signal has all that data in it ideally, so added distortion from a lower quality transducer would obscure that information and subsequently lead to a perception of "fuzziness" in the sound because the extraneous compression/rarefaction of the diaphragm(s) would add energy to the signal that shouldn't be there.
The ER4XR's average THD is 0.8~1.6%, yet many state it sounds just as clear as the ER4SR and planars that are <0.4% THD-wise.
 
Jan 5, 2024 at 2:46 PM Post #27 of 36
The ER4XR's average THD is 0.8~1.6%, yet many state it sounds just as clear as the ER4SR and planars that are <0.4% THD-wise.
Well, if we are talking anecdotally, I hear a very clear difference between my 4XR and my V14, or even the Orchestra Lite, on which I can detect a difference down to 0.1% and 0.5% THD respectively (the Orchestra Lite probably is a bit lower, since they advertise 0.3%).
 
Jan 26, 2024 at 3:25 AM Post #28 of 36
Well, if we are talking anecdotally, I hear a very clear difference between my 4XR and my V14, or even the Orchestra Lite, on which I can detect a difference down to 0.1% and 0.5% THD respectively (the Orchestra Lite probably is a bit lower, since they advertise 0.3%).
The very high detail of the ER4XR could be unmasking recording noise that a DD would mask, with its slow decay smearing the midrange. I've been using BA IEMs for so long that I find even dynamic speakers quite dirty sounding.
 
Jan 26, 2024 at 10:49 AM Post #29 of 36
Well, if we are talking anecdotally, I hear a very clear difference between my 4XR and my V14, or even the Orchestra Lite, on which I can detect a difference down to 0.1% and 0.5% THD respectively (the Orchestra Lite probably is a bit lower, since they advertise 0.3%).
You have to be kidding.
 
Jan 26, 2024 at 11:10 AM Post #30 of 36
You have to be kidding.
For what reason?

Using pure sine waves and artificially generating THD makes it easier to isolate and listen for distortion, so the theory is that as you start at 5% THD and step down, the difference is noticeable if the driver is capable of rendering that difference. The inherent THD associated with the driver will add a bit, so it's not an ideal test, but it's a practical demonstration of how capable a transducer is at particular discrete frequencies.
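A minimal sketch of that kind of test signal, assuming the added distortion is a single 2nd harmonic (in which case THD is just the amplitude ratio):

```python
import numpy as np

def tone_with_thd(freq_hz, thd_percent, sr=48000, seconds=2.0):
    """Fundamental plus a single 2nd harmonic whose amplitude ratio
    equals the requested THD (with one harmonic, THD = A2 / A1)."""
    t = np.arange(int(sr * seconds)) / sr
    a2 = thd_percent / 100.0
    x = np.sin(2 * np.pi * freq_hz * t) + a2 * np.sin(2 * np.pi * 2 * freq_hz * t)
    return x / np.max(np.abs(x))   # normalise to avoid clipping

# Step down from 5% and listen for where the added harmonic vanishes
# (the transducer's own THD sets a floor under the test).
for pct in (5.0, 2.0, 1.0, 0.5, 0.1):
    clip = tone_with_thd(1000.0, pct)
```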

Using that method, the best I can discern is down to 0.1%; below that is inaudible. The generator goes down to 0.001%, which seems academic to me.
 
