At least as far as my question goes, I think you guys are thinking way too deep here.
Let's take two different IEMs: the ER4 and the W3 (or IE8). I don't even want to get into which is better or more accurate, as that is not the issue at hand. The ER4 sounds very quick in attack, maybe too quick to stay natural. The latter two seem NOT necessarily slow, but slower in attack and decay, so the presentation sounds like it's in a bigger room or hall rather than wired directly to the brain, which can be fatiguing.
So what is it about the design or tuning that makes this difference? And, yes, I have to assume it is intentional and not arbitrary.
Maybe an even easier example is the UM3X versus the W3. Both are triple-driver designs with about the same amount of bass. The UM3X is a stage monitor and sounds wired directly to the brain with little "space", whereas the W3 sounds like it's about 10 rows back and is generally more pleasurable to my ears.