Eke, I think the truth is somewhat more complicated than you address below. 'Soundstage' is neither a coloration nor a 'voicing option'. Soundstage is the result of proper mic technique when recording an acoustic performance. Much of the information about the size and ambience of the recording space is captured in the low-level detail of a well-done recording. Mask or miss that detail (can you say "lossy compression"?) and that info is gone...
There are also some clever studio engineering tricks where subtle "adjustments" can simulate soundstage, and you address this quite nicely below. In other words, not every recording has 'great soundstage', great bass, or great anything. If every recording does sound that way, your system has rose-colored glasses on; it may be fun, and you may enjoy it, but it is adding distortion. I know it would be incredibly tedious to read (or write) a review where every sentence included the words 'if the recording allows it'... but I sometimes get the feeling that, as you mention below, people often lose sight of that. On the other hand, I sometimes think you underplay what you call voicing differences...
I'm referring mainly to soundstage in terms of variations between top tier IEMs.
Here's an example of what I'm talking about: an excerpt from the 1Plus2 vs. ASG-2 comparison I did a while back. Note the bolded areas.
When I first listened to the 1Plus2, I was immediately struck by how similar the tonal balance was to the Beyerdynamic DT880 that I owned for a couple of months. Both phones share a slightly boosted low end, and mids that are scooped just enough to showcase an amazing treble response. The treble itself works wonders for the overall sound, bringing clarity, detail emphasis, and changes in timbre that are either negative or positive depending on the song.
To me, though, the real hallmark of the 1Plus2 is its soundstage and imaging. The stage is about the same size as that of the ASG-2, but the 1P2 has the sound signature to take advantage of that space, creating a starker image thanks to the increased treble and lessened bass response. The scooped midrange also moves the vocals far enough back to allow sound cues to dance around the stage more.
Note weight is decidedly lighter on the 1P2 than on the ASG-2. I've outlined the benefits of this already, but there are also downsides. For instance, I like a good bit of rock music, and I like my distortion and electric guitars full, crunchy, and powerful, toms deep and impactful, and vocals that have emotion. I don't get this with the 1P2, at least not vs the ASG-2. Guitars sound remarkably less full, and the song loses the engagement factor for me.
I went on to test the soundstage of both IEMs using Chesky's Explorations in Space and Time binaural album (seriously, you need to buy it now if you don't already own it). The test confirmed for me that both IEMs are equal in soundstage dimensions, but the 1P2's change in sound sig goes a long way towards creating a starker image.
As for an explanation of detail, look no further than the Etymotic ER4S's tuning. It exposes so much in a recording, yet it does it with one single balanced armature driver. No fancy multi-BAs, no hybrid design. Look no further than its raw FR graph for the explanation. Hint: it's that massive boost in the upper midrange.
There needs to be a change in the whole paradigm of "lossy" compression. In practice, nothing you can hear is being compressed away; a good encoder only removes data from the parent file that is useless to the end consumer. Again, see the RAW vs. JPG debate once post-processing is complete. These hi-res files contain space that is left there for more scalpel-like precision when it comes to mastering: you can make changes without affecting nearby vital bits. For instance, you can alter the pitch of one note without altering the next. That is what these large files are for. Realistically and scientifically, there is absolutely no sonic benefit of a FLAC file over a properly encoded MP3 file.
Now, when you start to go below 320 kbps, you begin to compress away necessary bits of the file. At 128 kbps, you've effectively trimmed way too much.
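To put rough numbers on how much data each bitrate keeps, here's a quick sketch that assumes a CD-quality source (44.1 kHz, 16-bit, stereo PCM). It only does the arithmetic; it says nothing about what's audible. FLAC's own size is left out on purpose, since lossless compression ratios vary with the material. The function names are just made up for illustration.

```python
# Assumed source format: CD-quality PCM (44.1 kHz, 16 bits per sample, 2 channels)
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2


def pcm_kbps(rate=SAMPLE_RATE_HZ, bits=BITS_PER_SAMPLE, channels=CHANNELS):
    """Bitrate of uncompressed PCM in kilobits per second."""
    return rate * bits * channels / 1000


def reduction_ratio(target_kbps):
    """How many times smaller a stream at target_kbps is than the PCM source."""
    return pcm_kbps() / target_kbps


def megabytes_per_minute(kbps):
    """kbps * 60 s = kilobits; / 8 -> kilobytes; / 1000 -> megabytes (decimal)."""
    return kbps * 60 / 8 / 1000


if __name__ == "__main__":
    source = pcm_kbps()
    print(f"PCM source: {source:.1f} kbps, {megabytes_per_minute(source):.2f} MB/min")
    for br in (320, 256, 192, 128):
        print(f"{br} kbps: {reduction_ratio(br):.2f}:1 smaller, "
              f"{megabytes_per_minute(br):.2f} MB/min")
```

A 320 kbps file keeps about 1/4.4 of the raw CD data rate, while 128 kbps keeps about 1/11, so the encoder has far less room to work with at the lower bitrate.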
A more visual example: imagine that a FLAC file is a thick cut of NY strip steak. A master chef comes in and carefully trims the fat off the meat. This is the resulting 320 kbps file.
If you let a janitor come in and try to do the same trimming, the steak obviously won't be to the same standard as the one created by the master chef. This is the difference between encoders.
I'll be bookmarking this post for future talks, as I don't know how many times I can say the same thing.
Sources: Over a decade of computer experience, as well as a CCNA certification that taught me how digital files are packaged (headers, body, etc.).