A bit late reply, but what does soundstage look like in waterfall measurements?
The brain determines the position of a sound source by three methods:
1) Timing of the sound arriving at each ear.
Left/right is determined by the minute delay between the sound arriving at your left and right ear (a sound from the left reaches the left ear first). Distance is judged mainly from loudness and from the fact that air absorbs high frequencies more strongly than low ones, so a far-away sound arrives duller. (All frequencies travel at essentially the same speed in air; it is the attenuation, not the arrival time, that changes with distance.)
This method can determine left/right and rough distance, but not up/down or front/back.
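The left/right timing cue can be put into numbers with Woodworth's classic spherical-head approximation of the interaural time difference (ITD). The head radius and speed of sound below are typical textbook values, not measurements of any particular listener:

```python
import math

SPEED_OF_SOUND = 343.0   # m/s in air at roughly 20 °C
HEAD_RADIUS = 0.0875     # m, average adult head (assumed value)

def itd_seconds(azimuth_deg: float) -> float:
    """Woodworth's spherical-head approximation of the interaural
    time difference for a distant source.
    0 deg = straight ahead, 90 deg = fully to one side."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

# A source fully to one side gives roughly 0.66 ms -- about the
# largest delay the brain ever has to resolve.
print(round(itd_seconds(90) * 1000, 2))  # → 0.66 (ms)
```

Sub-millisecond differences like this are why timing accuracy in the transducer matters at all for localisation.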
2) Directional absorption characteristics of your body
Your upper body, head, outer ear (pinna) and ear canal all absorb and reflect frequencies differently depending on the direction the sound is coming from. The pinna attenuates frequencies arriving from behind and reflects (boosts) frequencies arriving from the front; together these direction-dependent filters make up your head-related transfer function (HRTF).
This is the main method for determining front/back and up/down.
3) Change in sound after tilting your head
If you hear something and want to pinpoint the source you instinctively tilt your head to get a more precise location.
Number 3) is rarely addressed in headphones/IEMs, since it needs a head tracker plus 3D sound information. Maybe Dolby Atmos will develop into something like that in the future.
Number 2) depends heavily on the listener's body. Binaural recordings and digital effects usually simulate a simple generic head model, which is good enough for most people.
Most music is mixed to be played on speakers, where your whole body shapes the sound because the speakers sit in front of you. With headphones only your outer ear influences the sound (which is why many modern headphones use angled drivers). With IEMs you don't even have that, so you have to rely on positional cues being mixed into the music itself.
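One common trick to fake the speaker situation on headphones is crossfeed: bleed a delayed, attenuated, low-passed copy of each channel into the opposite ear, since in a room both ears hear both speakers. A minimal sketch (the delay, gain and cutoff values below are illustrative, not taken from any particular product):

```python
import numpy as np

def crossfeed(left, right, fs=48000, delay_ms=0.3, gain=0.4, cutoff_hz=700):
    """Minimal crossfeed sketch: each ear also receives a delayed,
    quieter, duller copy of the opposite channel, roughly as it
    would from a pair of speakers in front of the listener."""
    d = int(fs * delay_ms / 1000)

    def bleed(x):
        # crude one-pole low-pass, then delay and attenuate
        a = np.exp(-2 * np.pi * cutoff_hz / fs)
        y = np.empty_like(x)
        acc = 0.0
        for i, s in enumerate(x):
            acc = (1 - a) * s + a * acc
            y[i] = acc
        return gain * np.concatenate([np.zeros(d), y[:-d]])

    return left + bleed(right), right + bleed(left)
```

With a hard-panned input (signal only in the left channel), the right output now carries a softer, delayed copy — which is exactly what makes headphone panning feel less "inside the head".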
So we’re left with number 1), timing accuracy. And that, to some extent, you can see on a waterfall chart.