I reckon the reason the soundstage is diminished in MP3 vs FLAC is the loss of the "air"... frequencies above 16-18 kHz. The ear and brain use many tricks to position things around you, and two sounds that are just ever so slightly offset left-right need higher frequency content to be differentiable. A big part of it is the phase offset of the sound between the two ears: if a sound is closer to one ear than the other, the waveform arrives earlier at the closer ear, so the rise of the wave entering that ear happens before the rise at the other. If this is a 100 Hz sound like a subwoofer, there is not much difference between the levels of the wave at the two ears when the sound source is moved a few degrees over, because the waves are so long. This is why sub position in a room doesn't matter much, e.g. you don't hear that there is only one sub, off to the left. Positioning a guitar, especially its rotation (the head resonating at high frequencies while the heavier body resonates lower), needs very high frequency content for you to make out whether it is rotating in the guitarist's grasp or moving a few degrees over with each strum. High frequencies have a lot to do with imaging, and there is no hard limit on how high the content needs to go. It simply increases the spatial resolution of the soundstage.
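To put rough numbers on the subwoofer point, here is a quick sketch of the interaural time/phase difference using the simple far-field d*sin(theta) model. The 343 m/s speed of sound and ~17 cm ear spacing are my assumed figures, not from the post:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at ~20 C (assumption)
EAR_SPACING = 0.17      # m, rough distance between the ears (assumption)

def itd_seconds(azimuth_deg):
    """Interaural time difference for a distant source, simple d*sin(theta) model."""
    path_diff = EAR_SPACING * math.sin(math.radians(azimuth_deg))
    return path_diff / SPEED_OF_SOUND

def phase_offset_deg(azimuth_deg, freq_hz):
    """Phase difference between the two ears at a given frequency."""
    return (itd_seconds(azimuth_deg) * freq_hz * 360.0) % 360.0

# The same 5-degree offset barely shifts the phase of a 100 Hz
# subwoofer tone between the ears, but shifts a 10 kHz tone by
# a large chunk of a full cycle:
print(phase_offset_deg(5, 100))    # a degree or two of phase
print(phase_offset_deg(5, 10000))  # well over 100 degrees of phase
```

So for the same angular offset, the bass wave looks nearly identical at both ears while the treble wave is clearly shifted, which is the sense in which high frequencies carry the positional information.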
The wavelength in normal air of a 20 kHz wave is about 17 mm. That means when you turn your head so that one ear is 1 mm closer to the sound source (visualize this for a moment: moving one ear 100 mm closer turns your world by roughly 45 degrees, so 1 mm is about half a degree of rotation), your ears and brain have to detect that the peaks (or zero crossings, or any identifiable point) of the wave at each ear are shifted by about 3 microseconds. For a 20,000 Hz wave, this means one ear gets the peak of the wave (100% amplitude) while the other is still approaching the peak, or dropping away from it, at about 93% of peak amplitude. I am not sure whether the brain/ears can sense this level difference accurately in time. If the source is not half a degree but two degrees off center, the wave is at zero in one ear and at +100% in the other, which is a more realistic case. And what if it were about eight degrees off centre, so the shift is a full wavelength and the peak hits both ears at the same time? Well, that is why pure sine waves are difficult to localize in space. For real-world sounds, the crude positioning is done with lower frequencies, where the offset is unambiguous, and finer and finer positioning is done with higher frequency content.
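The arithmetic above can be checked in a few lines. This reproduces the ~17 mm wavelength, the ~3 microsecond shift for a 1 mm path difference, and the ~93% figure (which falls out as the cosine of the phase shift, i.e. an amplitude ratio); 343 m/s is my assumed speed of sound:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumption)

freq_hz = 20_000.0
wavelength_mm = SPEED_OF_SOUND / freq_hz * 1000.0  # ~17.15 mm

path_diff_mm = 1.0  # one ear 1 mm closer to the source
time_shift_us = path_diff_mm / 1000.0 / SPEED_OF_SOUND * 1e6  # ~2.9 microseconds

# If the closer ear sits exactly on the peak, the far ear sees the
# wave a fraction of a cycle away from its peak:
phase_rad = 2 * math.pi * path_diff_mm / wavelength_mm
relative_amplitude = math.cos(phase_rad)  # ~0.93, i.e. about 93% of peak
```

A quarter-wavelength shift (about 4.3 mm, roughly the two-degrees-off-center case) makes `relative_amplitude` drop to zero, and a full-wavelength shift brings it back to 1.0, which is exactly the sine-wave ambiguity described above.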
The sampling rate, or rather the temporal accuracy/resolution of the brain areas processing this difference in phase, limits the spatial resolution of your environment (or of a recording). So either this high-frequency content has to exist to give a higher spatial resolution, or your brain/ears have to be super sensitive to a roughly one percent difference in instantaneous wave height between the two ears for a lower-frequency sound.
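One way to see how temporal resolution caps spatial resolution: psychoacoustics commonly cites listeners resolving interaural time differences down to roughly 10 microseconds. Plugging that into the same d*sin(theta) model (again with my assumed 343 m/s and ~17 cm ear spacing) gives the smallest detectable angle from timing alone:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumption)
EAR_SPACING = 0.17      # m (assumption)
ITD_THRESHOLD = 10e-6   # s; a commonly cited rough figure, not from the post

# Smallest azimuth change whose path difference produces a
# just-detectable time shift between the ears:
min_angle_deg = math.degrees(
    math.asin(SPEED_OF_SOUND * ITD_THRESHOLD / EAR_SPACING)
)  # roughly 1 degree of azimuth
```

So a timing resolution of ~10 microseconds corresponds to roughly a degree of azimuth resolution; sharper timing, or equivalently higher-frequency phase cues, is what buys a finer-grained soundstage.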