Jesus H Christ, if you could be any denser we would have been swallowed by a black hole already...
Look (listen), this is 3D = Positional = Spatial Audio:
This is exactly what maybe 5 games do today and what most games did 10 years ago (if you had the proper hardware). It's object-based audio, meaning that the position of every sound is reported as accurately as possible, then mixed down to stereo via HRTF (head-related transfer function) algorithms directly by the game's audio engine, elevation included.
Most other games, however, use channel-based = surround audio. Meaning this:
A 2D circle around the player's character with sounds distributed between the discrete channels; notice the lack of elevation. For example, a sound coming from a 110° angle is somewhere between the left surround speaker and the left back speaker; by playing the LS speaker slightly louder than the LB speaker you get a better sense of direction. That's when you're using a physical 7.1 setup (which does not include elevation, but if you have a fancy Atmos setup you can enable it in BF1). If you only have a pair of speakers or headphones, again, 99% of modern games apply a simple downmix like this:
Whatever's on the left of the red line goes to the left channel, everything else goes to the right, and the center is played on both sides at the same time. I am not pulling this out of my bottom, it's how it works.
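If it helps to see it in numbers, here is a rough Python sketch of both steps: panning a sound between two surround speakers, then the naive stereo downmix. The speaker angles (LS at 90°, LB at 150°), the constant-power pan law, the channel names and the `center_gain` parameter are all my assumptions for illustration; no particular game engine is guaranteed to do it exactly this way.

```python
import math

# Assumed 7.1 speaker azimuths in degrees (0 = straight ahead, positive = left).
# Real layouts vary; these two values are illustrative only.
SPEAKER_ANGLES = {"LS": 90.0, "LB": 150.0}

LEFT_CHANNELS = ("FL", "LS", "LB")
RIGHT_CHANNELS = ("FR", "RS", "RB")


def pan_between(angle, a_name="LS", b_name="LB"):
    """Constant-power pan of a source at `angle` between two speakers."""
    a, b = SPEAKER_ANGLES[a_name], SPEAKER_ANGLES[b_name]
    t = (angle - a) / (b - a)       # 0.0 at speaker a, 1.0 at speaker b
    t = min(max(t, 0.0), 1.0)       # clamp to the arc between the two speakers
    return {
        a_name: math.cos(t * math.pi / 2),
        b_name: math.sin(t * math.pi / 2),
    }


def downmix_to_stereo(channels, center_gain=1.0):
    """Naive downmix: left-side channels to L, right-side to R, center to both.

    LFE is ignored here to keep the sketch short.
    """
    center = center_gain * channels.get("C", 0.0)
    left = sum(channels.get(name, 0.0) for name in LEFT_CHANNELS) + center
    right = sum(channels.get(name, 0.0) for name in RIGHT_CHANNELS) + center
    return left, right


gains = pan_between(110.0)          # LS ends up louder than LB
stereo = downmix_to_stereo(gains)   # everything collapses into the left channel
```

The point to notice: after the downmix, a sound that was only in FL and a sound that was only in LB give the exact same stereo output, so the front/back information is simply gone.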
Now, here's what the GSX and other VSS (virtual surround sound) solutions do (and what Atmos for Headphones and Windows Sonic don't, but are supposed to): they present themselves as physical 7.1 interfaces to fool games into outputting surround. Then the 8 (or fewer) discrete channels are mixed into stereo using HRTF algorithms, but this time by recreating a virtual room with virtual speakers in it to mimic a physical setup (because there is no other information available). It's far from perfect, but until every game integrates real 3D audio it's better than nothing.
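As a rough idea of what "virtual speakers" means in practice, here is a minimal Python sketch: every surround channel gets filtered with a pair of impulse responses (one per ear) for its virtual speaker position, and the results are summed into two ear signals. The function names are hypothetical and the impulse responses in the usage lines are toy values I made up, not measured HRTF data; a real VSS also adds room reverb, which is left out here.

```python
def convolve(signal, kernel):
    """Plain direct-form convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def virtualize(channels, hrirs):
    """Mix surround channels to two ears through per-speaker impulse responses.

    channels: channel name -> list of samples
    hrirs:    channel name -> (left_ear_ir, right_ear_ir), hypothetical data
    """
    if not channels:
        return [], []
    length = max(
        len(sig) + max(len(hrirs[name][0]), len(hrirs[name][1])) - 1
        for name, sig in channels.items()
    )
    left = [0.0] * length
    right = [0.0] * length
    for name, sig in channels.items():
        ir_left, ir_right = hrirs[name]
        for i, v in enumerate(convolve(sig, ir_left)):
            left[i] += v
        for i, v in enumerate(convolve(sig, ir_right)):
            right[i] += v
    return left, right


# Toy example: a click on the left surround channel reaches the left ear
# immediately and the right ear two samples later and quieter.
toy_hrirs = {"LS": ([1.0], [0.0, 0.0, 0.4])}
ear_left, ear_right = virtualize({"LS": [1.0, 0.0]}, toy_hrirs)
```

The delay and level difference between the two ears is the kind of cue that lets your brain place the virtual speaker; the quality of a VSS mostly comes down to how good those impulse responses are for your head.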
Note that the exact same virtual room is applied to a stereo signal when VSS is enabled, but then you only get the two front speakers and the center, plus some crossfeed (meaning you hear a bit of the left channel in your right ear, and vice versa) and reverb. There is no fricking way it can pull the rest of the channels out of a simple stereo signal. Some games with "enhanced stereo" (like BF1) also use crossfeed, but no HRTF.
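For reference, crossfeed in its most stripped-down form looks something like the Python sketch below. The `amount` parameter and its default are assumptions on my part, and real crossfeed implementations usually also delay and low-pass the bled signal, which I skip here.

```python
def crossfeed(left, right, amount=0.3):
    """Bleed a fraction of each channel into the opposite ear.

    `amount` is a hypothetical mix level, not a value taken from any product.
    """
    out_left = [l + amount * r for l, r in zip(left, right)]
    out_right = [r + amount * l for l, r in zip(left, right)]
    return out_left, out_right


# A sound that exists only in the left channel now leaks into the right ear.
out_left, out_right = crossfeed([1.0, 0.0], [0.0, 0.0], amount=0.25)
```

Two channels go in and two channels come out: it softens the hard left/right split, but it adds no information about where a sound was behind or above you.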
As for why the pros only use stereo, that could be explained by the fact that some VSS solutions are pure garbage and that a generic HRTF doesn't work for everyone. Plus, once they're used to something, they don't want to change, like old people.