Let's back up and think about what the limitations of stereo are and what virtual surround sound does. Stereo has only 2 channels, which creates a fundamental limitation: you can only have 2 distinct directions, left and right. To tell front from back, you have to turn in game and listen for the change in channel balance, because you don't have those directions natively.
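To make the limitation concrete, here is a minimal sketch of a constant-power stereo pan. It assumes the stereo mix encodes only the left/right component of a sound's direction; real games may downmix differently, and the function name and angle convention are just for illustration.

```python
import math

def stereo_pan_gains(azimuth_left_deg):
    """Return (left_gain, right_gain) for a source at the given azimuth.

    Azimuth is measured from straight ahead, positive toward the left:
    0 = front, 90 = directly left, 180 = directly behind.
    Assumption: the stereo mix keeps only the left/right component.
    """
    # +1 = fully left, -1 = fully right, 0 = centered (front OR back)
    lateral = math.sin(math.radians(azimuth_left_deg))
    # Map lateral position to a pan angle: 0 = full left, pi/2 = full right
    angle = (1.0 - lateral) * math.pi / 4.0
    return math.cos(angle), math.sin(angle)

front_left = stereo_pan_gains(30)    # 30 degrees left, in front
back_left = stereo_pan_gains(150)    # 150 degrees left, behind
print(front_left, back_left)
```

Under this assumption the two positions produce the same pair of gains, so the only way to tell them apart is to turn and listen for how the balance shifts.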
With 7.1 virtual surround sound for headphones, you take the 7.1 audio from the game and convert it to a binaural mix for headphones. This means you take into account the effects of the outer ear, head and room, and how they affect one ear relative to the other depending on where the sound is coming from. For example, a sound 30 degrees to the left will hit the left ear first, get certain frequencies amplified by the left outer ear because of the angle, get its high frequencies cut by the head as it passes to the right side, hit the right ear second with a different sound profile because of what it's gone through, then bounce off the room and hit the right side again with yet another sound profile. There is a pattern there that encodes the direction, and it is distinct for 30 degrees left vs 150 degrees left. This information is absent in regular stereo headphone listening, which is why there can't be a distinction between those different directions: there is just one left on stereo.
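As a rough illustration of the "hits the left ear first" part, here is a sketch using Woodworth's spherical-head approximation for the arrival-time difference between the ears. The head radius and speed of sound are assumed typical values, and the model deliberately ignores the outer ear and the room.

```python
import math

HEAD_RADIUS_M = 0.0875    # assumed average head radius
SPEED_OF_SOUND = 343.0    # m/s in air at roughly room temperature

def interaural_time_difference(azimuth_left_deg):
    """Woodworth spherical-head estimate of the arrival-time gap, in seconds.

    Azimuth: 0 = front, 90 = directly left, 180 = directly behind.
    A positive result means the left ear hears the sound first.
    """
    az = math.radians(azimuth_left_deg)
    # A sphere is front/back symmetric, so fold rear angles onto the front
    # (150 degrees behaves like 30 degrees in this model).
    lateral = math.asin(math.sin(az))
    return (HEAD_RADIUS_M / SPEED_OF_SOUND) * (lateral + math.sin(lateral))

itd_30 = interaural_time_difference(30)
itd_150 = interaural_time_difference(150)
print(f"30 deg left:  {itd_30 * 1e6:.0f} microseconds")
print(f"150 deg left: {itd_150 * 1e6:.0f} microseconds")
```

In this simplified model the delay comes out at about 0.26 ms for both 30 and 150 degrees left, so timing alone can't separate front from back. The difference has to come from the outer ear filtering and room reflections described above, which is the part a binaural mix adds and plain stereo leaves out.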
The entire system where you are subconsciously cross-referencing the same sound as it travels over time and changes as it hits different structures is missing with normal stereo headphone listening, and that absence is responsible for the poor imaging. That's not something you can fix just by getting better headphones; you have to use a virtual surround sound system to add that information back digitally.
Here is a demonstration where the lack of information in stereo causes problems in tracking an object (at 2m42s).
Because there isn't a distinction between front and back in stereo, there are multiple plausible ways the object could be moving, and there isn't a way to know which is correct. Only with virtual surround sound (baked into the video) do you have enough information to know how the object is moving.