Correct me if I am wrong on how head tracking works here:
Head tracking will make the monitor the fixed centre point of the audio and keep it there, regardless of how you move your head/body - the same way a 5.1 speaker system would.
Hold a pen out flat just in front of your nose and move your head while keeping the pen stationary - the sound will stay in place like the pen does.
Normal headphone VSS makes an area somewhere in front of your nose the fixed centre point. That point moves with your head.
Hold the pen touching your nose this time. When you move your head, move the pen with it. The sound will move in relation to your head, like the pen does.
If a character in a game or movie standing in front of you is talking to you while looking directly at you - that sound will be coming from the centre line of the monitor.
With head tracking that voice will remain stationary while you turn your head (YOUR HEAD, NOT YOUR CHARACTER'S HEAD).
i.e. if you turn to look at a corner of your lounge room, the voice will still be coming from the centre of the monitor.
With VSS that voice will remain in line with your nose, no matter where you turn your head (your head again, not your character's).
i.e. if you turn to look at a corner of your lounge room, the voice will now sound like it's coming from the corner of your room, not the monitor.
Both make use of HRTF - head tracking takes it a step further and locks the sound's position so it is fixed in the room, instead of fixed to your head.
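To put rough numbers on it, here is a minimal Python sketch of the difference as I understand it. It is only an illustration, not how any particular headset or driver actually does it: the function names and the sign convention (angles in degrees, 0 = straight at the monitor, positive = to your right) are my own assumptions.

```python
def wrap_degrees(angle):
    """Wrap an angle into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def angle_fed_to_hrtf(source_room_deg, head_yaw_deg, head_tracking):
    """Direction of the sound relative to your nose, which is what the HRTF renders.

    source_room_deg: where the sound sits in the room (0 = the monitor).
    head_yaw_deg: how far your real head is turned (positive = to the right).
    """
    if head_tracking:
        # Compensate for the head turn so the sound stays put in the room.
        return wrap_degrees(source_room_deg - head_yaw_deg)
    # Plain VSS ignores the head turn: the sound stays fixed relative to the nose.
    return wrap_degrees(source_room_deg)


def apparent_room_direction(source_room_deg, head_yaw_deg, head_tracking):
    """Where in the room the sound seems to come from."""
    relative = angle_fed_to_hrtf(source_room_deg, head_yaw_deg, head_tracking)
    return wrap_degrees(relative + head_yaw_deg)


# A voice at the centre of the monitor, with your head turned 45 degrees right.
for tracking in (True, False):
    print(tracking,
          angle_fed_to_hrtf(0.0, 45.0, tracking),
          apparent_room_direction(0.0, 45.0, tracking))
```

With tracking on, the voice is rendered 45 degrees to the left of your nose, so it still seems to come from the monitor. With tracking off it is rendered dead ahead, so it seems to come from the corner you are looking at.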
edit: as you're usually looking directly at the screen anyway, the focal points will be very similar in both cases for the most part. I'm hoping the tracking and the "micro movements" of my head help the VSS fool my brain even better, so the cues resolve more clearly and accurately in my head.