There's no magic involved. The middleware already has the exact positional information for every voice.
It could do the same things as CMSS-3D (elevation, for example). That kind of processing is just too CPU-bound, so it ends up less precise and often only does the most basic stuff like Doppler, panning, matrix etc. and relies more on prebaked audio (Dead Space is a strong example here). Some Xbox 360 games occupied a whole core just for audio, so it was inevitable that the Xbox One would get a dedicated audio chip. The current consoles have kept that standard all these years. It's easier for multiplats, easier for porting to PC, and causes fewer driver problems or hiccups than DS3D etc.
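To give an idea of how cheap the "basic stuff" is once you have positions and velocities for a voice, here is a minimal Python sketch of constant-power panning and a Doppler pitch factor. This is not how any particular middleware does it; the function names, the -90 to +90 degree azimuth convention and the 0.9c clamp are my own assumptions for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, roughly dry air at 20 degrees C


def constant_power_pan(azimuth_deg):
    """Return (left_gain, right_gain) for an azimuth in degrees.

    Assumed convention: -90 is hard left, 0 is centre, +90 is hard right.
    The gains satisfy left^2 + right^2 == 1, so loudness stays constant.
    """
    azimuth_deg = max(-90.0, min(90.0, azimuth_deg))
    angle = (azimuth_deg + 90.0) / 180.0 * (math.pi / 2.0)
    return math.cos(angle), math.sin(angle)


def doppler_factor(source_pos, source_vel, listener_pos, listener_vel):
    """Return the pitch multiplier for a moving source and listener.

    Positions and velocities are 3D tuples in metres and metres/second.
    """
    offset = [s - l for s, l in zip(source_pos, listener_pos)]
    distance = math.sqrt(sum(d * d for d in offset))
    if distance == 0.0:
        return 1.0  # source sits on the listener, no defined direction
    direction = [d / distance for d in offset]  # unit vector listener -> source

    # Speed of the listener towards the source, and of the source away
    # from the listener, both measured along the connecting line.
    listener_towards = sum(v * d for v, d in zip(listener_vel, direction))
    source_away = sum(v * d for v, d in zip(source_vel, direction))

    # Assumed safety clamp so the denominator can never reach zero.
    limit = 0.9 * SPEED_OF_SOUND
    listener_towards = max(-limit, min(limit, listener_towards))
    source_away = max(-limit, min(limit, source_away))

    return (SPEED_OF_SOUND + listener_towards) / (SPEED_OF_SOUND + source_away)


left_gain, right_gain = constant_power_pan(0.0)
approaching = doppler_factor((10.0, 0.0, 0.0), (-34.3, 0.0, 0.0),
                             (0.0, 0.0, 0.0), (0.0, 0.0, 0.0))
```

That's a handful of multiplies per voice, which is why this part survives even on a tight CPU budget while elevation and proper HRTF processing get dropped.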
A few titles like Killzone went beyond that standard and did things like wave tracing. Killzone's game audio is among the best you can get. Killzone Shadow Fall uses as much RAM as an entire Xbox 360 has just for audio (per the DF tech analysis).
With the new generation the bar will be raised and we'll get more CPU-intensive audio, which is not a problem for the PC platform. There's the technical potential and then there's the designer's talent. All the horsepower is useless if the designer is mediocre. See plenty of movies, blockbusters included.
Middleware like FMOD, Miles etc. could start working with binaural algorithms, FIR filters and HRTF measurements to offer something for the headphone craze, so that in-game headphone modes behave like CMSS-3D or better. Then we wouldn't need DHP, SBX etc. anymore.
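For anyone wondering what "FIR filters plus HRTF measurements" boils down to: you convolve the mono voice with a left-ear and a right-ear impulse response picked for the source direction. Below is a rough Python sketch of just that step. The impulse responses are toy numbers I made up (not measured HRTF data), the function names are hypothetical, and a real engine would use measured sets, interpolate between directions and do the convolution with FFTs instead of this naive loop.

```python
def fir_filter(signal, taps):
    """Direct-form FIR convolution.

    Output length is len(signal) + len(taps) - 1.
    """
    out = [0.0] * (len(signal) + len(taps) - 1)
    for n, sample in enumerate(signal):
        for k, tap in enumerate(taps):
            out[n + k] += sample * tap
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono voice into a (left, right) headphone pair."""
    return fir_filter(mono, hrir_left), fir_filter(mono, hrir_right)


# Toy impulse responses for a source on the listener's right.
# Made-up values for illustration only, NOT measured HRTF data:
# the right ear hears the sound first and louder, the left ear
# hears it three samples later and quieter (head shadow).
TOY_HRIR_RIGHT_EAR = [0.9, 0.3, 0.1]
TOY_HRIR_LEFT_EAR = [0.0, 0.0, 0.0, 0.5, 0.2, 0.1]

click = [1.0, 0.0, 0.0, 0.0]  # a single-sample click as the test voice
left, right = render_binaural(click, TOY_HRIR_LEFT_EAR, TOY_HRIR_RIGHT_EAR)
```

The time and level difference between the two outputs is what the brain reads as direction. The cost is the filter length times the sample rate, per voice and per ear, which is exactly the CPU load that made developers fall back on plain panning.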