Indeed. Leaving aside whether the measurements were well done (not something one can necessarily assume), they can fail to give an accurate portrait or a fair comparison for two reasons: 1) the measured differences are inaudible, for instance well below the noise floor (don't laugh), and 2) the measurements are time/frequency averages that miss subtler aspects of sound dynamics that matter in actual perception. Standard measurements can detect gross weaknesses, but once a DAC measures well within the audible range, they provide no further information for choosing between DACs.
I reckon with #2 you've hit upon an important point missed in most arguments about audibility. There are, in my mind, two levels of "audible". The first is what I call "overt" audibility, that is, signals above -95dB, where it is possible to make out clear differences to varying degrees. The second deals with what you mention: aspects of the signal from which our brain extracts spatial and other information, but which lie below what we can readily discern in, say, short A/B tests.