I disagree with audioengr. Metrics that are not perceptually weighted are of limited utility. Blind listening tests are the answer, but only if conducted properly.
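One part of conducting a blind test properly is checking that the score could not plausibly come from guessing. Here is a minimal sketch of that check, assuming each trial is a forced choice between two alternatives, so a guessing listener is right half the time. The function name and the 12-of-16 figures are made up for illustration; they are not results from any actual test.

```python
from math import comb

def p_value_from_guessing(correct, trials):
    """Chance of getting at least `correct` right out of `trials` by pure guessing.

    Assumes two alternatives per trial (50% chance each) and independent trials.
    """
    favorable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favorable / 2 ** trials

# Hypothetical session: 12 correct answers out of 16 trials.
p = p_value_from_guessing(12, 16)
print(f"p = {p:.4f}")
```

A small p means guessing is an unlikely explanation for the score. It says nothing about whether the recording, headphones and HRTFs were good enough to reveal a difference in the first place.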
Of the information the brain gets from music, the part most sensitive to distortion is spatial, not tonal. Thus one should use a good binaural recording with very high quality headphones, since non-binaural recording and playback geometry guarantees that the spatial information is ruined and the soundfield cannot be recreated properly at the ears.
A couple of years ago I was playing around with opamps, and I noticed that chaining multiple opamps in series (unity gain) didn't seem to cause an audible effect, until I paid careful attention to imaging (it was a binaural recording): there was a slight reduction in my ability to localize the sound sources.
Conducting a blind test this way is difficult. One problem is that binaural recordings are made with HRTFs different from the listener's, and even small variations affect auditory localization, a testament to how incredibly sensitive spatial information is to distortion. An alternative is to use a convolver plugin in the player application and HRTF data from the database; there are sets from many subjects there, so one can usually find a set among those that works well for them.
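For anyone wondering what the convolver actually does: in the time domain an HRTF set is a pair of impulse responses (HRIRs), one per ear, and the plugin convolves the source signal with each of them. Below is a rough sketch of that operation in plain Python. The function names are my own, and the toy impulse responses are invented numbers, not data from any HRTF database. A real plugin would use FFT-based convolution on measured responses.

```python
def convolve(signal, impulse_response):
    """Direct (time-domain) linear convolution of two sample sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def binaural_render(mono, hrir_left, hrir_right):
    """Filter one mono source through a left/right HRIR pair.

    Returns (left, right) sample lists for headphone playback.
    """
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Toy example (made-up values): the right ear hears the source
# two samples later and quieter, as if it sat off to the left.
mono = [1.0, 0.5, 0.25]
hrir_left = [1.0, 0.0, 0.0]
hrir_right = [0.0, 0.0, 0.6]
left, right = binaural_render(mono, hrir_left, hrir_right)
```

The delay and level difference between the two outputs are the kind of cues the ear uses for localization, which is why a set measured on a different head can shift or blur the image.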
Unfortunately, I'd have to say most headphones are immediately disqualified when one looks at their terrible response curves. Electrostatics like the Stax Omega 2 are low distortion but badly in need of equalization. The ultimate driver for such testing would be the Plasmasonic headphones; nothing I've seen in the speaker or headphone world comes close to these curves: