Measuring distortion is the best and only reliable way to test headphones. Period.
Comparing 2D visualizations is so primitive and inconclusive that it is no wonder manufacturers continue to sell magical wires and tubes. Comparing devices is a computational problem, in the same way that valuing bonds is for banks and calculating risk is for insurance companies. Humans can't come close to doing a decent job - so I can't understand why so many insist on doing it. It is time for more audiophiles to embrace 21st century digital testing methodologies.
Distortion is the central measurement
Distortion, meaning variation in time and amplitude relative to the source signal, is the only measure that matters.
There are just two basic measurement steps:
- Digitally sample a source signal S and output from a measured headphone M at a sufficiently high clock rate to reliably detect variances in both the amplitude axis and time axis.
- Represent the resulting distortion as a surface D: the difference between S and M along both the amplitude axis and the time axis.
All further investigation and analysis is performed on distortion surface D.
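To make the two steps concrete, here is a rough sketch in Python. It takes the simplest possible reading of D, a sample-by-sample difference M - S, which is my assumption and not the only way to build the surface. The 48 kHz clock rate, the 1 kHz test tone, and the "headphone" (5% quieter, 20 microseconds late) are all made up for illustration; a real rig would capture M from a microphone or coupler.

```python
import math

def sample_signal(func, sample_rate_hz, duration_s):
    """Sample func(t) at a fixed clock rate and return the amplitudes as a list."""
    n = round(sample_rate_hz * duration_s)
    return [func(i / sample_rate_hz) for i in range(n)]

def distortion(source, measured):
    """Sample-by-sample difference D[i] = M[i] - S[i]."""
    if len(source) != len(measured):
        raise ValueError("S and M must have the same number of samples")
    return [m - s for s, m in zip(source, measured)]

SAMPLE_RATE_HZ = 48_000  # assumed clock rate
TONE_HZ = 1_000          # assumed test tone

def source_tone(t):
    return math.sin(2 * math.pi * TONE_HZ * t)

def headphone_output(t):
    # Hypothetical device: 5% quieter and 20 microseconds late.
    return 0.95 * math.sin(2 * math.pi * TONE_HZ * (t - 20e-6))

S = sample_signal(source_tone, SAMPLE_RATE_HZ, 0.01)
M = sample_signal(headphone_output, SAMPLE_RATE_HZ, 0.01)
D = distortion(S, M)

# One crude summary number for the whole of D.
rms = math.sqrt(sum(d * d for d in D) / len(D))
print(f"{len(D)} samples, RMS distortion {rms:.4f}")
```

Note that a plain difference like this mixes the amplitude error and the timing error into one number. Separating the two is the job of the analysis stage.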
Various types of regression analysis of the distortion surface D create a distortion coefficient used to compare devices. Algorithms that penalize time axis errors but forgive amplitude errors (a.k.a. frequency response variations) will be needed to properly flag 'accurate' versus 'colored' devices. This type of supervised machine learning is commonplace in fields as diverse as medicine and robotics. In these industries, error rates can be 30% or higher. Audio measurement - where error rates are in the single digits - is child's play by comparison.
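Here is one way such a coefficient could look, as a toy sketch rather than a finished algorithm. It estimates the time error as the cross-correlation lag between S and M, fits a single gain by least squares (about the simplest regression there is), and then weights the time error far more heavily than the amplitude error. The function names, the weights (10 versus 1), the tone-burst test signal, and the two "devices" are all hypothetical, and a single gain is only a crude stand-in for a full frequency response.

```python
import math

def best_lag(source, measured, max_lag):
    """Lag in samples at which measured lines up best with source (cross-correlation)."""
    def corr(lag):
        idx = range(max(0, -lag), min(len(source), len(measured) - lag))
        return sum(source[i] * measured[i + lag] for i in idx)
    return max(range(-max_lag, max_lag + 1), key=corr)

def fitted_gain(source, measured, lag):
    """Least-squares gain g minimizing sum((measured[i + lag] - g * source[i]) ** 2)."""
    idx = range(max(0, -lag), min(len(source), len(measured) - lag))
    num = sum(source[i] * measured[i + lag] for i in idx)
    den = sum(source[i] ** 2 for i in idx)
    return num / den

def distortion_coefficient(source, measured, max_lag=10,
                           time_weight=10.0, amp_weight=1.0):
    """Weighted score: time errors are penalized harder than amplitude errors."""
    lag = best_lag(source, measured, max_lag)
    gain = fitted_gain(source, measured, lag)
    return time_weight * abs(lag) + amp_weight * abs(gain - 1.0)

def delayed(signal, delay, gain=1.0):
    """Hypothetical device model: scale by gain and delay by whole samples."""
    return [0.0] * delay + [gain * x for x in signal[:len(signal) - delay]]

# Test signal: a short tone burst, so the correlation has one clear peak.
S = [math.exp(-((i - 100) / 20) ** 2) * math.sin(2 * math.pi * i / 16)
     for i in range(200)]

colored = delayed(S, 0, gain=0.8)  # wrong level, right timing
late = delayed(S, 3, gain=1.0)     # right level, 3 samples late

print("colored:", distortion_coefficient(S, colored))
print("late:   ", distortion_coefficient(S, late))
```

With these weights the 'colored' device scores far lower (better) than the 'late' one, which is the behavior described above. Picking the weights properly is where the survey data and the machine learning would come in.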
Human listening can be employed in a restricted manner. Descriptions of audiophile terms like "warm" and "forward" can be collected via surveys. The results can be used to categorize classes of distortion surfaces. There will be large variations in the results. But humans have to do something, so you can at least ask them their opinion. Just don't use humans as measurement sensors.
You can also graph surface D as a visualization for human consumption if you wish. These images are inconclusive but large variations can at least be used to help filter really good or bad performers.
We could do this type of testing now. Not sure why we don't.
Edited by Gr8Desire - 4/23/15 at 8:30am