POINT 1
Headphones don't have personalities. They don't decide on a whim to reproduce a recording differently than they did the last time. The physics behind them is considerably less complicated than the human brain: they're fed a signal, and the driver moves accordingly to produce sound waves. Those sound waves can bounce around, interfere with other waves, and leak out. The result is predictable, though less predictable than an amp or DAC.
The OP's measurements would be done with real music, so they will be indicative of real-world performance. Why would they only apply to the songs tested? The headphone doesn't care if it's being fed Bach or Lady Gaga. If it can accurately reproduce one, it can accurately reproduce the other. Why, specifically, would that not be the case?
POINT 2
I already said I'm not talking about perceived accuracy. I'm talking about objective accuracy: similarity to the original source, after accounting for different types of field equalization. The goal is not to make something sound subjectively good. The goal is to make it sound like the original recording. Perception doesn't even come into play. You're welcome to prefer a headphone that doesn't measure up. People love Grados, and so do I, but their measurements are not pretty, and I wouldn't call them accurate.
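To make "similarity to the original source" concrete, here is a rough sketch of one way it could be scored. This is my own illustration, not the OP's method: the function name `similarity` is made up, and it assumes the recording from the headphone has already been time-aligned with the source, sampled at the same rate, and corrected for whatever field equalization applies.

```python
import math

def similarity(source, recorded):
    """Normalized correlation between two equal-length lists of samples.

    Hypothetical helper for illustration. Returns 1.0 when the recording
    has exactly the same shape as the source. A plain volume difference
    does not lower the score; distortion does.
    """
    if len(source) != len(recorded):
        raise ValueError("signals must have the same length")
    dot = sum(s * r for s, r in zip(source, recorded))
    norm = math.sqrt(sum(s * s for s in source) * sum(r * r for r in recorded))
    if norm == 0.0:
        raise ValueError("signals must not be silent")
    return dot / norm

# Toy stand-in for a music excerpt: 200 samples of a sine wave.
source = [math.sin(2 * math.pi * 5 * n / 200) for n in range(200)]

# A quieter but otherwise faithful reproduction.
quiet = [0.5 * x for x in source]

# A reproduction that clips the peaks, i.e. adds distortion.
clipped = [max(-0.3, min(0.3, x)) for x in source]

print(similarity(source, quiet))
print(similarity(source, clipped))
```

The quieter copy scores essentially the same as a perfect copy, while the clipped one scores lower. Nothing in the calculation knows or cares what the music is, and no listener is involved.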