Let's take a few steps back.
IMHO the holy grail of sound would be hearing music exactly as the artist wants me to hear it. The sound is part of his performance, so if he wants the vocals to sound like I'm in a tiny bathroom with the vocalist singing softly into my ear while the rest of the band plays outside the building, then that is how it should sound. It's no different from him deciding to play a certain note at a certain time, or deciding to have the bass player play only in the chorus. Imagine if the notes an artist plays could be changed by something as arbitrary and imprecise as picking headphones based on online reviews! If this holy grail were achieved and you didn't like the sound, that would be the same as not liking the performance, IMO.
***Reasons why scientifically finding the best response curve is nearly impossible without listening tests start here (you can skip this part)***
There are several barriers to achieving this holy grail, the most relevant of which are the lack of a clear-cut reference reproduction system and the wide variety of reproduction systems used by music listeners.
Most people think a speaker that measures flat with no distortion or noise is the reference, but in reality room acoustics make this much less clear-cut. Not only do acoustics add reverberation that the artist cannot control, they also add time-dependent frequency response changes, making it impossible for both the direct sound and the room sound to measure flat at the same time, or even for all of the room sound to measure flat. Over time, the reproduction systems used to shape the sound of recordings (i.e., in mixing and mastering) have leaned toward slightly boosted bass and slightly reduced treble relative to a flat-measuring speaker. I don't claim to know exactly why (though I have some guesses), but I don't think the reasons matter much for this discussion.
Also, nearly all sound engineers take into account listeners who use low-quality reproduction systems such as Apple earbuds, and try to make only minimal sacrifices to how the music sounds on their own good systems so that it still works on bad ones.
When you get to headphones, this gets even more complicated, for reasons most of you know. One of the biggest is that headphones bypass our individually shaped heads and bodies, which in real life interact acoustically with everything we hear. With that said, an approximate headphone equivalent of a flat-measuring pair of speakers has been developed, called the diffuse-field response. Etymotic uses a modified version of this response as their target curve. However, as I've said, a flat speaker is not the reference, and there isn't really a reference that everybody acknowledges.
Another point to consider is that modern recordings don't necessarily have anything to do with how music sounds in concerts, and don't necessarily aim to replicate purely acoustic performances.
***Reasons ... nearly impossible ... end here***
With all of these factors considered, it is practically impossible to scientifically derive a target response for headphones.
So the next best thing, IMHO (and in Sean Olive's view), is to assume that most people's concept of what sounds good matches how most artists want their recordings to sound (a pretty good assumption, considering artists generally want to please their listeners), and to perform listening tests to find which curve is most enjoyable on headphones. This is what I meant in my previous comment.
Sorry for the long post; I just wanted to make my opinion as clear as possible.
Edit: Please correct me if I made mistakes in my reasoning.