Well, obviously that is to some extent a judgment call. However, as I mentioned, there does seem to be a broad (but not unanimous) consensus about the coloration of a number of TOTL headphones - e.g., the HD800 and SR009 are on the analytical side of neutral, while the HD650 and SR007 Mk I are on the warm side of neutral - each tilts one way or the other from the ideal. Since Tyll's judgments of headphone "tilt", if you will, fit within that consensus, I treat them as representative of it.
Most recordings are made using speakers, not headphones, as the playback transducer. I think the majority of monitor speakers these days are relatively flat in terms of frequency response, or are equalized to be relatively flat. Again, what is "flat" is to some degree a judgment call - for example, Bob Katz, a respected mastering engineer, says that he adjusts his reference speakers to be subjectively flat based on a selection of around 50 recordings made with different microphones, etc. Those recordings include some of his own, and since he was there at the recording session, he probably has a better grasp of what they should sound like than most. One reason the BBC series of monitor speakers is legendary is that the engineers who designed them were able to do live-vs-recorded testing during development.
Since headphones will never sound like the original in terms of imaging (nobody has ever fit a singer, let alone an orchestra, inside their head), we have to judge on other grounds - detail reproduction, coherence, and yes, frequency response among them. If a headphone tonally sounds like the original (not too bright, not too warm, no nasal or other coloration), then it is relatively neutral. That's probably the best we can do. To some extent it's a circular argument, because we judge the neutrality of a recording by listening to it on speakers or headphones, the recording itself was made with microphones, and none of these (mics, speakers, headphones, or recordings) is perfect. Sometimes an imperfect recording is useful. For example, I have a recording on both LP and CD, and compared to the LP, the violin on the CD sounds a bit shrill and "acid". So if a headphone (for example, the Stax SR007 Mk I) makes that CD sound less shrill, I conclude that the headphone is on the warm side.
Now, if you have a copy of the original Stereophile test CD, one of the tracks has the late J. Gordon Holt reading one of his articles, as recorded by a variety of microphones. As you listen to it, you will hear how markedly the timbre of his voice changes depending on the microphone. One test that you can do yourself is to take a condenser microphone that is relatively flat (Sony makes some good, inexpensive electret condenser mics) and record the voices of some people you know well, e.g. family members, then play the recordings back on your speakers or headphones and compare them to how those people actually sound. That won't cover the deep bass or highest highs, but it will give you a fair take on what sounds neutral. If a headphone makes Julie Andrews sound like Ethel Merman, it's wrong, no matter how much of an Ethel Merman fan you may be.
Finally, you seem to assume that a neutral transducer will sound sterile. To the contrary, a sterile transducer is not neutral - it subtracts from the original. A neutral transducer should convey all the emotion and beauty of the original, no more, no less. That's the goal. Part of that is conveying the timbre of the original - a singer on a recording should sound like they do in live performance, not deeper, or lighter, or like they have a chest cold, or like they're spitting into the microphone, etc. My contention is that a subjectively flat frequency response allows that to happen more often, for more people, than a subjectively non-flat frequency response, no matter how pleasant the latter may sound to some. And that is supported by Toole and Olive's research.