A lot of this discussion is a bit over my head (and I'm worried about hostility... be nice, everyone!)
But actually, I've been asking myself very similar questions. Why do headphones sound so different when so many of the numbers are so similar? Some cheap ones look remarkably similar to uber-expensive ones in numerical tests.
My conclusion: the particular numbers we are looking at don't tell the whole story (duh). To get a fuller picture and answer the question, our analysis probably needs to go further: maybe more square-wave or triangle-wave tests, or harmonic tests, at more frequencies. The interesting thing is that Headphone.com runs its tests with the microphones inside a synthetic "head", complete with ears and such, that should match human noggins pretty well.
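To make the "harmonic tests" idea a bit more concrete, here's a rough Python sketch. It is not how Headphone.com or anyone else actually measures; the two "headphones" are made-up signals, and the names (`tone_with_distortion`, `amplitude_at`, `thd`) and distortion levels are my own assumptions. The point is just that two signals can have the same level at the test frequency and still differ a lot in harmonic content.

```python
import math

def tone_with_distortion(freq_hz, sample_rate, n_samples, harmonic_levels):
    """Sine of amplitude 1.0 at freq_hz plus harmonics.
    harmonic_levels[k] is the amplitude of harmonic number k + 2."""
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        value = math.sin(2 * math.pi * freq_hz * t)
        for k, level in enumerate(harmonic_levels):
            value += level * math.sin(2 * math.pi * freq_hz * (k + 2) * t)
        samples.append(value)
    return samples

def amplitude_at(samples, freq_hz, sample_rate):
    """Single-frequency DFT: amplitude of the component at freq_hz.
    Exact only when the window holds a whole number of cycles."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n

def thd(samples, freq_hz, sample_rate, n_harmonics=5):
    """Total harmonic distortion: RMS sum of harmonics 2..n_harmonics+1
    divided by the fundamental's amplitude."""
    fundamental = amplitude_at(samples, freq_hz, sample_rate)
    harmonics = [amplitude_at(samples, freq_hz * h, sample_rate)
                 for h in range(2, n_harmonics + 2)]
    return math.sqrt(sum(a * a for a in harmonics)) / fundamental

# Hypothetical numbers: 1 kHz tone, 48 kHz sampling, exactly 100 cycles.
RATE, FREQ, N = 48000, 1000.0, 4800
clean = tone_with_distortion(FREQ, RATE, N, [0.001, 0.0005])  # invented levels
dirty = tone_with_distortion(FREQ, RATE, N, [0.03, 0.02])     # invented levels

thd_clean = thd(clean, FREQ, RATE)
thd_dirty = thd(dirty, FREQ, RATE)

# Same level at the test frequency, very different distortion.
print("level at 1 kHz:", amplitude_at(clean, FREQ, RATE), amplitude_at(dirty, FREQ, RATE))
print("THD clean: %.4f%%  THD dirty: %.4f%%" % (100 * thd_clean, 100 * thd_dirty))
```

A single level-versus-frequency plot would show these two as identical at 1 kHz, which is the sort of thing I mean by the numbers not telling the whole story.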
I'm not entirely buying the "larger diaphragm" or "differing angle" argument. First, the test uses a realistic synthetic head and ears, so those effects should show up in the measurements. Second, changing driver angles should be pretty easy anyway. It's not something that justifies kilobuck investments.
Also, the "golden ears" tidbit: rubbish. Good instrumentation can capture more information, and far more precisely, than any human ever could.