It's generally believed that the key to understanding what "detail" is (and how it relates to a headphone's technical performance, or TP) lies in the headphone's frequency response (FR). To understand this concept more fully, one also needs to understand that the frequency response is directly related to the impulse response, and therefore any sort of transient performance (a term some audio circles use interchangeably with TP) must also be tied to the FR.
The wrench gets thrown into the conversation when we look at the frequency response plots of two headphones that appear equivalent, yet one sounds vastly superior to the other.
So what gives? Well, we often judge TP by how a headphone presents certain percussive sounds, like the snap of a snare, the pluck of a string, an electronically generated tap, or a strike on a wood block. When these sounds stand well above the rest of the soundscape in level (by many dB) and also sound "real", we tend to describe the headphone as "fast" and "snappy". Sounds like these are called "transients". So what makes a transient sound good? This is where the relation between FR and impulse response comes in.
Might I introduce you to the world of the Fourier transform:
https://en.wikipedia.org/wiki/Fourier_transform
What can be deduced using the Fourier transform is quite important! In essence, a transient is a short pulse of sound pressure: its amplitude rises sharply and then dies away quickly. What is important to understand is that for any input signal fed into the system (the headphone), the amplitude of each part of the output (its loudness relative to the other sounds in the signal) is determined by the FR, because the output spectrum is the input spectrum multiplied by the frequency response. The more frequencies that are present, and the more evenly they are distributed along the FR, the taller the peak we get in the time-domain (impulse) response.
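If you want to check the "output spectrum = input spectrum × FR" idea yourself, here is a minimal Python/NumPy sketch. It assumes the headphone behaves as a linear time-invariant system, and the impulse response `h` is a made-up decaying noise burst, not a measurement of any real headphone. It computes the output once by convolving in the time domain and once by multiplying spectra in the frequency domain, then checks that the two agree.

```python
import numpy as np

def output_two_ways(signal, impulse_response):
    """Return the system output computed in the time domain and in the frequency domain."""
    n = len(signal) + len(impulse_response) - 1  # length of the full linear convolution
    # Time domain: output = input convolved with the impulse response.
    y_time = np.convolve(signal, impulse_response)
    # Frequency domain: output spectrum = input spectrum x frequency response.
    # Zero-padding both to length n makes the FFT product match linear convolution.
    X = np.fft.rfft(signal, n)
    H = np.fft.rfft(impulse_response, n)
    y_freq = np.fft.irfft(X * H, n)
    return y_time, y_freq

# Hypothetical toy "headphone": a decaying random impulse response (not real data).
rng = np.random.default_rng(0)
h = rng.standard_normal(64) * np.exp(-np.arange(64) / 8.0)

# Hypothetical input: a single click plus a short burst.
x = np.zeros(256)
x[10] = 1.0
x[100:110] = 0.3

y_time, y_freq = output_two_ways(x, h)
print(np.allclose(y_time, y_freq))  # True: both routes give the same output
```

The zero-padding to `n` matters: without it, the FFT product corresponds to circular convolution, and the tail of the output would wrap around to the start.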
Essentially, every frequency the transducer can reproduce, each with its own amplitude and phase, gets added up to create the impulse response.
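Here's a rough sketch of "every frequency gets added up" in Python. It makes big simplifying assumptions: every frequency component has equal amplitude and zero phase, the sample rate (48 kHz), buffer length, and band edges are arbitrary choices, and `impulse_from_band` is just a hypothetical helper for illustration. Summing the cosines in a 20 Hz to 5 kHz band versus a 20 Hz to 20 kHz band shows that the wider, evenly filled band produces a taller peak.

```python
import numpy as np

def impulse_from_band(f_low, f_high, fs=48000, n=960):
    """Sum equal-amplitude, zero-phase cosines at every FFT bin between f_low and f_high (Hz)."""
    t = np.arange(n) - n // 2            # sample index, centered so the peak sits mid-buffer
    freqs = np.arange(n // 2 + 1) * fs / n  # FFT bin frequencies (50 Hz spacing here)
    band = freqs[(freqs >= f_low) & (freqs <= f_high)]
    # Each component is cos(2*pi*f*t/fs); all of them equal +1 at t = 0, so they add up there.
    y = np.sum(np.cos(2 * np.pi * band[:, None] * t[None, :] / fs), axis=0)
    return y / n                         # common scale factor so the two results are comparable

narrow = impulse_from_band(20, 5000)     # hypothetical band-limited response
wide = impulse_from_band(20, 20000)      # hypothetical wider, evenly filled response
print(narrow.max(), wide.max())          # peaks of about 0.104 and 0.417
```

All the cosines line up at the center sample, so the peak height is simply proportional to how many frequency components are present. With realistic phase shifts they would generally not all line up perfectly, and the peak would come out lower and more smeared.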
So why does any of this matter if both headphones seem to have similar frequency responses?
It matters because the headphone with "more" technical performance has characteristics in its FR that give the listener the perception of louder, more realistic transients.
We don't see this in the FR because we don't know how to interpret such characteristics from it. Evenness in the upper treble helps, but the evenness we see on the plot does not count equally toward the impulse response. Higher frequencies carry more weight (not literally, just in terms of their contribution to the amplitude of the impulse) than lower frequencies when certain sounds are played. So to properly judge the upper FR of a headphone, one would not simply eyeball the average deviation from the "target curve" (Harman target, etc.). Instead, one would use a weighted calculation that puts more weight on deviations the further up the frequency spectrum they occur, with weights specific to human hearing that we don't fully understand yet.
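To make the "weighted target calculation" idea a bit more concrete, here's a hypothetical sketch. The weighting `(f / f_ref) ** exponent` is purely a placeholder made up to show the mechanics; as said above, nobody knows the right perceptual weights. `weighted_deviation_db`, `f_ref`, and `exponent` are illustrative names, not part of any standard. The example compares two imaginary headphones with the same 3 dB error, one in the bass and one in the treble.

```python
import numpy as np

def weighted_deviation_db(freqs_hz, measured_db, target_db, f_ref=1000.0, exponent=1.0):
    """Weighted RMS deviation from a target curve, in dB.

    The weight (f / f_ref) ** exponent grows with frequency. It is a made-up
    placeholder, NOT a psychoacoustic model; exponent=0 gives plain RMS.
    """
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    error = np.asarray(measured_db, dtype=float) - np.asarray(target_db, dtype=float)
    weights = (freqs_hz / f_ref) ** exponent
    return float(np.sqrt(np.sum(weights * error**2) / np.sum(weights)))

# Hypothetical data: a flat target and two headphones, each 3 dB off at one point.
freqs = np.array([100.0, 1000.0, 5000.0, 10000.0])
target = np.zeros(4)
bass_off = np.array([3.0, 0.0, 0.0, 0.0])
treble_off = np.array([0.0, 0.0, 0.0, 3.0])

print(weighted_deviation_db(freqs, bass_off, target, exponent=0.0))    # 1.5 (unweighted)
print(weighted_deviation_db(freqs, treble_off, target, exponent=0.0))  # 1.5 (unweighted)
print(weighted_deviation_db(freqs, bass_off, target))    # about 0.24
print(weighted_deviation_db(freqs, treble_off, target))  # about 2.36
```

With flat weights both headphones score the same, but once deviations higher up count more, the treble error dominates. A real version would need weights grounded in how we actually hear, which is exactly the part we don't know yet.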
So in short, there is no "standard" for analyzing the frequency response in a weighted way that accounts for how the upper frequency response (which is responsible for the "fast" and "technical" sounds) produces an agreeable, realistic-sounding transient. To throw another wrench into the matter, our individual ears and heads alter the frequency response of every sound we perceive! So if a headphone's FR is mismatched with our own ear/head response, we will perceive a combined FR that we may not enjoy.
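One simple way to picture the "combined FR": if you treat the headphone and your own ear/head as two linear stages in series, their magnitude responses multiply, which means their dB values add. In this sketch, `headphone_db` and `ear_db` are made-up numbers, `combined_response_db` is a hypothetical helper, and `ear_db` is meant as how one listener's ear/head differs from whatever the measurement rig assumes (an illustrative framing, not a standard correction).

```python
import numpy as np

def combined_response_db(headphone_db, ear_db):
    """Magnitude response of two linear stages in series, in dB.

    Linear magnitudes multiply, so their dB values simply add.
    """
    return np.asarray(headphone_db, dtype=float) + np.asarray(ear_db, dtype=float)

freqs = np.array([1000.0, 3000.0, 8000.0])  # Hz
headphone_db = np.array([0.0, 2.0, -1.0])   # hypothetical headphone deviation
ear_db = np.array([0.0, 1.5, 4.0])          # hypothetical personal ear/head deviation

combined = combined_response_db(headphone_db, ear_db)

# Cross-check in the linear domain: multiply magnitudes, then convert back to dB.
linear_product = 10 ** (headphone_db / 20) * 10 ** (ear_db / 20)
print(combined)                                          # [0.  3.5 3. ]
print(np.allclose(20 * np.log10(linear_product), combined))  # True
```

So a treble dip in the headphone and a treble bump in your own ear can partly cancel, while two bumps in the same region stack up, which is one way the "same" headphone can sound different from person to person.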
*Full disclosure: I am not an expert in headphones; this is just my explanation as an electrical engineer who has listened to a lot of audio engineers.*