First, you need to know what kind of music you will listen to and, if possible, bring a playlist of things that you are VERY familiar with. Go with 3 or 4 tracks, and use the same ones on every headphone.
It's a bit hard to be "very" familiar with an entire symphony, especially when you are cycling between tens of different recordings and many other pieces, but that's just my personal listening practice. Even for the same headphone and EQ profile, I go by feel and subjective qualities when evaluating recordings. My reference is the sense of "realness" or "vividness" that reminds me of live classical concerts. Some recordings can make your eyes light up in much the same way as that first time you put on a headphone at the shop and it surprised you with a lovely sound. The qualities I listen for are clarity, dynamic range (of the recording), fullness, spatial cues of the recording, and balance. Otherwise, yes, one may be familiar with particular instrument timbres, at least as rendered by one playback system that they like, and then focus on those when comparing headphones on the same track.
I, like Lindholdt, prefer faster (and reasonably volume-matched) A/B setups (even investing in a switchbox to facilitate a complete headphone and EQ switch within 7 seconds), as this better allows me to catch those cases where I think I hear some quality in one headphone and then either find the same quality in the other (e.g. "detail" or the width of the trajectory of the pencil in Yosi Horikawa's "Letter") or else have that quality disappear entirely. Put another way, fast A/Bing should reveal the truly "night and day" differences that are worthwhile.
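For the "reasonably volume-matched" part, here is a rough sketch of how one might work out the gain offset between two headphones. It assumes you have somehow captured the same test tone through each headphone with the same microphone and placement (that capture step is not shown), and that the samples are floats in the range -1 to 1. The function names are my own for illustration, not from any particular tool.

```python
import math


def rms_dbfs(samples):
    """RMS level in dB relative to full scale (1.0) for float samples."""
    if not samples:
        raise ValueError("need at least one sample")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        raise ValueError("signal is pure silence, level is undefined")
    return 10.0 * math.log10(mean_square)


def matching_gain_db(reference, other):
    """Gain in dB to apply to `other` so its RMS level equals `reference`."""
    return rms_dbfs(reference) - rms_dbfs(other)


def apply_gain(samples, gain_db):
    """Scale samples by a gain given in dB."""
    factor = 10.0 ** (gain_db / 20.0)
    return [s * factor for s in samples]
```

The resulting dB figure is what you would dial into the preamp of the quieter (or louder) headphone's EQ profile before switching back and forth.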
@Schmackofatz
While perhaps not something a beginner should be concerned about, if you are willing to explore the rabbit hole of EQ, I've found in practice that some qualities you hear in one headphone but not in another, such as "sweetness", can upon closer inspection turn out to lie within the frequency response, as well as in differences between published measurements on a test head and how the headphones interact with your own ears (see https://www.head-fi.org/threads/mez...eadphone-official-thread.959445/post-17549413 (post #4,665); this also covers how EQing down perceived peaks can unlock exquisite clarity and maybe spaciousness). Likewise, I've mainly found subjective differences in imaging or soundstage size to be merely related to the size of the earpads and the distance of the drivers from your ears, plus some tonal effects like having a dip between 1 kHz and 3 kHz. Imaging accuracy, I'd say, depends on how well-matched the frequency responses of the left and right drivers are, and such matching does not always correlate with price. Comfort and earpad feel can also have quite a bearing, even between headphones that are EQed to a similar frequency response and have similar amounts of space around your ears (e.g. my Arya Stealth and Meze Elite). Comfort can also be a complete deal-breaker for some headphones that otherwise sound quite great or have excellent EQing potential (e.g. my recent experience with the Stax SR-X9000, which I have been saving up for).
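To make "EQing down a perceived peak" a bit more concrete, here is a minimal sketch of a single peaking filter using the widely published Audio EQ Cookbook biquad formulas. This is only an illustration and not necessarily what any particular EQ tool does internally; the 6 kHz centre, -4 dB gain and Q of 2 in the usage note are made-up values, not a recommendation for any headphone.

```python
import cmath
import math


def peaking_biquad(fs, f0, gain_db, q):
    """Audio EQ Cookbook peaking filter; returns (b, a) normalised so a[0] == 1."""
    if not 0.0 < f0 < fs / 2.0 or q <= 0.0:
        raise ValueError("need 0 < f0 < fs/2 and q > 0")
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * amp, -2.0 * math.cos(w0), 1.0 - alpha * amp]
    a = [1.0 + alpha / amp, -2.0 * math.cos(w0), 1.0 - alpha / amp]
    return [x / a[0] for x in b], [x / a[0] for x in a]


def response_db(b, a, fs, f):
    """Magnitude response of the biquad at frequency f, in dB."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * f / fs)  # z^-1 on the unit circle
    h = (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)
    return 20.0 * math.log10(abs(h))
```

For example, `peaking_biquad(48000, 6000, -4.0, 2.0)` gives a filter whose response is -4 dB at 6 kHz and essentially 0 dB far away from it, which you can check with `response_db`.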
I personally already knew that I liked neutral sound, or the best I could get a hold of at the time (I've now gone as far as purchasing Genelec 8341As and measuring my HRTF outside so I can listen to my best approximation of perfectly flat speakers in an anechoic room (post #61)), but understandably, you might still be looking for the sound signature that you prefer. Even then, I would lean against headphones that dip too much in any part of the frequency response, as it is better to later EQ a region down than to try to fill a big dip: the boost might drive the headphone into audible distortion and likewise demands more power from your amp. If you ever do choose to EQ, I would look for a "minimum-phase" implementation, which I believe Equalizer APO is; I've measurably found that it can correct some phase errors (post #5,152) when flattening the response. Likewise, I would favour the headphone that is known to have the lower distortion. Otherwise, though it probably barely matters in practice, one test that I've found seems to be independent of frequency response is the qualitative sharpness and decay of transients as played in http://pcfarina.eng.unipr.it/Acustica-samples/Dirac.wav (a single sample sticking out at a 48 kHz sample rate); to me, a truly "fast" headphone would have this file sounding exquisitely sharp and incisive with a very quiet and/or fast decay, possibly consistent with a clean Cumulative Spectral Decay graph.
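If that link ever goes dead, a similar test file is easy to make yourself. The sketch below writes a mono 16-bit WAV that is silent except for one non-zero sample. The length, bit depth, amplitude and position of the impulse are my own assumptions and may not match the linked file; the function name is just for illustration.

```python
import struct
import wave


def write_impulse_wav(path, sample_rate=48000, duration_s=1.0,
                      impulse_index=None, amplitude=0.5):
    """Write a mono 16-bit WAV containing a single non-zero sample."""
    n = int(sample_rate * duration_s)
    if impulse_index is None:
        impulse_index = n // 2  # put the click in the middle of the file
    if not 0 <= impulse_index < n:
        raise ValueError("impulse_index is outside the file")
    peak = int(round(amplitude * 32767))
    frames = bytearray(2 * n)  # zero-filled, i.e. digital silence
    struct.pack_into("<h", frames, 2 * impulse_index, peak)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16 bit
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return n
```

Calling `write_impulse_wav("impulse.wav")` produces a one-second file at 48 kHz. Keep the playback volume modest the first time, as a full-scale click can be unpleasantly loud.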