The reason they *should* sound the same is that these amps have supposedly been designed to reproduce audio accurately, not simply to add some pleasant colouration. To the best of our knowledge the O2 is provably transparent, because it excels in every known parameter. As for more complex tests, why would you expect them to be needed to characterise performance? NwAvGuy has written moderately extensively about why he feels absolutely fine armed...
ReplayGain is probably your best bet.
By source they mean DAC. The drive quality and network connection have absolutely no effect on audio reproduction: as long as the data gets to the Squeezebox, it doesn't really care. The output has also been measured as low jitter, which gets rid of the one flimsy excuse people hide behind for all these supposed giant differences between sources.
No, not at all. Claiming that nothing can be said either way is simply bollocks and a horrendous distortion of the burden of proof. There are recognised numbers for the audibility of measurable imperfections. The O2 keeps everything at least 86 dB below the signal and the distortion is generally of a relatively benign nature. There are established thresholds of audibility. The O2 takes the stricter ones, makes them a bit more paranoid for good measure and then makes...
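For anyone who wants to put the 86 dB figure into more familiar terms, here is a minimal sketch of the standard dB-to-amplitude conversion. It assumes the figure is a voltage (amplitude) ratio, so the divisor is 20 rather than 10; the function names are just mine for illustration.

```python
def db_to_ratio(db):
    """Convert a level in dB to a linear amplitude (voltage) ratio."""
    return 10 ** (db / 20)


def db_to_percent(db):
    """Same conversion, expressed as a percentage of the signal amplitude."""
    return 100 * db_to_ratio(db)


if __name__ == "__main__":
    # Assumption: "86 dB below the signal" is treated as -86 dB relative amplitude.
    print(f"-86 dB is about {db_to_percent(-86):.4f} % of the signal amplitude")
```

On that assumption, 86 dB down works out to roughly 0.005 % of the signal amplitude.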
Heh, I'm quite happy for people to hear differences between cables/amps/DACs just as long as it's under controlled conditions. When it is under completely uncontrolled conditions...it doesn't really show anything of use.
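As a rough illustration of what "controlled conditions" buys you, here is a small sketch of how a blind ABX result can be scored against pure guessing. The trial counts are hypothetical and nobody in this thread has specified a protocol; it simply assumes each trial is an independent 50/50 guess when there is no audible difference.

```python
from math import comb


def abx_p_value(correct, trials):
    """Chance of scoring at least `correct` out of `trials` by guessing alone."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


if __name__ == "__main__":
    # Hypothetical session: 12 correct identifications out of 16 trials.
    print(f"p = {abx_p_value(12, 16):.4f}")
```

The smaller that probability, the harder it is to put the result down to luck. An uncontrolled sighted comparison gives you no such number to work with.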
+1. @bmiamihk: Why do you bother?  
I really don't see on what logical basis anyone can claim the O2's measurements are not comprehensive unless these comparisons are done under blind, volume-matched conditions. Likewise, I'm not entirely sure what Anaxilius is on, regardless of his "scientific background." Perhaps he could explain himself in a post that uses logic instead of cheap humour? I still think there is this incredibly annoying perception whereby people think "Well, it's not all in my head....
Wait, how did we get from "The O2 will sound the same as any other amp that measures equally well or better (could probably get away with a fair bit worse TBH)" to "The O2 will sound the same as a given popular SS amp that has never been measured"?
The HRTF for headphones, at least as far as frequency response is concerned, varies hugely with position, head size, age, etc.
As nobody has ever publicly measured the Leckerton (what they do have on their site looks OK, but is highly limited in what we can derive from it) and we don't know what it does to the sound (if indeed what it supposedly does to the sound is audible under controlled conditions), that could prove difficult. If you've rolled opamps it gets even murkier. Once Tyll has got his "measure popular amps based on NwAvGuy's template" plan underway, we should have...
