We can't all do double-blind or ABX tests with the way our configurations work. I did run an ABX between my Bifrost Uber and Theta Progeny v. A, though, since I can switch sources on the fly (they are connected via USB and coax separately). I also spent months listening to the Uber before receiving the newer DAC, so my audio memory is fresh.
Conquerator is spot on when he says that the imaging reaches out of the earcups instead of hitting a "wall". The slightest bit of glare in the treble is also evident on the Uber. Those two things, the imaging and the treble glare, are my main caveats with the Uber. I would go as far as saying the Uber's bass extends nice and low but doesn't have the same kind of slam as the Theta. That slam in the upper mids and lower bass is what makes the Theta sound more enjoyable and musical with percussion. Then of course there is the super burrito DSP filter, which stabilizes the image and offers a 3D soundstage that the Uber just doesn't have.
No, I'm not suggesting that all tests should be done via ABX; I realize the logistical constraints. I was asking more about the original assertion that something 'sounds bad', and whether that was opinion or the result of well-controlled listening tests with many subjects.
It does make me wonder whether ABX tests are run for specific electronics at the CAN JAMs. (Dr.) Sean Olive presented some interesting findings here in Detroit back in '14 (at least I think it was then) about the difficulty of doing blind tests with 'phones due to differences in (cup) aspect ratios, clamping force, etc., so I get the problem of 'hiding' the identity of headphones. For electronics (DACs and what have you), however, ABX is fairly easily implemented. I think it would be interesting to run a poll on which gear people would like to see tested in a non-biased fashion (ABX) at one of the CAN JAMs, and then see the outcome.
We all make comparisons with gear that we own (it's the practical thing to do), but knowing what we are listening to skews the results - it just does. If there are clear, unmistakable differences between two signals or pieces of gear, then confirmation bias matters less, if at all, and one needn't worry about double-blind testing. However, if two signals or pieces of gear perform very closely, then a double-blind test really has to be performed to determine whether there is a repeatable, identifiable difference.
Again, I am not saying it is wrong for someone to say that something 'sounds awful' or 'sounds great' when comparing two things. What I am saying is that 'awful' and 'great' are subjective and, unless backed by controlled tests, are opinion. That does not invalidate the opinion or its weight, but opinion should not be confused with a factual case in which a repeatable, statistically significant result is obtained from double-blind tests. This is why I had asked whether peer-reviewed papers have been published on the subject - I'm somewhat curious about that...there have to be several...somewhere out there...
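For what it's worth, "statistically significant" has a concrete meaning here: under the null hypothesis that the listener is just guessing, each ABX trial is a fair coin flip, so the p-value of a session is a one-sided binomial tail. A minimal sketch in Python (the function name and the 12-of-16 session are my own illustration, not anyone's actual test results):

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """One-sided binomial p-value: the probability of getting at least
    `correct` answers right out of `trials` ABX trials purely by
    guessing (chance probability 0.5 per trial)."""
    tail = sum(comb(trials, k) for k in range(correct, trials + 1))
    return tail / 2 ** trials

# Hypothetical session: 12 correct identifications out of 16 trials.
p = abx_p_value(12, 16)
print(f"{p:.4f}")  # 0.0384 -> below the usual 0.05 threshold
```

So a listener who goes 12/16 clears the conventional 5% bar, while something like 10/16 (p ≈ 0.23) is entirely consistent with guessing - which is exactly why casual sighted impressions and a handful of trials can't settle the "close call" cases.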