So, I finally took some time to read this article. I am not an audio expert, as many here are, but as I have said, I know a thing or two about research. There are some baffling, staggering statements in the full two-part article when you actually read it.
DBT plays a vital role in medical and drug research and can also be useful in audio for detecting sonic differences that fall within the range of the measurement system’s and subject’s sensitivities. Such tests are essential for proving that two conditions are different at a statistical level of high probability when properly executed. To be scientifically proven, an individual must guess correctly no less than 23 times out of 24 trials. If you obtain such a result you can go home happy. But if you obtain anything less or a completely random result “proving” no difference, then you could still be left with the nagging question of whether your experimental design or your assay method was flawed or not sensitive enough. So finding no statistical difference, as has happened in so many pseudoscientific audio tests, is not conclusive.
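Just to put a number on that "23 times out of 24" bar: here's a quick one-sided binomial check (a pure-Python sketch of mine, assuming each trial is an independent 50/50 guess under the null). Their threshold corresponds to p ≈ 0.0000015; a conventional p < 0.05 test is already satisfied at 17 of 24.

```python
from math import comb

def p_value(k: int, n: int) -> float:
    """One-sided binomial p-value: the probability of getting k or more
    of n trials correct by pure guessing (chance = 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

print(p_value(23, 24))  # ~1.5e-06 -- the article's bar, absurdly strict
print(p_value(17, 24))  # ~0.032 -- already clears the usual p < 0.05
```

So they demand a significance level orders of magnitude stricter than anything medical research uses, and then treat any result short of it as an indictment of the test rather than of the claim.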
I love that these pseudo-scientific dingbats prove right out of the gate that they're crazy biased. And I have read this exact sentiment on this forum: "ABX testing is bogus. I know this because I tested two sources that I know are different with the ABX method, and I couldn't tell them apart. Therefore ABX testing shouldn't be used." I mean, the inherent stupidity in such a statement should be apparent to anyone with functioning brain cells.
But that the above got published throws the entire publication that would print such idiocy into question. I mean, it literally says "If you prove that two things are different, then your test was good; otherwise it wasn't." For me, the way I was taught long ago to weed scientific inquiry from pseudo-science was this: did the "researcher" present a theory and a way to disprove it? That is the goal of science: to disprove your hypothesis. The greatest scientists the world over lay out methods to disprove their theories. That's how it works. That is why an experiment can disprove or fail to disprove a hypothesis; it cannot prove it. They have immediately (and clearly unknowingly) thrown their own credibility out the window with this paragraph.
If you do find a difference and you can run a legitimate statistical analysis on the data, then you have proven your case that there is a difference between two conditions.
Where the pseudo-scientific objectivists go wrong is when they engage in “triple-blind testing”. This we define (with tongue in cheek) as limited-sensitivity, double-blind tests, coupled with negative expectation bias, an unholy trinity.
Alright, firstly, there is definitely the possibility that we have an expectation bias here in SS for there to not be a difference. This is why you can't take a group of us and generalize the results if you conducted an experiment with us as your subjects. Nor can you generalize the results of a trial with audiophiles, or engineers, or young girls, or old men, or anyone else. Selection bias isn't often talked about in regards to AES studies and the like; there tend to be so many other ways to pick apart these studies that there really isn't any reason to expound on something like selection bias. But I have yet to see a single study referenced here that appears to have a sample size that was sufficiently large and wherein the sample was selected randomly. So generalization isn't possible. That's why I've tested stuff on my own with an A/B switch. I can't generalize any of the results I've seen (though I know which side I err toward).
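And the trial-count problem cuts both ways: with small runs, a null result is nearly meaningless too. Here's a rough Monte Carlo sketch (pure Python, hypothetical numbers) of how often a listener who genuinely hears a difference 70% of the time would actually pass a one-sided binomial test at p < 0.05:

```python
import random
from math import comb

def critical_k(n: int, alpha: float = 0.05) -> int:
    """Smallest number of correct answers whose one-sided binomial
    p-value is at or below alpha (chance performance = 0.5)."""
    tail = 0.0
    for k in range(n, -1, -1):
        tail += comb(n, k) / 2 ** n
        if tail > alpha:
            return k + 1
    return 0

def power(n: int, p_true: float, sims: int = 20_000) -> float:
    """Monte Carlo estimate of how often a listener who is right with
    probability p_true actually clears the p < 0.05 bar in n trials."""
    k_crit = critical_k(n)
    passes = sum(
        sum(random.random() < p_true for _ in range(n)) >= k_crit
        for _ in range(sims)
    )
    return passes / sims

# hypothetical listener who genuinely hears the difference 70% of the time
for n in (10, 16, 24, 40):
    print(f"{n:3d} trials: power ~ {power(n, 0.70):.2f}")
```

With only 10 trials, that listener fails roughly 85% of the time; even at 24 trials it's close to a coin flip. Which is exactly why "we found no difference" from one small test proves nothing, in either direction.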
Anyhow, my point is this: if they were really interested in proving the pseudo-scientific objectivists wrong, it would be simple, but they would need a good random sample to experiment on. And they are quite clearly uninterested in understanding and/or performing good science.
In addition, there are many articles available discussing the limitations of DBT and ABX testing for audio, including the Bell Labs scientists who originated the ABX test. A random sampling of a few references is given below*
One of the references is a freaking forum thread. A. Forum. Thread. Awesome. I didn't bother to read it because it's a wall of text not broken into paragraphs. Another is some random dude's blog, where he says that blind tests aren't any good because listeners would have to be trained to hear the difference between anything but transducers. We have had similar arguments so many times here, where someone wants to throw ABX testing under the bus because of something that isn't inherent in ABX testing. You can train someone before their ABX. And the article was actually trashing all blind testing, which is, of course, ludicrous.
They also provided, I guess as the reference they meant when they said they were including the researchers who originated the ABX test, the 1950 article Standardizing Audio Tests. I think they're confused, because here is a quote from the abstract:
The purpose of the present paper is to describe a test procedure which has shown promise in this direction and to give descriptions of equipment which have been found helpful in minimizing the variability of the test results. The procedure, which we have called the “ABX” test
So in telling us why ABX is bad, they cite the paper that explains why we need ABX testing. I am sure that paper's authors discuss its limitations. The authors here (that is, the "FLAC is bad" authors, not the authors of Standardizing Audio Tests) have, of course, shown a complete misunderstanding of how research works, and probably don't get that you always talk about the limitations of your testing procedures. That's just a standard part of research (which pseudo-scientists would never think of doing).
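For reference, the protocol that paper describes is dead simple, which is part of why its "limitations" are about listeners and statistics, not the test itself. A toy sketch of my own, with `ask_listener` standing in for actually playing audio to a human:

```python
import random

SOURCES = {'A': 'source_a.wav', 'B': 'source_b.wav'}  # hypothetical files

def abx_trial(ask_listener) -> bool:
    """One ABX trial: X is secretly A or B; the subject auditions A, B,
    and X, then must identify X. Returns True on a correct call."""
    truth = random.choice(['A', 'B'])
    guess = ask_listener(SOURCES['A'], SOURCES['B'], SOURCES[truth])
    return guess == truth

# simulate a pure guesser over a 24-trial session
correct = sum(abx_trial(lambda a, b, x: random.choice(['A', 'B']))
              for _ in range(24))
print(correct, "of 24 correct")
```

Nothing in there stops you from training the subject first, letting them switch as often as they like, or running as many trials as you need. Those are experimental-design choices, not flaws of ABX.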
...most of the measurement variations you see in our results come from the mind wandering a bit during these listening sessions. Fortunately, most differences found were large enough to clearly pass statistical tests of significance....
Were the listeners' minds wandering, or the measurers'? That seems like a pretty big issue. "We measured a difference. Most of it was because we weren't paying attention."
They repeatedly talk about how an older version of their playback software didn't allocate enough memory, so if there is a difference, it could result from a bug in that software. They noted that the differences were significantly reduced in the current version of the software. Which sounds like there was just a bug.
From what I have gathered, the person doing the measuring isn't blind? It's a different person than the listener, correct? Or is the listener eyeballing the tape measure? Either way, I'd really LOVE LOVE LOVE a video of this experiment, and I'd be willing to bet the person who knows which source it is provides cues: stepping down from a step-stool, starting to point at places on the ruler, that sort of thing.
I can tell you that their conclusions are wrong because, as was quoted earlier:
Much to our surprise we found that the derived WAV files exhibited a highly audible, hyperbolic decline in sound quality, as estimated on our subjective scale, despite measuring identical by standard null testing.
I mean, you just can't make this article up.
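For anyone unfamiliar, "standard null testing" is about as unambiguous as audio measurement gets, which is what makes that sentence self-refuting. A minimal sketch (assuming the soundfile library for WAV I/O and hypothetical file names):

```python
import numpy as np
import soundfile as sf  # assumption: any WAV reader works here

def null_test(path_a: str, path_b: str) -> float:
    """Standard null test: subtract the two files sample by sample and
    report the peak residual in dBFS. Bit-identical audio nulls to
    silence (-inf dB); any real difference leaves a residual you can
    measure and even listen to."""
    a, rate_a = sf.read(path_a)
    b, rate_b = sf.read(path_b)
    assert rate_a == rate_b and a.shape == b.shape, "align the files first"
    peak = np.max(np.abs(a - b))
    return -np.inf if peak == 0 else 20 * np.log10(peak)

# hypothetical comparison of a FLAC decode against the original capture
print(null_test("decoded_from_flac.wav", "original.wav"))
```

If two files null to silence, their samples are identical, and identical samples cannot exhibit a "highly audible, hyperbolic decline" in anything.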
@bilboda How closely did you read this before you concluded that it "Seems like good science"?