We should think that it’s the same old BS audiophiles/reviewers have been coming up with for 35 years or so, ever since properly executed double blind tests started demonstrating that no one, audiophiles included, could discern audible differences between many/most/all components. So for audiophile manufacturers to justify their prices, audiophiles to justify their purchases, or reviewers to justify their jobs/reputations, there was no alternative but to (falsely) discredit DBT/ABX testing. Numerous methods have been employed to accomplish this over the years: not understanding and/or misrepresenting what blind testing is, what it’s for or how it should be used; performing invalid blind tests; making up false conclusions/assertions not indicated by the results; concentrating on *potential* deficiencies of the testing protocol (even those solved decades ago); and various other variations on the above. The quoted article employed pretty much ALL of these methods!
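As a side note on what “properly executed” means statistically: an ABX run is scored against pure guessing, which is one reason trial counts matter. A minimal Python sketch (the function name is mine, and it assumes the standard 50/50 guessing probability):

```python
from math import comb

def abx_p_value(correct, trials):
    """Probability of getting at least `correct` right out of `trials`
    ABX trials by pure guessing (one-sided exact binomial, p = 0.5)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# 9 or more correct out of 10 is unlikely to happen by chance alone.
print(round(abx_p_value(9, 10), 4))  # 0.0107
```

Too few trials and even a perfect score proves little; too many sloppy trials and listener fatigue becomes its own confound, which is why the protocol, not just the score, has to be right.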
I think https://www.audiosciencereview.com/...-error-metric-discussion-and-beta-test.19841/ was the latest development for that on ASR, but I still need to do more reading.
“Latest development” in terms of judging the audibility of a null test result. However, it’s still somewhat limited in this regard, as it has to rely on psychoacoustic models rather than on the actual hearing ability and listening skills of individual listeners, so it should be viewed as a sort of “ball park” guesstimate. For an accurate answer, an audibility discernment test is required, say an ABX test.
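The crudest form of that “ball park” figure is just the level of the null (difference) signal relative to the original. A minimal numpy sketch (the function name is mine, and it assumes the two captures are already time-aligned, gain-matched and equal length):

```python
import numpy as np

def null_residual_db(a, b):
    """Crude 'ball park' null test figure: RMS level of the difference
    signal relative to signal a, in dB. Assumes a and b are already
    time-aligned, gain-matched, equal-length arrays of samples."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    rms = lambda x: np.sqrt(np.mean(x ** 2))
    return 20 * np.log10(rms(a - b) / rms(a))

# A 0.1% level error between otherwise identical signals leaves a
# residual 60 dB below the signal.
t = np.linspace(0, 1, 48000, endpoint=False)
sig = np.sin(2 * np.pi * 1000 * t)
print(round(null_residual_db(sig, 0.999 * sig), 1))  # -60.0
```

Note this single number says nothing about *where* in time or frequency the residual sits, which is exactly why a psychoacoustic model (or better, an actual discernment test) is needed to judge audibility.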
I would be more interested in software that could plot any spectral differences with respect to time for two chains playing the same music through the same transducer.
Personally, I’m far less interested in that. Firstly, “two chains” involves an awful lot of variables and would therefore tell us little or nothing about the individual components in each chain, and secondly, measuring the output of the transducer introduces a bunch of potential measurement inaccuracies, particularly with HPs/IEMs, which are very likely to be of far greater magnitude than the differences we’re trying to measure between many other components.
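For what it’s worth, the kind of plot being asked for is easy enough to sketch: take an STFT of each capture and subtract the magnitudes in dB. A minimal numpy-only sketch (function names are mine, and it again assumes the two recordings are already time-aligned and level-matched, which with real transducer captures is the hard part):

```python
import numpy as np

def stft_mag_db(x, n_fft=1024, hop=512):
    """Magnitude spectrogram in dB via a Hann-windowed STFT."""
    window = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * window
              for i in range(0, len(x) - n_fft + 1, hop)]
    mags = np.abs(np.fft.rfft(np.array(frames), axis=1))
    return 20 * np.log10(mags + 1e-12)  # small floor avoids log(0)

def spectral_difference(a, b, n_fft=1024, hop=512):
    """dB difference between two captures per time frame (rows) and
    frequency bin (columns); plot with e.g. matplotlib's imshow."""
    return stft_mag_db(a, n_fft, hop) - stft_mag_db(b, n_fft, hop)

# Identical captures difference to zero everywhere; a flat 6 dB level
# offset shows up as a constant difference across the whole plot.
x = np.random.default_rng(0).standard_normal(48000)
print(float(np.max(np.abs(spectral_difference(x, x)))))  # 0.0
```

Every dB of measurement uncertainty in the capture itself (positioning, coupling, unit variation) appears on this plot indistinguishably from a real chain difference, which is the objection above in graphical form.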
As for Resolve, others may be better at explaining the details of what may have been wrong with the test and what additional controls were needed.
God, where to start! In fact, it’s difficult to think of anything he got right, even the very premise of blind testing, let alone how he executed it!
G