I've designed many test protocols for situations other than audio, and, especially if you're trying to determine the smallest difference or value the subject can distinguish (maximum sensitivity), it's almost always best to make the test as simple as possible - and to avoid all extraneous activity or thought. For example, if you want to test how well your subjects can distinguish differences in color, you prepare tiles in various colors, hold them up in pairs next to each other, and simply ask the subject "Do they look like the same color?" (You don't hold up three tiles and ask which ones look more like which other ones; that would be testing a more complex "function".)
With this in mind, if you're simply testing whether something is "audibly different or not", then any form of "full ABX testing" is needlessly complicated.
Here's how I would do the test.....
First, again to simplify, let's simply refer to our signals as "Reference" and "X".
The subject will have a simple way of selecting either the Reference signal or the X signal to listen to.
(It could be a toggle switch, labelled "Reference" and "X", or a pushbutton that toggles between the choices and an indicator light showing which is currently selected.)
The test run will consist of a series of individual tests.
For each test, the test set will be configured so that either X is a copy of the Reference signal, or X is the modified signal it's being compared to.
The test subject will then be allowed to play the test sample, switching between the Reference and X signals as quickly as they like, and as often as they like.
When they have decided whether they think that the X signal is or is not the same as the Reference signal they will report their choice.
(If they aren't sure, they should be asked to guess. I'm pretty sure that most subjects will find guessing "yes or no" less stressful, and less thought-intensive, than guessing which of several signals another most resembles when they're uncertain. In order to compare the results we get to those expected from simple guessing, we require each test subject to complete all of the tests and answer every one, so we want to make that as easy as possible for the subject to do.)
By doing it this way we have minimized the requirement for any sort of memory, or for any "cognitive load" associated with deciding upon matches.
We have simply made the question "Did the sound change when you flipped the switch or not?"
Note that there SHOULD be a very slight audible tick or pause each time the switch is actuated (this masks any slight artifact that might differ between switching among copies of the same sample and switching between different samples, which could otherwise serve as a conscious or unconscious cue as to which is occurring).
Obviously, if the signals are really audibly identical, then we would expect results consistent with the subject guessing.
And, if they statistically do better than we would expect from guessing, then that suggests that there are in fact audible differences.
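As a concrete illustration of that comparison, the chance of a given score arising from pure guessing is a one-sided binomial tail. The function name and the 16-of-20 example below are mine, purely illustrative:

```python
import math

def p_value_at_least(k, n, p=0.5):
    """Probability of scoring k or more correct out of n same/different
    trials by pure guessing (one-sided binomial tail)."""
    return sum(math.comb(n, i) * (p ** i) * ((1 - p) ** (n - i))
               for i in range(k, n + 1))

# A subject who gets 16 of 20 trials right is very unlikely to be guessing:
print(round(p_value_at_least(16, 20), 4))  # ~0.0059
```

A result that small suggests the subject really was hearing a difference rather than guessing.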
Note that this test very specifically determines whether there is ANY audible difference between a Reference signal and a test signal.
It does NOT determine what the difference is, or which signal is better, and does NOT require the test subject to quantify what the difference is.
(This avoids the possibility that the test subject will think they hear a difference, but be "uncomfortable" reporting a difference that they can't quantify.)
(Interestingly, it also "covers" situations where the subject may not be consciously aware of the difference, but it may still bias their choice.)
Also note that we STILL need to perform the test with a lot of subjects and a variety of equipment.
If the test shows that audible differences DO exist, then we have shown both that differences exist AND that our test equipment is able to demonstrate those differences.
However, if the test shows no audible difference, we still can't know whether the null result is due to limitations in our equipment, our test samples, or even our test population.
Therefore, if we get a null result, the test should be repeated many times with different equipment and conditions to rule out that possibility.
(This could be done in the form of a "challenge", with some sort of prize offered as incentive for vendors or individuals to try it with their own chosen equipment and test samples.)
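One way to size each test run before blaming the equipment for a null result is a simple power calculation: how many trials does it take to reliably catch a subject who really does hear the difference some fraction of the time? The threshold, accuracy figure, and trial counts below are illustrative assumptions, not recommendations:

```python
import math

def min_correct_for_significance(n, alpha=0.05):
    """Smallest score k such that guessing yields k or more correct
    out of n trials with probability <= alpha."""
    for k in range(n + 1):
        tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
        if tail <= alpha:
            return k
    return None

def power(n, p_true, alpha=0.05):
    """Probability that a subject whose true per-trial accuracy is
    p_true reaches the significance threshold."""
    k = min_correct_for_significance(n, alpha)
    return sum(math.comb(n, i) * (p_true ** i) * ((1 - p_true) ** (n - i))
               for i in range(k, n + 1))

# How often would a subject who hears the difference 70% of the time pass?
for n in (10, 20, 40, 80):
    print(n, min_correct_for_significance(n), round(power(n, 0.7), 2))
```

The pattern is that short runs frequently miss a real but subtle difference, which is another reason a single null result shouldn't settle the question.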
Also note that this protocol could be implemented with VERY primitive (and even passive) equipment.
As long as the levels are matched, it doesn't even require computer control or relays.
Whether the test signal for each test is the Reference signal or X could be set using a manual toggle switch.
(A simple computer program could print out a random list of the necessary settings for each individual test.)
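A minimal sketch of such a program, assuming a fixed trial count and "same"/"different" labels of my own choosing:

```python
import random

def make_trial_sheet(n_trials=20, seed=None):
    """Random list telling the operator, for each trial, whether to set
    X to a copy of the Reference ('same') or to the modified signal
    ('different')."""
    rng = random.Random(seed)
    return [rng.choice(["same", "different"]) for _ in range(n_trials)]

# Printed for the operator only; the subject must never see this sheet.
for trial, setting in enumerate(make_trial_sheet(10, seed=1), start=1):
    print(f"Trial {trial:2d}: {setting}")
```

Passing a seed makes the sheet reproducible for record-keeping; omitting it gives a fresh random sequence each run.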
(The results should be reasonably valid as long as the test subject can't see the position of the configuration switch.
However, an automated system would be better, because it would rule out unconscious information leakage from the operator to the test subject.)
Quote:
    Without some more complete references, I can't tell what the above means, or even be sure that it was directed to me.

The comments-section discussion of ABX in your link earlier was odd.

While not having explored the experimental-design literature to that level of detail, in my own use of the foobar2000 ABX plugin it seemed natural to switch A/B to try to learn to discriminate, then A/X and B/X to decide same/different, often returning to A/B over several cycles when the difference wasn't obvious.

Another fun point is how audiophile gurus disagree - Schiit's "megaburrito filter", from the few hints given, uses a narrower transition band, contrary to many others' recommendations - but we are assured that it is the latest and greatest.