Nope. Just no.
What do you mean they "either hear it or they don't"? We have no idea what they are hearing. The only input we get is what they are saying, and that involves hearing, perception, and formulating an answer.
Say I play the same content for them as both A and B, then ask if A and B are different. You are telling me they will say they are the same? If so, you are in dire need of some actual experience in this domain.
Listeners can actually "hear" differences between A and B even though the two are identical. All they have to do is pay attention differently when listening to B versus A. All of a sudden they hear detail in one that they did not hear in the other.
Let's say they did not hear a difference. They wonder, though: "was there a difference and I was just too deaf to hear it?" Result: they say the two are different even though they heard no difference!
Conversely, let's assume there are differences between A and B that are audible. I play them and ask if there is a difference. The listener hears the difference but thinks, "hmmm, I wonder if this is a trick and I am imagining there is a difference." So they say no, there is no difference!
Or there is an audible difference but they truly cannot perceive it, so to them A and B sound the same. You know, how the general public would act if you asked them about high-fidelity content versus not. It is all "music" to them and they don't understand why you are asking if the two are different.
Because of all this variability introduced by involving humans in the evaluation, these tests are always, always considered subjective. Unlike measurements, which we can repeat and get the same very precise answers, listening tests have vagaries that disqualify them from being that.
And don't confuse drawing objective conclusions from subjective data with the data itself being objective. They are two different things.
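One way to see that distinction: each individual answer is subjective, but across enough forced-choice trials you can still draw a statistically objective conclusion from the set of answers. Here is a minimal sketch, using a hypothetical ABX-style session with made-up numbers (12 correct out of 16 trials):

```python
from math import comb

def binomial_p_value(correct: int, trials: int, p_guess: float = 0.5) -> float:
    """One-sided p-value: probability of scoring `correct` or better
    purely by guessing, when each trial has a p_guess chance of success."""
    return sum(comb(trials, k) * p_guess**k * (1 - p_guess)**(trials - k)
               for k in range(correct, trials + 1))

# Hypothetical session: 12 correct identifications out of 16 trials.
p = binomial_p_value(12, 16)
print(f"p = {p:.4f}")  # p below 0.05 suggests the result is unlikely to be chance
```

The subjective inputs (each "A" or "B" answer) stay subjective; the objective part is the statement about how improbable the overall score is under pure guessing.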
Please, please spend some time conducting real blind tests. Have a loved one test you that way: sometimes changing nothing, sometimes making a real change. Get some first-hand experience of what it is like to take these tests instead of relying on lay intuition.
Well put.
We humans aren't simple measuring instruments when it comes to hearing. Music perception occurs at both subconscious and conscious levels, and the conscious mind tries to evaluate and report on what was heard. But, by definition, the conscious mind has limited awareness of what was perceived at the subconscious level, so there's already a disconnect there (this is why we have great difficulty detecting our own cognitive biases in action).
Moreover, music perception is highly affected by the level of attention, and by where attention is being directed (details, stage, bass, sibilance, etc.) - like memory, perception is an active process; it is not passive like a recording going onto tape or an image being developed on film.
Music perception is also variable from moment to moment, is affected by the unreliability of memory, fatigue, adaptation to sounds, etc. It can also be affected by the artificial conditions of a listening test, which may be unrepresentative to some degree of how music perception occurs in normal extended listening for enjoyment.
In other words, as 'measuring instruments' for music perception, we humans have considerable 'measurement error' with each 'measurement' (each question posed to a listener). This complicates blind testing because, for example, if you're comparing the sound of two different DACs, a failure to detect an objective difference could be due to unreliability of the listener, and likewise for reporting a difference which isn't objectively there - we're dealing simultaneously with potential differences (or not) in the sound objectively produced and with variability in the listener's perception of that sound.
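To make the 'measurement error' point concrete, here is a toy model of an unreliable listener with a hit rate (chance of reporting a real difference) and a false-alarm rate (chance of reporting a difference that isn't there). Both rates are invented purely for illustration:

```python
import random

def simulate_session(real_difference: bool, trials: int,
                     p_hit: float = 0.7, p_false_alarm: float = 0.2,
                     seed: int = 0) -> int:
    """Count how many times a noisy listener reports 'different'.

    p_hit and p_false_alarm are illustrative guesses, not measured values.
    A perfect listener would have p_hit = 1.0 and p_false_alarm = 0.0.
    """
    rng = random.Random(seed)
    p = p_hit if real_difference else p_false_alarm
    return sum(rng.random() < p for _ in range(trials))

# Even when A and B are identical, this listener reports some differences;
# even when they differ audibly, some trials are missed.
same_reports = simulate_session(real_difference=False, trials=20)
diff_reports = simulate_session(real_difference=True, trials=20)
print(same_reports, diff_reports)
```

The point of the sketch is that both kinds of error occur at once, so a single trial tells you almost nothing; only the pattern over many trials separates listener noise from real audible differences.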
Moreover, there are various ways to design the tests, such as asking listeners to rate differences on a scale (say 0 to 10) so that it's not just binary, and asking listeners to focus on a specific aspect of the sound (e.g., level of detail or amount of bass) rather than just 'does it sound the same or different'. As already discussed in this thread, the answers and information you get from tests depend on the questions you design tests to answer, and it can be a mistake to generalize findings well beyond the testing protocol. For example, we can't hear a DAC alone, apart from a signal chain, so whether or not a difference is heard could depend on which headphones are used, which music is played, etc. When you do 'controlled' tests so that only one variable is changed, by definition you don't know what the effect of changing the other variables would be unless you do more tests.
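As a sketch of the rating-scale idea, here is how paired 0-10 ratings of one aspect of the sound (say, level of detail) might be compared across two devices with a paired t statistic. All the ratings are made up for illustration:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical 0-10 'level of detail' ratings from 8 listeners,
# each listener rating both devices (numbers invented for illustration).
ratings_a = [6, 7, 5, 6, 8, 7, 6, 5]
ratings_b = [7, 8, 6, 7, 8, 8, 7, 6]

# Per-listener differences; pairing controls for each listener's own scale use.
diffs = [b - a for a, b in zip(ratings_a, ratings_b)]
n = len(diffs)
t = mean(diffs) / (stdev(diffs) / sqrt(n))  # paired t statistic
print(f"mean difference = {mean(diffs):.3f}, t = {t:.2f}")
```

Unlike a binary same/different answer, this yields an effect size (the mean difference) as well as a significance measure, which is often more informative about *how much* a specific aspect of the sound differs.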
If there are threads where these aspects of blind testing have already been discussed, I'd appreciate someone pointing me to them.