That picture illustrates how difficult it can be to bypass instinct and be objective, but that doesn't mean it is impossible or not worth trying. The brain's purpose is to present sensory data in a way that is meaningful for a human being's survival. If you diffuse your focus over a wide area, things are interpreted relative to each other, and that is all that matters, because the subconscious holds the memory needed to put it all together into a meaningful perception. It's like scanning the environment for very specific things, such as things to eat or things to avoid: specialized software is preferable to simply perfect hardware and perfect data processing, which would be far more complicated and CPU-intensive than is really necessary.
If you focus your vision on single pixels, like the grey of tiles A and B, you are telling your brain that you don't want relative comparisons, and that allows you to see that the two greys are identical. This trick of focusing on only one very tiny point at a time is how you can see through visual illusions, or even real-life objects that exploit our instinctive optical mental fudging. Visual diffusion and centralization are two distinct ways for sight to function: passively and actively. When things are sensed passively, they can be affected in a million different ways by the ocean of the subconscious, usually accurately and beneficially when humans are in their natural environment, often wrongly and detrimentally in unnatural ones. Music reproduction is artificial in many ways, with many new variables, and listening with a good degree of accuracy requires you not to trigger the subconscious, passive way of listening. That might sound easy, but I think our current data on subjective tests of hardware shows just how difficult it would be to make this into a science. Appreciation of real-life acoustic music is much less artificial, has fewer but different kinds of variables, and should be a much simpler thing to solve, yet there are still many unsolved or unasked questions here. I'm not saying we shouldn't try to make progress with subjective listening of hardware, but it sits one level above plain psychoacoustics, a topic that is hardly scientific or standardized today.