I just use "pink noise," as it's the standard, and after a while turn the volume up, just like getting a new car... but I was stumped by the resistor on the headphone cable after 120 hours of burn-in. When I was using this resistor to control the volume (and left it untouched for an hour), sudden changes in volume occurred; I had to burn in the resistor for an additional 30 hours before the sudden volume changes disappeared. The only explanation I can see is annealing of the metal, but others would just say it's a fluke, can't be reproduced under... bla bla bla.
Swedish Radio developed an elaborate listening methodology called “double-blind, triple-stimulus, hidden-reference.” A “subject” (listener) would hear three “objects” (musical presentations); presentation A was always the unprocessed signal, and the listener was required to identify whether presentation B or C had been processed through the codec.
The test involved 60 “expert” listeners spanning 20,000 evaluations over a period of two years. Swedish Radio announced in 1991 that it had narrowed the field to two codecs, and that “both codecs have now reached a level of performance where they fulfill the EBU requirements for a distribution codec.” In other words, Swedish Radio said the codec was good enough to replace analog FM broadcasts in Europe. This decision was based on data gathered during the 20,000 “double-blind, triple-stimulus, hidden-reference” listening trials. (The listening-test methodology and statistical analysis are documented in detail in “Subjective Assessments on Low Bit-Rate Audio Codecs,” by C. Grewin and T. Rydén, published in the proceedings of the 10th International Audio Engineering Society Conference, “Images of Audio.”)
After announcing its decision, Swedish Radio sent a tape of music processed by the selected codec to the late Bart Locanthi, an acknowledged expert in digital audio and chairman of an ad hoc committee formed to independently evaluate low-bit-rate codecs. Using the same non-blind observational-listening techniques that audiophiles routinely use to evaluate sound quality, Locanthi instantly identified an artifact of the codec. After Locanthi informed Swedish Radio of the artifact (an idle tone at 1.5kHz), listeners at Swedish Radio also instantly heard the distortion. (Locanthi’s account of the episode is documented in an audio recording played at a workshop on low-bit-rate codecs at the 91st AES convention.)
How is it possible that a single listener, using non-blind observational listening techniques, was able to discover—in less than ten minutes—a distortion that escaped the scrutiny of 60 expert listeners, 20,000 trials conducted over a two-year period, an elaborate “double-blind, triple-stimulus, hidden-reference” methodology, and sophisticated statistical analysis?
The answer is that blind listening tests fundamentally distort the listening process and are worthless in determining the audibility of a certain phenomenon.
The Swedish experiment highlights what I think is a common flaw in double-blind tests: confounding the percentage of times a listener can perceive a difference in a single sound sample with the percentage of samples that reveal a perceptible difference. Imagine that 70% of the test samples had sufficient masking sound at 1.5 kHz that the untrained ear wouldn't have picked up the idle tone in them. Now even if the listeners caught the difference in the remaining samples some 90% of the time, they would probably have gotten the right answer in only about (70% * 50%) + (30% * 90%) = 35% + 27% = 62% of the comparisons, which might well be within the statistical variance for the experiment. Getting around this problem requires knowing which sound samples are audibly different through the systems under test, before conducting the test. And good luck with that.
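The arithmetic above can be checked, and taken one step further, with a short script. The 70%/30% masking split and the 90% detection rate are the hypothetical numbers from the paragraph; the 20-trials-per-listener figure is my own assumption, chosen only to illustrate how easily a 62% true rate hides inside chance variation:

```python
import math

# Hypothetical scenario from the text: 70% of samples mask the 1.5 kHz idle
# tone (listener is reduced to guessing), 30% reveal it (90% detection rate).
p_correct = 0.70 * 0.50 + 0.30 * 0.90
print(f"expected proportion correct: {p_correct:.0%}")   # 62%

def p_at_least(k, n, p=0.5):
    """Probability of scoring k or more correct out of n by pure guessing."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Assume 20 trials per listener: how many correct answers would be needed
# before the usual p < 0.05 criterion rules out guessing?
n = 20
cutoff = next(k for k in range(n + 1) if p_at_least(k, n) < 0.05)
print(f"significance cutoff: {cutoff}/{n}")               # 15/20
print(f"expected score at 62%: {p_correct * n:.1f}/{n}")  # 12.4/20, below the cutoff
```

So a listener who genuinely hears the artifact in every unmasked sample still lands, on average, well short of statistical significance: exactly the "within the statistical variance" outcome the paragraph describes.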
Of course, I don't really know if masking was the problem with the Swedish Radio experiment. But the next time somebody describes a blind A/B listening test, ask yourself if their procedure would detect such problems as dynamic compression, or a loss of the top or bottom octave -- problems that won't show up in every music sample used. Seems to me that, with the samples chosen randomly and treated equally, such tests are doomed to failure.
There's a book called "Unobtrusive Measurement," whose central tenet is that it is impossible to measure a phenomenon without changing it. Naturally, the more "unobtrusive" the measurement process, the more limited the damage to assessing the actual phenomenon. There is a constant tension in science between two equally valid goals. One is to control the variables in question sufficiently to be able to conclude that it is the variables being assessed that are being measured, rather than extraneous confounds. The other is to assess a real, natural phenomenon, rather than an artificial experience that cannot be generalized outside of a tightly controlled laboratory situation. This affects all science, even the so-called "hard sciences." When it comes to investigating human perception, emotion, and cognition, understanding this tension becomes increasingly important.
It is with this perspective that I evaluate ABX testing as proof of the audibility of differences between components, and it clearly comes up short. Take the time to consider the difference between ABX testing and how people listen to music at any other time. Clearly, we are talking about very different experiences, so the generalizability of ABX results is highly limited. The failure of people to detect differences during ABX testing shows only that ABX testing did not reveal a difference. No more. No less. Anyone who thinks this type of testing is the last word on the audibility of different audio components is not looking at the entire picture.
If I were to trust blind testing, I would be inserting the worst op-amps, capacitors, cables, and other components into the signal path, on the strength of the 20-year-old Stereo Review double-blind test that "confirmed" that all power amplifiers sound identical.
Blind testing often reduces the results of groups into averages. It does not allow some members of the group to be more "skilled" than others, as their results are averaged in with those of any "unskilled" persons. This causes a problem: we can't tell whether some people genuinely "passed the test" (for example, picking the "better" amplifier 5 out of 5 times) or whether they were a "fluke," because if other people got it wrong (0 out of 5), the group average is dragged down to 50/50.
If we got a group of runners and asked them to do the 100-meter dash, and one did it in 9.8 seconds while the others stumbled in at around 15-20 seconds, we would not average his result in with theirs and conclude there can be no faster runners, nor would we assume his race result was a fluke and that normally he would run 15 seconds or more.
Yet this is what double-blind ABX tests are inescapably designed to do: reduce individual results into a group score. To take a medical example, if a drug test were carried out on a hundred people to see whether it had bad side effects, and 50 people died but 50 did not, would that prove there were no side effects?
But that's how many audio ABX supporters are interpreting the results of the various tests knocking around. Instead, it tells us that 50 people reacted differently from the other 50, not why.
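The pooling problem described above can be made concrete with the 5-out-of-5 numbers from the text. A single listener scoring 5/5 has only about a 3% chance of doing so by guessing, which would normally count as significant; but once her trials are pooled with a 0/5 listener, the combined score is indistinguishable from coin-flipping (a minimal sketch of the binomial arithmetic, not of any particular published test):

```python
import math

def p_at_least(k, n, p=0.5):
    """Probability of scoring k or more correct out of n by pure guessing."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# One listener alone: 5 correct out of 5 trials.
print(f"P(5/5 by chance)    = {p_at_least(5, 5):.3f}")   # 0.031, significant at p < 0.05

# Pooled with a second listener who scored 0/5: 5 correct out of 10.
print(f"P(>=5/10 by chance) = {p_at_least(5, 10):.3f}")  # 0.623, indistinguishable from noise
```

The skilled listener's evidence does not vanish, of course; it is simply invisible in the group statistic, which is exactly the objection the paragraphs above raise.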
There's actually research that has correlated the perceived quality of wine with its claimed price: identical wines were rated "better" when they were labeled as more expensive. The research went further, using MRI scans to demonstrate changes in brain activity correlated with the claimed price. I would suggest, contrary to the belief that blind listening tests are the less valid ones, that sighted listening tests would demonstrate the same changes. People are hardwired to believe more expensive is better.
The introduction of new equipment to an audio system is filled with chemical anticipation flushing through the reward centers of the brain. Price = value. This, regardless of the fact that most of today's supermarket boxes and "audiophile" quality components are both made and labeled in exactly the same factory in China.
So it makes sense to buy a cheap knock-off from China, mod it, and sit back enjoying the sound of music.
When you stop listening to your system and start listening to the music you are finally on your way to a happier and healthier life. The desire to own more music is a sure sign of balance in any audio system and its owner. The pursuit of audio perfection has nothing to do with music and is destined to lead to lifelong, obsessive dissatisfaction and bitter disillusionment.
A few successful prosecutions would destroy the audio market bubble and probably make a lot of people angry and very unhappy to be reminded of their foolishness. Is it better to point out to the audio faithful that their religion has absolutely no basis in fact? Who knows? With the increasing emphasis on digital, and on the visual entertainment aspects in particular, the days are already numbered for this particular fad. The visual glamor and excitement of glowing valves and spinning platters is morphing into faceless, black, digital computer boxes with a single blue eye. A/B/X won't kill audio, because it's all too easy to point out the flaws.
All over the place, from the popular culture to the propaganda system, there is constant pressure to make people feel that they are helpless, that the only role they can have is to ratify decisions and to consume.
I agree that blind testing can be an eye-opener, but consider this: does your current system make you want to buy or hear more music?