Apr 14, 2015 at 9:52 PM
Again, I am not discounting opinions or attempting to denigrate them; everyone's opinion and preference counts (or should). I am, however, attempting to make a distinction between opinion and the results of well-designed tests conducted in a controlled fashion. Again, when differences are clear (and I'll concede that's a relative term), then yes, such structured tests are overkill.
I agree with what you wrote, but 'entirely subjective' might be a little strong. I (cautiously) think there may be some validity and statistical power in the opinions found in this thread and in these forums generally. That is, these opinions are better at identifying a better DAC (or whatever) than a purely random selection would be.
It sure would be interesting to see some well done research looking into 'the audiophile experience' - but it's not a hot research topic AFAIK. There is a phenomenology here backed up by a loose, shared vocabulary among audiophiles that poses some interesting questions in terms of what objective correlates might be discoverable.
Indeed, our thread-starter purrin spent quite a lot of time in past years producing CSD plots and looking (informally, I think) at what correlates he and his fellow listeners could and could not establish. Purrin is most certainly not an outright subjectivist; rather, he seems interested in exactly these broader questions, but also in enjoying the music.
I can say that, having worked (since 1990) in signal processing, NVH, and sound quality, double-blind tests are what is used industry-wide (and not just in automotive, but pretty much everywhere) to perform anything from simple rank-order of preference (product "A" against "B", "C", "D" and so on) down to regression analyses to better understand which psychoacoustic metrics best correlate with listener preference of the sounds evaluated. This is done for everything from power tools, to door closures, to hair dryers, leaf-blowers, lawn mowers...pretty much everything. Sometimes the answer is pretty immediate and clear, and other times, it's inconclusive.
One other thing that's done in industry but almost never in these threads (not specifically headfi.org, but others as well) is that, in product development circles, the sounds recorded for evaluation by jurors are played back at 1:1 loudness levels. Let that sink in for a minute... in other words, if (for example) a dishwasher produces 57.5 dB(L), then that sound and the others in the test are presented at their correct levels in the headphones (in this example, the dishwasher sound, when played back in the headphones, will produce 57.5 dB(L) SPL).
This really does matter, because of the anything-but-linear nature of the human hearing mechanism. By ensuring that sounds are played back at a 1:1 loudness level (whether in sone or decibel), a fair comparison between sounds can be made. Otherwise, if playback level is left uncontrolled, raising the gain may let you hear certain things that would be masked at the level the product (in this case, the dishwasher) actually produces. Conversely, if one were to conduct such tests at a lower-than-actual playback level, there is a reasonable chance that one would miss certain attributes of the sound. I think it's clear that, after spending the time (and money) to set up the test, having its validity compromised by evaluating the products at levels other than their actual levels could be 'problematic'.
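To put a number on the 1:1 idea, here is a minimal sketch of the gain arithmetic, assuming you have already measured what level a recording produces at the ear with the playback chain at its current setting. The function names and the 63.5 dB measurement are made up for illustration; the 57.5 dB target is the dishwasher figure from above. This is only the arithmetic, not how any particular jury-testing package does it.

```python
def calibration_gain(target_db_spl, measured_db_spl):
    """Return (gain in dB, linear amplitude factor) needed so that
    playback lands on the target level instead of the measured one."""
    gain_db = target_db_spl - measured_db_spl
    # Amplitude (pressure) ratios use 20*log10, so invert with /20.
    linear = 10 ** (gain_db / 20)
    return gain_db, linear


def apply_gain(samples, linear):
    """Scale a sequence of audio samples by a linear amplitude factor."""
    return [s * linear for s in samples]


# Hypothetical case: the dishwasher was recorded at 57.5 dB, but the
# headphone chain currently reproduces it at 63.5 dB.
gain_db, linear = calibration_gain(57.5, 63.5)
print(f"apply {gain_db:+.1f} dB (x{linear:.3f} amplitude)")
```

Here the recording would need to come down by 6 dB, which is roughly half the amplitude, before jurors hear it at its real-world level.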
This should make sense to anyone reading this, even if they are not acquainted with the workings of the human hearing mechanism. That is, who among us hasn't noticed that what we perceive, and how we respond emotionally, can be dramatically affected by the level at which we choose to evaluate a sound (or song), or even simply listen to a track for pure enjoyment? This is a pretty important aspect of hearing, and one that seems to be seldom controlled in the listening evaluations most often discussed on the web.
The problem with calibrating the levels of music is, of course, that the recordings are seldom if ever level-calibrated (it is possible to do...), so there is a lot of latitude in what levels can (and will) be chosen when attempting to make an unbiased comparison between sounds / products. Mind you, if the human hearing mechanism were linear (and by linear I do not mean 'flat'), then the level at which sounds were presented would not matter, but this is not the case.
It's for this reason that juried sound evaluation software packages allow a loop calibration to be performed, thereby ensuring 1:1 playback of actual sound levels. But...don't take my word for it - check the Society of Automotive Engineers (S.A.E.) database, as well as those of the Institute of Noise Control Engineering (I.N.C.E.) and the Acoustical Society of America (A.S.A.), for published works on juried testing (basically, similar to ABX, but with a few more options). If you are interested in knowing more about listening tests, you can web-search for 'juried tests' and "Bradley-Terry" as a start, but if you likewise search the names "Otto", "Lake", "Blommer", "Crewe", and "Cerrato" you will find many published works on double-blind jury testing and the results. Interesting stuff.
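For anyone curious what "Bradley-Terry" looks like in practice, below is a rough sketch of fitting the model to paired-comparison counts using the standard iterative (Zermelo / minorization-maximization) update. The function name and the win counts are invented for illustration and are not data from any real jury. It also assumes every sound was compared at least once and won at least once; real analysis packages handle the edge cases and give confidence intervals as well.

```python
def fit_bradley_terry(wins, iterations=200):
    """Estimate Bradley-Terry strengths from a square matrix where
    wins[i][j] is the number of times sound i was preferred over sound j.
    Returns strengths normalized to sum to 1."""
    n = len(wins)
    p = [1.0 / n] * n  # start with equal strengths
    for _ in range(iterations):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Each pairing contributes (comparisons made) / (p_i + p_j).
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            new_p.append(total_wins / denom)
        total = sum(new_p)
        p = [v / total for v in new_p]
    return p


# Made-up jury tallies for three sounds A, B, C (10 trials per pair):
# A beat B 8 times, A beat C 9 times, B beat C 7 times.
wins = [
    [0, 8, 9],
    [2, 0, 7],
    [1, 3, 0],
]
for name, strength in zip("ABC", fit_bradley_terry(wins)):
    print(name, round(strength, 3))
```

Under the model, the estimated chance that a juror prefers sound i over sound j is p[i] / (p[i] + p[j]), which is what turns a pile of A/B votes into a rank order with a sense of how far apart the products are.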