Testing audiophile claims and myths
May 8, 2012 at 10:22 AM Post #1,336 of 17,336
1. It could be related to EMI noise in the computer (and whatever else is on that data port); USB often has to share resources. 
2. USB often requires an additional processing step before the signal hits the DAC, as does optical, whereas coax/AES/EBU are passed through more directly. 
3. It could be in your head.
 
May 16, 2012 at 5:31 AM Post #1,337 of 17,336
Thanks for compiling all these links.
 
Today I looked at the Spanish matrix hifi test - http://www.matrixhifi.com/ENG_contenedor_ppec.htm
 
and the ~djcarlst ABX tests - http://home.provide.net/~djcarlst/abx_wire.htm / http://www.nousaine.com/pdfs/Wired%20Wisdom.pdf (scan?)
 
They both seem invalid / flawed to me. The first concludes that 10 listeners identified the more expensive unit, without exploring whether that could be chance (no further testing). The second link presents data that conflicts with the actual study, which I found scanned from the magazine (see my posts here and here).
 
So, looking at these 43 links you've compiled, are there actually any scientific ABX's on speaker or IEM cables, specifically the pure silver kind?
 
I'm not super curious about cables, just curious if there actually are any valid blind tests on pure silver cables at the very end of the chain...
 
May 16, 2012 at 7:27 AM Post #1,338 of 17,336
Small sample size, and I was mistaken: the cable was not silver. Or was it? I have googled both of the discussed cables and the manufacturers' specs do not mention the conductor material.
 
(Sorry)
 
http://home.provide.net/~djcarlst/abx_wire.htm
 
May 16, 2012 at 9:26 AM Post #1,339 of 17,336
May 16, 2012 at 10:22 AM Post #1,340 of 17,336
Quote:
 
Yes, here's the source of those tests - http://www.nousaine.com/pdfs/Wired%20Wisdom.pdf
 
The cables are referred to by labels like "Cable Z" to keep the manufacturers anonymous, I think.

 
Am I correct in understanding that the matrixhifi.com experiment only puts subjects through the test once and had 38 individuals take part?  If this is correct, you are right that it is a pretty poor testing method, as individual subjects are not asked to repeatedly identify the systems.
 
I'm not so much of a fan of the same/different ABX method - it is notoriously difficult to make sense of what you [think you] hear in this format, but it certainly can work in showing differences.
 
I also believe ABX testing with speakers is inherently flawed due to comb filter effects.
 
I think the ideal setup would be the testing method where subjects know which system is A and which is B and can ask to switch between them, using headphones.  I have found from personal experience, though, that even with this sort of ABX test, where you know which is A and which is B and can select and listen to the options as many times as you like, it takes a number of hours, along with awareness of your success rate, to reliably discern differences between 128 kbps and 320 kbps MP3 files.  If I were using the "different or not" method, or using speakers, I can't say for sure whether I would have been successful.
 
I know provide.net use Etymotic earphones, but what is the rest of their testing methodology?
 
May 17, 2012 at 1:17 AM Post #1,342 of 17,336
I think the number of people was alright, but they needed to be tested multiple times, to check whether they could reliably tell which setup they preferred and whether they could keep picking out that setup.  For example, for each music track the labels "system A" and "system B" are randomised, but they stay assigned to the same machines while that track is being listened to.  Then for the next track the labels are randomised again, and again remain fixed while that particular song is being played.  This way each individual is tested over multiple trials to see whether they can reliably detect differences, and the factor of individual listening skill is negated more effectively.  Increasing the sample size does nothing to remove the variation in people's listening skills.
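 
Just to make that protocol concrete, here's a rough sketch in Python; the track list, the two rigs and the simulated "listener" are made-up placeholders, not anything from an actual test.
 
```python
# Sketch of the proposed protocol: the labels "A" and "B" are re-randomised for
# every track but stay fixed to the same machines while that track plays, and
# each listener is scored over all tracks. All names here are placeholders.
import random

tracks = ["track 1", "track 2", "track 3", "track 4"]
machines = ["cheap rig", "expensive rig"]

def run_session(listener_guess):
    """listener_guess(track) returns 'A' or 'B', the label the listener prefers."""
    correct = 0
    for track in tracks:
        labels = ["A", "B"]
        random.shuffle(labels)                   # re-randomise the labels per track
        label_for = dict(zip(machines, labels))  # machine -> label, fixed for this track
        if listener_guess(track) == label_for["expensive rig"]:
            correct += 1                         # they pointed at the target rig
    return correct

# A purely guessing listener, standing in for a real subject.
guesser = lambda track: random.choice(["A", "B"])
print(f"{run_session(guesser)} of {len(tracks)} tracks identified correctly")
```
 
Scoring each listener over many such tracks is what lets you separate real discrimination from lucky guessing.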
 
May 17, 2012 at 2:03 AM Post #1,343 of 17,336
Ultimately, retesting people is pseudoreplication. Retesting would become unnecessary given a sufficient sample size, "sufficient" being dictated by the required statistical power. Think of distributions: pure guessing would produce scores that are approximately normally distributed (centred at 50%), while any actual ability to differentiate would not be, shifting the distribution toward higher scores instead. A significant distinction means being significantly different from that guessing distribution, given the null. This could be done using AIC, if you were so inclined, or using Bayes' law. Bayesian testing could be interesting: given some informative priors, you could minimise the necessary sample size.
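 
A minimal sketch of that idea, assuming a two-alternative forced-choice trial (50% chance rate) and an arbitrary 65% hit rate for a listener who genuinely hears a difference; neither number comes from the thread:
 
```python
# Null distribution of "correct" counts under pure guessing versus a listener
# with a genuine (assumed) ability, plus a simple significance check.
import numpy as np
from scipy.stats import binom

n_trials = 16      # forced-choice trials per listener (assumed)
p_null = 0.5       # chance rate for a two-alternative trial
p_real = 0.65      # hypothetical hit rate if a difference is audible

ks = np.arange(n_trials + 1)
null_pmf = binom.pmf(ks, n_trials, p_null)   # centred at 8/16
real_pmf = binom.pmf(ks, n_trials, p_real)   # shifted toward higher scores

# Frequentist check: p-value for an observed score of, say, 13/16 under the null.
observed = 13
p_value = binom.sf(observed - 1, n_trials, p_null)   # P(X >= 13 | guessing)
print(f"P(>= {observed}/{n_trials} correct by guessing) = {p_value:.4f}")
```
 
A Bayesian version would put a prior on the hit rate and update it per trial, which is where informative priors could cut down the required sample size.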
 
May 17, 2012 at 2:32 AM Post #1,344 of 17,336
Quote:
I think the number of people was alright, but they needed to be tested multiple times, to check whether they could reliably tell which setup they preferred and whether they could keep picking out that setup.  For example, for each music track the labels "system A" and "system B" are randomised, but they stay assigned to the same machines while that track is being listened to.  Then for the next track the labels are randomised again, and again remain fixed while that particular song is being played.  This way each individual is tested over multiple trials to see whether they can reliably detect differences, and the factor of individual listening skill is negated more effectively.  Increasing the sample size does nothing to remove the variation in people's listening skills.

That's actually exactly what it would do.  Given a large enough sample size things like that become irrelevant - the more people with different tastes and perception the less likely the results will be biased.
 
May 17, 2012 at 2:56 AM Post #1,345 of 17,336
 
Am I correct in understanding that the matrixhifi.com experiment only puts subjects through the test once and had 38 individuals take part?

 
Yes, 38 individuals took part, each making a single selection: A (= system A), B (= system B), or X (= decline / hear no difference).
 
They then assembled the data into a pie chart with a fairly even distribution (14, 10, 14), and I think they decided the 10 correct answers were chance; however, there is no science or statistics behind that, and they're unclear about how they interpreted the data.
 
I'll assume they're positing that even if the 10 correct answers were skillful choices and not chance, it's irrelevant, since 28 out of 38 people in ideal listening conditions couldn't pick out the superior system; so what they were looking for was a striking difference, something like 35 out of 38 people hearing it?  They're unclear on this.
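 
For what it's worth, here is the sort of chance check they could have reported, under the assumption (mine, not theirs) that a guessing listener picks A, B or X with probability 1/3 each:
 
```python
# Is 10 "correct" answers out of 38 distinguishable from guessing?
# Assumed null: each listener picks among three answers (A, B, X) at random,
# so correct picks follow Binomial(38, 1/3).
from scipy.stats import binom

n_listeners = 38
p_chance = 1 / 3
observed_correct = 10

expected = n_listeners * p_chance                                  # about 12.7
p_value = binom.sf(observed_correct - 1, n_listeners, p_chance)    # P(X >= 10)
print(f"expected by chance: {expected:.1f}, P(>= {observed_correct} correct) = {p_value:.2f}")
```
 
Whether that model is the right null is exactly the kind of thing they never spell out.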
 
In the testing methodology there's a picture of 6 individuals looking at the testing setup, and the article says the superior system was chosen by pointing to it; wouldn't that influence the other testers?  Furthermore, they said:
 
"In order to avoid biasing outside the testing room, we took the license to swap the names of the systems from A to B randomly, so nobody knew wich system was playing when listening to the A or B."
 
So if in fact you don't know which system is playing (left or right), and you're only told "system A / B is playing now", then how do you point to the system which you thought sounded better?  Clearly outside of the testing room people could discuss which system sounded better, left or right.
 
When looking at who was subjected to the test, it says
 
"The human testers were all trained ears and used to extensively listening to high end equipments, a good number of them participated, each with his own conception of the high end world, some totally subjectivists, some completely objectivists, some in between."
 
So they've asserted there were "total subjectivists" and "complete objectivists" in the testing pool; wouldn't such self-labelled people, with a hunch about what is under the red sheets and what the test is about, be inclined to answer that they hear no difference, or to point in the opposite direction to the subjectivist sitting next to them?
 
This test is unclear and unscientific on so many levels. The only conclusion it reaches is that 10 individuals out of 38 correctly selected the "more expensive system"; so does that mean there were audible differences and we should buy expensive power cables and expensive interconnects?
 
Another factor: system A uses a perfectly good CD player, perfectly good interconnects, a studio-level speaker amplifier and some fancy / expensive 'tempflex' speaker cables, connected to the exact same speakers as the other rig, so why shouldn't system A sound very good, with only very faint differences?  If 10 out of 38 people can pick up the differences between these systems at an event, without even being familiar with any of the components (all of which are hidden under the red sheets), then 10 people correctly hearing the 'better' sounding system is a pretty damn high result.
 
Another issue is identifying which system sounds better isn't the same as identifying which system sounds different.
 
So to me the validity of the results in this test is as much of a joke as the Meyer & Moran study.
 
It's shameful that electrical engineers and self-labelled scientists or objectivists cite these two studies as 'evidence of snake oil' and it only discredits their views on audio, imho.
 
 
 
 
I'm not so much of a fan of the same/different ABX method - it is notoriously difficult to make sense of what you [think you] hear in this format, but it certainly can work in showing differences.
 
I also believe ABX testing with speakers is inherently flawed due to comb filter effects.

 
I disagree; there is no reason to discard the same/different method in light of ABX.  It depends on what you're listening for: very, very small changes in volume or FR are much easier to detect with a time-aligned, rapid-switching (i.e. 0.1 second) ABX method, but that doesn't give anyone the right to assume all audible differences are easier to hear with that method, which I severely doubt.
 
Not familiar with the speaker comb filter theory.
 
 
 
I know provide.net use Etymotic earphones, but what is the rest of their testing methodology?

 
Please read carefully.  I don't think "provide.net" (http://home.provide.net/~djcarlst/abx_wire.htm) are performing their own tests; they've compiled data from other tests, as this thread has done.
 
I'll copy paste the results here.
 
 
Interconnects and Speaker Wires                         Result   Correct           p less than   Listeners
$2.50 blister pack phono cable vs. PSACS Best           same     70 / 139 = 50%    -             7
$418 Type "T1" Biwire vs. 16 Gauge Zip Cord             same     4 / 10 = 40%      -             1
Type "Z" Biwired Speaker Cable vs. 16 Gauge Zip Cord    same     70 / 139 = 50%    -             7
$990 "T2" Speaker Cable vs. 16 Gauge Zip Cord           same     16 / 32 = 50%     -             2
 
 
Now read this carefully.
 
The first test "$2.50 blister pack" used Etymotic ER-4 IEM's and compared different brands of interconnects, I'm not very interested in comparing brand names (marketing) or interconnects (a passive component pretty far down the chain) so I haven't looked into that test, especially since they don't cite a source for it other than saying "for further info contact this guy".
 
The next three tests are from here, read it - http://www.nousaine.com/pdfs/Wired%20Wisdom.pdf
 
That's a magazine article from 1994, is that the best we have on speaker cables?
 
Now let's look at the data again, the magazine indicates:
 
 
Interconnects and Speaker Wires                         Result   Correct               p less than   Listeners
$418 Type "T1" Biwire vs. 16 Gauge Zip Cord             same     3 / 10                -             1
Type "Z" Biwired Speaker Cable vs. 16 Gauge Zip Cord    same     4 / 12                -             1
$990 "T2" Speaker Cable vs. 16 Gauge Zip Cord           same     1 / 5, 2 / 4, 7 / 16  -             1
 
 
So, do these results look the same?
 
Let's look at the http://home.provide.net/~djcarlst/abx_wire.htm link again: for the Type "Z" test it says the results were 70 / 139 = 50% and that 7 people participated.
 
Does that look familiar?  The "$2.50 blister pack phono cable vs. PSACS Best" row has the exact same figures, and has no references.  This strongly suggests the results on that page were just filled in arbitrarily rather than taken from the actual studies.
 
 
So I wouldn't put much faith in any of the data at that site. Likewise, I don't have much respect for people who keep citing links like these as some kind of empirical evidence when they evidently didn't care to look at the study at all, only glanced at the results and concluded "these results suit my view, the outcome is negative, so it must be scientific".
 
On the contrary, when people link to studies like these with the-sky-is-blue assertiveness, it only shows how unscientific and partial their views on audio are. 
 
p.s. I'm not writing this as a cable supporter, I just want to look at all the data fairly, and I'd prefer if sources like Wikipedia told the truth and didn't link to fake data or unscientific studies.
 
 
 
I think the problem here is the incredibly small sample size. It would be cool to set up shop somewhere public, maybe the middle of a shopping mall and ask hundreds to ABX.

 
In a shopping mall you will be testing the random populace so then it will be either...
 
A) A test for striking differences
 
B) A hunt for someone that can find the subtle differences.
 
 
If we call DVD versus Blu-ray a striking difference, then you can collect the data and say something like: 75% of people in the shopping mall said Blu-ray looked better, 15% said DVD looked better, and 10% declined to answer or said they could see no difference.
 
If you're testing 50Hz versus 60Hz refresh rates then you're hunting for someone that can find the difference (consciously).  Audio is mostly 50Hz versus 60Hz type testing.
 
You have to keep in mind there is conscious and subconscious perception. When exposed to UV light in a tanning salon, the endorphin release is subtle, and you are not consciously aware of that effect. Likewise, if someone lives in Alaska during the winter and the lack of sunlight affects their serotonin levels, that is not something they can consciously assess. The same goes for incandescent versus fluorescent lighting in your own home.
 
So, if you ask someone at a shopping mall if they can perceive the difference between UVA / UVB / UVC light in an ABX test, you may as well ask them if they can hear subtle differences in audio components they're not familiar with and have no idea what to listen for, or see the difference between different Hz refresh rates.
 

 
While I'm on this topic I may as well link to this article, which discusses UV and infrared light in relation to 24/192 recordings, calling them "spectrophiles" - http://people.xiph.org/~xiphmont/demo/neil-young.html#toc_s
 
Let's look at their listening tests section - http://people.xiph.org/~xiphmont/demo/neil-young.html#toc_lt
 
Look, they link to the flawed / invalid Meyer & Moran study, just like Wikipedia; yet another parrot. Now let's see what they write in the next section, called Caveat Lector...
 
"it's easy to find minority opinions that appear to vindicate every imaginable conclusion. Regardless, the papers and links above are representative of the vast weight and breadth of the experimental record. No peer-reviewed paper that has stood the test of time disagrees substantially with these results. Controversy exists only within the consumer and enthusiast audiophile communities."
 
Firstly, the M&M paper didn't stand the test of time; secondly, here is a peer-reviewed paper which "disagrees substantially with these results": http://www.aes.org/e-lib/browse.cfm?elib=15398
 
The M&M paper asserted that 16/44.1 and 24/192 sound identical; the Pras & Guastavino paper asserts a difference between 24/44.1 and 24/88.2. When you look at both papers impartially, the second one is actually fairer and more scientific (though still not perfect, and with some flaws here and there, I think).
 
Pras made a comment on the paper at the hydrogenaudio forums...
 
"although the topic is interesting, mainly these days when the Blue Ray Pure Audio is to be defined, never forget that differences between formats, ADC, DAC,... remain extremely subtle compared to differences between miking techniques, room acoustics, and of courses musicians and their instruments!"
 
I think that much is true: the basic consumer, or even the passionate audio enthusiast, should still focus on musicians and their instruments. The differences from ADCs, DACs and speaker/IEM cables are subtle, and money spent on speakers, IEMs and CDs (the music itself) definitely brings a higher reward.
 
High-end DACs, the OPA627, silver cables and 24/192 all have subtle differences; I think that's the only conclusive evidence there is.
 
May 17, 2012 at 4:41 AM Post #1,346 of 17,336
Quote:
That's actually exactly what it would do.  Given a large enough sample size things like that become irrelevant - the more people with different tastes and perception the less likely the results will be biased.

 
But this assumes one answer is correct. If it is a choice between A and B and one is testing subjective preference, just think about the variation in preference for different-sounding audio gear.  I read somewhere about a test which showed a statistical preference for lower distortion, but in that case A and B would need to have different distortion figures, and I am not familiar with the specifics of that test.  Either way, increasing the sample size is not really viable: how large a sample would we need, hundreds, thousands?  Why not just cherry-pick certain individuals, say those without hearing loss, or possibly those from certain professions such as musicians and audio engineers, or repeat the experiment with the same subjects on several occasions?
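 
As a rough illustration of how big "large enough" gets for subtle effects, here is a sketch of a simple sample-size calculation for a one-trial-per-person test; the 55% assumed true hit rate and the other settings are mine, not from any study mentioned here:
 
```python
# How many single-trial listeners would be needed to show a subtle effect?
# Assumptions: one two-alternative trial per person, true hit rate 55% if the
# difference is audible, one-sided alpha = 0.05, target power = 0.80.
from scipy.stats import binom

p0, p1 = 0.50, 0.55
alpha, target_power = 0.05, 0.80

n = 10
while True:
    k_crit = int(binom.ppf(1 - alpha, n, p0)) + 1   # smallest significant count
    power = binom.sf(k_crit - 1, n, p1)             # chance of reaching it at p1
    if power >= target_power:
        break
    n += 10

print(f"roughly {n} listeners needed (critical count {k_crit}, power {power:.2f})")
```
 
For a 55% hit rate this lands in the several-hundred range, which is why repeating trials with the same (possibly selected) subjects is usually the more practical design.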
 
May 17, 2012 at 5:03 AM Post #1,347 of 17,336
Most ABX testing asks "do you hear a difference", not "which is better". (If a difference is heard, a qualifying statement to that effect may be added, however.) But with many of these tests there is an easy, non-subjective answer: either there is a difference or there is not. 
 
May 17, 2012 at 6:02 AM Post #1,348 of 17,336
^ Hmm, I guess this would be pretty effective in eliminating the factor of subjective preference. On the other hand, it trades this for more confusing test conditions, in that subjects have no guarantee that whatever differences they think they have perceived are actually there.  I would be interested in whether results differ between blind tests where there is a guaranteed difference but one must choose a preference (even though this has the flaw of being susceptible to subjective preference) and tests where there is no guarantee of any difference between the samples the subjects are offered.  I personally think I would find the "is there a difference" setup more confusing.
 
May 17, 2012 at 12:44 PM Post #1,349 of 17,336
Well, I would agree with what you said if we were talking about, say, headphones or speakers...but if people can find a difference with cables period then that would be a pretty big deal.  To really get a good result with that, a fairly large sample size would be needed.
 
May 17, 2012 at 12:47 PM Post #1,350 of 17,336
drez,
I think you are teasing out another idea here that is also interesting, a repeated-measures sort of thing: whether or not people are able to improve their responses over time. Doing the same ABX repeatedly, you could also test differences between informative and uninformative rounds, i.e. just keep repeating it with and without giving the test subject their result from the previous round. The point being that maybe, after a few burn-in rounds, people could learn to detect the difference.
 
Also, you don't ask "is there a difference?"
The sampling strategy is: play A and identify it as A.
Play B and identify it as B.
Then play an unidentified sample.
If a statistically significant proportion (above chance) can properly identify it, then there is a difference. If only ~50% of the guesses are correct, then no difference has been demonstrated.
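 
A minimal sketch of that loop, with a simulated listener standing in for the human so it runs on its own; in a real test the answer would come from the person after hearing A, B and then the unidentified X:
 
```python
# ABX procedure as described above: play A (identified), play B (identified),
# then play an unidentified X; count correct identifications and compare the
# total against what guessing would produce.
import random
from scipy.stats import binom

def simulated_listener(true_label, hit_rate=0.5):
    """Stand-in for the human answer; hit_rate = 0.5 means pure guessing."""
    if random.random() < hit_rate:
        return true_label
    return "B" if true_label == "A" else "A"

def run_abx(n_trials=20, hit_rate=0.5):
    correct = 0
    for _ in range(n_trials):
        x = random.choice(["A", "B"])       # the unidentified sample for this trial
        correct += simulated_listener(x, hit_rate) == x
    p_value = binom.sf(correct - 1, n_trials, 0.5)   # P(at least this many by chance)
    print(f"{correct}/{n_trials} correct, p = {p_value:.3f}")

run_abx()                  # guessing: hovers around 50%, p-value not small
run_abx(hit_rate=0.8)      # a listener who genuinely hears a difference
```
 
The same loop could be run in "informative" and "uninformative" blocks, simply by telling or not telling the subject whether each answer was right before the next trial.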
 
