Testing audiophile claims and myths

Discussion in 'Sound Science' started by prog rock man, May 3, 2010.
  1. KeithEmo

    However, an awful lot of people do seem to put excessive faith in tests performed by so-called experts. Now, to a point, this makes sense. However, that point is crossed when the test itself is badly flawed or, even worse, when the person reading it is unaware of the flaws in the test and/or its conclusions. I am quite convinced that many of the people who conducted a lot of the tests listed at the beginning of the thread were well intentioned, but far from well versed in how to conduct a thorough test which will provide valid results, while others simply weren't very thorough in their reporting. I also suspect that, in other cases, the reports were simply incomplete - either because the information wasn't properly captured to begin with, or because someone decided it was too boring to publish all the details.

    I also find that many people are quite ignorant of the nuances of statistics - and what they mean. I'm going to pick on one example - from the original Meyer and Moran study which everyone so loves to quote. When considered in total, the study produced a result of 246 correct choices out of 467 - which by itself is most certainly NOT "statistically significant". However they reported that one listener, in one trial, scored 8/10 and two others produced 7/10 in other trials. It should also be noted that, with the number of trials involved, this result IN AND OF ITSELF is not statistically significant. There is a quite reasonable probability that those seemingly positive results COULD HAVE OCCURRED by random chance. However, that does not necessarily rule out the possibility that they could also have NOT occurred by random chance. So, in terms of the overall study, those results so far are a sort of null result.
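
    For anyone who wants to check that arithmetic, here is a minimal sketch of the chance probabilities involved. It assumes every trial is an independent 50/50 guess, and the figure of 40 ten-trial runs is purely hypothetical (the post does not say how many runs there were); it is only there to show how easily one 8/10 turns up somewhere in a large batch of guessers.

```python
from math import comb

def chance_of_at_least(correct, trials):
    """Probability of scoring `correct` or more out of `trials` by coin-flip guessing."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

pooled = chance_of_at_least(246, 467)   # all listeners combined
single_8 = chance_of_at_least(8, 10)    # one run of 8/10 (exactly 56/1024)
single_7 = chance_of_at_least(7, 10)    # one run of 7/10 (exactly 176/1024)

# ASSUMPTION: 40 is an illustrative number of ten-trial runs, not a figure
# taken from the study. It shows the effect of looking at many runs at once.
HYPOTHETICAL_RUNS = 40
any_8 = 1 - (1 - single_8) ** HYPOTHETICAL_RUNS

print(f"246/467 or better by guessing: {pooled:.3f}")
print(f"8/10 or better in one run:     {single_8:.3f}")
print(f"7/10 or better in one run:     {single_7:.3f}")
print(f"at least one 8/10 in {HYPOTHETICAL_RUNS} runs:  {any_8:.3f}")
```

    None of these values falls below the usual 0.05 threshold, which is consistent with the point above: the pooled score and the individual high scores are each compatible with guessing, without proving that guessing is what happened.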

    HOWEVER, when viewed individually, those results are in fact quite suggestive. Ignoring the other results, we have one listener who scored 8/10, and his result COULD have been random luck, or it could have been proof that, out of all the subjects, he is a legitimate outlier who is in fact able to reliably detect differences that others cannot. He may just be the equivalent of the one guy in a million with perfect pitch. In this case, it would have been simple to conduct an ADDITIONAL ten test runs with just that test subject to confirm whether his previous performance was consistent and repeatable or just random good luck. However, notably, that follow-up was never done. At a guess, they may not have even correlated the results until the test subjects had all gone home. Alternately, they could have been so focussed on producing a statistical result that they completely overlooked an opportunity to explore the subject itself further.
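
    A rough sketch of why such a follow-up would have been informative. It assumes the follow-up is a fresh set of ten trials judged on its own, and it assumes a hypothetical "real" detector who is right 80% of the time; that 0.8 figure is an illustration, not something measured in the study.

```python
from math import comb

def prob_at_least(correct, trials, p_correct):
    """Probability of `correct` or more right answers when each trial succeeds with p_correct."""
    return sum(
        comb(trials, k) * p_correct ** k * (1 - p_correct) ** (trials - k)
        for k in range(correct, trials + 1)
    )

# ASSUMPTION: a genuine outlier is modelled as someone who is right 80% of the time.
ASSUMED_TRUE_RATE = 0.8

guesser_repeats = prob_at_least(8, 10, 0.5)                 # lucky guesser scores 8/10 again
detector_repeats = prob_at_least(8, 10, ASSUMED_TRUE_RATE)  # real detector scores 8/10 again

print(f"guesser repeats 8/10 or better:  {guesser_repeats:.3f}")
print(f"detector repeats 8/10 or better: {detector_repeats:.3f}")
```

    Under these assumptions a second strong score is unlikely for a guesser and fairly likely for a real detector, so even one extra run would have shifted the odds noticeably in one direction or the other.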

    If I were asked to summarize the results of that test this is what I would say:

    "They conducted a test with a reasonable number of subjects and test runs. The overall results of the test were not statistically significant, and so did not statistically suggest that a significant majority of test subjects could hear a difference. They were also thorough enough to note that a few test subjects seemed to perform at well above the level of random chance, but failed to follow up to confirm whether their performance was due to random chance, or whether they were legitimate outliers."

    Taken as a general indication that "most people won't be able to tell the difference" the results they produced are useful. However they not only fail to prove that "nobody can hear the difference", but they actually left some suggestive results unexplored. (It would be as if, when testing a new drug, it failed to work for most people, but one or two did unexpectedly and mysteriously recover, and you simply didn't bother to try and figure out why.)

    To me this sort of suggests that, as scientists, they were "trying to produce a statistical analysis" rather than that "they were trying to learn something".

    I would point out one thing, however.... I agree with you that the way to learn new things is to perform more tests. HOWEVER, if you're hoping to learn something new, it is pointless to repeat flawed tests, and include the same flaws as the originals. For example, in point of fact, I'm not especially convinced either way about whether ultrasonic harmonics make an audible difference. HOWEVER, because the original tests I've read were so poorly designed, we don't know if the samples they used actually contained ultrasonic harmonics or not, or whether the speakers or headphones they used were capable of actually delivering those ultrasonic harmonics to the ears of the test subjects. This produces an annoying situation. If the test subjects had noticed a difference, that would have proven that the test was valid, and the difference was audible. However, when the test subjects heard no difference, we cannot know if the tests were valid and produced a legitimate negative response, or if the tests were so flawed as to be worthless. We simply don't have enough information to tell either way.

    The solution is relatively simple: repeat the test minus the flaws of the original. Confirm, using actual measurements, that ultrasonic harmonics are present in the test samples, at the output of the speakers or headphones, and at the ears of the test subjects. (And document this fact so there are no doubts about OUR results.)
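
    As an illustration of the first of those checks only (the test samples themselves), here is a minimal sketch that estimates how much of a signal's energy lies above 20 kHz. Everything in it is an assumption made for the example: the 96 kHz sample rate, the synthetic 1 kHz tone with a weak 30 kHz component standing in for a real recording, and the 20 kHz cutoff. Checking the speaker output and the sound at the listener's ears would need a measurement microphone and is not covered here.

```python
import cmath
import math

def ultrasonic_fraction(samples, rate, cutoff_hz=20_000.0):
    """Fraction of the signal's energy above cutoff_hz, using a naive DFT."""
    n = len(samples)
    total = 0.0
    above = 0.0
    for k in range(1, n // 2):  # skip DC, stay below Nyquist
        coeff = sum(
            s * cmath.exp(-2j * math.pi * k * i / n)
            for i, s in enumerate(samples)
        )
        energy = abs(coeff) ** 2
        total += energy
        if k * rate / n > cutoff_hz:
            above += energy
    return above / total if total else 0.0

# ASSUMPTION: synthetic stand-ins for real test samples.
RATE = 96_000   # samples per second
N = 480         # 5 ms of audio, so DFT bins are 200 Hz apart

def tone(freq, amplitude):
    return [amplitude * math.sin(2 * math.pi * freq * i / RATE) for i in range(N)]

audible_only = tone(1_000, 1.0)
with_ultrasonic = [a + b for a, b in zip(audible_only, tone(30_000, 0.1))]

print(f"audible-only sample:    {ultrasonic_fraction(audible_only, RATE):.6f}")
print(f"sample with 30 kHz add: {ultrasonic_fraction(with_ultrasonic, RATE):.6f}")
```

    A real check would read the actual files and use a proper FFT library, but the principle is the same: measure the content rather than assume it is there.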

    This is pretty basic stuff. If we were testing for the ability to detect a chemical additive, the first thing we'd do after making up our samples would be to test them and make sure that our chemical was actually present in them in the specified amounts.

    I can even provide an entertaining and quite practical example of this sort of error. I lived in Los Angeles in the 1980's and, at that time, a new chemical treatment had been introduced for drinking water - the traditional chlorine additive was replaced with a similar chemical called chloramine. I don't know about the claimed benefit, but there was serious concern in the tropical fish community, because the normal treatments used to remove chlorine from tap water in preparation for putting it in fish tanks didn't work well on chloramine. It was feared that some new water treatment would have to be devised... or that a lot of fish would die. As it turned out, most of the tap water in Los Angeles is carried in cement conduits, and the lime in cement reacts with chloramine. The expected, and feared, reactions to the chloramine never occurred, because the chloramine wasn't there.... almost none of it actually made it to the tap. People had simply assumed that, since it was added at the treatment plant, it would come out of their faucets. Similarly, in every test I've read about that attempted to confirm whether ultrasonic harmonics were audible, samples were used which supposedly contained ultrasonic content, but nobody ever confirmed that it was actually arriving at the test listeners' ears. This is not a trivial error; it is an absurd, and potentially invalidating, oversight.

    (Incidentally, I would have liked to participate in your DAC study, but I am no longer in possession of either of the two DACs which I believe sounded "significantly different". I sold them because I wasn't especially fond of the way they sounded. While I am quite certain I can hear minor differences in other DACs, they are quite subtle, and I'm not convinced they would be especially audible. For example, I am quite certain I can hear minor differences in the filter choices in one of our little EGo DACs... but only under very certain conditions, with a certain few recordings, in certain specific passages, and when played through certain specific associated equipment. Therefore, if they really exist at all, they are quite small... and so not especially good examples.)

    I would also point out another "confounding factor" to be aware of. Many DAC manufacturers, especially small boutique companies, and makers of very low cost products, have a habit of quoting the specifications of the DAC chip they use as the specifications for their entire product. Therefore, if there is an audible difference with such products, I would want to confirm that it isn't simply due to undocumented flaws in measurable performance. (I would be inclined to trust that major vendors won't cheat.)

  2. bigshot
    I've already started the ball rolling. I have yet to find a single DAC or DAP that sounds different under normal listening conditions, including ones with chips that audiophiles swear sound clearly different (Wolfson, Sabre, etc).

    Anecdotal reports aren't valuable at all. That junk is just expectation bias fed by the prevarications of the high end audio market and internet forum gossip passing as "common knowledge". Do a controlled test and I'll listen to you. You don't have to make it complicated or difficult. I'm just looking for someone who knows how to get close to the truth and has found something worth looking into. I'm not going to waste my time on self serving, biased, subjective and/or anecdotal bologna. This thread is full of it.

    If you won't do a test or recognize a test unless it is up to your standards, you can rest assured that I'm not talking to you.
  3. Phronesis
    To get a result that gets people's attention, I suggest comparing a cheap DAC in the $100 range with something like the Chord Dave, which I believe is around $10K.
  4. Phronesis
    We need to also keep in mind that, due to variability in perceptual accuracy for a given listener, someone may be able to discern a difference some of the time, but not consistently all of the time, and sometimes their perception may cause them to imagine a difference between A and B which results in their swapping them. So three things can happen in a series of trials:

    a. Correct detection of a difference between A and B some of the time.

    b. Inability to detect a difference between A and B some of the time, resulting in random guessing.

    c. Due to misperception ("imagining things"), incorrectly swapping A and B some of the time.

    Errors in case c would negate/cancel correct detection in case a, and b is random guessing, so a listener may do a series of trials and come out with an apparent null result close to 50/50, despite sometimes really detecting the difference between A and B. If someone just crunches stats, this possibility would likely be missed, so this illustrates why we need to understand the science relevant to our experiment. It would be an assumption that, if a difference between a and b can be detected, that difference will be detected consistently, and that assumption may indeed be wrong (from my own testing experience, I suspect that it is wrong).
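
    A minimal sketch of that three-way model, showing how real detection on some trials can still average out to about 50/50. The probabilities used (30% correct detection, 30% swapped, the rest guessing) are hypothetical numbers picked for illustration, not measurements.

```python
import random

def expected_score(p_detect, p_swap):
    """Expected fraction correct: detected trials score 1, swapped trials 0, the rest are coin flips."""
    p_guess = 1.0 - p_detect - p_swap
    return p_detect + 0.5 * p_guess

def simulate(p_detect, p_swap, trials, seed=0):
    """Simulate a series of trials under the same three-case model."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        state = rng.random()
        if state < p_detect:
            correct += 1                  # case a: real detection
        elif state < p_detect + p_swap:
            pass                          # case c: misperception, A and B swapped
        elif rng.random() < 0.5:
            correct += 1                  # case b: random guess that happened to be right
    return correct / trials

# ASSUMPTION: illustrative probabilities only.
balanced = expected_score(p_detect=0.3, p_swap=0.3)   # swaps cancel detections
no_swaps = expected_score(p_detect=0.3, p_swap=0.0)   # same listener without case c
simulated = simulate(p_detect=0.3, p_swap=0.3, trials=100_000)

print(f"expected score with swaps:    {balanced:.3f}")
print(f"expected score without swaps: {no_swaps:.3f}")
print(f"simulated score with swaps:   {simulated:.3f}")
```

    Whenever cases a and c are equally likely the expected score sits at chance, so a score near 50% cannot by itself distinguish this listener from one who never hears anything.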
    Last edited: Dec 18, 2018
  5. bigshot
    I already tested a $40 Walmart DVD player against an Oppo HA-1 that costs over a grand. They sounded exactly the same. Price is a lousy way to predict performance in DACs. Absurd. Honestly, I can't even believe you'd make a suggestion like that.

    I am asking for a DAC that sounds clearly different from other DACs. One that YOU HAVE CAREFULLY EVALUATED AND HAVE FOUND TO SOUND DIFFERENT. I'm not going to waste my time dancing like a monkey for people who are too lazy to even do a cursory check for themselves. Repeating completely unsubstantiated subjective impressions from audiophool forums doesn't qualify as due diligence, neither does making up your own completely unsubstantiated subjective impressions, Keith. YOU guys go put the $10K DAC on your credit card and see if there is a difference. If you do a reasonably careful listening test and hear a clear difference, let me know. You may have to contact me through someone else if you do however, because I probably won't see your posts and I'll be unable to receive your PMs. Until then, we have no evidence that a DAC that sounds different exists. We have plenty of evidence that many, if not all DACs sound the same.

    I'm with Gregorio on this one. The signal to noise here is getting excessive. Diogenes mode on.
    Last edited: Dec 18, 2018
  6. Phronesis
    My point is that by comparing a cheap DAC with a very expensive one, you make a stronger case if the listeners in your test can't consistently tell the difference. Maybe you can find someone who owns an expensive DAC and would make it available because they're confident that it will sound better.

    And you won't know in advance which DACs sound "clearly different," you have to test a bunch of them with a bunch of listeners to see if you can find a couple that sound clearly different to at least one listener. I myself have only tried a few of them, and can't say that they sound clearly different (to me).
  7. bfreedma
    I’ve seen repeated commentary here suggesting the tests documented in the first post and others may be flawed. That may or may not be true, but if that assertion is going to be made, then it’s incumbent upon the person suggesting the tests were flawed to be specific about those flaws, not just throw a turd in the punch bowl.

    General statements that “there might be flaws” aren’t helpful and could be construed as deflection, particularly from those averse to participating in reasonable testing. While it may not be conclusive, an ABX via Foobar is quite simple to construct, and enough gathered data may be indicative, particularly if statistics aren’t abused and single run results aren’t stated as being significant. If someone can score 8/10 or better on 20 test runs, then we have something to discuss.
  8. Phronesis
    In general, I think the bigger issue with those tests is that they're not documented well enough to properly evaluate them. Also, the tests tend to be relatively small, and the use of stats tends to be very simple (most don't even have something like a p-value).
  9. bfreedma

    Why would MSRP make a stronger case (from the perspective of scientific value)? Aren’t we simply trying to determine if two DACs sound different enough to generate statistically significant results under controlled testing?

    And realistically, no amount of testing is going to convince those fawning over their Chord Daves and M Scalers that their DACs and bazillion tap add on boxes aren’t “special”. Those threads are an embarrassing collection of golden eared backslappers justifying their purchases.
  10. Phronesis
    I figure that if you knock the Dave off its lofty pedestal with a good credible test, that should sow seeds of doubt in general about differences in DACs. I've heard the Dave described as not just substantially better than most other stuff, but "on another planet." If my dealer can get me a Dave for demo, I may do some testing myself.

    I agree, those threads are pretty crazy. I had to unsubscribe from the Hugo 2 thread because I just couldn't take it any more. There were people saying they heard differences in the filter settings at 16/44, even after Watts himself said there shouldn't be any sound difference based on his design!
    Last edited: Dec 18, 2018
  11. Steve999
    If I may drag things down to my level, I notice if there is too much bass or too much treble or uneven bass and that is fatiguing or tiring or whatever and that's easy to fix with EQ. If I close my eyes and switch between two settings it can often be clear in my lay-opinion that part of the spectrum is over or under emphasized and that can be fixed with EQ. An extra subwoofer has helped keep the bass even in my room (a nightmare of a room, by the way, L-shaped with an opening to other parts of the house). You can walk around the room and hear that it is much more even with the two subwoofers. I can make more minor adjustments and close my eyes (a sort of self-imposed and very vulnerable and flawed blind test) and switch in and out and have a very hard time deciding which is "better." That's my happy place. Once I settle down and relax I enjoy it.

    When it comes down to having to do a blind A/B or A/B/X test to hear very minor differences, I've done that, and it's tedious and difficult and reassuring to know everything is cool, at least to my ears, but I'm just confirming that my equipment and lossy encoding are working right. I'm not really making any progress in my set-up.
    Last edited: Dec 18, 2018
  12. bfreedma
    I’ve heard the DAVE a number of times, twice in a quiet room where I could listen undisturbed. It’s a DAC. While I didn’t ABX it, I didn’t find it unusual in any way (good or bad)
  13. bfreedma

    Wait, you enjoy your gear? Is that permitted :beerchug:
  14. KeithEmo
    Just for the sake of correctness....

    Could we please have a few more details about your comparison between the Walmart DVD player and the Oppo HA-1?

    How about, for starters:
    - model of the WalMart DVD player
    - models of headphones or speakers used for the test
    - details of test samples used for the test (songs, sources, sample rates, and anything else required to duplicate the test scenario exactly)
    - duration and actual test methodology used (was this a properly conducted A/B test or "merely an informal sighted test")
    - demographics and number of test subjects chosen

    Incidentally, I would add a comment to your statement about price.
    The Sabre DAC chip used in the Oppo HA-1 costs less than $20 in quantity.
    Therefore, while it's quite possible that significant money was spent on implementation details like a better power supply or analog audio circuitry, the cost of the core component wasn't all that different.
    Basically, there isn't a commercial DAC chip that's used in a piece of commercial audio gear that costs more than about $20 in quantity.
    (Of course, custom designs, and designs built from discrete components are excluded here, and can cost far more.)

  15. Killcomic
    If anything, testing will save people money. Really, if you can't hear the difference between a lossy and a lossless file, what chances do you have of enjoying the "benefits" of 96kHz/24-bit audio?