Proposed Test: Cable Differences | Page 4 | Headphone Reviews and Discussion - Head-Fi.org

wavoman · Sep 7, 2008 at 3:36 AM

Hirsch and Nick -- fantastic posts. Right on the money IMHO.

I am a huge fan of Sample Size One experiments (this means one subject, not one test. Also called Single Subject Trials).

Think how the entire world as we know it changes if one person is proven to really have ESP.

I love the analogy with perfect pitch. Brilliant.

If you check my earlier rants on DBT and A/B/X, you will see I insist on swindles -- false comparisons, lying to the subject, throwing in 64 kbps MP3s (or AM radio) along with 24/96 uncompressed audio, etc. This is exactly Hirsch's "known difference" point.

You don't need 8/10 replicated 2 or 3 times. I will run some numbers (these are called "power calculations" in statistics) using a variety of assumptions with an eye towards "ruling out chance". But (I have argued this before) there is the issue of "effect size". If I always prefer Treatment 1 to Treatment 2 just 6 times out of 10, and I can always repeat this, never fewer, never the other way 'round, then the effect is real, but small.

Although not too small for me to care. If Treatment 1 costs $50 more than Treatment 2, I'm buying it, 'cause my perceived SQ will never be worse, and will be better every once in a while ... good enough for me. If it costs $5000 more I would not buy it, but Bill Gates might.

For a given effect size (which in real life is unknown), power calculations tell you how many trials you need to prove an effect of that size exists beyond reasonable doubt (where "reasonable doubt" is set at a probability-of-error level, by convention 5%, but in the real world that is too strict).
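For a rough idea of what such a power calculation looks like, here is a minimal sketch in Python. It assumes a one-sided exact binomial test against chance (p = 0.5) and a fixed true preference rate; the function names and the 80% power target are illustrative choices, not anything specified in this thread. As a side note, 8 out of 10 on its own has a chance probability of 56/1024, about 5.5%, so it just misses the conventional 5% cutoff.

```python
from math import comb


def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_count(n, alpha=0.05):
    """Smallest k with P(X >= k | p = 0.5) <= alpha; n + 1 if none exists."""
    tail = 0.0
    k = n + 1
    for i in range(n, -1, -1):
        tail += comb(n, i) * 0.5**n
        if tail > alpha:
            break
        k = i
    return k


def power(n, p_true, alpha=0.05):
    """Chance of reaching the critical count when the true rate is p_true."""
    k = critical_count(n, alpha)
    return binom_tail(n, k, p_true) if k <= n else 0.0


def trials_needed(p_true, alpha=0.05, target=0.8, max_n=500):
    """First n whose power reaches the target, or None if max_n is too small."""
    for n in range(1, max_n + 1):
        if power(n, p_true, alpha) >= target:
            return n
    return None


print(binom_tail(10, 8, 0.5))   # chance probability of 8 or more out of 10
print(trials_needed(0.6))       # trials for a "6 out of 10" sized effect
print(trials_needed(0.9))       # trials for a large effect
```

Under these assumptions a small effect needs far more trials than a large one, which is the practical meaning of "effect size" here.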

That's classical statistics anyway, which most of us don't believe much anymore. Rather, we look at the economic gain or loss in trying to decide if something is true, we start with a guess based on scientific theory (engineering measurement of cables in this case) as to whether it is true, and the magnitude of the effect (we actually write down our probability beliefs for many different sizes of effects), and then revise our beliefs and recommend courses of action as we accumulate evidence from the trials.
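The belief-revision idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the candidate preference rates, the skeptical prior weights, and the 16-out-of-20 result are all made-up numbers, and `update_beliefs` is a hypothetical helper that applies Bayes' rule over a short list of possible effect sizes.

```python
def update_beliefs(prior, wins, losses):
    """Bayes' rule over a discrete set of candidate preference rates.

    prior maps each candidate rate p (probability of picking Treatment 1
    on any one trial) to the belief held before seeing the trials.
    """
    unnormalized = {
        p: belief * p**wins * (1 - p) ** losses for p, belief in prior.items()
    }
    total = sum(unnormalized.values())
    return {p: u / total for p, u in unnormalized.items()}


# Hypothetical prior: engineering measurements suggest "no difference" (p = 0.5),
# with a little belief reserved for a small and a moderate effect.
prior = {0.5: 0.90, 0.6: 0.07, 0.75: 0.03}

# Hypothetical data: Treatment 1 preferred in 16 of 20 trials.
posterior = update_beliefs(prior, wins=16, losses=4)
for p, belief in posterior.items():
    print(f"p = {p}: prior {prior[p]:.2f} -> posterior {belief:.2f}")
```

The economic side (what a given belief is worth in dollars) is left out of this sketch; it would weigh each posterior belief by the gain or loss of acting on it.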

But classical methods are OK to start.

In these trials, which are called "paired comparisons" (i.e. you pick the one of two musical samples you like, not A/B/X!), the typical model assumes an effect size theta (theta is zero if there is no difference), and we also assume a convenient mathematical model that tells us the probability of actually picking Treatment 1 over Treatment 2 for every possible value of theta.

In Single Subject Trials we assume each person has their very own theta, and we try to estimate it, and/or test that it is non-zero. In standard population trials we assume all people have the same theta, or that everyone has a different theta but we are all related in that there is an average theta (and a deviation around that) across the population, which in general is reasonably consistent.
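To make the theta idea concrete, here is a small Python sketch. The "convenient mathematical model" is not named above, so the logistic link used here is an assumption chosen for illustration, and the function names and the half-count correction are likewise hypothetical.

```python
import math


def prefer_prob(theta):
    """Assumed logistic model: probability of picking Treatment 1.

    theta = 0 gives 0.5 (no difference); larger theta means a stronger
    preference for Treatment 1.
    """
    return 1.0 / (1.0 + math.exp(-theta))


def estimate_theta(wins, n):
    """Estimate one subject's theta from wins out of n paired comparisons.

    Adding 0.5 to each count keeps the estimate finite when a subject
    picks the same treatment every time.
    """
    return math.log((wins + 0.5) / (n - wins + 0.5))


def population_summary(results):
    """Mean and standard deviation of per-subject theta estimates.

    results is a list of (wins, n) pairs, one per subject.
    """
    thetas = [estimate_theta(w, n) for w, n in results]
    mean = sum(thetas) / len(thetas)
    var = sum((t - mean) ** 2 for t in thetas) / len(thetas)
    return mean, math.sqrt(var)


# Hypothetical results for three subjects, 20 trials each.
print(estimate_theta(12, 20))
print(population_summary([(12, 20), (10, 20), (17, 20)]))
```

A Single Subject Trial only uses `estimate_theta` for one person; the population view looks at the average and spread across people.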

When we get real data I can easily model both. What's new here is exactly what Hirsch said -- the concomitant variable of "can this subject tell the difference between two samples of music that we know are different". Adding this variable to the model makes for a more complex analysis, but at first we just throw the tests for those people away, or just not worry about them since we are focusing on individuals, not the group.

Note that the published DBT tests on high res vs redbook did none of this stuff.

mcsamms, you are right as rain. I tried to pull tests together at a meet, but nobody cared. I have said "let's do this at Can Jam '09", but no uptake.

If we did this right we would break new ground, confirm or refute the published studies re hi res, settle the cable issue once and for all, etc. Not curing cancer, but worth doing I think.

I will eventually get some of this organized with a small group of golden ears (not mine) in NJ. Will take several months to build out the test venue, but we are moving slowly in the right direction.

JadeEast · Sep 7, 2008 at 4:08 PM

Quote:

Originally Posted by Hirsch
Is there reason to believe that any less would be needed to study individual differences of any auditory phenomena, and the conditions under which they occur?

I don't think that people who claim to possess perfect pitch would lose their ability under a test situation. They may have difficulty or not hit 100 percent, but I don't think they would lose it. This is just speculation on my part, but if this is true it would require listening skills for audiophile differences "greater" than or at least more sensitive in nature than perfect pitch. Are you arguing for special listening ability beyond the average person?

I've been reading about hi-fi stuff and building my own systems for a while. I lost interest for a while and only wanted a "good enough" system to enjoy music. I have heard the difference in cables but now I see that I have no way to tell if those were purely psychological. I also hear solid state as being hard, analog as warmer, etc. I've even painted green on the edge of my CDs in the past. I seemed to find that everything changes something when I was really into "tweaking" things. Now I see that I have little way to determine how much was just in my head.

The more I read the arguments for and against DBT and ABX in audio, the more it seems to me that reports of subjective differences or improvements need to be taken with a grain of salt or at least dialed down. It just seems like many of the arguments against DBT and ABX are operating under the assumption that the differences are very difficult to detect. I don't see this reflected in the reported experience of people or reviewers.

If the differences between A & B simply become so elusive under DBT test situations, then it seems to me that the differences are very minor and hugely dependent on psychological influence. Of course this is just my current POV and will hopefully change when I see more test results.

I think Hirsch's idea of testing for known differences in a test is a great idea. It would be interesting to see how large an impact being "in" a test has on someone's perceptions. I imagine that having a baseline determined somehow outside the test conditions would help, but I'm not sure how you can get a baseline without testing.

wavoman · Sep 9, 2008 at 1:42 AM

Quote:

Originally Posted by JadeEast
...many of the arguments against DBT and ABX are operating under the assumption that the differences are very difficult to detect. I don't see this reflected in the reported experience of people or reviewers. ...I think Hirsch's idea of testing for known differences in a test is a great idea. It would be interesting to see how large an impact being "in" a test has on someone's perceptions. I imagine that having a baseline determined somehow outside the test conditions would help, but I'm not sure how you can get a baseline without testing.

Great post! I am for listener-blind A/B tests (they don't have to be Double Blind if the leader is in a different room or otherwise shielded from the subject). But not A/B/X, since the underlying question, "is X like A or B?", is flawed. We ask instead: which do you prefer? ... an easier question.

You establish the baseline -- testing for a known difference -- through subterfuge. You tell the listener that Gear 1 and Gear 2 will be randomly assigned to A or B in the many trials, but you secretly use Gear Inferior in some of the trials, and in others you use the same Gear for A and B.

You mix these in randomly.

In fact it doesn't hurt to not use subterfuge -- you just say: "In this test you will be presented with a pair of musical passages, A and B. Just tell us which of the following four statements is true:

1. I hear no difference
2. I hear a difference, but I have no preference
3. I prefer A to B
4. I prefer B to A

You can ask us to repeat A and B as often as you like.

Then we move on to the next comparison."

Through careful randomization and use of obviously bad signals you can work it all out in the analysis.

But you never say "now this is a baseline test".
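One way to mix the hidden baseline trials in with the real ones is sketched below in Python. The labels `gear1`, `gear2`, and `inferior`, the trial counts, and the function name are all hypothetical; the sketch only shows the randomization, not the analysis.

```python
import random


def build_schedule(n_real, n_inferior, n_identical, seed=None):
    """Return a shuffled list of A/B trials with hidden control trials.

    real      : gear1 vs gear2, randomly assigned to A and B
    inferior  : one real gear vs a known-bad signal (known difference)
    identical : the same gear played as both A and B (no difference)
    """
    rng = random.Random(seed)
    trials = []

    for _ in range(n_real):
        pair = ["gear1", "gear2"]
        rng.shuffle(pair)
        trials.append({"kind": "real", "A": pair[0], "B": pair[1]})

    for _ in range(n_inferior):
        pair = [rng.choice(["gear1", "gear2"]), "inferior"]
        rng.shuffle(pair)
        trials.append({"kind": "inferior", "A": pair[0], "B": pair[1]})

    for _ in range(n_identical):
        gear = rng.choice(["gear1", "gear2"])
        trials.append({"kind": "identical", "A": gear, "B": gear})

    # The listener only ever sees "A" and "B"; "kind" stays with the leader.
    rng.shuffle(trials)
    return trials


for number, trial in enumerate(build_schedule(6, 2, 2, seed=1), start=1):
    print(number, trial["A"], "vs", trial["B"], f"({trial['kind']})")
```

The `kind` field is what lets the known-difference and no-difference trials be separated out afterwards.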

Man I can't wait to do this. No special box needed.

JadeEast · Sep 9, 2008 at 2:39 AM

wavoman

An interesting read about a cable test from last year is here (hear?).
Observations of a controlled Cable Test - AVS Forum

It's not a perfect test by any means, but it's an interesting read as far as the experiences, observations and people's interpretations of the test, even with its downfalls. It may interest you if you're doing some tests.

I really think the audiophile who took the test went in with a great attitude, ended up surprised, and handled the whole thing really well.

A link to the listening room where the tests were done.
Audio Asylum Inmate mikel's Music System

nick_charles · Sep 9, 2008 at 3:44 PM

Quote:

Originally Posted by wavoman
In fact it doesn't hurt to not use subterfuge -- you just say: "In this test you will be presented with a pair of musical passages, A and B. Just tell us which of the following four statements is true:

1. I hear no difference
2. I hear a difference, but I have no preference
3. I prefer A to B
4. I prefer B to A

I really think this design, while interesting, is more complicated than it needs to be.

The underlying premise of "new" technologies is that they are superior due to some technological innovation (e.g. a different sampling technique), refinement (faster, bigger, deeper and so on) or subtle alteration (roll-off, shaped FR and so on).

There is an unspoken assumption that listeners will prefer the new technology because this improvement makes the new technology detectably different.

If two things are the same (i.e. no change) then preference makes no sense, and in fact a preference for A over A is a failure of perception. It is a far more serious failure than just saying they are different when they are not, because you are saying they are so very different that the difference is really obvious when there is no difference.

Thus it is more reliable and far simpler to just test for difference detection.

i.e.

1. No difference
2. Different

Only after you have shown rigorously that a difference can be detected do you need to worry about preference; if no difference can be detected then preference is meaningless...
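A simple way to score such a same/different test is sketched below in Python. It counts how often "Different" is said when the samples really differ (hits) and when they are identical (false alarms), then combines the two into d', a standard sensitivity measure from signal detection theory. Using d' is an assumption of this sketch, not something proposed in the post, and the function name and the half-count correction are hypothetical.

```python
from statistics import NormalDist


def score_same_different(trials):
    """Score a list of (actually_different, said_different) boolean pairs."""
    n_diff = sum(1 for actual, _ in trials if actual)
    n_same = len(trials) - n_diff
    hits = sum(1 for actual, said in trials if actual and said)
    false_alarms = sum(1 for actual, said in trials if not actual and said)

    # Add half a count so rates of exactly 0 or 1 do not give infinite z-scores.
    hit_rate = (hits + 0.5) / (n_diff + 1)
    fa_rate = (false_alarms + 0.5) / (n_same + 1)

    z = NormalDist().inv_cdf
    return {
        "hit_rate": hit_rate,
        "false_alarm_rate": fa_rate,
        "d_prime": z(hit_rate) - z(fa_rate),  # 0 means no detectable difference
    }


# Hypothetical listener who says "Different" half the time regardless of the truth.
guesser = [(True, True)] * 5 + [(True, False)] * 5 \
        + [(False, True)] * 5 + [(False, False)] * 5
print(score_same_different(guesser))
```

A listener who reports a difference between A and A as often as between A and B ends up with a d' near zero, no matter how confident the reports sound.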

b0dhi · Sep 9, 2008 at 5:07 PM

Quote:

Originally Posted by JadeEast
wavoman

An interesting read about a cable test from last year is here (hear?).
Observations of a controlled Cable Test - AVS Forum

It's not a perfect test by any means, but it's an interesting read as far as the experiences, observations and people's interpretations of the test, even with its downfalls. It may interest you if you're doing some tests.

I really think the audiophile who took the test went in with a great attitude, ended up surprised, and handled the whole thing really well.

A link to the listening room where the tests were done.
Audio Asylum Inmate mikel's Music System

Totally agreed. It's great to see someone approaching the topic with both scepticism and open-mindedness in the right doses. It's a shame the experiment was fatally flawed in that it took a minute to change cables. No chance for a positive result in that situation, even if the cables did sound different :/

nick_charles · Sep 9, 2008 at 5:39 PM

Quote:

Originally Posted by b0dhi
Totally agreed. It's great to see someone approaching the topic with both scepticism and open-mindedness in the right doses. It's a shame the experiment was fatally flawed in that it took a minute to change cables. No chance for a positive result in that situation, even if the cables did sound different :/

Cable swap tests are always slow and without perfectly transparent switch boxes on both amp and speaker ends it is hard to see how they could ever be instant.

But are you not contradicting what you said in post #36, where you suggest that only long term tests are valid? At some point in your proposed long term ABX tests you would have to make the change, and then if instant memory is required for discrimination, these tests would break down as well?

b0dhi · Sep 10, 2008 at 5:46 AM

Quote:

Originally Posted by nick_charles
Cable swap tests are always slow and without perfectly transparent switch boxes on both amp and speaker ends it is hard to see how they could ever be instant.

Agreed.

Quote:

Originally Posted by nick_charles
But are you not contradicting what you said in post #36, where you suggest that only long term tests are valid? At some point in your proposed long term ABX tests you would have to make the change, and then if instant memory is required for discrimination, these tests would break down as well?

Just to clarify, when I said that I would only give weight to a long term test, I meant a test that had a negative result. Clearly a short term test doesn't preclude a positive result.

With regard to discrimination: since this ABX was not "long term", it was mostly (though not entirely, of course) a test of aural memory, and in that sense (due to the fact that it took a minute or more to change cables) it was almost certain, IMO, to have a result in the negative. The aural (instant, as you referred to it) memory is of course always required for discrimination. The long term component of the test that I suggested is an attempt to minimise the degree to which aural memory is the "bottleneck" in the tests. We are, after all, trying to test what is audible, not what is audible then storable then recallable then discriminable.

Also, it might be possible that certain aspects of sound are still detected, but processed in different ways by the subconscious and conscious parts of the brain/mind. For example, it's possible that the consciously reasoning/discriminating mind may only have access to "high level" sound information (which it deals with on a day-to-day basis), while the subconscious may have access to more "low level"/"pure data" sound information. This could lead to scenarios whereby two different sounding cables might be impossible to differentiate even in a perfect test, yet still have slightly different emotional impact on a listener because of that difference.

The other, secondary reason for the long term approach is that aural memory can be improved by practice. The fact that in the long term test, "analysis" of the sound is conducted by the listener over a long period of time would encourage the development of the aural memory and therefore further expose discernible audible differences.

JadeEast · Sep 10, 2008 at 7:28 AM

Quote:

Originally Posted by b0dhi
Also, it might be possible that certain aspects of sound are still detected, but processed in different ways by the subconscious and conscious parts of the brain/mind. For example, it's possible that the consciously reasoning/discriminating mind may only have access to "high level" sound information (which it deals with on a day-to-day basis), while the subconscious may have access to more "low level"/"pure data" sound information. This could lead to scenarios whereby two different sounding cables might be impossible to differentiate even in a perfect test, yet still have slightly different emotional impact on a listener because of that difference.

If there is a difference, I think it may lie in the two paths that sense data appear to take in our brain.
HowStuffWorks "Creating Fear"

The aural memory problem seems like it would be present in any comparison, sighted or not.

edstrelow · Oct 5, 2008 at 7:07 PM

Quote:

Originally Posted by oicdn
I did a search and couldn't find anything.

Why is it nobody recables Electrostats? Recabling dynamics is almost standard procedure, but you never see a recabled stax. Why is that?

I assume it's because finding the plugs/terminations is hard?

Most stat phones are an upscale item and have pretty decent cables already. However, there is at least one outfit re-cabling the Koss 950 with a Stax cable. Also, some people, including me, have recabled Stax phones with their most expensive, low capacitance cable used in the Omegas, 404 and 4070.
http://www.head-fi.org/forums/f4/sig...dphone-175556/

I also have a Stax SRX3 recabled with a custom silver cable.

myinitialsaredac · Oct 5, 2008 at 7:16 PM

For a similar scientific test you may want to look at this - http://www.head-fi.org/forums/f133/r...tudent-368745/.
It is an experiment I am planning to conduct in roughly a month using an electric wave analyzer (whether oscilloscope, baseband analyzer, audio analyzer or the like), a time domain reflectometer, and various ICs.

Dave
