DBT problems --methodology
Sep 7, 2009 at 9:28 PM Post #46 of 63
Quote:

Originally Posted by Clutz
For the same reason people buy Ferraris and Porsches when a Ford or a Chevy will do. A Ford or a Chevy will get you to the store to buy groceries, drop off your children at school, allow you to go and visit your parents, as well as a Ferrari or a Porsche would. Practically speaking, the speed of the Ferrari and Porsche is lost on most of the population most of the time.


I'm sure this was unintentional but your comments above now have me wondering about the number of audiophiles who buy high-end gear mainly as a status symbol (and because they can).

I say more power to them.


--Jerome
 
Sep 7, 2009 at 9:30 PM Post #47 of 63
Hello Clutz,

Quote:

Originally Posted by Clutz
Are either of you statisticians?


Not me.

Quote:

Originally Posted by Clutz
How will wavoman's set up produce more false negatives than yours?


As a reminder, the argument was about Wavoman's idea of gathering people in a meeting where two products are to be compared, and asking each one whether they prefer A or B.

Example 1: everyone can easily hear the difference between A and B, but 50% of the listeners prefer A, and 50% prefer B.

In my setup, everyone passes an ABX test (for example).
In Wavoman's setup, 50% of the listeners say that they prefer A, and 50% say that they prefer B. Result: no difference is proven.

Example 2: no one can hear a difference except one listener, who prefers B.
In my setup, all listeners fail the test except the last one. That's enough to get a statistically significant positive.
In Wavoman's, all answers are random except the last one, who says "B is better". No significant difference is found.

Example 3: everyone can hear the difference and everyone prefers B, but not every time, except for one listener who is trained in blind testing.
In my setup, all listeners fail to achieve an individually statistically significant result except one, which is enough for the global result to be statistically significant.
In Wavoman's setup, most answers are not significant because of the listeners' lack of training. The one trained listener gives only a single answer, which is not enough to get a statistically significant result.
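
To put rough numbers on the examples above, here is a minimal sketch of the arithmetic. The figures are assumptions chosen for illustration (20 listeners in the meeting, 16 ABX trials for the individual listener), not data from any real test, and the helper names are hypothetical. It compares a pooled 50/50 preference tally with a single listener's perfect ABX score, using an exact binomial test.

```python
from math import comb


def binom_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): one-sided exact p-value."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def two_sided_p(k, n):
    """Two-sided exact p-value against a 50/50 split (symmetric null)."""
    tail = binom_tail(max(k, n - k), n)
    return min(1.0, 2 * tail)


# Assumed numbers, for illustration only.
# Example 1, pooled preference: 20 listeners, 10 prefer A and 10 prefer B.
pooled_p = two_sided_p(10, 20)

# Example 1, individual ABX: one listener gets 16 trials out of 16 right.
abx_p = binom_tail(16, 16)

print(f"pooled preference p-value:   {pooled_p:.3f}")
print(f"single-listener ABX p-value: {abx_p:.6f}")
```

Under these assumptions the pooled tally looks exactly like coin flipping, while the single ABX run is very unlikely to be luck. One caveat: if many listeners are each tested at the 5% level, the chance that at least one of them passes by luck grows with the number of listeners, so the per-listener threshold would normally be set stricter than 5%.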

Quote:

Originally Posted by Clutz
People are either going to hear a difference- and report it- or not hear a difference.


No, Wavoman said that the question would be "which one do you prefer ?", not "is there a difference ?".

Quote:

Originally Posted by Clutz
Averaged over a large enough randomly selected population, that sort of noise doesn't really matter


The problem is that in real life, finding one listener takes about one or two years, so testing a significant sample of listeners would take hundreds of years.

That's why I attach so much importance to finding a perfect, trained listener and an unquestionable blind setup, in order to succeed from the first test with the first listener.

Quote:

Originally Posted by Clutz
First of all, it seems to me that there are two distinctly different questions here. One is "On average, can a population tell the difference between cable A and Cable B?". [...] The second question is "Can individuals detect a difference between cable A and Cable B?".


That's very true ! I only deal with the second question.

Quote:

Originally Posted by Clutz
For a moment, let's imagine there are differences between two cables. 50% of the population cannot hear the difference between the cables, but 50% of the population can.


That's a fair starting point, but in reality it turns out that, when facing blind tests, many listeners start with poor performance, then get much better after some training.
Sean Olive wrote a paper about this phenomenon : Audio Musings by Sean Olive: Part 2 - Differences in Performances of Trained Versus Untrained Listeners

I ran a small experiment about this : you can read the account here : http://www.head-fi.org/forums/f133/e...24/index3.html

These data give a good example of a real-life situation. I think it is interesting to apply them to the proposed protocols.

I thus disagree with your idea, Haloxt, that the listeners should not know what they are listening to: in my experiment, we can see that without knowledge of what to listen for, 2 listeners out of 7 passed the ABX test, while with knowledge of the difference, 4 out of 7 succeeded! Information about what to listen for doubled the number of listeners capable of hearing the difference.
I stand with Sean Olive here: the more the listeners are trained, the better the significance of the results.
And, need I recall it? We are dealing with extremely small effects in this matter.

The results of my experiment also make me think that the swindles idea might be a bad one, Wavoman. For example, in my experiment, if I had removed listeners as soon as they made a mistake, 6 listeners out of 7 would have been rejected, while, with training, 4 out of 7 are capable of producing significant results on their own. I'm losing 75% of valuable listeners this way.
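
The cost of rejecting a listener at the first mistake can be sketched numerically. The values below are assumptions for illustration, not figures from my experiment: a listener who really hears the difference but answers correctly only 90% of the time, tested over 16 trials at a 5% significance level.

```python
from math import comb


def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


N_TRIALS = 16   # assumed number of trials per listener
HIT_RATE = 0.9  # assumed true rate of correct answers for a good listener
ALPHA = 0.05    # assumed significance level

# Smallest score that pure guessing (p = 0.5) reaches with probability <= ALPHA.
threshold = next(
    k for k in range(N_TRIALS + 1) if binom_tail(k, N_TRIALS, 0.5) <= ALPHA
)

# Chance the listener survives a "reject at first mistake" rule.
p_no_mistake = HIT_RATE**N_TRIALS

# Chance the same listener reaches a significant score if mistakes are tolerated.
p_significant = binom_tail(threshold, N_TRIALS, HIT_RATE)

print(f"score needed for significance: {threshold} of {N_TRIALS}")
print(f"P(no mistake at all):          {p_no_mistake:.3f}")
print(f"P(significant score):          {p_significant:.3f}")
```

With these assumed values, most good-but-imperfect listeners would be thrown out by the first-mistake rule even though nearly all of them would pass a test that tolerates a few errors.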
 
Sep 8, 2009 at 12:08 AM Post #48 of 63
Quote:

Originally Posted by SmellyGas
Good. I'm glad you acknowledge that your proposal constitutes faulty methodology.


I didn't say it constitutes a faulty methodology, I said it might be. I also said I hadn't given it much thought because I was using it as an example of a type of method where one could come up with something approximating a positive control, because you can titrate the amount of difference until you get to a detectable threshold. It was meant to be an example to demonstrate how one might go about doing that. Additionally, I argued that you hadn't shown that my method was necessarily problematic, you just asserted it was. I don't know if it is, or isn't, but since it's beside the point, I decided to move on. But thanks for paying close attention.

Quote:

Let's back up here. The premise of your argument was that the difference between a Ford and a Ferrari was "subtle," yet some people feel that the subtle improvement is worth it. Similarly, cables/amps/dacs might offer a "subtle" improvement, and some people might also feel this difference is worth it.


If the argument is about a lack of parallelism, then why are you jumping all over the word subtle, which is a scalar and not a vector? If two things have the same direction but different magnitudes, they're still parallel. Subtlety has nothing to do with a lack of parallelism.

Quote:

No, I'm sorry, but I am not an idiot, and neither are the people on this forum. It is insulting to try to argue that the difference between a Ford and a Ferrari is "subtle." Gimme a break, man. Are you trying to have a discussion, or are you trying to be argumentative?


The qualitative difference between a Ferrari and a Ford is subtle. They both use internal combustion engines. They both use four wheels. One has substantially better specifications than the other. The difference between a Ferrari and the Space Shuttle is not subtle. Similarly one might compare the onboard soundcard from a cheap netbook to a super high end dedicated DAC. When measuring their specs there is no comparison. To some people the difference is subtle, to others it is not. The fact of the matter is you are not the arbiter of what is, and is not, subtle. In my mind the comparison between a Ford and a Ferrari and a cheap DAC and a high quality DAC is the same.

Quote:

Yeah, real subtle there. Oh, wait, what's that? After seeing how dramatically different a Ferrari is from a Ford, you realize that the Ferrari is substantially different? BAM. There goes your PARALLEL EXAMPLE, my friend....unless you still want to maintain that the Ferrari is only a subtle improvement from a Ford.


Substantially different? In both cases, you measure the same things: horsepower, stopping distance, acceleration, top speed, etc. Parallelism doesn't require similarities in amplitude, just character. If they're measuring the same things, then how can a lack of similarity in the magnitude affect the parallelism of the argument? Your argument is that the magnitude of difference between a Ford and a Ferrari is so much greater than the difference between DACs, that it's not a parallel comparison. That is an issue of scale, not character. With the DACs or AMPs or CABLES we're measuring the same things within a set of comparisons. In both cases we're talking about measuring / detecting differences. It is, in fact, quite silly to even say that the magnitude of the difference between a Ferrari and a Ford is different from that between a cheap soundcard and the top of the line DAC, because you're measuring fundamentally different things. The only way you could even begin to make a comparison between such different things would be to compare the relative differences, and I'm not even sure that makes sense to do. The fact that you refuse to acknowledge the parallelism in the example only goes to show your stubbornness or ignorance, or both.

Quote:

Let's start by addressing the silly statement you made above.


You know, I'm not certain it's actually worth talking to you about anything since you seem to misunderstand the difference between amplitude and character.
 
Sep 8, 2009 at 12:12 AM Post #49 of 63
Quote:

Originally Posted by jsaliga
I'm sure this was unintentional but your comments above now have me wondering about the number of audiophiles who buy high-end gear mainly as a status symbol (and because they can).

I say more power to them.


--Jerome



Oh, I definitely think there is an element of that in there. I had intentionally used a Ferrari as an example to draw out that allusion, so I'm glad it worked. Though the other reason I used a high end car as an example was to illustrate the point that different people are going to have different price/performance curves, and those curves will also differ depending on how much money those people have. If I can't afford a house, I'm probably not buying a Ferrari over a Chevy; at the same time, if I already have a dozen or so vacation houses scattered around the globe, spending the money on a fancy car might be worth it to me.
 
Sep 8, 2009 at 5:41 AM Post #50 of 63
Pio -- what you describe as my protocol is not at all what I proposed. I am sorry that I was not clear enough.

I do not plan on bringing people in a room, and in no way will I pool results.

Half of my idea is to ask about preference instead of identification.

This is the half you discussed.

But the other half is all about testing individuals (each individual is a block) and not pooling results (plus swindles, and play-the-winner)

The question of more false negatives or more false positives cannot be answered without a lot of assumptions.

Under my likelihood model of responses (which I have not yet published so you couldn't know this) my protocol will elicit information faster than other protocols ... well of course I rigged it this way.

Back to your post -- if everyone passes your A/B/X test then they will do perfectly with my questions too (remember I have swindles, A vs A and B vs B -- you didn't take that into account) and we have a tie in the effectiveness of our methods -- but this is all about bringing people into a room and I want no part of that.

Again, sorry if I was not clear. I did not mean to imply that one should run a standard type of DBT and pool the results, simply asking preference vs A/B/X. As you point out, that would be very silly.

Added -- to be precise, in all three of the situations you describe, I reach the exact same conclusion you do. In Example 1, everyone scores perfectly on the swindles, so we believe their preferences 100%. In Example 2, like you, the last man produces a significant result. Ditto for Example 3. See? I don't pool. A/B/X has a control built in -- every trial has a right or a wrong answer. This is great, but a straitjacket, leading to response bias in many cases of subtle differences. I have BOTH kinds of trials -- controls, where there is a right answer (A vs A and B vs B), and open questions (A vs B) which can comfortably elicit information with less response bias. We are dealing with humans, not robots. Response bias is everything! The statistical reasoning is trivial; the experimental design and modeling of human nature/response bias is the intellectual challenge here. And I am a statistician for sure.
 
Sep 8, 2009 at 5:50 AM Post #51 of 63
Clutz and Smelly are fighting, but there is clear common ground here.

May I?

Let's see if we all agree:

Audible differences in cables, if they exist at all, must be subtle, otherwise simple experiments would already have proven the differences exist. Still there is something worth investigating since people claim to hear differences. Some people just want expensive gear, and may claim to hear a difference, but really it is the placebo effect. Fine -- having said that, let's move on, and agree that some level-headed people, aware of the placebo effect, still claim to hear a difference.

Thus we have already, in essence, done the un-blind part. So we select audiophiles who claim to hear a difference in, say, analog interconnects in their own system. Now we need to design a blind test that is discriminating enough to allow subtle differences to be detected, and not so weird as to eradicate the chance of hearing these differences due to set and setting. This is hard.

Some people who post here simply don't care if tiny differences exist, or exist only in very expensive cables. They are making an economic and practical argument, and we can respect that. Others care very much, because they will spend nearly anything even if the improvement is only slight, and only on 1% of the music they listen to. Many of us are in between these poles.

We can stop fighting now, yes?
 
Sep 8, 2009 at 11:31 AM Post #52 of 63
Quote:

Originally Posted by wavoman
Pio -- what you describe as my protocol is not at all what I proposed. I am sorry that I was not clear enough.


No, it's my fault. I was taking the discussion back to where it stood early on the first page. At that point you had posted a short description of what could be a blind test, and I said that this would lead to more false negatives than my method.
But this comment only applied to the short glimpse that you posted there, and you have developed your real method in later messages.
 
Sep 8, 2009 at 12:34 PM Post #53 of 63
Quote:

Originally Posted by wavoman
Clutz and Smelly are fighting, but there is clear common ground here.

May I?

Let's see if we all agree:

Audible differences in cables, if they exist at all, must be subtle, otherwise simple experiments would already have proven the differences exist. Still there is something worth investigating since people claim to hear differences. Some people just want expensive gear, and may claim to hear a difference, but really it is the placebo effect. Fine -- having said that, let's move on, and agree that some level-headed people, aware of the placebo effect, still claim to hear a difference.

Thus we have already, in essence, done the un-blind part. So we select audiophiles who claim to hear a difference in, say, analog interconnects in their own system. Now we need to design a blind test that is discriminating enough to allow subtle differences to be detected, and not so weird as to eradicate the chance of hearing these differences due to set and setting. This is hard.

Some people who post here simply don't care if tiny differences exist, or exist only in very expensive cables. They are making an economic and practical argument, and we can respect that. Others care very much, because they will spend nearly anything even if the improvement is only slight, and only on 1% of the music they listen to. Many of us are in between these poles.

We can stop fighting now, yes?



Interesting thread so far.
Especially this post, because it comes very close to what I think is the heart of the matter.
I think that before we start discussing DBT, ABX or whatever kind of test, we must first very carefully formulate the question we want an answer to.
For instance:
If we want to know if a difference in cables can make an audible difference in a stereo setup, the only thing we have to do is to demonstrate that there is a person that can do that. The conditions under which this demonstration is to take place have to be as accommodating for this person as possible. This may include a learning period in which he will be training himself to recognize the cues he needs to find the differences. This also includes the frequency of the changes, the duration of the listening periods, the choice of music or sound and the choice of the setup, including the cables.
If he is to demonstrate that there is a difference, all he'd have to do is to indicate after every change if the current cable is the same or different from the previous one.
The method for making the changes is not very difficult. If the person is unable to see the setup, a random sequence would be OK. The number of changes and the percentage of changes that have to be identified correctly must be decided on, but I'm sure we could agree on that.
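
A minimal sketch of how such a same/different session could be laid out and scored. Everything here is an assumption for illustration: 20 comparisons, each presentation drawn at random between cable A and cable B, a 5% significance level, and hypothetical function names. A pure guesser is then right half the time on average.

```python
import random
from math import comb


def make_schedule(n_trials, seed=None):
    """Random cable for each presentation; n_trials comparisons need n_trials + 1."""
    rng = random.Random(seed)
    return [rng.choice("AB") for _ in range(n_trials + 1)]


def answer_key(schedule):
    """'same' or 'different' for each presentation versus the previous one."""
    return ["same" if a == b else "different" for a, b in zip(schedule, schedule[1:])]


def min_correct(n_trials, alpha=0.05):
    """Smallest score that guessing reaches with probability <= alpha.

    Returns None when even a perfect score cannot reach alpha.
    """
    for k in range(n_trials + 1):
        tail = sum(comb(n_trials, i) for i in range(k, n_trials + 1)) / 2**n_trials
        if tail <= alpha:
            return k
    return None


schedule = make_schedule(20, seed=1)  # seed fixed only to make the run repeatable
key = answer_key(schedule)
needed = min_correct(20)

print("presentations:", "".join(schedule))
print(f"correct answers needed out of {len(key)}: {needed}")
```

The listener's answers would be compared against `key` after the session; the number of trials and the level are exactly the things that have to be agreed on beforehand.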
The real challenge is in finding this individual.

This is a completely different question from the one some other people seem to want an answer to: can a significant number of people identify cable A or cable B in a given setup?
The answer to this question, whatever it is, will have absolutely no relevance to the first question, because one is not a significant number.

In my opinion we have to find the answer to the first question, because that would prove that no claim of anyone who says he can hear a difference can be dismissed out of hand.
 
Sep 8, 2009 at 6:04 PM Post #54 of 63
Quote:

Originally Posted by Kees
The conditions under which this demonstration is to take place have to be as accommodating for this person as possible. This may include a learning period in which he will be training himself to recognize the cues he needs to find the differences. This also includes the frequency of the changes, the duration of the listening periods, the choice of music or sound and the choice of the setup, including the cables.


Perhaps I am not understanding you, but the above would seem to assume that differences do exist. One might also argue that anyone having to work that hard to detect an audible difference would mean it is safe to say they are probably insignificant under normal listening conditions.

I'll admit to a slight bias and acknowledge that I can be something of a skeptic when it comes to golden ear type claims. When an audiophile uses the word "subtle" I generally interpret it to mean "nonexistent." It's one of the reasons I usually don't post in these kinds of threads. Why I am doing it now I don't quite know, and it is not my intention to offend people who claim to hear microscopic differences between cables, sources, etc. I just know that I cannot hear any differences in cables, at least to an extent that would have any kind of meaningful impact on my listening satisfaction. Nonetheless, I still find these topics interesting. However, at the same time I also find them somewhat unsatisfying because people tend to make it personal and it is hard to navigate these discussions what with all the ad hominem bombs being lobbed back and forth.

--Jerome
 
Sep 8, 2009 at 7:18 PM Post #55 of 63
Quote:

Originally Posted by jsaliga
Perhaps I am not understanding you, but the above would seem to assume that differences do exist. One might also argue that anyone having to work that hard to detect an audible difference would mean it is safe to say they are probably insignificant under normal listening conditions.


--Jerome



The demonstration must be set up assuming the differences exist. In order to demonstrate that they can be heard, we must make the circumstances for them to show up as favourable as possible. There must be no other hurdles for this person to clear than his ability to hear the differences.
The IF is in "can a person be found who can do it?" and how do we go about either finding such a person or proving he cannot exist.
And that is not what you can do with a DBT or an ABX.
 
Sep 8, 2009 at 7:48 PM Post #56 of 63
Quote:

Originally Posted by haloxt
All you do is argue that if there's a difference "it should be heard EASILY". That's all I see you say because you want to derail the discussion, nothing about the topic at hand. Since the question is "how can we find a true positive", it does not matter whatsoever what the probability of a type II error is, except to encourage measures to reduce the possibility of false negatives. But I don't see you suggesting anything to this effect, troll.


It doesn't seem like he's trolling.
 
Sep 8, 2009 at 7:56 PM Post #57 of 63
Quote:

Originally Posted by Antony6555
It doesn't seem like he's trolling.


It doesn't to me either.

Whatever haloxt's personal beef is with SmellyGas I wish he would take it offline.

--Jerome
 
Sep 8, 2009 at 8:09 PM Post #58 of 63
Quote:

Originally Posted by Kees
The demonstration must be set up assuming the differences exist. In order to demonstrate that they can be heard, we must make the circumstances for them to show up as favourable as possible. There must be no other hurdles for this person to clear than his ability to hear the differences.


I can't say I agree with that. At least I disagree with the wording. A test should be constructed in such a way as to not bias the results one way or another. I would say that the test environment must be such that it does not, of its own accord, interfere with or aid one's listening ability or their ability to judge the sound in the listening test. But setting up a test in a way that is favorable to a specific outcome is a biased test. I simply would not accept the results of such a test as credible.

We may merely be having a difference on semantic grounds, and if that's the case by all means you are free to point that out.

--Jerome
 
Sep 8, 2009 at 8:42 PM Post #59 of 63
Quote:

Originally Posted by jsaliga
I can't say I agree with that. At least I disagree with the wording. A test should be constructed in such a way as to not bias the results one way or another. I would say that the test environment must be such that it does not, of its own accord, interfere with or aid one's listening ability or their ability to judge the sound in the listening test. But setting up a test in a way that is favorable to a specific outcome is a biased test. I simply would not accept the results of such a test as credible.

We may merely be having a difference on semantic grounds, and if that's the case by all means you are free to point that out.

--Jerome



I agree with your wording and I don't think that the setup I describe can favour any particular outcome, it is just meant to describe that the situation in which the demonstration takes place has to be free of every potential hindrance. It is not a test, it is a demonstration. If the candidate can demonstrate this ability under very particular circumstances of his own choice, it is a valid example that indicates that it is possible that the difference is heard. And that would be a conclusive answer to our question.
That, or the proof that no-one, under any circumstances could ever hear the difference.
 
