Head-Fi.org › Forums › Equipment Forums › Sound Science › The validity of ABX testing

The validity of ABX testing - Page 3

post #31 of 109
Quote:
Originally Posted by royalcrown View Post
I think the analogy is flawed because it's resting on a different assumption than with audio. In your analogy, you mention flying bullets. For one, we know that the flying bullets exist. Moreover, we know what the effects of flying bullets are outside of playing golf - for example, we know that flying bullets cause people to tense up and fear for their lives in almost any situation.

However, with the objections raised such as imagination contamination, it's a completely different story. We don't know if imagination contamination exists. We also don't have any conclusive proof that this condition manifests itself in anything outside of double blind audio testing, nor where we would find it, or even how to isolate it (unlike flying bullets, where we've witnessed them in tons of situations and can isolate their effects very easily). The only proof that imagination contamination even conceivably exists is that people are failing ABX tests where they "should" be passing them. Thus, the proof of their existence relies upon ABX tests yielding negative results. However, if ABX tests yielded positive results, and if one accepts those positive results as valid, then the only proof for imagination contamination's existence is gone. That means that, ultimately, the argument is circular:

If imagination contamination exists, ABX testing is flawed.
Imagination contamination exists only if ABX testing produces a negative result.
Therefore,
ABX testing is flawed because ABX testing produces a negative result.
What is "imagination contamination" again? I forget. Can't we use quick switching instead, which is one of the elements I think you said would be included in your test (which several of us critics of ABX tests don't like)?
post #32 of 109
Quote:
Originally Posted by PhilS View Post
What is "imagination contamination" again? I forget. Can't we use quick switching instead, which is one of the elements I think you said would be included in your test (which several of us critics of ABX tests don't like)?
Imagination contamination is my term for an effect which I have experienced first-hand, although that doesn't prove it to a skeptic. It is equivalent to your example (shooting bullets over your head when golfing) or basically an effect that reduces sensitivity in tests.

Let me first say that I have NEVER (repeat: NEVER) used failure in ABX tests to argue that imagination contamination exists, and if anyone thinks I have, they aren't really reading my posts.

When I listen to a very high-resolution and enjoyable presentation of music, I often participate actively through thinking, feeling, and subtle body movement. (I believe that a lot of people do this.) If I later listen to the same music on a low-resolution source, I notice that, strangely, I enjoy it tremendously, and that I'm actively participating in similar ways. So it is my hypothesis that if a person listens to A and B back-to-back (quick-switch style), and A is very enjoyable, and B is just a tad less enjoyable, the person won't be sensitive to the difference, because they have started to actively participate along with A and will carry that participation over to B.

However, this is irrelevant to Royalcrown's original question.
post #33 of 109
Thread Starter 
Quote:
Originally Posted by mike1127 View Post
Now you are going in a totally different direction than your original question. Your original question is whether I would accept a positive result as meaningful in some way. I've answered that. A conditional "yes". I've shown why it is rational to think that.

Arguing for or against the existence of "imagination contamination" or any effect that reduces sensitivity in an ABX is an entirely different issue.
This is the original point of my post. The point of my post is not whether or not a positive DBT matters statistically. The point of my post is whether or not criticizing ABX testing for reasons such as imagination contamination, while at the same time accepting a positive ABX result (if it were to occur), is logically unsound.
post #34 of 109
Thread Starter 
Quote:
Originally Posted by PhilS View Post
What is "imagination contamination" again? I forget. Can't we use quick switching instead, which is one of the elements I think you said would be included in your test (which several of us critics of ABX tests don't like)?
The reason I used imagination contamination is that it's one of the proposed mechanisms by which quick-switching harms ABX sensitivity. I think we can use quick-switching instead, though that term takes us back a step in that it doesn't answer the question of how quick-switching affects ABX sensitivity. If we ignore that question for now, though, the term would probably work.
post #35 of 109
Thread Starter 
Quote:
Originally Posted by mike1127 View Post

Let me first say that I have NEVER (repeat: NEVER) used failure in ABX tests to argue that imagination contamination exists, and if anyone thinks I have, they aren't really reading my posts.
Right, but the only empirical evidence that exists supporting it (whether you noted it or not) is the failure of ABX testing.

Quote:
Originally Posted by mike1127 View Post
When I listen to a very high-resolution and enjoyable presentation of music, I often participate actively through thinking, feeling, and subtle body movement. (I believe that a lot of people do this.) If I later listen to the same music on a low-resolution source, I notice that, strangely, I enjoy it tremendously, and that I'm actively participating in similar ways. So it is my hypothesis that if a person listens to A and B back-to-back (quick-switch style), and A is very enjoyable, and B is just a tad less enjoyable, the person won't be sensitive to the difference, because they have started to actively participate along with A and will carry that participation over to B.
That's a just-so story, but at any rate, if the claim is that quick-switching is bad, the "why" will always be speculative.

Quote:
Originally Posted by mike1127 View Post
However, this is irrelevant to Royalcrown's original question.
It honestly is far more relevant to the thread than the statistical arguments.
post #36 of 109
Quote:
Originally Posted by royalcrown View Post
The reason I used imagination contamination is that it's one of the proposed mechanisms by which quick-switching harms ABX sensitivity. I think we can use quick-switching instead, though that term takes us back a step in that it doesn't answer the question of how quick-switching affects ABX sensitivity. If we ignore that question for now, though, the term would probably work.
Maybe you could repeat your reply to my previous post with quick-switching, as I don't see why the putting analogy isn't pretty good. I think you're focusing on the trees rather than the forest, when you say that we know how people react to bullets. (I'm not being critical; I'm just trying to explain my difficulties with the argument.)

To me, if a believer says quick-switching is something that can prevent people from hearing differences in ABX tests, I don't see why it isn't reasonable to accept a positive result and reject a negative result when quick-switching is utilized. The positive result may mean (1) the differences were so obvious that quick-switching was not enough of a hindrance to mask them, or (2) quick-switching is not a hindrance at all. OTOH, if the result is negative, we don't know that quick-switching is not a hindrance. Maybe that is what caused the negative result.
post #37 of 109
Quote:
Originally Posted by PhilS View Post
To me, if a believer says quick-switching is something that can prevent people from hearing differences in ABX tests, I don't see why it isn't reasonable to accept a positive result and reject a negative result when quick-switching is utilized. The positive result may mean (1) the differences were so obvious that quick-switching was not enough of a hindrance to mask them, or (2) quick-switching is not a hindrance at all. OTOH, if the result is negative, we don't know that quick-switching is not a hindrance. Maybe that is what caused the negative result.
You say it pretty well here, just as you have all throughout this thread.

I think maybe Royalcrown's point is that SOME believers would base their entire argument on the success or failure of a test. In other words, this strawman "believer" being criticized by Royalcrown thinks like this:

- Someone didn't pass an ABX test.
- Therefore ABX tests are flawed.
- Now let me invent some reasons why.

There may in fact be a few people who think like that, but certainly none of them have participated in this sound science forum during the time I've been here.
post #38 of 109
Quote:
Originally Posted by mike1127 View Post
I think maybe Royalcrown's point is that SOME believers would base their entire argument on the success or failure of a test. In other words, this strawman "believer" being criticized by Royalcrown thinks like this:

- Someone didn't pass an ABX test.
- Therefore ABX tests are flawed.
- Now let me invent some reasons why.
I'm not so sure he's really saying that. I think he's making a different point, but I'm still not sure if I understand it correctly. Maybe eventually we'll all get on the same page.
post #39 of 109
The relationship of statistics to "knowledge of the real world" is quite subtle and easily misunderstood. I usually assume most people on the Sound Science forum know more statistics than I do, because I know only a little of the math. However, I may have a better "feel" for what statistics mean than most people, because my day job is writing software that does something called "statistical orbit determination." That is, navigating spacecraft on the basis of noisy and incomplete measurements. At this point you are probably wondering why I claim to know little about statistics! I don't actually program the mathematical part of the software. But I have stood over the shoulders of navigators and heard them thinking out loud.

Statistics is a kind of black art, because it involves manipulating models of the world, when you often don't know if those models are correct (even remotely correct)! Your conclusions can be right. They can be wrong. And they can be dangerous. (Spacecraft have been lost due to mistakes in navigation.)

They can be right for the right reason, and right for the wrong reason.

In the case of an ABX test, we start with a null hypothesis, which specifically is the hypothesis that the listener cannot hear any difference and is picking an answer based on some arbitrary criteria or random guessing. We then model this listener as someone who answers correctly 50% of the time, or to put it another way, p=0.5.

In other words, we are modeling the listener as essentially a fair coin. That's a bit of a funny thing to do, considering a listener is a brain and a body and so forth. For all that to reduce to simply a coin toss is amusing in a way. Although it is an accurate model if there is no audible difference.
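The fair-coin model makes the math concrete: the chance that a pure guesser scores at least k correct out of n trials is a binomial tail sum. A minimal sketch in Python (the 12-of-16 numbers are just an illustrative choice, not figures from this thread):

```python
import math

def prob_at_least(k, n, p=0.5):
    """P(at least k correct in n trials) for a guesser whose per-trial
    success probability is p -- the 'fair coin' model when p = 0.5."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

# Chance of scoring 12 or more out of 16 by pure guessing:
print(round(prob_at_least(12, 16), 4))  # 0.0384
```

If that probability falls below whatever significance threshold the test uses, the fair-coin hypothesis is rejected for that session.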

Now, you might wonder, why do we take this approach?

Here's an analogy.

If someone claims to be a bad golfer, then goes out and gets a bad score, that is consistent with being a bad golfer. If he gets a good score, that is inconsistent with being a bad golfer.

Perhaps we have a reason to suspect someone who is actually a good golfer would fake being a bad golfer. If he claims to be a good golfer, then gets a bad score, you could say that is inconsistent. But we know he could easily be faking it. So in certain contexts, we cannot reject his claim of being a good golfer from a bad score.

On the other hand, if he claims to be a bad golfer, then gets a good score, we can categorically reject his claim. We intuitively know this, but a way to put it mathematically is to say there is very little probability he could achieve a good score while being a bad golfer.

What ABX testing analysis does is start by assuming he's a very bad golfer and seeing if he can prove otherwise. Or rather, start by assuming the test subject is randomly guessing, and wait for them to prove otherwise. Call the hypothesis they are randomly guessing H0.

A good test result can demonstrate H0 is unlikely to be true. That's because you can't fake a good result, except by getting lucky. And if you have only a 3% chance of "getting lucky", and yet you can repeat this feat several times, you have pretty much conclusively demonstrated you weren't lucky. Rather, H0 is false.
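To put numbers on "repeating the feat": if a single session has only a ~3% chance of succeeding by luck under H0 (the figure quoted above), then independent repetitions multiply that probability down very quickly. A quick sketch, assuming the sessions are independent:

```python
p_lucky = 0.03  # assumed per-session probability of a fluke pass under H0

# Probability a pure guesser flukes every one of several independent sessions:
for sessions in range(1, 4):
    print(sessions, p_lucky ** sessions)
```

Three fluke passes in a row would have odds of a few in a hundred thousand, which is why repetition is so persuasive against H0.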

So if H0 is false, then what is true? In a sense, we still don't know. All we know is that H0 is false. Almost anything else could be true.

For example, we can demonstrate the test subject isn't well modeled by a 50% chance of guessing right, but does that mean they are well modeled by some other chance of guessing right? Not really, because people and test situations are way too complex to be reduced to a number.

Also, note that I said the golfer can easily be faking a bad score. The analogy is that a test can be insensitive for any number of reasons---and not just the ones Royalcrown thinks I'm "making up." Subject isn't paying attention, bad choice of music, incorrect or no listener training, wrong choice of test subjects, etc.

Now, you might think at this point I'm trying to use this argument to attack all skeptics and that I'll never let go. If you think that, well sorry to disappoint. There is a time and a place to accept H0. It is reasonable to do that when you feel that adequate testing has been done under good conditions, using all available evidence to design the tests.
post #40 of 109
Often people form elaborate opinions about amps after Head-Fi meets. Indeed, people are often encouraged to attend meets in order to find out what they like, etc.

It would seem to me that meet conditions are very significantly less optimal than blind-test conditions.

Why is it that people can often hear differences so clearly under meet conditions when they can't under blind conditions?
post #41 of 109
Quote:
Originally Posted by Dane View Post
Often people form elaborate opinions about amps after Head-Fi meets. Indeed, people are often encouraged to attend meets in order to find out what they like, etc.

It would seem to me that meet conditions are very significantly less optimal than blind-test conditions.

Why is it that people can often hear differences so clearly under meet conditions when they can't under blind conditions?
I'm not sure they can hear things under meet conditions.

I'm also not sure that blind-test conditions are optimal.

The theory goes that subtle differences in amps, dacs, and cables (and I say "subtle" not because they are insignificant but because they are not as obvious as headphone differences in back-to-back comparisons) matter most under these conditions:

- Listening for enjoyment, not analytically

- Fresh ears: primarily this means you are listening to fresh music rather than a repeated track, secondarily it means first time listening to a piece of equipment

- Long-term settling in with a system and getting used to its sound

Blind testing as generally done does not replicate these three conditions. Theoretically it could and I'm working on it.

The first two conditions are met sometimes in meets.

It seems like objectivists think that subjectivists subscribe to a lump set of beliefs that all have to come together:
  • Any sighted test is valid.
  • We can listen once and proclaim "It's magnificent" and always be right.
  • Blind tests are flawed simply by virtue of being blind.
  • Or, when people have trouble passing blind tests, that in itself is evidence the tests are flawed.

Etc. I am definitely not an objectivist but I don't subscribe to any of these things.
post #42 of 109
Mike, I understand what you're saying. Blind test conditions are not optimal, I accept that.

My point is this: if you (not you personally) claim that blind testing is not valid for the reasons you listed (or similar), while at the same time you can get meaningful impressions of amps (for example) at meets and subsequently report on them in great detail, then this implies that meet conditions are better than blind-testing conditions. Think about it: a room full of commotion and babbling people and whatnot is not enough to mask differences, but a blind test, where you can sit quietly, listen, and concentrate, totally masks them? It just doesn't seem plausible to me.
post #43 of 109
Quote:
Originally Posted by mike1127 View Post
The first two conditions are met sometimes in meets.
Just enjoy the blind test; pretend it's a meet. No one is forcing you to listen in a particular way; if "meet listening" helps you differentiate, then by all means use it.

I see no problem in choosing new music for each trial in a blind test, but I think it would reduce the chance of success rather than increase it.

So, I don't see why the two first meet conditions couldn't be present in a blind test also.

Edit: The last condition (long-term listening) is present neither at meets nor under blind conditions, although there is nothing inherently preventing long-term blind listening; it just complicates the practicalities of the test.
post #44 of 109
I say ABX tests are valid.
But only for the person performing it, listening to that specific music/recording on that specific audio gear. It can of course be a guideline for others, but then take into consideration that you have a different set of gear, ears, and probably music.

You need to perform your own ABX test to know where you stand.
post #45 of 109
Quote:
Originally Posted by Dane View Post
Mike, I understand what you're saying. Blind test conditions are not optimal, I accept that.

My point is this: if you (not you personally) claim that blind testing is not valid for the reasons you listed (or similar), while at the same time you can get meaningful impressions of amps (for example) at meets and subsequently report on them in great detail, then this implies that meet conditions are better than blind-testing conditions. Think about it: a room full of commotion and babbling people and whatnot is not enough to mask differences, but a blind test, where you can sit quietly, listen, and concentrate, totally masks them? It just doesn't seem plausible to me.
Sorry I wasn't clear. I don't think meet conditions are very good.

I just wanted to make specific points, specific reasons why one thing or another is bad:
  • Background noise is bad. (meets)
  • Getting all excited about something, having your friends tell you how great it sounds, etc., will skew perception (meets)
  • Listening to one track over and over with slightly varying equipment is bad (meets and most blind tests done to date, though not inherent to blind testing)
  • Having a short time to "get a handle on" or "get used to" something is bad (meets and most blind tests done to date)

It's not so easy to "just enjoy the blind test, pretend it's a meet." If I switch to a long-term listening protocol (where long-term means a minute or longer), I have to contend with the problem that my attention cannot be strictly controlled. Different listens will seem different simply because my attention wanders from thing to thing. Deliberately placing my attention on something can change the results, because there is a difference between deliberate use of my attention and something "grabbing" my attention without direct willing of it.

I'm working on solving these problems. I'm not trying to argue blind testing is impossible. But I do think these problems have plagued most blind tests done to date.