ABX Reliability | Page 4 | Headphone Reviews and Discussion - Head-Fi.org

Head Injury · May 17, 2010 at 10:50 PM

Quote:

slaughter said:
I was able to get the same colors by refreshing a few times earlier, but a screenshot will work just as well as you suggested. Ham Sandwich, what is the difference between needing to get 95% right and needing a 95% confidence level? If you don't get 95% correct, then you don't have a 95% confidence level. Semantics, my friend...

No, there is a difference. For example, if someone scores 10/20, he got 50% right but there's not a 50% chance he was guessing. There's nearly a 100% chance he was guessing. And if someone gets 20/20, he got 100% right but there's still a > 0% chance he guessed.
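To put rough numbers on those two scores, here is a minimal sketch, assuming every trial is an independent coin flip (p = 0.5) for someone who is purely guessing. The helper name `p_at_least` is made up for illustration. It shows how often a guesser would reach a given score or better.

```python
from math import comb

def p_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Chance of k or more correct answers in n independent trials,
    each answered correctly with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# A pure guesser reaches 10/20 or better more often than not...
print(f"10/20 or better by guessing: {p_at_least(10, 20):.3f}")
# ...while 20/20 by guessing is possible but extremely rare.
print(f"20/20 by guessing:           {p_at_least(20, 20):.2e}")
```

So 10/20 is exactly the kind of score guessing produces all the time, while 20/20 by luck is roughly a one-in-a-million event: small, but not zero.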

Are we allowed to do this with an increased color difference? Because with a 10 color difference, I scored 20/20 easily. I think that shows that an ABX test's ability to reveal differences depends on the magnitude of those differences, and by extension that if people can't pass DBT cable tests, the differences may well be too small to perceive easily (if they exist at all).
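The link between the size of a difference and the chance of passing can be sketched numerically. This is only an illustration under stated assumptions: `p_true` is a hypothetical per-trial hit rate (0.5 means pure guessing, values near 1.0 mean an obvious difference), trials are independent, and the pass mark of 15 out of 20 is an assumed criterion (the smallest score that pure guessing reaches less than 5% of the time).

```python
from math import comb

def p_at_least(k: int, n: int, p: float) -> float:
    """Chance of k or more correct answers in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def pass_rate(p_true: float, n: int = 20, needed: int = 15) -> float:
    """Chance that a subject with per-trial hit rate p_true passes the session.
    n and needed are assumed values, not taken from any particular test."""
    return p_at_least(needed, n, p_true)

# Hypothetical hit rates, from pure guessing up to an obvious difference.
for p_true in (0.5, 0.6, 0.75, 0.9, 0.99):
    print(f"hit rate {p_true:.2f} -> passes {pass_rate(p_true):.1%} of sessions")
```

Under these assumptions, someone who really does hear or see a faint difference (say, right 60% of the time) still fails most 20-trial sessions, while an obvious difference passes almost every time.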

Edwood · May 17, 2010 at 11:24 PM

That color test is also skewed. It's picking a color that is very close to the magenta range that most monitors have difficulty displaying properly. Now if it were a test with printed samples, that would be another story, but yeah, I see its point.

It's very difficult to see the differences on a crappy TN monitor. But on my calibrated IPS panel, I can see the difference more easily.

-Ed

Head Injury · May 17, 2010 at 11:37 PM

Quote:

edwood said:
That color test is also skewed. It's picking a color that is very close to the magenta range that most monitors have difficulty displaying properly. Now if it were a test with printed samples, that would be another story, but yeah, I see its point.

It's very difficult to see the differences on a crappy TN monitor. But on my calibrated IPS panel, I can see the difference more easily.

-Ed

It randomizes color every time.

I agree on the monitor thing. I didn't realize how terrible the vertical viewing angle of my cheap Samsung was until I tried this test. The squares look like color gradients.

My monitor probably needs to be calibrated better. Differences in red seem easier to detect. I don't mind that the test's validity depends on calibration, though. In audio DBTs, differences will be limited by equipment as well.

Ham Sandwich · May 17, 2010 at 11:48 PM

Quote:

slaughter said:
I was able to get the same colors by refreshing a few times earlier, but a screenshot will work just as well as you suggested. Ham Sandwich, what is the difference between needing to get 95% right and needing a 95% confidence level? If you don't get 95% correct, then you don't have a 95% confidence level. Semantics, my friend...

It's statistics, not semantics. A 95% confidence level does not mean scoring 95 out of 100. Confidence levels are based on the bell curve and probability. It has nothing to do with doing well enough to score an A on your English paper.

A 95% confidence level means that, if you were purely guessing, a score at least that good would turn up no more than 5% of the time. It's bell curve and probability stuff.

Head Injury · May 17, 2010 at 11:50 PM

Quote:

ham sandwich said:
It has nothing to do with doing well enough to score an A on your English paper.

OT:

Hey, that's a good idea. Tests scored based on confidence level that the student was not guessing. Would make multiple choice questions a real bitch.

Ham Sandwich · May 18, 2010 at 1:05 AM

Quote:

head injury said:
I doubt it. I certainly wouldn't trust a 14/20 in a scientific experiment. If it does, I'd be surprised.

It's frustrating that I can't remember how to do my statistics calculations. My books aren't handy at the moment and trying to refresh my knowledge based on internet resources is even more of a frustration.

Anyways, the Wikipedia article on ABX Tests mentions that the results required for a 95% confidence level with 20 trials is only 14 correct. So 14/20 in an ABX test would indeed be enough for 95% confidence. As you increase the sample size the % correct needed goes down (that concept there is going to blow Slaughter's mind).

Results required for a 95% confidence level:

Number of trials	10	11	12	13	14	15	16	17	18	19	20	21	22	23	24	25
Minimum number correct	8	8	9	9	10	11	11	12	12	13	14	14	15	15	16	17

Unfortunately the Wikipedia article doesn't give the formula or any info on the theory and the references don't either.
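For anyone who wants to rebuild a table like this, here is a minimal sketch. It assumes the rule is a one-sided binomial test with p = 0.5 where the chance of guessing the score or better must fall strictly below 5%. The function names are made up for illustration, and the rule is an assumption, not necessarily the one behind the quoted table.

```python
from math import comb

def p_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Chance of k or more correct answers in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def min_correct(n: int, alpha: float = 0.05):
    """Smallest score whose chance of being reached by guessing is below alpha.
    Returns None if even a perfect score is not rare enough (very few trials)."""
    for k in range(n + 1):
        if p_at_least(k, n) < alpha:
            return k
    return None

print("trials  min correct  chance by guessing")
for n in range(10, 26):
    k = min_correct(n)
    print(f"{n:6d}  {k:11d}  {p_at_least(k, n):.4f}")
```

Under that strict rule the thresholds come out one higher than the row quoted above, for example 15 rather than 14 at 20 trials, because 14 or more out of 20 happens by guessing about 5.8% of the time. The quoted row matches a slightly looser reading of "95%".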

QRomo · May 18, 2010 at 2:09 AM

Ham Sandwich, check out the binomial distribution. It gives the probability of an event with probability p happening k times in n trials. In the case of an ABX test, p = 0.5.
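Written out, the binomial formula looks like this. The symbols follow the post above (n trials, k correct, per-trial probability p), and X is just a label for the number of correct answers. The second line is the p = 0.5 special case, and the third is the "k or more correct" tail that a pass/fail threshold is normally based on.

```latex
P(X = k) = \binom{n}{k}\, p^{k} (1 - p)^{n - k}

P(X = k) = \binom{n}{k} \left(\tfrac{1}{2}\right)^{n} \quad \text{when } p = 0.5

P(X \ge k) = \sum_{i = k}^{n} \binom{n}{i}\, p^{i} (1 - p)^{n - i}
```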

Slaughter · May 18, 2010 at 2:50 AM

Here is an ABX manual. Page 10 somewhat explains probability versus right and wrong, but the funny thing is....they just made up the 95% rule for audio.

http://www.qscaudio.com/support/library/manuals/abxman.pdf

I feel slightly better about DBTs, but still not for audio, due to short term memory, sensory memory, subconscious, and the complexity in a music sample.

xnor · May 18, 2010 at 2:53 AM

Quote:

slaughter said:
xnor, you don't get it.... [...] As I understand it, [...]

I don't want to argue with you. What you just wrote in the posts above shows that you do not understand what a properly implemented ABX test looks like, how it works and what the results mean.
Read up on the topics, try it out yourself and try to answer some of my questions.

Then we can get into discussion.

Slaughter · May 18, 2010 at 2:55 AM

I'm good now. Bring it on...

I am a logical person who deals in facts.

If you use two images or sounds and there is a known visual or audible difference between them, and you fail an ABX, how can the test be accurate?

The only logical answers that I can think of are that the test cannot be accurate or that our brains are really bad at remembering things. And if we just have bad memory, then the test never tells us if there is a difference, only if we can remember the difference.

Make sense?

Ham Sandwich · May 18, 2010 at 3:35 AM

Quote:

slaughter said:
Here is an ABX manual. Page 10 somewhat explains probability versus right and wrong, but the funny thing is....they just made up the 95% rule for audio.

http://www.qscaudio.com/support/library/manuals/abxman.pdf

I feel slightly better about DBTs, but still not for audio, due to short term memory, sensory memory, subconscious, and the complexity in a music sample.

The 95% confidence rule is just made up. It's kind of a common rule of thumb as a reasonable confidence level to use in these sorts of things. There are no proofs or theory saying it is the best number to use. You can use a higher or lower confidence level if you want.

Did the explanation in that PDF article make sense to you? I read it and know what they're trying to say and why they're trying to say it. But at the same time I got the feeling that it glossed over things in a way that people who already know the subject would know what is going on but those who don't already know the subject could be confused. I think their explanation could have done with another page worth of explanation along with some drawings and graphs.

To get an idea of how ABX-type testing can be done, try the Foobar2000 ABX tester. Try it with two easy-to-distinguish samples. Don't worry about results. Just play with it and see how it works and what it can do. It does things slightly differently by having choices for A, B, X, and Y rather than just A, B, and X. Having A, B, X, and Y available makes comparisons easier since all of the samples are there to compare against.

JamesL · May 18, 2010 at 4:06 AM

Quote:

slaughter said:
...
The only logical answers that I can think of are that the test cannot be accurate or that our brains are really bad at remembering things. And if we just have bad memory, then the test never tells us if there is a difference, only if we can remember the difference.

Or it could be that there is a difference, but our senses can't perceive the difference.
Or... it could be that our equipment makes it difficult to discern the difference (uncalibrated TN panel vs professional IPS panel).

From what I understand, the purpose of the ABX test isn't to show that there are no differences. It is to show that there is no difference that we can perceive/experience directly. There are other types of scientific tests that present evidence for or against the former.

Trysaeder · May 18, 2010 at 4:38 AM

Got 11/20 and 10/20. Felt quite bad, then I read this thread and decided to act on someone's recommendation to printscreen.
20/20 three times.
So I failed the AB test 2/2 times, passed the ABX test 3/3 times.
Monitors are a non-issue, crappy TN monitor (asus g2410) doesn't matter at all. I could try it on my good TN monitor but there's no room for improvement.

So what's wrong with an ABX test?

You are given 2 samples that you can tell apart, then one of A or B.
-You are allowed to compare the sample with the original one. There should be no reason to get one wrong. If you do, that's you being dumb OR the difference is too small for you to tell.
-You are not 'allowed' (as in the test I just took). Depending on how big the difference is, you may fail. In that case, it was one shade of one color missing.

You are given 2 samples that you cannot tell apart.
-Well what's the point? There may be a difference, but as far as the human is concerned, there is no difference.

Add 1% more potassium to a banana and give it to an ape. There is a difference. Ask the ape if there is a difference. Does he care if the mass spectrometer says there is a difference? Will he believe it?

eucariote · May 18, 2010 at 7:13 AM

Quote:

slaughter said:
When you scored 9/10, ABX says that you cannot tell the difference between the two images, at least to a degree that has any scientific significance. How does that make you feel about ABX?

Quite good actually. Here is how hypothesis testing is done:

H0: the difference between a and b is not detectable
H1: the difference between a and b is detectable

The probability of getting exactly k successes in n trials with p chance of success and q chance of failure is:

>> factorial(10)/(factorial(9)*factorial(10-9))*(.5^9)*(.5^(10-9))

ans =
0.0098

0.0098 < 0.05, therefore reject H0 in favor of H1

And a brief explanation of the test done.
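The same arithmetic can be sketched in Python, assuming p = 0.5 for a guesser. One caveat: the calculation above gives the chance of exactly 9 out of 10, while a significance test is usually based on the chance of a result at least that good (9 or 10 out of 10). Both figures are computed below, and the conclusion is the same either way.

```python
from math import comb

n, k = 10, 9

# Chance of exactly 9 correct out of 10 when guessing (p = 0.5).
exact = comb(n, k) * 0.5**n

# Chance of 9 or more correct, i.e. 9/10 or 10/10.
at_least = sum(comb(n, i) for i in range(k, n + 1)) * 0.5**n

print(f"exactly {k}/{n}:  {exact:.4f}")     # 0.0098
print(f"at least {k}/{n}: {at_least:.4f}")  # 0.0107
```

Both values are well under 0.05, so 9/10 counts as a detectable difference at the usual 95% level.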

eucariote · May 18, 2010 at 8:38 AM

Quote:

slaughter said:
If you use two images or sounds and there is a known visual or audible difference between them, and you fail an ABX, how can the test be accurate?

The only logical answers that I can think of are that the test cannot be accurate or that our brains are really bad at remembering things. And if we just have bad memory, then the test never tells us if there is a difference, only if we can remember the difference.

Or our brains cannot perceptually detect them. Remember, the structure and capabilities of our sense organs and brains are only good enough to have kept us alive long enough to produce progeny. Anything beyond that is a waste of metabolic and behavioral resources, which is happily killed off by evolution. Subjective experience is famously limited, full of blind spots and prone to illusions. As I noted before, there is a whole scientific discipline that has delineated those limits.
