The Dishonesty of Sighted Listening Tests
Dec 1, 2016 at 7:09 AM Post #16 of 94
Originally Posted by watchnerd
 
Quote:
  What about "The Dishonesty of Blind Listening Tests"?
 
Catchy title.
Where's your data?

=D
 
Well, even with our eyes closed we aren't measurement instruments. Far from it, in fact.
We have relatively capable sensors, but terrible memory.
And besides our eyes, we have a very complex (always active) brain playing all sorts of tricks.
 
So you could be blind, and say you like A more than B, but it's possible that you were simply bored the second time you played the same song (on system B), or that you biased yourself thinking A should be the pricier one, the bassier one, or whatever. It could be that you remembered something and got partially distracted while listening to system B. Or it could be that you focused on vocals on system A and then on the bass on system B, ended up deciding vocals sound better on system A, and since you like vocals you prefer system A.
 
Even when blind, our preferences change and can change drastically depending on mood, listening level and so on...
On top of that recordings are all over the place so what's better for recording A might not be better for recording B.
 
The problem of what's best is very complex and, most of the time, cannot be solved in a general way.
 
So yeah... sighted testing can lead to the wrong conclusions, but blind testing too.
 
At the end of the day, you can measure things with real instruments instead, but even then other problems arise, such as:
What should be measured? How?
What's the target? Why? And who agrees?
 
Dec 1, 2016 at 9:38 AM Post #17 of 94
   
So yeah... sighted testing can lead to the wrong conclusions, but blind testing too.
 

 
Please explain this.
 
If I put a cheap wine and expensive wine in paper bags so that people taste them without seeing, there is no "wrong" conclusion if people prefer either the expensive one, the cheap one, or like both equally.  
 
It's just data.
 
Dec 1, 2016 at 1:17 PM Post #18 of 94
Also, we should be more picky about what we label as being "dishonest". Dishonesty suggests a conscious and deliberate attempt to misinform or mislead. 

However, what we're talking about with sighted tests is the immense power of suggestion. If someone feels that A sounds better than B because A looks more expensive or whatever, the fact is that to them it really does sound better. Of course, it doesn't mean that they're right, but it also doesn't mean that they're intentionally trying to skew the results. 
 
Dec 1, 2016 at 1:37 PM Post #19 of 94
  Also, we should be more picky about what we label as being "dishonest". Dishonesty suggests a conscious and deliberate attempt to misinform or mislead. 

However, what we're talking about with sighted tests is the immense power of suggestion. If someone feels that A sounds better than B because A looks more expensive or whatever, the fact is that to them it really does sound better. Of course, it doesn't mean that they're right, but it also doesn't mean that they're intentionally trying to skew the results. 

 
I think the "dishonest" label is not aimed at end-users / consumers, but at publications like Stereophile which have, on more than one occasion, had a product that was given a raving subjective review, but then revealed poor / incompetent design during the measurements portion, with little explanation or caveats.
 
Dec 1, 2016 at 1:59 PM Post #20 of 94
  Also, we should be more picky about what we label as being "dishonest". Dishonesty suggests a conscious and deliberate attempt to misinform or mislead. 

However, what we're talking about with sighted tests is the immense power of suggestion. If someone feels that A sounds better than B because A looks more expensive or whatever, the fact is that to them it really does sound better. Of course, it doesn't mean that they're right, but it also doesn't mean that they're intentionally trying to skew the results. 


Nobody who has spent a few years in the audio hobby can claim to be ignorant about suggestion and biases, even less so manufacturers and reviewers. The matter would have been brought to them at least a few times a year, if not far more often. Those who deliberately decide to remain ignorant of it wouldn't be at fault if they didn't try so hard to drag everybody else down with them. Nobody cares if I don't learn quantum mechanics or how to properly iron a shirt. How the human brain and senses work is the same: society doesn't require me to make the effort to learn about it. So I do or I don't; that's my own choice in life. But if I don't, I shouldn't argue about those matters pretending that I know all about them. That is dishonesty, or at the very least obvious incompetence.
 
Dec 1, 2016 at 2:46 PM Post #21 of 94
   
I think the "dishonest" label is not aimed at end-users / consumers, but at publications like Stereophile which have, on more than one occasion, had a product that was given a raving subjective review, but then revealed poor / incompetent design during the measurements portion, with little explanation or caveats.

I think Sean Olive himself explains somewhere in the comments that the "dishonest" term is aimed at the methodology (sighted testing), not at consumers or reviewers.
 
Quote:
   
So yeah... sighted testing can lead to the wrong conclusions, but blind testing too.
 
 
Please explain this.
 
If I put a cheap wine and expensive wine in paper bags so that people taste them without seeing, there is no "wrong" conclusion if people prefer either the expensive one, the cheap one, or like both equally.  
 
It's just data.

 
You can produce data with sighted tests as well; the question here is whether the resulting data is 'honest' or 'dishonest'. For the reasons explained in the post you've quoted, we are not good at blind testing either: our memory is very limited and our brain tricks us very easily whether our eyes are closed or not. In both cases the results can be considered dishonest. It's the same as doing measurements with an audio analyzer whose results change with temperature, pressure, light and time of day. You could say, well, let's place the analyzer somewhere with controlled temperature, but the measurements would still be dishonest.
 
Similar reasoning can be applied to your wine comparison. The first one you try might change how you like the second one. You can trick yourself into thinking B tastes better than A, especially if they are very close in flavour. You can focus on different things during the test, say flavour on A and consistency or flavour persistence on B. Testing flavour is probably just a tad easier than testing sound because it's a simpler matter. Nonetheless, the result of blind testing can be very dishonest as well, because we aren't measurement instruments. Even in the wine comparison you have to store in your memory how wine A tasted, and we don't have built-in MicroSD cards, so we can only compare B with our memory of A, which might differ considerably from A itself.
 
Dec 1, 2016 at 9:33 PM Post #22 of 94
The problem with ABX and similar testing is one of familiarity. It's just like if you've ever fixed a hole in the ceiling: you have to try to match the repair with the rest of the ceiling. For days afterwards, when you walk into that room your eyes go right to the repair, because it doesn't quite match. But if you gave someone five or ten minutes to find the "different" spot, they wouldn't be able to. That's because they're not familiar enough with your ceiling to see the difference.
 
In the same way, obsessed audiophiles are very familiar with their setups. They can hear subtle differences. Controlled ABX testing doesn't account for this in-depth experience and familiarity. Add in the pressure and fatigue of having to repeat the test over and over, and of course it won't show a difference. I don't think I could tell the difference between a Whopper and a Big Mac if I had to repeat the test 5 times, even with the special sauce. After a while it all tastes the same, even though they don't exactly.
 
Dec 1, 2016 at 10:50 PM Post #23 of 94
The problem with ABX and similar testing is one of familiarity. It's just like if you've ever fixed a hole in the ceiling: you have to try to match the repair with the rest of the ceiling. For days afterwards, when you walk into that room your eyes go right to the repair, because it doesn't quite match. But if you gave someone five or ten minutes to find the "different" spot, they wouldn't be able to. That's because they're not familiar enough with your ceiling to see the difference.
 
In the same way, obsessed audiophiles are very familiar with their setups. They can hear subtle differences. Controlled ABX testing doesn't account for this in-depth experience and familiarity. Add in the pressure and fatigue of having to repeat the test over and over, and of course it won't show a difference. I don't think I could tell the difference between a Whopper and a Big Mac if I had to repeat the test 5 times, even with the special sauce. After a while it all tastes the same, even though they don't exactly.


A couple of funny things. ABX is like anything you aren't comfortable or familiar with: if you do a few of them, it gets to be no big deal. It is tedious, and hardly fun, but it's not anxiety-inducing once you have a little familiarity. Plus you can do them on your own familiar system; that helps too. It really helps if you learn to do it comfortably as a useful procedure for gathering information instead of as a test of your audiophile manhood.
 
Those Whopper and Big Mac flavors: man, if you only knew the tests (all double-blind) behind nailing down just the taste and smell and mouthfeel that work. It would make your mouth water, I tell you. The food industry does these things in legions. They work, too.
 
Dec 1, 2016 at 11:39 PM Post #24 of 94
 
A couple of funny things.  ABX is like anything you aren't comfortable or familiar with.  If you do a few of them, it gets to be no big deal.  It is tedious, and hardly fun.  Anxiety inducing it is not once you have a little familiarity.  Plus you can do them with your own familiar system.  That helps too.  Really helps if you learn to comfortably do it as a useful procedure to gather information instead of a test of your audiophile manhood. 
 
Those whopper and big mac flavors,  man if you only knew the tests (all double blind) behind nailing down just the taste and smell and mouth feel that works.  It would make your mouth water I tell you.  The food industry does these things in legions.  They work too. 

 
Well, it does depend a lot on how the test is done. A lot of us test ourselves with our own stuff. If you can use something like a "black box" that doesn't let you know which device is playing, then it's a good test. Personally, I've been able to tell differences with cheaper amps.
 
Testing someone on their own "familiar" setup without a time limit is what I would consider a well-designed ABX test. However, people always say that unless it's done under controlled conditions it's not valid, because you're basically relying on honesty. Maybe it could be done with black boxes over the internet, but it really isn't important enough to get funding for something like that.
 
The food and flavor industry, I would imagine, is very careful to avoid fatigue and to use trained testers who are familiar enough with the formulas to tell differences. A lot of ABX audio testing is just meant to make audiophiles look bad.
 
Dec 2, 2016 at 12:13 AM Post #25 of 94
   
 A lot of ABX audio testing is just meant to make audiophiles look bad.

I don't think they're inherently designed to make a specific group of people look bad... that might very well just be a by-product. I mean, the point of an ABX test is to remove as many variables as possible and to test things in an objective manner rather than to introduce any sort of subjective bias (whether intentional or subconscious). Is that not pretty much the definition of scientific testing?
 
Dec 2, 2016 at 12:38 AM Post #26 of 94
I don't think they're inherently designed to make a specific group of people look bad... that might very well just be a by-product. I mean, the point of an ABX test is to remove as many variables as possible and to test things in an objective manner rather than to introduce any sort of subjective bias (whether intentional or subconscious). Is that not pretty much the definition of scientific testing?

That's relatively true, but it's just that. There's still plenty of subjective stuff involved anyway, and you can do nothing about it.
So you end up testing in a very subjective manner regardless.
 
Dec 2, 2016 at 1:28 AM Post #27 of 94
  That's relatively true, but it's just that. There's still plenty of subjective stuff involved anyway. And you can do nothing about it.
So you end up testing in a very subjective manner anyway.


This makes no sense. If you have done it right, you are seeing whether you hear a difference. That is all. You get results indicating that you do hear a difference, or a null result that fails to show one. Depending on what is being tested, there are still variables. But to suggest that because not all variables can be removed in every case, we shouldn't attempt to limit any variables at all, and that this would be just as good or better, does not add up.
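As an aside (my own illustration, not something from the thread): the usual way to decide between "heard a difference" and "a null result" in an ABX run is a one-sided binomial test against pure guessing. A minimal sketch in Python, using a hypothetical 16-trial run:

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """One-sided binomial p-value: the probability of getting at least
    `correct` answers right out of `trials` ABX trials by guessing alone
    (chance of a correct guess per trial is 0.5)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# 13 correct out of 16 is unlikely under guessing, so a
# difference was probably heard.
print(round(abx_p_value(13, 16), 3))  # prints 0.011

# 8 out of 16 is exactly chance level: a null result.
print(round(abx_p_value(8, 16), 3))   # prints 0.598
```

The trial count matters: with only a handful of trials, even a perfect score can fail to reach significance, which is one reason short informal ABX sessions are so inconclusive.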
 
Dec 2, 2016 at 2:05 AM Post #28 of 94
 
This makes no sense. If you have done it right, you are seeing whether you hear a difference. That is all. You get results indicating that you do hear a difference, or a null result that fails to show one. Depending on what is being tested, there are still variables. But to suggest that because not all variables can be removed in every case, we shouldn't attempt to limit any variables at all, and that this would be just as good or better, does not add up.


Of course you'll hear a difference between speaker A and speaker B!!
This thread is not about using ABX to find out if you can tell A and B apart. Read it please.
 
Sean Olive proved there were differences in preference ratings between sighted and blind tests comparing a particular set of speakers.
The conclusion is that sighted tests are dishonest, which I think most of us agree with. What I say is that in blind testing (read: listening to the speakers without knowing which speaker is playing, and/or without knowing their placement, as in Olive's tests) you can build up all sorts of biases and trick yourself with your eyes closed as well. Then you'll rate speakers A, B, C and D based on extremely subjective stuff anyway, because you're not a measurement tool. You might come back two days later and rate the speakers differently, let alone if you use different recordings during the test and/or listen at different levels.
 
You can remove one single variable (by going blind), but the rest is purely subjective anyway. And that's the way it is. Even matching levels is flawed, because you cannot match them at all frequencies.
I've done many blind tests in the past, as well as sighted tests. In my experience, the best way of noticing subtle differences and defining personal preferences is extended exposure to the system. Both sighted and blind tests with A/B switching within short periods of time yield pretty poor results in my experience, since you cannot focus on everything when listening to music, and the whole comparative experience tends to become overwhelming and confusing.
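To illustrate the level-matching point (my own sketch, with made-up sample values, not from the thread): "matching levels" usually means equalizing the overall RMS level of the two devices, and a single broadband gain is exactly the thing that cannot correct frequency-response differences between them.

```python
from math import log10, sqrt

def rms(samples):
    """Root-mean-square level of a block of samples."""
    return sqrt(sum(s * s for s in samples) / len(samples))

def gain_to_match_db(reference, device_under_test):
    """Gain in dB to apply to the device under test so its overall RMS
    level matches the reference. This equalizes broadband level only;
    any per-frequency differences remain after matching."""
    return 20 * log10(rms(reference) / rms(device_under_test))

a = [0.5, -0.5, 0.5, -0.5]      # toy signal as played by system A
b = [0.25, -0.25, 0.25, -0.25]  # same signal, half the amplitude on system B
print(round(gain_to_match_db(a, b), 2))  # prints 6.02 (i.e. ~6 dB boost needed)
```

In practice levels are matched at the listening position with a pink-noise or sine reference, but the limitation is the same: one gain number, many frequencies.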
 
Dec 2, 2016 at 3:56 AM Post #29 of 94
The problem with ABX and similar testing is one of familiarity. It's just like if you've ever fixed a hole in the ceiling: you have to try to match the repair with the rest of the ceiling. For days afterwards, when you walk into that room your eyes go right to the repair, because it doesn't quite match. But if you gave someone five or ten minutes to find the "different" spot, they wouldn't be able to. That's because they're not familiar enough with your ceiling to see the difference.
 
In the same way, obsessed audiophiles are very familiar with their setups. They can hear subtle differences. Controlled ABX testing doesn't account for this in-depth experience and familiarity. Add in the pressure and fatigue of having to repeat the test over and over, and of course it won't show a difference. I don't think I could tell the difference between a Whopper and a Big Mac if I had to repeat the test 5 times, even with the special sauce. After a while it all tastes the same, even though they don't exactly.


I do agree with what you say here, though you should know you could ABX gear that you've owned for a long time. The thing is, most audiophiles do hear great, night-and-day differences between A and B, sighted, even when it's new gear. The black A just sounds much cleaner, with a darker background, than the grey B, which sounds more clinical (or whatever). Blindfold them and the differences are gone; it's as easy as that. Nothing to do with familiarity.
 
Dec 2, 2016 at 5:02 AM Post #30 of 94
Well, testing has absolutely debunked many audiophile claims, and we can all agree that sighted listening tests are nonsense. But just because bias can be documented, it doesn't prove that it's all hype. This is some of the false logic that skeptics use all the time: that they can discredit all of something by pointing to some failures. They sort of set the goalposts to their own benefit.
 
It's interesting that in the '80s Carver did have to modify his amp to win the bet against the magazine reviewers. They eventually knew when it was spot on and didn't bother to do blind tests.
 
