I don't like the Burden of Proof Argument.
Jan 2, 2016 at 2:04 PM Post #136 of 151
 
 
I see "nothing" often being used like this, but it isn't true - we listen normally all the time


99% of the time humans get it right?  Where is the percentage coming from?   

In my experience of my day to day encounters with the world - I honestly can't remember the last time I heard something in the real world that I then discovered I heard wrong.

But if the contention is that everybody hears differently, then I could be mistaken in extrapolating & generalising my experience of the real world? Although I doubt it.


Then why treat personal opinions as anecdotal if you can't remember being wrong with them and give them a 99% confidence rate?
And about "I honestly can't remember the last time I heard something in the real world that I then discovered I heard wrong": well, isn't it obvious? You trust nothing else, so it's never wrong... a convenient twist in the "draconian" need for controls. Where did you get the idea that real-world experience is accurate, or more accurate than trying a blind test? You have zero evidence of that outside of your own confirmation bias. You start from something not demonstrated and judge everything else from there. Isn't that a fallacy?
 
I have plenty of examples of gear I thought sounded different when used casually, but most of those oh-so-clear differences were lost just by using a switch, and sometimes all of them went away after I tried a blind test. You have simply decided, for no rational reason, that the blind test must be the wrong result of the two, and you look for ways to justify that opinion. I decided that the test with fewer variables was probably the more accurate one, because that's how it goes when you reduce the number of biases, and decided that I shouldn't be so ready to trust my gut in a sighted evaluation.
Same experience, two different conclusions. Allegory of the cave, IMO.
 
And for those who think sighted, uncontrolled evaluation is OK, I have plenty of arguments like the McGurk effect, and anybody well versed in the human senses will tell you that vision has priority over hearing. So any time they get conflicting cues, there is a possibility that the brain will trust the eyes. If you're OK with that, great, but I wouldn't call any sighted evaluation an audio test.
But there really is no need to go that far. Loudness alone is enough to make a total joke of an uncontrolled evaluation, and if you don't think it has an impact, go ask radio stations, the CD industry and all the ad companies why they put so much effort, for so many years, into getting their stuff slightly louder than the competition.

But what do they know? They only make a living by guessing what will make people pay, and they all decided that going louder was one way to do it.
Oh, and of course many sound differences went away for me even in a sighted evaluation once I had figured out a way to measure the output and match the loudness of two devices - which is nothing more than removing one bias!
 
 
You're awfully strict about blind tests and awfully lenient about casual listening. Could that be a bias?
 
Jan 2, 2016 at 2:08 PM Post #137 of 151
But I was asked "Maybe if you had an example of this occurring here" - if you are saying that this doesn't happen here then I guess it's a moot point & you are saying exactly the same as I am - the self-administered ABX test results are not legit.

But I don't see them being treated as not legit by many here or on other forums - I see them being treated as evidence & "burden of proof", etc. Did ultmusicsnob's ABX results get this treatment - did anybody raise it to a more controlled environment or even suggest this?

Do you have examples of it going to the next, more controlled level of testing with the null ABX results posted on this forum?

 
On this forum?  Why is posting on this forum the benchmark of whether the method is valid?  And I have no idea about whatever backstory or drama you're talking about with "ultmusicsnob".  Is the whole point of this thread a grudge rant against some dude?
 
Also, I'm definitely not saying self-administered tests are always illegitimate.
 
I can absolutely self-administer an ABX test that I can't cheat on using ABX Tester software if the tracks are level-matched.
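For anyone wondering what the level-matching step looks like in practice, here's a minimal sketch of matching one track's RMS level to the other's before loading both into an ABX tool. It assumes Python with numpy and soundfile installed, and the filenames are just placeholders:

```python
# Minimal sketch: match the average (RMS) level of track B to track A before an ABX session.
# Assumes numpy and soundfile are installed; the filenames are placeholders.
import numpy as np
import soundfile as sf

def rms(x):
    """Root-mean-square level of a signal array."""
    return np.sqrt(np.mean(np.square(x)))

a, sr_a = sf.read("track_a.wav")
b, sr_b = sf.read("track_b.wav")

gain = rms(a) / rms(b)   # linear gain that equalises the two average levels
matched = b * gain

# Worth checking the gained copy doesn't clip; if it does, attenuate both tracks instead.
if np.max(np.abs(matched)) > 1.0:
    print("warning: matched copy clips - attenuate both tracks and re-run")

sf.write("track_b_matched.wav", matched, sr_b)
```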
 
Jan 2, 2016 at 2:33 PM Post #138 of 151
 
 
I see "nothing" often being used like this, but it isn't true - we listen normally all the time



99% of the time humans get it right?  Where is the percentage coming from?   

In my experience of my day to day encounters with the world - I honestly can't remember the last time I heard something in the real world that I then discovered I heard wrong.


But if the contention is that everybody hears differently then I could be mistaken in extrapolating



Then why treat personal opinions as anecdotal if you can't remember being wrong with them and give them a 99% confidence rate?
And about "I honestly can't remember the last time I heard something in the real world that I then discovered I heard wrong": well, isn't it obvious? You trust nothing else, so it's never wrong... a convenient twist in the "draconian" need for controls. Where did you get the idea that real-world experience is accurate, or more accurate than trying a blind test? You have zero evidence of that outside of your own confirmation bias. You start from something not demonstrated and judge everything else from there. Isn't that a fallacy?
I think we are talking about different things - we develop our internal auditory model of the world over our time in it, usually experiencing it sighted, not blind. This auditory model of the world is pretty much the same in everybody & thus we all pretty much hear the same things in the same way. How does one explain this? How do we all develop this same internal model when we do it sighted? Do people blind from birth have a different internal auditory model of the world?

I have plenty of examples of gear I thought sounded different when used casually, but most of those oh-so-clear differences were lost just by using a switch, and sometimes all of them went away after I tried a blind test.
OK, so a blind test rid you of the belief that you could hear something different - you could now look at your shiny device & not perceive the sound as different? What changed? Your belief? What if you never had this belief in the first place - do you think you were being influenced by the looks of the device anyway? Now that you have overcome this influence, could you not achieve this without a blind test, or is there something magical about doing a blind test?
You have simply decided, for no rational reason, that the blind test must be the wrong result of the two, and you look for ways to justify that opinion.
Nope, I'm questioning it logically & so far, no evidence has been presented that home-run blind testing has any legit value
I decided that the test with fewer variables was probably the more accurate one, because that's how it goes when you reduce the number of biases, and decided that I shouldn't be so ready to trust my gut in a sighted evaluation.
Same experience, two different conclusions. Allegory of the cave, IMO.
And I'm questioning your contention that blind testing has fewer variables. You haven't given me any evaluation of what level of false negatives is happening in any blind test - as far as anybody knows, without this control the test could be returning 100% false negatives (real, actual differences would need to be at a level far higher than what is being tested for in order to be differentiated). You don't know. You are working from the simplistic statement that removing a bias is always good, but you fail to recognise that by the very process of removing this bias you may well be introducing far more biases in the opposite direction - you just don't know without some controls.
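To put a rough number on that worry (purely illustrative figures, not a formal power analysis): suppose a listener genuinely hears a small difference and picks the right answer 60% of the time. With a typical 16-trial ABX and a pass mark of 12 correct (roughly the p < 0.05 criterion against guessing), they still return a null result most of the time:

```python
from math import comb

def p_at_least(k, n, p):
    """Probability of k or more correct answers out of n trials with per-trial hit rate p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Pass mark of 12/16 is roughly the usual p < 0.05 criterion against pure guessing.
print(f"chance a guesser scores 12+/16:         {p_at_least(12, 16, 0.5):.3f}")   # ~0.038
# A hypothetical listener with a genuine but small edge: 60% correct per trial.
print(f"false-negative rate for a 60% listener: {1 - p_at_least(12, 16, 0.6):.2f}")  # ~0.83
```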

And for those who think sighted, uncontrolled evaluation is OK, I have plenty of arguments like the McGurk effect, and anybody well versed in the human senses will tell you that vision has priority over hearing. So any time they get conflicting cues, there is a possibility that the brain will trust the eyes. If you're OK with that, great, but I wouldn't call any sighted evaluation an audio test.
But there really is no need to go that far. Loudness alone is enough to make a total joke of an uncontrolled evaluation, and if you don't think it has an impact, go ask radio stations, the CD industry and all the ad companies why they put so much effort, for so many years, into getting their stuff slightly louder than the competition. But what do they know? They only make a living by guessing what will make people pay, and they all decided that going louder was one way to do it.
Oh, and of course many sound differences went away for me even in a sighted evaluation once I had figured out a way to measure the output and match the loudness of two devices - which is nothing more than removing one bias!
But since when has sighted listening & comparison required that volume isn't matched?
You're awfully strict about blind tests and awfully lenient about casual listening. Could that be a bias?

It's always interesting how, when the potential issues with home-run blind testing are pointed out, some people seem to refuse to look at those issues & instead focus on sighted listening & its potential flaws. I have said already that both are simply anecdotal impressions, & some people agree that if you want to raise such self-administered "tests" above the level of parlour games & cargo-cult science then you need to be much more rigorous to qualify them as legit tests.
 
Jan 2, 2016 at 2:40 PM Post #139 of 151
   
But there really is no need to go that far. Loudness alone is enough to make a total joke of an uncontrolled evaluation, and if you don't think it has an impact, go ask radio stations, the CD industry and all the ad companies why they put so much effort, for so many years, into getting their stuff slightly louder than the competition.

But what do they know? They only make a living by guessing what will make people pay, and they all decided that going louder was one way to do it.

 
This.  +1
 
Jan 2, 2016 at 8:56 PM Post #141 of 151
But I was asked "Maybe if you had an example of this occurring here" - if you are saying that this doesn't happen here then I guess it's a moot point


On this forum?  Why is posting on this forum the benchmark of whether the method is valid?  And I have no idea about whatever backstory or drama you're talking about with "ultmusicsnob".  Is the whole point of this thread a grudge rant against some dude?

Also, I'm definitely not saying self-administered tests are always illegitimate.
Do I have to repeat your line of argument? It went like this:
First you posted - "Everyone knows that home ABX testing is for hobbyist and educational reasons, not for submission to a peer-reviewed paper."

Then you posted - "But anyone who doesn't have an axe to grind and/or remembers their science education knows the proper response is, 'Okay, you might have found something interesting. But to know for sure we're going to have to repeat that test a lot more times and more controlled circumstances.'"

So I asked you to give examples of such a thing ever happening on this forum - an elevation of a home-run ABX test to a more controlled test repeated many times?

You then jump to quoting Harman & peer-reviewed AES papers - exactly the opposite of what you said in your first point: "Everyone knows that home ABX testing is for hobbyist and educational reasons, not for submission to a peer-reviewed paper."

No problem with ultmusicsnob's ABX results - his threads are just a good example of positive ABX results showing his differentiation of Rb vs high-res, & another thread of positive ABX results differentiating low levels of jitter. In neither of these threads do I see the "proper response" as you stated.

I can absolutely self-administer an ABX test that I can't cheat on using ABX Tester software if the tracks are level-matched.
Of course you can cheat on an ABX test - just hit keys randomly without listening!
 
Jan 2, 2016 at 9:15 PM Post #142 of 151
   
But you can post misrepresented results.

 
Yes, human beings can lie. 
 
This will remain an unsolved problem for eternity.
 
Jan 2, 2016 at 9:27 PM Post #143 of 151
Of course you can cheat on an ABX test - just hit keys randomly without listening!

 
Then I'll just get random results that, with a large enough sample size, approximate the normal distribution and give a non-discriminatory result.
 
In other words, if the test is one where most subjects can't distinguish A from B better than chance (i.e. they fail to discriminate) then 'cheating' is going to give the same random result.  It doesn't move the needle, statistically.  It will look average.
 
If the test is one where most subjects can distinguish A from B (e.g. 56K MP3 vs lossless), then the results will be worse than average.  But they will fall outside of the fat part of the bell curve and look indistinguishable from those who are unlucky, have bad ears, or otherwise did poorer than most.
 
Aberrant results (cheating or not) are why we have large sample sizes.
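A quick throwaway simulation (Python, with purely illustrative trial counts) of what random key-mashing actually produces:

```python
import random

TRIALS, SESSIONS, PASS_MARK = 16, 100_000, 14

# Simulate many 16-trial ABX sessions where the "listener" just guesses on every trial.
scores = [sum(random.random() < 0.5 for _ in range(TRIALS)) for _ in range(SESSIONS)]

mean_score = sum(scores) / SESSIONS
pass_rate = sum(s >= PASS_MARK for s in scores) / SESSIONS

print(f"average score: {mean_score:.1f} / {TRIALS}")       # ~8.0 - it just looks like chance
print(f"sessions scoring {PASS_MARK}+: {pass_rate:.2%}")   # ~0.21% - guessing almost never fakes a pass
```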
 
 
No problem with ultmusicsnob's ABX results - his threads are just a good example of positive ABX results showing his differentiation of Rb vs high-res, & another thread of positive ABX results differentiating low levels of jitter. In neither of these threads do I see the "proper response" as you stated.
 

 
I have no idea who or what post you're referring to.  I can't comment on something that is unknown to me.
 
Jan 2, 2016 at 9:50 PM Post #144 of 151
Of course you can cheat on an ABX test - just hit keys randomly without listening!


Then I'll just get random results that, with a large enough sample size, approximate the normal distribution and give a non-discriminatory result.

In other words, if the test is one where most subjects can't distinguish A from B better than chance (i.e. they fail to discriminate) then 'cheating' is going to give the same random result.  It doesn't move the needle, statistically.  It will look average.
This is the "lie" at the heart of such BX "tests"- we all know that the greater the number of null tests, the more people are persuaded that there is no difference to be heard.

If you are quite happy to submit a result for a "test" which you basically haven't done - you may as well have just posted a random set of A, B selections without doing any test at all.

This is the same disingenuousness that the Hydrogenaudio crew try to pass off as a legit test, too!
 
Jan 2, 2016 at 9:58 PM Post #145 of 151
This is the "lie" at the heart of such BX "tests"- we all know that the greater the number of null tests, the more people are persuaded that there is no difference to be heard.

If you are quite happy to submit a result for a "test" which you basically haven't done - you may as well have just posted a random set of A, B selections without doing any test at all.

This is the same disingenuousness that the Hydrogenaudio crew try to pass off as a legit test, too!

 
But this is easy to solve by:
 
1. If one guy can pass the test consistently and repeatedly, it carries huge statistical significance.  This reduces the significance of the null result.
 
2. In a large format study, you can offer a prize for anyone who can consistently pass the test to incentivize effort. 
 
Again, these aren't flaws in the theory.  These are just holes in the execution.
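To illustrate point 1 with rough numbers (illustrative figures only, not drawn from any particular test): a guesser passes a 16-trial ABX at 12+ correct roughly 3.8% of the time, so one listener passing run after run very quickly stops looking like luck:

```python
# Rough illustration: compounding probability of repeated passes by pure chance.
p_single = 0.038   # approx. chance a guesser scores 12+/16 in one run
for runs in (1, 2, 3):
    print(f"{runs} consecutive pass(es) by chance: {p_single ** runs:.6f}")
# 1 run ~0.038, 2 runs ~0.0014, 3 runs ~0.000055
```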
 
Jan 2, 2016 at 10:19 PM Post #146 of 151
This is the "lie" at the heart of such BX "tests"- we all know that the greater the number of null tests, the more people are persuaded that there is no difference to be heard.


If you are quite happy to submit a result for a "test" which you basically haven't done - you may as well have just posted a random set of A, B selections without doing any test at all.


This is the same disingenuousness that the Hydrogenaudio crew try to pass off as a legit test, too!


But this is easy to solve by:

1. If one guy can pass the test consistently and repeatedly, it carries huge statistical significance.  This reduces the significance of the null result.
As you said yourself - it doesn't carry huge statistical significance: "But anyone who doesn't have an axe to grind and/or remembers their science education knows the proper response is, 'Okay, you might have found something interesting. But to know for sure we're going to have to repeat that test a lot more times and more controlled circumstances.'"

As I said already, I've never seen this happen in forum posts of positive ABX results - usually there is a forensic examination & analysis of the posted results &, if nothing is found, more often than not it is suggested that the testee is dishonest.

If you want to evaluate your contention of the "huge statistical significance" of such reported positive ABX results on forums - you should search for the threads of ultmusicsnob on this & other forums. It puts your contentions about this into perspective.


2. In a large format study, you can offer a prize for anyone who can consistently pass the test to incentivize effort. 

Again, these aren't flaws in the theory.  These are just holes in the execution.
Yes, the home-run execution of ABX blind testing is the issue - nobody is saying otherwise, although you keep trying, in your replies, to defend it in some way by one minute referring to self-administered ABX tests (for which a large-format study does not apply) & the next flitting to AES peer-reviewed papers.
 
Jan 2, 2016 at 10:35 PM Post #147 of 151
If you want to evaluate the "huge statistical significance" of such reported positive ABX results - you should search for the threads of ultmusicsnob on this & other forums

 
But why bother?
 
He's not publishing a paper, is he?
 
We waste enough time here going in rhetorical circles.  Who has time to go look for even more arguments that don't matter?
 
Jan 2, 2016 at 10:38 PM Post #148 of 151
If you want to evaluate the "huge statistical significance" of such reported positive ABX results - you should search for the threads of ultmusicsnob on this


But why bother?

He's not publishing a paper, is he?

We waste enough time here going in rhetorical circles.  Who has time to go look for even more arguments that don't matter?

So your contention of "huge statistical significance" disappears when you are faced with evidence that you refuse to look at.
Yes, you are going around in circles - all of your own making!
 
Jan 2, 2016 at 10:54 PM Post #149 of 151
So your contention of "huge statistical significance" disappears when you are faced with evidence that you refuse to look at.
Yes, you are going around in circles - all of your own making!

 
No, it doesn't disappear.
 
It doesn't really matter what the data set is.  One guy who can consistently and repeatedly defy the results has huge statistical significance, potentially enough to undo a theory.
 
Extreme example:
 
The general scientific consensus is that telekinesis doesn't exist.  But if one guy could consistently pick up beer cans and levitate them into the air using only the power of his mind, theories would have to be revised.
 
Anyway, I get the sense that we're operating with massively different understandings and backgrounds with regard to education in statistics and engineering.  Or at least very different motivations.
 
If you genuinely don't understand this stuff, I think it's beyond the scope of this thread to explain basic stats and empiricism.  There are whole books and college courses devoted to that.
 
If you genuinely want to learn the state of the art with regard to audio testing from people who do this for a living, I'd recommend joining the AES.  Associate memberships are cheap and you get access to the full library.
 
If you just want to pick a fight with certain people who posted other things on other fora, I have no dog in that fight.
 
Jan 2, 2016 at 11:01 PM Post #150 of 151
So your contention of "huge statistical significance" disappears when you are faced with evidence that you refuse to look at.

Yes, you are going around in circles - all of your own making!


No, it doesn't disappear.

It doesn't really matter what the data set is.  One guy who can consistently and repeatedly defy the results has huge statistical significance, potentially enough to undo a theory.

Extreme example:

The general scientific consensus is that telekinesis doesn't exist.  But if one guy could consistently pick up beer cans and levitate them into the air using only the power of his mind, theories would have to be revised.

Anyway, I get the sense that we're operating with massively different understandings and backgrounds with regard to education in statistics and engineering.  Or at least very different motivations.

If you genuinely don't understand this stuff, I think it's beyond the scope of this thread to explain basic stats and empiricism.  There are whole books and college courses devoted to that.

If you genuinely want to learn the state of the art with regard to audio testing from people who do this for a living, I'd recommend joining the AES.  Associate memberships are cheap and you get access to the full library.

If you just want to pick a fight with certain people who posted other things on other fora, I have no dog in that fight.

Ah, I see you want to try to play the superiority card now - another circle of rhetoric - enough now!
 
