I don't like the Burden of Proof Argument.
Dec 26, 2015 at 12:58 AM Post #31 of 151
 
   
I didn't have a chance to read through all the posts, but to clarify a bit...

The 'issue', as you call it, is exactly what I'm talking about. I'm tired of guys gaming the system with the "can't prove a negative" cop-out.

It has been said that the person making a negative claim cannot logically prove nonexistence. But this is not true in our electronic hobby. Nonexistence can be proven.

When you claim that you can't hear a difference between hi-res and Redbook of the same master, what you're really claiming is that there is no audible difference between the two.

And if you claim that you can hear a difference between hi-res and Redbook of the same master, what you're really claiming is that there is an audible difference between the two.

In either case a claim has been made, with an accompanying burden of proof.

I'll leave the type and validity of the proof to the claim makers.

So, to provide an example: whether you hear a difference between 'your claim here' or not, record the output of each, null it in Audacity or DiffMaker, and post the results.
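For anyone who wants to try the null idea without DiffMaker, here is a rough Python sketch (the file names are placeholders, and it assumes the two captures are already sample-aligned, level-matched, 16-bit WAVs at the same rate - the alignment being the hard part that DiffMaker automates):

```python
# Rough null test: subtract two time-aligned captures, report the residual level.
# Assumes 16-bit PCM WAVs at the same sample rate, already aligned and level-matched.
import numpy as np
from scipy.io import wavfile

rate_a, a = wavfile.read("capture_hires.wav")    # placeholder file names
rate_b, b = wavfile.read("capture_redbook.wav")
assert rate_a == rate_b, "resample one capture first"

a = a.astype(np.float64) / 32768.0   # scale int16 samples to roughly [-1, 1]
b = b.astype(np.float64) / 32768.0
n = min(len(a), len(b))
residual = a[:n] - b[:n]

def rms_db(x):
    # RMS level in dBFS; tiny epsilon avoids log(0) on a perfect null
    return 20 * np.log10(np.sqrt(np.mean(x ** 2)) + 1e-12)

print(f"signal:   {rms_db(a[:n]):6.1f} dBFS")
print(f"residual: {rms_db(residual):6.1f} dBFS")
```

If the residual sits far below audibility relative to the signal, the null supports "no audible difference"; if something substantial remains, there is a real difference to argue about.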
 
 

 
My point I guess was more about the realities of doing statistical tests and how you get meaningful results out of them. In a world where everyone puts his 100% honest-to-gawd effort into any test, then I'd agree with you to an extent. But that's not our world.

 
We can dance around this verbally all night, but if you run samples through DiffMaker, for instance, or null it yourself in Audacity, the results are going to be the same regardless of whether the claim was positive or negative. If it's audible, something will show up.
 
Dec 26, 2015 at 4:47 AM Post #32 of 151
   
My point I guess was more about the realities of doing statistical tests and how you get meaningful results out of them. In a world where everyone puts his 100% honest-to-gawd effort into any test, then I'd agree with you to an extent. But that's not our world.

 
That's why tests where the likelihood of "gaming" exists must contain positive and negative controls.
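To make that concrete, here is a minimal sketch of how hidden controls might be mixed into a run (purely illustrative Python; the trial counts, labels, and cutoffs are arbitrary assumptions, not any standard protocol):

```python
# Illustrative only: mix hidden controls into a listening-test run.
# "positive" controls contain an obvious difference, so missing them suggests
# the subject isn't really trying; "negative" controls present identical audio,
# so reliably "hearing" them differ suggests responses aren't tracking the sound.
import random

def build_run(n_real=16, n_pos=4, n_neg=4):
    trials = ["real"] * n_real + ["positive"] * n_pos + ["negative"] * n_neg
    random.shuffle(trials)
    return trials

def evaluate(trials, correct):
    """correct[i]: True if trial i was answered 'correctly'; for negative
    controls, True means the subject reported hearing a difference."""
    pos = [c for t, c in zip(trials, correct) if t == "positive"]
    neg = [c for t, c in zip(trials, correct) if t == "negative"]
    real = [c for t, c in zip(trials, correct) if t == "real"]
    if sum(pos) < len(pos):            # missed an obvious difference
        return "control failure: inattentive or gaming"
    if sum(neg) > len(neg) // 2:       # "hears" identical samples (arbitrary cutoff)
        return "control failure: responses not driven by the audio"
    return f"controls passed; {sum(real)}/{len(real)} on real trials"
```

The point is simply that the real trials only get scored when the controls show the subject was both attentive and honest.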
 
Dec 26, 2015 at 8:10 AM Post #33 of 151
   
We can dance around this verbally all night, but if you run samples through DiffMaker, for instance, or null it yourself in Audacity, the results are going to be the same regardless of whether the claim was positive or negative. If it's audible, something will show up.

 
I agree on that, of course, but the thing is that if it's inaudible, things can also show up.
   
That's why tests where the likelihood of "gaming" exists must contain positive and negative controls.

 
Right, which would mean having some way of detecting intentional random guessing; that isn't handled by power controls in the simple binomial ABX setup, which expects people to act like dice.
 
Dec 26, 2015 at 11:08 AM Post #34 of 151
I can't hear a difference means exactly that: I can't hear a difference. It may support the notion that there are no audible differences, but it certainly does not indicate that this is an absolute truth. Even 10,000 people hearing no difference cannot trump 1 person demonstrating that they can. Nobody is gaming anything.
 
Dec 26, 2015 at 12:03 PM Post #35 of 151
  Quote: db
  I can't hear a difference between hi-res and Redbook of the same master. How would I prove that to you? A failed ABX? That's basically the issue: I can fail the test if I want. Answering "A" for 10 guesses has an extremely low probability that I'll randomly pass. So where does that leave us? Well, we'd need a different test, but then I can probably fail that if I want.

 
I didn't have a chance to read through all the posts, but to clarify a bit...

The 'issue', as you call it, is exactly what I'm talking about. I'm tired of guys gaming the system with the "can't prove a negative" cop-out.

It has been said that the person making a negative claim cannot logically prove nonexistence. But this is not true in our electronic hobby. Nonexistence can be proven.

When you claim that you can't hear a difference between hi-res and Redbook of the same master, what you're really claiming is that there is no audible difference between the two.

And if you claim that you can hear a difference between hi-res and Redbook of the same master, what you're really claiming is that there is an audible difference between the two.

In either case a claim has been made, with an accompanying burden of proof.

I'll leave the type and validity of the proof to the claim makers.

So, to provide an example: whether you hear a difference between 'your claim here' or not, record the output of each, null it in Audacity or DiffMaker, and post the results.
 
 
 

I understand that the "prove it, you can't so you're wrong" argument might look like easy rhetoric on the web, but that's not really what we say.
when someone can't prove his claim, it doesn't demonstrate that he's wrong; it demonstrates that he made a claim without evidence. so it shows the wrongness of claiming stuff we can't prove, not the wrongness of the point made in the claim. the guy may very well hear a change; failing to prove doesn't disprove. it just gives me a reason not to trust him one way or the other.

and about "I can't hear a difference", I disagree that I'm claiming there is no audible difference. the test could contain 18kHz changes and I can't hear 18kHz, so I hear no difference. but another guy could. to me it's not even a claim; I'm just stating the result of my experience: I couldn't hear a difference. it's limited to me with my gear, and I'm really not trying to impose a consensus.
I understand that people might take my "no difference" confession as a claim that no difference exists, even more so people used to misinterpreting other people's posts and taking anything that disagrees with them as meaning the straight opposite of what they say, just to end up making up fallacies all day long. but then again it's a matter of honesty, just like making claims we aren't really certain of (and that's what making a claim without evidence is).

to me it looks like you see everybody as making claims, and the call for the burden of proof as an excuse to shut them up. empty claims are a shame and shouldn't be happening in the first place. the burden of proof is a weapon against people talking off the top of their heads and making up claims. remove those guys for me, and you will never ever see me bringing up the burden of proof in a topic. it's the defensive consequence of an abuse, not the abuse. if there were no empty claims and no liars, there would be no need for proof. sadly we're in dire need of proof.
 
Dec 27, 2015 at 10:24 AM Post #36 of 151
 
  Quote: db
  I can't hear a difference between hi-res and Redbook of the same master. How would I prove that to you? [...]

  Quote:
  and about "I can't hear a difference", I disagree that I'm claiming there is no audible difference. the test could contain 18kHz changes and I can't hear 18kHz, so I hear no difference. but another guy could. to me it's not even a claim; I'm just stating the result of my experience: I couldn't hear a difference. it's limited to me with my gear, and I'm really not trying to impose a consensus. [...]

 
When you claim that you can't hear a difference, you most certainly are claiming that, for you and as far as you're concerned, there is no audible difference. <---- (I don't mean you personally)

OTOH, if you, or someone else (like the 18kHz person), did hear a difference, you, or they, would be claiming that there was an audible difference.
 
Therefore, the burden of proof falls on both positive and negative claims equally.
 
Dec 27, 2015 at 9:35 PM Post #37 of 151
Right, which would mean having some way of detecting intentional random guessing; that isn't handled by power controls in the simple binomial ABX setup, which expects people to act like dice.

Indeed, I've seen guys on Hydrogen Audio argue against such controls that might uncover issues of intentional random guessing - a number of them even claimed that once they can't identify any difference in the first couple of trials, they don't try for the remainder of the 20 or so trials; they just guess randomly. They saw nothing wrong with this. To me, this brings into sharp focus one of the major issues with non-professionally run blind tests - there's no measure of false negatives or of the statistical power of the test.
 
Dec 28, 2015 at 7:05 AM Post #38 of 151
Indeed, I've seen guys on Hydrogen Audio argue against such controls that might uncover issues of intentional random guessing - a number of them even claimed that once they can't identify any difference in the first couple of trials, they don't try for the remainder of the 20 or so trials; they just guess randomly. They saw nothing wrong with this. To me, this brings into sharp focus one of the major issues with non-professionally run blind tests - there's no measure of false negatives or of the statistical power of the test.


I can't see the problem. If you can distinguish a feature of the recording that sounds different between the two samples then you can attempt to use that feature to discriminate. If you can't isolate such a feature then you're going to be guessing, and whether this is 'intentional' or not is irrelevant. Human decision making is, of course, far from being truly random anyway, and if there were subtle subconscious factors at play they would affect the 'guessing' in a way that would be apparent in a suitably large test.
 
'Gaming' the system would require overt dishonesty - a situation in which the subject can distinguish a discriminatory feature but chooses to disregard it and present choices they realise will appear random. If someone's decided to be dishonest there are a multitude of ways in which they can pervert the test, and controlling for that can only be done in very strict circumstances.
 
If we assume the tests are being conducted honestly, then the only real issues are ones of training and fatigue. In cases where there's a subtle, but objectively verifiable, difference it can take some time to train yourself to hear that difference reliably, and you can often only reproduce that performance for a limited period of time before you tire. Because the ear has a poor memory, you then need to retrain after taking a break if you want to regain the prior performance. But let's face it, we're talking about listening to music as a recreational activity, not training sonar operators, and it would be reasonable to argue that testing conditions should mimic the same sort of recreational situation in which the products are going to be used.
 
Dec 28, 2015 at 8:19 AM Post #39 of 151
When you claim that you can't hear a difference, you most certainly are claiming that, for you and as far as you're concerned, there is no audible difference. <---- (I don't mean you personally)

OTOH, if you, or someone else (like the 18kHz person), did hear a difference, you, or they, would be claiming that there was an audible difference.
 
Therefore, the burden of proof falls on both positive and negative claims equally.

except that I say I failed my ABX on my gear, and not that there is no audible difference between the files. you're making your own deduction, and I'm not at fault for that claim IMO. but anyway, how do I prove that "claim"? by showing you my ABX result? I am still waiting to see people enthusiastically asking for my 52/48% ABX result to be posted. but of course I can do an ABX today for something, and post it. if it could clear things up for someone, I'll do it.
the same way, I don't see legions of people coming at me saying how I should modify my test this way or that way to maybe get more positives, and showing me how they succeeded. constructive skepticism and ideas to improve the tests would be great; we're asking for that all year long. instead the best I get is people saying "ABX is flawed" while they sit on their asses, one finger in the nose, the other hand between their legs, and pretend to know better while offering nothing at all. what I think of those guys, I am not allowed to say on headfi. so I go with scientific methods and ask the guy making a claim to put up or shut up. it's a legitimate demand; I didn't force him to come make empty claims all day long. that was his irresponsible decision. and the burden of proof rapidly shows whether the guy was just honestly forgetful or ignorant, as we always suggest methods to test for his claim when there are some (we're not the ones closing the door, and he can go and try), or whether he's here to argue in the sound science subforum that methods are inferior to him sitting in a chair and making stuff up.
 
 
 
 
Right, which would mean having some way of detecting intentional random guessing; that isn't handled by power controls in the simple binomial ABX setup, which expects people to act like dice.

Indeed, I've seen guys on Hydrogen Audio argue against such controls that might uncover issues of intentional random guessing - a number of them even claimed that once they can't identify any difference in the first couple of trials, they don't try for the remainder of the 20 or so trials; they just guess randomly. They saw nothing wrong with this. To me, this brings into sharp focus one of the major issues with non-professionally run blind tests - there's no measure of false negatives or of the statistical power of the test.

1/ availability. we use what we can, know it's not perfect, and don't pretend to change the world with the conclusions of a personal blind test. we're only saying that it's more accurate than doing nothing, and suggest everybody try it instead of being blinded by all sorts of biases. ultimately you do your own test and draw your own conclusions. it really doesn't have much impact on the world.

2/ looking at my brain activities also has plenty of limitations. testing more than independent stimuli makes things massively complicated to replicate and analyze. with real music, just having the subject focus on the exact same detail on one instrument several times in a row is almost impossible. and do we know enough to tell what part of the data is discarded? because we know most of the data from our senses is scanned for patterns and then mostly discarded so that we can move on to the new data coming in. do we know how to tell what is used and what isn't while playing complex music? seems like a very ambitious challenge.
let's say we end up with something changing, proof that somewhere the brain noticed a difference. if all conscious audible tests fail for that difference, what will you conclude? that it's discarded data? that it doesn't matter? that it is subconscious but does matter? if so, how and why doesn't it impact the conscious test? that some audible tests are flawed?
looks to me like anybody drawing a conclusion from this would just be cherry-picking whatever agrees with himself.

that said, I'd love to see a lot of experiments done with MRI and other techs. there is certainly a lot of other stuff to discover from it, and I'm all for any and all control testing.
 
Dec 28, 2015 at 8:44 AM Post #40 of 151
   
I can't see the problem. If you can distinguish a feature of the recording that sounds different between the two samples then you can attempt to use that feature to discriminate. If you can't isolate such a feature then you're going to be guessing, and whether this is 'intentional' or not is irrelevant. Human decision making is, of course, far from being truly random anyway, and if there were subtle subconcious factors at play they would affect the 'guessing' in a way that would be apparent in a suitably large test.

 
Well, this is one of the many problems with ABX testing - there is so much confusion around its use. It can be used for training or it can be used for testing - the divide between these two is somewhat indistinct, and as it relies on statistical analysis of the results, which trials are included in the results becomes crucial. It would be disingenuous to include training trials in results, or indeed results where the participant has consciously decided that he is "purposely randomly guessing". A conscious decision to just guess randomly in such a test is a decision not to participate - it's the equivalent of not doing any test, just submitting a series of random A/Bs - not a result that should be counted as a genuine test result.
'Gaming' the system would require overt dishonesty - a situation in which the subject can distinguish a discriminatory feature but chooses to disregard it and present choices they realise will appear random. If someone's decided to be dishonest there are a multitude of ways in which they can pervert the test, and controlling for that can only be done in very strict circumstances.

I consider the above decision as gaming, i.e. not honestly taking the test. But there are so many ways to screw up the test results, both consciously & unconsciously, which is exactly why these tests are of no significance unless administered by those trained in perceptual testing methodologies.
If we assume the tests are being conducted honestly, then the only real issues are ones of training and fatigue. In cases where there's a subtle, but objectively verifiable, difference it can take some time to train yourself to hear that difference reliably, and you can often only reproduce that performance for a limited period of time before you tire. Because the ear has a poor memory, you then need to retrain after taking a break if you want to regain the prior performance. But let's face it, we're talking about listening to music as a recreational activity, not training sonar operators, and it would be reasonable to argue that testing conditions should mimic the same sort of recreational situation in which the products are going to be used.

Yes, ABX testing is fraught with many more issues than just honesty, training & fatigue, although I agree these are the big ones.
 

 
Dec 28, 2015 at 9:27 AM Post #41 of 151
 
except that I say I failed my ABX on my gear, and not that there is no audible difference between the files. [...]

 
 
 
1/ availability. we use what we can, know it's not perfect, and don't pretend to change the world with the conclusions of a personal blind test. we're only saying that it's more accurate than doing nothing, and suggest everybody try it instead of being blinded by all sorts of biases. ultimately you do your own test and draw your own conclusions. it really doesn't have much impact on the world.

Hold on, you just made a claim that personal blind testing is more accurate - can you prove this claim scientifically? Based on the number of people reporting not hearing a difference between 128kbps & the original in blind tests, I would conclude the opposite, or at least would like to see a statistic relating to this. Nothing wrong with doing a personal blind test, but I consider it no different to putting on a different piece of familiar music to check if what I heard holds up - maybe it does, maybe it doesn't. It takes many different pieces of music, listened to over a longer period of time, to come to any personal conclusions - one blind test does not sway me one way or the other, & I certainly don't ask others to do what I find to be boring. I listen to their reports of what they heard & evaluate a number of other factors before I decide that it might be worthwhile for me to try it myself.

The main problem with all of this is the confusing of personal home blind testing with professionally run, scientifically rigorous & administered perceptual tests, & somehow concluding that they have anything in common apart from the name.

The idea of demanding someone "prove" their anecdotal listening impressions by using one of these home tests (which are just another skewed anecdote) is actually far from objective.
2/ looking at my brain activities also has plenty of limitations. testing more than independent stimuli makes things massively complicated to replicate and analyze. with real music, just having the subject focus on the exact same detail on one instrument several times in a row is almost impossible. and do we know enough to tell what part of the data is discarded? because we know most of the data from our senses is scanned for patterns and then mostly discarded so that we can move on to the new data coming in. do we know how to tell what is used and what isn't while playing complex music? seems like a very ambitious challenge.
let's say we end up with something changing, proof that somewhere the brain noticed a difference. if all conscious audible tests fail for that difference, what will you conclude? that it's discarded data? that it doesn't matter? that it is subconscious but does matter? if so, how and why doesn't it impact the conscious test? that some audible tests are flawed?
looks to me like anybody drawing a conclusion from this would just be cherry-picking whatever agrees with himself.

Sure, from a layman's perspective, testing of brain electrical activity seems very complicated & is only done by experts, which is another good reason for using it - there is no possibility that some home-administered MEG can be demanded of someone as proof of their listening experience.
that said, I'd love to see a lot of experiments done with MRI and other techs. there is certainly a lot of other stuff to discover from it, and I'm all for any and all control testing.

Yea, that's the true spirit of discovery - right on!

 
Dec 28, 2015 at 10:16 AM Post #42 of 151
Originally Posted by mmerrill99
 
A conscious decision to just guess randomly in such a test is a decision not to participate - it's the equivalent of not doing any test, just submitting a series of random A/Bs - not a result that should be counted as a genuine test result.

 
The problem is that we can't guess randomly on purpose. The processes that result in pressing the A or B button, like any motor act, are far from random. You could only get close to it by coming into the test with a table of numbers from a pRNG, which would be dishonest.
 
In the situation you describe the participants have stated that they failed to discern any conscious way of discriminating between the two samples. If there were a variation that could only be detected subconsciously, that would show up as a bias in the 'random' responses, and tests like this are certainly capable of demonstrating such a bias. In fact, if you're searching for such a bias, the less care the subject puts into selecting a response the better.
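For what it's worth, a bare-bones sketch of the kind of check that exposes non-random "guessing" (a Wald-Wolfowitz runs test under the usual normal approximation; the alternating answer string below is a made-up example):

```python
# Sketch: a Wald-Wolfowitz runs test on a sequence of A/B answers.
# Humans asked to "guess randomly" tend to alternate too often (too many runs)
# or streak (too few); either shows up as a small p-value once N is large.
from math import erfc, sqrt

def runs_test(answers):
    """answers: a string like 'ABBABAAB...' containing both letters.
    Returns (z, two-sided p) under the normal approximation."""
    n1 = answers.count("A")
    n2 = answers.count("B")
    runs = 1 + sum(answers[i] != answers[i - 1] for i in range(1, len(answers)))
    mu = 2 * n1 * n2 / (n1 + n2) + 1
    var = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
           / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    z = (runs - mu) / sqrt(var)
    return z, erfc(abs(z) / sqrt(2))

# A 40-trial sequence that alternates every time looks nothing like coin flips:
z, p = runs_test("AB" * 20)
print(f"z = {z:.2f}, p = {p:.2g}")   # large positive z, tiny p: too many runs
```

This only checks one kind of structure in the answer sequence, of course; a subconscious bias toward the correct answer would show up in the score against the key rather than in the runs.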
 
Dec 28, 2015 at 10:44 AM Post #43 of 151
   
The problem is that we can't guess randomly on purpose. The processes that result in pressing the A or B button, like any motor act, are far from random. You could only get close to it by coming into the test with a table of numbers from a pRNG, which would be dishonest.
 
In the situation you describe the participants have stated that they failed to discern any conscious way of discriminating between the two samples. If there were a variation that could only be detected subconsciously, that would show up as a bias in the 'random' responses, and tests like this are certainly capable of demonstrating such a bias. In fact, if you're searching for such a bias, the less care the subject puts into selecting a response the better.

 
The issue is having enough trials to differentiate between dishonesty and true inability, especially within the context of a test like ABX where more trials can cause other human issues to crop up. The 10-trial tests that seem to be the online norm are fine with Type I error but have absolutely no power for low true probabilities of detection, and I doubt 10 trials is enough to detect dishonesty in any meaningful way; I guess I should do some math on the subject. All this just highlights the importance of letting real researchers have some $$ to allow meaningful sample sizes.
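Since the basic math is quick, here is a sketch of it (exact binomial tails in Python; the 70%, 90%, and 60% detection rates are just illustrative assumptions):

```python
# Back-of-envelope for the binomial ABX: alpha is easy, power is not.
from math import comb

def p_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def power(n, p_true, alpha=0.05):
    # smallest passing score whose false-positive rate under guessing is <= alpha
    k = next(k for k in range(n + 1) if p_at_least(k, n, 0.5) <= alpha)
    return k, p_at_least(k, n, p_true)

print(power(10, 0.7))    # pass mark 9/10; only ~0.15 power for a 70% detector
print(power(10, 0.9))    # ~0.74: even a 90% detector fails a quarter of the time
print(power(100, 0.6))   # at n=100 a modest 60% detector passes most runs (~0.62)
```

So a failed 10-trial ABX says very little about a listener with a real but modest detection rate, which is exactly the power problem with the short online tests.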
 
Dec 28, 2015 at 11:35 AM Post #44 of 151
 
Hold on, you just made a claim that personal blind testing is more accurate - can you prove this claim scientifically? Based on the number of people reporting not hearing a difference between 128kbps & the original in blind tests, I would conclude the opposite, or at least would like to see a statistic relating to this. Nothing wrong with doing a personal blind test, but I consider it no different to putting on a different piece of familiar music to check if what I heard holds up - maybe it does, maybe it doesn't. It takes many different pieces of music, listened to over a longer period of time, to come to any personal conclusions - one blind test does not sway me one way or the other, & I certainly don't ask others to do what I find to be boring. I listen to their reports of what they heard & evaluate a number of other factors before I decide that it might be worthwhile for me to try it myself.

I guess you are right: sighted evaluation can get 100% all the time, as the guy only needs to know how to read and, ultimately, to agree with himself.

"ok I will click on the file that says 24/96 and listen to it, then I will click on the file that says mp3 that I know for a fact is different and inferior in resolution, and try to find out if I can guess what I already know."
do you define this as a test? IDK for you, but the tests I've had to pass in my life were missing that nice part where I get the answer with the question.

-"now close one eye and read this, it says D K N P W M N. can you see it? do you need me to repeat it slower?"

come on, let's be serious. you're trying to compare a test with a bad joke.
 
Dec 28, 2015 at 1:16 PM Post #45 of 151
 
 
The problem is that we can't guess randomly on purpose. The processes that result in pressing the A or B button, like any motor act, are far from random. You could only get close to it by coming into the test with a table of numbers from a pRNG, which would be dishonest.

I believe you are mixing up two concepts here - one being what registers as a statistically significant result (95% or whatever significance is decided to be acceptable for the question being posed) & the other being how close we can get to "truly random numbers" generated by algorithms. Two completely separate & distinct notions, separated, as RRod stated, by the number of trials being run.
In the situation you describe the participants have stated that they failed to discern any conscious way of discriminating between the two samples. If there were a variation that could only be detected subconsciously, that would show up as a bias in the 'random' responses, and tests like this are certainly capable of demonstrating such a bias. In fact, if you're searching for such a bias, the less care the subject puts into selecting a response the better.

No, it wouldn't show up, because, by their own admission on Hydrogen Audio, they don't even listen to the A/B samples in the following trials - they simply hit a random button each time the trial starts - so there is no mechanism whereby the audio samples could influence the result subconsciously.

 
