ABX Blind Testing - the Ins & Outs
Jan 16, 2016 at 6:52 PM Thread Starter Post #1 of 64

mmerrill99

Member of the Trade: M2 Tech
Joined
Mar 2, 2006
Posts
1,233
Likes
46
I have mentioned the ins & outs of ABX blind testing on other threads & used the posts of a previous forum member (ultmusicsnob) as a good example of positive ABX results & a good study of what is involved in conducting an ABX test that has a hope of positive results, i.e. that has a hope of hearing audible differences. I suggested that every member interested in the actual workings of ABX testing should read his threads, but I'll bring pertinent posts of his here for discussion. He has posted positive ABX results in a number of threads - some involving high-res Vs RB files & some involving jitter. Here's the first of his threads on this.

I'm not interested in debating his findings - I'm much more interested in his description of how he did the ABX test & what the pitfalls are. I find this a great example of a real ABX test (not some abstract philosophical discussion) & of what works & what doesn't.

So, let's start with this first post of his, which explains his music source, DAC & headphones:
I've been doing ABX tests using the foobar2000 ABX utility.

Taking commercially produced CD's, I upsample them to 192 kHz @ 24 bits.
Then I ABX the original 44.1/16 file against the upsampled one.
The files are converted using Sound Forge 10's included tool from iZotope, their 64-bit SRC.
I've done it with the default settings (filter steepness 32, alias suppression 175) and with the "highest quality" settings on the slider (filter 150, alias 200), with equal success.
I've tested against pop music (MEG, album 'Room Girl', tracks 'G Ballad' and 'Groove Tube'), and against classical (Christopher Parkening performing Albeniz with London Symphony), with the same success.
Here's the truly weird part: I've been using my Beyerdynamic DT 770 Pro's through Schiit Asgard2 for the most part, but today I replicated my results again using cheapo earphone plugs, from the output jack on a plain-jane PC with no soundcard (just built-in motherboard chips)--same success. The plugs provide decent isolation, but that's about it in terms of quality.
One sample result is listed below, I have several others.
http://i.imgur.com/UdWshKh.png

Does anyone else have a successful ABX result to cite for upsampling?
Does anyone have any suggestions for upsampler settings? Filter steeper or gentler? Anti-aliasing? etc

Comments, hearing test challenges welcome.


The files he was using in his test were uploaded & analysed by stv014, who found no level issues in them; he uses the iZotope SRC to upsample the files.
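The iZotope 64-bit SRC he used is proprietary, but the conversion step itself (44.1 kHz Redbook upsampled to 192 kHz) can be sketched with open tools. The following is purely an illustrative stand-in using scipy's polyphase resampler, not the filter settings described in his post:

```python
import numpy as np
from scipy.signal import resample_poly

SRC_RATE = 44100   # Redbook sample rate
DST_RATE = 192000  # target rate; 192000/44100 reduces to 640/147

def upsample(samples: np.ndarray) -> np.ndarray:
    """Upsample 44.1 kHz audio to 192 kHz with a single polyphase stage.

    This stands in for the iZotope 64-bit SRC used in the thread; the
    actual filter steepness and alias suppression there will differ.
    """
    return resample_poly(samples, 640, 147)

# one second of a 1 kHz test tone as a stand-in for the CD rip
t = np.arange(SRC_RATE) / SRC_RATE
tone = np.sin(2 * np.pi * 1000 * t)
up = upsample(tone)
print(len(up))  # 192000 samples out for 44100 samples in
```

The point of the ratio 640/147 is that the conversion is exact: every 147 input samples map onto 640 output samples, so no irrational-ratio interpolation is needed.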

This is his procedure & what he is listening for:
Thank you for the response! I think my preferred decision procedure (below) coincidentally helps protect against cues due to time between playback when switching--I don't use continuous playback, so there's always start/stop/restart, regardless, and my time to restart varies quite a bit. It's not easy to hear these differences, so I also occasionally break for a couple of minutes to let the synapses recover from over-familiarity with the incoming stimuli.

In the foobar2000 ABX interface I always use the option to set a 'Start' and 'End' position, with the player jumping back to the Start point every time, whether repeating a single version, or moving to an alternative (foobar provides A, B, X, Y--I tend to listen to a lot of repeats, generally). So this is not like a recording studio A/B where you might split a signal onto two paths and then seamlessly move back and forth between them while the song keeps playing. foobar2000 will apparently let you repeatedly click back and forth between A and B while the song just continues, but I don't like that method: it compares two different clips, essentially, since if you let the song just keep playing, you're comparing two different musical segments.

Instead, to truly compare precisely the same musical segment formatted two different ways, I UNcheck "Keep playback position when changing track". This way I'm comparing the same few seconds of audio against each other. I use this for both A/B and X/Y comparisons.

I've tried deliberately selecting my X or Y based on a perceived switching time, to *look* for the artifact, but I can't get any results at all that way.

The difference I hear is NOT tonal quality (I certainly don't claim to hear above 22 kHz). I would describe it as spatial depth, spatial precision, spatial detail. The higher resolution file seems to me to have a dimensional soundstage that is in *slightly* better focus. I have to actively concentrate on NOT looking for freq balance and tonal differences, as those will lead you astray every time. I actively try to visualize the entire soundstage and place every musical element in it. When I do that, I can get the difference. It's *very* easy to drift into mix engineer mode and start listening for timbres--this ruins the series every time. Half the battle is just concentrating on spatial perception ONLY.

So we can see that even with this expert listener there is some difficulty in doing the ABX test in a "valid" manner. There is a very strong biasing of the test towards a null result

His first positive posted ABX result is here

The original impulse was discovering the foobar2000 ABX tool in the first place, via a skeptic who appeared to believe that users who tried it on their upsampled files would discover that they could not hear a difference, and were thus merely experiencing a placebo effect when listening non-blind. I accepted the challenge, grateful for the tool which I had not heard of before, and have since repeatedly passed the test under an increasing variety of circumstances.

I think it's likely that indeed the cause of the differences I hear does indeed result from the "performance of the playback chain at 2 different bit depth/rate combinations", and that is indeed my actual listening interest. I had been upsampling for a short time before (just since purchasing SoundForge 10), and felt that the upsampled playbacks, playback chain and all, did sound better subjectively. This series of ABX tests demonstrates that I am hearing a difference that I can identify in blind testing--no placebo effect here.

The scientific question of isolating the pure effects of 192/24 vs Redbook, if that pure isolation is even possible, is interesting, but is not my goal. I want better sound out of the CD's I own, and I can get it by upsampling and playing back through my equipment. Specifically, I hear better spatial detail. I hypothesize that my listening experience (and true, confirmed ability to discern) may be the result of improved temporal resolution during the D/A conversion processes specifically, but I don't have near the circumstances to do anything remotely like proving it rigorously. I need at least one more listener who can tell the files apart, for one thing--right now I'm a sample n of 1, and unless more ears turn up capable of passing ABX under *some* circumstances, playback chain and all, there's never going to be an opportunity of drawing a conclusion about anything except my own personal listening experiences, at best as a single case.

I *have* passed ABX repeatedly on more than one system, and I intend to get to as many different ones as I can. Perhaps they are all the same, but in that case I would simply be able to rest assured that the difference I hear IS robust across different kinds of equipment and different playback chains.
 
Jan 16, 2016 at 7:08 PM Post #2 of 64
Is this really about ABX testing?  I don't have any issues successfully passing an ABX test with 2 different files.  No special training needed.  Maybe the 2 files being tested are so close to the same that there is no real, practical difference between them?  These differences being identified in this particular ABX test appear to be something that would not even be remotely identifiable by anyone if they were not making a conscious effort to listen in an odd way that probably cannot be accomplished by too many of us even with great hearing and a lot of time spent training our ears.
 
Jan 16, 2016 at 7:20 PM Post #3 of 64
In this post he settles on soundstage size & depth as the "tell" that differentiates the files:
http://www.head-fi.org/t/676885/successful-abx-testing-to-hear-the-difference-between-redbook-audio-vs-upsampled-to-192-24/15#post_9710408


[QUOTE]I initially found training my ears to find a difference very difficult. It's *very* easy to go toward listening for tonal changes, which does not help. I get reliable results only when trying to visualize spatial detail and soundstage size, and I tend to get results in streaks. I get distracted by imaginary tonal differences, and have to get back on track by concentrating only on the perceived space and accuracy of the soundstage image.[/QUOTE]

As you read through that thread you will see various possibilities explored that might explain his results & he passes every one of them - still getting positive ABX results.

Again, his posts reveal more of his procedure:
Originally Posted by stv014 View Post


Are these your only results, or cherry picked best ones, with some worse runs discarded (or the test reset after any early incorrect guesses) ?
Neither.
My usual procedure is to warm up--which mainly means getting my focus and concentration together--with about 50-60 individual rounds.
Once I'm warmed up I can replicate the results you see here indefinitely without discarding.
For data collection I prefer as many trials as my subjects will give me, but on these ABX's I've been just stopping whenever I feel like it.
Here I've posted a range of results ranging from 91% confidence to 99.9%.


The number of trials seems to be odd as well, did you just stop whenever the score was "good" ? They are all within 5% probability of guessing, but also just one more wrong guess would have increased that to 7-11% in all cases. Your total result of 78/117 translates to a chance of 0.02%, but that again assumes that those 117 trials were all you have ever done.
No, I stopped when 1) my kids interrupted me, 2) I got into the 20's number of rounds, 3) I hit a target in the single digits or below 5, or below 1, or whatever I was aiming for. This absolutely is not a rigorous process.


Warm-up is over when I can replicate short-run results. If I can get 5/5, stop, get 3/4, stop, get 5/6, etc., then I'm ready to begin a full series.

This is distinct from trying over and over until a good run happens to turn up, and then just stopping there. I make a decision about warm-up, THEN start a series that is longer.

When I first sit down I'm usually at about 50% chance I was guessing--not as bad as near 100%, but too low to consider myself warmed up.

There were no additional trials between the three above--that went really quick, I did not have to spend as much time on individual rounds as I thought I would.

I *do* take mini-rests of about 30 secs to 2 minutes sometimes, in the midst of one trial. If I have trouble making a choice--that's trouble deciding, BEFORE hitting the choice button, not trouble meaning I guessed wrong on a particular trial--then I stop for a short period to renew my ears.
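The confidence figures traded in this exchange (each run within 5% probability of guessing, one more wrong guess pushing it to 7-11%, the combined 78/117 at roughly 0.02%) all come from the one-sided binomial tail. A small sketch to check them; the 16-trial example is hypothetical, chosen to show how a single extra miss moves the figure:

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """One-sided binomial tail: the chance of scoring `correct` or better
    out of `trials` by pure guessing (p = 0.5 per trial)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# The combined score discussed above
print(f"78/117: {abx_p_value(78, 117):.3%}")  # ~0.02%, matching stv014's figure

# A hypothetical 16-trial run showing how one extra miss moves the odds
print(f"12/16: {abx_p_value(12, 16):.1%}")  # 3.8% - just under the usual 5% cut-off
print(f"11/16: {abx_p_value(11, 16):.1%}")  # 10.5% - one more miss, no longer "significant"
```

This is also why stv014's caveat about discarded runs matters: the p-value is only meaningful if the reported trials are all the trials there were.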
 
Jan 16, 2016 at 7:36 PM Post #4 of 64
You don't find it odd that in all of the posted results, only a single fail more would invalidate the test as being commonly accepted as statistically insignificant?  Do we all need to ignore tones and listen for obscure spatial details about 50 times so that we can barely pass an ABX test beyond this odd warm-up period?  I just don't understand how this particular example invalidates ABX testing at large and makes it equally as unreliable as sighted evaluations with regards to bias.
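The worry about stopping "whenever the score was good" can be quantified: a pure guesser who watches the running p-value and stops the moment it dips under 5% will "pass" far more often than the nominal 5%. A hypothetical simulation (the trial cap, threshold, and minimum run length are illustrative assumptions, not figures from the thread):

```python
import random
from math import comb

def p_value(correct: int, trials: int) -> float:
    """One-sided binomial tail for `correct`-or-better out of `trials` guesses."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

def guesser_passes(max_trials: int = 25, alpha: float = 0.05) -> bool:
    """Simulate a pure guesser who peeks at the running p-value after every
    trial and stops as soon as it looks significant (the optional-stopping
    loophole raised above)."""
    correct = 0
    for n in range(1, max_trials + 1):
        correct += random.random() < 0.5  # a coin-flip "answer"
        if n >= 5 and p_value(correct, n) < alpha:
            return True  # stops here and reports a "significant" run
    return False

random.seed(1)
runs = 10_000
hits = sum(guesser_passes() for _ in range(runs))
print(f"false-positive rate with optional stopping: {hits / runs:.1%}")
```

With this stopping rule the guesser "passes" at well above the nominal 5% rate, which is why fixing the number of trials in advance is standard practice.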
 
Jan 16, 2016 at 7:36 PM Post #5 of 64
Even expert listeners have trouble getting a positive ABX result on particular days
Having trouble "warming up" tonight.
I get the occasional 3/3 or even 4/4, but I can't keep it together for a planned run of 20, confidence always falls apart before I put together a run.
These "all in 192/24" trials are very tough, three may be my limit for a day. I'll try again tomorrow.


I don't really have a good vocabulary for it. I **always** start out listening for tonality, meaning frequency spectra, and it **never** works. Of course, that's how one listens when mixing, creating sound designs, working on guitar technique right or left hand, setting up the amp, on and on. But listening for "differences" in files which do *not* differ tonally is a weird idea to begin with, and these all-192s are even less describable than the earlier 44.1 / 192 comparisons.


B versus F: (confused)

I may have over-acclimated to all these 192's or something. I did get a run of 14/20 once, but I'm going to call that one a chance fluke, since I've tried so many sets. Eventually you can hit a number that way, just random, but in this case it followed small groups of 3 or 4 that did not indicate I was warmed up, and then I was unable to even remotely replicate it at all. There may be an audible difference somewhere in this signal to be found, but the features I've been looking for previously, I cannot find in this pair. So, white flag on this one. I'm wondering about trying to hear past whatever coloration the Xonar sound card analog Line Out section might contribute (if the Xonar analog after D/A was involved in F).

Completely undermined my judgment. I walked away for awhile and then came back and replicated an earlier round (44.1 versus 192 directly) just to see if it was possible (0.2% chance of guessing, so that was ok).


The rest of the thread analyses & measures the various files that ultmusicsnob used & posted positive ABX results for &, according to these analyses, any differences are way below audibility.

But that's not the point - the difficulties involved in doing a useful ABX test are well illustrated here

When he does ABX tests on jitter he listens for completely different cues - cues he first has to find in the audio files (& it takes many attempts) & then has to try to remain focussed on during the ABX testing
 
Jan 16, 2016 at 7:41 PM Post #6 of 64
Is this really about ABX testing?
Yes, it's trying to demonstrate, using actual ABX tests, what is involved. Why the question?
 I don't have any issues successfully passing an ABX test with 2 different files.  No special training needed.
So what?  
Maybe the 2 files being tested are so close to the same that there is no real, practical difference between them?  These differences being identified in this particular ABX test appear to be something that would not even be remotely identifiable by anyone if they were not making a conscious effort to listen in an odd way that probably cannot be accomplished by too many of us even with great hearing and a lot of time spent training our ears.
So we have many positive ABX results on files which are downloadable for you to test/listen to yourself & this is all that you have to say about it - his listening style is odd? Jeez!!
 
Jan 16, 2016 at 7:52 PM Post #7 of 64
You don't find it odd that in all of the posted results, only a single fail more would invalidate the test as being commonly accepted as statistically insignificant?
You are really making this statement in all seriousness?  Show us all the examples that are nearly "fails". Do you even understand the statistics & what the results signify?
Do we all need to ignore tones and listen for obscure spatial details about 50 times so that we can barely pass an ABX test beyond this odd warm-up period?
Wow, you really don't understand any of this. My point was/is exactly that he had already established that he could hear a difference between these files outside of blind listening - without any quirky listening, just general normal listening. However, when doing a blind test, this normal style of listening only returns null results & the "obscure spatial details" are the lengths he found were needed to differentiate the files blind & get a successful positive result. As I said all along, blind testing requires substantial expertise & training in order to overcome its skew towards null results, but if you are just interested in obtaining null results then it's easy to achieve - just don't go to these lengths
I just don't understand how this particular example invalidates ABX testing at large and makes it equally as unreliable as sighted evaluations with regards to bias.
I am showing what is needed to do a "valid" ABX test that has any hope of a positive result. Readers can decide for themselves just how many have the motivation/dedication & expertise to go to these lengths
 
Jan 16, 2016 at 8:12 PM Post #8 of 64
My point was/is exactly that he had already established that he could hear a difference between these files outside of blind listening - without any quirky listening, just general normal listening. 

 
Normally I would not have an issue with a sighted test matching any ABX test, but in this particular case, with so much warming-up involved and special listening techniques being employed to show statistically acceptable passing ABX tests, with only a single fail away from making them suspect, I would like to know if the sighted tests are anything more than bias.  The fact that he can easily hear the differences when sighted, but then requires 50-60 warm-ups only to achieve statistically valid passes by a single positive choice in every example shown, seems odd to me.
 
It is just this one example.  In other blind testing results, such as those from Harman, I have no issue accepting that 12 specially selected and trained listeners could hear differences where others could not.  But this single person's testing appears to be extraordinarily different and requires more evidence before it would qualify as valid to me.
 
Jan 16, 2016 at 8:17 PM Post #9 of 64
And these posts on his positive results from jitter testing of files show how his listening had to change to pick up these differences
I have to think it's not about fidelity of the equipment, it's figuring out what to listen for. Listening for jitter is *unlike* other ABX comparisons I've done before. If it helps, I try to imagine the sharpest focus of sound in terms of how "narrow" I can hear the piano attack, as though it were a spatial measure. The narrower attack is 'n'. It is difficult because I'm continually tempted to chase mirages of differences in other details. If I stick to "focus" and "narrow" I get a result.


A following post
Replication. This is just path30jr versus path30n again, to establish the reliability and consistency over more rounds.
Back to the Beyerdynamic 770 Pros for these. I'm listening in the mids, if that helps any. The notes are "shaped" differently not in the bass extension or treble extension, but in the core of the piano attack, where it seems that the 'n' is focused, while the jr has a slightly 'flattened out' aspect. This was listening for the quiet chords right near the end again.

Since the "noise" component was mentioned above, I'll mention that I'm not listening for noise, since that gives a null result--no discernible difference I can detect on that basis.


And the rest of the thread analyses these files with the conclusion that differences should not be audible
 
Jan 16, 2016 at 8:41 PM Post #10 of 64
My point was/is exactly that he had already established that he could hear a difference between these files outside of blind listening - without any quirky listening, just general normal listening. 


Normally I would not have an issue with a sighted test matching any ABX test, but in this particular case, with so much warming-up involved and special listening techniques being employed to show statistically acceptable passing ABX tests, with only a single fail away from making them suspect, I would like to know if the sighted tests are anything more than bias.  The fact that he can easily hear the differences when sighted, but then requires 50-60 warm-ups only to achieve statistically valid passes by a single positive choice in every example shown, seems odd to me.
It's not at all odd - it shows the difficulty of successfully differentiating these audible differences in ABX blind testing

It is just this one example.  In other blind testing results, such as those from Harman, I have no issue accepting that 12 specially selected and trained listeners could hear differences where others could not.  But this single person's testing appears to be extraordinarily different and requires more evidence before it would qualify as valid to me.
As you said & as I pointed out on other threads, the Harman plot shows that ONLY selected & trained listeners perform to a level of 95% consistency in their blind tests. The next highest group only performs to 35% consistency - that means they are 65% inconsistent, i.e. they don't actually report preferences consistent with their previous preferences.

How do you know what listening style or techniques these expert listeners are using? Would you similarly denote them as "obscure"?

I know you are set against any & all evidence put before you & will not be swayed in your "beliefs", but at least try to treat the evidence in a somewhat objective & scientific manner.

I'm going to cite another set of positive ABX results which has similarly come in for criticism - Amir's positive ABX results on Arny Krueger's jangling-keys test, along with his positive results on Ethan Winer's AD/DA loopback test. In each case the creator of the test cited it as "proving" that high-res was not audibly different (in the case of Krueger) & that DACs were transparent (in the case of Winer). They claimed this "proof" because nobody had differentiated the files or posted positive ABX results for 15 years or so. Then, when this "meme" was upset by Amir's positive results, they quickly found "problems" with their tests to explain them away.

My takeaway from this is that they discovered a "problem" with the test which they claim accounted for the audibility that Amir uncovered (& his listening was similarly "obscure", as you call it - he is a trained listener, having managed audio development at Microsoft), so how come nobody had uncovered this audible difference in all the 15 years the files were available for testing? To me it shows blind testing is heavily skewed towards null results & that it takes a great deal of effort/care/motivation/expertise to overcome this bias - the efforts entered into by ultmusicsnob are an example of what's needed & it isn't many who will go to these extremes.

Again, don't take these efforts to mean "hey, the differences couldn't be that significant if it takes this much effort" - that's missing the point completely - the ABX test is such that it requires this level of effort to overcome its inbuilt skew & anything less than this effort will return a null result because of that skew.

So, yes, all the simplistic statements that removing sighted bias "must give more accurate results" have to be evaluated against the effort needed to overcome the false-negative skew built into the ABX test. If this effort isn't put into doing the test, do you really believe that the resultant null result is a "more accurate result"?
 
Jan 16, 2016 at 10:01 PM Post #11 of 64
This post of castleofargh is brought over from another thread so as not to pollute that thread with my answers

 
- 1/  my blind test aren't bringing truth. that's obvious to me, but it's been discussed so I point that out. science and knowledge in general don't bring truth, they try and come closer to it. when I do a sighted evaluation there will be biases and I know I might be influenced by them. so let's say I could be influenced to various degrees by 10 biases. if I match volumes, now I have to deal with 9 biases. whatever the result, I will be one step closer to the truth compared to before. so this is progress and everybody should match the loudness of the stuff they test.

now if I do a simple blind test where I don't know which device I'm listening to, I remove the look of the devices, the colors, the size, the price tag, I remove the preconceptions I had about one of the device from reading reviews or being the owner of it. but let's say I add a bias that is the blind test itself. I end up removing let's say 2 kind of biases, sight and preconceptions, that spread into sub genres or biases. and I'm adding 1, the blind test. so to know if it's beneficial, I would need to find out how much I can be influenced by each bias. 

I don't think the false negative introduced by blind testing comes even close to the bias of knowing the price of a device, the look, the preconceptions I have about it etc... I don't have proof of that for the one test I'm doing, and obviously not everybody shares my opinion about that. but I do believe I'm overall getting closer to the truth by doing such a test.
Well this is your "belief" for which you have stated you have no evidence. What I am doing in this thread is showing evidence of just how difficult it is to do a proper ABX blind test. This is part of my evidence that a null result is highly likely because of the inbuilt bias of the actual test towards null results.

The problem with these tests is that the level of false negatives is not known in the results.

I have also given two examples of ABX tests that existed for 15 years without anyone getting a positive result even though there were audible differences in the files - pretty good evidence of the level of false negatives in this sort of testing.
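The false-negative skew being argued here can be put in numbers. Under the standard binomial model, a listener who genuinely hears a difference on a fixed fraction of trials still fails short runs surprisingly often; the 70%-accurate listener and 16-trial run below are illustrative assumptions, not figures taken from the thread:

```python
from math import comb

def tail(correct: int, trials: int, p: float) -> float:
    """P(score >= correct) when each trial is answered correctly with probability p."""
    return sum(
        comb(trials, k) * p ** k * (1 - p) ** (trials - k)
        for k in range(correct, trials + 1)
    )

def pass_probability(true_p: float, trials: int, alpha: float = 0.05) -> float:
    """Chance a listener with per-trial accuracy `true_p` passes a run of
    `trials`, where "passing" means the score's guessing p-value is < alpha."""
    # smallest score that would count as a statistically significant pass
    threshold = next(
        (k for k in range(trials + 1) if tail(k, trials, 0.5) < alpha),
        trials + 1,
    )
    return tail(threshold, trials, true_p)

# Illustrative numbers only: a listener who is right on 70% of trials
# still fails a 16-trial ABX run more often than not.
print(f"pass chance: {pass_probability(0.70, 16):.0%}")  # roughly 45%
```

In other words, a null result from one short run says little on its own; the false-negative rate depends on the (unknown) true detection rate and the run length.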

of course it would be even more beneficial to do a real double blind test, but those are hard to organize and do well. and when I'm only trying to get answers for myself, I tend to be a little lazy ^_^.
but if I was to organize something for the community some day, I would very much insist on the test being double blind before drawing any kind of conclusion about the results! this is an important point.  if we cut too many corners in the blind test, it will come a moment where the bias introduced by the blind test may become more significant than the bias it was trying to remove. (again protocol>all)
Well, it's good to see this but I doubt any home based test is suitable to draw conclusions from


- 2/  mmerril99 brought up the idea that if I'm going to be biased in my every day listening and feel that the music is better from the look of the device or something else, then I should take that bias and be happy about it. I totally agree with this, and it is to me what real subjectivity is all about. Steve Eddy was(he's still very alive but banned from headfi:frowning2: ...) one of the true subjectivists I know, he has always been advocating toward including subjective biases that bring joy, as being meaningful. but at the same time if the test was about sound, and not about enjoying a device, then he would never advise to do a sighted evaluation.
objective and subjective sides don't have to be mutually exclusive in real life, but the subjective side should be removed as much as possible when the test we perform is called an audible test. if the test is called "enjoyment of the gear" then I would suggest bringing the biases back in and just do a sighted evaluation.
You have me wrong - I'm suggesting that we listen "normally" which involves all sorts of biases being in play - you & others claim that this biasing influence is unavoidable. My point is that it doesn't matter what the outcome of your blind test is when you listen "normally" your biases will kick in. It's the placebo effect - it's real - it is well known in medical testing - it has a real, physical effect - it's not about preference
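The loudness matching castleofargh mentions is typically done by equalizing RMS (or, better, LUFS) levels before comparing. A bare-bones RMS-matching sketch, purely illustrative:

```python
import numpy as np

def match_rms(reference: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Scale `target` so its RMS level equals the RMS of `reference`.

    Serious tests match to within ~0.1 dB, often using LUFS meters rather
    than plain RMS; this is the bare-bones version of the idea.
    """
    ref_rms = np.sqrt(np.mean(reference ** 2))
    tgt_rms = np.sqrt(np.mean(target ** 2))
    return target * (ref_rms / tgt_rms)

# two illustrative test tones at different levels
t = np.linspace(0.0, 1.0, 44100, endpoint=False)
a = 0.5 * np.sin(2 * np.pi * 100 * t)
b = 0.8 * np.sin(2 * np.pi * 100 * t)
matched = match_rms(a, b)
print(round(float(np.sqrt(np.mean(matched ** 2)) / np.sqrt(np.mean(a ** 2))), 6))  # 1.0
```

Level matching removes one known confound cheaply, which is why it is worth doing even in informal home comparisons.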
 
Jan 17, 2016 at 12:27 AM Post #12 of 64
I have also given two examples of ABX tests that existed for 15 years without anyone getting a positive result even though there were audible differences in the files - pretty good evidence of the level of false negatives in this sort of testing.

what are you talking about? and if nobody is getting positive results from ABX, how do you know there were audible differences??? the wife heard it from the kitchen?
Quote:
 
of course it would be even more beneficial to do a real double blind test, but those are hard to organize and do well. and when I'm only trying to get answers for myself, I tend to be a little lazy ^_^.
but if I was to organize something for the community some day, I would very much insist on the test being double blind before drawing any kind of conclusion about the results! this is an important point.  if we cut too many corners in the blind test, it will come a moment where the bias introduced by the blind test may become more significant than the bias it was trying to remove. (again protocol>all)

Well, it's good to see this but I doubt any home based test is suitable to draw conclusions from

ok but then what? should I make my decision based on gut feeling or throw a coin? I have questions and I'm trying to get some answers. again I don't see the stuff I do as scientifically relevant, I simply see it as probably better than sighted evaluation. and as a matter of fact, sighted evaluation and blind test don't always contradict each other. I just trust blind test more because of the removed biases.
 
 
- 2/  mmerril99 brought up the idea that if I'm going to be biased in my every day listening and feel that the music is better from the look of the device or something else, then I should take that bias and be happy about it. I totally agree with this, and it is to me what real subjectivity is all about. Steve Eddy was(he's still very alive but banned from headfi:frowning2: ...) one of the true subjectivists I know, he has always been advocating toward including subjective biases that bring joy, as being meaningful. but at the same time if the test was about sound, and not about enjoying a device, then he would never advise to do a sighted evaluation.
objective and subjective sides don't have to be mutually exclusive in real life, but the subjective side should be removed as much as possible when the test we perform is called an audible test. if the test is called "enjoyment of the gear" then I would suggest bringing the biases back in and just do a sighted evaluation.

You have me wrong - I'm suggesting that we listen "normally" which involves all sorts of biases being in play - you & others claim that this biasing influence is unavoidable. My point is that it doesn't matter what the outcome of your blind test is when you listen "normally" your biases will kick in. It's the placebo effect - it's real - it is well known in medical testing - it has a real, physical effect - it's not about preference

ok so you're going for "the world I see is my real world". because you will always have the impact of the biases, you decide they are part of the reality of my experience of sound. do I get it right?
but light has a given frequency and I perceive it with my eyes. sound as another range and I perceive it with my ears. it's easy to test that fact, I can close my eyes and now I don't have the information from the light. when I'm talking about the sound I'm hearing, I wish to talk about the stuff that came in my ears, not the one that came in my eyes at the same time. it's really as simple as that.
 
Jan 17, 2016 at 9:39 AM Post #13 of 64
what are you talking about? and if nobody is getting positive results from ABX, how do you know there were audible differences??? the wife heard it from the kitchen?

If you read what I wrote already then you shouldn't need to ask this question. Two challenges were in existence for 15 years & nobody ever posted positive ABX results - carefully prepared high-res Vs RB files from Arny Krueger (he claims to have conceived the ABX test) & carefully prepared AD/DA loopback files from Ethan Winer. Both of them pointed to the fact that nobody, in 15 years, had posted positive ABX results on their files & both claimed that this proved - that there was no audible difference with high-res (Krueger's claim) & that DACs are transparent (Winer's claim)

Now along comes Amir & does many ABX tests showing positive results. Both Krueger & Winer now suddenly discover that there were problems in their files & that was what Amir was picking up - audible cues which allowed the files to be differentiated.

So we have, from the mouths of the files' creators, the fact that they made mistakes in the "careful preparation" of the files which allowed them to be differentiated. Yet no one, in 15 years, had differentiated them.

I call that good evidence that the ABX test is so skewed/biased towards false negatives that it withstood 15 years of effort to hear differences.
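As an aside, the scale of this false-negative argument can be sketched with simple binomial arithmetic. The numbers below are a hypothetical illustration, not from the thread: assume a pass criterion of 12 correct out of 16 trials, and a listener who genuinely hears the difference on 70% of trials.

```python
from math import comb

def p_at_least(k: int, n: int, p: float) -> float:
    """Probability of k or more successes in n independent trials,
    each with success probability p (binomial upper tail)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n, k = 16, 12  # hypothetical criterion: 12/16 correct to "pass"

# A pure guesser (p = 0.5) passes less than 5% of the time,
# so the criterion controls false positives reasonably well...
guesser = p_at_least(k, n, 0.5)        # ~0.038

# ...but a listener who truly hears the difference on 70% of trials
# still fails more often than not: a false negative despite a real,
# audible difference.
real_listener = p_at_least(k, n, 0.7)  # ~0.45 chance of passing
```

On these assumed numbers the test misses a genuine 70% discriminator over half the time, which is the statistical face of the "skewed towards false negatives" claim.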

OK, but then what? Should I make my decision based on gut feeling, or toss a coin? I have questions and I'm trying to get some answers. Again, I don't see the stuff I do as scientifically relevant, I simply see it as probably better than sighted evaluation. And as a matter of fact, sighted evaluation and blind tests don't always contradict each other. I just trust blind tests more because of the removed biases.
Well, seeing as this is the science thread, asking questions & getting answers is the right approach, but rejecting the answers just because they leave you in a dilemma is not good science. Looking for comfort blankets is not what science is about.

Well, I now see you rowing back on what you said before. You are now saying "sighted evaluation and blind test don't always contradict each other. I just trust blind test more because of the removed biases," whereas before you said "several DACs seem to sound the same once I have matched the loudness. and almost all "sound" different when I only do C (if only for loudness differences, but other biases kick in every time even if I match loudness for C)." I proceeded to ask you about your claim that "other biases kick in every time even if I match loudness" in sighted listening - so you "hear" differences because of these biases - but you didn't answer, so I'm confused about what you are really saying

OK, so you're going for "the world I see is my real world". Because you will always have the impact of the biases, you decide they are part of the reality of your experience of sound. Do I get it right?
If someone asked me to taste my food with my nose pinched & decide which meal I wanted on that basis, but I always ate meals without my nose pinched, I would call them a fool. Are you asking me to always listen to my music with my biases eliminated, or what exactly?

But light has a given frequency and I perceive it with my eyes. Sound has another range and I perceive it with my ears. It's easy to test that fact: I can close my eyes and now I don't have the information from the light. When I'm talking about the sound I'm hearing, I wish to talk about the stuff that came in my ears, not the stuff that came in my eyes at the same time. It's really as simple as that.
It's really not that simple. What we perceive is due to the brain's processing - its pattern-matching engine, among other things. In other words, it creates the auditory scene we perceive as laid out in front of us, just as we create the visual scene that we perceive as being the world in front of us. In creating these scenes, the brain's processing uses all sorts of tricks, because the data arriving through each sense is incomplete & inconclusive - the problem is intractable in mathematical terms.

It's the equivalent of two sensors in the corner of a pool sensing the waves that arrive at that corner & being able to generate a scene of how many people are in the pool, where they are located, their size, their movement, etc. And, btw, these sensors have limited storage & limited processing power, so they have to do one job at a time; they can't process all waves at the same time.

Take something we do casually & analyse what's required - listening to one conversation at a party where many conversations are happening simultaneously (the Cocktail Party effect) - & translate that into our pool analogy: it's the equivalent of the processing behind the sensors being able to pick out & group the waves from this one person in the pool from among all the other intermingled waves, & to follow the dynamic changes in this group of waves coming from this one person. It's a mammoth task & one that isn't fully understood in terms of what exact mechanisms are involved.

So, in order for this auditory sense to be of any use to us, it has evolved into a mechanism which finds a way to best solve this intractable problem & produce an immediate, moment-to-moment best-fit solution to what we are hearing and/or seeing. Note, it's not an answer, it's just a best fit to the data signals it is receiving from moment to moment. So our whole perception system is based on dealing with these unknowns & making best-fit guesses very quickly.

So the auditory part of the brain uses rules, experience, guesswork, expectations, sight etc. to form the auditory scene that we perceive from moment to moment. Yes, this includes our biases, our expectations, our guesses - in fact anything that will give us as quick an answer as possible to the dilemma that faces auditory perception: "how to make sense of the signals arriving at the auditory cortex"

This whole working basis of auditory perception has to be recognised, & it seldom is - people just have the simplistic notion that we have two microphones on the sides of our heads & the signal from these microphones is recorded in our brain - voila, "hearing"

If you are going to talk about biases, blind testing & this hobby of music enjoyment, & do so at a scientific level, then the simplistic models of how auditory processing works need to be left behind & the much more complicated, scientific understanding of the model used (although this is still being researched & teased out)

If you are going to talk about blind testing, then realise that you are cutting off a major source of data signals that is normally used as part of the confirmation stage in auditory processing & which we have come to rely on for its correct functioning. And hence in ABX blind testing we have the situation where we select A but then have doubts & second-guess ourselves, & end up really not being able to make a reliable decision at all - hence the test returns a null result. All this doubt is a direct result of removing one of the major data signals we use in our auditory processing.

Hence the only way to overcome this doubt is to find an exact "tell" that we can focus on & remain focussed on during blind testing. That's why I use ultmusicsnob's description of how he does his ABX tests: what he is focussed on (& this tell is very difficult to find in the first place), how difficult it is to remain focussed on this one little bit of the sound, the fatigue & boredom which result, etc.

So, you see, it really isn't that simple, & that's what I'm trying to put across in this science of sound section. I'm trying to steer people away from simplistic thinking towards more sophisticated scientific thinking about these matters. I'm not saying I have the answers, but I wanted to open up some possibilities for consideration - possibilities that are based on real scientific research, not simplifications found on forums

Please, let's all get a bit more sophisticated & think about this sense of perception & how it might work - not as a microphone stuck to each side of the head, acting as the front end of a recording device in our brain which we can "listen" to willy-nilly. The whole area of perception is founded on dealing with "unknowns", & once you realise this, you begin to understand how auditory/visual illusions come about, how listening blind leads to uncertainty & second-guessing, how blind testing is biased towards null results.

Edit: Oh & btw, we can't eliminate bias - some people suggest this is what blind testing is doing - wrong. Expectation bias is at the heart of auditory processing, one of the fundamental aspects of its workings. Pattern matching is constantly used to evaluate the signals - in other words, based on our lifetime's experience of the auditory world, we have models for how things behave in that world. When a bell is struck, we have an expectation of the sound envelope of what will be heard - we anticipate the sound - it is our constant expectation bias. It's what allows us to fill in the missing bits of sound that momentarily get obscured by some noise
 
Jan 17, 2016 at 2:48 PM Post #15 of 64
Arny vs Amir is a saga worth making a TV show of. ^_^
If all this is true, and I don't see why not, I'd call it normal science. You don't look too hard when you fail to demonstrate a difference, but once you do have one, then you can work on identifying it and see if it comes from what was tested or from a protocol mistake.
It's a good thing Amir worked on passing it, and it's a good thing Arny or Ethan could figure out where it was coming from instead of accepting it as the high-res difference if it wasn't that.
I see this as progress, and imagine that if many people did try this ABX (because when Arny and Ethan talk, I figure many would take stuff at face value and never test it themselves) and failed, then it is a rather small difference and it falls within the accepted error margin. Because nobody said ABX was 100% reliable, not even Arny said that.
And maybe Amir has great hearing? Maybe he is obsessed with finding differences, so he works harder than others to achieve that goal? It's a good example of not taking stuff for granted. I kind of like it.
The question is, would people identify that one difference better in casual listening? And if they did, how would you know? If you explain what the difference sounds like and then give me twice the same file to try, there is a non-null possibility that I will believe I heard that difference after it was suggested to me.
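The "nobody said ABX was 100% reliable" point cuts both ways: with enough independent attempts, somebody will eventually pass by luck alone. A rough sketch under assumed numbers (a hypothetical 12-of-16 pass criterion and 100 independent attempts, neither taken from the thread):

```python
from math import comb

def guess_pass_prob(k: int, n: int) -> float:
    """Chance a pure guesser scores k or more correct out of n ABX trials."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

single = guess_pass_prob(12, 16)        # ~0.038 for one guesser

# If 100 people independently attempt the same challenge and we only
# ever hear about the passes, the chance that at least one guesser
# passes by luck alone is nearly certain.
at_least_one = 1 - (1 - single) ** 100  # ~0.98
```

Which is one reason a single positive run matters less than whether the result replicates under the same conditions.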
 

 
whereas before you said "several DACs seem to sound the same once I have matched the loudness. and almost all "sound" different when I only do C (if only for loudness differences, but other biases kick in every time even if I match loudness for C)." I proceeded to ask you "Your claim is that you "other biases kick in every time even if I match loudness" in sighted listening - so you "hear" differences because of these biases." but you didn't answer so I'm confused what you are really saying
 

Lol, I knew after writing it that it would come back to bite me in the ass. ^_^ I went with "several DACs seem to" and "almost all "sound" different", but yeah, the "every time" wasn't an accurate description; if it were, I wouldn't have said "almost all sound different", they would all have sounded different. So my bad, I pushed my opinion on this one instead of what is really likely to happen.
 
 
 
If someone asked me to taste my food with my nose pinched & decide which meal I wanted on that basis, but I always ate meals without my nose pinched, I would call them a fool. Are you asking me to always listen to my music with my biases eliminated, or what exactly?

Not at all. It's about assessing the right subject. If we're going for "which meal did you prefer?" then the whole experience counts, and restaurants know that better than anybody else.
But if the question is "which tasted better?", answering based on the whole experience will have some people answer differently than if they had tested the 2 meals at home, knowing both were made by the wife. Now at a restaurant, if one is 3 times the price, if the renowned chef came to talk to you and gave you an anecdote about this particular meal that uses some BS ingredient from the other side of the planet, many people would feel like the food was better because of all those added non-gustatory biases.
So the answer wouldn't really be about "which tasted better?", as taste would then be only one of the cues used to decide.
To me that's wrong, because we're not dealing with the question asked. Outside of this, of course I too will enjoy good service and it will indeed make me enjoy my meal more, and possibly make me feel like the food tastes better. And I'm very much the kind of guy who wouldn't come back if the food was good and the service mediocre, so I'm not disregarding "the whole package" of experiences. But to answer "what's that smell?", I shouldn't have to look around; the question is about smell, and to answer it I should smell. It's really a matter of trying to talk about the correct thing. If someone asks me how I found a given restaurant, I wouldn't start blind tests, I would answer from my global experience, including the biases, because that's what I'm being asked about in this case.
When we're discussing sound and aren't actually talking only about sound, while pretending we are, I estimate that we're misleading others, because we're not really talking only about sound. If it's made clear then fine, but if it's not and we keep up the pretense that what we describe is sound, then IMO we should try to remove as much of what isn't sound as possible.
 
 
And, BTW, castle, I agree with you - ABX home testing has very little to do with science, so why is it demanded of people as "proof" that they can hear differences?

I guess because it's an experiment that can be replicated under the same conditions and by other people, and it is falsifiable (if we decide to make it so). So it has the look and the smell of objectivity, even if we're still very much dealing with subjects and senses.
On the other hand, sighted evaluation isn't falsifiable, and we can't really load other people's preconceptions into our heads to try and replicate the test - else it becomes a blind test. So we never know if we heard something or if we just think we heard it, and others have no means to verify (without a blind test).
 
And yes, we are the result of our experiences and we interpret sounds based on our lives and previous experiences. That + anatomy makes us unique individuals, so all this is still very much subjective in the end. But isn't it better to deal with sound using what we know about sound, instead of what we know about vision, the social value of money, or how sexually attracted we are to the seller? I still think sound discussions should be about hearing.
I've read something about a debate over whether the jury should still watch the musician while he's playing, because they realized they didn't always reach the same verdict when the player "looked" like he was putting his "soul" into the exercise.
And you have the famous Stradivarius blind test to show how the same instrument seems to sound better when we know it is a Strad. If we are to enjoy a concert more when we know there is a Stradivarius, let's go tell the audience there is one in every concert ever played - it's a rather cheap way to improve the experience. But it will not improve audio. ^_^
 
