ABX testing consensus on the question of audibility | Page 2 | Headphone Reviews and Discussion - Head-Fi.org

maverickronin · May 31, 2015 at 3:18 PM

safulop said:
Well I don't know of a better method, but I know that ABX is expected to miss a certain amount of audible difference because the two different sounds are not "side by side" in a literal sense. You have to present first one and then another, so it becomes like a test of memory. Imagine if we did the same thing with swatches of color. I bet there are numerous pairs of color swatches from the paint store that you "couldn't tell apart" if I showed them to you in the manner of an ABX test - first one and then the other. But you can sure tell the difference when you see them both, and they do have to be butted up against each other as well. Even a few inches of separation and you can lose the ability to distinguish them even while seeing them both at the same time.

I'm thinking about a method for sound comparison that has more of this side-by-side juxtaposition. I'm thinking about trying the change from one sound to the other in the middle of the sound, or maybe changing back and forth several times, but in a way that doesn't cause hard cuts or clicking noises obviously. A colleague of mine has developed a technique for "sound morphing" by which one sound can be transformed into another smoothly, so that might be applicable to this problem. He once changed a cello note smoothly into a cat's meow, that was interesting.

The color swatch analogy breaks down because it's easy to see two things side by side, but it's impossible to hear two things side by side. You can only hear two things at once, not next to each other. The sound morphing idea might be a good improvement to the standard fast-switch ABX test, though.

The other thing you seem to be missing is the specific context in which ABX tests are usually talked about on here. It's to determine if a difference which someone already claims to hear is real or not. It's not just giving a random person 2 samples and asking them to discriminate between them with no other background about what they should be listening for. I wouldn't expect that to go too well either. The context in these circles is something like this.

Audiophile A listens to both $5000 MegaAmp X and $200 CheapoAmp Y sighted. He declares MegaAmp X to be "obviously" superior with "night and day" differences even though they measure identically down to -80, -90, -100 dB or something similar. Skeptic B arranges an ABX test in which Audiophile A fails to distinguish the "obvious" differences which he previously "heard" when listening sighted. Audiophile A blames switchboxes for making everything sound the same and Skeptic B concludes that no difference between MegaAmp X and CheapoAmp Y has been demonstrated.

Is there something wrong with using ABX testing in this way?

safulop · May 31, 2015 at 3:28 PM

macacodosom said:
A stereo file made of 2 different mono files alternating L/R every 5 seconds? Or 2 stereo files made of 2 different mono files, one L-R and the other R-L?
could be a nice experiment....

Yeah, you might be on to something there. Exactly time-syncing would be the biggest challenge with creating the sound files.

MacacoDoSom · May 31, 2015 at 3:35 PM

rrod said:
macacodosom said:

A stereo file made of 2 different mono files alternating L/R every 5 seconds? Or 2 stereo files made of 2 different mono files, one L-R and the other R-L?
could be a nice experiment....

Something like that, if it didn't drive you absolutely bonkers.

Maybe it would be nice for comparing MP3 vs. whatever, or whatever with whatever, for people who have doubts about whatever. The only problem I can see is the lack of a stereo image... but maybe it could work with 2 stereo files in a quad configuration (a 5.1 setup without the center and the subwoofer, maybe), and one could spin around in the center of the room looking for changes... or differences. It would be like comparing something side by side, wouldn't it?

safulop · May 31, 2015 at 3:36 PM

maverickronin said:
The color swatch analogy breaks down because it's easy to see two things side by side, but it's impossible to hear two things side by side. You can only hear two things at once, not next to each other. The sound morphing idea might be a good improvement to the standard fast-switch ABX test, though.

The other thing you seem to be missing is the specific context in which ABX tests are usually talked about on here. It's to determine if a difference which someone already claims to hear is real or not. It's not just giving a random person 2 samples and asking them to discriminate between them with no other background about what they should be listening for. I wouldn't expect that to go too well either. The context in these circles is something like this.

Audiophile A listens to both $5000 MegaAmp X and $200 CheapoAmp Y sighted. He declares MegaAmp X to be "obviously" superior with "night and day" differences even though they measure identically down to -80, -90, -100 dB or something similar. Skeptic B arranges an ABX test in which Audiophile A fails to distinguish the "obvious" differences which he previously "heard" when listening sighted. Audiophile A blames switchboxes for making everything sound the same and Skeptic B concludes that no difference between MegaAmp X and CheapoAmp Y has been demonstrated.

Is there something wrong with using ABX testing in this way?

Well I don't really want to engage too much with the debates like "can you hear how much my $1000 cables are helping my system". I agree that ABX testing is still useful for determining whether a certain difference is really as "night and day" as is claimed. But I'm more interested in seriously researching the question of, what is our best metric for audibility? I'd really like to develop a test that is a proven improvement upon ABX testing. Even Arny said he would welcome that, and he has been a proponent of ABX testing for a long time.

One reason I came to this was that I had trouble detecting clear "night and day" differences using ABX tests, and I do mean there were night and day differences. Like, I had trouble detecting a significant change in the spectral profile between two different masters of the same song. I thought, wow this is crazy, I should easily be able to hear the spectrum change that much. And then I got to thinking, "I bet I would hear it more easily if I changed the spectrum in the middle of the song, instead of having to switch them."

maverickronin · May 31, 2015 at 4:14 PM

safulop said:
Well I don't really want to engage too much with the debates like "can you hear how much my $1000 cables are helping my system". I agree that ABX testing is still useful for determining whether a certain difference is really as "night and day" as is claimed. But I'm more interested in seriously researching the question of, what is our best metric for audibility? I'd really like to develop a test that is a proven improvement upon ABX testing. Even Arny said he would welcome that, and he has been a proponent of ABX testing for a long time.

One reason I came to this was that I had trouble detecting clear "night and day" differences using ABX tests, and I do mean there were night and day differences. Like, I had trouble detecting a significant change in the spectral profile between two different masters of the same song. I thought, wow this is crazy, I should easily be able to hear the spectrum change that much. And then I got to thinking, "I bet I would hear it more easily if I changed the spectrum in the middle of the song, instead of having to switch them."

It would be great if there was an even more sensitive test. It's just that, AFAIK, fast-switch ABX is currently the most sensitive protocol. People usually do even worse in long-term, slow-switch listening tests.

Also, "can you hear how much my $1000 cables are helping my system" is usually the question around these parts. Even if you came up with a more sensitive test that demonstrated the audibility of something previously thought to be inaudible some 'audiophiles' would still dismiss it because it doesn't prove the effectiveness of their favorite magic idol.

I think we're just coming at it from different angles. You're brainstorming about some real scientific research. A lot of the people on the sound science forum here are trying to warn newbies away from snake oil.

arnyk · May 31, 2015 at 4:48 PM

safulop said:
Well I don't really want to engage too much with the debates like "can you hear how much my $1000 cables are helping my system". I agree that ABX testing is still useful for determining whether a certain difference is really as "night and day" as is claimed. But I'm more interested in seriously researching the question of, what is our best metric for audibility? I'd really like to develop a test that is a proven improvement upon ABX testing. Even Arny said he would welcome that, and he has been a proponent of ABX testing for a long time.

One reason I came to this was that I had trouble detecting clear "night and day" differences using ABX tests, and I do mean there were night and day differences. Like, I had trouble detecting a significant change in the spectral profile between two different masters of the same song. I thought, wow this is crazy, I should easily be able to hear the spectrum change that much. And then I got to thinking, "I bet I would hear it more easily if I changed the spectrum in the middle of the song, instead of having to switch them."

Large differences like those due to remastering are usually easy to ABX if things are set up right and the listener is well-trained. I don't know what went wrong for you; maybe we can troubleshoot your test. Can you make the actual files you compared available to me, perhaps via Dropbox or something?

Meanwhile, there is a set of "can't fail" tests related to interchannel delays, based on listening to some synthetic sounds, at http://www.hydrogenaud.io/forums/index.php?showtopic=107570&view=findpost&p=899713
which you can use to evaluate yourself and your setup. If you can't hear the difference between the "27 sample file" and the "0 sample delay" file then something is seriously wrong. If you get that one 12/16 or better right, move on to smaller delays and let's see how you do.

safulop · May 31, 2015 at 7:09 PM

Well, I could hear differences using ABX methods, but it just didn't seem as obvious as I thought it should. So maybe I am mistaken, and maybe if I devise this whole "sound morphing" method or whatever it actually won't make any difference in the results. But it will be interesting to check into it, if I end up going seriously into it. In my research I have often focused on measurement methods, though in the past this has concerned mostly spectrum measurement, phase measurement etc. I have not delved into perceptual psychophysics before, so this will be a steep learning curve. But at least I know where I could drum up twenty subjects to do the listening.

arnyk · May 31, 2015 at 7:22 PM

safulop said:
Well, I could hear differences using ABX methods, but it just didn't seem as obvious as I thought it should. So maybe I am mistaken, and maybe if I devise this whole "sound morphing" method or whatever it actually won't make any difference in the results. But it will be interesting to check into it, if I end up going seriously into it. In my research I have often focused on measurement methods, though in the past this has concerned mostly spectrum measurement, phase measurement etc. I have not delved into perceptual psychophysics before, so this will be a steep learning curve. But at least I know where I could drum up twenty subjects to do the listening.

If by sound morphing you mean cross-fading, then ABX comparators that use cross-fading for the transitions have already been built, and the current release of Foobar2000's ABX comparator has it.

safulop · May 31, 2015 at 7:35 PM

I will check out the cross-fading comparator. However, I was thinking of a scheme of the following sort. Suppose I create a sound file of a song or whatever, and at a particular point, or several, I substitute a segment of a different sound file. Perhaps it has a different spectral profile, perhaps it is a different encoding, whatever. The trick is, the spliced sound file needs to sound absolutely seamless, so that if you truly can't hear the difference in the audio, you will not even notice a change has been made. So then instead of an "ABX" testing procedure, you would simply play the sound file for the subject and ask them to press a key when or if they heard any change in the nature of the sound. It would then literally be a question of detecting a difference, rather than what is, in my view, a more involved task of matching a third example to one of another pair. The results would be fairly straightforward as well, since if the subject hit the key just for the sake of hitting it, it would almost certainly be at the wrong point in time. So "chance" performance would be more like 0% accuracy in this protocol, rather than 50%.
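A rough sketch of how that press-a-key protocol might be scored. The assumptions here are mine, not part of the proposal above: a press counts as a hit only if it lands within a hypothetical 1.5 second window after a splice, and the function names are made up for illustration. Under these assumptions, chance performance for a single random press is roughly the window length divided by the track length, so it is close to 0% but not exactly 0%.

```python
import random


def score_presses(change_times, press_times, window=1.5):
    """Count hits and false alarms for one listening run.

    A press is a hit if it falls within `window` seconds after a change
    that has not already been credited; any other press is a false alarm.
    The 1.5 s default is an illustrative assumption.
    """
    remaining = sorted(change_times)
    hits = 0
    false_alarms = 0
    for press in sorted(press_times):
        match = next((c for c in remaining if 0.0 <= press - c <= window), None)
        if match is None:
            false_alarms += 1
        else:
            hits += 1
            remaining.remove(match)  # each change can be credited only once
    return hits, false_alarms


def chance_hit_rate(duration, n_changes, window=1.5, trials=20000, seed=0):
    """Estimate how often one randomly timed press scores a hit."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        changes = [rng.uniform(0, duration - window) for _ in range(n_changes)]
        press = rng.uniform(0, duration)
        run_hits, _false_alarms = score_presses(changes, [press], window)
        hits += run_hits
    return hits / trials
```

For a 4 minute track with a single splice, `chance_hit_rate(240, 1)` should come out in the neighborhood of 1.5/240, far below the 50% guessing rate of a single ABX trial. A real study would still need to account for subjects who press many times, which is why false alarms are counted separately.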

limpidglitch · May 31, 2015 at 7:35 PM

arnyk said:
If by sound morphing you mean cross-fading, then ABX comparators that use cross-fading for the transitions have already been built, and the current release of Foobar2000's ABX comparator has it.

I believe this is what he's talking about: http://www.cerlsoundgroup.org/Kelly/soundmorphing.html

I'm not so sure how applicable it will be. Kelly's motivation seems to be making sounds with differing harmonics blend seamlessly together, but for our use I fear the extra processing might just be another confounding variable, and it shouldn't really be necessary anyway, as the sounds we want to compare will have their harmonics all aligned. If they didn't, we wouldn't need something more sensitive than regular ABX to tell them apart.

arnyk · May 31, 2015 at 7:48 PM

limpidglitch said:
I believe this is what he's talking about: http://www.cerlsoundgroup.org/Kelly/soundmorphing.html

I'm not so sure how applicable it will be. Kelly's motivation seems to be making sounds with differing harmonics blend seamlessly together, but for our use I fear the extra processing might just be another confounding variable, and it shouldn't really be necessary anyway, as the sounds we want to compare will have their harmonics all aligned. If they didn't, we wouldn't need something more sensitive than regular ABX to tell them apart.

Thanks, that helps!

It turns out that even the use of cross fading is controversial on the grounds that it hides differences that are more audible if the transition is a simple splice.

A big click at the transition is generally distracting but many people do well when there is a small click or about 20-100 milliseconds where the sound drops out.
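To make the three transition styles concrete (hard splice, cross-fade, and a short dropout), here is a minimal sketch in plain Python. It assumes the two versions are already time-aligned, equal-length lists of samples, and it uses a simple linear cross-fade; the function name and parameters are hypothetical and this is not how any particular ABX comparator implements it.

```python
def splice(a, b, switch_at, fade_len=0, gap_len=0):
    """Play `a` up to sample index `switch_at`, then continue with `b`.

    fade_len > 0: linear cross-fade over that many samples after switch_at.
    gap_len  > 0: silence for that many samples after switch_at.
    Both zero:    a hard splice.
    """
    if len(a) != len(b):
        raise ValueError("inputs must be the same length")
    if fade_len and gap_len:
        raise ValueError("choose either a fade or a gap, not both")
    out = []
    for n in range(len(a)):
        if n < switch_at:
            out.append(a[n])
        elif n < switch_at + gap_len:
            out.append(0.0)  # dropout: the sound cuts out briefly
        elif n < switch_at + fade_len:
            w = (n - switch_at) / fade_len  # ramps from 0 toward 1
            out.append((1.0 - w) * a[n] + w * b[n])
        else:
            out.append(b[n])
    return out
```

Assuming a 44.1 kHz sample rate, the 20-100 millisecond dropout mentioned above would correspond to a `gap_len` of roughly 882 to 4410 samples.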

One other thing is that if the processing is too complex the change-over gets messy to implement as you seem to be suggesting.

I think that the fact that we can generally ABX differences that are well below thresholds developed by other means says that the current process is really pretty good. ABX is better today than it was in 1977 or even 1997. The people who have enhanced it the most are those who used it for a fair while and then started implementing and popularizing their changes.

Most suggestions by newbies turn out to be red herrings. It is that way in many other technical areas.

limpidglitch · May 31, 2015 at 7:57 PM

The idea of experimenting with fading in and out caught my attention.
I made up some test files, three of them in fact, by merging together lossy and lossless versions of the same song, each with the same, but inverse, tremolo applied. The tremolo rate is 0.5 Hz, and the overlap in each cycle is fairly short. It might be worthwhile to tweak the parameters.
For convenience I just used plain LAME CBR. I had some trouble with the levels of the 160 kbps version, so I decided to normalize them all to -15 LUFS. But anyway, this is more a proof of concept than a real test.
If anyone wants to try or fool around with something similar, I used a freebie AU/VST plugin from MeldaProduction called MTremolo to do the work.
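For anyone without the plugin, the same idea can be sketched in a few lines of Python. This is only an approximation under my own assumptions: a trapezoid-shaped gain with linear ramps, where `ramp` is the fraction of each cycle spent in transition. The actual curve MTremolo produces is not reproduced here, and the function names are made up for illustration.

```python
def gain_a(t, rate_hz=0.5, ramp=0.05):
    """Gain in [0, 1] applied to version A at time t (seconds).

    Version B gets the inverse gain, 1 - gain_a(t). A is fully on for the
    first half of each cycle and fully off for the second half, with linear
    ramps of width `ramp` (as a fraction of the cycle) centered on the
    switch points.
    """
    phase = (t * rate_hz) % 1.0
    half = ramp / 2.0
    if phase < half:              # second half of the ramp up (around phase 0)
        return 0.5 + phase / ramp
    if phase < 0.5 - half:        # A fully on
        return 1.0
    if phase < 0.5 + half:        # ramp down around mid-cycle
        return 0.5 - (phase - 0.5) / ramp
    if phase < 1.0 - half:        # A fully off, B fully on
        return 0.0
    return 0.5 - (1.0 - phase) / ramp  # first half of the ramp up


def mix_inverse_tremolo(a, b, sample_rate, rate_hz=0.5, ramp=0.05):
    """Blend two time-aligned sample lists with complementary tremolo gains."""
    out = []
    for n, (xa, xb) in enumerate(zip(a, b)):
        g = gain_a(n / sample_rate, rate_hz, ramp)
        out.append(g * xa + (1.0 - g) * xb)
    return out
```

One property of complementary linear gains: because they always sum to 1, two identical inputs pass through unchanged, so any audible pulsing at 0.5 Hz would have to come from a real difference between the two versions (or from a level or timing mismatch).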

Anyway, here's the files.

sonitus mirus · May 31, 2015 at 8:05 PM

arnyk said:
Large differences like those due to remastering are usually easy to ABX if things are set up right and the listener is well-trained. I don't know what went wrong for you; maybe we can troubleshoot your test. Can you make the actual files you compared available to me, perhaps via Dropbox or something?

Meanwhile, there is a set of "can't fail" tests related to interchannel delays, based on listening to some synthetic sounds, at http://www.hydrogenaud.io/forums/index.php?showtopic=107570&view=findpost&p=899713
which you can use to evaluate yourself and your setup. If you can't hear the difference between the "27 sample file" and the "0 sample delay" file then something is seriously wrong. If you get that one 12/16 or better right, move on to smaller delays and let's see how you do.

Just for fun, I tried to ABX the linked files. I was able to get 16/16 with the "27 sample file", but it was much tougher than I was expecting it to be.

With the "15 sample file" I was doing ok (see log below), but I'm certain I would get a null result with any of the other test files.

foo_abx 2.0.1 report
foobar2000 v1.3.7
2015-05-31 19:47:14
File A: Impulses shift 0 samples 2klp norm 4416 .flac
SHA1: 8fc00a4bb6a1bb0a66ec5c83cfaa36f9d8fddd13
File B: Impulses shift 9 samples 2klp norm 4416 .flac
SHA1: 25d43a1144f0c53d53c43814b69c01073851a387
Output:
WASAPI (event) : Speakers (USB Modi Device), 24-bit
Crossfading: NO
19:47:14 : Test started.
19:49:38 : 01/01
19:49:49 : 01/02
19:50:26 : 02/03
19:50:59 : 02/04
19:51:31 : 03/05
19:51:39 : 04/06
19:52:11 : 05/07
19:52:25 : 06/08
19:52:37 : 07/09
19:53:29 : 08/10
19:54:18 : 09/11
19:55:52 : 10/12
19:56:35 : 10/13
19:57:04 : 11/14
19:57:25 : 12/15
19:57:56 : 13/16
19:57:56 : Test finished.
----------
Total: 13/16
Probability that you were guessing: 1.1%
-- signature --
294dd8f2f354959295123e2f88042f17467c7bae
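The "probability that you were guessing" figure in the log is consistent with a one-sided binomial tail: the chance of getting at least that many trials right if every trial were an independent coin flip. A minimal sketch, assuming that interpretation (the function name is mine, and this is not taken from the foo_abx source):

```python
from math import comb


def guess_probability(correct, trials):
    """Probability of scoring at least `correct` out of `trials`
    when each trial is an independent 50/50 guess."""
    favorable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favorable / 2 ** trials


print(f"{guess_probability(13, 16):.1%}")  # 13/16, as in the log above
print(f"{guess_probability(12, 16):.1%}")  # the 12/16 bar suggested earlier
```

For 13/16 this works out to 697/65536, about 1.1%, matching the log. The 12/16 bar corresponds to 2517/65536, about 3.8%.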

arnyk · May 31, 2015 at 8:11 PM

sonitus mirus said:
Just for fun, I tried to ABX the linked files. I was able to get 16/16 with the "27 sample file", but it was much tougher than I was expecting it to be.

With the "15 sample file" I was doing ok (see log below), but I'm certain I would get a null result with any of the other test files.

We have a report from a user who has been practicing with the files and has successfully ABXed all but the last. The duration of a "sample" was 5.5 µs, so he bottomed out at 11 µs, which is consistent with or better than other published thresholds.
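To make the arithmetic explicit: taking the 5.5 microsecond sample duration stated above at face value, a shift of N samples is an interchannel delay of N × 5.5 microseconds, so the 11 microsecond figure corresponds to a 2 sample shift. The helper name below is hypothetical; the 27, 15 and 9 sample shifts are the ones mentioned earlier in the thread.

```python
def shift_to_microseconds(shift_samples, sample_period_us=5.5):
    """Convert a shift in samples to a delay in microseconds.

    The 5.5 microsecond default is the sample duration quoted in the post
    above; it is taken as given, not measured from the files.
    """
    return shift_samples * sample_period_us


for shift in (27, 15, 9, 2):
    print(f"{shift:2d} samples -> {shift_to_microseconds(shift):6.1f} microseconds")
```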

I personally had no problems with the first set, but previous sets had a lot fewer timing options, and the ones that existed then were too tough for me the first time.

I also was suffering from a head cold and am 68 with the kind of hearing one expects from a 68 year old male. I am currently in a situation where I can't do the test again but hopefully that will change soon.

anetode · May 31, 2015 at 8:14 PM

There are several other avenues to pursue. Neuroimaging could prove useful: hook someone up to an fMRI during a randomized AB, whether soundmorphed or back to back. Establish thresholds for patterns of activity which occur when the test subject senses distortion. Unfortunately this approach is prohibitively expensive, but there is an increasing number of neuroimaging studies that concern the perception of audio, or at least aspects of it (pitch discrimination, rhythm retrieval).

Alternatively, save yourself the trouble and see if there's a measurable difference beyond experimentally established thresholds.
