Successful ABX testing to hear the difference between Redbook Audio vs upsampled to 192/24
Aug 17, 2013 at 4:41 PM Post #46 of 136
Okay, here's the first round of results:
http://imgur.com/0hdsmzW
 
It IS a different experience of A/B-ing. Hard to describe.
Even more difficult than before to avoid mirages of EQ and frequency balance.
Still a difference in soundstage depth, but not necessarily in the same way. "Aura"??? (bigger in my 192). Sorry to do such a poor job of description. Questions welcome.
 
This test compares an iZotope SRC-converted 44.1 file (mine) to a 192_24 encoding capture of the playback of a 44.1 file (stv's).
 
On the one hand, this removes the effects of any difference purely on my end in DAC handling of differing playback rates at playback time--because both in this test are in 192_24.
On the other hand, it adds two new factors---not that they are known to have effects, just that these are differing experimental conditions:
     the DAC processing on stv014's device to send the 44.1 file to analog playback,
plus,
     the effect of A/D on stv014's 192-encoding capture.
 
Keep this up, I'm gonna need a matrix to sort out the cause-effect possibilities...
 
Aug 17, 2013 at 5:00 PM Post #47 of 136
This is just me keeping track:
 
1 original source: CD Redbook Audio
 
List of treatments:
  1. rip to .wav (shouldn't have an effect if the rip was properly executed--ripped in SoundForge 10).
  2. ripped .wav resampled to 192 (by UltMusicSnob) - iZotope 64-bit SRC
  3. same ripped .wav played back analog (by stv014)
  4. played back analog through my Babyface at 44.1 (the original rip)
  5. played back through my Babyface at 192 (my resample of original rip)
  6. played back through stv014's device at 44.1 (the original rip)
  7. stv014's playback of original rip captured and encoded at 192/24
  8. foobar2000 ABX plugin and playback---upstream of Babyface, foobar2000 uses WASAPI
 
There are potential differences at any point here, least likely would be 1.
 
My original ABX was 1 - 8 - 4 versus 1 - 2 - 8 - 5
 
This latest test above was 1 - 2 - 8 - 5 versus 1 - 6 - 7 - 8 - 5
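
To keep the cause-effect bookkeeping straight, the two comparisons can be diffed mechanically. This is only a sketch: it numbers the treatments 1 to 8 in the order they are listed above, and the helper name `differing_steps` is made up for illustration.

```python
# Treatments numbered in the order listed in the post (assumed numbering).
TREATMENTS = {
    1: "rip to .wav",
    2: "resample to 192 (iZotope SRC)",
    3: "ripped .wav played back analog (stv014)",
    4: "Babyface playback at 44.1",
    5: "Babyface playback at 192",
    6: "stv014 device playback at 44.1",
    7: "stv014 capture encoded at 192/24",
    8: "foobar2000 ABX plugin (WASAPI)",
}

def differing_steps(chain_a, chain_b):
    """Return the steps that appear in only one of the two chains."""
    a, b = set(chain_a), set(chain_b)
    return sorted(a - b), sorted(b - a)

original_test = differing_steps([1, 8, 4], [1, 2, 8, 5])
latest_test = differing_steps([1, 2, 8, 5], [1, 6, 7, 8, 5])

for name, (only_a, only_b) in (("original", original_test), ("latest", latest_test)):
    print(name, "A only:", [TREATMENTS[s] for s in only_a],
          "| B only:", [TREATMENTS[s] for s in only_b])
```

Read this way, the original test differs in steps 4 versus 2 and 5, and the latest test differs in step 2 versus steps 6 and 7, so any audible difference has to come from one of those steps.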
 
Aug 17, 2013 at 5:09 PM Post #48 of 136
Ugh you really shouldn't do that many trials.
 
A common suggestion is to keep it below 25. Try to stick around ~16 trials.
If you want to use a strict type I error of 1% you'd need 13 out of 16, for 5% only 12.
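
Those thresholds can be checked with an exact one-sided binomial calculation. The sketch below assumes every trial is an independent 50/50 guess; the function name `guess_probability` is illustrative and not from any ABX tool.

```python
from math import comb

def guess_probability(correct, trials):
    """Chance of getting at least `correct` of `trials` right by coin-flip guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# Scores around the suggested cutoffs for a 16-trial run.
for score in (11, 12, 13, 14):
    print(f"{score}/16: {guess_probability(score, 16):.4f}")
```

Under these assumptions 12/16 sits comfortably below 5%, while 13/16 comes out at roughly 1.06%, so it is right on the 1% line rather than strictly under it; a strictly-below-1% cutoff would need 14/16.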
 
Aug 17, 2013 at 5:18 PM Post #49 of 136
Quote:
Ugh you really shouldn't do that many trials.
 
A common suggestion is to keep it below 25. Try to stick around ~16 trials.
If you want to use a strict type I error of 1% you'd need 13 out of 16, for 5% only 12.


Okay, I'll go around again. Generally in statistical tests I'm looking for a higher sample n.
 
I find this advice online: "The company QSC, in the ABX Comparator user manual, recommended a minimum of ten listening trials in each round of tests.[2]
Results required for a 95% confidence level:[3][4]
Number of trials:        10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Minimum number correct:   9   9  10  10  11  12  12  13  13  14  15  15  16  16  17  18
QSC recommended that no more than 25 trials be performed, as listener fatigue can set in, making the test less sensitive (less likely to reveal one's actual ability to discern the difference between A and B).[2] However a more sensitive test can be obtained by pooling the results from a number of such tests using separate individuals or tests from the same listener conducted in between rest breaks. For a large number of total trials N, a significant result (one with 95% confidence) can be claimed if the number of correct responses exceeds
N/2 + √N. Important decisions are normally based on a higher level of confidence, since an erroneous "significant result" would be claimed in one of 20 such tests simply by chance." http://en.wikipedia.org/wiki/ABX_test#Confidence
 
It should be noted that if the reasoning on 25 tests is about listener fatigue as described above, then I actually took a **harder** test, doing 68 rounds--I obtained a result despite the risk of listener fatigue over many tests. On their calculation I would need 42 positives for a "significant" result, which I missed by 1---that's also consistent with foobar's calculation, since I was over 5% by a small margin.
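
The quoted table and the 68-trial figure can be reproduced with a short script. This is a sketch under two assumptions: each trial is an independent 50/50 guess, and the large-N rule is N/2 + sqrt(N), which is taken here to be the formula from the quoted passage because it agrees with the 42 mentioned for 68 trials. The function names are illustrative.

```python
from math import comb, sqrt

def guess_probability(correct, trials):
    """Chance of getting at least `correct` of `trials` right by coin-flip guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

def min_correct(trials, alpha=0.05):
    """Smallest score whose guessing probability is at or below alpha (None if unreachable)."""
    for k in range(trials + 1):
        if guess_probability(k, trials) <= alpha:
            return k
    return None

# Exact thresholds for 10 to 25 trials, as in the quoted table.
table = {n: min_correct(n) for n in range(10, 26)}
print(table)

# The 68-trial run: exact threshold versus the large-N rule of thumb.
n = 68
print("exact minimum for 68 trials:", min_correct(n))
print("N/2 + sqrt(N):", round(n / 2 + sqrt(n), 2))
print("41/68 by guessing:", round(guess_probability(41, n), 3))
```

On this calculation 42 correct is the exact 5% threshold for 68 trials, and 41 correct lands a little above 5%, which matches the "missed by 1" description.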
 
Aug 17, 2013 at 5:25 PM Post #50 of 136
I think I'm going to take a look and maybe listen at the files later, but just a quick note...
 
Statistically speaking, if you're looking at the number while testing and stop once reaching 5% (i.e. 95% "confidence") or some other criteria, you're running a whole lot of unprotected comparisons. In that case, the percentage you get when stopping is off somewhat. Pick a fixed number (I'd say no more than 20-25 for sanity / fatigue) and stop once it's over.
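
A rough simulation shows how much the stopping rule matters. It is a sketch, not a model of any particular listener: it assumes a pure guesser, a 25-trial cap, and a rule of "stop the first time the running p-value drops below 5%". The names `guess_probability` and `false_positive_rate` are made up for this example.

```python
import random
from math import comb

def guess_probability(correct, trials):
    """Chance of getting at least `correct` of `trials` right by coin-flip guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

def false_positive_rate(max_trials=25, peek=True, runs=20000, alpha=0.05, seed=1):
    """Fraction of pure-guessing sessions that end with p < alpha."""
    rng = random.Random(seed)
    # Score needed after n trials to get p < alpha (n + 1 means "impossible yet").
    need = {n: next((k for k in range(n + 1) if guess_probability(k, n) < alpha), n + 1)
            for n in range(1, max_trials + 1)}
    hits = 0
    for _ in range(runs):
        correct = 0
        for n in range(1, max_trials + 1):
            if rng.random() < 0.5:
                correct += 1
            # With peeking, check after every trial; otherwise only at the end.
            if (peek or n == max_trials) and correct >= need[n]:
                hits += 1
                break
    return hits / runs

peeking = false_positive_rate(peek=True)
fixed = false_positive_rate(peek=False)
print(f"stop as soon as p < 5%: {peeking:.3f}")
print(f"fixed 25 trials:        {fixed:.3f}")
```

With these settings the guesser who watches the running score should be declared "significant" noticeably more often than the nominal 5%, while the fixed-length test stays under it.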
 
Aug 17, 2013 at 5:29 PM Post #51 of 136
Okay, these are just for replication. Same test as latest above, <= 25 trials.
    
 
Now it's not so hard. Maybe I just had to get focused. "Larger / deeper soundstage", that's my subjective sense.
 
Aug 17, 2013 at 5:55 PM Post #52 of 136
Fair enough.
 
Since the resampled track has clipped samples, wouldn't it be better to attenuate the 44.1 kHz file a bit and then do the resampling/recording again? Different D/As (and A/Ds) deal differently with clipping, maybe even differently at different sample rates.
 
Aug 18, 2013 at 6:43 AM Post #53 of 136
Quote:
Since the resampled track has clipped samples, wouldn't it be better to attenuate the 44.1 kHz file a bit and then do the resampling/recording again? Different D/As (and A/Ds) deal differently with clipping, maybe even differently at different sample rates.

 
Actually, it was not clipped on recording (and even if it was, the ADC of the Xonar STX handles signals at or near the clipping level very well), only when it was scaled in software to match the level to the "original" 192/24 file, which was clipped as well. But I will create a few new files:
- resampled 192/24 format version without clipping (-1 dB gain), with a lowpass filter approximating the OP's resampler
- re-recorded Xonar D1 playback (unfortunately I deleted the non-clipped WAV file already)
- software upsampling with a similar lowpass filter to that of the Xonar D1 (which is a minimum phase filter with 3 dB attenuation at Fs/2)
- Xonar STX recording - it has a linear phase filter
 
Aug 18, 2013 at 6:47 AM Post #54 of 136
Quote:
Now it's not so hard. Maybe I just had to get focused. "Larger / deeper soundstage", that's my subjective sense.

 
Are these your only results, or cherry picked best ones, with some worse runs discarded (or the test reset after any early incorrect guesses) ? The number of trials seems to be odd as well, did you just stop whenever the score was "good" ? They are all within 5% probability of guessing, but also just one more wrong guess would have increased that to 7-11% in all cases. Your total result of 78/117 translates to a chance of 0.02%, but that again assumes that those 117 trials were all you have ever done.
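
For reference, the pooled figure and the "one more wrong guess" effect can be checked as follows. This is a sketch assuming independent 50/50 guesses. The run lengths 21, 11 and 17 are the ones named later in the thread; the per-run scores are not given in the text, so the script only uses the minimum passing score for each length.

```python
from math import comb

def guess_probability(correct, trials):
    """Chance of getting at least `correct` of `trials` right by coin-flip guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# Pooled result quoted in the post.
pooled = guess_probability(78, 117)
print(f"78/117 by guessing: {pooled:.4%}")

# For each run length: minimum passing score at 5%, and the effect of one more miss.
one_fewer = {}
for trials in (21, 11, 17):
    passing = next(k for k in range(trials + 1) if guess_probability(k, trials) <= 0.05)
    one_fewer[trials] = guess_probability(passing - 1, trials)
    print(f"{trials} trials: need {passing} "
          f"(p = {guess_probability(passing, trials):.3f}), "
          f"one fewer gives p = {one_fewer[trials]:.3f}")
```

The pooled value only means what it says if the 117 trials are the complete record, which is the point of the question above.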
 
Aug 18, 2013 at 9:49 AM Post #55 of 136
Quote:
 
Are these your only results, or cherry picked best ones, with some worse runs discarded (or the test reset after any early incorrect guesses) ?
Neither.
My usual procedure is to warm up--which mainly means getting my focus and concentration together--with about 50-60 individual rounds.
Once I'm warmed up I can replicate the results you see here indefinitely without discarding.
For data collection I prefer as many trials as my subjects will give me, but on these ABX's I've been just stopping whenever I feel like it.
Here I've posted results ranging from 91% confidence to 99.9%.
 
The number of trials seems to be odd as well, did you just stop whenever the score was "good" ? They are all within 5% probability of guessing, but also just one more wrong guess would have increased that to 7-11% in all cases. Your total result of 78/117 translates to a chance of 0.02%, but that again assumes that those 117 trials were all you have ever done.
No, I stopped when 1) my kids interrupted me, 2) I got into the 20's number of rounds, 3) I hit a target in the single digits or below 5, or below 1, or whatever I was aiming for. This absolutely is not a rigorous process.

Doing a series of trials with 10 in every one only, or 20, etc is a good idea. I'll try that out, thanks for the tips.
 
At this point, my main interest is finding other testers who can ABX the difference between 44.1/16 and 192/24. I've had several responses in other places (not Head-Fi) of the type: "That's no big deal, you're just hearing....".    Well, if that's the case, then take my files and replicate. Instead, they either report a null on their end, or just withdraw from the conversation. I understand that there are good reasons my results would be considered unexpected, depending on what people think they know or believe about audio. But if there's the possibility of getting a better result, I'd also expect music lovers to chase that down pretty tenaciously. Hopefully someone will work hard enough to hear the difference and pass the tests under *some* conditions.
 
 
Aug 18, 2013 at 10:16 AM Post #56 of 136
How do you decide when the "warm up" is over ? Is it a pre-determined number of trials, or whenever you happen to get good results ?
 
Were there any additional trials ("warm up" or anything) between the second set of tests you posted (those with 21, 11, and 17 trials) ?
 
Aug 18, 2013 at 11:50 AM Post #57 of 136
I've tried it before (but not with these samples) and not succeeded. So sorry... maybe if I have more time and motivation.
 
Now where's everybody else that says they can (easily) hear the difference in ____?
 
Aug 18, 2013 at 12:48 PM Post #58 of 136
Quote:
How do you decide when the "warm up" is over ? Is it a pre-determined number of trials, or whenever you happen to get good results ?
 
Were there any additional trials ("warm up" or anything) between the second set of tests you posted (those with 21, 11, and 17 trials) ?

 

Warm-up is over when I can replicate short-run results. If I can get 5/5, stop, get 3/4, stop, get 5/6, etc., then I'm ready to begin a full series. 
 
This is distinct from trying over and over until a good run happens to turn up, and then just stopping there. I make a decision about warm-up, THEN start a series that is longer.
 
When I first sit down I'm usually at about 50% chance I was guessing--not as bad as near 100%, but not good enough to consider myself warmed up.
 
There were no additional trials between the three above--that went really quick, I did not have to spend as much time on individual rounds as I thought I would.
 
I *do* take mini-rests of about 30 secs to 2 minutes sometimes, in the midst of one trial. If I have trouble making a choice--that's trouble deciding, BEFORE hitting the choice button, not trouble meaning I guessed wrong on a particular trial--then I stop for a short period to renew my ears.
 
Aug 18, 2013 at 2:41 PM Post #59 of 136
OK, here is a set of new files, this time in FLAC format to reduce the download size. You can convert them to WAV first if you want to. The files are now attenuated by 0.63 dB to avoid clipping. I used the following command to simulate the "original" 192/24 file with attenuation:
Code:
 resample.exe -k 0.000002923047 -r 192000 -f 3 -ff 0.5 -fw 0 -fl -2400 -g 0.93 Test_File_Foobar_Redbook.wav test_192k.wav
I do not reveal the identity of all the files yet, so only listen, and do not analyze them.
However, B.flac is the 192/24 resampled file, so you can compare the others against that one.
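
As a sanity check on the attenuation, linear gain and decibels convert as below. This sketch assumes the `-g 0.93` in the command is a linear amplitude gain factor, which the post does not state explicitly; the helper names are illustrative.

```python
import math

def gain_to_db(gain):
    """Convert a linear amplitude gain factor to decibels."""
    return 20 * math.log10(gain)

def db_to_gain(db):
    """Convert decibels to a linear amplitude gain factor."""
    return 10 ** (db / 20)

print(round(gain_to_db(0.93), 2))   # attenuation implied by a gain of 0.93
print(round(db_to_gain(-1.0), 3))   # gain factor for the -1 dB mentioned earlier
```

Under that assumption a gain of 0.93 works out to about -0.63 dB, which agrees with the attenuation quoted in the post.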
 
Aug 18, 2013 at 2:54 PM Post #60 of 136
Quote:
OK, here is a set of new files, this time in FLAC format to reduce the download size. You can convert them to WAV first if you want to. The files are now attenuated by 0.63 dB to avoid clipping. I used the following command to simulate the "original" 192/24 file with attenuation:
Code:
 resample.exe -k 0.000002923047 -r 192000 -f 3 -ff 0.5 -fw 0 -fl -2400 -g 0.93 Test_File_Foobar_Redbook.wav test_192k.wav
I do not reveal the identity of all the files yet, so only listen, and do not analyze them.
However, B.flac is the 192/24 resampled file, so you can compare the others against that one.


Thank you! I appreciate the time and effort this is taking on your end to prepare test files.
Edit: Never mind previous question, I just unzipped and saw that there are six files total.
 
