[SOLVED] Help with resampling files: I can hear differences between 192/24 and 48/24 in ABX tests
Sep 12, 2015 at 1:53 PM Thread Starter Post #1 of 21

zareliman · 100+ Head-Fier · Joined Mar 29, 2012 · Posts: 111 · Likes: 24
Hi

A while ago I discovered the 192/24 formats, downloaded some, and read up on them. According to the Nyquist theorem there should be no audible difference between 48/24 and 192/24, so I decided to downsample the ultra-big files I had (since I'm running low on storage).
Even though I know and trust the Nyquist theorem, I still wanted to see whether I could spot the difference between a downsampled track and the original using ABX tests, so that I'm not losing quality with my conversion.

I read some stuff about the three resampler options for foobar2000: PPHS, dBpoweramp and SoX. SoX was supposed to be the best one in terms of quality (still, I don't see how one resampler can differ from another in quality; I can understand one being faster or more efficient due to better coding, but quality shouldn't be an issue when you're supposed to just delete half the samples). There are supposed to be filters that eliminate some undesired frequencies, theoretically resulting in better quality, but again, conceptually I think the best conversion method should just downsample without altering the signal any further.
 
I decided to use The David Hazeltine Trio - Impromptu (Binaural+) record, track 2.
 
I started my tests by transcoding the original 96/24 to MP3. I could spot the difference within seconds once I knew "where to look". At least my hearing/focus can be trusted at that level (8/8 ABX correct).
 
  foo_abx 2.0.1 report
foobar2000 v1.3.9 beta 1
2015-09-12 14:02:46
File A: 02. Jesu Joy of Man's Desiring.mp3
SHA1: 560d3c0b13e9ae880014b08a33fa989326f972e7
Gain adjustment: +1.60 dB
File B: 02. Jesu Joy of Man's Desiring.flac
SHA1: 09ed7a6da74d0de725715e0519fee6eb262bcb36
Gain adjustment: +1.62 dB
Output:
WASAPI (event) : Speakers (ASUS Xonar D1 Audio Device), 24-bit
Crossfading: NO
14:02:46 : Test started.
14:03:14 : 01/01
14:03:23 : 02/02
14:03:30 : 03/03
14:04:17 : 04/04
14:04:23 : 05/05
14:04:28 : 06/06
14:04:35 : 07/07
14:04:38 : 08/08
14:04:38 : Test finished.
 ----------
Total: 8/8
Probability that you were guessing: 0.4%
 -- signature --
8d64736d68ba51ad702056666775f2ffc2795ceb
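The "Probability that you were guessing" figure can be reproduced as a one-sided binomial tail: the chance of getting at least that many answers right by pure coin-flipping. That foo_abx computes it exactly this way is an assumption, but it matches the 0.4% reported for 8/8. A minimal Python sketch (the helper name is hypothetical):

```python
from math import comb

def guess_probability(correct: int, trials: int) -> float:
    """P(at least `correct` right out of `trials`) when every answer is a 50/50 guess."""
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials

print(f"{guess_probability(8, 8):.1%}")  # 0.4%  (1/256)
print(f"{guess_probability(7, 8):.1%}")  # 3.5%  (9/256)
```

A single miss in an 8-trial run already moves the result from about 0.4% to about 3.5%.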


Then I decided to transcode from 96/24 to the highest-quality Nero AAC. I scored 8/8 in the ABX test without being doubtful about any choice.
 
  foo_abx 2.0.1 report
foobar2000 v1.3.9 beta 1
2015-09-12 14:06:04
File A: 02. Jesu Joy of Man's Desiring.m4a
SHA1: 3d3dc295ba147cc35915b582a7267fe37399b9a1
Gain adjustment: +1.63 dB
File B: 02. Jesu Joy of Man's Desiring.flac
SHA1: 09ed7a6da74d0de725715e0519fee6eb262bcb36
Gain adjustment: +1.62 dB
Output:
WASAPI (event) : Speakers (ASUS Xonar D1 Audio Device), 24-bit
Crossfading: NO
14:06:04 : Test started.
14:06:18 : 01/01
14:06:23 : 02/02
14:06:27 : 03/03
14:06:33 : 04/04
14:06:41 : 05/05
14:06:51 : 06/06
14:06:54 : 07/07
14:06:59 : 08/08
14:06:59 : Test finished.
 ----------
Total: 8/8
Probability that you were guessing: 0.4%
 -- signature --
e8009cbfbd07428cbd6c0da3cb27a0071f7e6674

 
Now the 96/24 versus the 48/24 (FLAC). To my surprise, I could hear differences here (I have the ABX test to prove it: 8/8 correct).
 
  foo_abx 2.0.1 report
foobar2000 v1.3.9 beta 1
2015-09-12 14:11:47
File A: 02. Jesu Joy of Man's Desiring.flac
SHA1: 09ed7a6da74d0de725715e0519fee6eb262bcb36
Gain adjustment: +1.62 dB
File B: 02. Jesu Joy of Man's Desiring.flac
SHA1: b1c8bb800b8f4a9915954dc641d2d797d0d6b821
Gain adjustment: +1.60 dB
Output:
WASAPI (event) : Speakers (ASUS Xonar D1 Audio Device), 24-bit
Crossfading: NO
14:11:47 : Test started.
14:11:55 : 01/01
14:12:03 : 02/02
14:12:16 : 03/03
14:12:22 : 04/04
14:12:29 : 05/05
14:12:38 : 06/06
14:12:42 : 07/07
14:12:47 : 08/08
14:12:47 : Test finished.
 ----------
Total: 8/8
Probability that you were guessing: 0.4%
 -- signature --
54e5e721bebd7810ae4ac4bd26664e66ed770107


So either I have a very special sense of hearing (when I tested, I still couldn't hear anything above 18 kHz) or there's something wrong with my conversion method. I used the foobar2000 built-in converter, the latest FLAC encoder without dither, and only the SoX resampler DSP with 98% passband, 40% phase response, no antialiasing, Best Quality and Downsample 2x.
I suspect the SoX resampler for two reasons. First, the passband and phase response settings must be doing something to the original signal (I lack the technical knowledge to know exactly what they do). Second, the gain changes when the file is resampled (ReplayGain gives +1.62 dB for the original and +1.60 dB for the 48/24 file), which I don't think should happen in a downsampling.
If that's the case, what would be the best way to conduct such a test? Is there a mathematically accurate downsampling (just delete one of every two samples)? Or could I win the golden ear prize they're offering...
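For reference, a roughly equivalent conversion can be written for the SoX command-line tool's `rate` effect: `-v` selects very-high quality, `-b` sets the passband width as a percentage of the output Nyquist frequency, `-p` sets the phase response (0 = minimum, 25 = intermediate, 50 = linear, 100 = maximum), and `-a` (deliberately omitted here) would allow aliasing. How foobar2000's plugin options map onto these flags is an assumption, as are the file names; this sketch only builds the argument list for `subprocess.run`.

```python
def sox_downsample_cmd(src: str, dst: str, rate_hz: int = 48_000,
                       passband_pct: float = 98, phase_pct: float = 40) -> list[str]:
    """Build a SoX command approximating the settings above (assumed mapping)."""
    return [
        "sox", "-D",               # -D: disable SoX's automatic dither
        src,
        "-b", "24",                # write 24-bit output
        dst,
        "rate", "-v",              # very-high-quality resampler
        "-b", str(passband_pct),   # passband as % of the output Nyquist frequency
        "-p", str(phase_pct),      # 50 = linear phase; below 50 leans towards minimum phase
        str(rate_hz),
    ]

cmd = sox_downsample_cmd("input_96k.flac", "output_48k.flac")
print(" ".join(cmd))
# To actually run it (needs SoX built with FLAC support):
#   import subprocess; subprocess.run(cmd, check=True)
```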

EDIT: The problem was identified. Through WASAPI or DirectSound, playback of big files had a peculiar noise on my system (I couldn't find anything about this issue online; I assume it's a very particular issue caused by some incompatibility in my system). Through ASIO the issue disappears.
 
 
Sep 12, 2015 at 2:09 PM Post #3 of 21
SoX's default resampler is linear phase and completely flat right up to 20 kHz, so anything you are hearing from it would be between 20 and 22 kHz. Not dithering could make a difference if the track is recorded at a low level, but I really doubt you are listening to music that uses more than 16 bits. So there are two questions:
 
1) Are you re-upsampling back up to 96 kHz? Otherwise there could be clues given away during switching.
2) Are you setting the volume to actual listening levels and leaving it there?
 
Sep 12, 2015 at 2:16 PM Post #4 of 21
The Xonar cards have a hardware resampler that you control through the driver. To disable it, you have to use ASIO (and that disables volume control as well, so be prepared to use software volume control). Also, some of its DSPs are made to accept 48 kHz max, so that might be the reason for the audible differences. All DSPs are disabled in ASIO.
 
Sep 12, 2015 at 2:17 PM Post #5 of 21


1) The playback is not re-upsampled in foobar2000, and I don't use any DSP for playback. The sound card is set to 192/24 and should be upsampling internally, but if I reduced it to 48/24, playback of the original file would be altered during the test.
 
2) The volume is standardized through ReplayGain: the tracks are adjusted by +1.62 dB and +1.60 dB so they sound the same according to its heuristic.

BTW, this is a Chesky studio recording. I don't know if it technically uses more than 16 bits, but it's among the records with the highest dynamic range in my collection (around DR15), so at least I know it's not a heavily compressed master.
 
Sep 12, 2015 at 2:19 PM Post #6 of 21
 

 
1) See the comment by mindbombs about the Windows stuff (ah, the things we avoid in Linux). It's probably still best practice to go ahead and do the upsampling first, so you can vouch for the results.
2) I mean, do you yourself ever touch the volume pot while the track is playing?
 
Sep 12, 2015 at 2:44 PM Post #7 of 21
+1, don't leave the job to your sound card. Convert the 24/48 back to 24/96 so that both files have about the same size and are handled the same way by your computer.
Also, if there is some DSP active on your computer, it will at least treat both files the same way. But using WASAPI or ASIO sure wouldn't hurt your sound.
If you still get a difference, then you will have to look into the resampler and its settings, but there is no point looking into that before making sure everything else is identical.
If you stop hearing a clear difference, then you might want to look a bit into your signal path and Windows settings to find out what's messing with your sound at one of the resolutions.
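To illustrate what "converting back to 24/96" involves, a 2x upsampler inserts a zero between samples and then low-pass filters to remove the spectral image this creates. A minimal pure-Python sketch, assuming a 101-tap Blackman-windowed sinc filter with a 21.6 kHz cutoff; real resamplers such as SoX use much better filters, and all names here are hypothetical.

```python
import math

def lowpass_taps(num_taps: int, cutoff_hz: float, fs_hz: float) -> list[float]:
    """Blackman-windowed sinc low-pass FIR, normalised to unity gain at DC."""
    fc = cutoff_hz / fs_hz                      # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = (0.42 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
                  + 0.08 * math.cos(4 * math.pi * n / (num_taps - 1)))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def upsample_by_2(samples: list[float], fs_in_hz: float, num_taps: int = 101) -> list[float]:
    """Zero-stuff to 2*fs, then low-pass below the original Nyquist frequency.

    The symmetric filter adds a fixed delay of (num_taps - 1) / 2 output samples.
    """
    fs_out = 2 * fs_in_hz
    stuffed = []
    for s in samples:
        stuffed.extend((s, 0.0))
    # gain of 2 restores the amplitude lost by inserting zeros
    taps = [2 * t for t in lowpass_taps(num_taps, 0.9 * fs_in_hz / 2, fs_out)]
    return [sum(t * stuffed[m + k] for k, t in enumerate(taps))
            for m in range(len(stuffed) - num_taps + 1)]

fs = 48_000
x = [math.sin(2 * math.pi * 1_000 * n / fs) for n in range(2_000)]   # 1 kHz at 48 kHz
y = upsample_by_2(x, fs)                                              # now at 96 kHz
delay = 50                                                            # (101 - 1) / 2
ref = [math.sin(2 * math.pi * 1_000 * (m + delay) / 96_000) for m in range(len(y))]
print(max(abs(a - b) for a, b in zip(y, ref)))  # small reconstruction error
```

The printed error should be well below 0.01: the upsampled signal matches a 1 kHz tone sampled directly at 96 kHz, apart from the fixed filter delay.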
 
Sep 12, 2015 at 2:45 PM Post #8 of 21

 
I think the highest I've gotten from a CD is DR21. Recording level matters in all this: the track (or the complete piece it is part of) should be normalized to 0 dB before going to 16 bits, especially if you are not dithering. Just another possibility.
 
p.s. Man this stuff is expensive. I thought about buying the track but it's album only and the album is $25 for 52 mins.
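To illustrate why dither matters at low levels: TPDF (triangular) dither adds the sum of two independent uniform values of ±0.5 LSB before rounding, so detail smaller than one 16-bit step survives as a signal buried in noise instead of being rounded away. A minimal sketch, assuming float samples in [-1.0, 1.0); the function name and the test signal are hypothetical.

```python
import math
import random

def to_16bit(samples, dither=True, rng=None):
    """Quantise floats in [-1.0, 1.0) to 16-bit integer sample values.

    With dither=True, TPDF dither (two uniform +/-0.5 LSB values summed,
    i.e. a triangular distribution spanning +/-1 LSB) is added before rounding.
    """
    rng = rng or random.Random()
    out = []
    for s in samples:
        v = s * 32768                    # 1 LSB = 1/32768 of full scale
        if dither:
            v += rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
        out.append(max(-32768, min(32767, round(v))))   # round, then clip to int16
    return out

# A sine only 0.4 LSB in amplitude: below 16-bit resolution.
ref = [math.sin(2 * math.pi * n / 64) for n in range(64_000)]
quiet = [0.4 / 32768 * r for r in ref]
plain = to_16bit(quiet, dither=False)
dithered = to_16bit(quiet, dither=True, rng=random.Random(0))
print(set(plain))  # {0}: without dither the signal disappears completely
# Correlating with the original sine recovers roughly 0.4 * 0.5 = 0.2 with dither.
print(sum(d * r for d, r in zip(dithered, ref)) / len(ref))
```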
 
Sep 12, 2015 at 3:39 PM Post #9 of 21

 
Never mind, I discovered the issue. With DirectSound or WASAPI, playback introduces noise on big files (it depends on the kbps): files over 6000 kbps get a very noticeable white noise (some 192/24 FLACs; I don't know the exact threshold, but it's notably close to an arbitrary 6000 kbps, which is very weird). I already knew this when I took the tests, but since I was only using 96/24 files (which max out at ~3200 kbps), I thought WASAPI would be fine. After the weird ABX results, and after reading your post, I ran the test again through ASIO and couldn't tell 96/24 from 48/24. I also found that WASAPI/DS introduces a background white noise between ~2000 kbps and 6000 kbps, but it's much more subtle than the one above 6000 kbps.
I don't use any DSP effects in the Xonar drivers when I listen to music (they're useful for multichannel movies or games), but I guess it has trouble with high-bitrate files (again, it's not about the sampling frequency, it's about data per second; very weird). I can't fully understand why this happens, but at least I know under what conditions I can expect trouble.
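The kbps figures can be put in context with the uncompressed PCM data rate, which is sample rate × bit depth × channels. The FLAC bitrates foobar2000 displays are lower because FLAC is losslessly compressed, while the output device is always fed decoded PCM. A short sketch, assuming stereo (the helper name is hypothetical):

```python
def pcm_kbps(sample_rate_hz: int, bits: int, channels: int = 2) -> float:
    """Uncompressed PCM data rate in kbit/s (1 kbit = 1000 bits)."""
    return sample_rate_hz * bits * channels / 1000

for rate in (48_000, 96_000, 192_000):
    print(f"{rate // 1000} kHz / 24-bit stereo: {pcm_kbps(rate, 24):.0f} kbps")
# 48 kHz / 24-bit stereo: 2304 kbps
# 96 kHz / 24-bit stereo: 4608 kbps
# 192 kHz / 24-bit stereo: 9216 kbps
```

So a 96/24 FLAC peaking around 3200 kbps is roughly 70% of its 4608 kbps PCM rate.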
 
Sep 12, 2015 at 3:56 PM Post #10 of 21
   

It's cheaper to buy it on Amazon: http://www.amazon.com/Impromptu-David-Hazeltine/dp/B00CX7OW2U/ref=sr_1_1?ie=UTF8&qid=1408978510&sr=8-1&keywords=david%20hazeltine%20impromptu
 
The sound should be the same (since it's the same master), only at a lower resolution. I don't know if the extra bucks are worth it for the 192/24 (considering you don't even get the physical CD case and extras), but since the original was recorded at 192 kHz, the natural downsample would be 48 kHz, so the CD's 44.1 kHz "can" be less ideal (theoretically).
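The "natural downsample" point is about the conversion ratio. A rational resampler conceptually upsamples by L and decimates by M; reducing target/source with Python's `fractions.Fraction` gives those factors. This only shows the arithmetic: modern resamplers handle either ratio, and whether the 44.1 kHz path is audibly worse is the "theoretically" above, not something this demonstrates.

```python
from fractions import Fraction

source = 192_000
for target in (96_000, 48_000, 44_100):
    ratio = Fraction(target, source)   # reduced automatically: L/M
    print(f"{source} -> {target}: L/M = {ratio.numerator}/{ratio.denominator}")
```

This prints 1/2, 1/4 and 147/640: 48 kHz is a plain 4:1 decimation, while 44.1 kHz needs upsampling by 147 and decimating by 640.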
 
Sep 12, 2015 at 4:10 PM Post #11 of 21

 
Oh I just meant that if the single track had been available cheap in HD, I would have bought it to see what could be going on with the conversion. I buy used CDs like they're going out of style ^_^ Wait…
 
Sep 12, 2015 at 8:17 PM Post #12 of 21

The filtering is necessary to downsample audio, even if you are halving the sample rate or using any other integer division. Here's my brief explanation of why.
 
The undesired frequencies are called aliases. If you resample a 96 kHz recording to 48 kHz by removing every other sample, any frequencies above the new 24 kHz Nyquist frequency will be reflected back below it. This is part of the Nyquist-Shannon sampling theorem: 25 kHz sounds will appear at 23 kHz, 33 kHz sounds will appear at 15 kHz, and so on. If those frequencies above the Nyquist frequency are not filtered out before resampling, they will be reflected back, potentially into the audible frequency range, where they can be heard as aliasing distortion. By filtering the audio with an anti-aliasing filter that removes everything above the Nyquist frequency before you resample, you remove any frequencies that would have been reflected, so no aliasing distortion occurs.
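The folding rule above can be checked numerically. A minimal Python sketch (helper names are hypothetical): `alias_frequency` folds a tone into the 0 to fs/2 band, and `naive_decimate` is the "delete every other sample" approach with no anti-aliasing filter. Decimating a 25 kHz cosine sampled at 96 kHz yields exactly the same samples as a 23 kHz cosine sampled at 48 kHz, so after naive decimation the two cannot be told apart.

```python
import math

def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Frequency at which a tone of f_hz shows up after sampling at fs_hz."""
    f = f_hz % fs_hz                           # spectral images repeat every fs
    return fs_hz - f if f > fs_hz / 2 else f   # fold around the Nyquist frequency fs/2

def naive_decimate(samples: list[float]) -> list[float]:
    """'Just delete every other sample': no anti-aliasing filter at all."""
    return samples[::2]

print(alias_frequency(25_000, 48_000))  # 23000
print(alias_frequency(33_000, 48_000))  # 15000

x = [math.cos(2 * math.pi * 25_000 * n / 96_000) for n in range(960)]   # 25 kHz at 96 kHz
y = naive_decimate(x)                                                    # now at 48 kHz
ref = [math.cos(2 * math.pi * 23_000 * n / 48_000) for n in range(len(y))]
print(max(abs(a - b) for a, b in zip(y, ref)))  # only floating-point rounding error
```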
 
Aliasing does not just affect audio. It also affects images in various ways, where I believe it is easier to see why it is a problem. Look at the pictures of the brick walls in the Wikipedia article I linked.
 
Sep 13, 2015 at 10:19 AM Post #13 of 21

 
The Nyquist theorem makes me a little suspicious here. If the 20kHz-48kHz audio was inaudible, I don't really understand why it "becomes" audible when halving the sample rate; conceptually, to me that implies it was audible (not by itself, but by interacting with the lower-frequency waves) before the downsample.
Now, if you can hear clues above 20 kHz, then recordings made at higher sample rates should add audible information. My point is that if the higher frequencies have audible effects in the 20-20000 Hz range, I don't see why they're "bad"; that "distortion" is just extra information that I believe the brain can make use of (despite looking like unwanted noise when you view the waveform on a computer). Maybe that's what people claim to hear in subjective comparisons between DSD128 or 192/24 and 48/24.
 
BTW, the ABX test also gives clues when you're switching audio sources: switching between the same file is much faster than switching from one file to another (so it can be cheated). That shouldn't be a problem if you take the test honestly and don't mix A/B with X/Y when changing sources.
 
Sep 13, 2015 at 10:39 AM Post #14 of 21
 


No, no, it's not about hearing ultrasounds. What we need isn't the ultrasonic data; what we need is enough resolution to have at least 2 points to recreate a 22 kHz signal. So we need a sample rate of at least 44 kHz to get 2 samples per period of a 22 kHz signal. With only a 22 kHz sample rate, we would get just 1 sample per period of a 22 kHz signal and fail to reconstruct it (like being asked to draw the one line through 2 points: with only 1 point you can no longer know which line is the right one).
So we need the higher sampling rate, not higher-frequency music.
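One refinement of the "at least 2 points" rule: the sampling theorem requires the sample rate to be strictly greater than twice the highest frequency. At exactly two samples per period, a sine in an unlucky phase is sampled only at its zero crossings and vanishes entirely. A quick check in Python (the helper name is hypothetical):

```python
import math

def sample_sine(freq_hz, fs_hz, count):
    """Sample sin(2*pi*f*t) at t = n / fs for n = 0 .. count-1."""
    return [math.sin(2 * math.pi * freq_hz * n / fs_hz) for n in range(count)]

at_two = sample_sine(22_000, 44_000, 8)   # exactly 2 samples per period
above = sample_sine(22_000, 48_000, 8)    # slightly more than 2 per period
print(max(abs(v) for v in at_two))        # tiny (~1e-15): every sample is a zero crossing
print(max(abs(v) for v in above))         # close to 1.0: the tone is captured
```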
 
And ABX can of course be cheated; that's why we suggest using the same resolution for both tracks by upsampling back, to limit the differences in timing. But in the end, a guy who wishes to pass an ABX can pass it: just open a spectrum view showing whether or not there is ultrasonic content and you can pass anything without even listening. We hope that people are looking for answers, not trying to behave like idiots.
 
Sep 13, 2015 at 11:35 AM Post #15 of 21
   

It is difficult to intuit, but it is a proven part of how sampling works.
 

 
From the Wikipedia article, this is an image in the frequency domain representing a sampled signal. The blue curve from 0 to B represents the frequencies of the desired audio. For complex reasons, it also has a negative part, and it repeats infinitely, so there would be further green curves at 2, 3, 4, etc. times the sampling frequency (fs). Those green curves were not present in the original analog audio before it was sampled; they are an unnatural result of sampling. In the above example, a small part of the green crosses back into the blue. In a DAC, those green images are removed by the reconstruction filter, which filters out frequencies above fs/2, but in this example some green has been mixed into the blue and cannot be removed. Any green that mixes with the blue will cause unnatural aliasing distortion. Those higher frequencies did not audibly affect the lower frequencies before the audio was sampled. Aliasing does not add any useful information; your brain cannot make sense of it because it is unnatural, and it will only negatively affect the audio. To avoid it, the blue part must be filtered with an anti-aliasing filter to make it narrower, so that its green images do not mix with the blue. This applies to resampling in exactly the same way.
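The anti-aliasing step can be sketched in Python: low-pass filter first, then keep every other sample. This is a minimal illustration assuming a 101-tap Blackman-windowed sinc filter with its cutoff at 90% of the new Nyquist frequency (21.6 kHz for 96 to 48 kHz); SoX and similar resamplers use longer, more carefully designed filters, and the function names are hypothetical. A 1 kHz tone passes essentially unchanged, while a 30 kHz tone, which naive decimation would fold down to 18 kHz, is attenuated to almost nothing.

```python
import math

def lowpass_taps(num_taps, cutoff_hz, fs_hz):
    """Blackman-windowed sinc low-pass FIR, normalised to unity gain at DC."""
    fc = cutoff_hz / fs_hz
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = (0.42 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
                  + 0.08 * math.cos(4 * math.pi * n / (num_taps - 1)))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def decimate_by_2(samples, fs_hz, num_taps=101):
    """Anti-alias filter below the new Nyquist (fs/4), then keep every other sample."""
    taps = lowpass_taps(num_taps, 0.9 * fs_hz / 4, fs_hz)
    # only every second filter output is computed, since the rest would be discarded
    return [sum(t * samples[start + k] for k, t in enumerate(taps))
            for start in range(0, len(samples) - num_taps + 1, 2)]

def sine(freq_hz, fs_hz, count):
    return [math.sin(2 * math.pi * freq_hz * n / fs_hz) for n in range(count)]

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

fs = 96_000
for f in (1_000, 30_000):   # 30 kHz would alias to 18 kHz if decimated naively
    print(f, round(rms(decimate_by_2(sine(f, fs, 4_000), fs)), 4))
```

For comparison, `rms(sine(30_000, fs, 4_000)[::2])` (naive decimation) stays around 0.707, because the 30 kHz tone survives as an 18 kHz alias.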
 
