Blind test: compare the sound recorded from various devices
Feb 2, 2013 at 5:59 PM Thread Starter Post #1 of 32

stv014

Here is a set of 96/24 format FLAC files recorded from various DAC and amplifier outputs that you can compare against each other and a reference file (A.flac).
 
The list of files - warning: more than 500 MB total download size:
 
sample.flac - the original sample (created from several short samples taken from various other blind tests, the total length is 3:32, so it is fairly long); Note: for ABX testing, use A.flac as the reference instead, since sample.flac is not level matched with the other files, and it has a different sample rate (therefore making your DAC more of a factor in the test)
 
The following FLACs are all in 96/24 format, level matched and synchronized for ABX comparisons:
 
A.flac - sample.flac converted to 96 kHz in software
B.flac - created from rec_amp_d1.wav
C.flac - from rec_stx_dt770.wav
D.flac - from rec_alc269.wav
E.flac - from rec_d1.wav
F.flac - from rec_alc887.wav
G.flac - from rec_amp_stx.wav
H.flac - from rec_alc269_dt770.wav
 
Detailed description of what each recorded file is:
 
rec_alc269_dt770.wav - Realtek ALC269 onboard audio codec in a laptop, driving a DT770 Pro 250 Ω at full volume (0 dBFS = ~1 Vrms unloaded = slightly more than 100 dB SPL). The output impedance is about 18 Ω. Many/most people on these forums would expect this setup to be "loud enough but not driven properly" (of course, what is loud enough for one person might not be loud enough for another). Can you hear it?
rec_alc269.wav - the same laptop onboard headphone output driving a pair of 22 Ω resistors instead. Impressively, it still has less than 0.01% THD with an almost full scale 2.5-3 kHz tone.
rec_alc887.wav - ALC887 onboard line output of a desktop PC at full volume (0 dBFS = ~1.25 Vrms unloaded, output impedance is >200 Ω + 10 uF capacitors; 5.4 kΩ load)
rec_amp_d1.wav, rec_amp_stx.wav - an LM3876 based speaker amplifier driving 8 Ω resistors at 0 dBFS = slightly less than 11 Vrms (about 15 W, more than 100 dB SPL with efficient speakers) + pre-amplifier with volume, balance, and tone controls (4xNE5532 op amps). This is included mainly to test how the sound is affected by a long and complex "chain", with a total of up to 10 amplifier stages, at least 4 coupling capacitors, and 4 potentiometers in the signal path, in addition to the DAC and ADC chips. The balance and tone controls were set accurately for the flattest response and minimum channel imbalance (which is eliminated by the software level matching anyway). Those who believe in "op amp rolling" may also be interested to hear if so many cheap op amps really noticeably degrade the sound quality
rec_d1.wav - Xonar D1 line output at full volume (0 dBFS = ~1.935 Vrms unloaded, output impedance is 100 Ω + 220 uF capacitors; 5.4 kΩ load)
rec_stx_dt770.wav - Xonar Essence STX headphone output driving a DT770 Pro 250 Ω at -13 dB volume relative to the 7 Vrms full scale output (0 dBFS = ~1.5 Vrms loaded = ~105 dB SPL, 10.3 Ω output impedance). This headphone amplifier gets a lot of criticism on the other sub-forums, and there are also frequent claims that high impedance headphones are "loud enough but not driven properly" by it
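The loaded voltages and the amplifier power quoted in the list follow from a voltage divider and P = V²/R. Here is a minimal Python sketch of that arithmetic, assuming purely resistive loads (a real DT770 has a frequency-dependent impedance) and assuming the 7 Vrms full scale figure of the STX is the unloaded value. The function names are made up for illustration.

```python
import math

def loaded_voltage(v_unloaded, z_out, z_load):
    """Voltage across the load, treating source and load as pure resistances."""
    return v_unloaded * z_load / (z_out + z_load)

def db(ratio):
    """Voltage ratio expressed in decibels."""
    return 20.0 * math.log10(ratio)

# ALC269 headphone output: ~1 Vrms unloaded, ~18 ohm output impedance, 250 ohm load
v_alc269 = loaded_voltage(1.0, 18.0, 250.0)
loss_alc269_db = db(v_alc269 / 1.0)

# Essence STX headphone output: 7 Vrms full scale (assumed unloaded) at -13 dB volume,
# 10.3 ohm output impedance, 250 ohm load
v_stx_unloaded = 7.0 * 10 ** (-13.0 / 20.0)
v_stx = loaded_voltage(v_stx_unloaded, 10.3, 250.0)

# Speaker amplifier: slightly less than 11 Vrms into an 8 ohm resistor
p_amp_watts = 11.0 ** 2 / 8.0

print(f"ALC269 into 250 ohm: {v_alc269:.3f} Vrms ({loss_alc269_db:+.2f} dB vs unloaded)")
print(f"STX into 250 ohm:    {v_stx:.3f} Vrms")
print(f"Amplifier at 11 Vrms into 8 ohm: {p_amp_watts:.1f} W")
```

With these assumptions, the 18 Ω output impedance costs well under 1 dB of level into a 250 Ω load, and the STX figure lands close to the ~1.5 Vrms stated in the post.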
 
Feb 4, 2013 at 5:08 AM Post #3 of 32
Frequency response graphs for all files:
 
   
 
On the left:
green = A.flac = original sample converted to 96 kHz
red = H.flac = rec_alc269_dt770.wav
blue = D.flac = rec_alc269.wav
yellow = F.flac = rec_alc887.wav
 
On the right:
green = B.flac = rec_amp_d1.wav
red = G.flac = rec_amp_stx.wav
blue = E.flac = rec_d1.wav
yellow = C.flac = rec_stx_dt770.wav
 
Feb 5, 2013 at 2:05 PM Post #4 of 32
More details: the tests (both playback and recording) were performed on Linux, using the ALSA hw: devices (these allow for the most direct access to the DAC/ADC, bypassing any software processing), and the ADC was a Xonar Essence STX sound card at 96 kHz sample rate and 24 bit resolution. To minimize the noise resulting from ground loops, I used this circuit on the input (some tests comparing the performance of a loopback with and without it can be seen here); the +/- 12 V regulated power supply is not shown:

For the speaker amplifier tests, I used the following circuit to simulate an 8 Ω load and reduce the voltage to the 2.83 V peak level the sound card input can handle (only one channel is shown):

The recorded sound was processed with a filter that (partly) compensated the low frequency roll-off of the ADC, although this is unlikely to have made any audible difference. It consisted of a simple 6 dB/octave lowpass filter with a -3 dB cutoff frequency at 1.3505 Hz, mixed to the original signal at the same level (so the gain is 6 dB at DC, and 0 dB at high frequency):
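The gain figures in parentheses can be checked with a few lines of Python. This is only a sketch: it models the low-pass branch as an ideal analog first-order filter, H(f) = 1 + 1/(1 + j·f/fc), which is an assumption, since the actual filter in the tools is a digital implementation that is not shown here.

```python
import math

FC_HZ = 1.3505  # -3 dB cutoff of the low-pass branch, from the post

def compensation_gain_db(freq_hz, fc_hz=FC_HZ):
    """Gain of 'original + first-order low-pass of original' at freq_hz, in dB."""
    lowpass = 1.0 / complex(1.0, freq_hz / fc_hz)  # ideal 6 dB/octave low-pass
    total = 1.0 + lowpass                          # mixed with the dry signal at equal level
    return 20.0 * math.log10(abs(total))

for f in (0.0, 1.3505, 5.0, 20.0, 1000.0):
    print(f"{f:8.2f} Hz: {compensation_gain_db(f):+.3f} dB")
```

Under this model the boost is about 6 dB at DC and has already dropped below 0.1 dB by 20 Hz, which fits the remark that it is unlikely to be audible.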

Further processing was applied using these utilities. After the equalization described above, I measured the error in pitch, delay, and level (separately on the left and right channels) with "sinetest", and corrected it with "resample". Finally, I did some fine tuning of the levels using ReplayGain, cropped the sound to the length of the original sample, and converted the files to 24-bit format (all temporary files used 32-bit float samples). I used the following resample commands:
Code:
 resample -il 128 -k 1 -r 96000 -fl -2500 sample2.wav sample2_96k.wav
 resample -il 128 -k 2.113377355560 -m 1.00003782795336 -g1 2.11959914 -g2 2.11630221 tmp.wav alc269_dt770.wav
 resample -il 128 -k 2.415132361754 -m 1.00003744202611 -g1 3.44597086 -g2 3.47372081 tmp.wav alc269.wav
 resample -il 128 -k 2.424511453052 -m 1.00002356823300 -g1 1.66364937 -g2 1.65349597 tmp.wav alc887.wav
 resample -il 128 -k 2.767857157205 -m 1.00000615568551 -g1 1.37429743 -g2 1.37805178 tmp.wav amp_d1.wav
 resample -il 128 -k 2.938615355979 -m 1.00000000000000 -g1 1.38842705 -g2 1.41617511 tmp.wav amp_stx.wav
 resample -il 128 -k 2.712378101242 -m 1.00000585083495 -g1 1.08891069 -g2 1.08572776 tmp.wav d1.wav
 resample -il 128 -k 2.538982778985 -m 1.00000000000000 -g1 1.37941194 -g2 1.39730859 tmp.wav stx_dt770.wav
So, the worst case pitch correction was by less than 38 ppm, which should not be audible, but it ensures accurate synchronization for the entire length of the sample. The channel imbalance was always within 0.2 dB.
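For anyone who wants to check these figures, here is a short Python sketch that recomputes them from the numbers in the resample commands above. It assumes that -m is the pitch (rate) ratio and that -g1/-g2 are linear gains for the left and right channels; that is how the figures read, but the flags are not documented in this post.

```python
import math

# (pitch ratio, left gain, right gain), copied from the resample commands above
CORRECTIONS = {
    "alc269_dt770": (1.00003782795336, 2.11959914, 2.11630221),
    "alc269":       (1.00003744202611, 3.44597086, 3.47372081),
    "alc887":       (1.00002356823300, 1.66364937, 1.65349597),
    "amp_d1":       (1.00000615568551, 1.37429743, 1.37805178),
    "amp_stx":      (1.00000000000000, 1.38842705, 1.41617511),
    "d1":           (1.00000585083495, 1.08891069, 1.08572776),
    "stx_dt770":    (1.00000000000000, 1.37941194, 1.39730859),
}

def ppm(ratio):
    """Deviation of a rate ratio from 1, in parts per million."""
    return (ratio - 1.0) * 1e6

def cents(ratio):
    """Pitch interval of a rate ratio in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(ratio)

def imbalance_db(gain_left, gain_right):
    """Level difference between two linear channel gains, in dB."""
    return abs(20.0 * math.log10(gain_left / gain_right))

worst_ppm = max(ppm(m) for m, _, _ in CORRECTIONS.values())
worst_cents = max(cents(m) for m, _, _ in CORRECTIONS.values())
worst_imbalance_db = max(imbalance_db(g1, g2) for _, g1, g2 in CORRECTIONS.values())

print(f"worst pitch correction: {worst_ppm:.1f} ppm = {worst_cents:.4f} cent")
print(f"worst channel imbalance: {worst_imbalance_db:.3f} dB")
```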
 
Feb 5, 2013 at 5:38 PM Post #6 of 32
Quote:
Given that the smallest amount of any kind of processing invariably raises suspicions that the test may be invalid, what do you hope to demonstrate here, and to whom?

 
Those who are looking for excuses for why they do not hear a difference will always find them anyway, as you can see in your "6 DACs compared" thread. Also, the original sample was only processed by sample rate conversion and level matching (it was scaled by a factor of 0.8898 * 0.997), the transparency of which can easily be proven by a simple difference test. Recording at 44.1 kHz causes more problems than resampling the source to 96 kHz.
Regarding the recorded samples, the largest pitch correction used was less than 0.07 cent, while the threshold of audible pitch errors is something like 2 cents if I recall correctly, and that is for trained musicians. Also, it can easily be proven that my correction filter for the STX ADC indeed makes its frequency response flatter, and the purpose of the test was not to test the audible effects (or lack thereof) of the ADC. Other than these, and the level matching (which hopefully most would find acceptable), there was not really any other processing that could "improve" the recorded sound, unless you expect some kind of random "synergy". All the DSP applied is well documented and repeatable, I even released the source code of my tools, and for those who still have doubts, I can also upload the original recorded sound files (but perhaps not all of them, simply because the file sizes are huge).
 
Having said all that, regardless of the exact methods used, it does look like creating tests like this is a waste of time, and I do not think I will bother making another one. In the similar threads I have created so far, I did not get more than a handful of votes total. Someone who is convinced there is a difference usually just ignores the test, and keeps repeating the same claims, or at best comes up with the usual list of excuses trying to invalidate any form of blind testing. It could be useful to those who have not decided yet and are hesitating, but in most cases they end up going with the opinion of the majority as the "safe" choice. Finally, someone who believes the measurements and expects not to hear a difference will not really be interested either.
 
Feb 5, 2013 at 6:07 PM Post #7 of 32
I thought level matching was standard protocol for blind testing? Because it is proven that louder will be perceived as better, skewing the results if it isn't used.
 
Feb 6, 2013 at 10:44 AM Post #8 of 32
Quote:
Also, the original sample was only processed by sample rate conversion and level matching (it was scaled by a factor of 0.8898 * 0.997), the transparency of which can easily be proven by a simple difference test.

 
Indeed, after I converted the sample back to 44100 Hz and the original level with
Code:
 resample -il 128 -r 44100 -g 1.127229745 sample2_96k_scaled.wav sample_44k.wav
and then from the resulting float format file I subtracted the original sample.wav, and removed everything above 20500 Hz (because there is an - intentional - roll-off above that frequency in sample2_96k.wav), not much "error" was left. Ignoring the first and last 0.1-0.2 s, because there is a low level (<-63 dBFS peak level) "click" at the beginning and end of the file, here are some statistics for the difference signal, which also includes the 24-bit quantization noise:

I doubt there are many people who can hear that
The minor "clicks" at the beginning and end are there because the upsampled file is truncated to the original length, and the pre- and post-ringing of the lowpass filter is cut off. When the sample rate is converted back, that information would be needed for a "perfect" reconstruction of the original signal.
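The level-matching half of this difference test is easy to reproduce. The Python sketch below uses a synthetic 1 kHz tone as a stand-in for the music (an assumption, as are the helper names) and does not model the sample rate conversion at all. It only shows that scaling by 0.8898 * 0.997, storing as 24-bit, and scaling back by the -g value above leaves a residual at roughly the level of 24-bit quantization noise.

```python
import math

SCALE_DOWN = 0.8898 * 0.997  # level matching factor quoted in the post
SCALE_UP = 1.127229745       # -g value from the resample command above

def quantize_24bit(sample):
    """Round a sample in [-1, 1) to the nearest 24-bit step."""
    step = 1.0 / (1 << 23)
    return round(sample / step) * step

def dbfs(value):
    """Linear amplitude (1.0 = full scale) expressed in dBFS."""
    return 20.0 * math.log10(value) if value > 0 else float("-inf")

# Stand-in for the music: one second of a 1 kHz tone at 44.1 kHz
rate = 44100
original = [0.5 * math.sin(2 * math.pi * 1000 * n / rate) for n in range(rate)]

# Scale down, store as 24-bit, scale back up, then subtract the original
processed = [quantize_24bit(s * SCALE_DOWN) * SCALE_UP for s in original]
difference = [p - o for p, o in zip(processed, original)]

peak_db = dbfs(max(abs(d) for d in difference))
rms_db = dbfs(math.sqrt(sum(d * d for d in difference) / len(difference)))
print(f"difference signal: peak {peak_db:.1f} dBFS, RMS {rms_db:.1f} dBFS")
```

A real null test would load the two WAV files instead of generating a tone, and would have to align them and band-limit the difference as described above.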
 
Feb 9, 2013 at 2:46 PM Post #9 of 32
Sorry, I'm fairly new to this whole sound science stuff but I did participate in the "6 DACs" thread and posted my Foobar ABX results between the C and G sample, in which I could hear a difference between the two. From your tests here, are you implying that there should be no audible differences between DACs?
 
Feb 9, 2013 at 2:54 PM Post #10 of 32
Quote:
Sorry, I'm fairly new to this whole sound science stuff but I did participate in the "6 DACs" thread and posted my Foobar ABX results between the C and G sample, in which I could hear a difference between the two. From your tests here, are you implying that there should be no audible differences between DACs?

 
Not quite. More probably, I think that most people who are "at home" here would agree:
 
- All good DACs should be alike - i.e. they should reproduce the original music accurately and nothing else
 
- The $200 ODAC is probably as good as a DAC can get
 
- The DAC in the $50 Sansa Clip is already pretty damn good.
 
Feb 9, 2013 at 2:57 PM Post #11 of 32
I love a good skeptic and feel the sound science section is where the rubber meets the road. Love it. But I disagree that short-term bursts/comparisons show up all variance in sound. Sometimes it takes listening over time with many recordings, but I hope stv is not discouraged. Maybe throw those who are subjectivists a bone and give a test of a NuForce DAC and NwAvGuy. That one should at least show some difference; god, that NuForce sounded like garbage to my ears.
 
Feb 9, 2013 at 3:02 PM Post #12 of 32
Quote:
Sorry, I'm fairly new to this whole sound science stuff but I did participate in the "6 DACs" thread and posted my Foobar ABX results between the C and G sample, in which I could hear a difference between the two. From your tests here, are you implying that there should be no audible differences between DACs?

 
I am not implying anything; I did not even compare them extensively myself, so I make no claim whether they all should sound the same or not (although I think at least one is likely transparent).
 
Feb 9, 2013 at 3:19 PM Post #13 of 32
Quote:
but I disagree that short-term bursts/comparisons show up all variance in sound.

 
Entire tracks are not included because of copyright reasons (the length of the samples is limited to <30 seconds), and to keep the file sizes within reasonable limits; also, the rest of the track would often sound similar anyway. The download size is already rather large at more than 500 MB, and the total length of 3 minutes and 32 seconds is longer than the other blind tests available here. I did try to include a decent variety of different sounds, and also asked for suggestions well before creating this test, although there were only a few replies to that. I think it should be enough to detect any significant differences, although maybe I should have included something really "quiet" (which is surprisingly not too easy to find, even "high resolution" recordings often do not in fact have a very wide dynamic range) so that there is a better chance to hear differences in the noise floor. For the same reason, amplifier tests at low volume could have been included.
 
Feb 9, 2013 at 3:24 PM Post #14 of 32
Sorry, I'm fairly new to this whole sound science stuff but I did participate in the "6 DACs" thread and posted my Foobar ABX results between the C and G sample, in which I could hear a difference between the two. From your tests here, are you implying that there should be no audible differences between DACs?


Not quite. More probably I think that most people who are "at home" here would agree

- All good DACs should be alike - i.e. they should reproduce the original music accurately and nothing else

- The $200 ODAC is probably as good as a DAC can get

- The DAC in the $50 Sansa Clip is already pretty damn good.


Yeah, I'm not complaining about the Clip Zip for what you pay, but its soundstage sounds congested relative to the Objective combo to me. That's why I made a thread about how to measure the soundstage.
 
Feb 9, 2013 at 3:42 PM Post #15 of 32
I wasn't critical of your test, stv, in the remark about short bursts of testing. In fact, you included many song sections, and that helps highlight differences compared to using one tune. But I was referring to owning a device and 'living with it' over a period of weeks. This is where a lot of differences show up, and I feel a lot of them are not even conscious. I feel a lot of what we like in sound is out of our awareness, and we end up 'choosing' a headphone/DAC etc. to use ultimately by how a product makes us feel over time. So if one has 6 different headphones on a rack, the one that ends up getting the most head time is 'the best'. Whether that headphone would have been chosen in a shorter A/B is hard to say, but I'd suggest many times it wouldn't, because often what one likes in the short term they find headache-causing in the long term, and vice versa: what one finds boring in the short term ends up being pleasing in the long term. Of course, the only way to replicate those tests is to buy the item and live with it, unfortunately. I even feel these meets and headphone shows are not quite the best way to audition anything unless one can bring in his own lounge chair and favorite snack and camp out for a few hours with each item.
 
