10 bits of sample depth is more than enough for audio with tests that you can perform to prove it to yourself
Aug 25, 2014 at 10:16 AM Thread Starter Post #1 of 19

SharpEars
The assertion:
 
Once audio drops below -54 dBFS it is largely irrelevant and virtually inaudible, whether it is classical music, high dynamic range music, or any other music. To hear it (let alone hear it loud enough to appreciate it) at that level, you would have to turn the volume up so far, and wear such isolating headphones in such a quiet room, that as soon as a loud passage started you would be (temporarily) deafened, reducing your ability to hear (i.e., the dynamic range of your ears) to well below 54 dB.
 
-54 dBFS requires only 9 bits (you can add an extra bit for good measure, more room to dither, yada yada), so a 10-bit ADC / 10-bit DAC is transparent for music at all reasonable listening levels that do not cause permanent hearing damage. If you listen at levels that do cause permanent hearing damage, then you will need fewer than 10 bits relatively soon, since you will not be able to hear at low volumes at all without a hearing aid.
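
For anyone who wants to check the bits-to-dB arithmetic, here is a minimal sketch. It assumes the usual rule of thumb for linear PCM (range = 20*log10(2^N), about 6.02 dB per bit) and ignores dither and noise shaping. The function names are mine, purely for illustration.

```python
import math

def dynamic_range_db(bits: int) -> float:
    """Approximate range of an N-bit linear PCM word: 20*log10(2**N)."""
    return 20.0 * math.log10(2 ** bits)

def bits_needed(range_db: float) -> int:
    """Smallest whole number of bits whose range covers range_db."""
    return math.ceil(range_db / dynamic_range_db(1))

# Print the rule-of-thumb range for a few word lengths.
for n in (7, 8, 9, 10, 16):
    print(f"{n:2d} bits -> {dynamic_range_db(n):5.1f} dB")

print("bits for 54 dB:", bits_needed(54.0))
```

By this rule 9 bits covers roughly 54 dB and 10 bits roughly 60 dB, which is where the figures above come from.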
 
Virtually all recordings have less than 60 dB of dynamic range. It's easily measured in any good DAW on a track by track basis. Try it with the music you listen to that you consider has a higher dynamic range - you will be surprised! I am including audiophile 192/24 recordings in the mix by the way, so I do mean virtually all. By the way, when measuring dynamic range, any silent leader or trailer on the track should not be included, since it can (unfairly) skew results.
 
The proof based on the technicals combined with your own ears:
 
Anyone that thinks that more than a 10-bit sample depth matters needs to watch the following video: https://www.youtube.com/watch?v=BYTlN6wjcvQ starting from 45:48. Let your own ears with your own (high end) equipment be the judge.
 
Actually, if you want to do this test with complete accuracy, you can download the original .wav file used for this test at: http://ethanwiner.com/aes/bit_reduction.wav
 
I start hearing noise at around 18 seconds into the .wav file, which equates to a bit depth of 7 bits. I was listening on Sennheiser HD650 headphones connected via a balanced cable to an OPPO HA-1 fed via asynchronous USB (i.e., I don't think anyone can call my system "low res"), with the volume set quite loud in a very quiet room. Let me repeat: 7 bits is enough to transparently encode this song when it is listened to with the equipment I just mentioned.
 
Now if you want to do the test yourself, get the .wav file at the link I just posted and note at what second you can hear noise or any objectionable artifacts. Then play the youtube video (also linked above) starting at 46:18 for the number of seconds you played the .wav file. You can determine from the video at what bit depth you heard the "bad audio" or noise. That, my friends, is the easiest way to convince yourself that 10 bits is plenty.
 
The proof based on your own tests:
 
If you really want to go all the way, you can download the actual VST plug-in called +decimate that was used in the instructional youtube video and try it with your own DAW and your own music. I would love to hear the results. In fact, I've done all of the research for you.
 
Here is a link to the latest version of the VST collection that includes +decimate: http://www.soundhack.com/freeware/
 
You want to download the Delay Trio / Freesound Bundle from the top left column on that page. The actual plug-in you're looking for from the set is +decimate, and it can be found under VST/Effect/Sound Hack/+decimate in your DAW once it is installed and your DAW has located it. On Windows it installs itself in c:\program files\common files\VST2, so I just added that directory to my DAW and refreshed the VST list, making it available.
 
Note: Some DAWs have their own mechanism for reducing bit depth. If you use this mechanism at very low bit depths, you should try it without dither, since at very low bit depths the dither will be clearly audible and the whole point here is transparency.
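
If you would rather see what a bit-depth reduction does than take a plug-in's word for it, here is a rough sketch of requantizing samples with and without dither. To be clear, this is my own illustration and not the algorithm used by +decimate or by any particular DAW; the `requantize` function and the TPDF dither choice are assumptions.

```python
import random

def requantize(samples, bits, dither=False, rng=None):
    """Reduce floats in [-1.0, 1.0) to `bits` of resolution, returned as floats."""
    rng = rng or random.Random(0)      # fixed seed so runs are repeatable
    steps = 2 ** (bits - 1)            # quantization levels per polarity
    out = []
    for x in samples:
        v = x * steps                  # scale so that 1 LSB == 1.0
        if dither:
            # TPDF dither: sum of two uniform values, spanning +/- 1 LSB
            v += rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
        q = round(v)
        q = max(-steps, min(steps - 1, q))   # clamp to the representable range
        out.append(q / steps)
    return out

# A level well under half an LSB at 10 bits simply vanishes without dither.
print(requantize([0.0005] * 4, 10))
# With dither the same input becomes a noisy mix of neighbouring steps.
print(requantize([0.0005] * 4, 10, dither=True))
```

Without dither, anything smaller than half a step is rounded away. With dither it survives on average, at the cost of added noise.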
 
My personal (anecdotal) experience:
I have some music high in transients that I was sure could use some major (i.e., 24) bit depth, and it turned out that 5 bits was enough! I am both flabbergasted and speechless at this point. How can anyone even consider high bit depth audio again after performing this test?
 
Happy listening and point all of your audiophile friends to this thread to permanently "circumcise" them of their (high-end) 192/24 purchasing/listening habits.
 
Aug 25, 2014 at 2:03 PM Post #2 of 19
I think you are confusing RMS values with peak values. Furthermore, you are assuming that all music in a single track (or even a full album) has constant loudness throughout. Please see my example here about King Crimson's "Lizard" from their album "Lizard". This track has extended passages with RMS values at -44.8 dB and passages with peak values at -1.9 dB.
 
Here is the track on Youtube so you can get a sense for the dynamics in the piece:

 
While 10 bits gives a pretty decent reproduction of any single passage at a given volume level at any instant, it is a poor level of fidelity for capturing an entire piece which can have passages varying from piano pianissimo to forte fortissimo.
 
Cheers
 
Aug 25, 2014 at 2:23 PM Post #3 of 19
   
The rather extreme example you have provided requires 42.9 dB of dynamic range as you yourself correctly pointed out in a related post on a different thread. This can easily fit into 8 bits which would allow 48 dB of dynamic range (arguably even 7 bits (42 dB) would sound transparent). Therefore this is well within the 60 dB of dynamic range that 10 bits per sample allows.
 
Aug 25, 2014 at 3:46 PM Post #4 of 19
   
Did you read what I wrote about confusing RMS levels vs peak levels? Additionally, you need to understand signal levels vs noise levels. It is quite easy to hear the noise floor in a 10bit encoded version of this track. Here is proof using the VST plugins you've recommended:
 
foo_abx 1.3.4 report
foobar2000 v1.3.2
2014/08/25 14:33:41

File A: E:\C_scratch_backup\Audio\Lizard_10bit.flac
File B: E:\C_scratch_backup\Audio\Lizard_24bit.flac

14:33:41 : Test started.
14:34:05 : 01/01  50.0%
14:34:13 : 02/02  25.0%
14:34:17 : 03/03  12.5%
14:34:20 : 04/04  6.3%
14:34:24 : 05/05  3.1%
14:34:31 : 06/06  1.6%
14:34:35 : 07/07  0.8%
14:34:39 : 08/08  0.4%
14:34:42 : 09/09  0.2%
14:34:47 : 10/10  0.1%
14:34:51 : 11/11  0.0%
14:34:54 : 12/12  0.0%
14:34:56 : 13/13  0.0%
14:34:58 : 14/14  0.0%
14:35:00 : 15/15  0.0%
14:35:05 : 16/16  0.0%
14:35:18 : 17/17  0.0%
14:35:47 : 18/18  0.0%
14:36:16 : 19/19  0.0%
14:36:32 : 20/20  0.0%
14:36:41 : Test finished.

 ----------
Total: 20/20 (0.0%)
As long as I hear the difference between the 10bit and "24bit" (they were both fed through your VST plugin, encoded from an original 16/44.1 source), then there is an audible difference between them. Therefore you are wrong. 10 bits is not enough for audible transparency. Can you hear the music clearly---yes. But the noise has come up to an audible level, which (in my book) makes it unacceptable. 16 bit is a nearly optimal bit depth for encoding humanly audibly transparent audio in virtually 100% of all situations. 10 bit is not.
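
For readers wondering what the percentage column in an ABX log means: as far as I understand it, it is the chance of doing at least that well by pure guessing. Here is a small sketch of that calculation, assuming a fair coin-flip model (a one-sided binomial tail). The function name is made up for illustration.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Chance of scoring at least `correct` out of `trials` by 50/50 guessing."""
    tail = sum(comb(trials, k) for k in range(correct, trials + 1))
    return tail / 2 ** trials

# First few rows of a perfect run, then the full 20/20.
for n in (1, 2, 3, 4, 20):
    print(f"{n:02d}/{n:02d}  {abx_p_value(n, n):.6%}")
```

The first rows reproduce the 50.0%, 25.0%, 12.5% and 6.3% in the log, and 20/20 works out to about one in a million.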
 
Originally Posted by ab initio
 
....
 

 
 
At the beginning of the piece is "Prince Rupert Awakes," a part of which is shown in the figure. Jon Anderson provides guest vocals, which are pretty quiet. For example, from t = 5 seconds to 10 seconds (highlighted in green), the RMS level is 0.0057 of full scale, which is -44.8 dB. The peak level during this section is 0.0258, or -31.8 dB.
 
After Jon finishes his verse, Andy McCulloch's drums come pounding in (see the bit highlighted in red), which is a huge dynamic change. Here, the RMS levels during this section are 0.0537 (-25.4 dB) while the peaks of the drums reach 0.3573 (-8.9 dB) (and even exceed it in the blue part immediately after the red!). Overall, the recording ranges from RMS levels of -44.8 dB to a peak value of -1.9 dB.
 
That's a 42.9 dB range in the recording between the RMS value of the nominally quiet part and the loudest drum strike. If you were listening to the piece on your system, you'd probably have typical volume parts around 60dB SPL or so (like the red bit in the figure above), which puts the quiet parts around 40dB SPL. Meanwhile, your system will still have to reproduce the peak transients that would be at 83dB SPL.
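
The conversions above are easy to reproduce. This is only a sketch: the linear values are the ones quoted in the post, and the helper names and the choice of the loud section as the 60 dB SPL reference are my own assumptions.

```python
import math

def to_db(linear: float) -> float:
    """Convert a linear amplitude (fraction of full scale) to dB."""
    return 20.0 * math.log10(linear)

def spl_at(level_db: float, ref_level_db: float, ref_spl: float) -> float:
    """SPL of a passage, given another passage whose SPL is known."""
    return ref_spl + (level_db - ref_level_db)

print(f"quiet RMS  {to_db(0.0057):6.1f} dB")
print(f"quiet peak {to_db(0.0258):6.1f} dB")
print(f"loud RMS   {to_db(0.0537):6.1f} dB")
print(f"drum peak  {to_db(0.3573):6.1f} dB")

# Range between the quiet RMS level and the overall track peak.
print(f"span       {-1.9 - (-44.8):6.1f} dB")

# Pin the loud section (-25.4 dB RMS) at 60 dB SPL and see where the rest lands.
print(f"quiet part {spl_at(-44.8, -25.4, 60.0):6.1f} dB SPL")
print(f"track peak {spl_at(-1.9, -25.4, 60.0):6.1f} dB SPL")
```

With the loud section pinned at 60 dB SPL, the quiet passage lands near 40 dB SPL and the track peak a little above 83 dB SPL.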
 
Perhaps you're like me, and you really like the piece and want to hear Jon Anderson (Yes!) sing with Fripp and friends, so you turn the thing up until the quiet bit is medium-loud (say 60 dB SPL), then the typical parts of the music are a loud 80 dB SPL and the peak transients are over 100 dB SPL.
 
....
 

 
The problem with your assertion that 10bit is sufficient for audible transparency is that you're forgetting that you need headroom for transient peaks and floor room to keep noise inaudible. Transients can be tens of dB or more above RMS levels in well mastered recordings (not hypercompressed pop), so to avoid clipping you need to account for the peak values in addition to the average loudness. The same goes for noise: a well mastered track will try to keep noise levels below the threshold of audibility throughout the duration of the recording.
 
When you listen to music, the average loudness is described by the RMS value of the signal. The human range of hearing is something like 120dB from the absolute quietest sound detectable to the threshold of pain; however, at any one instant, that range is limited to something more like 50dB or so (basically, you can't hear very quiet things during really loud noise). What that means is that you can still hear background noise (depending on its characteristics) that is several tens of dB below current sound levels (i.e., RMS levels). If I listen to a 7bit version of the track with quantization noise at -42 dB, then the quantization noise is as loud as the quiet passages. That would be unacceptable. Right now, with 10bit encoding, the noise is 16 dB below the amplitude of the quiet passages and it is easily audible. The noise is 36 dB below the louder sections of the piece and it is still perceptible.
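
To put rough numbers on the signal-versus-noise margins, here is a back-of-the-envelope sketch. It assumes the simple 6.02 dB per bit rule for the quantization noise floor (no dither, no noise shaping, no weighting), so it lands within a decibel or two of the figures quoted above rather than matching them exactly. The names are hypothetical.

```python
def noise_floor_dbfs(bits: int, db_per_bit: float = 6.02) -> float:
    """Rule-of-thumb quantization noise floor for an N-bit word."""
    return -db_per_bit * bits

def margin_db(signal_dbfs: float, bits: int) -> float:
    """How far the signal sits above (+) or below (-) the noise floor."""
    return signal_dbfs - noise_floor_dbfs(bits)

QUIET_RMS = -44.8   # quiet vocal passage, from the post
LOUD_RMS = -25.4    # drum section, from the post

for bits in (7, 10, 16):
    print(f"{bits:2d} bit: quiet {margin_db(QUIET_RMS, bits):+6.1f} dB, "
          f"loud {margin_db(LOUD_RMS, bits):+6.1f} dB")
```

At 7 bits the quiet passage comes out slightly below the noise floor. At 10 bits it clears it by about 15 dB, and at 16 bits by about 50 dB.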
 
 
For something to be audibly transparent, the noise levels should be imperceptible under all conditions. If 16 bit doesn't already accomplish that, it gets very very very (like 99.99% of the way there) close. 10 bit is not audibly transparent as I have just demonstrated.
 
Cheers
 
Aug 25, 2014 at 4:05 PM Post #5 of 19
 
I will try this test when I get home as well. But, with dither enabled in the downconversion to 10 bits, the noise floor should be at around -57 dBFS. So, how loud were you listening to be able to hear the noise floor clearly and how loud was the noise floor?
 
Aug 25, 2014 at 4:57 PM Post #6 of 19
we already talked about this ^_^.
 
sure 10bits or even less was our hifi a few decades back and nobody was really complaining. but 16bit is now the basic standard, and so many people already whine about it not being good enough for their expert ears and usual misunderstanding of digital audio and human hearing, do you wanna try and convince them about 10bits when we fail to do it with 16? ^_^
good luck mate!
 
 
 
edit: about noise floor, I too have a hd650 and simply the fact that it's an open headphone makes me less sensitive to noise than with my custom IEMs. and it's easily understandable, with the customs I get rid of some of the ambient noise of my room so I can get more of the noise of the track without listening louder.
 
Aug 25, 2014 at 5:09 PM Post #7 of 19
   
I'm not exactly sure. OS volume at 100%, foobar at -13.8 dB. I'm using Sennheiser HD280pros out of the motherboard's onboard sound chip (set at 24/96 out, Realtek HD).
According to the Realtek datasheet, the headphone output has a max of 1.5V rms, and given the HD280's 113 dB SPL/V sensitivity and my volume settings, that works out to a 0dB FS audio signal topping out at 102.7 dB. Given that the track I'm listening to has its loud sections at -25.4dB (+0.5 dB from Replay Gain), I'm listening at RMS levels less than 80 dB SPL, with transient peaks at 100 dB. I think those are completely reasonable volume settings. It might also be relevant that I'm sitting in a shared office space with 10 other (noisy) people and the doors propped open for airflow with very very busy hallways (first day of fall semester classes). I'm by no means in artificially optimal conditions for a positive test. Please let me know if you want any other details about my current listening setup. My very modest "HiFi" setup is at home.
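
Here is how that SPL estimate can be reproduced. It is a sketch only: it assumes the sensitivity figure is referenced to 1 V rms, that the volume settings are plain dB attenuation, and that the track peak is the -1.9 dB quoted earlier in the thread. The function name is mine.

```python
import math

def spl_at_full_scale(v_rms_max: float, sens_db_per_v: float,
                      volume_db: float) -> float:
    """SPL produced by a 0 dBFS signal for a given output stage and headphone."""
    return sens_db_per_v + 20.0 * math.log10(v_rms_max) + volume_db

# 1.5 V rms max output, 113 dB SPL/V headphones, player volume at -13.8 dB.
full_scale = spl_at_full_scale(1.5, 113.0, -13.8)
replay_gain = 0.5

print(f"0 dBFS        {full_scale:6.1f} dB SPL")
print(f"loud sections {full_scale - 25.4 + replay_gain:6.1f} dB SPL")
print(f"track peak    {full_scale - 1.9 + replay_gain:6.1f} dB SPL")
```

That gives roughly 102.7 dB SPL at full scale, loud sections a bit under 80 dB SPL, and peaks around 100 dB SPL, in line with the numbers above.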
 
Usually, I'm bothered by this track's tape hiss (it was recorded in the 70s after all), which I can hear during the quiet passages. Here, the 10bit noise (which sounds very similar to tape hiss) is much more prominent. As you can see from the time stamps on my ABX run, it was clear enough that I could plow right through many trials with a 100% detection rate. I thought the noise floor was obnoxiously high. You will have to give it a try and see how apparent it is to you.
 
Cheers
 
Aug 25, 2014 at 5:19 PM Post #8 of 19
Here's a plot of the first six minutes of "Lizard":
 

 
If you simply throw away the 6 least significant bits then you're going to take a healthy bite out of the treble.
 
Aug 25, 2014 at 5:53 PM Post #9 of 19
While I partially agree, I think this is a bit misleading as to what the negative consequences are of reducing the bit depth. This spectrogram is averaging the energy from the quiet bits (which are mostly vocals) and the loud bits (which include the heavy percussion, where I suspect much of the high frequency energy comes from). The quiet bits don't contribute much energy, so the spectrogram for this whole passage isn't unlike the spectrogram of just the loud passages, except shifted down a few dB because it's being divided by a longer sampling time.
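
A quick way to see why the long-term average is dominated by the loud sections is to average power, not dB. This sketch assumes equal-length quiet and loud segments and uses the broadband RMS levels quoted earlier as stand-ins (a real spectrogram does this per frequency bin). The function name is mine.

```python
import math

def mean_level_db(levels_db):
    """Average several equal-length segments in the power domain, return dB."""
    powers = [10 ** (level / 10.0) for level in levels_db]
    return 10.0 * math.log10(sum(powers) / len(powers))

quiet, loud = -44.8, -25.4
avg = mean_level_db([quiet, loud])

print(f"average of quiet+loud: {avg:.2f} dB")
print(f"drop relative to loud: {loud - avg:.2f} dB")
```

The average sits only about 3 dB under the loud segment, so the quiet passage barely registers in the plot even though it is where the quantization noise is most audible.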
 
What happens is that the quiet bits sound like they're really obscured by tape hiss (i.e., quantization noise), while the louder bits really aren't affected as much (notably, because they remain some 30-40dB above the quantization noise).
 
I don't have Audacity here at work, but perhaps what I ought to do is chop the track up into a few select clips: A) just a quiet passage, B) just a loud passage, and C) a transition from quiet to loud. That way other folks here can try their own set of tests.
 
Anyone have a good classical track with wild swings in dynamic range?
 
EDIT: Of course! The Telarc 1812 overture is a classical recording that spectacularly fails the 10bit test. Any other suggestions?
 
Cheers
 
Aug 25, 2014 at 6:53 PM Post #10 of 19
Robert Stuart (Meridian) has done a detailed analysis of how many bits and what sampling rate are really needed for (a) absolute minimum and (b) audible transparency.
 
https://www.meridian-audio.com/w_paper/Coding2.PDF
 
In summary:
Bare minimum: 11 bits noise shaped, 52 kHz.
Audibly transparent: 14.5 bits (noise shaped) / 20 bits (rectangular), 58 kHz.
 
(When reading the above paper, note that the figures and graphs referenced in the text are all embedded at the bottom of the paper. I suggest opening two copies side by side so you can see the figures while reading the text.)
 
Aug 25, 2014 at 7:05 PM Post #11 of 19
Thanks for sharing the paper. Those numbers seem to be in good agreement with the experiment I did here during a discussion regarding whether or not 16bit was absolutely, 100%, completely perfect.
 
Originally Posted by ab initio
 
 
This is certainly an interesting example. Unfortunately I don't own the track and I can't exactly analyse it through youtube. The question is, are you hearing the noise floor of 16bit digital audio, or are you hearing the noise floor of the studio it was recorded in, or perhaps the noise floor of the microphones or other recording equipment used? I'm not sure how we'd find out.
With modern dither the noise floor of 16bit audio is even lower, so I wonder what it would sound like if that track was recorded today.
 
Out of interest, I used Audacity to generate a 30 second 16bit WAV of silence. I'm not sure whether Audacity applied dither to this, and if so I don't know whether the dither in Audacity is a good one anyway, but I thought I'd share the file for those interested.
 
https://www.dropbox.com/s/de46xlmsjx9arr6/30%20seconds%20of%20silence.wav
 
Playing this file on my Denon receiver and AKG headphones, I have to turn the amp up to -10dB before I can just about hear noise. However, I can't tell you whether that noise is the noise floor of the track or the noise floor of something else in my system. I'd be interested to see how this track sounds to others.
 
What I can say though is that I tried listening to that King Crimson track (sadly through youtube) at that volume level, and while the quiet opening was listenable (fairly loud), the louder sections were almost painful.
 
EDIT: In an attempt to see if the noise was my system or 16bit, I generated another 30 seconds of silence and exported in 24bit. Even at 0dB on my receiver I can't hear noise with this one.
https://www.dropbox.com/s/uen66lni39xvo1j/24bit%20silence.wav

 
 
Hi kraken,
 
Thank you for the noise test files. Normally, to test if I could hear the 16 bit noise floor, I set my DAC to 16 bit mode and pause a song in winamp, so that I get the DAC's output through window's kmix which adds the dither. Now that I have those files, it's much more convenient (and I don't have to futz with settings on my Linux machine!). Here, I will use your tracks and compare them to listening to music.
 
Abstract:
Here, I'm going to ABX the empty files kraken uploaded, seeing if I can differentiate between 16bit and 24bit empty files. Then I'm going to listen to music and see if my eardrums are blown into my skull. Then I will repeat the ABX.
 
Materials and methods:
I listened from my CentOS Linux server, which has some extra fans running to keep my passively cooled RAID card and all the hard disks in the machine cool. I had a quieter listening environment in my old apartment when the server was in my bedroom and I could listen in my office without my much quieter laptop+usb dac. Here, I'm using the machine's motherboard line out into a Schiit Magni, and I'm using Paradox headphones with slight noise isolation. Software volumes are maxed. The Schiit volume is set very loud, at about 2 o'clock. You can try to estimate the resulting SPLs using the data from this fantastic thread here. I found that the diagram of the volume pot at the bottom of the post reflects the knob on the Magni. However, by direct comparison to the AKG 240s, I believe the sensitivity of the Paradox is lower than the sensitivity listed in the thread for the Fostex T50rp (my Paradox (modded T50rp) headphones are noticeably less sensitive than my pair of stock AKG240s, contrary to the listed data).
 
I played the files through foobar2000 using WINE on the CentOS system. Using the ABX tool, I tested whether I could differentiate between the 16bit and 24bit empty files at a specific volume setting. 30ish trials were conducted per ABX test. Then, after the ABX test, I listened to King Crimson's Lizard without adjusting the volume.
 
Next, I listened to the Telarc recording of Tchaikovsky's 1812 overture (adjusting the volume lower to accommodate the cannon blasts). Afterward, I repeated the ABX of the 16bit and 24bit empty files at this lower volume setting.
 
 
Results:
Here is my foobar ABX to show whether or not I can likely discriminate between the two files: 
 
foo_abx 1.3.4 report
foobar2000 v1.2.9
2014/05/05 23:40:48
 
File A: Z:\home\r*n\Downloads\30 seconds of silence (1).wav
File B: Z:\home\r*n\Downloads\24bit silence.wav
 
23:40:48 : Test started.
23:40:58 : 01/01  50.0%
23:41:10 : 02/02  25.0%
23:41:16 : 03/03  12.5%
23:41:19 : 04/04  6.3%
23:41:23 : 05/05  3.1%
23:41:27 : 06/06  1.6%
23:41:36 : 07/07  0.8%
23:41:43 : 08/08  0.4%
23:41:47 : 09/09  0.2%
23:41:51 : 10/10  0.1%
23:41:59 : 11/11  0.0%
23:42:04 : 12/12  0.0%
23:42:17 : 13/13  0.0%
23:42:22 : 14/14  0.0%
23:42:25 : 15/15  0.0%
23:42:32 : 16/16  0.0%
23:42:35 : 17/17  0.0%
23:42:40 : 18/18  0.0%
23:42:45 : 19/19  0.0%
23:42:47 : 20/20  0.0%
23:42:51 : 21/21  0.0%
23:42:54 : 22/22  0.0%
23:42:58 : 23/23  0.0%
23:43:02 : 24/24  0.0%
23:43:05 : 25/25  0.0%
23:43:08 : 26/26  0.0%
23:43:12 : 27/27  0.0%
23:43:15 : 28/28  0.0%
23:43:18 : 29/29  0.0%
23:43:28 : 30/30  0.0%
23:43:39 : Test finished.
 
 ---------- 
Total: 30/30 (0.0%)
 
 
Without touching the volume settings, I played King Crimson's Lizard (16/44.1). According to ReplayGain, the track has a peak level of 0.828 (~ -2dB) from an album with a peak of 0.99 (~0 dB). The noise in the intro sounds louder than the dither noise in the empty 16bit file, and I'm of the opinion it is tape noise (the album was recorded in 1970). At this volume setting it's definitely loud and not something I do every day, but sometimes I want to play my music loud, feel the bass, and get lost for half an hour. This is not the first time I've listened to this song at this volume.
 
Another track I'm fond of is the Telarc recording of the 1812 overture (24/88.2). This is a more modern recording, and the background noise is quite low; I believe it is due to the room noise picked up by the microphones. Here, the cannons are a bit louder, especially at the end, so to relive the climax of Caddyshack, I turned the volume down a bit. According to ReplayGain, the track and album peak level is 0.999941 ( ~0 dB ). Here the Magni was set at 12:30. Afterward, I repeated the ABX at this lower volume setting:
foo_abx 1.3.4 report
foobar2000 v1.2.9
2014/05/06 00:21:24
 
File A: Z:\home\r*n\Downloads\30 seconds of silence (1).wav
File B: Z:\home\r*n\Downloads\24bit silence.wav
 
00:21:24 : Test started.
00:21:38 : 01/01  50.0%
00:21:45 : 02/02  25.0%
00:21:52 : 03/03  12.5%
00:22:02 : 04/04  6.3%
00:22:12 : 05/05  3.1%
00:22:19 : 06/06  1.6%
00:22:28 : 07/07  0.8%
00:22:38 : 08/08  0.4%
00:22:42 : 09/09  0.2%
00:22:45 : 10/10  0.1%
00:22:57 : 11/11  0.0%
00:23:08 : 12/12  0.0%
00:23:25 : 13/13  0.0%
00:23:29 : 14/14  0.0%
00:23:40 : 15/15  0.0%
00:23:49 : 16/16  0.0%
00:25:06 : 17/17  0.0%
00:25:23 : 18/18  0.0%
00:25:29 : 19/19  0.0%
00:25:44 : 20/20  0.0%
00:25:56 : 21/21  0.0%
00:26:08 : 22/22  0.0%
00:26:12 : 23/23  0.0%
00:26:23 : 23/24  0.0%
00:26:27 : 24/25  0.0%
00:26:37 : 25/26  0.0%
00:26:49 : 26/27  0.0%
00:27:01 : 27/28  0.0%
00:29:50 : 27/29  0.0%
00:30:08 : 28/30  0.0%
00:30:37 : 29/31  0.0%
00:30:47 : Test finished.
 
 ---------- 
Total: 29/31 (0.0%)
 
Conclusions:
I just wanted to point out that there are some specific, worst-case, cherry-pickable examples, where one might find the noise floor detectable and be able to listen to a track at that volume setting. If King Crimson's LIzard were recorded in a modern studio, it is plausible that the 16bit/44.1k dither noise floor could be detectable during the quietest parts of the track. It would take a track from an album with big swings in dynamics, and would only be detectable in the quietest passages while listening at otherwise very loud levels.
 
I'm not arguing for folks to buy hires audio. I'm asking you guys of the sound science forum to make more rigorous arguments, or to use the appropriate qualifiers on your statements, e.g., "for all practical purposes, 24 bit is unnecessary for listening" is a much better statement than "It's impossible for anyone to hear beyond 16 bit/44.1kHz". Even though redbook CD is dam near perfect for just about all audio playback situations, it only takes one single counter example, where it's slightly less than perfect to make the absolutist blanket statement false.
 
Let me clarify how hard the noise floor of the 16 bit track is to hear: It's really hard to hear. I didn't notice the noise until I compared it directly against the 24 bit file using the instant ABX switcher. Also, it is pretty quiet here in this small midwestern city, in the middle of the night, in my office which is half underground, in my house which is set back up and off the street. While doing the ABX, I had to pause and wait for a train to pass, and a plane to pass. It's not worth worrying about. In both tracks that I listened to, the recording noise exceeds the noise floor of the file.
 
Cheers
 
Aug 26, 2014 at 5:49 AM Post #12 of 19
Regarding the test above, note that Audacity uses buggy noise shaping by default when exporting 16-bit stereo files. This problem does not affect mono files (like the above "30 seconds of silence" sample), where the noise shaping does work correctly, but the default stereo 16-bit output from Audacity has an ENOB of only about 13-14. The bug may or may not have been fixed already (I do not normally use Audacity, so I do not check it regularly), but I know it definitely existed for years.
 
In any case, with the setup you used, the SPL at 0 dBFS was probably about 110 dB (assuming that your - unspecified ? - DAC outputs 2 Vrms at 0 dBFS, the Magni really does have a taper function like shown in xnor's post and thus an overall gain of 14 + -10 = 4 dB at 2 o'clock, and the sensitivity of your headphones is 100 dB/V, 3 dB worse than that of the K240). 100 dB down from that with the noise shaped dither (A-weighted at a bandwidth of 15 kHz) is 10 dB noise SPL which might be (barely) audible under very quiet listening conditions. That is, assuming that the noise is not actually made worse by unknown software or hardware problems that could for example fold back the near-ultrasonic shaped noise into the audio band.
 
However, hearing noise with actual music played at the same time can be significantly more difficult. Even when the music has a level of only -60 dBFS, its SPL would be 50 dB, and 40 dB above that of the noise. Of course, because of the different spectral distribution, it does need to be at a significantly lower level than the music to be really masked (contrary to what others seem to assume above, if there is music playing at -60 dBFS, it will not make noise at -61 dBFS inaudible), but 40 dB is likely to be enough.
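
The arithmetic above can be written out as a quick sanity check. This is only a sketch of the estimate as stated in this post: the 2 Vrms DAC output, the 4 dB net amplifier gain and the 100 dB/V headphone sensitivity are the post's own assumptions, and the function name `spl_at_dbfs` is made up for illustration.

```python
import math

def spl_at_dbfs(level_dbfs, dac_vrms=2.0, amp_gain_db=4.0, sensitivity_db_per_v=100.0):
    """Estimated SPL (dB) for a signal at `level_dbfs`, given the playback chain.

    dac_vrms             : DAC output voltage at 0 dBFS (assumed 2 Vrms)
    amp_gain_db          : net amplifier gain at the chosen volume setting (assumed 14 - 10 = 4 dB)
    sensitivity_db_per_v : headphone sensitivity in dB SPL at 1 Vrms (assumed 100 dB/V)
    """
    dac_dbv = 20.0 * math.log10(dac_vrms)  # 2 Vrms is about +6 dBV
    return sensitivity_db_per_v + dac_dbv + amp_gain_db + level_dbfs

print(f"0 dBFS peak:          {spl_at_dbfs(0):.1f} dB SPL")
print(f"music at -60 dBFS:    {spl_at_dbfs(-60):.1f} dB SPL")
print(f"noise 100 dB down:    {spl_at_dbfs(-100):.1f} dB SPL")
print(f"0 dBFS, 1.5 Vrms DAC: {spl_at_dbfs(0, dac_vrms=1.5):.1f} dB SPL")
```

Changing `dac_vrms` shows how much the estimate moves with a different source: a 1.5 Vrms DAC comes out about 2.5 dB lower than a 2 Vrms one.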
 
I have some older samples that compare music quantized to various bit depths here. Of course, the sample used makes a difference (I can create more files if you post a better 24-bit one), but it is worth trying to get an idea what various levels of noise sound like.
 
Aug 26, 2014 at 6:09 AM Post #13 of 19
I use a Modi with an RMS output of 1.5 V, so max SPL should have been less than 110 dB. Maybe 106-107 dB peak SPL on the first track and another 2-3 dB down for the 1812 Overture. Naturally, listening conditions were somewhat favorable, which I believe I pointed out.
 
EDIT/CORRECTION: While I have a Modi and normally do all my listening tests with this DAC, I did not have the Modi working with the CentOS machine I was using for the tests. That means I was using the onboard sound (Realtek ALC892) on my machine's motherboard, which has an RMS output of 2 V. Therefore your estimate is probably reasonably accurate.
 
Cheers
 
Aug 26, 2014 at 6:34 AM Post #14 of 19
Well, being able to hear noise at ~5 dBA SPL - and even less under 10-12 kHz because of the noise shaping - might still be possible (although for most people under normal conditions, 10 dB is already "silent"), but I wonder if there were possibly any problems with the playback.
 
Aug 26, 2014 at 11:14 AM Post #15 of 19
Have you read the paper Don Hills linked to above? The paper looks like it goes to great lengths to relate the thresholds of human detection to the noise inherent in PCM audio.
 
Everything in my test is within the realm of physical possibility and supported by as much evidence as I can provide. You can download the test files from kraken here. I've stated all of the equipment that I used to conduct the test, and the components used in the test are all specified to be capable of resolving such information.
 
I ought to make some plots of the spectral energy density and calculate some statistics so we can get a better idea what the noise in the files actually looks like. I don't recall how @kraken applied dithering to the files and with what care.
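
As a starting point for those statistics, here is a rough sketch of what plain 16-bit dither on digital silence should measure. It assumes flat TPDF dither of ±1 LSB peak with no noise shaping, which may well not be how the actual test files were made, and it takes 0 dBFS as the RMS of a full-scale square wave. The function names are made up for illustration.

```python
import math
import random

def tpdf_dithered_silence(n_samples, seed=0):
    """Quantize digital silence to integer LSB steps after adding TPDF dither (+/-1 LSB peak)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_samples):
        # Sum of two independent uniform values in [-0.5, 0.5) LSB gives a triangular PDF.
        dither = (rng.random() - 0.5) + (rng.random() - 0.5)
        out.append(math.floor(dither + 0.5))  # round to the nearest integer step
    return out

def rms_dbfs(samples, bits=16):
    """RMS level in dB relative to the full-scale amplitude of a signed `bits`-bit sample."""
    full_scale = 2 ** (bits - 1)
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square / full_scale ** 2)

noise = tpdf_dithered_silence(200_000)
print(f"dithered 16-bit silence: {rms_dbfs(noise):.1f} dBFS RMS")
```

Under these assumptions the result lands at roughly -96 dBFS (about 0.5 LSB RMS), spread evenly across the band. Noise-shaped dither would move much of that energy toward the top of the spectrum, so a spectral plot of the real files is still needed.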
 
What mechanism do you propose would artificially increase the audibility of the dither noise? Finally, what measurements would you recommend to eliminate playback chain errors from your suspicions? I have either the onboard sound of my laptop or my desktop, or I can use a scientific-grade data acquisition system to make measurements.
 
 
 However, hearing noise with actual music played at the same time can be significantly more difficult. Even when the music has a level of only -60 dBFS, its SPL would be 50 dB, and 40 dB above that of the noise. Of course, because of the different spectral distribution, it does need to be at a significantly lower level than the music to be really masked (contrary to what others seem to assume above, if there is music playing at -60 dBFS, it will not make noise at -61 dBFS inaudible), but 40 dB is likely to be enough.

 
It is important to note that I never claimed to hear the 16 bit dither noise simultaneously with the music playing. I turned up the volume until I could detect the 16 bit dither noise and differentiate between it and the 24 bit file. Then I listened to music at that volume setting. I listened to slightly louder music and adjusted the volume down a little. Then, after listening to the music at this reduced volume setting, I made sure that I could still reliably differentiate between the noise (16 bit dither) and noise-free (24 bit) files.
 
I have done my best to put all of this into proper context so that I don't get misconstrued as arguing that 16 bit quantization noise is a BIG, NOTICEABLE problem... I'm not. I tried to sum it up in my conclusions:
 Let me clarify how hard the noise floor of the 16 bit track is to hear: It's really hard to hear. I didn't notice the noise until I compared it directly against the 24 bit file using the instant ABX switcher. Also, it is pretty quiet here in this small midwestern city, in the middle of the night, in my office which is half underground, in my house which is set back up and off the street. While doing the ABX, I had to pause and wait for a train to pass, and a plane to pass. It's not worth worrying about. In both tracks that I listened to, the recording noise exceeds the noise floor of the file.

I had to wait for a plane to pass by in the middle of the night in a small midwestern town that doesn't have a (semi-)major airport within 40 miles. I needed pretty quiet conditions to just barely correctly ID the 16 bit noise. Look at the time stamps on my ABX tests. I did these in the middle of the night. In May (not many noisy insects).
 
However, considering that all of these results are within the understood limits of human hearing and reasonable given the specifications of the hardware used, I conclude that it is possible to hear the noise from dithered 16 bit PCM at listenable volume settings.
 
Cheers
 
