The straight dope on 24 bit recordings.
Apr 8, 2015 at 2:05 AM Thread Starter Post #1 of 27

bigshot

Headphoneus Supremus
Please, everybody behave yourselves.
 
An oldie but goodie from Gregorio that doesn't deserve to be lost in the archive.
 
It seems to me that there is a lot of misunderstanding regarding what bit depth is and how it works in digital audio. This misunderstanding exists not only in the consumer and audiophile worlds but also in some education establishments and even among some professionals. It comes from supposition about how digital audio works rather than how it actually works. It's easy to see in a photograph the difference between a low bit depth image and one with a higher bit depth, so it's logical to suppose that a higher bit depth in audio also means better quality. This supposition is further reinforced by the fact that the term 'resolution' is often applied to bit depth, and obviously more resolution means higher quality. So 24bit is Hi-Rez audio and 24bit contains more data, therefore higher resolution and better quality. All completely logical supposition, but I'm afraid it is not entirely in line with the actual facts of how digital audio works. I'll try to explain:

When recording, an Analogue to Digital Converter (ADC) reads the incoming analogue waveform and measures it a certain number of times per second (1*). In the case of CD there are 44,100 measurements made per second (the sampling frequency). These measurements are stored in the digital domain in the form of computer bits. The more bits we use, the more accurately we can measure the analogue waveform. This is because each bit can only store two values (0 or 1); to get more values we do the same with bits as we do in normal counting. IE. once we get to 9, we have to add another column (the tens column), and we can keep adding columns ad infinitum for 100s, 1000s, 10,000s, etc. The exact same is true for bits, but because we only have two values per bit (rather than 10) we need more columns; each column (or additional bit) doubles the number of values we have available. IE. 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024 .... If these numbers appear a little familiar it is because all computer technology is based on bits, so these numbers crop up all over the place. In the case of 16bit we have 65,536 different values available. The problem is that an analogue waveform is constantly varying. No matter how many times a second we measure the waveform or how many bits we use to store the measurement, there are always going to be errors. These errors in quantifying the value of a constantly changing waveform are called quantisation errors. Quantisation errors are bad; they cause distortion in the waveform when we convert back to analogue and listen to it.
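As a rough illustration of quantisation at different bit depths, here is a minimal Python sketch (NumPy assumed; the 440Hz test tone and the bit depths chosen are purely illustrative, not from the original post):

```python
import numpy as np

fs = 44100                                   # CD sampling frequency: 44,100 measurements/sec
t = np.arange(fs) / fs                       # one second of sample times
x = 0.5 * np.sin(2 * np.pi * 440 * t)        # stand-in for the incoming analogue waveform

def quantise(signal, bits):
    """Round each sample to the nearest of the 2**bits available values."""
    step = 2.0 / (2 ** bits)                 # size of one step across a -1..+1 full-scale range
    return np.round(signal / step) * step

for bits in (4, 8, 16):
    error = x - quantise(x, bits)            # the quantisation error at this bit depth
    print(f"{bits} bit: {2 ** bits} values, peak error {np.max(np.abs(error)):.6f}")
```

Each extra bit doubles the number of values and halves the peak error, which is the point the next paragraphs build on.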

So far so good; what I've said until now would agree with the supposition of how digital audio works. I seem to have agreed that more bits = higher resolution. True; however, where the facts start to diverge from the supposition is in understanding the result of this higher resolution. Going back to what I said above, each time we increase the bit depth by one bit, we double the number of values we have available (EG. 4bit = 16 values, 5bit = 32 values). If we double the number of values, we halve the size of the quantisation errors. Still with me? Because now we come to the whole nub of the matter. There is in fact a perfect solution to quantisation errors which completely (100%) eliminates quantisation distortion. The process is called 'dither' and is built into every ADC on the market.

Dither: Essentially, during the conversion process a very small amount of white noise is added to the signal; this has the effect of completely randomising the quantisation errors. Randomisation in digital audio, once converted back to analogue, is heard as pure white (un-correlated) noise. The result is that we have an absolutely perfect measurement of the waveform (2*) plus some noise. In other words, by dithering, all the measurement errors have been converted to noise (3*).
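A small sketch of the idea (Python/NumPy again, illustrative values only): a quiet tone smaller than one quantisation step is simply rounded away without dither, but survives as tone-plus-noise when TPDF dither is added before quantisation:

```python
import numpy as np

fs = 44100
t = np.arange(fs) / fs
quiet = 0.001 * np.sin(2 * np.pi * 440 * t)   # a tone at roughly -60dBFS
rng = np.random.default_rng(0)

bits = 8                                      # deliberately low bit depth so the effect is obvious
step = 2.0 / (2 ** bits)                      # one quantisation step (the tone is below half a step)

undithered = np.round(quiet / step) * step    # rounds to exact silence: the tone is lost entirely

# TPDF dither: two uniform random values summed give a triangular distribution of +/- one step
tpdf = (rng.uniform(-0.5, 0.5, t.shape) + rng.uniform(-0.5, 0.5, t.shape)) * step
dithered = np.round((quiet + tpdf) / step) * step   # the tone is preserved, buried in benign noise

print("undithered RMS:", np.sqrt(np.mean(undithered ** 2)))   # 0.0
print("dithered RMS:  ", np.sqrt(np.mean(dithered ** 2)))     # non-zero: tone plus white noise
```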

Hopefully you're still with me, because we can now go on to precisely what happens with bit depth. Going back to the above, when we add a 'bit' of data we double the number of values available and therefore halve the size of the quantisation errors. If we halve the quantisation errors, the result (after dithering) is a perfect waveform with half the amount of noise. To phrase this using audio terminology, each extra bit of data moves the noise floor down by 6dB (half). We can turn this around and say that each bit of data provides 6dB of dynamic range (*4). Therefore 16bit x 6dB = 96dB. This 96dB figure defines the dynamic range of CD. (24bit x 6dB = 144dB.)
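The 6dB-per-bit figure is just 20 x log10(2), about 6.02dB; a couple of lines to check the arithmetic (a hypothetical helper, not from the post):

```python
import math

def dynamic_range_db(bits):
    """Approximate dynamic range of an ideal, dithered n-bit system: ~6.02dB per bit."""
    return 20 * math.log10(2 ** bits)

print(round(dynamic_range_db(16)))                           # ~96dB (CD)
print(round(dynamic_range_db(24)))                           # ~144dB
print(round(dynamic_range_db(24) - dynamic_range_db(16)))    # the extra 48dB from 8 more bits
print(round(60 / (20 * math.log10(2))))                      # ~10 bits cover a 60dB recording
```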

So, 24bit does add more 'resolution' compared to 16bit but this added resolution doesn't mean higher quality, it just means we can encode a larger dynamic range. This is the misunderstanding made by many. There are no extra magical properties, nothing which the science does not understand or cannot measure. The only difference between 16bit and 24bit is 48dB of dynamic range (8bits x 6dB = 48dB) and nothing else. This is not a question for interpretation or opinion, it is the provable, undisputed logical mathematics which underpins the very existence of digital audio.

So, can you actually hear any benefits of the larger (48dB) dynamic range offered by 24bit? Unfortunately, no, you can't. The entire dynamic range of some types of music is sometimes less than 12dB. The recordings with the largest dynamic range tend to be symphony orchestra recordings, but even these virtually never have a dynamic range greater than about 60dB. All of these are well inside the 96dB range of the humble CD. What is more, modern dithering techniques (see 3 below) perceptually enhance the dynamic range of CD by moving the quantisation noise out of the frequency band where our hearing is most sensitive. This gives a perceivable dynamic range for CD of up to 120dB (150dB in certain frequency bands).

You have to realise that when playing back a CD, the amplifier is usually set so that the quietest sounds on the CD can just be heard above the noise floor of the listening environment (sitting room or cans). So if the average noise floor for a sitting room is, say, 50dB (or 30dB for cans), then the dynamic range of the CD starts at this point and is capable of 96dB (at least) above the room noise floor. If the full dynamic range of a CD were actually used (on top of the noise floor), the home listener (if they had the equipment) would almost certainly cause themselves severe pain and permanent hearing damage. If this is the case with CD, what about 24bit Hi-Rez? If we were to use the full dynamic range of 24bit and a listener had the equipment to reproduce it all, there is a fair chance, depending on age and general health, that the listener would die instantly. The most fit would probably just go into a coma for a few weeks and wake up totally deaf. I'm not joking or exaggerating here; think about it: 144dB plus, say, 50dB for the room's noise floor. 180dB is the figure often quoted for sound pressure levels powerful enough to kill, and some people have been killed by 160dB. However, this is unlikely to happen in the real world, as no DACs on the market can output the 144dB dynamic range of 24bit (so they are not true 24bit converters), almost no one has a speaker system capable of 144dB dynamic range, and, as said before, around 60dB is the most dynamic range you will find on a commercial recording.
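Spelling out that back-of-the-envelope playback arithmetic (the 50dB room noise floor is just the illustrative figure from the paragraph above):

```python
room_noise_floor = 50      # dB SPL, the example sitting-room figure (around 30dB for cans)
lethal_ballpark = 180      # dB SPL, the figure often quoted as powerful enough to kill

for bits in (16, 24):
    peak_spl = room_noise_floor + bits * 6        # quietest sounds sit at the room noise floor
    print(f"{bits} bit fully used: peaks at ~{peak_spl} dB SPL "
          f"({'above' if peak_spl > lethal_ballpark else 'below'} the ~{lethal_ballpark} dB mark)")
```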

So, if you accept the facts, why does 24bit audio even exist? What's the point of it? There are some useful applications for 24bit when recording and mixing music. In fact, when mixing it's pretty much the norm now to use 48bit resolution. The reason it's useful is due to summing artefacts, multiple processing in series and, mainly, headroom. In other words, 24bit is very useful when recording and mixing but pointless for playback. Remember, even a recording with 60dB dynamic range is only using 10bits of data; the other 6bits on a CD are just noise. So, the difference in the real world between 16bit and 24bit is an extra 8bits of noise.

I know that some people are going to say this is all rubbish, and that “I can easily hear the difference between a 16bit commercial recording and a 24bit Hi-Rez version”. Unfortunately, you can't, it's not that you don't have the equipment or the ears, it is not humanly possible in theory or in practice under any conditions!! Not unless you can tell the difference between white noise and white noise that is well below the noise floor of your listening environment!! If you play a 24bit recording and then the same recording in 16bit and notice a difference, it is either because something has been 'done' to the 16bit recording, some inappropriate processing used or you are hearing a difference because you expect a difference.

G

1 = Actually these days the process of AD conversion is a little more complex, using oversampling (very high sampling frequencies) and only a handful of bits. Later in the conversion process this initial sampling is 'decimated' back to the required bit depth and sample rate.

2 = The concept of the perfect measurement or of recreating a waveform perfectly may seem like marketing hype. However, in this case it is not. It is in fact the fundamental tenet of the Nyquist-Shannon Sampling Theorem on which the very existence and invention of digital audio is based. From WIKI: “In essence the theorem shows that an analog signal that has been sampled can be perfectly reconstructed from the samples”. I know there will be some who will disagree with this idea, unfortunately, disagreement is NOT an option. This theorem hasn't been invented to explain how digital audio works, it's the other way around. Digital Audio was invented from the theorem, if you don't believe the theorem then you can't believe in digital audio either!!

3 = In actual fact these days there are a number of different types of dither used during the creation of a music product. Most are still based on the original TPDF (triangular probability density function) dither, but some are a little more 'intelligent' and re-distribute the resulting noise to less noticeable areas of the hearing spectrum. This is called noise-shaped dither.

4 = Dynamic range is the range of volume between the noise floor and the maximum volume.
 
Apr 8, 2015 at 6:37 AM Post #3 of 27
 
So, can you actually hear any benefits of the larger (48dB) dynamic range offered by 24bit? Unfortunately, no, you can't. The entire dynamic range of some types of music is sometimes less than 12dB. The recordings with the largest dynamic range tend to be symphony orchestra recordings, but even these virtually never have a dynamic range greater than about 60dB. All of these are well inside the 96dB range of the humble CD. What is more, modern dithering techniques (see 3 below) perceptually enhance the dynamic range of CD by moving the quantisation noise out of the frequency band where our hearing is most sensitive. This gives a perceivable dynamic range for CD of up to 120dB (150dB in certain frequency bands).

 
While I do agree that the dynamic range of CDs is more than enough for most music at sane listening levels, and even fewer bits would be sufficient much of the time, it is not entirely right to calculate the number of bits required for transparent reproduction directly from the musical dynamic range. In other words, encoding a track with 60 dB dynamic range at 10 bits would only be enough to ensure that the quantization noise does not exceed the level of the music, but not to be always inaudible.

 
That is because the musical dynamic range is calculated from the overall levels of the signal, but individual critical bands can be much lower than the minimum level measured that way, and then the noise may not be masked in those bands (for example, if only loud bass is playing at a given time, then hiss in the mid to high range is still easily audible). The noise level also needs to be lower than the signal level in each band by >10 dB (the actual amount depends on the properties of the signal, such as whether it is tonal or noise-like) to get masked.
 
Nevertheless, the extra 36 dB dynamic range available at 16-bit resolution is likely to be enough to account for these factors.
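A back-of-the-envelope sketch of that argument (the 10dB masking margin and the 20dB "quiet band" offset below are illustrative assumptions, not measured values):

```python
track_dynamic_range = 60   # dB: overall level span of the music in the example
masking_margin = 10        # dB the noise must sit below the in-band signal to be masked (rough rule)
band_deficit = 20          # dB an individual critical band might sit below the overall level

for bits in (10, 16):
    noise_floor = -round(bits * 6.02)                  # dithered quantisation noise, dBFS
    quiet_band = -track_dynamic_range - band_deficit   # a sparsely occupied band in a quiet passage
    margin = quiet_band - masking_margin - noise_floor
    verdict = "masked" if margin >= 0 else "potentially audible"
    print(f"{bits} bit: noise at {noise_floor} dBFS, margin in that band {margin:+d} dB -> {verdict}")
```

With these rough numbers, 10 bits leaves the noise exposed in a quiet band while 16 bits keeps it masked, which is the point being made above.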
 
Apr 8, 2015 at 12:46 PM Post #4 of 27
I think it's safe to say that many of the tracks where people are touting the benefits of 24bits could be reduced to < 16bits without any audible loss of content. It's interesting to go back in time and read reviews and expositions about Redbook and see all kinds of stories about people blowing up speakers by accident and the incredible DR of the format across the audible spectrum, yet today it's "not enough."
 
Apr 8, 2015 at 1:20 PM Post #5 of 27
It's human nature to think you might need "a little bit more" "just to be safe". But in the case of redbook, the whole format was designed to already be well into overkill. They didn't just look to capture enough sound to make music sound good (which an LP does)... They decided to capture the entire range of human hearing. Then they went and added "a little bit more" "just to be safe". So if you continue down that road, all you are doing is just piling up "a little bit more" on top of the last "a little bit more" for no practical purpose.
 
We were promised "perfect sound". They delivered it to us.
 
...then we wanted "a little bit more".
 
Apr 8, 2015 at 5:41 PM Post #6 of 27
   
While I do agree that the dynamic range of CDs is more than enough for most music at sane listening levels, and even fewer bits would be sufficient much of the time, it is not entirely right to calculate the number of bits required for transparent reproduction directly from the musical dynamic range. In other words, encoding a track with 60 dB dynamic range at 10 bits would only be enough to ensure that the quantization noise does not exceed the level of the music, but not to be always inaudible.
 
That is because the musical dynamic range is calculated from the overall levels of the signal, but individual critical bands can be much lower than the minimum level measured that way, and then the noise may not be masked in those bands (for example, if only loud bass is playing at a given time, then hiss in the mid to high range is still easily audible). The noise level also needs to be lower than the signal level in each band by >10 dB (the actual amount depends on the properties of the signal, such as whether it is tonal or noise-like) to get masked.
 
Nevertheless, the extra 36 dB dynamic range available at 16-bit resolution is likely to be enough to account for these factors.

 
And don't forget that there is extra headroom thanks to dither and noise shaping.
 
Apr 8, 2015 at 5:55 PM Post #7 of 27
Here is a nice demonstration of the effect of dithering and noise shaping on digital audio: http://www.audiocheck.net/audiotests_dithering.php
 
Some of the samples use 8-bit word length to make the effect much more audible than with 16 bit, so bear in mind that this is how 16 bit would sound when attenuated an extra 48dB. So when the voice says "my voice is now 12dB down from full scale", it is equivalent to 60dB down in 16 bit.
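That equivalence is just the ~6dB-per-bit shift between the two word lengths; a two-line check (illustrative only):

```python
import math

shift = (16 - 8) * 20 * math.log10(2)    # the 8-bit noise floor sits ~48dB higher than 16-bit's
print(round(shift), round(12 + shift))   # 48, 60: a voice 12dB down in 8 bit ~= 60dB down in 16 bit
```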
 
Apr 9, 2015 at 10:52 AM Post #8 of 27
 

While I do agree that the dynamic range of CDs is more than enough for most music at sane listening levels, and even fewer bits would be sufficient much of the time, it is not entirely right to calculate the number of bits required for transparent reproduction directly from the musical dynamic range. In other words, encoding a track with 60 dB dynamic range at 10 bits would only be enough to ensure that the quantization noise does not exceed the level of the music, but not to be always inaudible.

That is because the musical dynamic range is calculated from the overall levels of the signal, but individual critical bands can be much lower than the minimum level measured that way, and then the noise may not be masked in those bands (for example, if only loud bass is playing at a given time, then hiss in the mid to high range is still easily audible). The noise level also needs to be lower than the signal level in each band by >10 dB (the actual amount depends on the properties of the signal, such as whether it is tonal or noise-like) to get masked.

Nevertheless, the extra 36 dB dynamic range available at 16-bit resolution is likely to be enough to account for these factors.


If you're taking the dynamic range of the music per spectral band, you also need to take the noise floor per spectral band, which would be much lower than the overall noise floor.
 
Apr 9, 2015 at 11:49 AM Post #9 of 27
If you're taking the dynamic range of the music per spectral band, you also need to take the noise floor per spectral band, which would be much lower than the overall noise floor.

 
I do not recall specifically saying otherwise, nor anywhere implying that the noise floor in a single band would be equal to the total noise level (which is obviously impossible if not all noise is in that band). In any case, if the spectrum of the signal differs from that of the noise, the SNR at some frequencies is always worse than the overall value calculated from the RMS levels.
 
Apr 14, 2015 at 12:53 PM Post #10 of 27
   
Quote:
[bigshot's original post quoted in full]


Thumbs up!
 
Apr 14, 2015 at 3:11 PM Post #11 of 27
Thanks, bigshot, for trying to clear the air regarding this totally overblown fixation on 24 bit recordings.
 
The way I like to explain things is as follows:
 
Think of the music on a recording as an item to be shipped in a box, and think of the bit depth as the dimensions of the box that is going to be used to ship the item. Assuming that the box is large enough to fit the item, increasing the size of the box has absolutely no effect on the size of the item going into the box. And assuming that the dynamic range of the music "fits" within a 16 bit recording, increasing the bit depth of the recording has no effect on the dynamic range of the music. In other words, the dynamic range of the music contained in a 16 bit file is EXACTLY the same as the dynamic range of the same music contained in a 24 bit or 32 bit file - EXACTLY THE SAME. Just as an item measuring 5"x5"x5" shipped in a box measuring 6"x6"x6" is EXACTLY the same as the same item shipped in a box measuring 10"x10"x10".
 
This simple little fact is nowhere to be found in any explanation of high resolution recordings published in any high end audio magazine or on any high end audio web site. Why? MONEY, that's why.
 
Apr 14, 2015 at 4:09 PM Post #13 of 27
Yes, "The truth is out there." I can't wait for someone to claim they need a 32 bit DAC.
 
