24bit vs 16bit, the myth exploded!
Jul 16, 2018 at 8:42 PM Post #4,981 of 7,175
The discussion around bit depth and sampling rates has been quite interesting. Being somewhat of a novice, there is one conceptual aspect of noise shaping which I don’t understand too well.

I follow the argument that 8bits and 4bits or even 1bit can fully reveal the analog signal, but with more noise. I also follow the logic that if we wanted say 4bits to have noise levels beneath the human hearing threshold (ie the same as 24bits), we would need a higher sampling rate than 44.1 to move the noise into the ultrasonic range.

What I don’t quite get is how noise shaping then works with 16/44. Firstly, I don’t get why noise shaping is required with 16bits when its noise floor is already around -96dB. Secondly, where does the noise-shaped signal go if the bandwidth is limited to 22.05kHz?
 
Jul 16, 2018 at 9:20 PM Post #4,982 of 7,175
The discussion around bit depth and sampling rates has been quite interesting. Being somewhat of a novice, there is one conceptual aspect of noise shaping which I don’t understand too well.

I follow the argument that 8bits and 4bits or even 1bit can fully reveal the analog signal, but with more noise. I also follow the logic that if we wanted say 4bits to have noise levels beneath the human hearing threshold (ie the same as 24bits), we would need a higher sampling rate than 44.1 to move the noise into the ultrasonic range.

What I don’t quite get is how noise shaping then works with 16/44. Firstly, I don’t get why noise shaping is required with 16bits when its noise floor is already around -96dB. Secondly, where does the noise-shaped signal go if the bandwidth is limited to 22.05kHz?


Scroll ahead to 8:42 in this video, where he gets into dithering:

 
Jul 16, 2018 at 10:00 PM Post #4,983 of 7,175
I understand dithering and the points made by Monty. The question I have is more around noise-shaped dither, particularly moving the energy into the higher frequencies when the bandwidth is limited to 22.05 kHz.

I appreciate that our hearing is less sensitive at higher frequencies, but even so, with the bandwidth limited to 22kHz, doesn't that extra energy have an effect? If not, why would noise-shaped 8bit require a higher bandwidth than 22kHz? More fundamentally, why is noise shaping beneficial at all for 16bits?
 
Jul 17, 2018 at 6:35 AM Post #4,984 of 7,175
I understand dithering and the points made by Monty. The question I have is more around noise-shaped dither, particularly moving the energy into the higher frequencies when the bandwidth is limited to 22.05 kHz.

I appreciate that our hearing is less sensitive at higher frequencies, but even so, with the bandwidth limited to 22kHz, doesn't that extra energy have an effect? If not, why would noise-shaped 8bit require a higher bandwidth than 22kHz? More fundamentally, why is noise shaping beneficial at all for 16bits?

14:09 in that video. And listen to what he said at 17:13!
 
Jul 17, 2018 at 8:11 AM Post #4,985 of 7,175
What I don’t quite get is how noise shaping then works with 16/44. [1] Firstly, I don’t get why noise shaping is required with 16bits when its noise floor is already around -96dB. [2] Secondly, where does the noise-shaped signal go if the bandwidth is limited to 22.05kHz?
[3] ... why would noise-shaped 8bit require a higher bandwidth than 22kHz?

1. TBH, in most cases it isn't. If I explain it in a little more detail perhaps that would help: With 16bit, the noise floor with (standard, TPDF) dither is about -92dB, as dither typically uses 1 LSB and un-dithered 16bit would actually have a noise floor of -98.08dB (16 x 6.02dB + 1.76dB). The vast majority of music has a dynamic range of around 48dB or less. Popular/non-acoustic genres will hit near 0dB numerous times, be relatively loud overall and require a relatively low output level on playback. The dither noise floor at -92dB is going to be 100 times or more below the noise floor of the recording and therefore, even at loud playback volumes, the dither noise floor is going to be completely inaudible. Even with classical and jazz, a -92dB noise floor is going to be completely inaudible in the vast majority of cases.

However, there is a potential set of circumstances where it *could* be audible. For example, the 1812 Overture (Tchaikovsky) has cannons near the end and it could be that they produce transient peaks say 18dB above any other peak in the rest of the overture. If you wanted to play back such a recording so that the rest of the overture (excluding the cannons) sounded roughly the same volume as other classical recordings, then you'd have to increase your playback level by say 18dB and our dither noise floor would therefore be 18dB higher (effectively at -74dBFS). Assuming you have a high quality playback system (capable of playing 18dB louder than normal), normally listen quite loudly and have a listening environment with a low noise floor, then potentially the dither noise floor could become audible. The 1812 Overture is an obvious example but there are other examples which are not so obvious. For example, a hard hit on an orchestral bass drum produces a large amount of energy. It's not obvious because much of that energy is below 50Hz, where our hearing is insensitive, and therefore it doesn't sound particularly loud, but we could have peaks up to as much as about 12dB higher than normal. All the above requires a quite extreme set of circumstances and only applies to a tiny number of recordings, because most recordings with such unusual peaks would have those peaks reduced (compressed/limited), so the recording is suitable for consumers with good equipment/listening environments rather than only for those with excellent equipment/environments. Having said all this, it's been standard mastering practice to apply noise-shaped dither to all 16bit releases for the last 20+ years, as it only takes about a minute to apply and then you're covered, regardless of ANY music and playback scenario.

2. That's not entirely fixed; it depends on the noise-shaping algorithm and there's sometimes some user (mastering engineer) adjustment available. In general though, the shaped dither noise starts ramping up from around 10kHz and is at its peak by about 17kHz. This deliberately coincides with human hearing; we're most sensitive at around 3kHz and have a roll-off in sensitivity starting around 5-7kHz and a steeper roll-off around 12-14kHz. With noise-shaping we don't get less dither noise, we get the same amount (or typically slightly more), so as far as RMS dither noise is concerned we've got a dither noise floor of say -90dB, but most of that noise is outside the range where our hearing is sensitive, giving us a perceived noise floor around -120dB. This graph of noise-shaping might help you visualise the situation:

[Image: X3.png]


The X-axis is frequency and the Y-axis represents relative dB. So with 16bit the "0dB" line represents about -96dB, and the fairly flat blue (ID=99) line covering it represents standard (TPDF) dither. The other "IDs" represent different user-selectable noise-shaping algorithms. You'll notice a couple of things: A. We're not actually losing any noise energy, just redistributing it. As we reduce it in one area of freqs, we must increase it elsewhere, so we end up with the same amount of RMS noise energy (exactly how this works is laid out in the Gerzon-Craven Noise-Shaping Theorem). B. From about 600Hz upwards, the shaping curve is roughly an inverse of the Fletcher-Munson equal loudness curves. For example, ID=16 (the strongest noise-shaping) gives us about 26dB less noise at 3kHz, i.e. a noise floor of about -122dB with 16bit (-96dB - 26dB), but by around 17kHz we've got about 30dB more noise, a noise floor of about -66dB (-96dB + 30dB). However, assuming perfect hearing, our sensitivity is down by about 50-60dB at 17kHz and down by over 100dB at 20kHz (where our redistributed noise peaks at around +36dB). Additionally, our sensitivity rolls off in the lower freqs starting from around 800Hz. Therefore, as far as human hearing is concerned, the noise-shaped dither noise floor of ID=16 would never sound higher than about -122dB.

3. Using my explanation and graph from point #2, let's substitute 16bit with 8bit. Our ID=99 (0dB) line now represents -48dB. ID=16 therefore represents a perceptual noise floor of -74dB (26dB lower than -48dB), while peak noise (at around 20kHz) would be at about -12dB (-48dB + about 36dB @ 20kHz). However, let's say for illustration purposes that we want a perceptual noise floor of say -92dB (roughly the same as standard dithered 16bit). First of all, we're going to need another algorithm, one that is 18dB more aggressive than ID=16, so that at peak hearing sensitivity (about 3kHz) it is removing 44dB of noise instead of about 26dB (-48dB - 44dB = -92dB). Unfortunately though, this means that the peak noise level (at about 20kHz) is likewise going to be about 18dB higher than ID=16: -48dB + 36dB + 18dB = +6dBFS, which is impossible. The solution would be to increase the sample rate, say double it to 88.2kS/s. With ID=16 the highest redistributed noise levels cover a 5kHz freq band (17kHz to 22kHz). With a sample rate of 88.2kS/s we could spread that same amount of redistributed noise energy over a much larger frequency band, a 27kHz band (5kHz + the additional 22.05kHz), and thereby significantly lower its level. ..... From all this, I hope you can see that the lower the noise floor we wish to achieve, the more dither noise has to be redistributed and, in addition, the fewer bits we have to play with, the higher the dither noise we've got to start with. That is why SACD, with just one bit plus a desired noise floor of about -120dB, needs a sample rate of 2.8 megahertz, to redistribute the massive amount of resultant noise.
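For anyone who wants to check the level arithmetic in points 1 to 3, here is a minimal Python sketch. The function names are made up for illustration, the cut and boost figures are the approximate values read off the graph for ID=16, and the TPDF figure assumes the usual textbook model (quantization error power of LSB²/12 plus dither power of LSB²/6), which lands at roughly -93dB, in the same ballpark as the "about -92dB" quoted above.

```python
import math

def quantization_floor_db(bits):
    """Theoretical un-dithered noise floor (full-scale sine reference)."""
    return -(6.02 * bits + 1.76)

def tpdf_dithered_floor_db(bits):
    """Assumed textbook model: TPDF dither triples the noise power."""
    return quantization_floor_db(bits) + 10 * math.log10(3)

def shaped_levels(flat_floor_db, cut_db, boost_db):
    """Return (perceptual floor near 3kHz, peak shaped noise near 20kHz)."""
    return flat_floor_db - cut_db, flat_floor_db + boost_db

# Rounded flat-dither floors used in the post (about 6dB per bit).
FLAT_16, FLAT_8 = -96.0, -48.0
# Approximate figures read off the graph for the ID=16 curve.
CUT_3K, BOOST_20K = 26.0, 36.0

print(round(quantization_floor_db(16), 2))         # -98.08
print(round(tpdf_dithered_floor_db(16), 1))
print(shaped_levels(FLAT_16, CUT_3K, BOOST_20K))   # (-122.0, -60.0)
print(shaped_levels(FLAT_8, CUT_3K, BOOST_20K))    # (-74.0, -12.0)

# 8bit with a shaper 18dB more aggressive than ID=16, as in point 3.
extra = 18.0
floor_8, peak_8 = shaped_levels(FLAT_8, CUT_3K + extra, BOOST_20K + extra)
print(floor_8, peak_8, "impossible" if peak_8 > 0 else "ok")
```

The last line reproduces the +6dBFS result: at 44.1kS/s there simply isn't room above the audible band to park that much redistributed noise.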

BTW, in the example I quoted previously (Lipshitz & Vanderkooy), they achieved a noise floor of -120.4dB with 8 bits and they didn't follow the Fletcher-Munson curve, they simply reduced the noise by 72dB throughout the freq band of 0Hz-20kHz. Obviously that would result in a lot of noise needing to be redistributed (and all of it above 20kHz), so they used a sample rate of 176.4kS/s (44.1kS/s x 4) with the redistributed noise (at -19dBFS) occupying the 20kHz-88.2kHz audio band.
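The "redistributed, not removed" idea from point A can also be tried numerically. What follows is only a rough sketch, assuming NumPy and a plain first-order error-feedback shaper, which is far cruder than the psychoacoustic curves in the graph or the Lipshitz & Vanderkooy design; the helper names are hypothetical. It requantizes a 1kHz tone to 8 bits with and without shaping and compares the noise in a low band, a high band and overall.

```python
import numpy as np

def requantize(x, bits, shape, rng):
    """Requantize floats in [-1, 1) to `bits` bits with TPDF dither.

    With shape=True the previous sample's total error is subtracted from
    the next input (first-order error feedback), which gives the noise a
    (1 - z^-1) high-pass spectrum.
    """
    step = 2.0 / (2 ** bits)
    # TPDF dither: sum of two uniform variables, +/-1 LSB peak.
    dither = (rng.uniform(-0.5, 0.5, x.size)
              + rng.uniform(-0.5, 0.5, x.size)) * step
    y = np.empty_like(x)
    err = 0.0
    for n in range(x.size):
        v = x[n] - err if shape else x[n]
        y[n] = np.round((v + dither[n]) / step) * step
        err = y[n] - v  # dither plus quantization error for this sample
    return y

def band_power_db(noise, fs, lo, hi):
    """Noise power (arbitrary dB reference) between lo and hi Hz."""
    spec = np.abs(np.fft.rfft(noise)) ** 2
    freqs = np.fft.rfftfreq(noise.size, d=1.0 / fs)
    sel = (freqs >= lo) & (freqs < hi)
    return 10 * np.log10(spec[sel].sum())

fs = 44100
rng = np.random.default_rng(0)
t = np.arange(2 ** 15) / fs
x = 0.5 * np.sin(2 * np.pi * 1000 * t)

flat = requantize(x, 8, False, rng) - x     # noise with plain TPDF dither
shaped = requantize(x, 8, True, rng) - x    # noise with first-order shaping

total_change = 10 * np.log10(np.mean(shaped ** 2) / np.mean(flat ** 2))
low_change = (band_power_db(shaped, fs, 0, 4000)
              - band_power_db(flat, fs, 0, 4000))
high_change = (band_power_db(shaped, fs, 16000, fs / 2)
               - band_power_db(flat, fs, 16000, fs / 2))
print(f"total {total_change:+.1f} dB, 0-4kHz {low_change:+.1f} dB, "
      f"16-22kHz {high_change:+.1f} dB")
```

With this simple shaper the noise below 4kHz should drop while the noise above 16kHz and the total both rise (in theory a first-order shaper doubles the total noise power, about +3dB), which is the same trade the graph shows, just with a much less clever curve.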

G

PS. I'm not sure how easy my explanation is to understand?
 
Jul 17, 2018 at 12:27 PM Post #4,987 of 7,175
I appreciate that our hearing is less sensitive at higher frequencies, but even so, with the bandwidth limited to 22kHz, doesn't that extra energy have an effect?
It can. Your speakers and amplifiers will happily amplify all that noise you have stuffed into the high frequencies near 22 kHz. Tweeters can resonate at those frequencies and distort (intermodulate) down into the more audible band. And your amplifier can also oscillate. After all, real music doesn't have such high ultrasonic content, so equipment is not necessarily designed to handle such a situation.

This is why for noise shaping you want to have a higher sample rate, so that you can spread the shaped dither power over a wider area.

The simple answer to all of this is not to attempt to stuff down high-resolution audio into 16/44.1 when that format is essentially gone from many of our lives. If the source is 24 bits, release it as that. If it is a higher sample rate, leave that alone. I can do the conversion myself if I want, thank you very much. Or you as the distributor can offer both versions. Don't force me into a spinning disc format when I am not spinning anything....
 
Jul 19, 2018 at 2:05 AM Post #4,988 of 7,175
It can. Your speakers and amplifiers will happily amplify all that noise you have stuffed into the high frequencies near 22 kHz. Tweeters can resonate at those frequencies and distort (intermodulate) down into the more audible band. And your amplifier can also oscillate. After all, real music doesn't have such high ultrasonic content, so equipment is not necessarily designed to handle such a situation.

This is why for noise shaping you want to have a higher sample rate, so that you can spread the shaped dither power over a wider area.

The simple answer to all of this is not to attempt to stuff down high-resolution audio into 16/44.1 when that format is essentially gone from many of our lives. If the source is 24 bits, release it as that. If it is a higher sample rate, leave that alone. I can do the conversion myself if I want, thank you very much. Or you as the distributor can offer both versions. Don't force me into a spinning disc format when I am not spinning anything....
Now, realistically if you don't mind... have you ever...even once...known of an amplifier that broke into oscillation because of -70dBFS 20kHz noise-shaped dither that wouldn't have oscillated far more readily with a test signal, and been rejected in QC? Or an amplifier that couldn't deliver 20kHz to a nominal load at even 50% power?

The way IMD typically works is the intermod products that are produced are lower in amplitude than either of the intermodulating signals. So if we had some noise-shaped dither at -70dBFS at 20kHz, and it intermodulated with some audio signal at -70dBFS @ 17kHz to the level of 10%, that would put the 3kHz intermod product at -90dBFS, and it wouldn't be a discrete tone but rather a bit of noise. And that's worst case, because if you raise the 17kHz audio signal the resulting intermod will drop lower. How is this a problem?
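The arithmetic in that example is easy to check. A small Python sketch, with a hypothetical helper name, that takes the 10% figure as an amplitude ratio (so 20·log10) relative to the intermodulating signals:

```python
import math

def intermod_product_dbfs(source_dbfs, imd_ratio):
    """Level of an intermod product sitting `imd_ratio` (0.10 = 10%, as an
    amplitude ratio) below the intermodulating signals."""
    return source_dbfs + 20 * math.log10(imd_ratio)

product_level = intermod_product_dbfs(-70.0, 0.10)
difference_hz = abs(20_000 - 17_000)   # difference-frequency product
print(product_level, difference_hz)
```

10% is 20dB down, so two -70dBFS components give a product at about -90dBFS, landing at 3kHz.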

I'm not saying you're wrong, but rather trying to keep this in the real world. It would be sad if someone saw this and said "Geez, dither is horrible! It'll ruin my amp and tweeters!" Nothing could be further from the truth. You can be both correct but unrealistic.

As to releasing all that glorious 24bit audio, talk to Apple, Spotify, Amazon, Google, eMusic, and Napster - the top half dozen online music retailers - who are releasing only 16/44. Or the top source for music streaming - YouTube - releasing in 16/44. I agree, they could move to something higher if phones, smart speakers, and DMPs could handle it, but there's not much movement in that direction. Just as there's even less recorded music where any of this would matter one tiny whit.
 
Jul 19, 2018 at 3:12 AM Post #4,989 of 7,175
Too much is never enough.
 
Jul 19, 2018 at 12:01 PM Post #4,991 of 7,175
[1] This is why for noise shaping you want to have a higher sample rate, so that you can spread the shaped dither power over a wider area.

[2] The simple answer to all of this is not to attempt to stuff down high-resolution audio into 16/44.1 when that format is essentially gone from many of our lives. If the source is 24 bits, release it as that.

1. What "shaped dither power"? Pinnahertz mentioned a peak shaped dither noise floor at -70dB; the worst case scenario I presented above (ID=16) would result in a shaped dither peak @ 22kHz of about -60dB. In practice though, why would a mastering engineer ever choose such an aggressive noise-shaping algorithm? What music in the real world ever requires 122dB of dynamic range? In practice we would use a far less aggressive algorithm, typically somewhere around the equivalent of ID=11 or ID=12, which would give us a perceptual noise floor of about -110dB and peak shaped dither noise at about -84dB. Any consumer equipment which suffers from IMD caused by "shaped dither power" at -84dB is going to suffer from IMD when playing back the vast majority of recordings, as they virtually all have musical signals and/or recording noise floors at levels higher than this. A higher sample rate to accommodate the redistribution of "shaped dither power" is therefore only an issue when we're dealing with the large amounts of dither power generated by very few bits. For example, the theoretical example of 8 bit above or the practical example of 1 bit (SACD).

2. The source is never 24bits, so that's the end of that "if"! Commercial mix environments are pretty much always 64bit float and as we can't distribute 64bit files we ALWAYS have to dither/truncate ("stuff down"). So it isn't a question of 24bit without dither/truncation or 16bit with dither/truncation, it's simply a question of which bit depth we dither/truncate ("stuff down") to. In practice, 16/44.1 IS high resolution, there is NO audibly higher resolution!
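To make the "dither/truncate" step concrete, here is a minimal sketch, assuming NumPy, a float mix normalised to ±1.0 and plain (un-shaped) TPDF dither. The helper names are hypothetical and this is not a description of what any particular mastering tool does. It reduces a tone of only 0.4 LSB amplitude to 16 bits with and without dither.

```python
import numpy as np

def round_int16(mix):
    """Reduce a float mix in [-1.0, 1.0) to 16bit by plain rounding."""
    scaled = np.asarray(mix, dtype=np.float64) * 32768.0
    return np.clip(np.round(scaled), -32768, 32767).astype(np.int16)

def to_int16_tpdf(mix, rng):
    """Same reduction, but with TPDF dither (+/-1 LSB peak) added first."""
    scaled = np.asarray(mix, dtype=np.float64) * 32768.0
    dither = (rng.uniform(-0.5, 0.5, scaled.shape)
              + rng.uniform(-0.5, 0.5, scaled.shape))
    return np.clip(np.round(scaled + dither), -32768, 32767).astype(np.int16)

rng = np.random.default_rng(0)
n = 44100
tone = np.sin(2 * np.pi * 1000 * np.arange(n) / 44100)
quiet = 0.4 / 32768.0 * tone          # a tone peaking at 0.4 LSB of 16bit

plain = round_int16(quiet)            # every sample rounds to zero
dithered = to_int16_tpdf(quiet, rng)  # noisy, but the tone is still in there

# Least-squares estimate of the tone's amplitude (in LSBs) in the output.
recovered = np.dot(dithered, tone) / np.dot(tone, tone)
print(plain.any(), round(float(recovered), 2))
```

Without dither the tone disappears entirely; with dither it survives underneath the noise, and the estimated amplitude should come out close to the original 0.4 LSB.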

G
 
Jul 19, 2018 at 1:25 PM Post #4,992 of 7,175
Amirm thinks that -120dB is the proper place for a noise floor. He claims that anything less is clearly audible (if he cranks the volume up on tiny samples of fade out).
 
Jul 20, 2018 at 6:05 PM Post #4,994 of 7,175
Or you as the distributor can offer both versions.

Well, some distributors do offer many versions. Look here: https://autechre.bleepstores.com/release/98703-autechre-nts-sessions-1-4

They offer:

- Vinyl
- CD
- 24/44.1 wav
- 16/44.1 wav
- 16/44.1 FLAC
- 320 kbps mp3

I pre-ordered the CD set and I was able to download any version given. I chose 16/44.1 FLAC, because I don't need the "extra" dynamic range of 24 bit files for anything (assuming there actually is musical information below 16 bits which is possible as this is totally computer-generated music). I guess even the 320 kbps mp3s would have been totally transparent to my ears, but I played it safe.
 
Jul 21, 2018 at 5:39 PM Post #4,995 of 7,175
The ones I like lately are places that offer "studio" 24/96 and then "studio +!@11" 24/192, which I guess is twice as studio-er. They still take a back-seat to any place that deals in DXD, who format-whore like nobody's bizniss.
 
