Decorrelation is not the same as removing quantization noise. It prevents the quantization noise from being modulated by the music signal, but it does not eliminate the quantization error itself.
Dither absolutely does eliminate the quantisation error itself, assuming the correct amount of standard TPDF dither. That’s the whole point of dither.
Unless you apply noise shaping, dither is not a substitute for bit-depth.
Increasing bit depth reduces the amount of quantisation error; dither (with or without noise shaping) eliminates quantisation error entirely.
For example, 8-bit audio with dither is not the same as 24-bit audio with dither, and 8-bit audio will have an increased noise floor due to quantization error. While dither may reduce harmonic distortion caused by quantization error, 8-bit audio will still sound less detailed and more smoothed out compared to 16-bit or 24-bit audio.
Dither will eliminate all quantisation error, converting it to white (uncorrelated) noise. The dither noise floor will be very significantly higher with 8 bit because there’s significantly more quantisation error, so any detail more than about 48dB below full scale will be buried in the dither noise floor. With 16 bit, that dither noise floor is below both the noise floor of recordings and audibility given a reasonable playback level. This is why the CD standard was specified with 16 bits rather than 8, and even 16 bit is somewhat overkill.
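To make the 8-bit versus 16-bit numbers concrete, here is a minimal NumPy sketch (the function name, test tone and seed are my own choices, not anything from this exchange): it quantizes a sine with TPDF dither at both bit depths and reports the error floor and the error's correlation with the signal, which should be roughly 48 dB apart and near zero respectively.

```python
# Minimal sketch of TPDF-dithered quantization: the residual error is
# signal-independent noise, and its level is set by the bit depth
# (~6 dB per bit), not by the music signal.
import numpy as np

def quantize_tpdf(x, bits, seed=0):
    """Quantize x (scaled to -1..1) to `bits` bits with +/-1 LSB TPDF dither."""
    rng = np.random.default_rng(seed)
    q = 2.0 ** -(bits - 1)                                   # size of one LSB
    tpdf = (rng.random(x.shape) - rng.random(x.shape)) * q   # triangular-pdf dither
    return np.round((x + tpdf) / q) * q

fs = 44100
t = np.arange(fs) / fs
sig = 0.5 * np.sin(2 * np.pi * 1000 * t)                     # 1 kHz test tone

for bits in (8, 16):
    err = quantize_tpdf(sig, bits) - sig
    rms_db = 20 * np.log10(np.std(err))                      # error floor re full scale
    corr = np.corrcoef(err, sig)[0, 1]                       # ~0: decorrelated from signal
    print(f"{bits:2d}-bit: error floor {rms_db:6.1f} dBFS, correlation with signal {corr:+.4f}")
```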
At a 44.1 kHz sampling rate, you're limited by the Nyquist frequency of 22.05 kHz. Yes, you can apply psychoacoustic noise shaping to gently push some of the noise above 2–3 kHz, and then more steeply between 20 kHz and 22 kHz. But there’s very little headroom to work with, and some of that shaped noise still remains within the audible band. You might gain the equivalent of 1 or 2 extra bits, but that’s not enough to accommodate a significant reduction in bit depth—certainly not down to 1, 5, or even 8 bits.
There’s plenty of headroom for noise-shaped dither at 16 bit / 44.1kHz: human hearing drops off dramatically above about 14kHz and is almost non-existent above 17kHz in adults, and in fact the equivalent of 3-4 extra bits is routinely achieved with no audible dither noise (at reasonable listening levels). You’re correct that it wouldn’t be enough to accommodate a reduction in bit depth to 1 or 5 bits, and probably not to 8 bits, but then no one ever tries to!
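For anyone curious what that looks like mechanically, here is a rough sketch of noise-shaped dither using a simple first-order error-feedback loop (my own simplified stand-in for a tuned psychoacoustic shaper, with arbitrary band edges for the comparison): the total error power is essentially unchanged, but it is pushed up toward the top of the band where hearing is least sensitive.

```python
# First-order error-feedback noise shaping around a TPDF-dithered quantizer.
# The quantization error spectrum is shaped by (1 - z^-1), i.e. high-passed.
import numpy as np

def quantize_noise_shaped(x, bits, seed=0):
    """TPDF-dithered quantization with first-order error feedback."""
    rng = np.random.default_rng(seed)
    q = 2.0 ** -(bits - 1)
    y = np.empty_like(x)
    e_prev = 0.0
    for n in range(len(x)):
        tpdf = (rng.random() - rng.random()) * q
        target = x[n] - e_prev                 # subtract the previous step's error
        y[n] = np.round((target + tpdf) / q) * q
        e_prev = y[n] - target                 # error (incl. dither) fed to the next step
    return y

fs = 44100
t = np.arange(fs) / fs
sig = 0.5 * np.sin(2 * np.pi * 1000 * t)       # 1 kHz test tone, 1 second
err = quantize_noise_shaped(sig, 16) - sig

# The error spectrum rises towards Nyquist: compare its power below 4 kHz
# with its power above 14 kHz, where hearing is far less sensitive.
spec = np.abs(np.fft.rfft(err)) ** 2
freqs = np.fft.rfftfreq(len(err), d=1 / fs)
ratio_db = 10 * np.log10(spec[freqs > 14000].sum() / spec[freqs < 4000].sum())
print(f"error power, >14 kHz relative to <4 kHz: {ratio_db:+.1f} dB")
```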
Most software-based upsampling filters fit this definition of relatively fast roll-off starting around 19–20 kHz, with the exception of a few designed specifically for lower-bitrate formats. These software filters often offer selectable characteristics, such as linear-phase, minimum-phase, or mixed-phase. While not all may be optimal, most are objectively superior to the filters implemented within typical DAC hardware.
If they are optimal (do not cause any audible artefacts), then “objectively superior” is just “on paper”, so they might be preferable to someone who likes better performance figures on paper, but they cannot be audibly preferable.
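To put the “relatively fast roll-off starting around 19–20 kHz” into concrete filter terms, here is an illustrative SciPy design (the tap count, cutoff and Kaiser window are arbitrary choices, not taken from any particular upsampler or DAC): a long linear-phase FIR for a 2x upsample to 88.2 kHz that is essentially flat at 20 kHz and deep into its stopband by 22.05 kHz.

```python
# Illustrative linear-phase low-pass for 2x upsampling of 44.1 kHz material.
import numpy as np
from scipy.signal import firwin, freqz

fs_out = 88200                                    # rate after 2x upsampling
taps = firwin(1023, cutoff=21000, window=("kaiser", 12.0), fs=fs_out)

w, h = freqz(taps, worN=8192, fs=fs_out)

def db_at(f_hz):
    """Magnitude response in dB at the frequency bin nearest f_hz."""
    return 20 * np.log10(np.abs(h[np.argmin(np.abs(w - f_hz))]))

print(f"response at 20 kHz: {db_at(20000):6.1f} dB, at 22.05 kHz: {db_at(22050):6.1f} dB")
```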
Even if the effective dynamic range for a specific genre of music is, say, 60dB, the music signal varies continuously (i.e., is infinitely granular) within this range. If 10 bits were used to accommodate a 60dB dynamic range, then there are exactly 1024 levels available to represent it; with one of the bits being the sign bit, it is really about 512 levels in each of the positive and negative directions. The question is whether 512 levels are enough, given we are used to living in an analog world where the resolution is infinite.
Sorry, but we do not live in an analogue world, we live in an acoustic world, and regardless, neither the acoustic nor the analogue worlds have infinite resolution. The “levels available to represent” are largely irrelevant because we do not output those discrete levels; we reconstruct the continuously varying analogue signal from them. The question isn’t whether the number of discrete levels available with 10 bit is enough, because no one uses 10 bit and it’s proven that even just 2 levels is enough (SACD, for example). The roughly 32,000 available levels per polarity with 16 bit is way more than enough. Lastly, very few recordings have a dynamic range of 60dB; the vast majority have half or less than half that dynamic range.
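For reference, the standard back-of-envelope numbers behind this exchange (nothing here is specific to the thread): level counts and the theoretical SNR of a full-scale sine over quantization noise per bit depth.

```python
# Standard formulas only: 2^N levels and ~6.02*N + 1.76 dB full-scale SNR.
for bits in (10, 16, 24):
    levels = 2 ** bits                   # 16 bit: 65,536 total, ~32,768 per polarity
    snr_db = 6.02 * bits + 1.76          # ideal full-scale sine vs. quantization noise
    print(f"{bits:2d} bits: {levels:>10,} levels, ~{snr_db:5.1f} dB SNR")
```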
I was pointing to the fact that looking for the lowest audible level may not be the best approach.
Why? If it’s below the lowest audible level, i.e. is inaudible, or is so low in level it can’t even exist as sound in the first place, then what does it matter?
With dither alone, DAC non-linearity will become an additional variable.
Why would dither noise that’s well below the noise floor of any recording cause a non-linearity in a DAC?
The reason I am asking these questions is to move the focus away from the lowest audible levels to when bit-depth stops mattering (assuming you are interested in exploring it).
Again, why would you want to move the focus away from what’s audible to what isn’t?
I mean, noise shaping does improve depth as well, but when I turn noise shaping on, it changes the sound qualitatively. It cleans up the sound, but the sound kind of loses its edge.
The term “noise shaping” is short for “noise-shaped dither”. The applied dither is noise-shaped, not the recording. It does not affect the recording, it does not “clean up the sound”, change the sound qualitatively or improve the depth. If applying noise shaped dither makes any audible difference at all, there is something very seriously wrong with how it’s being applied!