I just want to comment on the terminology. Whether noise-shaped or Gaussian-dithered down to 20 bits instead of 24, we are sacrificing bit depth by definition. However, with noise shaping we are able to encode more "DAC-usable" information into those bits, once the limitations of the DAC are taken into consideration.
Whenever re-quantization occurs - i.e., whenever you convert from a higher bit depth (such as the 64 bits used during computation) to a lower one (as required by a DAC) - you have three options:
Truncation:
Simply rounding the signal to a lower bit depth introduces harmonic distortion that is audible and should be avoided. This occurs because the quantization noise becomes correlated with the music signal; signals within a specific range consistently fall into the same quantized bins. Consequently, the noise is modulated by the music, resulting in a harsher, more digital sound - akin to viewing a pixelated image. Here, the noise is defined as the difference between the truncated (quantized) signal and the original high-bit-depth music signal.
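To make this concrete, here is a small illustrative Python/NumPy sketch (the bit depth and test signal are arbitrary choices of mine, picked so the effect is obvious): because the quantizer is memoryless, the rounding error repeats exactly with the signal's period, i.e. it is correlated with the music rather than being random noise.

```python
import numpy as np

bits = 4                      # deliberately coarse quantization
step = 2.0 ** -(bits - 1)     # quantization step for a signed signal in [-1, 1)

n = np.arange(4096)
x = 0.5 * np.sin(2 * np.pi * n * 64 / 4096)   # exactly 64 cycles per block

xq = np.round(x / step) * step  # re-quantization by rounding alone
err = xq - x                    # the quantization "noise"

# The error in the second half of the block repeats the first half,
# because the error is a deterministic function of the periodic signal.
corr = np.corrcoef(err[:2048], err[2048:])[0, 1]
print(round(corr, 3))  # essentially 1.0: the "noise" is fully predictable
```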
Dithering:
Dithering involves adding a small amount of randomness (using methods such as triangular or Gaussian dithering) to the least significant bit. This randomness prevents the music signal from falling into predictable quantized bins, resulting in a more natural and less digital sound even though no additional information is added. Essentially, dithering decorrelates the quantization noise introduced by truncation; in this case, the noise is the difference between the quantized-and-dithered signal and the original high-bit-depth music signal.
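The same sketch with TPDF (triangular) dither added before rounding shows the decorrelation (again an illustrative Python/NumPy sketch, not production code): with dither, the error in the second half of the block no longer predicts the first half.

```python
import numpy as np

rng = np.random.default_rng(0)
bits = 4
step = 2.0 ** -(bits - 1)

n = np.arange(4096)
x = 0.5 * np.sin(2 * np.pi * n * 64 / 4096)

# TPDF dither: the sum of two uniform variables spanning +/-0.5 LSB each,
# added to the signal before rounding.
tpdf = (rng.uniform(-0.5, 0.5, x.size) + rng.uniform(-0.5, 0.5, x.size)) * step
xq = np.round((x + tpdf) / step) * step
err = xq - x

# Unlike plain truncation, the error is now decorrelated: comparing the
# two halves of the block gives a correlation near zero.
corr = np.corrcoef(err[:2048], err[2048:])[0, 1]
print(abs(corr) < 0.1)
```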
Noise Shaping:
While truncation preserves dynamic range and dithering slightly reduces it (with the benefits of dithering outweighing this minor loss), noise shaping can effectively increase the dynamic range. This technique is particularly useful for linearizing DACs that require such correction. Noise shaping employs a feedback mechanism to filter the error (i.e., the noise), attenuating noise at lower frequencies while allowing higher frequencies to pass. In effect, the quantization noise is "pushed" to a frequency range where it is less audible. Depending on the design of the noise shaper and the extent of in-band noise attenuation, the dynamic range can be increased within the bandwidth of interest. This approach allows more information to be "packed" into a narrow frequency band (typically below 100 kHz), effectively increasing the bit depth in that band. Although noise shaping is highly beneficial for R2R DACs, its advantages are less obvious for delta-sigma DACs, where linearity is less critical and the focus shifts to increasing dynamic range. Even with noise shaping, truncation occurs first; therefore, it is generally advisable to apply dithering before computing the error to avoid high-frequency distortions. Throughout, when I write 'noise shaping' I mean dithered noise shaping; the only exception is when noise shaping is used in 1-bit modulators, where dithering is not possible.
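A minimal first-order error-feedback noise shaper, with TPDF dither applied before rounding as described above, can be sketched as follows (illustrative Python/NumPy; parameters are mine, and a real shaper would use a higher-order in-band filter). The previous quantization error is fed back and subtracted from the next sample, which gives the noise a high-pass (1 - z^-1) shape:

```python
import numpy as np

rng = np.random.default_rng(1)
bits = 8
step = 2.0 ** -(bits - 1)

x = 0.25 * np.sin(2 * np.pi * np.arange(8192) * 100 / 8192)

y = np.empty_like(x)
e = 0.0                                # previous quantization error
for i, s in enumerate(x):
    d = (rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)) * step  # TPDF dither
    v = s - e                          # subtract the fed-back error
    q = np.round((v + d) / step) * step
    e = q - v                          # error of this quantization step
    y[i] = q

# The total noise y - x equals e[n] - e[n-1], a first difference,
# so its spectrum rises with frequency: noise is pushed upward.
spec = np.abs(np.fft.rfft(y - x)) ** 2
half = len(spec) // 2
low, high = spec[:half].mean(), spec[half:].mean()
print(high > low)  # more noise energy in the upper half of the band
```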
How much nonlinearity is too much?
- For argument's sake, assume an R2R DAC that is perfectly linear to 8 bits and then has 8 more bits with a -0.1 dB deviation
- Which one sounds best when upsampling to 16fs: just using all 16 bits, or noise shaping down to 8 bits?
- Asking because many people seem to prefer still using those less linear resistors despite the small deviation
Objectively, you want to linearize every bit for maximum accuracy. However, the point at which additional linearization is unnecessary depends on when you stop hearing a difference. If you can distinguish between a 20‑bit dithered signal and a 20‑bit noise-shaped signal, then you have your answer. Additionally, you must separate personal preference from technical accuracy; a more accurate signal does not automatically mean you will prefer it.
Does noise shaping sound artificial? (Not a technical question as such, but people don't share the same taste)
- To me noise shaping, especially higher order noise shaping, sounds unnatural vs. gaussian dither
- Yes, this is subjective preference, but in the end we do all this to enjoy, which is always subjective
- Understanding why some people enjoy dither over noise shaping would help us make better products, potentially understand our hearing system better and even ourselves
I prefer noise shaping. The objective answer is clear, but you may need to run a poll to get a sense of the distribution of preferences - not very scientific, though.
The May isn't perfectly linear, but it is linear to 20 bits -> an explanation is needed, as in theory noise-shaped 20 bits should carry more information
- At least if we assume that any deviation means we should ditch the bit
- I personally don't know the formula we should use to decide whether a bit should be kept, other than GoldenOne's suggestion of SNR / 6
Below is the linearity measurement performed by JA at Stereophile. Linearity is straightforward: as you increase the input signal level, the DAC's output should increase proportionally. If the output sometimes deviates - going up or down unexpectedly - that will introduce distortion. In the graph below, the blue trace shows how the output level of a 1 kHz signal (left y-axis) changes as the input level (x-axis) increases from -140 dB to 0 dB. Ideally, this blue line would be perfectly straight from one corner to the other, but you can see it is slightly curved near -140 dB. Since such small deviations are hard to spot on the blue trace, the error is plotted separately in red. This error is the ratio of output to input expressed in dB; when the two are equal, the ratio is 1, which is 0 dB. Therefore, you want the red line to stay as close to 0 dB as possible.
In the graph you can see the error settles very close to 0 starting at about -120 dB, and 120 dB / 6 ≈ 20 bits. The reason is very simple: the smallest step you can represent using 20 bits is 1/2^20, and that value in dB is 20 * log10(2^-20) ≈ -120 dB. Each bit contributes a doubling of the resolution, so if you go to 21 bits, 20 * log10(2^-21) ≈ -126 dB. Each doubling is a change of 6 dB because 20 * log10(2^-1) = 20 * log10(0.5) ≈ -6 dB (i.e., each halving decrements the level by 6 dB). So you can simply divide by 6 to get the approximate number of useful bits.
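The arithmetic above can be checked in a few lines (the function names are mine, just for illustration of the "/6" rule of thumb):

```python
import math

def bits_to_db(bits):
    # the smallest step at a given bit depth is 2**-bits; in dB: 20*log10(...)
    return 20 * math.log10(2 ** -bits)

def db_to_bits(db):
    # inverse: one bit is worth 20*log10(2) ~= 6.02 dB
    return -db / (20 * math.log10(2))

print(round(bits_to_db(20), 1))   # -120.4
print(round(db_to_bits(-120), 1)) # 19.9
```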
Noise shaping doesn't add extra bits; it simply leverages the linear region of the graph (i.e., from -120 dB to 0 dB) to encode more information. How is this possible? How can 20 bits effectively store the equivalent of 64 bits of information?
A simplified example can illustrate the concept. Suppose you have a bit depth of 1, meaning you can only represent two values: -1 and 1 (with -1 corresponding to a bit value of 0 and 1 corresponding to a bit value of 1). Now, what if you want to represent 0.5? You would need a bit depth of 2. One way to achieve this is by increasing the sampling rate fourfold - for instance, from 44.1 kHz to 4 × 44.1 kHz. If you transmit a sequence such as [1, 1, 1, -1] (four samples instead of one), the average value is (1 + 1 + 1 - 1) / 4 = 0.5. This averaging acts as a simple low-pass filter. Since we cannot hear frequencies above 20 kHz, our auditory perception effectively filters out the noise (averages in some sense), allowing us to perceive 0.5 even though each individual sample is only 1-bit.
This intentionally simplistic example demonstrates the principle behind noise shaping. In practice, moving average filters are commonly used to convert 1-bit DSD to PCM.
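The averaging example can be sketched in a few lines (illustrative Python/NumPy; a real DSD-to-PCM converter would use a much longer filter than this 4-tap boxcar):

```python
import numpy as np

# Each target sample becomes four 1-bit samples at 4x the rate.
one_bit = np.array([1, 1, 1, -1], dtype=float)
print(one_bit.mean())  # 0.5

# A moving-average (boxcar) filter plays the role of the low-pass:
stream = np.tile(one_bit, 8)                      # repeat the 1-bit pattern
pcm = np.convolve(stream, np.ones(4) / 4, mode="valid")
print(np.allclose(pcm[::4], 0.5))  # True: decimated output recovers 0.5
```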