He mentioned noise shaping and dither, but that part could indeed be a bit more detailed.
When I started this thread, the idea was to make the explanation as simple as possible, so it could be understood by Head-Fi'ers who may not have much interest in the deep technical detail. My last post followed in the same vein. There's a broad spectrum of people on this site and it's a tough ask to write an article which covers all of them. In my last post I tried to strike a balance: as little detail as possible, but enough to avoid rendering what I wrote too inaccurate.
Quote:
When you quantise an analogue signal, the noise introduced can actually be non-linear. This distortion can be audible. To prevent this, we can randomise the quantisation errors so that the noise is ergo random and spread uniformly across the frequency spectrum. This way, the noise introduced (called white noise) is linear. This process is known as dithering. Going even further, we can distribute this noise at varying amounts for different frequencies. This exploits the fact that we're more sensitive to some frequencies than others, so the frequencies that we're more sensitive to should have less noise than those that we're less sensitive to. The total amount of noise is still the same; only the distribution is different.
I'm not sure I would use the terms linear or non-linear in this context; correlated or de-correlated would perhaps be more accurate and less confusing. The act of dithering is the act of de-correlating the quantisation error, which is essentially just a fancy and more precise way of saying the errors are turned into random noise. If we're going into the finer detail, I'm not sure the last sentence I've quoted of yours is entirely accurate; generally noise shaped dither actually slightly increases the total amount of noise but of course significantly decreases the amount of audible noise. However, modern mastering dither processors allow the mastering engineer to select the amount of dither applied as well as (and independently from) the amount of re-distribution (noise shaping) of that noise, so the answer to this point is not entirely clear cut.

Another factor to consider is where we can (or should) apply noise shaped dither. Where and how dither is applied in ADCs and processors can get pretty complex and difficult to understand, not least because the designers and the companies they work for tend not to want to divulge this information. When mixing/mastering though, we generally do not want to noise shape the dither of every channel of sound in the mix, because if we start summing many channels of sound together, all with that dither noise concentrated in the same frequency band, we may introduce unwanted audible artefacts in broadcast limiters and even when converting to lossy codecs. Generally, when dither is required during mixing, only standard TPDF (non-noise shaped) dither is used. As it's spread evenly across the spectrum, there is no build up or concentration of noise in any one frequency band, just a roughly 3dB increase in noise each time the number of dithered channels doubles. We then apply a noise shaped dither as the very final mastering process.
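To make the de-correlation point concrete, here's a minimal numpy sketch (my own illustration, not anyone's production dither code, and the signal level is deliberately extreme). A sine whose peak is under half an LSB is erased entirely by plain rounding, so the error is perfectly correlated with the signal; adding TPDF dither before rounding turns that same error into uncorrelated random noise and the signal survives, on average, in the output:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 44100
t = np.arange(fs) / fs
step = 2 / 2**16  # one LSB for 16 bit audio spanning [-1, 1)

# A sine whose peak is under half an LSB: plain rounding erases it entirely
signal = 0.4 * step * np.sin(2 * np.pi * 1000 * t)

def quantise(x):
    """Round to the nearest 16 bit level."""
    return np.round(x / step) * step

# Without dither the output is all zeros, so the error IS the (inverted)
# signal: perfectly correlated, i.e. distortion rather than noise
err_plain = quantise(signal) - signal

# TPDF dither: sum of two independent uniform sources, giving a triangular
# probability density spanning +/-1 LSB, added before rounding
dither = (rng.uniform(-0.5, 0.5, fs) + rng.uniform(-0.5, 0.5, fs)) * step
err_dith = quantise(signal + dither) - signal

print("error/signal correlation without dither:", np.corrcoef(err_plain, signal)[0, 1])
print("error/signal correlation with dither:   ", np.corrcoef(err_dith, signal)[0, 1])
```

The first correlation comes out at exactly -1 (pure correlated error), the second is near zero (de-correlated noise), which is the whole job dither does.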
In most mixing environments today though, the bit depths are so high that you don't need to apply dither while mixing, because even the correlated noise from truncation is still massively below audibility. It's generally only when reducing to 16 bit that correlated errors are of any potential concern. It should be noted that even when truncating to 16 bit there's relatively little evidence that this correlated noise (truncation error) can be heard at normal listening levels. So the application of noise shaped dither is to some extent a "playing it safe, just in case" approach.
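As a rough sanity check on "massively below audibility", the standard textbook figure for an ideal N-bit quantiser driven by a full-scale sine puts the quantisation noise about 6.02N + 1.76 dB down. This is an idealised formula, not a measurement of any real converter, but it shows why truncation at high mixing bit depths is a non-issue:

```python
# Textbook SNR of an ideal N-bit quantiser with a full-scale sine:
# approximately 6.02*N + 1.76 dB (idealised, no real-converter effects)
def quantisation_snr_db(bits):
    return 6.02 * bits + 1.76

for bits in (16, 24, 32):
    print(f"{bits} bit: quantisation/truncation noise "
          f"~{quantisation_snr_db(bits):.0f} dB below full scale")
```

At 16 bit that's roughly 98 dB down (hence the residual, mostly theoretical, concern), while at 24 or 32 bit the error sits around 146 dB or 194 dB down, far beyond anything a playback chain or human hearing can resolve.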
Quote:
There is one part I am struggling with, quoted above. How is it impossible to filter a very high sampling rate according to the demands of the Nyquist Theorem? In any case, we don't go to 192kHz in order to try and record signals up to 96kHz. We do so in order that we can use a simpler, lower order, low-pass anti-aliasing filter.
Nyquist demands that the signal is band limited. This means applying anti-alias and anti-imaging filters to remove the error signal above the Nyquist point (fs/2). In the case of 16/44.1 it's relatively trivial to accomplish 120dB or more of attenuation in the stop band (the range of frequencies above the Nyquist point) and therefore reduce aliasing to below the digital noise floor. But with 24/192 we have a great deal more processing to accomplish and no additional time in which to accomplish it. At these very high sample rates and bit depths we start hitting the limits of the laws of physics in how fast we can perform the calculations required to implement a filter which reduces aliasing to below the digital noise floor. The only way this is likely to change is with a new paradigm in processing; quantum computing, for example, could in theory solve the problem!

All professional ADCs initially sample at incredibly high rates (many megahertz) but they do so with a greatly reduced bit depth, 5 bits or so generally. In other words, you can have more bandwidth OR more accuracy, but not both! This is borne out in tests and in manufacturers' specifications; generally at 24/192, alias attenuation is only accomplished down to around -80dB, which results in aliasing distortion across the entire frequency spectrum, including the audible band! It's unlikely (but not impossible) that this failure to achieve sufficient anti-aliasing to fully satisfy the Nyquist Theorem is going to be audible but nevertheless, this additional distortion does mean that, in theory at least, 24/192 is lower fidelity than 24/96. For the same reason, 24/384 and 32/384 perform even worse than 24/192 and are even lower fidelity. Given the choice, no knowledgeable music recording engineer would ever record at anything higher than 24/96, but they are sometimes not given the choice by the record companies employing them. Unfortunately the audiophile world is driven by marketing more than by fidelity!
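Putting the two attenuation figures from above side by side makes the fidelity argument concrete. The -120dB and roughly -80dB stop-band numbers are the ones quoted in this post (not measurements of any particular converter), and the noise floor uses the idealised 6.02N + 1.76 dB quantiser figure:

```python
def noise_floor_dbfs(bits):
    # Idealised N-bit quantiser noise floor relative to a full-scale sine
    return -(6.02 * bits + 1.76)

# (format, bit depth, stop-band attenuation in dB; figures quoted in the text)
for label, bits, atten in (("16/44.1", 16, -120), ("24/192", 24, -80)):
    margin = atten - noise_floor_dbfs(bits)
    where = "below" if margin < 0 else "ABOVE"
    print(f"{label}: stop band at {atten} dB -> aliasing products "
          f"{abs(margin):.0f} dB {where} the {bits} bit noise floor")
```

With these numbers, 16/44.1 buries its aliasing products about 22 dB below its own noise floor, while 24/192 leaves them roughly 66 dB above it, which is the sense in which 24/192 is, in theory, lower fidelity despite the bigger numbers.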
G