[1] Thanks for this information, didn't know this little fact. Well, actually I could infer this, but I didn't.
[2] Also true, but what I meant was more of a perfect scenario. In real life almost no recordings get close to -120 dB, and this is a limit that even most (not state-of-the-art) electronics struggle to reach consistently in terms of SFDR and other measurements. 16 bits with dither should be enough for transparency.
1. It's covered to an extent in the OP. Dither is and always has been a requirement of digital audio recording. In the very earliest days (the 1960s and much of the 1970s), the technology didn't exist to perform dither in the digital domain, so analogue white noise was injected. Digital dithering is far more controllable/efficient and linearises the quantiser, converting ALL quantisation error into noise that is uncorrelated with the signal. At the start of the 1990s, when bit depths greater than 16 became available in the pro audio world, noise-shaped dither was invented specifically for the situation of a higher than 16-bit recording (or mix) that had to be converted to 16-bit for consumer distribution.
2. Again though, it depends on what you mean by "perfect scenario". Do you mean a scenario that is perfect within the "laws of physics", i.e. one that is never achieved in the real world but potentially/theoretically could be? Or do you mean a hypothetical scenario which ignores the "laws of physics", i.e. one that could never exist in the real world?
For example, it might be theoretically possible to construct a recording venue with a 0dB SPL noise floor, and it is theoretically possible for, say, a large symphony orchestra to produce peak levels of 120dB SPL (if one records in or very near the orchestra). So theoretically wouldn't we have a dynamic range of 120dB (a noise floor at -120dB)? No, we wouldn't! Hypothetically we might, but not in theory, because a large symphony orchestra requires 90+ musicians, all of whom are breathing and moving and therefore raising the noise floor considerably. And even in theory, a large symphony orchestra with no musicians or 90+ dead musicians is going to struggle to produce a 120dB peak level!
We've also got a problem with the theoretical max dynamic range of mics, particularly in combination with a mic pre-amp, which at the very least is going to add thermal noise. We have a similar problem at the other end of the chain, with reproduction. Even if we have a recording with a noise floor of -120dB and we reproduce it with a DAC that has a noise floor of -120dB, we now have a noise floor of -117dB, because two uncorrelated noise sources of equal level sum to a level 3dB higher. However, we obviously can't listen to the output of a DAC; we first have to amplify the analogue output of the DAC and then convert it to acoustic sound waves. So at the very least we've got the added thermal noise of the amp and speakers/HPs, not to mention the listening environment noise floor. In theory, the best we could achieve is probably somewhere around 100-110dB dynamic range, but the best that commercial studios actually manage is about 90dB (at huge cost!). Lastly, we have to consider that commercial music/sound recordings are, by definition, entertainment. They are designed to be "comfortable" with moderately priced equipment; they are NOT designed to require $1m worth of construction/reproduction equipment and go so far beyond "comfortable" that we're pushing the limits of human hearing safety. This limits dynamic range to around 40dB-50dB or, in the case of a niche market, to about 60dB or so.
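For anyone who wants to check the noise floor arithmetic above, here is a minimal Python sketch. It assumes the noise sources are uncorrelated, so their powers (not their dB values) add. The function name `sum_noise_floors_db` and the -110dB amplifier figure in the second call are purely illustrative assumptions, not measurements of any real equipment.

```python
import math

def sum_noise_floors_db(*levels_db):
    """Combine uncorrelated noise sources given in dB (relative to full scale).

    Each level is converted to linear power, the powers are summed,
    and the total is converted back to dB.
    """
    total_power = sum(10 ** (level / 10) for level in levels_db)
    return 10 * math.log10(total_power)

# Recording at -120dB played through a DAC that is also at -120dB.
recording_plus_dac = sum_noise_floors_db(-120.0, -120.0)
print(f"recording + DAC: {recording_plus_dac:.1f} dB")

# Hypothetical: add an amplifier stage with a -110dB noise floor.
# The noisiest stage dominates, so the result sits close to -110dB.
full_chain = sum_noise_floors_db(-120.0, -120.0, -110.0)
print(f"recording + DAC + amp: {full_chain:.1f} dB")
```

The first call gives roughly -117dB, matching the figure in the paragraph above. Every additional stage can only raise the combined floor, which is why the chain as a whole never reaches the spec of its quietest component.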
G