Opinions are much less useful when you can have facts instead, which you seem to be going far out of your way to try to avoid. You keep pointing out the problem of quantization error while ignoring all of us who are saying, “Yes, but that was solved decades ago and here’s how.” It's quite literally a solved problem.
I’m not sure where you’re getting your numbers from, but you actually have the right idea in general up to that point.
The dynamic range of 8-bit is roughly 48 dB. Divide by the number of steps (256) and you get about 0.19 dB per step. Each sample can’t be more than 1/2 step away from the actual signal level, so that represents a maximum error of about 0.09 dB (using round numbers). That’s pretty significant, but that’s why we don’t use 8-bit audio for listening.
For 16-bit, the dynamic range is about 96 dB (actually a bit more, but let’s keep it simple), with each of the 65,536 steps representing about 0.0015 dB, and each sample being off by no more than half that, or about 0.0007 dB, give or take. [Edit: I had this part incorrect. Thanks to 71 dB for helping clear this up for me in posts 102-105 below. What is correct is that each sample can't be more than 1/2 step away from the actual signal level.]
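If you want to check the arithmetic yourself, here’s a quick back-of-the-envelope sketch (my own numbers, assuming signals normalized to ±1.0 full scale and the usual ~6 dB-per-bit rule of thumb):

```python
import math

for bits in (8, 16):
    steps = 2 ** bits                      # quantization levels: 256 and 65,536
    dyn_range_db = 20 * math.log10(steps)  # ~48 dB and ~96 dB of dynamic range
    max_err = (2.0 / steps) / 2            # a sample is never off by more than 1/2 step
    print(f"{bits}-bit: ~{dyn_range_db:.0f} dB range, "
          f"max error = 1/2 step = {max_err:.1e} of full scale")
```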
Remember that the bit value only measures the point’s location on the Y-axis, which is signal amplitude. For audio signals, that relates to volume. Now stop and think about how small those variations are compared to the threshold of audibility. How easy is it to detect those variations? Obviously it’s quite a bit easier with 8-bit, but with 16-bit it can be quite difficult even if you’re trying hard with the volume cranked up. (I’ve been trying it with my own difference tracks, and even with my amp maxed out on high gain, I can’t hear it.)
No one is arguing that distortion due to quantization error isn’t there if you don’t dither. But at the typical 16-bit depth of audio files, it’s quite difficult to detect. Add dither, and the signal is accurately reconstructed without distortion, with only a slightly higher noise floor. Distortion is no longer present in that case.
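If it helps to see the mechanics, here’s a minimal sketch of a 16-bit reduction with TPDF dither (one common flavor; this is just my own illustration, not anyone’s production code). The whole trick is that the random dither is added before the rounding step, so the rounding error stops tracking the signal:

```python
import numpy as np

rng = np.random.default_rng(0)
LSB = 1.0 / 32768.0   # one 16-bit step, with full scale at +/-1.0

def to_16bit(x, dither=True):
    """Quantize a float signal to 16-bit levels, optionally with TPDF dither."""
    x = np.asarray(x, dtype=float)
    if dither:
        # TPDF dither: two uniform random values summed, spanning +/-1 LSB,
        # added *before* rounding so the rounding error becomes signal-independent noise
        x = x + (rng.uniform(-0.5, 0.5, x.shape) +
                 rng.uniform(-0.5, 0.5, x.shape)) * LSB
    return np.round(x / LSB) * LSB

# Example: a quiet tone only a few steps above the 16-bit noise floor
tone = 1e-4 * np.sin(2 * np.pi * 1000 * np.arange(48000) / 48000)
dithered_16 = to_16bit(tone)
plain_16 = to_16bit(tone, dither=False)
```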
Maybe some pictures will help.
Here’s one I did just now. This is the difference track after mixing the 24-bit version of a track with an inverted 16-bit version made without dither. First off, notice how low that is? Since dither has not been applied, the noise is correlated with the signal and actually is a form of distortion, but it’s quite low in amplitude and hard to detect.
Note in that graph that we’re zoomed in quite far, and I set the bottom of the vertical (dB) scale to -120 dB just to make it easier to see. In fact, if I use -96 dB as the bottom of the scale, the distortion is so low that it doesn’t even show up! In other words, the distortion is below -96 dB!
It’s also worth noting, in case it’s not clear, that this noise really is what would be heard on the actual track. Take the original 24-bit track, flip the polarity of the difference track, and play the two together, and you get exactly what the 16-bit version sounds like (the 24-bit signal minus the difference signal).
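For anyone who wants to reproduce this, the null test is easy to sketch on a synthetic tone (the actual file loading and the screenshots are left out here; the “mix with an inverted copy” step is just a subtraction):

```python
import numpy as np

lsb = 1.0 / 32768.0
hires = 0.5 * np.sin(2 * np.pi * 440 * np.arange(48000) / 48000)  # stand-in for the 24-bit original

sixteen = np.round(hires / lsb) * lsb   # 16-bit version, no dither
diff = hires + (-sixteen)               # 24-bit mixed with the inverted 16-bit

peak_db = 20 * np.log10(np.max(np.abs(diff)))
print(f"difference-track peak: {peak_db:.1f} dBFS")   # lands at or below about -96 dBFS

# And the 16-bit version is exactly the 24-bit original minus the difference track:
print(np.allclose(hires - diff, sixteen))             # True
```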
So that distortion is quite low, but still, it is distortion, and we don’t want that in our 16-bit conversion even if it’s unlikely to be heard. So we dither. And here’s the difference track when using dither:
The noise is now uncorrelated with the signal and the distortion is gone. Notice that the level is a bit higher than the distortion was. That’s the (more than acceptable) tradeoff with dither: We eliminate the distortion completely but end up with a slightly elevated (but still very quiet) noise floor.
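To put rough numbers on that tradeoff (same kind of synthetic sketch as above; the exact figures depend on the material and the dither used):

```python
import numpy as np

rng = np.random.default_rng(0)
lsb = 1.0 / 32768.0
hires = 0.5 * np.sin(2 * np.pi * 440 * np.arange(48000) / 48000)

for dither in (False, True):
    x = hires.copy()
    if dither:
        # TPDF dither, +/-1 LSB, added before rounding
        x = x + (rng.uniform(-0.5, 0.5, x.shape) +
                 rng.uniform(-0.5, 0.5, x.shape)) * lsb
    resid = np.round(x / lsb) * lsb - hires
    rms_db = 20 * np.log10(np.sqrt(np.mean(resid ** 2)))
    print(f"dither={dither}: residual RMS ~ {rms_db:.1f} dBFS")

# Roughly -101 dBFS without dither vs. roughly -96 dBFS with it: the floor
# comes up a few dB, but what's left is benign noise rather than distortion.
```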
You mentioned masking previously, and some references call it that: using noise to mask distortion. But that’s not really what it is. Dithering doesn’t cover up distortion that’s still present. It completely replaces the distortion with uncorrelated noise. The bit values that corresponded to the distortion are gone and have been replaced with the uncorrelated noise values. Everything in the 24-bit and 16-bit versions is identical except for that uncorrelated noise, which is no louder than about -90 dB or so.
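One way to convince yourself that the distortion really is replaced rather than hidden is to look at where the residual’s energy sits in the spectrum. Another rough sketch (quiet synthetic tone again; the tone frequency and FFT size are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
N, lsb, k0 = 65536, 1.0 / 32768.0, 1000                     # k0 = FFT bin of the test tone
x = (3 * lsb) * np.sin(2 * np.pi * k0 * np.arange(N) / N)   # very quiet tone

for dither in (False, True):
    y = x.copy()
    if dither:
        y = y + (rng.uniform(-0.5, 0.5, N) + rng.uniform(-0.5, 0.5, N)) * lsb
    err = np.round(y / lsb) * lsb - x
    spec = np.abs(np.fft.rfft(err)) / (N / 2)                # amplitude per FFT bin
    harm = np.arange(2 * k0, N // 2 + 1, k0)                 # bins at harmonics of the tone
    harm_db = 10 * np.log10(np.sum(spec[harm] ** 2) + 1e-30)
    print(f"dither={dither}: error energy at the tone's harmonics ~ {harm_db:.0f} dB")

# Without dither the error piles up at harmonics of the signal -- it *is*
# distortion. With dither those same bins hold only broadband noise, tens of
# dB lower: the distortion products aren't masked, they're gone.
```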