Yes, one should replace RMS with a perceptual loudness measure. My main concern is how to automate the process to, say, parse an entire album collection without having to pull out the abacus for each album and get: a) a reasonable absolute measure and b) at least a consistent ordering.
It's something I've often wondered about when looking at different resolutions for albums. It would probably be worth band-limiting the signal at a fixed value within the audible range (maybe even lower?) before looking into the dynamic numbers.
The first problem we have is that there's no absolute definition of dynamic range; it has a very different meaning for musicians than for sound engineers, for example. 0dB (or very close) at the high end is fairly straightforward, but at the low/quiet end we run into difficulties. We could just say it's the noise floor and be done with it, but then the term "dynamic range" doesn't really mean anything relative to human hearing, which is a problem because in its broadest (and most common) sense dynamic range is the range between the loudest and quietest note/signal. The problem is that we can hear signals somewhat below the noise floor, depending on the signal and the noise floor. Given the ideal signal and triangular dither (white noise), we could hear as much as 15dB below the noise floor, but of course we couldn't just say that dynamic range is the noise floor plus an extra 15dB, because in real life recordings we're unlikely to have an ideal signal or a purely random (white noise) floor. This problem gets considerably more complex if we're talking about noise-shaped rather than standard triangular dither. Typically, noise-shaped dither is added at a higher level than triangular and would therefore register a smaller dynamic range if we're measuring the noise floor as an RMS value. Using RMS noise as the minimum value, we would typically see a CD having a dynamic range of about 92dB and, with noise-shaped dither, more like 89dB.
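To put a rough number on the triangular dither case, here's a minimal sketch of the textbook arithmetic. It assumes a full-scale sine as the 0dB reference, uniformly distributed quantisation error and TPDF dither spanning +/-1 LSB peak; the function name is just mine for illustration. Under those assumptions 16 bits comes out at roughly 93dB with dither (about 98dB without), which is in the same region as the figure above; the exact number depends on the reference and measurement conventions used.

```python
import math

def quantisation_noise_db(bits: int, tpdf_dither: bool = True) -> float:
    """RMS noise level in dB relative to a full-scale sine wave.

    Assumptions (illustrative, not a measurement standard):
    full scale spans -1.0..+1.0, quantisation error is uniform,
    TPDF dither spans +/-1 LSB peak.
    """
    lsb = 2.0 / (2 ** bits)              # quantiser step size
    noise_power = lsb ** 2 / 12.0        # uniform quantisation error
    if tpdf_dither:
        noise_power += lsb ** 2 / 6.0    # variance of TPDF dither
    sine_power = 0.5                     # full-scale sine, amplitude 1.0
    return 10.0 * math.log10(noise_power / sine_power)

if __name__ == "__main__":
    print(f"16 bit, no dither:   {quantisation_noise_db(16, False):.1f} dB")
    print(f"16 bit, TPDF dither: {quantisation_noise_db(16, True):.1f} dB")
```

Note this only gives the RMS figure; it says nothing about how far below that floor a signal might still be heard.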
Bear with me at this point; I know you both already know this, but you'll see why I need to recap. Noise-shaped dither moves the noise out of the critical hearing band and redistributes it into less sensitive bands, typically above 12kHz on CD, with peak level at around 17kHz or so. In the early days, 20-odd years ago, noise-shaped dither was a simple choice: you chose noise-shaped, triangular or no dither. From around 15 or so years ago, though, you could choose the parameters of the noise-shaping. We could choose more/higher shaping, which would lower noise (increase dynamic range) in the critical band and increase noise in the less critical, high freq band. However, the more shaping applied, generally the less efficient it is at decorrelating the quantisation error and therefore the more you need to use. The RMS noise floor with a high degree of shaping could be around -86dB. On the other side of the coin, the whole point of noise-shaped dither is to provide a wider dynamic range in terms of what we would (theoretically) hear. With fairly standard/default settings (which was the only choice to start with), we're likely to get a dynamic range on CD of about 120dB, but at its most extreme settings we could achieve a maximum dynamic range of about 150dB. Effectively (and typically) with noise-shaped dither, as the RMS noise floor gets higher, the dynamic range gets wider, which of course is counter-intuitive. Just to complicate things further, I wrote "typically" because we can choose not only the degree/severity of the shaping but also the overall amount of dither. In other words, we could choose to increase the degree/severity of the shaping but not increase the overall (RMS) amount of dither, although we would then run the risk of not de-correlating all of the quantisation error.
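The counter-intuitive bit (total RMS noise goes up while the noise in the band we care about goes down) can be shown with a toy example. This is only a sketch under loud assumptions: a first-order error-feedback shaper, which is far milder than anything used in mastering, levels expressed in dB relative to 1 LSB, and an 8-sample moving average standing in very crudely for "the critical band". All the function names are made up for illustration and nothing here is perceptually weighted.

```python
import math
import random

def tpdf(rng):
    # Triangular-PDF dither spanning +/-1 LSB: sum of two uniform values.
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)

def requantise(signal, shaped, rng):
    """Quantise to whole LSBs with TPDF dither.

    If shaped is True, the previous sample's total error is subtracted
    from the input (first-order error feedback), which pushes the noise
    towards high frequencies.
    """
    out, err_prev = [], 0.0
    for x in signal:
        v = x - err_prev if shaped else x
        y = math.floor(v + tpdf(rng) + 0.5)   # round to nearest LSB
        err_prev = y - v                      # dither + quantisation error
        out.append(y)
    return out

def rms_db(values):
    # RMS level in dB relative to 1 LSB.
    power = sum(v * v for v in values) / len(values)
    return 10.0 * math.log10(power)

def moving_average(values, width=8):
    # Crude low-pass filter: a stand-in for "only look at the lower band".
    return [sum(values[i:i + width]) / width
            for i in range(len(values) - width + 1)]

def noise_levels(shaped, n=20000, seed=1):
    """Return (total noise, low-passed noise) in dB re 1 LSB."""
    rng = random.Random(seed)
    signal = [100.0 * math.sin(2 * math.pi * 0.01 * i) for i in range(n)]
    out = requantise(signal, shaped, rng)
    noise = [y - x for x, y in zip(signal, out)]
    return rms_db(noise), rms_db(moving_average(noise))

if __name__ == "__main__":
    for shaped in (False, True):
        total, low = noise_levels(shaped)
        label = "shaped" if shaped else "plain TPDF"
        print(f"{label:>10}: total {total:6.1f} dB, low band {low:6.1f} dB")
```

If the error behaved as ideal white noise, you'd expect roughly 3dB more total noise from the shaped version and roughly 6dB less after the 8-sample average. Real shapers use much higher-order filters, which is where the far bigger numbers come from.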
In practice then, the choice of noise-shaped dither settings is a trade-off: as we increase the amount of shaping, we (often/typically) increase the overall amount of RMS noise, so not only do we have more overall dither noise but it's more concentrated in a relatively small (very high) frequency band. Whatever detail we may have in the very high freqs could possibly be lost in the dither noise, and a significant level of noise in the very high freq band might cause issues downstream (consumer playback or broadcast equipment, for example). Therefore, the very highest noise-shaping (giving theoretically 150dB of dynamic range) is rarely, if ever, used ... however, this assertion is a guess! I do not know what settings other mastering engineers use. I personally don't have a specific setting; I usually use the default, but sometimes I use something different depending on the material. Castleofargh's suggestion of measuring the dynamic range in just the critical hearing band could therefore be unrepresentative (depending on the trade-offs chosen by the engineer), and that's without considering the fact that it might still be possible to (theoretically) hear somewhat below even that band-limited noise floor, depending on the exact nature of the signal.
Just in case all that's not confusing enough, we've also got the noise floor of what was recorded (and then mixed) to consider, i.e. the acoustic noise floor plus the analogue self-noise of mics and mic pre-amps, and as this is not purely white/random, it could interact with the signal. Lastly, and perhaps most importantly, all the above regarding the digital noise floor would typically be completely inaudible! There's little evidence that even truncation error at the 16bit level is audible, let alone triangular dither, and there's even less chance with noise-shaped dither. So why do we even bother applying noise-shaped dither, let alone choosing specific settings? It's very much a "just in case" scenario, as we cannot accurately predict every potential consumer listening circumstance or piece of equipment. Particularly with lower quality equipment/circumstances, we could encounter significant over-emphasis in areas of the spectrum which coincide with dither or shaped dither, which might, at very high levels, theoretically create a scenario where hearing a difference between dither and noise-shaped dither is a possibility.
I mention all this because if we just measure RMS levels then we could end up with a figure of about 84dB for CD, but if we take the dynamic range we could theoretically encode, then it's 150dB, a rather large difference. If we want to measure this arguably more appropriate perceptual dynamic range, how do we do that? We can only say the dynamic range is 150dB (for example) as a rough guesstimate; we can't directly measure it because it's perceptual. In RRod's case, he appears to be talking about the dynamic range of a recording, i.e. relative to the noise floor of the recording rather than the digital noise floor, and the recording's noise floor is much higher in level. However, there's still the potential problem of signals in that noise floor which, even if not directly recognisable as music, do affect perception, for example differences between room tones, as bigshot mentioned. We're not talking about anywhere near the massive differences between RMS and perceptual dynamic range which occur with noise-shaped dither, though.
G