[1] The 'perfect waveform' is the part I'm missing, at least in the amplitude domain...Thanks all, got it now I think. My problem was that I was still looking at amplitude sampling as being recorded and reproduced as some fixed temporal event (sigh, stairsteps I guess, even though I thought I knew that wasn't the case) when just as in the frequency domain this isn't actually the case.
[2] The actual process is non-intuitive and difficult to visualize without a clear explanation and as a result is probably understood only by a small fraction of a percentage of those professing opinions concerning 'hi-res' audio.
1. Maybe the difficulty you were/are having is that you were trying to separate the issue into two different "domains" (amplitude and frequency)? While it's sometimes useful to do this for the sake of "visualization", in reality they are not separate/different domains, they're exactly the same thing. A sine wave (for example) is effectively defined as: an increasing amplitude until a "peak" is reached, then a decreasing amplitude until the "trough" is reached, and then an increasing amplitude again until the starting point is reached. We call this a "cycle" and frequency is simply the number of cycles per second. In other words, Frequency = Amplitude (over time).
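As a tiny sketch of that idea (all rates and frequencies here are invented purely for illustration): build one second of a 5 Hz sine as nothing but a sequence of instantaneous amplitudes, then recover the frequency from that amplitude sequence alone.

```python
# A small sketch (values chosen arbitrarily for illustration) of
# "Frequency = Amplitude (over time)": 1 second of a 5 Hz sine is
# built purely as a list of instantaneous amplitudes, and the
# frequency is then recovered from nothing but that sequence.
import numpy as np

sample_rate = 1000                       # samples per second (assumed)
t = np.arange(0, 1.0, 1 / sample_rate)   # 1 second of time points
amplitude = np.sin(2 * np.pi * 5 * t)    # amplitude rising/falling in cycles

# The repeating amplitude pattern IS the frequency: the strongest
# component of the amplitude sequence sits at 5 cycles per second.
spectrum = np.abs(np.fft.rfft(amplitude))
peak_hz = np.fft.rfftfreq(len(amplitude), 1 / sample_rate)[np.argmax(spectrum)]
print(peak_hz)   # -> 5.0
```

The amplitude values and the 5 Hz result are the same information viewed two ways, which is the point of the paragraph above.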
2. As I mentioned in my first response to you, it's all a question of getting your head around it (or "visualizing" it, as you put it) and that requires an explanation which works for you personally. I'm not sure that "the actual process is non-intuitive", I think it depends on the "visualization" you have to start with. For example, if you take someone who's never thought about how digital audio works, who's effectively a "blank canvas" with no preconceived "visualization", then I don't think the process is counter-intuitive. It's more difficult and more counter-intuitive if you do have preconceived notions of how it works (the "stairstep" notion, for instance).

Many audiophiles, for example, seem to have the notion that digital audio is effectively analogue audio but with digital data, i.e. analogue audio creates an "analogy" of actual sound waves using an electrical current and digital audio creates an "analogy" of actual sound waves using digital data. This view/"visualization" is incorrect and leads to a bunch of further incorrect assumptions, such as: more data (bits or sample points) results in digital audio data which is a closer/higher resolution "analogy" of the actual sound waves. In reality though, digital audio is effectively just a sequence of data points which allows the sound waves to be "reconstructed" through the application of some mathematical processes; digital audio data is NOT analogous to the sound waves.
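A minimal sketch of that "reconstruction" idea, using Whittaker-Shannon (sinc) interpolation; the sample rate and tone frequency are assumed for the demo. The point is that the sample points are data from which the continuous waveform is mathematically rebuilt, not a stairstep "analogy" of it:

```python
# A minimal sketch (assumed rates/frequencies) of reconstruction: the
# sample points are NOT an "analogy" of the waveform, they're data from
# which the continuous waveform is mathematically rebuilt
# (Whittaker-Shannon / sinc interpolation).
import numpy as np

fs = 100.0                                    # sample rate, Hz (assumed)
n = np.arange(100)                            # 1 second of sample indices
samples = np.sin(2 * np.pi * 3.0 * n / fs)    # a 3 Hz tone, well below fs/2

def reconstruct(t):
    """Waveform value at ANY time t (seconds), rebuilt from all samples."""
    return float(np.sum(samples * np.sinc(fs * t - n)))

# Evaluate halfway BETWEEN two sample instants: the result matches the
# original continuous sine, not a "stairstep" held at the last sample.
t_mid = 0.505
print(round(reconstruct(t_mid), 2), round(np.sin(2 * np.pi * 3.0 * t_mid), 2))
```

The two printed values agree closely (exactly, in the limit of an infinitely long sample sequence), even though `t_mid` falls between two samples, which is why there are no "fixed temporal events" or stairsteps in the reconstructed output.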
[1] I believe, and I'm no expert here so others may chime in, that dither just adds white noise to the signal which decorrelates the amplitude errors.
[1a] So the errors are still there but it now randomised at the cost of a higher noise floor which sounds like tape hiss ...
1. You're not exactly wrong but not exactly correct either. The way you appear to be looking at dither leads to some incorrect conclusions/assumptions. Instead of thinking about dither in terms of actual white noise, try thinking of it more in terms of what it actually is, a mathematical function. Standard dither is usually abbreviated to TPDF, a Triangular Probability Density Function, which is a rather off-putting term for the layman but can be thought of as: a sort of mathematical function which randomises the errors, the end result effectively being that ALL of the error is converted into white noise.
1a. By looking at dither this way, you can hopefully see that your statement is incorrect: Firstly, the errors are not "still there", the errors are completely gone, they've been converted into white noise. And secondly, the "noise" is not higher, it's the same. The difference is that with dither we end up with a constant, low-level amount of white noise, while without dither we end up with a non-constant amount of signal distortion, but in both cases we end up with the same overall "amount". This is of course the logical conclusion using this view, as all we're doing is converting the error into white noise.

Dither is a prerequisite of digital audio; without it the conditions required by the "Sampling Theorem" cannot be met, in much the same way as not applying an anti-alias filter at half the sampling rate fails to meet the required conditions. Dither is therefore always automatically applied during the quantisation process, as is an anti-alias filter; both are intrinsic to the process of digital audio. In other words, dither does not raise the noise floor, it's what defines the noise floor in the first place!
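Here's a small sketch of that view in code. The step size and signal level are invented purely for the demo: a tone below 1 LSB is deterministically destroyed by an undithered quantiser (the "error" IS the signal, i.e. pure correlated distortion), while with TPDF dither the total error becomes noise that is uncorrelated with the signal.

```python
# A sketch of TPDF dither at the quantiser. Step size and signal level
# are invented purely for the demo.
import numpy as np

rng = np.random.default_rng(0)
step = 1 / 128                                        # quantisation step (assumed)
t = np.arange(48000) / 48000
signal = 0.4 * step * np.sin(2 * np.pi * 1000 * t)    # tone below 1 LSB

def quantise(x):
    return np.round(x / step) * step

# TPDF dither = sum of two independent uniform noises, +/-1 LSB peak.
dither = (rng.uniform(-0.5, 0.5, t.size) +
          rng.uniform(-0.5, 0.5, t.size)) * step

err_plain = quantise(signal) - signal                 # no dither
err_dith = quantise(signal + dither) - signal         # with dither

# Undithered: the sub-LSB tone rounds to silence, so the error tracks
# the signal perfectly (pure distortion). Dithered: the total error is
# uncorrelated with the signal -- it has become plain noise.
corr_plain = abs(np.corrcoef(signal, err_plain)[0, 1])
corr_dith = abs(np.corrcoef(signal, err_dith)[0, 1])
print(corr_plain)   # effectively 1.0
print(corr_dith)    # close to 0
```

Note that the undithered quantiser output here is literally silence (everything rounds to zero), whereas the dithered output still contains the tone plus a constant noise floor, which is exactly the "error converted into white noise" picture above.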
We've also now got what's called noise-shaped dither, which became commercially available in the early 1990s in response to the growing requirement for "re-quantisation". Re-quantisation became a requirement when high-end digital recording and mixing moved beyond 16bit (initially to 20bit) and therefore needed to be re-quantised down to 16bit for consumer distribution. Without another round of dither, the re-quantisation process would introduce "truncation error", which is effectively a slightly more severe form of quantisation error. Noise-shaped dither was introduced to effectively maintain the 20bit dynamic range but in a 16bit file format. As pinnahertz effectively stated, our resultant white noise is "shaped", it's no longer "white": it's concentrated in areas where our hearing is least sensitive and is therefore inaudible.
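A sketch of the shaping idea, using simple first-order error feedback; this is a much cruder shaper than the commercial curves, and all levels/rates are assumed for the demo. Each sample's total quantisation error is fed back and subtracted from the next sample, which pushes the noise towards high frequencies:

```python
# A sketch of noise shaping via first-order error feedback -- far
# cruder than commercial noise shapers, but it shows the principle.
# All levels/rates are assumed for the demo.
import numpy as np

rng = np.random.default_rng(1)
fs = 44100
step = 2 / 2 ** 16                    # 16 bit step size
x = 0.1 * np.sin(2 * np.pi * 1000 * np.arange(2 ** 16) / fs)  # "hi-res" source

def quantise(v):
    return np.round(v / step) * step

def tpdf(size):
    return (rng.uniform(-0.5, 0.5, size) + rng.uniform(-0.5, 0.5, size)) * step

# Flat TPDF dither: the re-quantisation error is plain white noise.
y_flat = quantise(x + tpdf(x.size))

# First-order noise shaping: feed each sample's total error back and
# subtract it from the next sample, pushing the noise upwards in frequency.
d = tpdf(x.size)
y_shaped = np.empty_like(x)
e = 0.0
for i in range(x.size):
    w = x[i] - e
    y_shaped[i] = quantise(w + d[i])
    e = y_shaped[i] - w

def low_band_power(err, frac=0.1):
    """Noise power in the lowest `frac` of the spectrum (most audible region)."""
    spec = np.abs(np.fft.rfft(err)) ** 2
    return spec[: int(len(spec) * frac)].sum()

# Shaping actually INCREASES the total noise, but moves it out of the
# band where hearing is most sensitive.
print(low_band_power(y_shaped - x) < low_band_power(y_flat - x))   # -> True
```

The trade-off in the final comment is the whole game: total noise power goes up, but audible noise goes down, which is how 20bit dynamic range can effectively survive in a 16bit file.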
BTW, all the above is not exactly correct or incorrect either! It's just another way of looking at the issue, a way which avoids some incorrect conclusions/assumptions.
[1] Hence better perceptual sound quality = perceptual coding of a kind even if totally different from perceptual coding methods developed for music/high quality audio.
[2] You're splitting hairs here.
1. Throughout your argument with pinnahertz you seem to have missed the fact that "perceptual coding" has a specific and well-defined meaning. You are incorrectly equating better perceptual sound quality with perceptual coding. "Perceptual coding" at least partially relies on "auditory masking", reducing the amount of data by removing masked frequencies. On the other hand, "better perceptual sound quality" can be achieved in numerous ways which do not rely on, or even directly involve, "auditory masking". A simple EQ or filter, noise reduction, compression, expansion or other processes can all produce better perceptual sound quality but are NOT "perceptual coding".
2. No, he wasn't. It's an important distinction with wide-ranging ramifications. Plus, if we're going to accept your definition of "perceptual coding", what new/different term are we going to use for actual perceptual coding? It seems to be a bit of a trend: someone makes an incorrect statement of fact and, when called out on it, responds with "you're splitting hairs", "just because I can't find the right words doesn't mean I'm wrong" or "you just like to argue for argument's sake". I'm not sure if this type of response is simply an attempt to deflect or minimise the fact that they've been caught making up (or recounting) incorrect facts, or whether it's because they really don't understand "science" or enough about "sound" to appreciate why made-up, incorrect statements of fact matter so much.
G