I think that in many (probably most) contexts, accuracy is defined without reference to perception, and that's before even considering science, engineering, statistics, and so on. If you want to buy some produce at the grocery store, you weigh it in a bag. What is the accuracy of the scale you are using? It's just the difference between the actual weight and the scale's reading, a single number that doesn't depend on how the produce looks or smells. Maybe somebody else's bananas look yellower or seem bigger or heavier, but that has no bearing on how accurate the scale was. If you have more than one type of measurement or more than one sample, fine. You can keep all of the measurements, or reduce the data and summarize the key points with some statistics (i.e., functions of those measurement samples).
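To make the scale example concrete, here's a minimal Python sketch. The readings are made-up numbers, not real data, and the three summary statistics (bias, RMS error, worst-case error) are just one reasonable way to reduce several error samples to a few figures.

```python
# Hypothetical readings (kg) of a 1.000 kg reference weight on one scale.
TRUE_WEIGHT_KG = 1.000
readings_kg = [1.012, 1.009, 1.011, 1.010, 1.013]


def errors(readings, true_value):
    """Signed error of each reading: scale reading minus actual weight."""
    return [r - true_value for r in readings]


def summarize(errs):
    """Reduce many error samples to a few summary statistics."""
    n = len(errs)
    bias = sum(errs) / n                          # systematic offset
    rms = (sum(e * e for e in errs) / n) ** 0.5   # overall error size
    worst = max(abs(e) for e in errs)             # largest single error
    return bias, rms, worst


bias, rms, worst = summarize(errors(readings_kg, TRUE_WEIGHT_KG))
print(f"bias {bias * 1000:.1f} g, RMS {rms * 1000:.1f} g, worst {worst * 1000:.1f} g")
# prints: bias 11.0 g, RMS 11.1 g, worst 13.0 g
```

None of these numbers depends on who looks at the bananas; they are functions of the readings alone.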

There are many types of signals in the real world, electrical and otherwise. One such type is an audio waveform. The statistics we use to describe any type of signal can also be applied to audio. There are well-defined quantities such as the mean value, peak-to-peak amplitude, and variance, as well as others, including higher-order statistics (skewness, kurtosis, and so on), that we can use to convey information about the signal. When comparing input and output signals, there are even more statistics that describe the differences between the two, such as the mean squared error.
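As a rough illustration, here's a small Python sketch computing those single-signal statistics on a synthetic test tone. The 1 kHz sine at amplitude 0.5 is an arbitrary choice of mine, and `describe` is a hypothetical helper, not from any audio library.

```python
import math


def describe(x):
    """Basic descriptive statistics of a list of samples."""
    n = len(x)
    mean = sum(x) / n
    p2p = max(x) - min(x)                           # peak-to-peak amplitude
    var = sum((v - mean) ** 2 for v in x) / n       # population variance
    std = math.sqrt(var)
    # Standardized third and fourth central moments (higher-order statistics).
    skew = sum((v - mean) ** 3 for v in x) / n / std ** 3
    kurt = sum((v - mean) ** 4 for v in x) / n / std ** 4   # not excess kurtosis
    return {"mean": mean, "peak_to_peak": p2p, "variance": var,
            "skewness": skew, "kurtosis": kurt}


# Hypothetical test signal: one second of a 1 kHz sine, amplitude 0.5, at 44.1 kHz.
FS = 44_100
sine = [0.5 * math.sin(2 * math.pi * 1000 * n / FS) for n in range(FS)]
stats = describe(sine)
print(stats)
```

For a pure sine you'd expect a mean near 0, a variance of amplitude²/2 (0.125 here), a skewness near 0, and a kurtosis of 1.5; the peak-to-peak value comes out just under 1.0 because no sample lands exactly on the crest.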

If we have a list of all the time-domain samples of A and B, as you suggested, say at 44.1 kHz for one minute, then we can compute statistics on that data, and these will describe A and B in a meaningful way. The mean squared error would be one indicator of the accuracy of B, taking A as the reference. This value is one way of describing the real-world difference between A and B. Audibility and perception are not relevant to this value, since the statistic is a function of the samples only. This statistic has a far-from-trivial meaning across many applications and many types of signals. Likewise, a distortion measurement means something on its own, whether or not anyone can hear the distortion.
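Here's a minimal Python sketch of the mean squared error of B against reference A. The signals are hypothetical (B is A plus a small 3 kHz component), the demo uses one second instead of a minute to keep it quick, and expressing the MSE as a signal-to-error ratio in dB is my own addition, just a common convention.

```python
import math


def mse(a, b):
    """Mean squared error of test signal b against reference a."""
    if len(a) != len(b):
        raise ValueError("signals must have the same number of samples")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def snr_db(a, b):
    """Reference power divided by error power, in dB."""
    signal_power = sum(x * x for x in a) / len(a)
    return 10 * math.log10(signal_power / mse(a, b))


FS = 44_100      # sample rate from the example
DURATION_S = 1   # one second for speed; a minute would be 60 * FS = 2,646,000 samples
N = FS * DURATION_S

# Hypothetical signals: A is a 440 Hz tone, B is A plus a small 3 kHz component.
A = [0.5 * math.sin(2 * math.pi * 440 * n / FS) for n in range(N)]
B = [a + 0.005 * math.sin(2 * math.pi * 3000 * n / FS) for n, a in enumerate(A)]

print(mse(A, B), snr_db(A, B))
```

With these numbers the MSE is 0.005²/2 = 1.25e-5 and the ratio is 40 dB. Whoever runs it gets the same answer, which is the point about objectivity.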

These objective statistics are useful even if we are concerned with human experience, because they will be the same no matter who calculates them.

Many of these statistics correlate well with perception, too. If they had no bearing on what people heard, we would still be able to calculate them; we just wouldn't bother. If the correlation coefficient between A and B (there are several ways to define correlation mathematically, but any should do) is close to 1 rather than close to 0.8, they will sound much more similar. Of course, audio is perceived in both frequency and time, so frequency-domain statistics are generally more interesting than time-domain ones.
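For the correlation numbers above, here's a Python sketch using the Pearson coefficient (one of the several definitions). The degraded copies are hypothetical: A plus an unrelated 3 kHz tone at two levels, chosen so the coefficients land near 0.995 and exactly 0.8. How different those actually sound is the psychoacoustic part and isn't computed here.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sample lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def add_tone(x, freq, level, fs):
    """Hypothetical degradation: add a sine at freq Hz with the given amplitude."""
    return [v + level * math.sin(2 * math.pi * freq * n / fs) for n, v in enumerate(x)]


FS = 44_100
N = 4410  # 0.1 s: both 440 Hz and 3 kHz complete whole cycles in this span
A = [math.sin(2 * math.pi * 440 * n / FS) for n in range(N)]
B1 = add_tone(A, 3000, 0.1, FS)
B2 = add_tone(A, 3000, 0.75, FS)

print(pearson(A, B1), pearson(A, B2))  # about 0.995 and 0.8
```

Because the added tone is uncorrelated with A over this window, the coefficient works out to 1/sqrt(1 + level²).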

But, as you say, to relate these statistics to average human perception, you need outside knowledge in the form of psychoacoustics. If the correlation between A and B were close to -1 (i.e., B is A with its polarity inverted, a 180-degree phase shift at every frequency), they would actually sound the same. Likewise, among distortion products, second harmonics are the hardest to perceive and the least offensive.
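The polarity case shows why time-domain and frequency-domain statistics can disagree. In this Python sketch, B is an inverted copy of a hypothetical 1 kHz tone with a little second-harmonic content: the correlation is -1 and the MSE is four times the signal power, yet the magnitude spectra (computed with a naive DFT, fine for a short 10 ms frame) are identical, since magnitude discards absolute polarity.

```python
import cmath
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sample lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def dft_magnitudes(x):
    """Magnitudes of the non-negative-frequency bins of a naive O(N^2) DFT."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


FS = 44_100
N = 441  # 10 ms frame, so 1 kHz and 2 kHz fall exactly on bins 10 and 20
A = [0.5 * math.sin(2 * math.pi * 1000 * t / FS)
     + 0.2 * math.sin(2 * math.pi * 2000 * t / FS) for t in range(N)]
B = [-v for v in A]  # polarity-inverted copy

mag_a, mag_b = dft_magnitudes(A), dft_magnitudes(B)
print(pearson(A, B), mse(A, B))                       # -1 and 4 x signal power
print(max(abs(p - q) for p, q in zip(mag_a, mag_b)))  # 0: spectra match
```

Which of these statistics tracks what you'd hear is exactly the outside knowledge psychoacoustics supplies.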

Somebody could use all of this psychoacoustic information to build an aggregate statistic that predicts how similar B sounds to A for an average listener, which is what we're calling "perceived accuracy." In a way, this has already been done, among other places, by those working on lossy music encoders (e.g., LAME). It's not perfect, but a VBR -V0 encoding has a consistent quality level, or accuracy, throughout, according to some metric. Encoders improved over time because the statistics they used for accuracy became better predictors of human perception. More precisely, newer encoding algorithms produced files perceived as closer to the original at equivalent bit rates, whether constant or variable. It's definitely not that the entropy of the same source files changed.
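As a toy version of such an aggregate statistic, here's a Python sketch that weights the error spectrum by the standard A-weighting curve before summing. To be clear, this is not what LAME does; LAME has its own, far more elaborate psychoacoustic model. A-weighting is swapped in here only as a simple, well-known frequency weighting, and `weighted_error` is a hypothetical helper. Two errors with identical MSE, one at 100 Hz and one at 3 kHz, get very different scores.

```python
import cmath
import math


def a_weight_gain(f):
    """Linear A-weighting gain at f Hz (IEC 61672 curve, about 1.0 at 1 kHz)."""
    if f <= 0:
        return 0.0
    f2 = f * f
    ra = (12194.0 ** 2 * f2 * f2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return ra * 10 ** (2.0 / 20)  # the +2.00 dB offset normalizes 1 kHz to 0 dB


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def weighted_error(a, b, fs):
    """Toy 'perceptual' error: A-weighted sum of squared error-spectrum magnitudes."""
    n = len(a)
    err = [y - x for x, y in zip(a, b)]
    total = 0.0
    for k in range(n // 2 + 1):  # non-negative frequency bins of a naive DFT
        bin_k = sum(err[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        total += (a_weight_gain(k * fs / n) * abs(bin_k)) ** 2
    return total / n


FS = 44_100
N = 441  # 10 ms frame; 100 Hz, 1 kHz and 3 kHz all complete whole cycles in it
A = [0.5 * math.sin(2 * math.pi * 1000 * t / FS) for t in range(N)]
# Hypothetical errors with identical MSE, placed at 100 Hz and at 3 kHz.
B_low = [v + 0.01 * math.sin(2 * math.pi * 100 * t / FS) for t, v in enumerate(A)]
B_mid = [v + 0.01 * math.sin(2 * math.pi * 3000 * t / FS) for t, v in enumerate(A)]

print(mse(A, B_low), mse(A, B_mid))  # both about 5e-05
print(weighted_error(A, B_low, FS), weighted_error(A, B_mid, FS))  # first is much smaller
```

A real perceptual metric would also account for masking by the signal itself, time-domain effects, and more, which is where the hard work in encoders like LAME goes.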

Edited by mikeaj - 9/3/10 at 12:28pm