I think it's better to develop separate understandings of images and audio, since some of the mechanisms and mathematics are shared, but others are not.
On a side note, some of the better image upscaling algorithms are nonlinear (and some involve filtering as well; I know even less about image processing), so you could argue they slightly reduce the information when upscaling rather than just keeping it at the same level. Even ideal sinc-based interpolation can introduce noticeable ringing across pixels, and there are perceptually better methods. I think it's more complicated than it looks.
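If you want to see that ringing in numbers, here's a minimal 1-D sketch (plain NumPy; the kernels, the 4x factor, and the step edge are my illustrative choices, not taken from any particular scaler): interpolating a hard edge with a sinc kernel overshoots past the original black/white levels, which is exactly the ringing, while a Lanczos-windowed sinc overshoots less.

```python
import numpy as np

def lanczos(x, a=3):
    # Windowed sinc: sinc(x) * sinc(x/a) inside |x| < a, zero outside.
    return np.where(np.abs(x) < a, np.sinc(x) * np.sinc(x / a), 0.0)

def upsample_1d(samples, factor, kernel):
    # Each output position is a kernel-weighted sum of the input samples.
    n = np.arange(len(samples))
    t = np.arange(len(samples) * factor) / factor  # output positions in input units
    return np.array([np.sum(kernel(ti - n) * samples) for ti in t])

edge = np.array([0.0] * 8 + [1.0] * 8)  # a hard black-to-white edge

for name, kernel in [("plain sinc", np.sinc), ("lanczos a=3", lanczos)]:
    up = upsample_1d(edge, 4, kernel)
    # Anything outside [0, 1] is over/undershoot next to the edge -- the ringing.
    print(f"{name:12s} min={up.min():+.4f} max={up.max():+.4f}")
```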
But anyway, if you want to make an analogy, I'd make it this:
Imagine a system where each pixel can take one of 2^24 colors (8 bits each for R, G, and B), displayed on a screen with an infinite number of pixels, each so small that you couldn't possibly distinguish individual ones. If you're looking at some area of the display where some pixels are [130 0 0] and others are [129 0 0], you effectively see a color between [130 0 0] and [129 0 0], and how close it is to one or the other depends on the proportion of each. That's pretty much spatial dithering taken to the extreme. Temporal dithering would be the pixels rapidly flipping back and forth between [130 0 0] and [129 0 0], with the fraction of time spent at each value determining the perceived color. Some monitors actually do that, especially 6-bit-per-channel (18-bit) panels, where it's called FRC.
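A quick numeric sketch of both ideas (plain NumPy; the target value 129.4 and the sample counts are arbitrary): averaging a large patch of pixels that are each randomly 129 or 130, or one pixel over many frames, converges on the in-between value.

```python
import numpy as np

rng = np.random.default_rng(0)

target = 129.4       # the in-between color we want the eye to perceive
p = target - 129     # fraction of pixels (or of time) spent at 130

# Spatial dithering: a large patch of tiny pixels, each 129 or 130.
patch = np.where(rng.random(1_000_000) < p, 130, 129)
print("spatial average :", patch.mean())   # ~129.4

# Temporal dithering: one pixel flickering between 129 and 130 over frames;
# same draw, but read the axis as time instead of space.
frames = np.where(rng.random(1_000_000) < p, 130, 129)
print("temporal average:", frames.mean())  # ~129.4
```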
But the analogy doesn't quite hold, because an audio output waveform doesn't present the individual sample points themselves; it presents the continuous signal that the samples define as a whole. On a display, each pixel really is shown as exactly what that sample point is.
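For the record, that "as a whole" part is just the standard Whittaker–Shannon reconstruction: the output at any instant t is a sinc-weighted sum over all samples x[n] taken at spacing T, not a readout of any single point:

```latex
x(t) = \sum_{n=-\infty}^{\infty} x[n]\,
       \operatorname{sinc}\!\left(\frac{t - nT}{T}\right),
\qquad
\operatorname{sinc}(u) = \frac{\sin(\pi u)}{\pi u}
```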
Yes, upsampling will actually degrade the base signal (except at factors of 4, or multiples of 4, with certain scalers). I oversimplified, but it seemed like a good analogy to me because it's visual and easy to understand. Of course "color resolution" (bit depth) is the closer equivalent, but IMO it's much harder to visualize.

Still, I think your example proves my point: if one source pixel is 130 and the other is 129, the one in between would appear as 129.5 to our eyes, yes. But if the source had the same high resolution as the display, any value between 0 and 255 would be possible for that pixel, since we can't know what would have been sampled in real life at that point. (Greatly simplified, as the capturing device probably doesn't capture aliased images, so the values 130 and 129 are already the result of interpolation... they are either way, but that's nitpicking.)

Perhaps it's really just a matter of perspective whether you consider the resolution infinite as well as separate from DR (dynamic range), or whether you consider it finite and actually "defining" the DR. Considering that we're dealing with a function of samples over time, the resolution of the waveform yields resolution of amplitude and frequency at the same time, depending on your point of view.
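On the bit depth side, here's a sketch of why dither lets a finite bit depth carry sub-LSB detail (plain NumPy, working directly in LSB units; the 0.4 LSB tone amplitude, the 100-sample period, and the choice of TPDF dither are mine, for illustration): plain rounding erases a tone smaller than one quantization step entirely, while adding TPDF dither before rounding keeps its average intact underneath the noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n = np.arange(200_000)
tone = 0.4 * np.sin(2 * np.pi * n / 100)   # amplitude below 1 LSB, period 100 samples

# Plain rounding: a sub-LSB tone quantizes to silence -- the signal is simply gone.
plain = np.round(tone)

# TPDF dither (sum of two uniform +/-0.5 LSB sources) added before rounding:
# the quantized output's mean tracks the input, so the tone survives under noise.
tpdf = rng.uniform(-0.5, 0.5, n.size) + rng.uniform(-0.5, 0.5, n.size)
dithered = np.round(tone + tpdf)

# Stack all 2000 periods on top of each other to see what's left of the tone.
fold = lambda x: x.reshape(-1, 100).mean(axis=0)
print("plain    peak:", np.abs(fold(plain)).max())     # 0.0  -- nothing left
print("dithered peak:", np.abs(fold(dithered)).max())  # ~0.4 -- tone recovered
```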