the Basics of Digital and Analog Audio

Jul 13, 2024 at 6:39 PM Post #91 of 171
Nyquist prompt: (mult s (/ 1.0 256))
Mix to 16 bit with dither
Nyquist prompt: (mult s 256)
Yes, this will work (although I had to replace s with *track*).
Actually create 16 bit dither for silence and then (mult s 256) and add to the sound. That's the way to do it. I tested with silence so it was the same anyway, but not with signal. Sorry. Just woke up. Brain not working. Coffee needed....
I'm not sure I understand correctly. Do you mean:
  1. generate silence
  2. export to 16-bit with dither
  3. load the exported file back
  4. multiply by 256
  5. mix the result with our input signal
  6. export the mix to 8-bit without dither
Because that won't work, AFAICT. Let's say we work in 16-bit domain. In step 4 above we will get a white noise but the samples will only have one of 3 values: 0 and +/- 256. For proper conversion to 8-bit you need a dither noise where:
  • samples have any value in [-128:127] range and rectangular distribution (i.e. rectangular dither)
  • or samples have any value in [-256:255] range and triangular distribution (i.e. triangular dither)
An alternative way that seems to be working is:
  1. generate full scale white noise
  2. divide by 256
  3. mix the result with our input signal
  4. if you want triangular dither instead of rectangular one: repeat all previous steps
  5. export the mix to 8-bit without dither
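
For anyone who wants to try the alternative recipe outside Audacity, here is a minimal Python sketch of it. It assumes samples are floats in [-1.0, 1.0), that "full scale white noise" means uniformly distributed noise, and that the 8-bit export rounds to the nearest step; the names `quantize_to_8bit` and `dither_noise` are made up for this illustration and are not Audacity functions.

```python
import random

STEP_8BIT = 1 / 128  # size of one 8-bit step when full scale is [-1.0, 1.0)

def quantize_to_8bit(x):
    """Round to the nearest 8-bit step and clip to the valid code range."""
    code = max(-128, min(127, round(x / STEP_8BIT)))
    return code * STEP_8BIT

def dither_noise(rng, triangular=False):
    """Full-scale uniform noise divided by 256, i.e. within +/- half an 8-bit step."""
    noise = rng.uniform(-1.0, 1.0) / 256
    if triangular:
        # "Repeat all previous steps": add a second, independent noise layer.
        noise += rng.uniform(-1.0, 1.0) / 256
    return noise

rng = random.Random(0)        # fixed seed so the run is repeatable
level = 0.3 * STEP_8BIT       # a constant level that sits between two steps
signal = [level] * 20000

plain = [quantize_to_8bit(s) for s in signal]
dithered = [quantize_to_8bit(s + dither_noise(rng, triangular=True)) for s in signal]

mean_plain = sum(plain) / len(plain)           # the sub-step level is lost entirely
mean_dithered = sum(dithered) / len(dithered)  # the average stays close to the input

print(mean_plain / STEP_8BIT, mean_dithered / STEP_8BIT)
```

Without dither every sample rounds to zero, so the 0.3-step level disappears. With the two noise layers mixed in, individual samples land on different steps, but their average stays near the original level.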
 
Jul 15, 2024 at 10:29 AM Post #92 of 171
Yes. You can stop right there. The rest of your question is honestly nonsensical.




The signals aren’t “near perfectly correlated”. They’re precise opposites of each other because the polarity has been inverted. Other than the inversion, they are exactly the same signal. There's nothing "near" about it. If they weren't, they wouldn't cancel and you would hear more than the dither noise! (I'm speaking here of 8-bit. The noise is so low with 16-bit quantization error that it's quite a bit harder to hear it even without dither, let alone with it. This is why 8-bit makes for an easier demonstration.)

Simple test: Take two copies of the exact same sound file, load them into Audacity, and invert one of them. Play it or mix the difference into a new track. The result: Nothing. Pure silence. The two signals have completely canceled each other out.

That’s what polarity inversion does. If you subtract one signal from the other, the only sound left is the difference between the two. If they’re identical, there will be nothing. If they’re identical except for dither noise, only the dither noise will be present.
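
The same null test can be sketched in a few lines of Python, as an illustration rather than a replacement for trying it in Audacity. The 440 Hz test tone, the float sample format, and the helper names `invert`, `mix` and `to_8bit` are all assumptions made for this example.

```python
import math

def invert(samples):
    """Polarity inversion: flip the sign of every sample."""
    return [-x for x in samples]

def mix(a, b):
    """Mix two tracks by adding them sample by sample."""
    return [x + y for x, y in zip(a, b)]

def to_8bit(samples):
    """Undithered conversion: round to the nearest 8-bit step (1/128)."""
    return [max(-128, min(127, round(x * 128))) / 128 for x in samples]

# A 440 Hz tone at half scale, 0.1 s at 44.1 kHz (arbitrary test signal).
original = [0.5 * math.sin(2 * math.pi * 440 * n / 44100) for n in range(4410)]

null = mix(original, invert(original))               # identical copies
residual = mix(to_8bit(original), invert(original))  # 8-bit copy vs. original

peak_null = max(abs(x) for x in null)
peak_residual = max(abs(x) for x in residual)

print(peak_null)      # identical copies cancel completely
print(peak_residual)  # what is left is only the quantization error
```

The first mix is pure silence. The second leaves only the difference between the two versions, and here that difference never exceeds half of one 8-bit step.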




There are no “small signal amplitude variances”. If there were, we wouldn’t be looking at the same signal!




This isn’t “masking”. It’s cancellation by the precise opposite signal. That’s what polarity inversion does. Masking is a different concept that doesn’t apply here.




Again, the only thing you’ll hear when playing the original file and the inverted file simultaneously is the difference between the two files. It'll be distortion/noise correlated with the signal if the lower-bit file wasn't dithered, and it'll be uncorrelated noise if it was dithered.

As 71 dB notes above (as have several of us repeatedly), dither removes all correlation between the noise and the signal. That’s the whole point. The quantization error is effectively eliminated and you’re only left with the original signal plus uncorrelated noise. Nothing else.


If that wasn’t true, this test wouldn’t work! Again, just try it. Take a 24-bit file and convert it to 8-bit without dither, invert it (or the original; it doesn’t matter which), and play it back. The only difference between the two files will be the quantization error, and you’ll hear what that distortion sounds like.

Then do the same thing with a dithered 8-bit file against the original 24-bit. When you invert one of them, you’re subtracting one signal from the other. As 71 dB says, it’s math. Subtract one thing from the other, and you get the difference, which is just the uncorrelated noise.
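
To put a number on "correlated" versus "uncorrelated", here is a rough Python sketch. It assumes a very quiet sine wave (peak of 0.4 of one 8-bit step), TPDF dither built from two uniform noise sources, and a plain Pearson correlation between the difference track and the signal. The helper names are hypothetical.

```python
import math
import random

STEP = 1 / 128  # one 8-bit step for samples in [-1.0, 1.0)

def quantize(x):
    """Round to the nearest 8-bit step (no clipping needed at these levels)."""
    return round(x / STEP) * STEP

def tpdf(rng):
    """Triangular dither: sum of two uniform sources, peak of +/- one step."""
    return (rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)) * STEP

def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

rng = random.Random(1)  # fixed seed for a repeatable run
signal = [0.4 * STEP * math.sin(2 * math.pi * n / 100) for n in range(20000)]

# Difference tracks: converted version minus the original.
diff_plain = [quantize(s) - s for s in signal]
diff_dither = [quantize(s + tpdf(rng)) - s for s in signal]

corr_plain = correlation(diff_plain, signal)    # error follows the signal
corr_dither = correlation(diff_dither, signal)  # error is just noise

print(round(corr_plain, 3), round(corr_dither, 3))
```

In the undithered case the quiet sine rounds to silence, so the difference track is simply the inverted signal and the correlation is essentially -1. With dither the correlation comes out close to zero.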

The below is just my opinion, feel free to check the Math.

If an analog signal incoming to an AD converter, is as, below, in terms of db:
-3.4494955875737363636374748848485757373773733848438438838333747457484574389543895438574389564565364546359836598346596389 db

24bit rounds to:
-3.4494956 db

8 bits rounds to:
-3.45 db

-3.4494956 db =/= -3.45db


They aren't the same voltages, nor were the voltages rounded to the same quantization integer (with each integer representing unique voltages), and each unique voltage will have a unique Fourier Transform property.

Again, just my opinion.
 
Jul 15, 2024 at 10:35 AM Post #93 of 171
If I go to the market and buy a pound of bologna, and then buy 1.01 pounds the next week, it’s still more bologna than I can eat.
 
Jul 15, 2024 at 3:48 PM Post #94 of 171
Again, just my opinion.

Opinions are much less useful when you can have facts instead, which you seem to be going far out of your way to try to avoid. You keep pointing out the problem of quantization error while ignoring all of us who are saying, “Yes, but that was solved decades ago and here’s how.” It's quite literally a solved problem.

I’m not sure where you’re getting your numbers from, but you actually have the right idea in general up to that point.

The dynamic range of 8-bit is roughly 48 dB. Divide by the number of steps and you get about 0.19 dB per step. Each sample can’t be more than 1/2 step away from the actual signal level, so that represents a maximum error of about 0.09 dB (using round numbers). That’s pretty significant, but that’s why we don’t use 8-bit audio for listening.

For 16-bit, the dynamic range is about 96 dB (actually a bit more, but let’s keep it simple), with each step representing about 0.0015 dB, and each sample being off by no more than half that, or about 0.0007 dB, give or take.
[Edit: I had this part incorrect. Thanks to 71 dB for helping clear me up on this in posts 102-105 below. What is correct is that each sample can't be more than 1/2 step away from the actual signal level.]

Remember that the bit value only measures the point’s location on the Y-axis, which is signal amplitude. For audio signals, that relates to volume. Now stop and think about how small those variations are compared to the threshold of audibility. How easy is it to detect those variations? Obviously it’s quite a bit easier with 8-bit, but with 16-bit it can be quite difficult even if you’re trying hard with the volume cranked up. (I’ve been trying it with my own difference tracks, and even with my amp maxed out on high gain, I can’t hear it.)

No one is arguing that distortion due to quantization error isn’t there if you don’t dither. But for the typical 16-bit depth of audio files, it’s quite difficult to detect. Add dither, and the signal is accurately reconstructed without distortion, only a slightly higher noise floor. Distortion is no longer present in that case.

Maybe some pictures will help.

Here’s one I did just now. This is the difference track after mixing 24-bit and inverted 16-bit versions of the same track without dither. First off, notice how low that is? Since dither has not been applied, the noise is correlated with the signal and actually is a form of distortion, but it’s quite low in amplitude and hard to detect.

16-bit no dither 3a.png


Note in that graph that we’re zoomed in quite far, and I used -120 dB as the X-axis baseline just to make it easier to see. In fact, if I use -96 dB as the baseline, the distortion is so low that it doesn’t even show up! In other words, the distortion is below -96 dB!

It’s also worth noting, in case it’s not clear, that that noise really is what would be heard on the actual track. Take the original 24-bit track and the difference track and play them together, and you get exactly what the 16-bit version sounds like (the sum of the 24-bit and the difference signals).

So that distortion is quite low, but still, it is distortion, and we don’t want that in our 16-bit conversion even if it’s unlikely to be heard. So we dither. And here’s the difference track when using dither:

16-bit dither 3a.png


The noise is now uncorrelated with the signal and the distortion is gone. Notice that the level is a bit higher than the distortion was. That’s the (more than acceptable) tradeoff with dither: We eliminate the distortion completely but end up with a slightly elevated (but still very quiet) noise floor.

You mentioned masking previously, and some references call it that: using noise to mask distortion. But that’s not really what it is. Dithering doesn’t cover up distortion that’s still present. It completely replaces the distortion with uncorrelated noise. The bit values that corresponded to the distortion are gone and have been replaced with the uncorrelated noise values. Everything in the 24-bit and 16-bit versions is identical except for that uncorrelated noise, which is no louder than about -90 dB or so.
 
Jul 15, 2024 at 3:53 PM Post #95 of 171
They aren't the same voltages, nor were the voltages rounded to the same quantization integer (with each integer representing unique voltages), and each unique voltage will have a unique Fourier Transform property.

Again, just my opinion.
How could they be the same voltages? The fact that they are not the same voltages is the reason why 8-bit audio has a much higher noise level than 24-bit audio. In fact, the noise level in 8-bit audio is so high that it can't be used for high-fidelity purposes. With noise-shaped dither, 8-bit audio is okay for medium-quality applications.

However, if we ignore the dither noise, the fidelity is exactly the same for 24-bit, 16-bit and 8-bit audio. There is just more noise the fewer bits we have (about 6 dB more for every bit removed). About 13 bits is enough to drop the noise level below audible levels in all reasonable listening scenarios.
 
Jul 15, 2024 at 4:28 PM Post #96 of 171
Maybe some pictures will help.

Here’s one I did just now. This is the difference track after mixing 24-bit and inverted 16-bit versions of the same track without dither. First off, notice how low that is? Since dither has not been applied, the noise is correlated with the signal and actually is a form of distortion, but it’s quite low in amplitude and hard to detect.

16-bit no dither 3a.png

Note in that graph that we’re zoomed in quite far, and I used -120 dB as the X-axis baseline just to make it easier to see. In fact, if I use -96 dB as the baseline, the distortion is so low that it doesn’t even show up! In other words, the distortion is below -96 dB!
Correct! The level of quantization noise (no dither) in 16 bit is -98.1 dB.

It’s also worth noting, in case it’s not clear, that that noise really is what would be heard on the actual track. Take the original 24-bit track and the difference track and play them together, and you get exactly what the 16-bit version sounds like (the sum of the 24-bit and the difference signals).

So that distortion is quite low, but still, it is distortion, and we don’t want that in our 16-bit conversion even if it’s unlikely to be heard. So we dither. And here’s the difference track when using dither:

16-bit dither 3a.png

The noise is now uncorrelated with the signal and the distortion is gone. Notice that the level is a bit higher than the distortion was. That’s the (more than acceptable) tradeoff with dither: We eliminate the distortion completely but end up with a slightly elevated (but still very quiet) noise floor.

You mentioned masking previously, and some references call it that: using noise to mask distortion. But that’s not really what it is. Dithering doesn’t cover up distortion that’s still present. It completely replaces the distortion with uncorrelated noise. The bit values that corresponded to the distortion are gone and have been replaced with the uncorrelated noise values. Everything in the 24-bit and 16-bit versions is identical except for that uncorrelated noise, which is no louder than about -90 dB or so.
The noise level looks worse than it is. Dither noise is more "spiky" than quantization noise, but those spikes contribute little to the noise power. The level of TPDF dither in 16 bit audio is -95.1 dB if I am not mistaken.
 
Jul 16, 2024 at 3:04 AM Post #97 of 171
The below is just my opinion, feel free to check the Math.
If an analog signal incoming to an AD converter, is as, below, in terms of db:
-3.4494955875737363636374748848485757373773733848438438838333747457484574389543895438574389564565364546359836598346596389 db
24bit rounds to:
-3.4494956 db
8 bits rounds to:
-3.45 db
-3.4494956 db =/= -3.45db

Again, just my opinion.
I don’t need to check your math, I can tell just by looking at it that it’s wrong. The math you’ve presented is correct, the reason it’s wrong is because of the required math that you have omitted! Namely (as others have explained), the required application of dither, which gives us back our exact analogue signal (-3.4494955875737363636374748848485757373773733848438438838333747457484574389543895438574389564565364546359836598346596389 db) plus some inaudible uncorrelated noise.

Again, you are demonstrating that you don’t understand the fundamental basics of how digital audio works. You are conceptualising it as if it were actually analogue and not digital. Your example would result in that loss of information if it were an analogue process, but of course it is not; it’s digital. Unlike with analogue, the principle of digital audio is to quantise the signal, effectively (temporarily) losing some of the amplitude information (as you indicated), but then very precisely “reconstructing” all of that lost information when converted back to analogue.

Another point you’re apparently ignoring is what would happen to your original incoming analogue signal if you tried to record and reproduce it in the analogue domain. Roughly, without working it out, you would get an output accurate to about 3.449dB and then all the subsequent numbers would be incorrect, and that’s assuming a particularly good analogue recording and reproduction chain! You’re also ignoring the fact that the precise value of your analogue input signal is meaningless anyway, because the analogue noise floor and distortion of the microphone and mic pre-amp which created that analogue signal means that everything beyond about the 4th decimal place is just self noise.

G
 
Jul 16, 2024 at 8:49 AM Post #98 of 171
I’m not sure where you’re getting your numbers from, but you actually have the right idea in general up to that point.

The dynamic range of 8-bit is roughly 48 dB. Divide by the number of steps and you get about 0.19 dB per step. Each sample can’t be more than 1/2 step away from the actual signal level, so that represents a maximum error of about 0.09 dB (using round numbers). That’s pretty significant, but that’s why we don’t use 8-bit audio for listening.

The math you’ve presented is correct, the reason it’s wrong is because of the required math that you have omitted!

G

Good to hear we agree the Math is good up to the point of dither.
 
Jul 16, 2024 at 12:08 PM Post #99 of 171
Good to hear we agree the Math is good up to the point of dither.
I didn’t actually check but I’m willing to accept it because it doesn’t make any difference and “up to the point of dither” is the actual conversion point during the analogue to digital conversion process, so there are hardly any “points” before that in the digital domain!

G
 
Jul 16, 2024 at 12:56 PM Post #100 of 171
Errors exist everywhere.
What was the OP's point, the zigzag look of a digital signal?
Because of a ton of factors, no one can play a track twice on tape and get exactly the same sound!
That's the analog way.
To compare:
electrical current stability + friction between tape and magnetic head + tape position accuracy + RFI interference, versus jitter when sampling + quantization error.
 
Jul 16, 2024 at 8:58 PM Post #101 of 171
For 16-bit, the dynamic range is about 96 dB (actually a bit more, but let’s keep it simple), with each step representing about 0.0015 dB, and each sample being off by no more than half that, or about 0.0007 dB, give or take.
...
(Edit: Actually, I notice the difference tracks have higher variation than I mention above, I assume because the step values are actually higher closer to the X-axis, i.e., at lower amplitudes, but isn't each step the same size with linear PCM? I welcome clarification from you professionals on here. Either way, it's still ridiculously quiet.)

I had a “smack the forehead” moment and realized what’s happening here. Just had to wrap my brain around it. I was mixing up fluctuating error values in the difference track (all under -96 dB but otherwise all over the map, many dB apart) with the step values (very small) and not seeing how they relate.

The last eight bits in a 24-bit sample effectively cover the dynamic range below -96 dB or so. Of course the samples will be all over the map in the difference track! There are 256 possible values in that range! But in the 16-bit conversion, all that matters is whether that impacts the 16th bit. The error will be no more than 1/2 step, which should be about 0.0007 dB. [Edit: The last half of that sentence was incorrect. 71 dB got me on the right track with the posts below.]

I was over-complicating it in my head.


It’s also worth noting, in case it’s not clear, that that noise really is what would be heard on the actual track. Take the original 24-bit track and the difference track and play them together, and you get exactly what the 16-bit version sounds like (the sum of the 24-bit and the difference signals).

A small correction here: If you subtract the difference track from the 24-bit original, you get the 16-bit version. Put another way, you’re still summing, but it’s the original 24-bit file plus the inverted difference track.

I tested it and compared the binary content of the files just to make sure I'd worked it out properly in my head. The 24-bit result is the same as the 16-bit track, except every third byte is 0x00. In other words, where the 16-bit version has, say, 0xA7B4, the result of adding the inverted difference file to the original 24-bit version would be 0xA7B400. (The byte order in the file is actually reversed from that since it’s stored in little-endian, but you get the idea.)
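
The byte layout described above can be checked with a few lines of Python. This is only an illustration of the padding and byte order, treating the example value 0xA7B4 as a raw bit pattern; it does not read an actual audio file.

```python
sample_16 = 0xA7B4            # example 16-bit sample (raw bit pattern)
sample_24 = sample_16 << 8    # same value padded to 24 bits: 0xA7B400

# Little-endian storage puts the least significant byte first.
bytes_16 = sample_16.to_bytes(2, "little")
bytes_24 = sample_24.to_bytes(3, "little")

print(hex(sample_24))   # 0xa7b400
print(bytes_16.hex())   # b4a7
print(bytes_24.hex())   # 00b4a7
```

So in the file, each 24-bit sample is the 16-bit sample's two bytes preceded by a 0x00 byte.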
 
Jul 17, 2024 at 1:45 AM Post #102 of 171
The error will be no more than 1/2 step, which should be about 0.0007 dB.
To me it doesn't make sense to calculate the error in decibel this way. Decibel scales are based on reference levels. What is the reference here? Signal level? But signal level can be anything! When dither is not used, the quantization error indeed is ±1/2 steps (±∆/2) at most. The error has equal probability density of 1/∆ over the range -∆/2 to +∆/2. The power P of the quantization error is calculated by integrating the squared error 𝛆 weighted (multiplied) by the probability density function over the range of error:

P = ∫(𝛆²/∆)d𝛆 = 𝛆³/(3∆) + C

When integrated from -∆/2 to +∆/2 this gives

P = ∆²/12.

In 16 bit digital audio ∆ is 2^-15 because the 2^16 = 65536 quantization levels are divided into positive and negative levels. Hence

P(16bit) = ∆²/12 = (2^-15)²/12 = 1/(12*2^30).

For calculating dynamic range, P serves as the reference level. The maximum signal power level depends on the waveform, but sine-wave is often used. For sine-wave with maximum amplitude of 1 (2^15∆ in 16 bit) the power is 0.5 since the rms level is 1/√2 and power is rms level squared. We can finally calculate the dynamic range of 16 bit digital audio WITHOUT dither as

Dynamic range = 10*log10 (0.5/(1/(12*2^30))) = 98.1 dB.

The maximum quantization error (∆/2) compared to maximum signal level (∆*2^15) give difference of

20*log10 ((∆/2) / (∆*2^15)) = 20*log10 ((1 / 2^16)) = -96.3 dB.

When dither is used, the situation changes of course. For TPDF dither the maximum noise amplitude needs to be ∆ in order to remove completely the correlation between quantization and signal, but on the other hand the probability density function is triangular favoring smaller amplitudes. The power of TPDF dither is ∆²/6 which is 3 dB higher than quantization error (∆²/12) without dither. The total noise power when using TPDF dither is ∆²/12 + ∆²/6 = ∆²/4 which is 4.77 dB higher than quantization error alone. This means dynamic range of about 98.1 dB - 4.8 dB = 93.3 dB.
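
For anyone who wants to re-check these figures, the arithmetic above can be reproduced with a short Python sketch. It uses the same assumptions as the text: full scale is ±1, ∆ = 2^-(bits-1), a full-scale sine as the reference signal, and TPDF dither adding ∆²/6 of noise power. The function names are made up for this example.

```python
import math

def step_size(bits):
    """Quantization step for full scale +/-1: delta = 2^-(bits-1)."""
    return 2.0 ** -(bits - 1)

def noise_power(bits, tpdf_dither=False):
    """delta^2/12 for quantization alone, plus delta^2/6 when TPDF dither is added."""
    delta = step_size(bits)
    power = delta ** 2 / 12
    if tpdf_dither:
        power += delta ** 2 / 6
    return power

def dynamic_range_db(bits, tpdf_dither=False):
    """Full-scale sine power (0.5) relative to the noise power, in dB."""
    return 10 * math.log10(0.5 / noise_power(bits, tpdf_dither))

dr_16 = dynamic_range_db(16)                          # about 98.1 dB
dr_16_dither = dynamic_range_db(16, True)             # about 93.3 dB
max_error_db = 20 * math.log10(step_size(16) / 2)     # about -96.3 dB
per_bit = dynamic_range_db(17) - dynamic_range_db(16) # about 6.02 dB per bit

print(round(dr_16, 1), round(dr_16_dither, 1), round(max_error_db, 1), round(per_bit, 2))
```

The last value shows where the "about 6 dB per bit" rule of thumb comes from: each extra bit halves ∆, which cuts the noise power by a factor of four.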

EDITED: Some typos corrected + added more clear description of the total noise power.
 
Jul 17, 2024 at 2:03 AM Post #103 of 171
The dynamic range of 8-bit is roughly 48 dB. Divide by the number of steps and you get about 0.19 dB per step. Each sample can’t be more than 1/2 step away from the actual signal level, so that represents a maximum error of about 0.09 dB (using round numbers). That’s pretty significant, but that’s why we don’t use 8-bit audio for listening.
Only now I realized how you calculate these things. A 48 dB dynamic range and 256 steps doesn't mean 48 dB/256 per step! The decibel scale is logarithmic. You can't treat it linearly this way. Sample value "2" is 6 dB higher than sample value "1". Sample value "3" is 9.5 dB higher, etc. The steps are not "equal" on the dB scale. The lowest steps are huge and they get smaller and smaller going up. The difference between the 2 highest steps in 8 bit is

20*log10 (128/127) = 0.07 dB which is almost 100 times smaller than what it is between step 1 and 2! In fact the difference between step 0 and 1 is infinite dB! Of course in real life there is no absolute silence, so this is only theoretical.
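
These step sizes are easy to check numerically. A quick Python sketch follows; the helper name `db_between` is just something made up for this example.

```python
import math

def db_between(higher, lower):
    """Level difference in dB between two sample values (amplitude ratio)."""
    return 20 * math.log10(higher / lower)

step_1_to_2 = db_between(2, 1)        # about 6.02 dB
step_1_to_3 = db_between(3, 1)        # about 9.54 dB
step_127_to_128 = db_between(128, 127)  # about 0.068 dB

print(round(step_1_to_2, 2), round(step_1_to_3, 2), round(step_127_to_128, 3))
print(round(step_1_to_2 / step_127_to_128))  # how many times larger the lowest step is
```

The linear step is the same everywhere, but expressed in dB the lowest step comes out roughly 88 times larger than the highest one.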
 
Jul 17, 2024 at 8:41 PM Post #104 of 171
Only now I realized how you calculate these things. A 48 dB dynamic range and 256 steps doesn't mean 48 dB/256 per step! The decibel scale is logarithmic. You can't treat it linearly this way. Sample value "2" is 6 dB higher than sample value "1". Sample value "3" is 9.5 dB higher, etc. The steps are not "equal" on the dB scale. The lowest steps are huge and they get smaller and smaller going up. The difference between the 2 highest steps in 8 bit is

20*log10 (128/127) = 0.07 dB which is almost 100 times smaller than what it is between step 1 and 2! In fact the difference between step 0 and 1 is infinite dB! Of course in real life there is no absolute silence, so this is only theoretical.

Ah, okay. So what I initially assumed was correct: Step values closer to the X-axis are larger. Then I went down a bad rabbit hole, partly because I read (and probably misunderstood) a reference to a minimum dB step difference in an article somewhere. My bad.


To me it doesn't make sense to calculate the error in decibel this way. Decibel scales are based on reference levels. What is the reference here? Signal level? But signal level can be anything! When dither is not used, the quantization error indeed is ±1/2 steps (±∆/2) at most. The error has equal probability density of 1/∆ over the range -∆/2 to +∆/2. The power P of the quantization error is calculated by integrating the squared error 𝛆 weighted (multiplied) by the probability density function over the range of error:

P = ∫(𝛆²/∆)d𝛆 = 𝛆³/(3∆) + C

When integrated from -∆/2 to +∆/2 this gives

P = ∆²/12.

In 16 bit digital audio ∆ is 2^-15 because the 2^16 = 65536 quantization levels are divided into positive and negative levels. Hence

Thanks for walking through all that. I confess my calculus is very rusty. I had to refresh myself on integrals and draw some things out. I also spent way too much time today reading about all this to clear up my confusion, but it's interesting. (After reading/skimming a ton of articles and references, I ended up thinking this was the most helpful one I came across.)

This has all helped me visualize it a lot better.


P(16bit) = ∆²/12 = (2^-15)²/12 = 1/(12*20^30).

Shouldn't that be: 1/(12*2^30)


For calculating dynamic range, P serves as the reference level. The maximum signal power level depends on the waveform, but sine-wave is often used. For sine-wave with maximum amplitude of 1 (2^15∆ in 16 bit) the power is 0.5 since the rms level is 1/√2 and power is rms level squared. We can finally calculate the dynamic range of 16 bit digital audio WITHOUT dither as

Dynamic range = 10*log10 (0.5/(1/(12*20^30))) = 98.1 dB.

Same typo here... There's an extra zero: 10*log10 (0.5/(1/(12*2^30))) = 98.1 dB.

I find it a bit easier to read if multiplying by the reciprocal: 10*log10 ((0.5)(12*2^30))
 
Jul 18, 2024 at 5:39 AM Post #105 of 171
Ah, okay. So what I initially assumed was correct: Step values closer to the X-axis are larger. Then I went down a bad rabbit hole, partly because I read (and probably misunderstood) a reference to a minimum dB step difference in an article somewhere. My bad.
On a linear scale all the steps are of course the same size*, but on logarithmic scales (the dB scale in this case) they are not. If you walk steadily, every step is about the same size (say 20 inches/50 cm), but on a logarithmic scale 200 steps is 100 steps doubled, and 14 steps is likewise 7 steps doubled.

* If a DAC produces 2 volts for maximum signal level (-2 volts for minimum signal), the step size in 8 bit is (2 - (-2))/256 = 1/64 = 0.015625 V.
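
As a small illustration of that footnote, the linear step size can be computed for any bit depth. The ±2 V output range is the example from the text, and `step_volts` is a hypothetical helper name.

```python
def step_volts(bits, v_max=2.0, v_min=-2.0):
    """Voltage difference between adjacent levels for a given bit depth."""
    return (v_max - v_min) / 2 ** bits

step_8 = step_volts(8)    # 4 V spread over 256 levels
step_16 = step_volts(16)  # 4 V spread over 65536 levels

print(step_8)   # 0.015625
print(step_16)  # 6.103515625e-05
```

Every step at a given bit depth has this same voltage size; only the dB value of a step depends on where on the scale you are.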

Thanks for walking through all that. I confess my calculus is very rusty. I had to refresh myself on integrals and draw some things out. I also spent way too much time today reading about all this to clear up my confusion, but it's interesting. (After reading/skimming a ton of articles and references, I ended up thinking this was the most helpful one I came across.)
Things get rusty for all of us. I need to do this kind of thing myself every now and then. I try to produce these "walk-through" presentations (on subjects I feel I know enough about) whenever I feel they may help others better understand the things under discussion.

This has all helped me visualize it a lot better.
I'm glad it helped!

Shouldn't that be: 1/(12*2^30)
Oh, yes! I need to correct my post! Thanks for noticing this typo!

Same typo here... There's an extra zero: 10*log10 (0.5/(1/(12*2^30))) = 98.1 dB.
I copy pasted stuff, so...

I find it a bit easier to read if multiplying by the reciprocal: 10*log10 ((0.5)(12*2^30))
Yeah, I actually thought about that when making the post.
 
