24bit vs 16bit, the myth exploded!
Mar 24, 2017 at 12:38 AM Post #3,766 of 7,175
  So indeed, if I mock up reconstruction using SoX (resample to some ungodly high rate and keep things at 32 bits), the difference between the reconstructions of the original file and the 8-bit version ends up peaking at -42. I picked a hell of a day to quit drinking. Thanks for that; I'll keep trying to digest what's happening, because I'm totally not grokking it.

It's sad that we're not water brothers, not even beer brothers. Makes communicating harder.
I tend to naturally think the way you do/did, while knowing it's wrong, which is how my brain seems to enjoy irony the most: keeping the flawed logic while adding the extra "oh BTW, you remember that it's wrong, right?" But yeah, practical tests tend to agree with Greg (he probably altered our reality just to be right, you know him). I'm guessing we're sharing the same faulty logic: that if I remove X bits, I'm only losing the accuracy of those removed bits. In terms of data, it's true.
 
But thinking differently: the LSB is almost always wrong compared to the actual signal amplitude it's coding, and adding one more bit decreases that error, making it the size of that new LSB, but an error still. The last bit gives the magnitude of the error. Does that make sense? (It's totally what Greg is saying, but from a dumbed-down brain.)
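That "the last bit gives the magnitude of the error" point is easy to check numerically. A minimal sketch in Python (the function name and test value are mine, purely illustrative):

```python
def quantize(x, bits):
    """Truncate x (assumed in [0, 1)) to the given number of fractional bits."""
    step = 1.0 / (1 << bits)   # size of the LSB at this bit depth
    return int(x / step) * step

x = 0.123456789
for bits in (8, 9, 16):
    err = x - quantize(x, bits)
    lsb = 1.0 / (1 << bits)
    print(f"{bits:2d} bits: error = {err:.9f} (LSB = {lsb:.9f})")
```

The error is always smaller than one LSB of whichever depth you quantise to, so adding a bit shrinks the worst-case error to the size of the new, smaller LSB, which is exactly the point above.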
 
Mar 24, 2017 at 6:40 AM Post #3,767 of 7,175
  So indeed, if I mock up reconstruction using SoX (resample to some ungodly high rate and keep things at 32 bits), the difference between the reconstructions of the original file and the 8-bit version end up peaking at -42. I picked a hell of a day to quit drinking. Thanks for that; I'll keep trying to digest what's happening, because I'm totally not grokking it.

 
Let's try an analogy: Let's say we take an adult man and amputate one of his legs above the knee. The difference between the man before and after the amputation isn't just the difference between the gap (nothing) which now exists and the part of his leg which has been amputated, there's quite a big difference in what's left of the man himself: For starters he's got a severe wound to deal with and it's also going to cause other changes/differences in his body, metabolic (and probably numerous other) changes for example. Furthermore, there are also going to be significant differences between this adult amputee and the same man had he been born with a congenital defect which resulted in exactly the same gap/nothing above the knee. Likewise in digital audio, there's a difference between running out of resolution/bits when recording and chopping off resolution/bits which did once exist. In fact, truncation error is double the quantisation error in terms of the RMS of the error signal.
 
I've looked around for a few mins and can't find a quote of the actual math to back this up, but I seem to remember that the RMS of quantisation error is just under 0.3 LSB (in fact 1/√12 ≈ 0.289 LSB) and therefore about double that (1/√3 ≈ 0.577 LSB) for truncation. The LSB in the context of this discussion being the 8th bit, not say the 9th or 16th bit which has been truncated (and then padded with zeros).
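For anyone who wants to verify those figures, here's a quick Monte Carlo sketch (mine, not from the thread) using the standard idealisation that the residues are uniformly distributed:

```python
import random

random.seed(0)
N = 200_000
# Rounding to nearest leaves a residue uniform in [-0.5, 0.5) LSB;
# truncation leaves a residue uniform in [0, 1) LSB (it always errs one way).
round_err = [random.random() - 0.5 for _ in range(N)]
trunc_err = [random.random() for _ in range(N)]

def rms(e):
    return (sum(x * x for x in e) / len(e)) ** 0.5

print(f"rounding RMS:   {rms(round_err):.3f} LSB")  # ~0.289, i.e. 1/sqrt(12)
print(f"truncation RMS: {rms(trunc_err):.3f} LSB")  # ~0.577, i.e. 1/sqrt(3)
```

The factor-of-two difference comes from truncation's half-LSB DC offset: the AC part of both residues is the same size, but truncation adds a constant bias on top.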
 
I also looked around for an actual example, so you can hear for yourself, which will hopefully help you to get your head around it. I came up with this example (which is actually a link from the previous article posted). While we can't in practice create an 8bit recording (with only quantisation error) for comparison with an 8bit version truncated from a 16bit version, this example does demonstrate well the difference between a 16bit original and the various different ways of arriving at 8bit versions: truncation, dithering and noise-shaped dithering. However we get from 16bit to 8bit though, there is ALWAYS going to be error in those remaining 8bits, regardless of the fact that they're identical to the 8 MSBs in the 16bit original.
 
One last point. While the same math applies to bit reductions of any depth, there are additional factors to consider when doing a bit reduction from 24bit to 16bit. Additional factors which are commonly ignored and which significantly change the result!
 
Quote:
  ... but yeah practical tests tend to agree with Greg(he probably altered our reality just to be right, you know him).

 
Hmm, it's true that I in effect do often try to alter the perceived reality, but I don't think I'm doing it just to be right; I'm doing it to get closer to what is really going on under the hood. Having said this, my perception of what is going on under the hood has its limits and is not perfect, so maybe in effect it is just about me wanting to be right, hmmm?! On the other side of the coin, many of the questions/discussions revolve around issues which I investigated as long as 20 years or so ago and in many cases have continued to improve my understanding ever since.
 
The fact is that under the hood, digital audio is ultimately of course all math, much of which requires a quite highly educated mathematician to fully understand. Furthermore, even a great mathematician who would find digital audio math simplistic wouldn't have the years of audio engineering experience necessary to be sure they're actually taking into account ALL the math relevant to a particular issue/topic of discussion. We pro audio engineers are not mathematicians though, we're only ultimately concerned with the perceptual results of employing the math. So like the more educated consumers, we have to rely on layman's terms and analogy, although with experience we should have a far better understanding of what underpins those layman's terms and analogies. Ultimately though, if we desire to go further down the rabbit hole, we have to trust/consult others, of which there are incredibly few who are willing to speak publicly, have enough independence from marketing, a deep enough understanding of the math and a broad enough understanding of practical audio to stand a good chance of considering ALL the relevant math. Bob Katz, Paul Frindle, Dan Lavry and just a few others have fit the bill for me over the course of many years.
 
So when I see a misapprehension due to the inevitable, inherent inaccuracies of layman's terms and analogies, I'll try to come at the issue from a different angle and with some different analogies, to hopefully create a more comprehensive understanding. This approach could easily be seen as "trying to alter your reality"! I could in theory just quote the math but I don't think that would help most here and besides, I commonly don't know the math (because it's proprietary) and even when I do, I often don't understand it comprehensively enough to discuss it in mathematical terms. So while I might sometimes allude to the underlying math, I try to avoid going too far down that hole.
 
G
 
Mar 24, 2017 at 8:57 AM Post #3,768 of 7,175
it was a kind of follow-up joke, with Stranger in a Strange Land, fullness and what one can do with it... soz.
 
Mar 24, 2017 at 8:59 AM Post #3,769 of 7,175
Is this what you are talking about, essentially the difference between sample peak and true peak?
 
# 60 s of 16-bit mono white noise at -1 dBFS:
sox -n -c 1 -r 48k -b 16 noise16.wav synth 60 whitenoise gain -1
# reduce to 8 bits with dithering disabled (-D):
sox noise16.wav -b 8 noise8.wav -D
# pad back out to 16 bits:
sox noise8.wav -b 16 noise16pad.wav
# difference signal = original minus the padded 8-bit version:
sox -m noise16.wav -v -1 noise16pad.wav diff.wav
# upsample the difference to approximate its reconstruction:
sox diff.wav -r 192k diff-ups.wav
sox diff.wav -n stats 2>&1 | grep "Pk lev dB"
# Pk lev dB -48.16
sox diff-ups.wav -n stats 2>&1 | grep "Pk lev dB"
# Pk lev dB -42.18
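Incidentally, that sample-domain peak of -48.16 dB is exactly half an 8-bit LSB over the ±1 range: with dither disabled, SoX appears to round to the nearest 8-bit step, so the per-sample error can't exceed LSB/2. A quick check (my own sketch, not SoX internals):

```python
import math

lsb_8bit = 2.0 / 256                      # 8-bit step across a [-1, 1) range
half_lsb_db = 20 * math.log10(lsb_8bit / 2)
print(round(half_lsb_db, 2))              # -48.16, matching the measured peak
```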
 
Mar 24, 2017 at 9:05 AM Post #3,770 of 7,175
@gregorio I don't find the amputation example too revealing, as I keep thinking (probably wrongly) "how have things changed if you reattach the limb?" But I can see somewhat of a problem from the math perspective, in that truncation and rounding are different operations. If we wanted to round 126 to 2 significant figures, we really want 130, but truncation+pad gives us 120. But I'm still not sure if that's really relevant, and even if it is I still need to flesh out how it works in reconstruction. As you stated, being a math guy I am perfectly fine if you give me a proof, but I have zero instinct when it comes to storing numbers on a computer.
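The decimal example above, made concrete (helper names are hypothetical, just for illustration):

```python
def truncate(n, step):
    """Chop off everything below `step` (floor to a multiple of step)."""
    return (n // step) * step

def round_to(n, step):
    """Round to the nearest multiple of `step`."""
    return ((n + step // 2) // step) * step

print(truncate(126, 10))  # 120: truncation always errs downward
print(round_to(126, 10))  # 130: rounding errs by at most half a step
```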
 
Quote:
  Is this what you are talking about, essentially the difference between sample peak and true peak?
 
sox -n -c 1 -r 48k -b 16 noise16.wav synth 60 whitenoise gain -1
sox noise16.wav -b 8 noise8.wav -D
sox noise8.wav -b 16 noise16pad.wav
sox -m noise16.wav -v -1 noise16pad.wav diff.wav
sox diff.wav -r 192k diff-ups.wav
sox diff.wav -n stats 2>&1 | grep "Pk lev dB"
# Pk lev dB -48.16
sox diff-ups.wav -n stats 2>&1 | grep "Pk lev dB"
# Pk lev dB -42.18

 
Yes, so my question is whether this is just two ways of talking about the same issue. Is the fact that a sinc will sometimes clip intimately related to the peak value of our errors? I'm all jumbled...
 
Mar 24, 2017 at 10:01 AM Post #3,771 of 7,175
  it was a kind of follow-up joke, with Stranger in a Strange Land, fullness and what one can do with it... soz.

 
Rather than a joke I will consider it more like a multi-foot slaughter.
Since 16-bit to 8-bit truncation can be considered, for the sampled signal, as:
  1. an apodization (foot removal) when the 8 MSBs are kept in 8 bits
  2. an apodization with 'sticks' when the 8 MSBs are kept in 16 bits and the 8 LSBs are filled with 0.
Add one more apodization during FFT windowing (Blackman-Harris) for the fun.
 
Indeed, IMHO, it was a great post from @gregorio
 
Mar 24, 2017 at 10:14 AM Post #3,772 of 7,175
   
Rather than a joke I will consider it more like a multi-foot slaughter.
Since 16-bit to 8-bit truncation can be considered, for the sampled signal, as:
  1. an apodization (foot removal) when the 8 MSBs are kept in 8 bits
  2. an apodization with 'sticks' when the 8 MSBs are kept in 16 bits and the 8 LSBs are filled with 0.
Add one more apodization during FFT windowing (Blackman-Harris) for the fun.
 
Indeed, IMHO, it was a great post from @gregorio


At the very least a skilled surgeon would round the LSB rather than hack it off.
 
Mar 24, 2017 at 11:49 AM Post #3,773 of 7,175
  Is this what you are talking about, essentially the difference between sample peak and true peak?

 
Nope, that's a different issue unrelated to the one currently under discussion. Intersample peaks are an issue of upsampling/reconstruction rather than of bit reduction/reconstruction. Reducing the bit depth without resampling does not affect the value of the MSBs.
 
  [1] I don't find the amputation example too revealing, as I keep thinking (probably wrongly) "how have things changed if you reattach the limb?"
 
[2] But I'm still not sure if that's really relevant, and even if it is I still need to flesh out how it works in reconstruction.

 
1. That is "probably wrongly". If you reattach the limb (perfectly) things haven't changed in the slightest (maybe not in this analogy but if we removed and then replaced those 8 LSBs). I can see how you could infer from this that all the change therefore occurs only in those 8 LSBs but this is wrong because those 8 MSBs which we're left with are in effect no longer a coherent number (as far as a sinc function is concerned). Likewise our amputee isn't just the identical man he was with the only difference being in the removed limb, his metabolism and the rest of his body have been affected by the loss of that limb. Let's try another analogy, let's say we have a 4 cylinder car engine and we disable two cylinders, do we now have a car which is identical to before except it has half the horse power? We can re-enable those cylinders and the car will work perfectly again but disabling two cylinders has an effect on that car beyond just those two missing cylinders. In fact, if we could even get the engine to start in the first place, it will probably rip itself to pieces pretty quickly because it's out of balance. We can (and do) build 2 cylinder engines which work absolutely fine but there's a huge difference between a 2 cylinder engine and a 4 cylinder engine with only two working cylinders/pistons, even though the number of working cylinders are exactly the same. That's what's happening here, the 8 remaining bits are exactly the same as the 8 MSBs in our 16bit original, but this 8 bit file is supposed to be a 16bit file and those remaining 8 bits are effectively now "out of balance".
 
2. Absolutely, I mentioned before that we HAVE to consider reconstruction, how those numbers are going to relate to a reconstructed signal! For reconstruction, we need a series of numbers and to consider what will happen when the sinc function "joins the dots"; then the issue becomes one of statistics, of probability distributions. Trying to take a single number and imagining what happens if we just reduce its accuracy is not going to give us the full picture of what's going on.
 
BTW, rounding error is essentially what we get with quantisation error. We can also round rather than truncate, but the results are still very different from each other: if we round, we still get truncation-style distortion, although presumably somewhat less of it.
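One way to see why the error behaves as distortion rather than noise: for a pure tone, the truncation error is a deterministic function of the signal, so all of its energy sits at multiples of the tone's frequency (harmonics and their aliases), whereas dithered error spreads out as broadband noise. A rough sketch in pure Python, with all names and values my own, using a tone that fits an exact number of cycles into the analysis window:

```python
import math, random

N, k0 = 4800, 10            # 4800-sample window; tone at exactly 10 cycles
LSB = 1.0 / 128             # 8-bit step across a [-1, 1) range
A = 4 * LSB                 # deliberately low-level tone, a few LSBs

# small phase offset keeps samples away from exact quantisation boundaries
sig = [A * math.sin(2 * math.pi * k0 * n / N + 0.5) for n in range(N)]

def trunc(x):               # truncate toward minus infinity
    return math.floor(x / LSB) * LSB

random.seed(1)
def dithered(x):            # TPDF dither (sum of two uniforms), then truncate
    d = (random.random() - random.random()) * LSB
    return trunc(x + d)

def mag(x, k):              # magnitude of DFT bin k, normalised by N
    re = sum(v * math.cos(2 * math.pi * k * n / N) for n, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * k * n / N) for n, v in enumerate(x))
    return math.hypot(re, im) / N

err_t = [trunc(v) - v for v in sig]
err_d = [dithered(v) - v for v in sig]

# Truncation error: essentially zero between multiples of k0, spiky on them.
print(mag(err_t, 37), max(mag(err_t, k0 * h) for h in range(1, 6)))
# Dithered error: a little energy in every bin instead (noise, not distortion).
print(mag(err_d, 37), mag(err_d, 25))
```

The same experiment with rounding instead of truncation gives the same qualitative picture, which is the point above: rounding alone doesn't break the error's dependence on the signal; dither does.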
 
Did you have a listen to the demo I linked to?
 
G
 
Mar 24, 2017 at 12:24 PM Post #3,774 of 7,175
 
 
Nope, that's a different issue unrelated to the one currently under discussion. Intersample peaks are an issue of upsampling/reconstruction rather than of bit reduction/reconstruction. Reducing the bit depth without resampling does not affect the value of the MSBs.
 
 
1. That is "probably wrongly". If you reattach the limb (perfectly) things haven't changed in the slightest (maybe not in this analogy but if we removed and then replaced those 8 LSBs). I can see how you could infer from this that all the change therefore occurs only in those 8 LSBs but this is wrong because those 8 MSBs which we're left with are in effect no longer a coherent number (as far as a sinc function is concerned). Likewise our amputee isn't just the identical man he was with the only difference being in the removed limb, his metabolism and the rest of his body have been affected by the loss of that limb. Let's try another analogy, let's say we have a 4 cylinder car engine and we disable two cylinders, do we now have a car which is identical to before except it has half the horse power? We can re-enable those cylinders and the car will work perfectly again but disabling two cylinders has an effect on that car beyond just those two missing cylinders. In fact, if we could even get the engine to start in the first place, it will probably rip itself to pieces pretty quickly because it's out of balance. We can (and do) build 2 cylinder engines which work absolutely fine but there's a huge difference between a 2 cylinder engine and a 4 cylinder engine with only two working cylinders/pistons, even though the number of working cylinders are exactly the same. That's what's happening here, the 8 remaining bits are exactly the same as the 8 MSBs in our 16bit original, but this 8 bit file is supposed to be a 16bit file and those remaining 8 bits are effectively now "out of balance".
 
2. Absolutely, I mentioned before that we HAVE to consider reconstruction, how those numbers are going to relate to a reconstructed signal! For reconstruction, we need a series of numbers and to consider what will happen when the sinc function "joins the dots"; then the issue becomes one of statistics, of probability distributions. Trying to take a single number and imagining what happens if we just reduce its accuracy is not going to give us the full picture of what's going on.
 
BTW, rounding error is essentially what we get with quantisation error. We can also round rather than truncate, but the results are still very different from each other: if we round, we still get truncation-style distortion, although presumably somewhat less of it.
 
Did you have a listen to the demo I linked to?
 
G
 

 
Yeah I've gone through audiocheck's stuff before. This paragraph from there seems to be related to my issues:
 
By down-converting the 16-bit file into 8-bit, every sample now gets truncated to one of 256 possible values (the original had 65,536 possible values). Severe quantization distortion occurs. The loss of clarity below -36 dBFS and the absence of any signal below -48 dBFS are the typical limitations of 8-bit audio files.

 
So we see both the -48 value and the higher -36 that you mentioned. But I can hear stuff happening way before the -36dB mark, so I'm still at a loss for how you arrive at a number for "where does the bit depth start to suck because of truncation". But I guess what I'm hearing proves your point: the errors happen way above -48dB.
 
 
The car analogy doesn't really do it for me, sadly.
 
Mar 24, 2017 at 1:23 PM Post #3,775 of 7,175
  [1] I can hear stuff happening way before the -36dB mark, so I'm still at a loss for how you make a # for "where does the bit depth start to suck because of truncation".
 
[2] But I guess what I'm hearing proves your point: the errors happen way above -48dB. ... The car analogy doesn't really do it for me, sadly.

 
1. Ah, OK. I'm not sure that there is a calculable # (peak value)! There are two issues at play here: A. Quantisation error is quite evenly distributed and quite decorrelated from the signal, therefore the RMS value, which we can easily calculate, is useful and gives us a fair indication of its likely peak value (very roughly -42dB). This isn't the case with truncation error though; it's not evenly distributed and it's correlated to the signal, so although we know its RMS value, its peak value, being input-signal dependent, could be far higher than the RMS would suggest. It's possible that there is a way to calculate its peak value but I personally don't know it. My ~-36dB peak figure was no more than a reasoned guess to be honest! B. The result of truncation error is non-harmonically related spikes, which sounds a bit like odd-harmonic distortion and to which human hearing is particularly sensitive, so regardless of its actual peak level, it sounds more noticeable than its values would suggest (even if we knew its values!). In the link provided, I can discern distortion even in the first example (0dB), although it's obviously more noticeable in subsequent examples/levels.
 
2. Shame, I thought the engine analogy worked quite well, because the error isn't in the two cylinders removed, it's in what's left of the engine (the 8 MSBs), which probably wouldn't even start. It's good though that the audio demos helped, at least you know it's there now. All you've got to do now is develop your own understanding of how/why it's that way, unless someone else can figure out a more helpful analogy than I've managed to come up with! :)
 
G
 
Mar 24, 2017 at 2:08 PM Post #3,776 of 7,175
   
1. Ah, OK. I'm not sure that there is a calculable # (peak value)! There are two issues at play here: A. Quantisation error is quite evenly distributed and quite decorrelated from the signal, therefore the RMS value, which we can easily calculate, is useful and gives us a fair indication of its likely peak value (very roughly -42dB). This isn't the case with truncation error though; it's not evenly distributed and it's correlated to the signal, so although we know its RMS value, its peak value, being input-signal dependent, could be far higher than the RMS would suggest. It's possible that there is a way to calculate its peak value but I personally don't know it. My ~-36dB peak figure was no more than a reasoned guess to be honest! B. The result of truncation error is non-harmonically related spikes, which sounds a bit like odd-harmonic distortion and to which human hearing is particularly sensitive, so regardless of its actual peak level, it sounds more noticeable than its values would suggest (even if we knew its values!). In the link provided, I can discern distortion even in the first example (0dB), although it's obviously more noticeable in subsequent examples/levels.
 
2. Shame, I thought the engine analogy worked quite well, because the error isn't in the two cylinders removed, it's in what's left of the engine (the 8 MSBs), which probably wouldn't even start. It's good though that the audio demos helped, at least you know it's there now. All you've got to do now is develop your own understanding of how/why it's that way, unless someone else can figure out a more helpful analogy than I've managed to come up with! :)
 
G

 
Well I think your 1) is quite helpful in getting there, as I can certainly understand that the reconstruction of truncation errors that, by happenstance, end up being "noise-ish" can have a much different peak value than the reconstruction of errors that look more like a square wave. So in essence the -48dBFS sample difference maps to different actual peak values depending on the nature of the signal. I guess that's my great sin here: treating samples as though they were actual signal.
 
Mar 24, 2017 at 3:29 PM Post #3,777 of 7,175
  [1] I can certainly understand that the reconstruction of truncation errors that, by happenstance, end up being "noise-ish" can have a much different peak value than the reconstruction of errors that look more like a square wave.
[2] So in essence the -48dBFS sample difference maps to different actual peak values depending on the nature of the signal.
[3] I guess that's my great sin here: treating samples as though they were actual signal.

 
1. Yes, although two points: "happenstance" should actually be "statistical probability distribution", i.e. it will definitely be quite near to perfect white noise. And secondly, truncation error sounds rather squarish but doesn't look much like a square wave.
 
2. I'm having difficulty understanding this. By "-48dBFS sample difference" do you mean the 9th and subsequent bits which have been removed? If so, then obviously they can't "map" to anything because they no longer exist. If you mean the 8th bit, then the relationship between the 8th bit and the 9th bit no longer exists (because there is no 9th bit) which will cause the sinc function to create spurious non-harmonic tones as a consequence of this error.
 
3. Essentially yes, or rather, treating a sample as a signal instead of treating them as a bunch of samples and the analogue signal they'll be converted into. This is an easy trap to fall into because a stream of digital samples is effectively a signal, it's just not related to an analogue signal until it's been decoded by a sinc function. This has several consequences, for example, I was confused by your statement #2 because of your use of "-48dB". -48dB is the quantisation noise floor dictated by 8 bit, however -48dB is not the limit of the signal we can encode in 8 bit, in theory the limit of 8 bit (or any other bit depth) is any signal level down to minus infinity(!), the question is instead: At what point can we still detect that signal within the noise floor? As the linked demo shows, with TPDF dithering we can resolve down to -54dB and if we noise-shape the dither, down to -66dB. The Nyquist-Shannon Theorem is true at all bit depths, which is why it doesn't specify a bit depth! I think you're confusing sample value with what a sinc function will actually render in response to a stream of sample values.
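The "down to minus infinity" point is easy to demonstrate numerically. In this sketch (parameters and names are my own), a tone at a quarter of one 8-bit LSB disappears completely under plain undithered rounding, but survives at the correct level, buried in noise, once TPDF dither is added:

```python
import math, random

random.seed(3)
N, k0 = 48_000, 1000        # 1 s at 48 kHz; tone at exactly 1000 cycles (1 kHz)
LSB = 1.0 / 128             # 8-bit step across a [-1, 1) range
A = 0.25 * LSB              # tone amplitude: a quarter of one LSB

sig = [A * math.sin(2 * math.pi * k0 * n / N) for n in range(N)]

def nearest(x):             # undithered 8-bit rounding
    return round(x / LSB) * LSB

def tpdf(x):                # TPDF dither (sum of two uniforms), then round
    d = (random.random() - random.random()) * LSB
    return round((x + d) / LSB) * LSB

def mag(x, k):              # amplitude at DFT bin k (A/2 for a full-window sine)
    n_tot = len(x)
    re = sum(v * math.cos(2 * math.pi * k * n / n_tot) for n, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * k * n / n_tot) for n, v in enumerate(x))
    return math.hypot(re, im) / n_tot

m_plain = mag([nearest(v) for v in sig], k0)
m_dith = mag([tpdf(v) for v in sig], k0)
print(m_plain)        # 0.0: every sample rounds to zero, the tone is gone
print(m_dith, A / 2)  # the dithered version retains the tone at ~A/2
```

The dithered stream still only contains the values -LSB, 0 and +LSB, yet averaged over time it encodes a tone far below one LSB, which is exactly the "detect the signal within the noise floor" trade described above.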
 
G
 
Mar 24, 2017 at 3:30 PM Post #3,778 of 7,175
Wonderful article. I laughed my ass off at the part about dying instantly if you had the equipment to reproduce 24bit audio.
 
Mar 24, 2017 at 3:51 PM Post #3,779 of 7,175
   
1. Yes, although two points: "happenstance" should actually be "statistical probability distribution", i.e. it will definitely be quite near to perfect white noise. And secondly, truncation error sounds rather squarish but doesn't look much like a square wave.
 
2. I'm having difficulty understanding this. By "-48dBFS sample difference" do you mean the 9th and subsequent bits which have been removed? If so, then obviously they can't "map" to anything because they no longer exist. If you mean the 8th bit, then the relationship between the 8th bit and the 9th bit no longer exists (because there is no 9th bit) which will cause the sinc function to create spurious non-harmonic tones as a consequence of this error.
 
3. Essentially yes, or rather, treating a sample as a signal instead of treating them as a bunch of samples and the analogue signal they'll be converted into. This is an easy trap to fall into because a stream of digital samples is effectively a signal, it's just not related to an analogue signal until it's been decoded by a sinc function. This has several consequences, for example, I was confused by your statement #2 because of your use of "-48dB". -48dB is the quantisation noise floor dictated by 8 bit, however -48dB is not the limit of the signal we can encode in 8 bit, in theory the limit of 8 bit (or any other bit depth) is any signal level down to minus infinity(!), the question is instead: At what point can we still detect that signal within the noise floor? As the linked demo shows, with TPDF dithering we can resolve down to -54dB and if we noise-shape the dither, down to -66dB. The Nyquist-Shannon Theorem is true at all bit depths, which is why it doesn't specify a bit depth! I think you're confusing sample value with what a sinc function will actually render in response to a stream of sample values.
 
G

 
On 2) I'm imagining the reconstruction of the difference signal itself, but I probably need to stop doing that. On 3) I tend to think of multiplying the sinc function by each sample value and then adding up all the resulting scaled sincs. I do wonder if underlying my confusion is the sneaky reality that truncation is not linear; that is, I haven't been taking the "distortion" part of "truncation distortion" seriously enough.
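"Multiplying the sinc function by each sample value and adding up the scaled sincs" is exactly the Whittaker-Shannon picture, and it's easy to sanity-check that it recovers the underlying band-limited signal between the samples. A sketch (mine; finite sum, so only approximate away from the middle of the array):

```python
import math

def sinc(t):
    return 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)

def reconstruct(samples, t):
    """Whittaker-Shannon interpolation: sum of sincs scaled by sample values."""
    return sum(v * sinc(t - n) for n, v in enumerate(samples))

f = 0.1                       # cycles per sample, well below Nyquist (0.5)
samples = [math.sin(2 * math.pi * f * n) for n in range(200)]

t = 100.3                     # an arbitrary point between samples, mid-array
print(reconstruct(samples, t))         # close to...
print(math.sin(2 * math.pi * f * t))   # ...the true underlying value
```

The same machinery applied to a truncated sample stream is what turns per-sample errors into the reconstructed error waveform being discussed, which is why reasoning about a single sample's accuracy in isolation falls short.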
 
Mar 24, 2017 at 4:11 PM Post #3,780 of 7,175
  On 2) I'm imagining the reconstruction of the difference signal itself, but I probably need to stop doing that. On 3) I tend to think of multiplying the sinc function by each sample value and then adding up all the resulting scaled sincs. I do wonder if underlying my confusion is the sneaky reality that truncation is not linear; that is, I haven't been taking the "distortion" part of "truncation distortion" seriously enough.

 
2. I'm still not sure what you mean by "difference signal itself"?
3. Not sure I understand this either!

 
G
 
