24bit vs 16bit, the myth exploded!
Feb 22, 2017 at 9:34 AM Post #3,706 of 7,175
  [1] Now imagine raising the volume (dramatically, if you like) just during the quietest moment of the track. ... Theoretically, you should then be able to 1) not destroy your hearing and 2) hear the difference between a 24-bit and 16-bit dithered noise floor.
 
[2] I recently found the following article, which describes this more eloquently than I ever could: http://www.tonmeister.ca/wordpress/2014/09/15/audio-mythinformation-16-vs-24-bit-recordings/

 
1. This was discussed, albeit briefly, in the first couple of pages of this thread. Let's take an example, an extreme one: a recording with a 72dB dynamic range. This is extreme because hardly any commercial recordings have a dynamic range of more than 60dB. For a 72dB dynamic range we need about 12 bits of data/resolution. Now let's say we whack the volume up during the quietest parts by 20dB so that we can hear the digital noise floor (and differentiate 24bit from 16bit). As Spruce Music says, what you've effectively done is manual compression: you've raised the noise floor by 20dB while the peak volume remains the same (because you lower the volume again during the loud parts). The 72dB dynamic range of our recording is now 52dB, for which we only need about 9 bits!
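(As a side note, the "dB to bits" arithmetic here is just the usual ~6.02dB-per-bit rule of thumb; a quick Python sketch, ignoring dither:)

```python
import math

def bits_for_dynamic_range(db):
    """Rough number of bits needed for a given dynamic range,
    using the ~6.02 dB-per-bit rule of thumb (dither ignored)."""
    return math.ceil(db / 6.02)

print(bits_for_dynamic_range(72))  # the 72dB recording -> 12 bits
print(bits_for_dynamic_range(52))  # after raising the floor by 20dB -> 9 bits
```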
 
2. I see this kind of thing quite often, even in some published papers. Most of the information provided in that article is based on 16bit with TPDF dither. The result of this dither is white noise (as the article stated) and under certain circumstances the potential problems described might exist, although I would dispute both the magnitude of these potential problems and how often they would be encountered in practice. However, my main problem is with the starting premise: in the real world, how many commercial 16bit recordings actually have TPDF dither? If comparing 24bit vs 16bit, the answer is pretty much none at all! It has NEVER been standard practice to apply TPDF dither to a 24bit master/mix for 16bit distribution; it's always some form of noise-shaped dither. This brings us on to the appendix, where the author admits his potential problem with TPDF dither is eradicated by noise-shaped dither, but introduces a new potential problem in the form of possible IMD, caused by the shaped dither's noise energy being concentrated up in the >16kHz range. However, this again makes no sense when comparing 24bit to 16bit. In practice, we don't really encounter 24/44.1 files; 24bit consumer music files are typically 96kHz or 192kHz, which provide a significantly higher frequency response. If a replay system can't handle the audible range and is generating IMD products from frequency content at say 17kHz, what IMD products is it going to generate from frequency content at say 30kHz?
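(For anyone curious what TPDF dither actually is: the sum of two independent uniform random sources, giving a triangular probability density spanning ±1 LSB. A minimal numpy sketch of 16bit quantization with plain TPDF dither, i.e. the textbook case the article assumes, not the noise-shaped dither that's actually standard practice:)

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_tpdf(x, bits=16):
    """Quantize a float signal in [-1.0, 1.0) to `bits` bits,
    adding TPDF dither (two uniform +/-0.5 LSB sources summed)."""
    lsb = 2.0 ** (1 - bits)  # step size over a +/-1.0 full-scale range
    dither = (rng.uniform(-0.5, 0.5, x.shape) +
              rng.uniform(-0.5, 0.5, x.shape)) * lsb
    return np.round((x + dither) / lsb) * lsb

t = np.arange(48000) / 48000.0
tone = 0.5 * np.sin(2 * np.pi * 1000 * t)
err = quantize_tpdf(tone) - tone  # quantization error plus dither noise
err_rms_db = 20 * np.log10(np.sqrt(np.mean(err ** 2)))
print(f"error floor: {err_rms_db:.1f} dBFS")  # around -96 dBFS for 16bit
```

The resulting error is benign white noise at roughly the 16bit floor, which is exactly the behaviour the article builds its argument on.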
 
Quote:
Originally Posted by castleofargh
 
some stuff are relatively easy, like the maximum dynamic range of an ADC or a DAC. but when looking at microphones, recording methods, different studio environments... it becomes hard to put a number on things.

 
As Pinnahertz stated, dynamic range is a "weakest link" scenario: it's defined by the point in the whole recording/playback chain which has the smallest dynamic range. The peak/high point of that dynamic range is probably defined by the consumer amp and speakers/cans part of the chain, but let's generously assume high-quality consumer playback equipment and define the peak part of our dynamic range equation as 120dBSPL (dictated by the human hearing part of the chain). For most music it's going to be a substantially lower figure than this but again, let's take an extreme case, say a symphony orchestra, which could commonly have sustained peaks up around 105dB but may contain the odd transient up to nearly 120dBSPL, which is very loud but just about bearable provided those transients are infrequent and of very short duration. Let's also take another extreme, very well isolating IEMs/headphones in a very quiet listening environment, and therefore a listening environment noise floor of say 20dBSPL. Using these two extreme circumstances to define our boundaries, we have a potential dynamic range of 100dB, which could in theory mean that we would be able to differentiate between 16 and 24bit. However, this is in theory, not in practice, because there are a few serious holes in this scenario:
 
1. I chose a symphony orchestra as an example because logic suggests this produces the largest dynamic range: typically from just one, two or a small handful of musicians playing quietly at one extreme, to 80-120 musicians simultaneously playing as loud as they can at the other. A large, top-class studio should have a noise floor around 30dBSPL, which btw already reduces our potential dynamic range from 100dB to 90dB, but it doesn't stop there: put say 90 living, breathing, moving musicians in that studio and its noise floor is no longer even close to 30dBSPL. Our potential dynamic range is now down to about 60dB for average peak levels and 75dB for the occasional transient. And obviously, if we're talking about recording a live performance then we typically have around 1-4 thousand living, breathing, moving audience members to add to the noise floor, and probably another 10dB or so reduction in our potential dynamic range.
 
2. As far as dynamic range is concerned, our hearing operates on a similar fundamental principle to our sight. We have a wide visual dynamic range for brightness, from a bright sunny day to a fairly dark room. However, we're all well aware that this is a bit of a clever trick: our eyes actually have a much smaller dynamic range than those limits suggest, but get around this fact by effectively making that limited dynamic range window moveable. We can see well in a darkened room but if we leave that room and walk into bright sunlight, it's dazzling to the point of painful. The upper limit of our visual dynamic range is significantly less than our theoretical/usual limit, until our eyes have had time to adjust their dynamic range window; and once they have, if we re-enter the darkened room we can no longer see well, it's pitch black, again until our eyes readjust their dynamic range window to a lower/darker level. The same happens with our hearing: if we really have achieved a listening environment of just 20dBSPL, our upper limit is no longer 105dBSPL (with occasional 120dB transients), it's significantly/proportionately lower. The evidence I've seen suggests our ears' moveable dynamic range window is somewhere between 30dB and 60dB wide.
 
3. Closely related to my response #2 to csglinux above, we have to be careful about quoted response figures: how do these figures actually apply to the real world? Yes, a symphony orchestra can produce transient peaks up to 120dB; yes, a harmon-muted trumpet can produce measurable output at >80kHz, etc. But what can be produced and what is actually heard are two different things. Notwithstanding the fact that I don't know of any symphonies which require a trumpet to use a harmon mute, or that we can't hear 80kHz, the fact that we can record and measure 80kHz content with a mic placed a few inches from the trumpet's bell is meaningless in terms of an accurate or realistic recording, unless you're accustomed to sitting just a few inches in front of the trumpet during a symphony performance?! In practice we're going to be at least about 30ft away, and probably double that in a "prime" seat; add to this a dozen or so living absorption panels in the way (say three or so desks of violas and a few rows of audience) and see how much 80kHz trumpet content you can record now! It's even worse with say a french horn, where between what the horn actually produces and what you (in the audience) hear there is maybe: 100ft of air, two percussion sections, a wall (!), 4 or 5 desks of violins and a few rows of audience! The same goes for the volume figures for the orchestra: where are we measuring those 120dBSPL transient peaks? From just in front of the conductor, or from a few rows back in the audience? If it's the latter, then our peaks (and therefore dynamic range) are probably 10dB or more lower.
 
Taking the above three practicalities into account: 1. The dynamic range limit of the 16bit format is pretty much the least of our dynamic range bottlenecks. Even in extreme circumstances it's no more than the second least, with still several other bottlenecks of more significance actually defining the practical dynamic range. 2. Dynamic range is effectively an artistic decision, which in the case of acoustic performance genres is defined not by the sound the instrument/s actually produce but by where we choose to position the listener and therefore where we place the mics relative to the orch/sound sources. 3. I don't believe it's entirely coincidental that the dynamic range of the most dynamic music recordings is generally no more than about 60dB.
 
G
 
Feb 22, 2017 at 10:00 AM Post #3,707 of 7,175
Could someone please clarify this for me. I've seen a few posts now that say that various recordings (most of them) have such limited dynamic range, etc. Now, this confuses me, as I thought dynamic range was the difference between the loudest and quietest sounds.

If we look at this spectrogram image I posted a little while back, notice the legend for the colours on the right hand side. That scale has a dB range of 100. So if we look at the actual spectrogram image, can we not indeed see all those colours..suggesting that the track does cover the full 100dB dynamic range?


 
Feb 22, 2017 at 10:11 AM Post #3,708 of 7,175
  Could someone please clarify this for me. I've seen a few posts now that say that various recordings (most of them) have such limited dynamic range, etc. Now, this confuses me, as I thought dynamic range was the difference between the loudest and quietest sounds.

If we look at this spectrogram image I posted a little while back, notice the legend for the colours on the right hand side. That scale has a dB range of 100. So if we look at the actual spectrogram image, can we not indeed see all those colours..suggesting that the track does cover the full 100dB dynamic range?
 

 
Do this: make a 2kHz square wave that peaks at, say, -80dBFS. Now note how high you have to set your volume before you can hear it, call this V0. If a song is loud enough that it forces you to set your volume *lower* than V0, then arguably that song isn't using 80dB of dynamic range *in your listening room*. I add that last part because how soft a signal you can hear depends both on your equipment and your listening room: I'd hope we can all agree you aren't going to appreciate signals at -80dBFS on a loud bus.
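A minimal numpy sketch of generating that test signal (note that at -80dBFS the square wave is only about 3 LSB tall in 16bit, so export it as 24bit or float):

```python
import numpy as np

SR = 48000          # sample rate
FREQ = 2000         # square wave frequency in Hz
PEAK_DBFS = -80.0   # target peak level

t = np.arange(int(SR * 2.0)) / SR                  # 2 seconds
peak = 10 ** (PEAK_DBFS / 20)                      # dBFS -> linear amplitude
square = peak * np.sign(np.sin(2 * np.pi * FREQ * t))

peak_db = 20 * np.log10(np.max(np.abs(square)))
print(f"peak: {peak_db:.1f} dBFS")                 # -80.0 dBFS
```

Raise your volume until the tone is just audible, note that setting, then compare it against where you actually listen to music.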
 
Feb 22, 2017 at 10:53 AM Post #3,709 of 7,175
  Could someone please clarify this for me. I've seen a few posts now that say that various recordings (most of them) have such limited dynamic range, etc. Now, this confuses me, as I thought dynamic range was the difference between the loudest and quietest sounds.

 

Technically you are right; however, when people say "dynamic range" they often don't actually mean dynamic range.
See here: http://www.head-fi.org/t/834222/understanding-the-parameters-in-the-dynamic-range-database
 
 
 
If we look at this spectrogram image I posted a little while back, notice the legend for the colours on the right hand side. That scale has a dB range of 100. So if we look at the actual spectrogram image, can we not indeed see all those colours..suggesting that the track does cover the full 100dB dynamic range?

 

I'm not sure about the graph but I think this is not how it works. If you played a 5kHz sine wave that peaks at -20dBFS together with a 10kHz sine wave that peaks at -100dBFS, you would have the red and blue colours represented: the red would be a flat line at 5kHz and the blue would be a flat line at 10kHz. However, it wouldn't have any dynamic range because it's just two sines playing at a constant (unchanging) amplitude.
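(For anyone wanting to redo the two-tone experiment without the generator's -90dB limit, a numpy sketch that builds the mix and reads both levels back off a Hann-windowed FFT; the window keeps the quiet tone above the loud tone's spectral leakage:)

```python
import numpy as np

SR = 48000
t = np.arange(SR) / SR  # one second

def tone(freq, dbfs):
    """Sine wave at the given frequency and peak level in dBFS."""
    return 10 ** (dbfs / 20) * np.sin(2 * np.pi * freq * t)

mix = tone(5000, -20.0) + tone(10000, -100.0)

win = np.hanning(len(mix))
spec = np.abs(np.fft.rfft(mix * win)) / np.sum(win) * 2  # window-corrected
freqs = np.fft.rfftfreq(len(mix), 1 / SR)

levels = {}
for f in (5000, 10000):
    idx = np.argmin(np.abs(freqs - f))
    levels[f] = 20 * np.log10(spec[idx])
    print(f"{f} Hz: {levels[f]:6.1f} dBFS")  # ~ -20.0 and ~ -100.0
```

Both tones read back at their set levels and neither level ever changes, which is the point: the plot spans 80dB of level without the signal having any dynamics at all.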
 
 

 
Feb 22, 2017 at 12:03 PM Post #3,710 of 7,175
  Technically you are right, however when people say dynamic range, they often don't actually mean dynamic range.
See here: http://www.head-fi.org/t/834222/understanding-the-parameters-in-the-dynamic-range-database
 
 
I'm not sure about the graph but I think this is not how it works. If you played a 5kHz sine wave that peaks at -20dBFS together with a 10 kHz sine wave that peaks at -100dBFS you would have the red and blue colour represented; the red would be a flat line at 5kHz and the blue would be a flat line at 10kHz. However it wouldn't have any dynamic range because it's just two sines playing at a constant (not changing) amplitude.
 
 

I generated those 2 sine waves and combined them, here's the image:

 
Feb 22, 2017 at 12:14 PM Post #3,712 of 7,175
It didn't work as I expected, I stand corrected.

I made a small mistake. When I generated the 10kHz tone, I entered -100dB, but for some reason the generator can only go as low as -90 (so it defaulted back to -20). So I generated the new 10kHz tone at -90dB and combined it with the -20dB 5kHz tone. Pic updated. But yeah, it still looks a bit different than anticipated.
 
Feb 22, 2017 at 2:36 PM Post #3,713 of 7,175
  I made a small mistake. When I generated the 10kHz tone, I entered -100dB, but for some reason the generator can only go as low as -90 (so it defaulted back to -20). So I generated the new 10kHz tone at -90dB and combined it with the -20dB 5kHz tone Pic updated. But yeah, still looks a bit different than anticipated 

I got home so I could take a closer look at the graph. Looking at the 5kHz sine wave, I can see that in the middle it's red, then it gets progressively colder and colder down to purple, with the green colour being dominant. The point is, looking at a graph like this, reading the dynamic range is far from trivial.
 
As an aside I think the program should plot an "infinitely" thin red line. If you could increase the resolution of the plotting, it might look closer to what I expected.
 
Feb 22, 2017 at 2:58 PM Post #3,714 of 7,175
  I got home so I could take a closer look at the graph. Looking at the 5kHz sine wave, I can see that in the middle it's red, then it gets progressively colder and colder down to purple with the green colour being dominant. Point is, looking at a graph like this, reading the dynamic range is far from being trivial.
 
As an aside I think the program should plot an "infinitely" thin red line. If you could increase the resolution of the plotting, it might look closer to what I expected.

I did another test. 5kHz at -3dB, and this is what I got. It seems the louder the signal, the more it bleeds into other frequencies. This might explain why the 10kHz signal was just a blue line, because it's so quiet that the sound decays much faster

 
Feb 22, 2017 at 3:14 PM Post #3,716 of 7,175
  That scale has a dB range of 100. So if we look at the actual spectrogram image, can we not indeed see all those colours..suggesting that the track does cover the full 100dB dynamic range?


 
No. The spectrogram is giving you a breakdown of the energy contained in the frequencies which comprise a sound, not the total energy of the sound, and therefore not the dynamic range. For example, at exactly 2:00 we have a sound whose total energy we can only guess to be about -20dB. The spectrogram is showing us the breakdown of this sound: the vast majority of the energy in this sound is in the low frequency band, roughly -25dB in the 0-1kHz band, about -40dB in the 1-5kHz band, about -80dB in the 5-15kHz band and about -110dB in the 15-30kHz band. Add all these energy levels in the different frequency bands together and we'd have the total energy of this sound, which we can then compare with the quietest sound in the track, which is at the very end. Here we can see the highest level is green/light blue, also in the low freqs, a total energy value of probably somewhere around -70 to -75dB, 50-55dB lower than the max level of about -20dB. At a guess then, the dynamic range of this recording is (very roughly) about 50-55dB, which would require about 9bits!
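(As a side note: "adding" levels quoted in dB means converting to power, summing, then converting back, so the total ends up dominated by the loudest band. A quick sketch using the rough per-band readings above:)

```python
import math

def sum_db(levels):
    """Combine per-band levels in dB into a total: dB -> power, sum, -> dB."""
    return 10 * math.log10(sum(10 ** (l / 10) for l in levels))

# rough per-band readings at 2:00, eyeballed from the spectrogram
total = sum_db([-25, -40, -80, -110])
print(f"total: {total:.1f} dB")  # sits just above the loudest band
```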
 
G
 
Feb 22, 2017 at 3:49 PM Post #3,717 of 7,175
  I did another test. 5kHz at -3dB, and this is what I got. It seems the louder the signal, the more it bleeds into other frequencies. This might explain why the 10kHz signal was just a blue line, because it's so quiet that the sound decays much faster

 
  I got home so I could take a closer look at the graph. Looking at the 5kHz sine wave, I can see that in the middle it's red, then it gets progressively colder and colder down to purple with the green colour being dominant. Point is, looking at a graph like this, reading the dynamic range is far from being trivial.
 
As an aside I think the program should plot an "infinitely" thin red line. If you could increase the resolution of the plotting, it might look closer to what I expected.

 
  I find these graphs pretty cool. 

I've highlighted a few points to respond to.
 
You guys are having both fascination and difficulties with the spectrogram as it applies to DR because it's actually the wrong tool. That display is compressing a 3D data block into a flat 2D image, giving you too much information presented in a confusing way.
 
The observation that quiet sounds "decay much faster" is an anomaly of the analysis only, not true in reality.  
 
Yes, the graphs are cool, and yes, reading the dynamic range from them is non-trivial. Relating a spectrogram to any actual audible characteristic is extremely difficult.
 
Feb 22, 2017 at 4:20 PM Post #3,718 of 7,175
  I did another test. 5kHz at -3dB, and this is what I got. It seems the louder the signal, the more it bleeds into other frequencies. This might explain why the 10kHz signal was just a blue line, because it's so quiet that the sound decays much faster


Spectrograms are done using FFTs.  Usually the FFT uses a small number of bins or filter banks.  This means a loud signal will bleed into adjacent bins and show up on the graph as a wide line instead of a thin line in one frequency bin.
 
Using different software and colors, here is an example of a 5kHz sine wave. At -120dB the background goes to gray.
This one is with the spectrogram set to 32 k FFT bins.
 

 
This one is with the spectrogram using 256 FFT bins.
 

 
You can see that with only 256 bins the 5kHz signal actually bleeds over slightly at all frequencies. It is just an artifact of windowing in an FFT; it has nothing to do with decay.
 
The above is from Audacity which is free.  It has a spectrogram view as well as the default waveform view. In preferences you can adjust the size of the FFT bins, the dynamic range over which it functions and the type of windowing to use.
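(A rough numpy version of the same demonstration, for anyone without Audacity to hand: measure how wide the 5kHz tone appears at two FFT sizes. The "within 20dB of the peak" width criterion is an arbitrary choice, purely for illustration:)

```python
import numpy as np

SR = 48000
sig = np.sin(2 * np.pi * 5000 * np.arange(SR) / SR)  # 5kHz sine

widths = {}
for n in (256, 32768):
    win = np.hanning(n)
    spec = np.abs(np.fft.rfft(sig[:n] * win)) / np.sum(win) * 2
    db = 20 * np.log10(np.maximum(spec, 1e-12))
    bin_hz = SR / n
    # count bins within 20 dB of the peak -> apparent width of the "line"
    widths[n] = np.sum(db > db.max() - 20) * bin_hz
    print(f"{n:>5}-point FFT: bin width {bin_hz:6.2f} Hz, "
          f"tone appears ~{widths[n]:.0f} Hz wide")
```

With 256 bins each bin is ~188Hz wide, so the tone smears across hundreds of Hz; with 32k bins the same tone collapses to a few Hz, just as in the Audacity screenshots.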
 
Feb 23, 2017 at 4:16 AM Post #3,720 of 7,175
A couple of points maybe worth expanding upon:
 
@VNandor Spectral analysis is a fantastically useful tool but there are some inherent weaknesses/limitations. By increasing the FFT size we increase frequency resolution/accuracy, as Spruce Music explained/demonstrated. However, as we increase the number of bins and frequency resolution (the y-axis), we decrease the timing accuracy (the x-axis). A partial solution to this is to choose a moderate FFT window size but overlap the windows (vertically), which provides better frequency resolution. At the same time, one can also overlap the windows horizontally, giving greater time resolution/accuracy. There is however still a price to pay, in terms of the required amount of processing. Occasionally I need to dig down really deep in order to treat a specific freq/harmonic in a fast, dense piece of music/audio. I might use x32 frequency overlap and x32 time overlap; I don't actually know the number of bins, as I've found the best results in my program come when I set the number of bins to "auto" and let the program decide. I get relatively accurate frequency and time resolution, but it takes my 12-core Mac Pro about 20secs to calculate/render about 2 seconds' worth of audio. Using the default settings I can render several minutes of audio in just a second or two.
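(The overlap-versus-processing trade-off described above can be sketched with a minimal numpy STFT; hop size = window length / overlap factor. The parameter names here are just for illustration, not the actual settings of any particular plugin:)

```python
import numpy as np

def stft_frames(x, n_fft, hop):
    """Minimal STFT: Hann-windowed FFT frames taken every `hop` samples."""
    win = np.hanning(n_fft)
    starts = range(0, len(x) - n_fft + 1, hop)
    return np.array([np.abs(np.fft.rfft(x[s:s + n_fft] * win))
                     for s in starts])

SR = 48000
x = np.sin(2 * np.pi * 5000 * np.arange(SR) / SR)  # 1 second of audio

counts = {}
for overlap in (1, 32):  # 1 = no overlap, 32 = 32x time overlap
    n_fft, hop = 4096, 4096 // overlap
    counts[overlap] = len(stft_frames(x, n_fft, hop))
    print(f"x{overlap:>2} overlap: freq step {SR/n_fft:.1f} Hz, "
          f"time step {hop/SR*1000:5.1f} ms, {counts[overlap]} frames to compute")
```

Overlapping in time gives a much denser analysis grid at the cost of proportionally more FFTs to compute, which is where the render-time difference comes from.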
 
@TheoS53 As Pinnahertz stated, this is the wrong tool for measuring dynamic range and, as I stated, the best we can do with a spectrogram is make a very rough guess at dynamic range. But we still need to understand/interpret what we're looking at, as well as understand what "dynamic range" actually means (which is not easy, as it's a rather loosely defined term). For example, look again at the very end of the recording and let's say it continues for another few seconds, to say 2:25, all the time dying away, so that at say 2:23 there's no more green/light blue, only dark blue/purple, which is say -110dB. Would this recording now have a dynamic range of 90dB (-20dB at its highest to -110dB at its lowest)? Possibly, but it's very unlikely. Far more likely is that the actual noise floor is at about -60 or -70dB and all we're seeing is the mix engineer fading this noise floor to digital silence (black), not at all an uncommon practice. This transition from the noise floor to digital silence is NOT part of our dynamic range (!), because I've used the term "dynamic range" to mean the ratio of peak value to noise floor (in this case, say -20dB to -70dB, i.e. 50dB) and therefore what happens beneath the noise floor cannot be part of the dynamic range calculation. If you think about it, this raises some interesting questions.
 
G
 
