24bit vs 16bit, the myth exploded!
Apr 2, 2013 at 11:53 AM Post #1,111 of 7,175
Quote:
 
It can easily be made random enough that it is not practically different from "ideal" white noise for the purpose of dithering, unless the input signal already contains the same pseudo-random sequence for some reason. That has virtually zero chance of happening accidentally, so it is normally only an issue if the signal was already dithered once with the same noise, or possibly in the case of noise shaping, where the dither is in a feedback loop. For simple dithering in software, the pseudo-random generator can be initialized from the current system time to avoid the problem of dithering more than once with exactly the same noise.

All it would take is two sequences of dissimilar lengths, each long enough that any non-random component is sub-audible. Noise shaping then shifts the spectral distribution so that more of the dither energy falls outside the audible range.  
 
This has been well studied and is worth a quick Google search.  Since the concept of dithering in digital audio is more than 30 years old, it is a fair assumption that any issues dithering could cause have been pretty well worked out by now. 
 
By the way, basic dither was first accomplished with fully random analog means. Even the noise of an analog mic preamp would do it.
 
Apr 2, 2013 at 12:03 PM Post #1,112 of 7,175
I might not have been clear enough, but I did not mean there are actual problems in well-implemented dithering or noise shaping, rather that "non-randomness" should only be an issue in some contrived theoretical cases.
 
Apr 3, 2013 at 9:12 AM Post #1,114 of 7,175
Quote:
I might not have been clear enough, but I did not mean there are actual problems in well-implemented dithering or noise shaping, rather that "non-randomness" should only be an issue in some contrived theoretical cases.

 
Quote:
 
It can easily be made random enough that it is not practically different from "ideal" white noise for the purpose of dithering, unless the input signal already contains the same pseudo-random sequence for some reason. That has virtually zero chance of happening accidentally, so it is normally only an issue if the signal was already dithered once with the same noise, or possibly in the case of noise shaping, where the dither is in a feedback loop. For simple dithering in software, the pseudo-random generator can be initialized from the current system time to avoid the problem of dithering more than once with exactly the same noise.
 
As an example, in the audio processing utilities linked in my signature, I use this simple algorithm to generate noise:
Code:
 x[n] = (x[n - 1] * 742938285) % 2147483647
where x[n] is the current state of the generator (an integer in the range 1 to 2147483646), and % is the modulo (remainder of division) operator. The output of this passes basic tests of randomness like the DIEHARD battery of tests, and is plenty good enough for generating white noise and dithering in particular (but not for more demanding scientific or cryptographic applications). The sequence does loop after 2147483646 samples, but that is not a problem for audio (it is more than 1.5 hours of stereo noise at 192000 Hz sample rate). For the purpose of dithering, the distribution of the noise is made triangular by subtracting the previous sample, which also shifts some of the noise energy into the higher frequency range and slightly reduces the weighted level of the noise at sample rates above 32000 Hz. By default, the initial 'x' in the PRNG is set from the current system time, so the output file is always slightly different. Using some optimization tricks to avoid the expensive % operator, the dithering costs about 12.5 CPU cycles per mono sample of output.
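For readers who want to try it, here is a minimal, self-contained C sketch of a generator of this type with the differencing TPDF step, following the description above. The function names, the ±1 LSB scaling, and the seeding details are assumptions on my part, not the actual code from the linked utilities.
Code:
 #include <stdint.h>
 #include <time.h>
 
 /* The Lehmer-style generator described above:
    x[n] = (x[n - 1] * 742938285) % 2147483647, with state in 1..2147483646.
    64-bit intermediate math avoids overflow; the original reportedly uses
    faster tricks that avoid the % operator entirely. */
 static uint32_t state = 1;
 
 static void seed_from_clock(void)
 {
     /* Seed from the system time so repeated runs dither with different noise. */
     state = (uint32_t)time(NULL) % 2147483646u + 1u;
 }
 
 static uint32_t next_random(void)
 {
     state = (uint32_t)(((uint64_t)state * 742938285u) % 2147483647u);
     return state;
 }
 
 /* TPDF dither by differencing: subtracting the previous rectangular sample
    gives a triangular distribution and tilts the noise toward higher
    frequencies, as described above. Output is in (-1, 1), i.e. +/-1 LSB
    if the caller treats 1.0 as one quantization step. */
 static double tpdf_dither(void)
 {
     static double prev = 0.0;
     double cur = (double)next_random() / 2147483647.0;
     double out = cur - prev;
     prev = cur;
     return out;
 }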

 
I see. This class of recursive PRNGs using mod functions doesn't really cut it for demanding numerical/statistical work, so the question is: are they okay for this application? I would also guess that the dither needs to be uncorrelated or "orthogonal" to your signal, or it's going to be an issue. An interesting question is whether that can be guaranteed when the initial seed is derived from the system clock. Also, for low-level signals, wouldn't the signal just get "absorbed" into the noise/dither floor created? 
 
Apr 3, 2013 at 11:47 AM Post #1,115 of 7,175
Quote:
 
I see. This class of recursive PRNGs using mod functions doesn't really cut it for demanding numerical/statistical work, so the question is: are they okay for this application? I would also guess that the dither needs to be uncorrelated or "orthogonal" to your signal, or it's going to be an issue. An interesting question is whether that can be guaranteed when the initial seed is derived from the system clock. Also, for low-level signals, wouldn't the signal just get "absorbed" into the noise/dither floor created? 

Why not research dither a bit on your own?  It's not like it's a new concept or anything.  Great minds have spent man-years thinking about it.  Pretty sure all the issues have been thought of and addressed, papers written, patents filed...
 
Apr 7, 2013 at 5:57 AM Post #1,116 of 7,175
Well, the dither noise basically needs to:
- have the correct probability distribution (easy to achieve even with simple generators)
- sound like noise to humans (again not difficult, unless the sequence is very short, or there are obvious spectral peaks)
- be uncorrelated to the input signal (it is very unlikely for recorded music to have content that is "accidentally" similar to the output of a particular PRNG with a specific seed value, and even have the right amplitude etc. to cancel out the dither)
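To make these requirements concrete, here is a hypothetical C sketch of the usual way dither is applied when reducing word length: add the noise before rounding, so the quantization error is decorrelated from the signal. The function name and scaling are my assumptions; tpdf_dither() is taken to return ±1 LSB of triangular noise, as in the sketch a few posts up.
Code:
 #include <math.h>
 #include <stdint.h>
 
 double tpdf_dither(void); /* +/-1 LSB triangular noise, as in the sketch above */
 
 /* Quantize a sample in [-1.0, 1.0] to 16 bits, adding TPDF dither BEFORE
    rounding so the quantization error is decorrelated from the signal. */
 int16_t quantize16(double sample)
 {
     double scaled = sample * 32767.0;
     double dithered = scaled + tpdf_dither();
     double rounded = floor(dithered + 0.5);
     /* Dither can push a near-full-scale sample just past the 16-bit range. */
     if (rounded > 32767.0)  rounded = 32767.0;
     if (rounded < -32768.0) rounded = -32768.0;
     return (int16_t)rounded;
 }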
 
Apr 7, 2013 at 1:17 PM Post #1,117 of 7,175
I think this is one of those situations where even if your model or function produces the "wrong" distribution, it's really not a big deal.  There is a wide range of distributions that would be acceptable.
 
It's pretty hard to imagine any recorded music being correlated in any significant way with that.
 
Apr 18, 2013 at 11:37 AM Post #1,119 of 7,175
Having some trouble moving from digital photography, where I am totally comfortable with the concepts of dynamic range, bit depth and resolution...  and maybe the terms have a different meaning in audio than digital photography, but to some extent digital should be digital....
 
Dynamic range is what it is based on the sensor and has nothing to do with bit depth  (dynamic range = the difference in stops between the darkest and lightest source where the sensor can detect a difference)
 
The bit depth is the precision with which the strength of a given "piece of light" can be measured - the light is what it is, and bit depth is simply a measure of precision.  In images this is particularly relevant in the editing process, where "quantize errors" are more significant in changes to an 8-bit image, e.g. JPG - roughly analogous to MP3 (actually 3 × 8 bit = 24, one each for the red, green and blue channels) - than in 12-, 14- or 16-bit images, e.g. TIFF or RAW - roughly analogous to FLAC etc. 
 
The resolution is the density of the photosites in a given area of the sensor, and would seem to correspond to the samples per second in audio.  The more photosites (pixels), the higher the resolution. 
 
So the way I see this in audio is: take a given sound pressure, say 100 dB - the bit depth would determine the difference between 100.0000000000 (lower bit depth) and 100.000000000012345 (higher bit depth).  Whether that is an audible difference is probably still open for discussion, but I don't see that bit depth is relevant to dynamic range; it certainly isn't in digital photography.
 
Apr 18, 2013 at 12:41 PM Post #1,120 of 7,175
Quote:
Having some trouble moving from digital photography, where I am totally comfortable with the concepts of dynamic range, bit depth and resolution...  and maybe the terms have a different meaning in audio than digital photography, but to some extent digital should be digital....
 
Dynamic range is what it is based on the sensor and has nothing to do with bit depth  (dynamic range = the difference in stops between the darkest and lightest source where the sensor can detect a difference)
 
The bit depth is the precision with which the strength of a given "piece of light" can be measured - the light is what it is, and bit depth is simply a measure of precision.  In images this is particularly relevant in the editing process, where "quantize errors" are more significant in changes to an 8-bit image, e.g. JPG - roughly analogous to MP3 (actually 3 × 8 bit = 24, one each for the red, green and blue channels) - than in 12-, 14- or 16-bit images, e.g. TIFF or RAW - roughly analogous to FLAC etc. 
 
The resolution is the density of the photosites in a given area of the sensor, and would seem to correspond to the samples per second in audio.  The more photosites (pixels), the higher the resolution. 
 
So the way I see this in audio is: take a given sound pressure, say 100 dB - the bit depth would determine the difference between 100.0000000000 (lower bit depth) and 100.000000000012345 (higher bit depth).  Whether that is an audible difference is probably still open for discussion, but I don't see that bit depth is relevant to dynamic range; it certainly isn't in digital photography.

 
Some A/D or D/A converters output/accept 24-bit data but don't even reach 16-bit performance, but that's not what we're talking about here.
 
In digital audio theory the dynamic range is limited by quantization error. This is the ideal case (not limited by converter noise).
 
Apr 18, 2013 at 1:46 PM Post #1,121 of 7,175
Quote:
Having some trouble moving from digital photography, where I am totally comfortable with the concepts of dynamic range, bit depth and resolution...  and maybe the terms have a different meaning in audio than digital photography, but to some extent digital should be digital....
 
Dynamic range is what it is based on the sensor and has nothing to do with bit depth  (dynamic range = the difference in stops between the darkest and lightest source where the sensor can detect a difference)

Partially true, in that the sensor is the limiting factor, but so is bit depth. The confusion occurs in the scaling a camera does between sensor output and the digital conversion.  So an 8-stop sensor can still be scaled so it is digitized to 12 bits per channel, even though the actual sensor dynamic range is much less than what 12 bits per channel is capable of.  In photography we are also concerned with how big the steps are in the gray scale.  This is one way digital audio and digital imaging differ.
Quote:
The bit depth is the precision with which the strength of a given "piece of light" can be measured - the light is what it is, and bit depth is simply a measure of precision.  In images this is particularly relevant in the editing process, where "quantize errors" are more significant in changes to an 8-bit image, e.g. JPG - roughly analogous to MP3 (actually 3 × 8 bit = 24, one each for the red, green and blue channels) - than in 12-, 14- or 16-bit images, e.g. TIFF or RAW - roughly analogous to FLAC etc. 

The precision-of-measurement idea is right, but the analogies are a bit off.  JPG images are reduced in size by eliminating near-duplicate pixel information during encoding, then predicting it and reinserting it on display.  It's done by considering groups of pixels and the degree to which they differ, keeping the most different ones and dumping the similar ones.  The degree to which that is done is chosen by the JPG quality setting, which is pretty high in cameras and variable in image-processing software.  MP3 (technically MPEG-1/2 Audio Layer III) processing is a bit different in that it uses the concept of masking to determine what's needed and what's not.  Masking is where a dominant loud frequency makes another nearby, lower-level frequency inaudible.  While that's sort of similar to JPG image processing, audio changes over time, so the data that can be eliminated as inaudible changes in definition on a continual basis.  Also, when you compare JPG or MP3 compression, the discussion of bit depth is technically a separate issue.  You're right about there being larger approximations at lower bit depths, but that's only related to, not part of, the actual data-reduction methods.  TIFF and RAW are "uncompressed", as are FLAC, AIFF, WAV and ALAC, but a TIFF image can also have metadata, and a RAW image has tags that are required for proper rendition and are camera-specific as to dynamic range, gamma, color etc.  None of that happens in any of the audio formats.  Part of what goes into a RAW file is determined by the scaling and calibration of the sensor.  In audio, there isn't any of that going on.
Quote:
The resolution is the density of the photosites in a given area of the sensor, and would seem to correspond to the samples per second in audio.  The more photosites (pixels), the higher the resolution. 
 
So the way I see this in audio is: take a given sound pressure, say 100 dB - the bit depth would determine the difference between 100.0000000000 (lower bit depth) and 100.000000000012345 (higher bit depth).  Whether that is an audible difference is probably still open for discussion, but I don't see that bit depth is relevant to dynamic range; it certainly isn't in digital photography.

The resolution analogy is good as far as pixel count vs sample rate.  However, bit depth is always related to dynamic range in both photography and audio.  The fewer the bits, the less range between the maximum signal or light level and the minimum (and noise) level.  In audio, quantization is linear, meaning there's no scaling pre-conversion.  So there is a fixed relationship between bit depth and available dynamic range, which is roughly 6 dB per bit, not counting noise shaping and dither.  16-bit audio is basically capable of 96 dB between maximum and noise.  The same is true in photography, except there is scaling dictated by the sensor.  So the sensor's blackest black and whitest white land somewhere between the minimum and maximum of the digital word, even if the actual sensor output is non-linear.  The key to decoding the scaling data is provided in meta tags, and is important for RAW decoding.  Ever get the RAW profile wrong in Photoshop? Probably not, because that's mostly been fixed now, but early on you could sometimes mis-decode a raw image; the results were interesting, but not useful. It's that correction that's got you confused. The other issue is color profiles in display and output devices.  Your sensor may capture 10 stops, but your display can't show that, and you certainly can't print that, so what profiles do is again apply a correction to let your (hopefully calibrated) screen "fake" a 10-stop image.  We don't do that in digital audio either. 
 
Trying to say it simply: bit depth always relates to DR.  In audio, the steps have a fixed size; in imaging, the size of the step is scaled to the sensor/scanner, and then again to the display or output device, such that the sensor's minimum black is still within the bit depth, and the sensor's maximum white is also below the maximum defined by the bit depth.  
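For reference, the roughly-6-dB-per-bit figure follows from the ratio between full scale and one quantization step of an N-bit linear quantizer:
Code:
 \mathrm{DR} \approx 20\log_{10}\!\left(2^{N}\right) = N \cdot 20\log_{10}2 \approx 6.02\,N\ \mathrm{dB}
That gives about 96 dB at 16 bits and about 144 dB at 24 bits; measuring a full-scale sine against unshaped quantization noise adds a further 1.76 dB.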
 
Apr 18, 2013 at 2:07 PM Post #1,122 of 7,175
Just to add to xnor's post, he's right about quantizing error limiting 24-bit audio. There are only a tiny handful of 24-bit A/D converters that realize full 24-bit performance, and they are expensive even in the pro market.  The one I'm most familiar with manages 140 dB of DR for real.  Mostly we're at 18-20 bits of real quantization, with a whole lot of noise/dither/q-noise taking up the bottom few bits.  There's no point to 24-bit playback for dynamic range - 16 is more than we can typically use - but there is a point to having 24 bits or more to work with in processing/DSP.  
 
In imaging, we are mostly display-limited.  For 12-bit quantization, we have about a 4000:1 contrast ratio.  Displays that go farther fake it with local dimming.  But in projection, we get a real 2000:1 contrast ratio in real rooms and theaters because of stray light.  For example, if your projector can theoretically do a 100,000:1 contrast ratio, a guy in the room with a white shirt on will reflect enough light back to the screen to kick that down to around 2000:1.  A candle 10' from the screen ends up worse than that.  LCD screens in lit rooms have a similar issue.
 
Apr 18, 2013 at 2:57 PM Post #1,123 of 7,175
I don't consider these ranged-input-stage ADCs "true 24 bit" - where superposition linearity, INL, DNL and S/N are limited by the 24-bit LSB size - at all signal levels, all of the time
 
another problem in sorting audio ADC claims is noise weighting functions - most audio ADC/DAC marketing bullet-point numbers are A-weighted - again a fail by the flat, full-bandwidth S/N spec expected of instrumentation ADCs
 
spurious-free dynamic range can also be a useful spec - complex mixed-signal systems often have odd, non-harmonic spurious frequency lines in their output at very low levels
 
I think it is currently safe to say there are no audio ADCs meeting the most stringent interpretation of "24 bit" resolution, linearity and unweighted noise floor all at the same time
 
Apr 18, 2013 at 4:13 PM Post #1,124 of 7,175
Quote:
I don't consider these ranged-input-stage ADCs "true 24 bit" - where superposition linearity, INL, DNL and S/N are limited by the 24-bit LSB size - at all signal levels, all of the time
 
another problem in sorting audio ADC claims is noise weighting functions - most audio ADC/DAC marketing bullet-point numbers are A-weighted - again a fail by the flat, full-bandwidth S/N spec expected of instrumentation ADCs
 
spurious-free dynamic range can also be a useful spec - complex mixed-signal systems often have odd, non-harmonic spurious frequency lines in their output at very low levels
 
I think it is currently safe to say there are no audio ADCs meeting the most stringent interpretation of "24 bit" resolution, linearity and unweighted noise floor all at the same time

Agreed, a ranged ADC isn't quite the same as true 24 bit, but it's as close as we can come today, and definitely better than garden-variety 24-bit ADCs that only do a real 18.  Audio ADCs aren't instrumentation ADCs, and were never meant to be.  Weighting in noise specs is probably valid, but I would agree that the specific A curve is not. 
 
May 1, 2013 at 11:55 AM Post #1,125 of 7,175
I came across this thread by chance and I have read the first page with interest. I notice the thread is 75 pages long now. Unfortunately I don't have a couple of weeks spare to read it all, so apologies if my point has already been made/disputed/disproved.
Digital audio reproduces sound along two axes: time and amplitude. The shape of the waveform is a function of the frequencies it contains, with time conventionally along the horizontal (x) axis and the amplitude along the vertical (y) axis.
One tends to imagine that if you sample a smooth analog curve every so often and draw a bar chart of the results, you get a jagged edge in place of the smooth curve. The more frequently you sample, the smoother and less jagged the digital representation, and as the sampling frequency approaches infinity you arrive at a perfectly smooth curve - the limit idea that underlies calculus. In theory you do not need to do this. It is counter-intuitive, and not easily understood, that by sampling at a higher frequency than is used on CD you do not get a closer approximation to the shape of the original analog waveform (closer to a smooth curve than to a bar chart), but Nyquist proved this and I don't have the maths to argue. According to his theorem, 44.1 kHz is a high enough sampling frequency to reproduce perfectly a waveform with content up to 20 kHz. You obviously need a greater sampling frequency to accurately reproduce waveforms of higher frequency than human hearing is capable of, but we are here considering human audio.
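In symbols, the sampling theorem says that a signal containing no frequencies above f_max is captured perfectly when the sample rate f_s satisfies:
Code:
 f_s > 2 f_{\max} \qquad \text{e.g. } f_s = 44.1\ \mathrm{kHz}\ \Rightarrow\ f_{\max} < 22.05\ \mathrm{kHz}
so 44.1 kHz leaves a small margin above the roughly 20 kHz limit of human hearing for the anti-aliasing filter.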
However, bit depth is a different matter. In practice, the 144 dB of dynamic range that 24-bit allows does not translate (and is not intended to translate) into ear-drum-destroying sound pressure levels. You can always turn the volume down, after all.
A single sample somewhere along the x-axis is, in 16-bit audio, a number from -32,768 to +32,767. If you try to code a number outside this range into a recording, it gets clipped off. The equipment cannot understand what 32,768 is intended to mean, and you usually end up with some very odd and unpleasant artefacts. There is a useful function here. If you take a snatch of music (say a sine wave) ranging from e.g. -12,000, through zero, to +12,000, then when this has been through a DAC and amplified out into speakers, it will play at a certain sound pressure level. If you change nothing else but double the digits arithmetically to range from -24,000 to +24,000, you double the amplitude - an increase of about 6 dB. This makes trimming digital music to increase or reduce the volume very easy.
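The doubling works out to a fixed gain:
Code:
 20\log_{10}\frac{24000}{12000} = 20\log_{10}2 \approx 6.02\ \mathrm{dB}
(a 6 dB increase in level; a perceived doubling of loudness is usually reckoned to need about 10 dB).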
What this is also saying is that even in CD-quality 16-bit sound, you can record 65,536 different levels of volume for whatever instrument you are recording. Whether the human ear can distinguish between a sine wave ranging from -12,000 through zero to +12,000 and one doing the same but at 12,001, I do not know. Electronic keyboards used to have something like 127 different volume settings according to how hard you hit the key. That was acknowledged to be inferior to the analog result of striking a piano key, but 65,536?
So if you get in amongst the digits and start adjusting them with 24-bit sound, you do not multiply the numbers up from 32,767 to some astronomical number - none of your 16-bit-oriented equipment would understand what any number above 32,767 meant. In practice, you get the option to adjust each value not a whole digit at a time, but to a decimal place. So you can vary the level not just from 12,000 to 12,001, but from 12,000.6 to 12,000.7 etc. In this way, you get an increase in dynamic range by adding decimal-point precision to your amplitudes. Sounds that were previously recorded at the same amplitude (let's say volume), which were rounded in CD-quality 16-bit sound to the nearest whole digit, may now be represented by different, more precise numbers when recorded in high-res.
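To put numbers on the decimal-places picture (my arithmetic, not part of the original post): holding full scale fixed, the 8 extra bits subdivide each 16-bit step into
Code:
 2^{24} / 2^{16} = 2^{8} = 256\ \text{sub-steps}
i.e. a 16-bit step is 1/32,768 of full scale on the positive side, while a 24-bit step is 1/8,388,608.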
Whether many human ears can actually detect a difference is a relevant question, but at least (unlike Nyquist and sampling frequency) nobody has yet come up with a mathematical proof that it makes no difference.
 
