This isn't really so much of a DIY-related question, but (yet another) question from me for technically-minded head-fiers.
If the human perception of loudness is roughly logarithmic, why is linear PCM used for almost all digital music reproduction?
This is the mental exercise I keep playing out in my head:
Start with a certain loudness that we'll call mezzo forte, define a second loudness, forte, that is one unit of perceived loudness louder, and then a third loudness, fortissimo, that is one more unit of perceived loudness louder than forte.
If the perceptual space is logarithmic and we were to map these units into a linear space, the difference between fortissimo and forte would be an order of magnitude greater than the difference between forte and mezzo forte:
Assigning the three loudnesses arbitrary values in logarithmic space so that they are evenly spaced:
mezzo forte = 3
forte = 4
fortissimo = 5
Solving for the corresponding values in linear space, assuming that loudness perception is base-10 logarithmic (which it isn't, but it makes the math easier):
mezzo forte:
3 = log(x)
x = 10^3 = 1000
forte:
4 = log(x)
x = 10^4 = 10000
fortissimo:
5 = log(x)
x = 10^5 = 100000
In the above linear space, the difference between mezzo forte and forte is (10000 - 1000) = 9000 linear units, while the difference between forte and fortissimo is (100000 - 10000) = 90000 linear units, ten times more.
If these linear units were analogous to linear PCM quantization levels, wouldn't that mean that linear PCM has considerably higher, er... perceptual loudness resolution (is there a better word for that?) at the loud end than at the quiet end?
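To put rough numbers on this question, here is a minimal sketch that counts how many 16-bit linear PCM steps a 1 dB increase in peak amplitude covers at different levels. It assumes signed 16-bit samples and the usual 20*log10 amplitude convention for dB; the function name `steps_per_db` is made up for illustration.

```python
def steps_per_db(level_dbfs, bits=16):
    """Approximate number of linear PCM steps covered by a 1 dB increase
    in peak amplitude, starting from level_dbfs (0 dBFS = full scale)."""
    full_scale = 2 ** (bits - 1) - 1            # 32767 for signed 16-bit PCM
    amplitude = full_scale * 10 ** (level_dbfs / 20)
    return amplitude * (10 ** (1 / 20) - 1)     # +1 dB is a factor of about 1.122

for level in (0, -20, -40, -60):
    print(f"{level:>4} dBFS: {steps_per_db(level):8.1f} steps per dB")
```

Under these assumptions a 1 dB change near full scale spans roughly 4000 steps, while the same 1 dB change 60 dB down spans only about 4, so the step count per perceptual unit does shrink at the quiet end.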
Is there a flaw in my thinking? I have a degree in Computer Science and another in Photography, which basically means I understand the basics of how PCM works and I have a decent idea of how visual perception works, but I'm mostly talking out of my ass when I try to apply what I know to psychoacoustics.
I started thinking down this path while I was pondering why the dynamics (micro and macro) of vinyl sound so much better to me than those of CD audio, despite some valid technical arguments against that being the case. I started thinking about how the goals of engineering and mastering for the two processes would differ, which got me thinking about the linear nature of the PCM used in most audio production. I'm not really interested in starting a vinyl vs. CD debate (ugh), so let's try to avoid that...
So why do we use linear quantization to model phenomena that we perceive logarithmically?
Logarithmic PCM encoding is already out there. It's called u-law and it's used a lot for... drumroll... 8-bit encoding of phone line quality signals.
This is less of a psychoacoustic question than an EE question. ADC/DAC design is much, much easier to do with a linear relationship between input and output. With the exception of a couple ADC types, most types I can think of get a lot harder to do right if you try to do a nonlinear transform inside the conversion process. If you disagree with me, feel free to say so, and I'll go over all the different ADC types and how much harder it gets. I might be wrong.
Moreover, any sort of nonlinear relationship can be considered equivalent, in terms of model, to a nonlinear filter placed before the ADC or after the DAC. And in reality, the analog logic behind such a filter is going to be pretty bloody easy, and low noise. So it doesn't make much sense to do that sort of work inside the ADC if it would work just as well outside.
I think from a technical standpoint a linear relationship between an electrical signal and its digital representation is much easier to implement. I don't know the details, but I'm sure it's easier in hardware, and possibly easier to code in software as well.
__________________
My wishes are simple. I demand only the best - Oscar Wilde
I've got a good explanation that no one has given yet, but I don't have time to write it all up now as I have to go out for a bit.
Will write it up when I get back.
__________________
Audio electronics- Where we strive for inefficiency
Logarithmic PCM encoding is already out there. It's called u-law and it's used a lot for... drumroll... 8-bit encoding of phone line quality signals.
I know about u-law and A-law, hence the question about why logarithmic PCM isn't used specifically for music.
Originally Posted by Publius
This is less of a psychoacoustic question than an EE question. ADC/DAC design is much, much easier to do with a linear relationship between input and output. With the exception of a couple ADC types, most types I can think of get a lot harder to do right if you try to do a nonlinear transform inside the conversion process. If you disagree with me, feel free to say so, and I'll go over all the different ADC types and how much harder it gets. I might be wrong.
I can believe this. I'll defer to you actual EE types for this sort of knowledge. I can understand if the practicality of implementation makes linear PCM a better solution. Is my general idea logically/mathematically sound, though?
Originally Posted by Publius
Moreover, any sort of nonlinear relationship can be considered equivalent, in terms of model, to a nonlinear filter placed before the ADC or after the DAC. And in reality, the analog logic behind such a filter is going to be pretty bloody easy, and low noise. So it doesn't make much sense to do that sort of work inside the ADC if it would work just as well outside.
So, in actual implementation, does the phenomenon I described result in more quantization noise for quiet passages in perceptual space than an implementation of logarithmic PCM with an equal number of bits? My concern after going through the math was that it seems that quiet passages of music would not be recorded/reproduced via linear PCM with as much fidelity as they would if they were simply louder.
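One way to test this concern numerically is to quantize a loud and a quiet sine wave with the same number of bits, once linearly and once through a mu-law curve, and compare the signal-to-quantization-noise ratios. The sketch below is only an illustration under stated assumptions: it uses 8 bits so the effect is easy to see, the continuous mu-law formula with mu = 255 (not the piecewise-linear tables used in real telephony codecs), and a simple rounding quantizer. All function names are hypothetical.

```python
import math

MU = 255.0  # mu-law parameter commonly used in telephony

def compress(x):
    """Continuous mu-law curve: maps [-1, 1] onto [-1, 1] logarithmically."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def expand(y):
    """Inverse of compress()."""
    return math.copysign(((1.0 + MU) ** abs(y) - 1.0) / MU, y)

def quantize(x, bits):
    """Uniform rounding quantizer on [-1, 1]."""
    levels = 2 ** (bits - 1) - 1
    return round(x * levels) / levels

def snr_db(amplitude, bits, companded):
    """Signal-to-quantization-noise ratio of a sine wave of given amplitude."""
    signal_power = 0.0
    noise_power = 0.0
    for n in range(20000):
        x = amplitude * math.sin(2 * math.pi * 0.01234567 * n)
        if companded:
            y = expand(quantize(compress(x), bits))
        else:
            y = quantize(x, bits)
        signal_power += x * x
        noise_power += (y - x) ** 2
    if noise_power == 0.0:
        return math.inf
    return 10 * math.log10(signal_power / noise_power)

for amplitude in (1.0, 0.01):   # full scale and 40 dB below it
    print(amplitude,
          round(snr_db(amplitude, 8, companded=False), 1),
          round(snr_db(amplitude, 8, companded=True), 1))
```

With these assumptions the linear coder wins for the full-scale sine and loses badly for the quiet one, while the mu-law coder stays roughly constant. So for an equal number of bits, quiet passages do get a worse signal-to-noise ratio with linear PCM; whether that is audible at 16 bits is a separate question.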
First off, the DAC is only responsible for outputting a line-level signal; amplification comes separately later on (I'm sure you all know that anyway).
Now you have to picture this in your head: you have a standard sine wave that you want to output from a DAC. The digital representation has points along this wave spaced at set intervals, and there are enough points to make a good representation of the sine wave.
Your idea of logarithmic quantization would probably work well if you were only playing single sine waves of varying intensity, but that is not how music works. Music can be described as the sum of many different sine waves played at the same time, and this is where linear quantization is required.
Going back to our original sine wave, we now sum a second sine wave on top of it. Think of this second wave as having a higher frequency and a smaller amplitude and, to keep the explanation simple, assume that the peak of the sum of the two waves just reaches full-scale output. A logarithmic representation would have far more resolution for the high-frequency signal riding on the low-frequency one when the low-frequency wave is crossing zero than when it is at its peaks, while a linear representation has equal resolution whether the sample is taken at a peak or at the zero crossing.
So with music, which is a complex sum of sine waves, you need equal resolution at the extremes as well as at zero.
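The two-sine argument can be made concrete by asking how large one quantization step is, in linear units, at different points along the low-frequency wave. This is a rough sketch assuming an 8-bit coder and the continuous mu-law curve with mu = 255 as the stand-in for "logarithmic"; `log_step` and `linear_step` are names invented here.

```python
import math

MU = 255.0  # assumed mu-law parameter

def compress(x):
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def expand(y):
    return math.copysign(((1.0 + MU) ** abs(y) - 1.0) / MU, y)

def log_step(x, bits=8):
    """Size, in linear units, of the mu-law quantization step containing x
    (x is assumed to be in the range 0 <= x < 1)."""
    levels = 2 ** (bits - 1) - 1
    code = math.floor(compress(x) * levels)
    return expand((code + 1) / levels) - expand(code / levels)

def linear_step(bits=8):
    """Size of one linear PCM step; the same everywhere in the range."""
    return 1.0 / (2 ** (bits - 1) - 1)

print("linear step anywhere:     ", linear_step())
print("log step at zero crossing:", log_step(0.0))
print("log step near the peak:   ", log_step(0.9))
```

Under these assumptions the logarithmic step near the peak comes out a couple of hundred times larger than at the zero crossing, so a small high-frequency component would be coded finely while the low-frequency wave passes through zero and coarsely while it is near its peaks.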
You try to bring loudness perception into it, which is somewhat valid, but we already have 16 bits on CD giving us 96 dB of dynamic range, which is ample for most situations, and if you ever need more you just add more bits: each bit gives about 6 dB. 24/96, with 144 dB, is more than nearly any system can handle. These days most recording studios apply compression that robs you of nearly all the dynamics of the music anyway (see sig).
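The 96 dB and 144 dB figures follow from the ratio between the largest and smallest representable amplitudes. A minimal sketch, assuming the usual 20*log10 amplitude convention (the function name is made up for illustration):

```python
import math

def dynamic_range_db(bits):
    """Ratio of full scale to one quantization step, expressed in dB."""
    return 20 * math.log10(2 ** bits)

for bits in (1, 16, 24):
    print(f"{bits:>2} bits: {dynamic_range_db(bits):6.2f} dB")
```

This gives about 6.02 dB per bit, about 96.3 dB for 16 bits and about 144.5 dB for 24 bits.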
__________________
Audio electronics- Where we strive for inefficiency
Both of the above are correct explanations, but there is a deeper truth.
If you consider the audio signal as a bandwidth-limited channel with a defined signal-to-noise ratio (which any such signal is), we can define the data bandwidth needed to code it. This is a straightforward application of Shannon's channel coding theorem. Doing this for audio pretty much gives us the CD coding as a minimum. However, the ear's logarithmic sensitivity is one of a number of features of the human auditory system that tell us the ear can't use the full information content of the channel; in other words, there is information in the channel that can safely be lost with no audible effect.
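As a back-of-the-envelope check of that claim, the Shannon-Hartley formula C = B * log2(1 + S/N) can be evaluated for one audio channel. The numbers plugged in here are assumptions for illustration: a bandwidth of half the CD sampling rate (22.05 kHz) and a signal-to-noise ratio equal to the nominal 16-bit dynamic range of about 96 dB, with the noise treated as Gaussian.

```python
import math

def capacity_bits_per_second(bandwidth_hz, snr_db):
    """Shannon-Hartley channel capacity for a given bandwidth and SNR."""
    snr_linear = 10 ** (snr_db / 10)          # dB to power ratio
    return bandwidth_hz * math.log2(1 + snr_linear)

bandwidth = 44100 / 2                         # assumed: half the CD sample rate
snr = 20 * math.log10(2 ** 16)                # assumed: ~96.3 dB for 16 bits

print("channel capacity:", capacity_bits_per_second(bandwidth, snr), "bit/s")
print("CD, one channel: ", 44100 * 16, "bit/s")
```

Under these assumptions the capacity comes out at essentially the same figure as the CD data rate per channel, 705,600 bit/s, which is the sense in which the theorem "gives us" CD coding.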
This observation is the core of all compression schemes. However, such compression is not as trivial as, say, a logarithmic encoding. As DaKi][er describes above, you will lose low-level information at frequencies distant from the dominant tone. Applying knowledge of psychoacoustics, and knowledge of how the ear works, we can model the limits of such sensitivity. For instance, the effect known as masking prevents the ear/brain from perceiving tones close in frequency to a loud tone, and we also know that the frequency sensitivity of the ear is logarithmic: there is only one octave of information between 10 kHz and 20 kHz. Plus, the ear is vastly more sensitive in the mid bands than in the high and low frequency bands. All up, this allows us to create clever compression algorithms that can reduce the information bandwidth quite markedly and yet maintain a startling level of fidelity.
So, it is a reasonable question, but the answer is more subtle. You must start any perceptual compression system with the full information, then selectively remove the imperceptible information.
From a practical point of view, Publius has it right. The reason that it's linear PCM can be expressed in two(ish) words: delta-sigma DAC. Delta-sigmas are by their nature linear beasts (I suppose it would be possible to make a logarithmic delta-sigma, but given how poorly the loop dynamics of high-order linear delta-sigmas are actually understood, adding an intentional nonlinearity to the loop would probably be, shall we say, a challenge), and they're also what everyone wants to use for CD players, since the cost per unit quality, if you will, is ridiculously small compared to other converter types. Even in modern telephony ICs, where A-law and mu-law abound, all the data conversion is linear and there are companders to handle the digital conversion.