Head-Fi.org › Forums › Equipment Forums › Sound Science › Decoding AAC at a Higher Bit Depth and Sample Rate

Decoding AAC at a Higher Bit Depth and Sample Rate

post #1 of 19
Thread Starter 

AAC files are composed of a bunch of cosine waves, right? Shouldn't they be able to be decoded at any bit depth and sample rate? As far as I know, if I set my output to 24 bit 96 kHz, iTunes decodes my AAC files at 16 bit 44.1 kHz then scales the bit depth to 24 bits and resamples to 96 kHz. That sounds like taking a font, rendering it at 44 point, then scaling it to 96 point. The font would look better if it were rendered at 96 point directly.

post #2 of 19

Not sure about this, but you might find a small improvement decoding at a higher bit depth, though not necessarily with a higher sampling rate.

post #3 of 19
There is no point in doing this, though, because all modern DACs already oversample internally; it is standard practice precisely because it eases the output filtering and thus produces higher fidelity.
post #4 of 19
Quote:
Originally Posted by vhobhstr View Post

AAC files are composed of a bunch of cosine waves, right? Shouldn't they be able to be decoded at any bit depth and sample rate? As far as I know, if I set my output to 24 bit 96 kHz, iTunes decodes my AAC files at 16 bit 44.1 kHz then scales the bit depth to 24 bits and resamples to 96 kHz. That sounds like taking a font, rendering it at 44 point, then scaling it to 96 point. The font would look better if it were rendered at 96 point directly.

Re-sampling isn't at all like rendering a font.  Digital fonts contain instructions for rendering that describe the curves and shapes required to create the character.  When you start with a font at 12 point, then make it 96 point, you haven't applied scaling at all; you've just told the computer to create the character from the same instruction set, only larger.

 

Resampling is much more like connecting the dots. You only have so many original data points.  If you want to create more, you have to interpolate what's between the existing data points.  You have to, essentially, take a guess and make up the missing data on the fly.  Resampling does not recover missing information and does not really improve anything.  The only advantage, as mentioned, might be that a higher resulting sample rate makes the anti-aliasing filtering easier, but that's pretty much already done in the DAC.
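
If it helps to picture it, here's a rough numpy sketch of that connect-the-dots idea, doubling the rate by straight-line interpolation. It's only an illustration of the principle; a real resampler uses proper band-limited filtering, not this:

```python
import numpy as np

fs_in = 44100
x = np.sin(2 * np.pi * 1000 * np.arange(16) / fs_in)    # a few samples of a 1 kHz tone

# Double the rate by drawing straight lines between the existing points.
t_old = np.arange(len(x))
t_new = np.arange(0, len(x) - 0.5, 0.5)                  # twice as many points on the same span
y = np.interp(t_new, t_old, x)                           # every other value is a made-up midpoint

print(len(x), len(y))                                    # 16 original samples -> 31 "connected dots"
```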

 

Font data is complete enough to render the font at any size because it doesn't describe specific bit patterns, but rather is a set of instructions on how to draw the shape.  Your last statement, that rendering a font directly at 96 point would look better than scaling to 96 point, is simply not true of digital fonts like those in Truetype or Postscript formats.  What you say does apply to bit-mapped fonts, but as the name implies, those are exact bit patterns, so there's no clean scaling possible, and yes, the 96 point version would look far better than a blown-up 44 point version.  Nobody in the publishing world uses bit-mapped fonts, specifically because of this problem.

post #5 of 19
Thread Starter 

jaddie - Isn't the process of encoding an AAC from a lossless source analogous to making a Truetype font from the data available in a bit-mapped font? The reason why I think AAC could be decoded at any bit depth and sample rate is because the curves and shapes required to create characters from a Truetype font are similar to the cosine waves in an AAC. The second to last sentence in my first post in this thread should've been written as "That sounds like taking a Truetype font, rendering it at 44 point, then scaling it to 96 point" for clarity.

post #6 of 19
Quote:
Originally Posted by vhobhstr View Post

jaddie - Isn't the process of encoding an AAC from a lossless source analogous to making a Truetype font from the data available in a bit-mapped font?

 

It is not possible to encode information that was not present in the original lossless file in the first place. If it had a sample rate of 44.1 kHz, then the AAC cannot include any frequency above 22.05 kHz, and decoding to higher than 44.1 kHz does not make much sense either. However, there is actually a small (probably inaudible) gain from decoding to 24 bits: even though it obviously cannot recover information that was not there in the original 16 bit lossless file, it avoids a second quantization to 16 bits that would add its own quantization noise on top of the first. It also allows the volume to be reduced during decoding (to avoid clipping peaks that would otherwise go slightly above 0 dBFS in the output file) without a significant loss of dynamic range.
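
A quick numpy illustration of the quantization noise difference (just a toy calculation on random data, not what any particular decoder does):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 100000)        # stand-in for a decoder's floating point output

def quantize(sig, bits):
    scale = 2 ** (bits - 1) - 1
    return np.round(sig * scale) / scale

for bits in (16, 24):
    err = quantize(x, bits) - x
    print(bits, "bit:", round(20 * np.log10(np.sqrt(np.mean(err ** 2))), 1), "dB error")
# prints roughly -101 dB at 16 bits vs roughly -149 dB at 24 bits
```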


Edited by stv014 - 12/14/12 at 10:43am
post #7 of 19
Quote:
Originally Posted by vhobhstr View Post

jaddie - Isn't the process of encoding an AAC from a lossless source analogous to making a Truetype font from the data available in a bit-mapped font? The reason why I think AAC could be decoded at any bit depth and sample rate is because the curves and shapes required to create characters from a Truetype font are similar to the cosine waves in an AAC. The second to last sentence in my first post in this thread should've been written as "That sounds like taking a Truetype font, rendering it at 44 point, then scaling it to 96 point" for clarity.

No, they are not at all similar.  The idea behind all bit-rate reduction codecs is to eliminate data that represents inaudible information.  When decoding a bit-rate-reduced file, the original is not recovered.  Only what's "necessary" is there, hence the audible artifacts present in low bit rate files.  The AAC file does not contain instructions for reconstruction of the exact original; it contains information to construct a reasonable facsimile only.

If you look under the hood of AAC, the first step is to convert the signal to the frequency domain using a modified discrete cosine transform, as a result of passing through what is essentially a group of filters.  The next step is to eliminate components that are "irrelevant" based on a psychoacoustic model.  The degree of relevancy is subjective and user selectable as a byproduct of the target bit rate, and as a result of adjusting the results of the psychoacoustic model.  But the point is, components are eliminated...permanently.

A Truetype font has all the information in it to be rendered at any size, all the time.  Scaling is a result of rendering instructions, things like "draw a curve of 45 degrees, with a radius and line thickness based on the target size".  It's instructions for creating the original character, as opposed to data that has been reduced but still represents the important parts of the original, as in AAC.  The process of creating a Truetype font may begin with a high resolution bitmap, but the font creation process is more of a definition of the tracing of the shape.  This is actually something I've done: I started with a high res scan, ran it through software that created the outline, translated that into vectors and instructions, then compiled it into a TT font.  The outlining and vectoring process is what differentiates the two approaches.  AAC is nothing like outlining and vectoring; it literally throws away information based on a psychoacoustic model of the masking properties of human hearing.
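
To make the "thrown away for good" point concrete, here's a grossly simplified sketch.  It uses a plain FFT and a fixed threshold instead of AAC's MDCT filter bank and psychoacoustic model, so it's only an analogy for the irreversibility, nothing more:

```python
import numpy as np

fs, n = 44100, 4410                          # 0.1 s; the tones below land exactly on FFT bins
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 1000 * t) + 0.001 * np.sin(2 * np.pi * 7000 * t)  # loud tone + quiet tone

X = np.fft.rfft(x)
X[np.abs(X) < 0.05 * np.abs(X).max()] = 0    # crude stand-in for "drop the irrelevant components"
y = np.fft.irfft(X, n=n)

# The quiet 7 kHz tone is gone; no decode bit depth or sample rate will bring it back.
print(np.max(np.abs(x - y)))                 # ~0.001, i.e. the discarded tone's amplitude
```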

post #8 of 19
Thread Starter 
I'm actually interested in improving the frequencies that can be heard in the original recording. Imagine what a single cosine wave would look like decoded at 16 bit 44.1 kHz, and what the same cosine wave would look like decoded at 24 bit 96 kHz. If you zoom in, the 24 bit 96 kHz wave would be a little less jagged. Also, there are audible frequencies which only have a handful of samples per cycle. For example, 11025 Hz is represented by only four samples per cycle in a 44.1 kHz file. At 96 kHz it would have nearly nine. The higher sample rate would make waves look (and perhaps sound) a little more wavelike. A 16 bit 44.1 kHz AAC decoded at 24 bit 96 kHz would definitely be different from the original lossless file that the AAC was produced from (and even if it were decoded at 16 bit 44.1 kHz, it still wouldn't be exactly the same as the lossless file), but perhaps at 24/96 it would sound smoother.
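
(Quick check of those samples-per-cycle numbers, just dividing the sample rate by the tone frequency:)

```python
for fs in (44100, 96000):
    print(fs, "Hz:", fs / 11025, "samples per cycle")    # 4.0 and about 8.7
```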
 
However, maybe decoding a 16/44.1 AAC at 24/96 wouldn't sound significantly different from decoding the same AAC at 16/44.1 then resampling the decoded stream to 24/96, and that's why the method I'm suggesting isn't used. Maybe it would sound worse. Maybe there are some technical barriers.
 
By the way, I'm pretty sure that going from 44.1 kHz to 96 kHz is called resampling. What is the corresponding term that should be used to describe conversion from 16 bits to 24 bits? 

Edited by vhobhstr - 12/14/12 at 6:18pm
post #9 of 19
Quote:
Originally Posted by vhobhstr View Post

As far as I know, if I set my output to 24 bit 96 kHz, iTunes decodes my AAC files at 16 bit 44.1 kHz then scales the bit depth to 24 bits and resamples to 96 kHz.

 

The iTunes decoder requantizes to 24 bit, not 16 bit, when decoding.

post #10 of 19
Thread Starter 
Quote:
Originally Posted by MoonUnit View Post

 

The iTunes decoder requantizes to 24 bit, not 16 bit, when decoding.

So, if I set my output to 24 bit 96 kHz, the 16/44.1 AAC is decoded to 24 bit 44.1 kHz, then resampled to 24 bit 96 kHz?

post #11 of 19
Quote:
Originally Posted by vhobhstr View Post

So, if I set my output to 24 bit 96 kHz, the 16/44.1 AAC is decoded to 24 bit 44.1 kHz, then resampled to 24 bit 96 kHz?

 

Yes. Also, the resampling is done by the operating system, not by iTunes. If you set your operating system output to 44.1/16 bit, for example, iTunes requantizes to 24 bit and then the operating system requantizes again to 16 bits.

 

The 24 bit decoding step is one reason why Apple is trying to push labels to submit 24 bit originals for encoding.
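
For what it's worth, a generic requantize step looks roughly like this dither-and-round sketch (my own illustration; I'm not claiming this is what iTunes or Core Audio do internally):

```python
import numpy as np

def requantize_24_to_16(samples_24, rng=np.random.default_rng(0)):
    # samples_24: integer array of 24 bit values in the range -8388608..8388607
    x = samples_24.astype(np.float64) / 256.0             # rescale so 1.0 equals one 16 bit step
    tpdf = rng.uniform(-0.5, 0.5, x.shape) + rng.uniform(-0.5, 0.5, x.shape)  # +/- 1 LSB dither
    return np.clip(np.round(x + tpdf), -32768, 32767).astype(np.int16)

# Detail smaller than one 16 bit step comes out as dithered 0s and +/-1s instead of being silenced:
print(requantize_24_to_16(np.arange(-128, 128, 16)))
```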

post #12 of 19
Thread Starter 
Quote:
Originally Posted by MoonUnit View Post

 

Also, the resampling is done by the operating system, not by iTunes.

Is it done this way on both Windows and Mac? I remember reading an article (I think it was on computeraudiophile.com) that said, at least on a Mac, iTunes does the resampling. The article said that iTunes' resampling algorithm was better in some way than the resampling algorithm in CoreAudio (part of the Mac operating system).

 

Edit: here's a link (I'm not sure if it's the same article, but it does state that both iTunes and CoreAudio have sample rate conversion) - http://www.computeraudiophile.com/content/76-itunes-poor-performance-explained/


Edited by vhobhstr - 12/14/12 at 7:01pm
post #13 of 19
Quote:
Originally Posted by vhobhstr View Post

I'm actually interested in improving the frequencies that can be heard in the original recording. Imagine what a single cosine wave would look like decoded at 16 bit 44.1 kHz, and what the same cosine wave would look like decoded at 24 bit 96 kHz. If you zoom in, the 24 bit 96 kHz wave would be a little less jagged. Also, there are audible frequencies which only have a handful of samples per cycle. For example, 11025 Hz is represented by only four samples per cycle in a 44.1 kHz file. At 96 kHz it would have nearly nine. The higher sample rate would make waves look (and perhaps sound) a little more wavelike. A 16 bit 44.1 kHz AAC decoded at 24 bit 96 kHz would definitely be different from the original lossless file that the AAC was produced from (and even if it were decoded at 16 bit 44.1 kHz, it still wouldn't be exactly the same as the lossless file), but perhaps at 24/96 it would sound smoother.
 
However, maybe decoding a 16/44.1 AAC at 24/96 wouldn't sound significantly different from decoding the same AAC at 16/44.1 then resampling the decoded stream to 24/96, and that's why the method I'm suggesting isn't used. Maybe it would sound worse. Maybe there are some technical barriers.
 
By the way, I'm pretty sure that going from 44.1 kHz to 96 kHz is called resampling. What is the corresponding term that should be used to describe conversion from 16 bits to 24 bits? 

There's no "improving the frequencies" to be done by resampling, not to a higher sampling frequency or a greater bit depth.  Yes, if you zoom in it may look smoother, but all that's filtered off anyway.  The output analog wave is smooth in either case.  What you see in a DAW when you zoom in is before the DAC filters anything, and absolutely not representative of what comes tumbling out of your analog audio output. If you put an analog oscilloscope on the output of any DAC, you will not see jagged waveforms, because the output reconstruction filter removes them.

 

Everybody wants to get more out of the original recording by somehow magically resampling and creating more detail.  If it's not there in the original, ain't nothin' going to create the missing information.  All any kind of resampling does is connect the dots by interpolation.  That's it, no more than that.  
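
If you want to see that in numbers, here's a quick scipy experiment (a generic polyphase resampler, not whatever your player or OS actually uses):

```python
import numpy as np
from scipy.signal import resample_poly

fs_in, fs_out = 44100, 96000
t = np.arange(fs_in) / fs_in                              # one second of 44.1 kHz audio
x = sum(np.sin(2 * np.pi * f * t) for f in (1000, 5000, 15000))

y = resample_poly(x, 320, 147)                            # 44.1 kHz -> 96 kHz (96000/44100 = 320/147)
spectrum = np.abs(np.fft.rfft(y))
freqs = np.fft.rfftfreq(len(y), 1 / fs_out)

# Nothing new shows up above the original 22.05 kHz Nyquist limit:
print(spectrum[freqs > 22050].max() / spectrum.max())     # a tiny number, basically filter leakage
```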

 

The only 24/96 recordings that contain more information than 16/44 recordings are ones created by digitizing at that rate from an original analog source, and the best analog source is a live mix. There may be a small advantage to recording originals at 24/96, but the big advantage lies in digital post, where there's a lot of extra room for DSP to work on a 24 bit file.  Even some 15 ips analog tapes have some information above 20 kHz, so sampling at 96 kHz might preserve that.  Too bad it's all inaudible.  No analog tape has dynamic range greater than 16 bits, not even with the best noise reduction, so not much to be gained from 24 bit depth except to allow for more data for a DSP to work on, like a noise reduction or restoration processor.

 

You cannot create more information by converting to a higher sampling rate or higher bit depth.  All either can ever do is connect the dots with interpolated data between the known existing data, and frankly that does nothing to improve anything except to make anti-aliasing filters less difficult.  For example, let's take two samples at 44.1 kHz, any two samples.  Now we want to make it into an 88.2 kHz file.  That means there will be an additional sample added between the original two.  And where would you place the value of that sample?  Right smack between the originals.  Where else could it be?  And even if you did skew it from a straight-line interpolation, what you just did is add a distortion component at the new sampling frequency, which will in fact be completely filtered off by the final anti-aliasing filter.  It's just an academic exercise.

 

I believe the term "resampling" applies to both sample rate conversion and bit depth conversion.  

 

The best you can do is play the original as-is and let your oversampling DAC do its thing.

post #14 of 19
Quote:
Originally Posted by vhobhstr View Post

Is it done this way on both Windows and Mac? I remember reading an article (I think it was on computeraudiophile.com) that said, at least on a Mac, iTunes does the resampling. The article said that iTunes' resampling algorithm was better in some way than the resampling algorithm in CoreAudio (part of the Mac operating system).

 

Edit: here's a link (I'm not sure if it's the same article, but it does state that both iTunes and CoreAudio have sample rate conversion) - http://www.computeraudiophile.com/content/76-itunes-poor-performance-explained/

 

I looked into it... you're right, on OS X at least, iTunes invokes the system resampler separately (or at least prior versions did). The Core Audio resampler has a parameter that controls the quality, and iTunes specifies the highest quality parameter. I wonder if this has changed in recent versions, since Lion dramatically improved the system resampler even at normal settings.

 

Note that I'm definitely not wrong about the 24 bit requantization. Apple published a white paper on that to its iTunes partners.


Edited by MoonUnit - 12/14/12 at 7:52pm
post #15 of 19
Thread Starter 

jaddie - I didn't know that the DAC did filtering. Thanks for explaining that. However, in your example of making a 44.1 kHz file into an 88.2 kHz file, if the interpolation is skewed from a straight line interpolation, wouldn't the distortion component be at half the new sampling frequency, since only every other sample is skewed?

Also, if a 44.1 kHz AAC were decoded at 88.2 kHz, not every interpolated sample would be skewed the same amount, or in the same direction. For example, in one cycle of an 11.025 kHz cosine wave, the first sample would be an original*, the second would be slightly higher than a straight line interpolation, the third: an original, the fourth: slightly lower than a SLI, the fifth: an original, the sixth: slightly lower than a SLI, the seventh: an original, the eighth: slightly higher than a SLI. The interpolated samples would fit right on the curve of the cosine wave, and so no distortion component would be added. With a straight line interpolation, however, the interpolated samples would not fit on the curve of the cosine wave. In fact they would add a low amplitude, odd shaped distortion component. This distortion component would be repeated at a rate of 11.025 kHz.

 
*I'm using the term "original" assuming that there aren't any imperfections in the encoding and decoding process.
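
Here's a quick numpy check of the 11.025 kHz example above, assuming ideal samples that sit exactly on the cosine (per my footnote):

```python
import numpy as np

f, fs_new = 11025.0, 88200.0
n = np.arange(8)                                  # exactly one cycle: 88200 / 11025 = 8 samples
on_curve = np.cos(2 * np.pi * f * n / fs_new)     # samples that sit exactly on the cosine
originals = on_curve[::2]                         # the four 44.1 kHz samples: 1, 0, -1, 0

# Straight-line midpoints between consecutive originals (the wave repeats every eight samples,
# so the last midpoint wraps around to the start of the next cycle).
midpoints = (originals + np.roll(originals, -1)) / 2

print(np.round(on_curve[1::2], 3))    # [ 0.707 -0.707 -0.707  0.707]  <- on the curve
print(np.round(midpoints, 3))         # [ 0.5   -0.5   -0.5    0.5  ]  <- straight-line guesses
```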