There is a lot of talk of Hi-Rez audio these days; instead of one format, there are now several Hi-Rez formats becoming available. I'll try to explain here what these formats are, what they do and the reality behind their existence. Although there is a great deal of technical complexity in digital audio, I'll try to keep this as simple as I can (loss of accuracy is inevitable when simplifying but I'll do my best to minimise it).
Sound waves travelling through the air to our ears have only two attributes: the frequency of the waves (i.e. how many waves per second, measured in Hertz, Hz) and the height or energy of the waves (amplitude, measured in Decibels, dB). The different frequencies contained in sound waves are related to the pitch and tonal characteristics of what we hear, and the amplitude is related to what we hear as volume. A simple way of understanding digital audio is to assume that frequency is encoded by the sample rate (how many times a second we measure the sound waves) and the amplitude is encoded by the bit depth (the range of values available for each measurement).
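To make those two parameters concrete, here's a minimal Python sketch (my own illustration, not from any particular audio library) that samples a sine wave at a given rate and quantizes it to a given bit depth — the two knobs that define a digital audio format:

```python
# Illustrative sketch: sample a sine wave and quantize it, showing how
# sample rate and bit depth map to frequency and amplitude resolution.
import numpy as np

def sample_and_quantize(freq_hz, sample_rate, bit_depth, duration=0.001):
    """Sample a unit-amplitude sine wave and round each measurement
    to one of the 2**bit_depth available amplitude steps."""
    t = np.arange(0, duration, 1.0 / sample_rate)   # the sample instants
    wave = np.sin(2 * np.pi * freq_hz * t)          # the wave, measured at those instants
    steps = 2 ** (bit_depth - 1) - 1                # amplitude steps per polarity
    quantized = np.round(wave * steps) / steps      # snap each sample to the nearest step
    return quantized

# A 1 kHz tone with CD parameters (44.1kS/s, 16bit):
cd = sample_and_quantize(1000, 44100, 16)
```

With 16 bits the rounding error per sample is tiny (on the order of 1/32767 of full scale), which is why bit depth relates to noise floor and dynamic range rather than to any audible "detail".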
Bit Depth
I will just summarize here, as I covered this in a thread in more detail a couple of years ago (see here). Bit depth is responsible for encoding the volume, or dynamic range, of the sound waves. Each bit allows for approximately 6dB of dynamic range: 16bit = 96dB and 24bit = 144dB. There is no quality (or any other) difference between 16bit and 24bit except for the additional 48dB of dynamic range. This extra dynamic range is useful for professional recording (to provide headroom) but as headroom is not required on playback, 24bit is of no benefit to the consumer. Due to some clever technology (noise-shaped dither) we can enhance 16bit so it appears to the ear to be equivalent to 20bit (120dB dynamic range). Consider that the most dynamic recordings ever released have a dynamic range of no more than about 60dB and you can see that CD (16bit) already allows roughly 1000 times more dynamic range than is ever used. There are no disadvantages of 24bit over 16bit, except the additional storage space required.
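The "6dB per bit" figure comes straight from the maths: each extra bit doubles the number of amplitude steps, and doubling amplitude is about 6.02dB. A quick check (illustrative only):

```python
import math

def dynamic_range_db(bits):
    """Theoretical dynamic range of an n-bit system: 20*log10(2^n) ≈ 6.02n dB."""
    return 20 * math.log10(2 ** bits)

print(round(dynamic_range_db(16), 1))  # → 96.3 (the familiar ~96dB for CD)
print(round(dynamic_range_db(24), 1))  # → 144.5

# A 60dB margin (120dB noise-shaped 16bit vs a 60dB recording) is an
# amplitude factor of 10^(60/20) = 1000 -- the "1000 times" above.
print(10 ** (60 / 20))  # → 1000.0
```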
Sampling Rate
The sampling rate is measured in samples per second (S/s: kS/s or MS/s). The science upon which digital audio exists is called the Nyquist-Shannon Sampling Theorem. The sound wave frequencies which we can encode in digital audio are defined by the Nyquist Point, which works out at half the sampling frequency. With CD the sampling frequency is 44.1kS/s, so the audio frequency we can encode is limited to 22.05kHz. Anything above 22.05kHz (mostly just noise) has to be removed using a filter.
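The need for that filter is easy to demonstrate: any frequency above the Nyquist Point produces exactly the same samples as some frequency below it (an "alias"), so if it isn't filtered out it turns into audible distortion. A small numpy sketch (illustrative values, my own example):

```python
import numpy as np

fs = 44100               # CD sampling rate
n = np.arange(64)        # 64 sample indices

# A 30kHz tone is above the 22.05kHz Nyquist Point. When sampled at
# 44.1kS/s it is indistinguishable from a 14.1kHz tone (44100 - 30000):
tone_30k = np.cos(2 * np.pi * 30000 * n / fs)
alias_14k = np.cos(2 * np.pi * 14100 * n / fs)

print(np.allclose(tone_30k, alias_14k))  # → True
```

The two sets of samples are identical, which is why everything above the Nyquist Point must be removed before (or during) conversion — there is no way to tell the real tone and its alias apart afterwards.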
Human hearing in adults does not extend beyond 20kHz, so CD (44.1kS/s), with its audio limit of 22.05kHz, appears to cover any eventuality. However, applying the required filter (to remove anything beyond the Nyquist Point) is likely to cause some issues (phase or ringing issues) lower than the Nyquist Point (e.g. at around 20kHz). Increasing the sample rate to 96kS/s means the Nyquist Point is at 48kHz, which in theory allows plenty of space outside the range of human hearing to cure the phase or ringing issues. So in theory a sample rate of 96kS/s should provide a more linear (accurate) recording within the hearing range than 44.1kS/s. Indeed, measurements bear this out. However, I say in theory because a number of scientific studies have shown that, using correctly constructed tests, no one has been able to perceive a difference between 44.1kS/s and 96kS/s.
What about sample rates of 176.4kS/s, 192kS/s and even 384kS/s? Here we start running into problems. Electronic engineering is based on compromise. For example, using a filter creates complications with phase and ringing, however not using a filter breaks the Nyquist-Shannon Sampling Theorem and causes even greater problems (distortion due to alias images). So using a filter is a necessary compromise, being the lesser of two evils. These higher sample rates are also a compromise. 192kS/s provides a potential benefit of recording frequencies up to 96kHz. The downside is that the calculations which need to be carried out when implementing the required filter are now 4 times more complex and have to be carried out 4 times more quickly with 192kS/s than with 44.1kS/s. Unfortunately the laws of physics make this impossible*. To get around this problem, the chip designers have had to simplify the filter implementation at these higher sample rates, resulting in much less efficient filters. Distortion, ringing and phase issues are all measurably poorer at 176.4kS/s and 192kS/s than at 96kS/s. So, the trade-off here is: the benefit of being able to record frequencies between 48kHz and 96kHz versus more distortion and other unwanted artefacts. Also, consider these points:
1. Musical instruments produce virtually no energy beyond 48kHz, so there is nothing much there to record except noise.
2. Few standard studio microphones can record much above 20kHz and none record above 48kHz anyway.
3. 48kHz is already more than twice the highest frequency a human can hear.
4. Almost no commonly available speakers or cans reproduce anything higher than about 40kHz and most can't produce above 20kHz.
With this in mind, there are no advantages to theoretically being able to record frequencies between 48kHz and 96kHz; all that is left are the disadvantages of 176.4kS/s and higher.
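To put rough numbers on the filter problem: an FIR anti-alias filter needs more taps as the sample rate rises (to keep the same transition width in Hz), and it must also run more often per second. The toy cost model below is my own back-of-envelope illustration, not a description of any real converter chip:

```python
def fir_mults_per_second(sample_rate, transition_hz=2000):
    """Crude FIR cost estimate (illustrative only): tap count grows
    with sample_rate / transition_width, and the filter runs once
    per sample, so total work grows roughly with the rate squared."""
    taps = int(4 * sample_rate / transition_hz)  # rough tap-count estimate
    return taps * sample_rate                    # multiply-accumulates per second

base = fir_mults_per_second(44100)
hi = fir_mults_per_second(192000)
print(round(hi / base, 1))  # → 19.0, roughly (192000/44100)**2
```

So a 192kS/s filter of equivalent quality needs on the order of 19 times the computation of a 44.1kS/s one — which is why, in practice, chip designers cut corners on the filter instead.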
For these reasons, some of the very highest-end professional converters do not even provide sampling rates higher than 96kS/s. Any self-respecting, well-educated audio professional would not use sample rates of 176.4kS/s or higher unless forced to by clients.
Conclusion
There are no benefits of 24bit (or 32bit) over 16bit. There is a theoretical benefit of 88.2kS/s and 96kS/s over 44.1kS/s. The so-called Hi-Rez formats of 176.4kS/s and higher are actually of poorer quality and should be avoided. In other words, 16/44 already provides more “hi-rez” than the human ear can detect, but if you want to play it absolutely safe, 24/88.2 or 24/96 provides the highest resolution available.
Marketing
The difficulty facing the audio industry is that 16/44 is an old and well established technology. It's difficult and not very profitable to keep selling the same thing for years. On the other hand, it's easy to convince consumers that bigger numbers are better, so hi-rez provides an ideal opportunity to sell the same customers new equipment and new music collections. Everyone wins, the companies stay in business and the consumers think they are getting something better. The real shame is that instead of spending their development money improving the quality of their products at 16/44, they are spending their money aiming for bigger and bigger meaningless numbers to make their marketing departments happy, while actually reducing audio fidelity.
Observations
So, all those people who believe 24/192 is better than 16/44 are just fooling themselves? In a nutshell, yes, but it's not quite that simple! I know of examples where a 16/44 version has been deliberately doctored to sound worse than a 24/96 or 24/192 version. So in this case, they are not fooling themselves but are deliberately being fooled. Also, concerning converter design: as mentioned before, electronic engineering is effectively a trade-off. This trade-off can be the lesser of two technical problems or it can be cost versus quality. A converter or chip manufacturer may decide to spend more time and money on handling one sample rate better than another. So it's entirely possible that 24/96 may sound better than 16/44 with a particular converter due to its filter or other design considerations. There's no real way around the problems with 176.4kS/s and higher though, so the only explanation for hearing improvements at these sample rates is extraordinarily bad 44.1k and 96k filters in your DAC, or the placebo effect.
*Modern technology has reached the speed limits imposed by the laws of physics (the speed at which a capacitor can be charged, for example). Most CPU advances these days are largely centred around data handling and the efficiency of breaking down complex tasks into simpler ones, and providing more cores so more of these simpler tasks can be computed at the same time. More cores are of limited benefit to digital signal processing because often the tasks cannot be broken down any further. One task often starts with the results from a previous task, so these two tasks cannot be calculated at the same time.
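This kind of unbreakable task chain is exactly what recursive (IIR) filters, common throughout audio DSP, look like. A minimal sketch (a generic one-pole lowpass, my own example, not any specific chip's filter):

```python
def one_pole_lowpass(x, a=0.9):
    """One-pole IIR filter: y[n] = a*y[n-1] + (1-a)*x[n].
    Each output sample depends on the previous output sample,
    so the loop is inherently serial -- extra cores can't help."""
    y, prev = [], 0.0
    for sample in x:
        prev = a * prev + (1 - a) * sample  # needs the result of the last step
        y.append(prev)
    return y

out = one_pole_lowpass([1.0, 1.0, 1.0, 1.0])
```

Because y[n] needs y[n-1] before it can be computed, the samples must be processed strictly in order; the only way to go faster is a faster core, and that is exactly where the physical limits bite.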
Some further reading and supporting evidence:
Lavry White Paper
Benchmark Statement
ProSoundWeb (Big guns discussion)
G