All legitimate questions, so I'll try to reply as best I can.
First of all, very short (< 200 milliseconds) sounds of up to 170 dB, such as the discharge of a firearm, do not cause immediate permanent hearing loss by rupturing or breaking middle-ear structures, which shows that the middle-ear machinery is pretty robust: http://www.keepandbeararms.com/information/XcIBViewItem.asp?id=2052.
What does cause damage is more prolonged sound of over 120 dB (corresponding, evolutionarily, to perhaps the loudest sound in nature humans heard before the modern age - the sound of very close thunder). And of course prolonged exposure to sounds over 85 dB can cause permanent damage as well. That hearing damage occurs through exhaustion of the hair cells and nerve cells in the cochlea: http://brneurosci.org/noise.html.
As I explained in my previous post, very high amplitudes may appear momentarily (for tens of microseconds to tens of milliseconds) as a result of the summation of "safe level" sine waves. Given that there are 24 or fewer main frequencies in a practical music recording at any given moment, and assuming that the component waves are all at 96 dB, the combined amplitude can momentarily reach about 123 dB, which can still be safely handled by the middle-ear machinery.
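To make the arithmetic concrete, here is a quick sketch (my own illustration, not from any reference) of the worst case, where all component waves happen to peak in phase at the same instant:

```python
import math

# Worst case: N equal-amplitude sine waves momentarily aligned in phase.
# The combined peak is N times a single wave's peak, i.e. +20*log10(N) dB.
N = 24
peak_gain_db = 20 * math.log10(N)      # ~27.6 dB

# With each component at 96 dB, the momentary combined peak is:
combined_db = 96 + peak_gain_db        # ~123.6 dB
print(round(peak_gain_db, 1), round(combined_db, 1))  # → 27.6 123.6
```

In practice the components rarely align in phase perfectly, so this is an upper bound rather than a typical value.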
The inner ear is essentially a Fourier transformer, and the sound energy applied to a specific hair cell is just a fraction of the overall energy supplied through the middle ear machinery. In effect, the hair cell only "hears" the frequency band it is anatomically tuned to hear. This also explains why pure sine waves, even of moderate amplitude, are so unpleasant - the whole energy of the signal is concentrated on a narrow subset of hair cells and can overwhelm them.
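This concentration effect is easy to see numerically: a pure tone puts essentially all of its spectral magnitude into one FFT bin, while a chord spreads it across several. A rough sketch (the frequencies chosen here are arbitrary illustrations):

```python
import numpy as np

fs = 44100
t = np.arange(fs) / fs  # exactly one second of samples

# A pure 1 kHz sine: all energy lands in a single frequency bin,
# analogous to driving one narrow group of hair cells.
pure = np.abs(np.fft.rfft(np.sin(2 * np.pi * 1000 * t)))
pure_share = pure.max() / pure.sum()    # ~1.0: one bin holds it all

# A four-note chord: the same signal power is spread over four bins,
# so no single group of "hair cells" bears the whole load.
chord = sum(np.sin(2 * np.pi * f * t) for f in (262, 330, 392, 523))
spec = np.abs(np.fft.rfft(chord))
chord_share = spec.max() / spec.sum()   # ~0.25: split four ways
```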
Now, let's see what happens when the amplitude dynamic range is limited to 96 dB (that is, 16 bits). The sound engineer has two extreme choices and some compromise room in between:
(1) Ensure that all summed amplitudes are recorded without clipping, which immediately lowers the maximum amplitude of the component waves, making the music sound too quiet at normal loudness settings and also raising the noise floor relative to the maximum component amplitude;
(2) Compress the dynamic range by clipping the summed amplitudes, which of course distorts the original signal - when transformed into the frequency domain by the inner ear, all kinds of missing frequencies, invalid amplitudes, and phantom frequencies appear. That's where the characteristic "metallic", scratchy sound of over-boosted CD mixes comes from.
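The phantom frequencies in option (2) can be demonstrated directly: hard-clip a clean sine wave and look at its spectrum. A minimal sketch (the clip level and the -60 dB threshold are my own arbitrary choices):

```python
import numpy as np

fs = 44100
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 1000 * t)
clipped = np.clip(clean, -0.5, 0.5)  # peaks forced into a smaller range

def significant_bins(x, floor_db=-60):
    """Count frequency bins within floor_db of the loudest bin."""
    spec = np.abs(np.fft.rfft(x))
    return int(np.count_nonzero(spec > spec.max() * 10 ** (floor_db / 20)))

# The clean sine occupies one bin; the clipped version sprouts odd
# harmonics (3 kHz, 5 kHz, ...) that were never in the original signal.
print(significant_bins(clean), significant_bins(clipped))
```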
Combined with the inability to capture high-frequency components correctly at 44.1 kHz sampling, due to non-compliance with the Nyquist theorem's infinite-signal-length requirement, 16/44 does a pretty poor job of capturing live sound. We need to remember that the CD was developed in the late 1970s, and its designers were severely limited by the capabilities of the optics and electronics of that era. Those were the days of the Apple II, based on an 8-bit 1 MHz CPU, which retailed for $1290.
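The finite-signal-length effect shows up in practice as spectral leakage: a tone whose period does not divide the analysis window evenly smears across neighbouring frequency bins. A small sketch (the window size and test frequencies are arbitrary choices of mine):

```python
import numpy as np

fs = 44100
n = 4096
t = np.arange(n) / fs

def leakage(freq):
    """Fraction of spectral magnitude landing outside the peak bin."""
    spec = np.abs(np.fft.rfft(np.sin(2 * np.pi * freq * t)))
    return 1 - spec.max() / spec.sum()

bin_width = fs / n                  # ~10.77 Hz per bin
aligned = bin_width * 100           # exactly 100 cycles per window
offset = aligned + bin_width / 2    # half a bin off: worst-case smearing
print(leakage(aligned), leakage(offset))
```

The aligned tone stays in one bin (leakage near zero), while the half-bin-offset tone loses a substantial fraction of its magnitude to neighbouring bins.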
I once read an extremely technical article, which unfortunately I couldn't readily find online, proving that the threshold of realistic sound recording is somewhere in the vicinity of 20/70, which of course 24/96 exceeds. With 24 bits, there is no need to worry about the summed amplitudes clipping, and thus the original frequencies are restored accurately by the inner ear, aided by the higher sample rate that compensates for the non-infinite nature of the signal.
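For reference, the theoretical dynamic range of linear PCM quantization works out to roughly 6.02 dB per bit; the figures below are the standard textbook values:

```python
import math

# Dynamic range of linear PCM: 20*log10(2^bits) ≈ 6.02 dB per bit.
# (The often-quoted extra +1.76 dB applies to SNR for a full-scale sine.)
for bits in (16, 20, 24):
    print(bits, round(20 * math.log10(2 ** bits), 1))
# → 16 96.3
#   20 120.4
#   24 144.5
```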
Subjectively, SACD provides a "natural" dynamic range and clear separation of realistic-sounding instruments, with inaudible noise. This is especially noticeable on symphonic music. Of course, sharp hearing and a high-quality sound system are required to notice the difference.
Quote:
Since you seem knowledgeable in this subject, I'd like to ask the following questions:
1) As far as I know, there is practically no music reproduction that requires a >96 dB range. Even if your room is quiet, you'd still have ~30 dB of background noise, making a reproduced signal of 126 dB unlistenable since it would be too loud. So why would someone need a >16-bit recording for listening to music (I purposely exclude mastering, mixing, and producing music)?
2) No DAC or ADC has ever shown more than a -130 dB noise floor, or better than -120 dB THD+N, due to thermal noise, RFI, whatever - so music beyond 20-21 bits is impossible to reproduce.
3) In practice, how often does music go beyond a 96 dB range? I.e., if your loudest sound is 130 dB, I would assume the softest significant sound is above 34 dB.
While 32 or even 64 bits might be useful for internal calculations, shouldn't 16 bits be sufficient for 99.9% of musical material, and 24 bits sufficient for the remaining 0.1%? Not to mention that the subjective dynamic range increases with proper noise shaping?