While I was looking into topics related to ambience, I borrowed a few books from our university library and was surprised to learn how much room acoustics shapes our basic perception of sound and music. I also realized that ambience reproduction is one of the most challenging aspects of music playback. It is also shocking that the most basic concepts in textbooks on room acoustics and spatial perception are rarely mentioned in the audiophile press and on discussion boards. So I think it is worth sharing a few important things I have learned from these books.
The most basic thing to know about room acoustics is probably this: in an ordinary room, if the source is more than a few feet away, more of the acoustic energy reaching our ears comes from reflected and diffracted sound than from the direct sound. Here is a simple figure from a book that illustrates this phenomenon:

![reflected%20sound.gif](http://www.its.caltech.edu/~tai/reflected%20sound.gif)

Figure A is a hypothetical source playing several sustained notes of fixed amplitude. Figure B shows the amplitude detected by a recording microphone in a room with moderate reverberation. Figure C is the amplitude measured in a more reverberant room. (Olson, Music, Physics and Engineering, 2nd Ed., Dover, New York, 1967)
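To get a feel for how quickly the reflected field takes over, the textbook "critical distance" estimate is handy: it is the distance at which direct and reverberant energy are equal. Below is a minimal Python sketch of that estimate; the room volume, reverberation time, and source directivity are made-up numbers for a smallish living room, not measurements, and the formulas are the standard statistical-acoustics approximations rather than anything taken from Olson's figure.

```python
import math

def critical_distance(volume_m3, rt60_s, directivity=1.0):
    """Distance at which direct and reverberant energy are roughly equal
    (statistical-acoustics approximation: d_c ~ 0.057 * sqrt(Q*V/RT60))."""
    return 0.057 * math.sqrt(directivity * volume_m3 / rt60_s)

def direct_vs_reverb_db(distance_m, volume_m3, rt60_s, directivity=1.0):
    """Level of the direct sound relative to the (roughly constant)
    reverberant field at a given distance from the source."""
    # Sabine: total absorption A = 0.161 * V / RT60 (metric sabins)
    absorption = 0.161 * volume_m3 / rt60_s
    direct = directivity / (4.0 * math.pi * distance_m ** 2)  # ~1/r^2 falloff
    reverberant = 4.0 / absorption                            # diffuse-field (room-constant) approximation
    return 10.0 * math.log10(direct / reverberant)

# Hypothetical living room: ~60 m^3, RT60 of 0.5 s, omnidirectional source.
room = dict(volume_m3=60.0, rt60_s=0.5)
print(f"critical distance ≈ {critical_distance(**room):.2f} m")
for d in (0.3, 0.6, 1.0, 2.0, 4.0):
    print(f"at {d:3.1f} m the direct sound is {direct_vs_reverb_db(d, **room):+5.1f} dB "
          "relative to the reverberant field")
```

With these example numbers the crossover falls around 0.6 m (roughly two feet), and at a typical listening distance of a couple of meters the direct sound is already about 10 dB below the reverberant field, which is consistent with the behavior shown in the figure above.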
It is obvious that a simple on-off sound actually shows a pattern of attack, sustain and decay in a real room. At the loudest moment, the energy reaching the ear has more contributions from reflected or diffracted sound than from direct sound. We also know there is more to a sound than just amplitude--there is the time domain as well. Different frequencies interact differently with the room, so a complex sound made up of multiple frequencies looks very different in its waveform after just a few reflections, and soon becomes unrecognizable. On top of that, the position of the ear (or microphone) relative to the source in the room has a huge effect on what is detected. Even moving the microphone by a few inches can have drastic effects on frequency response and waveform shape. When a sound is made in a room and picked up by a microphone a few feet away, the only predictable signal is the direct sound that travels at roughly 1,100 feet per second to the mike, and the initial waveform being recorded closely resembles the original sound. Right after that, reflected sound starts to dominate and the waveform becomes very unpredictable. It would seem that inside any room other than an anechoic chamber, the ear should hear nothing but chaos. Yet we don't hear chaos in a real room. We can clearly identify people's voices and understand their speech, and we can tell different instruments apart, in basically any room. In fact, we rarely notice the acoustic properties of a room directly, unless there are obvious echoes (in a cave, for instance). How does the ear-brain system cope with the acoustic complexity created by room reflections?
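The extreme position sensitivity is easy to reproduce with a toy model: a single reflection adds a delayed copy of the signal, and the interference between the two (comb filtering) shifts with every change in path-length difference. The sketch below compares two hypothetical microphone positions a few inches apart; the distances and reflection gain are invented purely for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s (~1,100 ft/s, as quoted above)

def comb_response_db(direct_m, reflected_m, freqs_hz, reflection_gain=0.7):
    """Magnitude response (dB) of the direct sound plus one delayed reflection."""
    extra_delay = (reflected_m - direct_m) / SPEED_OF_SOUND
    direct = np.ones_like(freqs_hz, dtype=complex)           # reference path
    reflection = reflection_gain * np.exp(-2j * np.pi * freqs_hz * extra_delay)
    return 20.0 * np.log10(np.abs(direct + reflection))

freqs = np.array([125.0, 250.0, 500.0, 1000.0, 2000.0, 4000.0])

# Hypothetical geometry: mic A sits 2.00 m from the source with a 2.60 m
# reflected path; mic B is about four inches further back, which also changes
# the reflected path, so the path-length difference is no longer the same.
resp_a = comb_response_db(direct_m=2.00, reflected_m=2.60, freqs_hz=freqs)
resp_b = comb_response_db(direct_m=2.10, reflected_m=2.55, freqs_hz=freqs)

for f, a, b in zip(freqs, resp_a, resp_b):
    print(f"{f:6.0f} Hz: mic A {a:+6.1f} dB, mic B {b:+6.1f} dB")
```

A few inches of movement shifts the interference peaks and nulls across the audio band; with dozens of reflections instead of one, the measured response becomes effectively unpredictable, as described above.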
As discussed earlier, the direct sound reaches the ear before the reflected sound, and the waveform of the direct sound is basically unadulterated. Early reflections (a few bounces) have waveforms that are distorted but still correlated with the direct sound, and the brain automatically merges the direct sound and correlated early reflections into a single sound if the time difference is less than ~60 ms. This is called the precedence effect, or Haas effect. In fact, even if a reflection arriving within 60 ms is a few dB louder than the direct sound, the listener still hears a single sound, localized at the direct sound. If a reflection reaches the ear after 60 ms, it is perceived as an echo. When one shouts at a big, flat wall in open space, one hears an echo only if one is far enough from the wall; we all know this from experience. The precedence effect implies the wall has to be more than ~30 feet away for an echo to be heard, since the reflection's round trip to the wall and back must take longer than 60 ms. In a living room there are still reflected sounds reaching the ear after 60 ms, but these sounds have been reflected and diffracted so many times that they have become uncorrelated with the original sound. Their waveforms are so distorted that the brain no longer recognizes them as part of the original sound; instead they are perceived as diffuse reverberation. In a cave, however, the rock has hard, flat surfaces that reflect very efficiently with little distortion, so the ear keeps receiving correlated waveforms after 60 ms and interprets them as echoes.
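The ~30 foot figure is just the 60 ms window converted to distance; a one-line check, using the same round numbers as above:

```python
# Minimum wall distance for a reflection to be heard as a discrete echo,
# using the round numbers from the text above.
SPEED_OF_SOUND_FT_S = 1100.0   # speed of sound quoted earlier, in feet per second
ECHO_THRESHOLD_S = 0.060       # ~60 ms precedence (Haas) window

# The reflection travels to the wall and back, so the wall distance is half
# of the path covered in 60 ms.
min_wall_distance_ft = SPEED_OF_SOUND_FT_S * ECHO_THRESHOLD_S / 2.0
print(f"echo becomes audible beyond roughly {min_wall_distance_ft:.0f} feet")  # ~33 ft
```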
The term ambience really covers two kinds of reflected sound: early reflections (correlated) and reverberation (uncorrelated). Reverberation is late reflection that has become uncorrelated with the original sound. We all know reverberation matters for music, since good orchestral halls have reverberation times of around 2.2 seconds (reverberation time, RT60, is defined as the time it takes the sound field to decay by 60 dB, i.e. to one-millionth of its original intensity, after the source stops). Early reflections turn out to be even more important for sound quality than reverberation. When the ear first hears the direct sound, the brain determines its location by two main mechanisms: interaural delay and the head-related transfer function. Explanations of these two phenomena are easy to find with Google, so I will skip them. However, the brain also relies on early reflections to confirm the source's location and its spatial relation to room boundaries and nearby objects. This readily explains why some close-miked studio recordings lack spatiality. When a monaural source recorded without ambience is placed to the left or right by simple pan-potting, there are no early reflections to further convince the brain that it is really there; the brain expects early reflections, finds none, and ends up spatially confused. It is important to realize that although early reflections within 60 ms cannot be heard separately, they change the perceived quality of the direct sound: when they are present, the direct sound is perceived as louder, clearer, warmer and more three-dimensional (all qualities audiophiles desire). Bob Katz has an excellent chapter on ambience and audio quality in his book Mastering Audio, and I learned a lot from it.
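One way to make the split between early reflections and reverberation concrete is to build a toy impulse response with the three parts laid out separately: a direct spike, a handful of discrete early reflections inside the 60 ms window, and a noise-like (uncorrelated) tail whose envelope falls 60 dB over the reverberation time. This is a deliberately crude sketch, not a model of any real hall; the delays, gains and sample rate are invented, and only the 2.2 s figure and the 60 ms window come from the discussion above (SciPy is used for the convolution).

```python
import numpy as np
from scipy.signal import fftconvolve

SAMPLE_RATE = 48_000
RT60_S = 2.2            # reverberation time quoted above for good halls
EARLY_WINDOW_S = 0.060  # precedence window

rng = np.random.default_rng(0)
n = int(SAMPLE_RATE * RT60_S)
ir = np.zeros(n)

# 1) Direct sound: a single unit spike at t = 0.
ir[0] = 1.0

# 2) Early reflections: a few discrete, attenuated copies inside 60 ms
#    (delays and gains are invented for illustration).
for delay_s, gain in [(0.012, 0.60), (0.021, 0.45), (0.034, 0.35), (0.052, 0.30)]:
    ir[int(delay_s * SAMPLE_RATE)] += gain

# 3) Late reverberation: noise (uncorrelated with the source) under an
#    exponential envelope that has fallen by 60 dB at t = RT60.
t = np.arange(n) / SAMPLE_RATE
envelope = 10.0 ** (-3.0 * t / RT60_S)
late = 0.25 * rng.standard_normal(n) * envelope
late[: int(EARLY_WINDOW_S * SAMPLE_RATE)] = 0.0   # the tail starts after 60 ms
ir += late

# "Place" a dry, close-miked signal in this toy room by convolution.
dry = rng.standard_normal(SAMPLE_RATE // 2)       # stand-in for a dry recording
wet = fftconvolve(dry, ir)
print(f"dry: {len(dry)} samples, wet: {len(wet)} samples ({RT60_S} s tail added)")
```

Convolving a dry signal with the full response adds the ambience back in; convolving with only the first sample (the direct spike alone) is the "pan-potted mono" case described above, with no early reflections for the brain to latch onto.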
I hope the preceding discussion is enough to convince everyone that it is important to capture the natural ambience in a recording (of acoustic music, at least). Ambience conveys clarity, warmth, dimensionality and a sense of realism. Looking back at the figure shown earlier, we can see that if the ambience is missing, the attack, sustain and decay of instrumental sounds change significantly; it is not possible to faithfully reproduce the timbre of an instrument if too much of the ambience is missing. In the next post I will try to explain why stereophonic recordings cannot capture sufficient ambience to simulate an actual concert experience. There are some inherent physical limitations associated with stereophonic playback, and I will also try to discuss why headphones can make things better in some cases but worse in most. Stay tuned…