The Most Important Spec Sheet: The Human Ear
Jan 12, 2013 at 2:51 PM Thread Starter Post #1 of 95

bigshot

I think it would be useful to assemble a spec sheet for human hearing, outlining the thresholds of human perception. That way people can put published specs in context. Feel free to suggest specs and link to citations, and I'll assemble them into this top post. I'll start it out...

SPECIFICATIONS OF THE HUMAN EAR
Thresholds of Perception

FREQUENCY RESPONSE

20 Hz to 20 kHz (optimal hearing)
20 Hz to 15 kHz (over 50)
http://en.wikipedia.org/wiki/Hearing_range

DYNAMIC RANGE

Peak volume 130 dB (threshold of pain)
http://hyperphysics.phy-astr.gsu.edu/hbase/sound/earsens.html

Noise floor 30 dB (quiet listening room)
http://www.gcaudio.com/resources/howtos/loudness.html

GROUP DELAY (PHASE SHIFT)

Threshold of Audibility 1 to 3 ms (500Hz to 8kHz)
http://sound.westhost.com/ptd.htm

DISTORTION

Just Detectable Threshold: 1% (Non Linear Distortion)
http://www.audioholics.com/education/acoustics-principles/human-hearing-distortion-audibility-part-3
http://www.alpsadriaacoustics.org/archives/Full Papers/Furdek_Harmonic Distortion Perception Threshold.pdf

JITTER

Just Detectable Threshold in Music 20ns, perhaps higher than 200ns (ref. the work of Dr. Ashihara)
http://www.aes.org/e-lib/browse.cfm?elib=8354 (needs subscription)
http://www.nanophon.com/audio/1394_sampling_jitter.pdf (cited in section 2.2)
 
Jan 12, 2013 at 4:28 PM Post #2 of 95
This isn't quite a limitation because evolution has corrected for it, but... The neurons responsible for sending sound information to the brain do so by creating action potentials, the voltage spikes observed within a neuron when it fires. Since this is a physical process, there is a limit to how many times per second a single neuron can fire, and that limit is about 300-500 times per second, equivalent to 300-500 Hz. By audio equipment standards, that kind of frequency response is laughable. The reason you can hear sounds above 500 Hz is that multiple neurons fire in a "volley," mimicking the higher-frequency signals from the ear and thus letting the brain perceive higher-frequency sounds. This is not a limitation of the cilia/cochlea/ear; it is a limitation of neurons, and can be compared to the effect that lower sampling rates have on frequency response. So I guess in spec terms: the frequency response of a single neuron is limited to about 500 Hz.
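
To make the volley idea concrete, here is a toy sketch. It assumes each model neuron can fire at most 500 times per second and that neurons take turns in strict round-robin order, one spike per cycle of the tone. The function name and the round-robin rule are illustrative assumptions, not a physiological model.

```python
import math

def volley_spike_times(tone_hz, max_rate_hz=500.0, duration_s=0.01):
    """Share the cycles of a tone among neurons in round-robin order.

    Hypothetical toy model: one spike per cycle of the tone, split across
    just enough neurons that no single neuron exceeds max_rate_hz.
    Returns (n_neurons, spikes), where spikes[i] lists neuron i's firing times.
    """
    n_neurons = max(1, math.ceil(tone_hz / max_rate_hz))
    n_cycles = round(duration_s * tone_hz)
    spikes = [[] for _ in range(n_neurons)]
    for cycle in range(n_cycles):
        # Each neuron only handles every n_neurons-th cycle.
        spikes[cycle % n_neurons].append(cycle / tone_hz)
    return n_neurons, spikes

n_neurons, spikes = volley_spike_times(2000.0)
print("neurons needed for a 2 kHz tone:", n_neurons)
print("spikes per neuron in 10 ms:", [len(s) for s in spikes])
```

In this sketch each neuron fires only every fourth cycle of the 2 kHz tone, yet the combined spike train still marks every cycle.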
 
Jan 12, 2013 at 6:44 PM Post #3 of 95
I'm looking to create a list of specs that help people put equipment specs in context. It's a sort of straightforward thing. The idea is to create a useful resource. I did frequency response. If someone wants to do distortion thresholds, dynamic range, jitter thresholds, etc...
 
Jan 12, 2013 at 7:48 PM Post #4 of 95
Well, this site has some facts regarding dynamic range, pitch resolution, and amplification: http://hyperphysics.phy-astr.gsu.edu/hbase/sound/earsens.html Dynamic range: 130 dB, and the distortion threshold appears to be linked to the pain threshold. I'm curious to see if anyone can come up with THD or noise floor numbers. People tend to go crazy after an hour in an anechoic chamber, so the ear must be used to having at least 10-20 dB of ambient noise. I've also heard that people in anechoic chambers hear a humming noise, which is often attributed to the sound of their nervous system. Perhaps that nervous system buzzing is equivalent to a noise floor?
 
Jan 12, 2013 at 9:42 PM Post #5 of 95
Noise floor would probably correlate to the noise floor in a quiet listening room
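
As a rough sketch of what that implies, taking the 130 dB pain threshold and the 30 dB quiet-room noise floor from the top post as given, the usable range and its linear equivalent work out as below. The function names are illustrative only.

```python
def usable_range_db(peak_db, noise_floor_db):
    """Span between the loudest tolerable level and the room noise floor."""
    return peak_db - noise_floor_db

def db_to_pressure_ratio(db):
    """Convert a level difference in dB to a sound-pressure (amplitude) ratio."""
    return 10 ** (db / 20.0)

peak_db = 130.0         # threshold of pain, from the top post
noise_floor_db = 30.0   # quiet listening room, from the top post

span = usable_range_db(peak_db, noise_floor_db)
print(f"usable range: {span:.0f} dB")
print(f"pressure ratio: {db_to_pressure_ratio(span):,.0f} to 1")
```

So under those two figures the ear is working with about 100 dB between room noise and pain, a pressure ratio of about 100,000 to 1.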
 
Jan 12, 2013 at 9:46 PM Post #6 of 95
For group delay, there's the old 1978 paper by Blauert and Laws:
http://asadl.org/jasa/resource/1/jasman/v63/i5/p1478_s1 (need subscription)
 
Values from that listed in Table 5 here:
http://sound.westhost.com/ptd.htm
 
Not sure about the methodology there, as I have not read the paper.
 
 
For distortion associated with jitter, there are a lot of papers.  Dunn's 1992 paper claims 20ps (!) at 20 kHz, but it seems to be based on modeling and not listening tests:
http://www.nanophon.com/audio/jitter92.pdf
 
A later paper by Benjamin and Gannon in 1998 says 10ns rms, apparently using a 17 kHz test tone; 20ns for actual music:
http://www.aes.org/e-lib/browse.cfm?elib=8354 (need subscription)
 
as cited afterwards by Dunn here (see section 2.2):
http://www.nanophon.com/audio/1394_sampling_jitter.pdf
 
Seems like some publications and people like quoting the really pessimistic numbers. Though of course then you see Stereophile reviews where devices with unusually bad jitter or whatnot get great marks.
 
 
 
Many listening rooms should be under 30 dB, particularly out in the country, best-case scenario, etc.  (edit: by "many", I do not mean to imply "most")  Anyway, what about if people are using IEMs with 20+ dB attenuation?
 
Jan 12, 2013 at 9:55 PM Post #7 of 95
20ns to 20ps is quite a range. Assuming this spec sheet is going to be used by people who are listening to music, what figure and citation do you think would be the best?

Is the group delay spec OK? That's a bit out of my bailiwick. I'm looking for just a simple spec that doesn't require a lot of explanation, so I put it as a range.
 
Jan 12, 2013 at 10:05 PM Post #8 of 95
20ps seems to be the analytical / assumptions / modeling result, seemingly very pessimistic.  20ns was with actual listeners with actual music, so use that one of course.  There is a lot more out there on the subject which may not agree.  Best to look at any results that involve people listening to things, and then categorize by worst-case test signal vs. music.
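
To put those jitter figures on a more familiar scale, here is a minimal sketch using the standard small-jitter approximation, where the error relative to a sine at frequency f with rms jitter sigma is about 2*pi*f*sigma. That formula is not taken from the cited papers, and treating each quoted figure as rms random jitter on a single sine is a simplifying assumption. The function name is hypothetical.

```python
import math

def jitter_error_db(freq_hz, jitter_rms_s):
    """Jitter-induced error level relative to a sine at freq_hz, in dB.

    Small-jitter approximation: error_rms / signal_rms = 2 * pi * f * sigma.
    """
    return 20.0 * math.log10(2.0 * math.pi * freq_hz * jitter_rms_s)

cases = [
    ("20 ps at 20 kHz", 20e3, 20e-12),
    ("10 ns at 17 kHz", 17e3, 10e-9),
    ("20 ns at 17 kHz", 17e3, 20e-9),
]
for label, freq_hz, jitter_s in cases:
    print(f"{label}: {jitter_error_db(freq_hz, jitter_s):.1f} dB")
```

Under these assumptions the 20 ps figure corresponds to error products roughly 112 dB below the tone, while 10 to 20 ns at 17 kHz lands in the range of about 53 to 59 dB below it.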
 
 
Here's a table:
 
Group Delay Audibility Thresholds
Frequency Threshold
500 Hz 3.2 ms
1 kHz 2 ms
2 kHz 1 ms
4 kHz 1.5 ms
8 kHz 2 ms
 
Man that looks kind of ugly.  I don't think head-fi has code tags?
 
Code:
Frequency    Threshold
500 Hz       3.2 ms
1 kHz        2   ms
2 kHz        1   ms
4 kHz        1.5 ms
8 kHz        2   ms
 
Jan 13, 2013 at 2:25 AM Post #9 of 95
That's the group delay table, not the jitter table. I'd suggest neither applies to headphone specs, though. No headphone or amp would ever have anywhere near audible group delay, and jitter only occurs in digital systems. You have to be careful about that jitter spec too, as the data stream may have lots of jitter, but it's possible to design a DAC that reclocks it out.
 
Jan 13, 2013 at 4:03 AM Post #10 of 95
Quote:
That's the group delay table, not the jitter table. I'd suggest neither applies to headphone specs, though. No headphone or amp would ever have anywhere near audible group delay, and jitter only occurs in digital systems. You have to be careful about that jitter spec too, as the data stream may have lots of jitter, but it's possible to design a DAC that reclocks it out.

 
Weird, I think I was posting the table in response to something that got edited out, or I was thinking about something else.  Any time I put two lines of space between paragraphs, that's a separate topic—a convention that nobody else follows, so everybody else should have gotten the memo via ESP.  (What?  You missed it?)
 
But yeah, all those group delay values are way higher than anything you'd see normally.  As always, some people claim lower values, maybe not with much evidence.  I agree that the appropriate label seems to be "doesn't matter" for both jitter and group delay, because few devices have problems of that kind of magnitude.  As you say, it might be worth a mention that we're talking about jitter on the D/A, as that's what matters (not jitter of any link or anywhere else, unless that actually has an effect on the D/A).
 
 
Does anybody have the references on distortion audibility? These seem to be a bit all over the map as well. Furthermore, under what circumstances are these tested? Some levels quoted are far below what transducers can manage. For example, if you're comparing a 1 kHz test tone with 0.05% THD (5th harmonic only) against one with 0.01%, through transducers with 0.1% THD, what does that really mean?
 
Jan 13, 2013 at 9:36 AM Post #11 of 95
Quote:
Does anybody have the references on distortion audibility? These seem to be a bit all over the map as well. Furthermore, under what circumstances are these tested? Some levels quoted are far below what transducers can manage. For example, if you're comparing a 1 kHz test tone with 0.05% THD (5th harmonic only) against one with 0.01%, through transducers with 0.1% THD, what does that really mean?

I'll look for the reference later, but you can't have a single THD figure; the specific harmonic content makes the difference. For example, from memory, even-order harmonic distortion can be as high as 10% without reliable detection, whereas odd-order THD is clearly audible above 3%, and sometimes less than that. But the vanishingly low 0.05% figures are way below what can be heard, as long as you specify the exact content. To compare against what a transducer does, you need a spectrum of the transducer. This isn't going to be simple or easy.
 
The problem is that the traditional measurement method for THD+N is simply to excite the DUT with a pure tone, measure its output, subtract only that tone's fundamental, and sum everything that's left, including noise. That's interesting as a single-number test, but it may not equate to audibility of distortion well at all.
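
The measurement described above can be sketched in a few lines. This is a minimal illustration, not how any particular analyzer works: it assumes the capture spans a whole number of cycles of the test tone, removes the fundamental by correlating against sine and cosine, and reports the rms of what is left relative to the rms of the whole signal. The function names and the synthetic test signals are made up for the example.

```python
import math

def rms(values):
    """Root-mean-square of a sequence of samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def thd_plus_n(samples, freq_hz, rate_hz):
    """THD+N as a ratio: rms(signal minus fundamental) / rms(signal).

    Assumes `samples` covers a whole number of cycles of freq_hz.
    """
    n = len(samples)
    w = 2.0 * math.pi * freq_hz / rate_hz
    # Amplitudes of the sine and cosine parts of the fundamental.
    a = 2.0 / n * sum(x * math.sin(w * i) for i, x in enumerate(samples))
    b = 2.0 / n * sum(x * math.cos(w * i) for i, x in enumerate(samples))
    # Subtract the fundamental only; harmonics and noise remain.
    residual = [x - a * math.sin(w * i) - b * math.cos(w * i)
                for i, x in enumerate(samples)]
    return rms(residual) / rms(samples)

def tone_plus_harmonic(order, level, freq_hz=1000.0, rate_hz=48000, n=4800):
    """Synthetic DUT output: a unit sine plus one harmonic at `level`."""
    w = 2.0 * math.pi * freq_hz / rate_hz
    return [math.sin(w * i) + level * math.sin(order * w * i) for i in range(n)]

second = thd_plus_n(tone_plus_harmonic(2, 0.01), 1000.0, 48000)
third = thd_plus_n(tone_plus_harmonic(3, 0.01), 1000.0, 48000)
print(f"1% 2nd harmonic only: THD+N = {100 * second:.3f}%")
print(f"1% 3rd harmonic only: THD+N = {100 * third:.3f}%")
```

Both synthetic signals come out at essentially the same THD+N of about 1%, even though one contains only second harmonic and the other only third. The single number carries no information about which harmonics are present.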
 
Using "human hearing specs" as a baseline for THD would really require a three-dimensional array of data.
 
edit: Sorry, "even-order" should have been "second-order"
 
Jan 13, 2013 at 1:22 PM Post #12 of 95
Quote:
you can't have a single THD figure, the specific harmonic content makes the difference. For example, from memory, even-order harmonic distortion can be as high as 10% without reliable detection, where odd-order THD is clearly audible above 3%, and sometimes less than that.

 
Yes, sort of. :D

 
Most musical instruments have both even- and odd-order harmonics, so an audio device that adds that sort of distortion just changes the instrument's timbre somewhat. This may or may not be noticeable. What matters even more is IM distortion, and also the spectrum of the distortion. Our ears are most sensitive around 2 to 4 kHz, so harmonic distortion there is most noticeable, regardless of whether it's odd or even.
 
Another important reason you can't just define "what is audible" with numbers is the Masking Effect. This is from my book The Audio Expert:
 
The masking effect influences the audibility of artifacts. Masking is an important principle because it affects how well we can hear one sound in the presence of another sound. If you’re standing next to a jackhammer, you won’t hear someone talking softly ten feet away. Masking is strongest when the loud and soft sounds have similar frequency ranges. So when playing an old Led Zeppelin cassette, you might hear the tape hiss during a bass solo but not when the cymbals are prominent. Likewise, you’ll easily hear low-frequency AC power line hum when only a tambourine is playing, but maybe not during a bass or timpani solo.

Low-frequency hum in an audio system is the same volume whether the music is playing or not. So when you stop the CD, you can more easily hear the hum because the music no longer masks the sound. Some artifacts like tape modulation noise and digital jitter occur only while the music plays. So unless they’re fairly loud, they won’t be audible at all. Note that masking affects our ears only. Spectrum analyzers and other test gear can easily identify any frequency in the presence of any other frequency, even when one is 100 dB or more lower in level than the other. In fact, this is the basis for lossy MP3-type compression, where musical data that are deemed inaudible due to masking are removed, reducing the file size.
 
--Ethan
 
Jan 13, 2013 at 2:18 PM Post #13 of 95
Quote:
Yes, sort of. :D

Most musical instruments have both even- and odd-order harmonics, so an audio device that adds that sort of distortion just changes the instrument's timbre somewhat. This may or may not be noticeable. What matters even more is IM distortion, and also the spectrum of the distortion. Our ears are most sensitive around 2 to 4 kHz, so harmonic distortion there is most noticeable, regardless of whether it's odd or even.

Another important reason you can't just define "what is audible" with numbers is the Masking Effect. This is from my book The Audio Expert:

The masking effect influences the audibility of artifacts. Masking is an important principle because it affects how well we can hear one sound in the presence of another sound. If you’re standing next to a jackhammer, you won’t hear someone talking softly ten feet away. Masking is strongest when the loud and soft sounds have similar frequency ranges. So when playing an old Led Zeppelin cassette, you might hear the tape hiss during a bass solo but not when the cymbals are prominent. Likewise, you’ll easily hear low-frequency AC power line hum when only a tambourine is playing, but maybe not during a bass or timpani solo.

Low-frequency hum in an audio system is the same volume whether the music is playing or not. So when you stop the CD, you can more easily hear the hum because the music no longer masks the sound. Some artifacts like tape modulation noise and digital jitter occur only while the music plays. So unless they’re fairly loud, they won’t be audible at all. Note that masking affects our ears only. Spectrum analyzers and other test gear can easily identify any frequency in the presence of any other frequency, even when one is 100 dB or more lower in level than the other. In fact, this is the basis for lossy MP3-type compression, where musical data that are deemed inaudible due to masking are removed, reducing the file size.

--Ethan


Hi Ethan!

Great post, thanks. If you read the rest of my post you quoted, this kind of thing is exactly what I was alluding to by indicating that a single-figure THD cannot represent distortion audibility.
 
Jan 13, 2013 at 2:30 PM Post #14 of 95
A range covering the worst case for music listening is fine. The idea is to give someone a simple figure they can hold up against the THD specs on equipment tear sheets. We can link to a web page that has all the nitty-gritty details if someone wants to look into it further.

As for specs that don't matter, that's the point of this thread. People get all wrapped up in specs they can't hear. We'll list the thresholds so they can clearly see what matters and what doesn't.
 
Jan 13, 2013 at 4:02 PM Post #15 of 95
Right, generally there should be some kind of list like:
 
2nd harmonic:  no more than X%
3rd harmonic:  no more than Y%  (a much lower number than X%)
4th harmonic:  no more than Z%
5th... (and so on, the odd ones and high-order ones, need be much lower)
 
...etc.
 
but that's just for a single tone, at a certain frequency, at a certain loudness. And as noted, all this could be masked in practice by other sounds, and there are other factors at play too. Anyway, of more practical interest for music might be the IMD (but which tones do we test?), though that has the same caveats associated.
 
10% second-order seems very high, easy to detect.  I just generated 220 Hz + 440 Hz (2nd harmonic at 10% amplitude) vs. pure 220 Hz, replaygain, and it was obvious, at least for that pure tone.
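
For anyone who wants to repeat that comparison, here is a minimal sketch that builds the two signals. It assumes the second harmonic is added in phase with the fundamental, and it uses simple RMS matching as a stand-in for ReplayGain, which is a loudness-based measure and not the same thing. The function names, file names, and the 0.25 target level are arbitrary choices for illustration.

```python
import math
import struct
import wave

def tone_with_second_harmonic(f0_hz=220.0, h2_level=0.10, seconds=2.0, rate=44100):
    """Sine at f0_hz plus its 2nd harmonic at h2_level times the fundamental."""
    n = int(seconds * rate)
    w = 2.0 * math.pi * f0_hz / rate
    return [math.sin(w * i) + h2_level * math.sin(2.0 * w * i) for i in range(n)]

def normalize_rms(samples, target_rms=0.25):
    """Scale samples to a common RMS level (a crude substitute for ReplayGain)."""
    current = math.sqrt(sum(s * s for s in samples) / len(samples))
    return [s * target_rms / current for s in samples]

def write_wav(path, samples, rate=44100):
    """Write mono 16-bit PCM; samples are expected to lie within [-1, 1]."""
    frames = b"".join(
        struct.pack("<h", int(round(max(-1.0, min(1.0, s)) * 32767)))
        for s in samples
    )
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(rate)
        f.writeframes(frames)

# Example use (writes two files into the current directory):
# write_wav("pure_220.wav", normalize_rms(tone_with_second_harmonic(h2_level=0.0)))
# write_wav("h2_10pct.wav", normalize_rms(tone_with_second_harmonic(h2_level=0.10)))
```

Switching between the two files in an ABX tool would be a fairer check than listening sighted, and it only speaks to a pure tone at this one frequency and level.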
 
