Turning it around: FFT beats human hearing, EZ!
Feb 26, 2013 at 1:16 PM Thread Starter Post #1 of 11

xnor

Last month a paper with the (IMHO misleading) title "Human Time-Frequency Acuity Beats the Fourier Uncertainty Principle" was published, and several nonsensical claims followed, mostly from people who either don't understand it or haven't read much more than the title.
 
Stuff like: "human hearing beats FFT"
 
So I thought why not turn it around and beat human hearing with the FFT? It's a trivial test actually:
A test file contains several sine waves with different frequencies. The catch is the short length of the sine waves, because as we know, there's a tradeoff between time and frequency resolution with the FFT.
 
Basically, an N-point FFT of a signal sampled at Fs samples per second gives us a frequency resolution (bin spacing) of: Fs/N.
Example: 48000/480 = 100 Hz resolution, so the frequency "bins" are centered every 100 Hz from DC to 24000 Hz (241 bins counting DC and Nyquist), each covering +/- 50 Hz around its center.
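
For anyone who wants to play with the numbers, here is a minimal Python sketch of the Fs/N relationship. The function names are made up for illustration and nothing here comes from the original test setup.

```python
def bin_spacing_hz(fs, n):
    """Spacing between adjacent bins of an n-point FFT at sample rate fs."""
    return fs / n


def bin_centers_hz(fs, n):
    """Center frequencies of the bins from DC up to Nyquist (fs/2)."""
    return [k * fs / n for k in range(n // 2 + 1)]


# The example from the post: 48 kHz sample rate, 480-point FFT.
spacing = bin_spacing_hz(48000, 480)
centers = bin_centers_hz(48000, 480)
print(spacing, centers[0], centers[-1])

# The 5 ms test file: 44.1 kHz sample rate, 220 samples.
print(bin_spacing_hz(44100, 220))
```

Shorter blocks mean fewer samples N and therefore coarser bin spacing, which is the time/frequency tradeoff mentioned above.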
 
 
So back to the test. Test files are in the format 44.1/16.
Here's file A with 5 ms (220 samples) long sine waves:
sines_a
 
I doubt you can hear different frequencies, let alone sine waves. What you probably hear sounds more like *click* *click* *click*.
But using the FFT we can see it's sine waves at different frequencies.
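The frequency resolution is only about 200 Hz (44100/220), but we can still easily measure 10 Hz differences, because each burst contains a single tone whose spectral peak can be located much more finely than the bin spacing... so the FFT beats human hearing. Well that was easy.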
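
A rough Python sketch of that measurement, under some assumptions: the bursts are modeled as 220-sample Blackman-windowed sines starting at phase 0, and instead of an actual zero-padded FFT the transform is evaluated directly on a 1 Hz grid around 2 kHz (evaluating at k Hz gives the same value as bin k of an FFT zero-padded to 44100 points). The helper names are hypothetical, and this is not necessarily how the graphs in this thread were produced.

```python
import cmath
import math

FS = 44100  # sample rate of the test files
N = 220     # roughly 5 ms of samples


def blackman(n_points):
    """Classic symmetric Blackman window (coefficients 0.42, 0.5, 0.08)."""
    m = n_points - 1
    return [0.42 - 0.5 * math.cos(2 * math.pi * n / m)
            + 0.08 * math.cos(4 * math.pi * n / m) for n in range(n_points)]


def tone_burst(freq_hz, n_points=N, fs=FS):
    """Blackman-windowed sine burst of n_points samples."""
    w = blackman(n_points)
    return [w[n] * math.sin(2 * math.pi * freq_hz * n / fs)
            for n in range(n_points)]


def peak_frequency(samples, lo_hz=1900, hi_hz=2100, fs=FS):
    """Integer frequency in [lo_hz, hi_hz] with the largest spectral magnitude."""
    def magnitude(f):
        step = -2j * math.pi * f / fs
        return abs(sum(x * cmath.exp(step * n) for n, x in enumerate(samples)))
    return max(range(lo_hz, hi_hz + 1), key=magnitude)


# Estimate the frequency of each distinct 5 ms burst.
estimates = {f: peak_frequency(tone_burst(f)) for f in (1980, 1990, 2000, 2010)}
print(estimates)
```

The search range of 1900 to 2100 Hz is only there to keep the pure-Python loop fast. This works because there is just one tone per burst, so it says nothing about separating two simultaneous tones that are 10 Hz apart.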
 
 
10 ms (441 samples) long sine waves:
sines_b
 
Still sounds like clicks, right? So it's pretty hard to tell what the frequencies are by ear.
 
 
100 ms (4410 samples) long sine waves:
sines_c
 
Now you should be able to hear sine waves, but can you pick out the different frequencies? Which tone is higher pitched and which is lower?
 
 
500 ms (22050 samples) long sine waves:
sines_d
 
Now you should be able to clearly hear the different frequencies, if you can hear 10 Hz differences, that is.
 
 
Here are the frequency spectrum graphs.
First, the spectrum with zero-padding in the time domain (= interpolation in the frequency domain) to 44100 samples (= "fake" 1 Hz visual resolution):

 
 
And here is the same spectrum unpadded (FFT taken using only the 5 ms / 220 samples of data), with a resolution of 200.45 Hz:

 
As you can see, a frequency that doesn't hit the center of a bin leaks energy into adjacent bins. For example, tone #5 leaks the most into the lower bin because it is the tone with the lowest frequency.
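
The leakage into the lower bin can be reproduced with a small Python sketch. It assumes 220-sample Blackman-windowed bursts and compares bin 9 (about 1804 Hz) against bin 10 (about 2004.5 Hz) of the unpadded 220-point DFT; the function names are hypothetical.

```python
import cmath
import math

FS = 44100
N = 220


def windowed_tone(freq_hz):
    """N-sample sine burst shaped by a symmetric Blackman window."""
    m = N - 1
    return [(0.42 - 0.5 * math.cos(2 * math.pi * n / m)
             + 0.08 * math.cos(4 * math.pi * n / m))
            * math.sin(2 * math.pi * freq_hz * n / FS) for n in range(N)]


def bin_magnitude(samples, k):
    """Magnitude of bin k of the unpadded N-point DFT."""
    return abs(sum(x * cmath.exp(-2j * math.pi * k * n / N)
                   for n, x in enumerate(samples)))


def lower_to_center_db(freq_hz, center_bin=10):
    """Level of the bin below center_bin relative to center_bin, in dB."""
    s = windowed_tone(freq_hz)
    lower = bin_magnitude(s, center_bin - 1)
    center = bin_magnitude(s, center_bin)
    return 20 * math.log10(lower / center)


for f in (1980, 1990, 2000, 2010):
    print(f, round(lower_to_center_db(f), 2))
```

The lower the tone sits below the center of bin 10, the closer the level of bin 9 gets to that of bin 10.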
 
 
Good day.
 
Feb 26, 2013 at 4:24 PM Post #2 of 11
Graphs posted above in the spoiler container.
 
Feb 26, 2013 at 5:06 PM Post #3 of 11
yes the title is generating some audio forum buzz - but you only have to read down the web page to see them backing away from the sensationalism of the headline
 
but apparently even reading the whole press release is too much for most who want to jump on the "human hearing has mysterious, unknown to Science capabilities" bandwagon
 
Feb 26, 2013 at 8:44 PM Post #5 of 11
In case anyone cannot read the graphs above and wonders what the frequencies of the tones are:
#1: 2000 Hz
#2: 2010 Hz
#3: 2000 Hz
#4: 1990 Hz
#5: 1980 Hz
#6: 2000 Hz
with 200 ms (8820 samples) pauses in between.
 
The tones were shaped with a Blackman window (58 dB sidelobe attenuation).
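
If you want to build something similar yourself, here is a hedged Python sketch of the signal described above. Assumptions that are not stated in this thread: full-scale amplitude, every tone starting at phase 0, no silence before the first or after the last tone, and no 16-bit quantization or file writing. The function names are made up.

```python
import math

FS = 44100
TONES_HZ = [2000, 2010, 2000, 1990, 1980, 2000]  # tones #1 to #6
PAUSE_SAMPLES = 8820                             # 200 ms of silence


def blackman(n_points):
    """Classic symmetric Blackman window (coefficients 0.42, 0.5, 0.08)."""
    m = n_points - 1
    return [0.42 - 0.5 * math.cos(2 * math.pi * n / m)
            + 0.08 * math.cos(4 * math.pi * n / m) for n in range(n_points)]


def build_test_signal(burst_samples):
    """Six windowed tone bursts with pauses in between, as floats in [-1, 1]."""
    window = blackman(burst_samples)
    signal = []
    for i, freq in enumerate(TONES_HZ):
        if i > 0:
            signal.extend([0.0] * PAUSE_SAMPLES)  # pause between tones only
        signal.extend(window[n] * math.sin(2 * math.pi * freq * n / FS)
                      for n in range(burst_samples))
    return signal


file_a = build_test_signal(220)  # 5 ms bursts, like file A
print(len(file_a), len(file_a) / FS)
```

Passing 441, 4410 or 22050 instead of 220 gives the burst lengths of files B, C and D.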
 
Feb 26, 2013 at 11:45 PM Post #6 of 11
Oops, I tried C before B and A.  It's pretty easy to hear on C.  I'm not sure if I would have been able to do B without already knowing the order from C.  (it sounds apparent on B now, but would it have without already knowing it from C?)
 
Anyway, I need to check out the actual paper some time.
 
Feb 27, 2013 at 1:50 AM Post #7 of 11
With A I only heard clicks.
With B I heard a little more than clicks but couldn't tell them apart.
With C I could only tell apart 5 from the rest.
With D I could not tell 1, 3 and 6 apart; I could tell 2 and 4 apart with some difficulty, and 5 definitely.
 
Feb 27, 2013 at 6:31 AM Post #8 of 11
Even if you could tell apart the tones (or clicks) in A it doesn't really matter. I could reduce the difference between frequencies to 1 Hz, or a fraction of that. Then you'd even have problems with the 500 ms sines while it would still show up differently in the graphs.
 
Feb 27, 2013 at 7:59 AM Post #9 of 11
It would be a more fair test if more than one tone would be playing at the same time, but with lengths, frequencies, and envelopes more similar to real (but fast and complex) music. The frequency of a single sine wave can be measured accurately using only a small number of samples (in the ideal case, with no windowing, 3 samples are enough). My sinetest utility, with the -w -130 parameter, measures the frequency of the 5 ms tones with less than 0.005 Hz error, even though it uses an FFT length of only 128 samples (1 bin = 344.53 Hz). However, with these short lengths, there is significant leakage into the other frequencies. For example, in the 5 ms case, there is less than 0.1 dB attenuation at +/- 10 Hz from the frequency of the tone.
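
To illustrate the three-sample remark, here is a minimal Python sketch based on the identity x[n-1] + x[n+1] = 2*cos(w)*x[n], which holds for any noise-free sinusoid with phase advance w per sample. It is only an illustration of the ideal case and not a description of how the sinetest utility works; the function name is made up.

```python
import math


def three_sample_frequency(x0, x1, x2, fs):
    """Frequency of a noise-free sinusoid from three consecutive samples.

    Relies on x0 + x2 = 2*cos(w)*x1, where w is the phase advance per sample.
    Requires x1 != 0 and a frequency below fs/2.
    """
    cos_w = (x0 + x2) / (2.0 * x1)
    cos_w = max(-1.0, min(1.0, cos_w))  # guard against rounding outside [-1, 1]
    return math.acos(cos_w) * fs / (2.0 * math.pi)


fs = 44100
f_true = 2010.0
# Three consecutive samples of an unwindowed sine with an arbitrary phase.
samples = [math.sin(2 * math.pi * f_true * n / fs + 0.3) for n in range(3)]
estimate = three_sample_frequency(*samples, fs)
print(estimate)
```

With noise, windowing or more than one tone present, this identity no longer holds exactly, which is why it only covers the ideal case.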
 
In any case, this paper is not really relevant to measurements of audio gear, as the goal here is to ensure that the sound is not changed in ways typical of analog hardware (non-linear distortion, noise, etc.), rather than to analyze the music itself for purposes like transcribing or lossy compression.
 
Feb 27, 2013 at 9:22 AM Post #10 of 11
Quote:
In any case, this paper is not really relevant to measurements of audio gear, as the goal here is to ensure that the sound is not changed in ways typical of analog hardware (non-linear distortion, noise, etc.), rather than to analyze the music itself for purposes like transcribing or lossy compression.

Exactly, yet people misuse it, trying (and failing) to support all kinds of wild claims.
 
Feb 27, 2013 at 9:44 AM Post #11 of 11
Quote:
It would be a more fair test if more than one tone would be playing at the same time, but with lengths, frequencies, and envelopes more similar to real (but fast and complex) music.

I think the 10 Hz difference makes it pretty fair, not realistic but fair.
 
This reminds me of the clip pitch bug (0.25% off or 2.5 Hz for a 1 kHz tone). I don't think anyone heard that until it was measured.
 
