Last month a paper with the (imho misleading) title "Human Time-Frequency Acuity Beats the Fourier Uncertainty Principle" was published, and several nonsensical claims followed, mostly from people who don't understand it or haven't read much beyond the title.

Stuff like: "human hearing beats FFT"

So I thought, why not turn it around and beat human hearing with the FFT? It's actually a trivial test:

Each test file contains several sine waves at different frequencies. The catch is how short the sine waves are, because, as we know, there's a tradeoff between time and frequency resolution with the FFT.

Basically, an N-point FFT of a signal sampled at Fs times per second gives us a frequency resolution of Fs/N.

Example: 48000/480 = 100 Hz resolution, so we get N/2 + 1 = 241 evenly spaced frequency "bins" from DC to 24000 Hz, 100 Hz apart, each covering +/- 50 Hz around its center.
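A quick sanity check of that arithmetic, as a small Python sketch (the function names are just for illustration). Counting both DC and Nyquist, a real-input N-point FFT has N/2 + 1 non-negative-frequency bins. The loop at the end applies the same Fs/N formula to the sample counts of the test files below.

```python
def bin_spacing(fs: float, n: int) -> float:
    """Frequency resolution (spacing between DFT bins) in Hz: fs / n."""
    return fs / n


def bin_centers(fs: float, n: int) -> list:
    """Center frequencies of the non-negative bins of a real-input n-point DFT.

    For even n this runs from DC (bin 0) to Nyquist (bin n/2): n // 2 + 1 bins.
    """
    df = bin_spacing(fs, n)
    return [k * df for k in range(n // 2 + 1)]


# The example from the text: a 480-point FFT at 48 kHz.
centers = bin_centers(48000, 480)
print(bin_spacing(48000, 480))                # 100.0
print(len(centers), centers[0], centers[-1])  # 241 0.0 24000.0

# Same formula for the burst lengths of the 44.1 kHz test files.
for n in (220, 441, 4410, 22050):
    print(n, "samples ->", round(bin_spacing(44100, n), 2), "Hz")
```

So the 5 ms files get roughly 200 Hz bins, while the 500 ms files get 2 Hz bins.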

So, back to the test. The test files are 44.1 kHz / 16-bit.
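If you want to make similar files yourself, here's a minimal sketch using only Python's standard library (`wave`, `struct`). Everything beyond the 44.1 kHz / 16-bit format is an assumption: mono output, the amplitude, the 500 ms of silence after each burst, and the example frequencies, which are made up. `tone_burst` and `write_test_file` are hypothetical helper names.

```python
import io
import math
import struct
import wave

FS = 44100  # 44.1 kHz, as in the test files


def tone_burst(freq_hz, duration_ms, amplitude=0.5):
    """One sine burst as a list of 16-bit integer samples.

    The sample count is truncated, so 5 ms at 44.1 kHz (220.5 samples) gives 220.
    """
    n = FS * duration_ms // 1000
    peak = amplitude * 32767
    return [round(peak * math.sin(2 * math.pi * freq_hz * i / FS)) for i in range(n)]


def write_test_file(target, freqs_hz, duration_ms, gap_ms=500):
    """Write a mono 16-bit WAV with one burst per frequency, each followed by silence.

    target can be a filename or a writable binary file object.
    Returns the samples that were written.
    """
    silence = [0] * (FS * gap_ms // 1000)
    samples = []
    for f in freqs_hz:
        samples += tone_burst(f, duration_ms) + silence
    with wave.open(target, "wb") as w:
        w.setnchannels(1)   # mono (assumed)
        w.setsampwidth(2)   # 2 bytes = 16 bit
        w.setframerate(FS)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return samples


# Made-up frequencies 10 Hz apart, written to an in-memory buffer.
buf = io.BytesIO()
written = write_test_file(buf, [1000, 1010, 990, 1020, 980], duration_ms=5)
print(len(written), "samples,", len(buf.getvalue()), "bytes")
```

Pass a filename such as "test_5ms.wav" instead of the `BytesIO` buffer to get an actual file on disk; `duration_ms` values of 10, 100 and 500 give the 441-, 4410- and 22050-sample bursts.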

Here's file A with **5 ms** (220 samples) long sine waves:

I doubt you can hear different frequencies, let alone sine waves. What you probably hear sounds more like *click* *click* *click*.

But with the FFT we can see that they are sine waves at different frequencies.

The frequency resolution is just 200 Hz (44100/220), but since each tone is analyzed on its own, we can still easily measure 10 Hz differences... so the **FFT beats human hearing. Well, that was easy.**

**10 ms** (441 samples) long sine waves:

Still sounds like clicks, right? So it's pretty hard to tell what the frequencies are by ear.

**100 ms** (4410 samples) long sine waves:

Now you should be able to hear sine waves, but can you pick out the different frequencies? Which tone is higher pitched and which is lower?

**500 ms** (22050 samples) long sine waves:

Now you should be able to clearly hear the different frequencies, that is, if you can hear 10 Hz differences.

Here are the frequency spectrum graphs:

**Warning: Spoiler!**

Here's the spectrum zero-padded in the time domain to 44100 samples (which amounts to *interpolation* in the frequency domain), giving a "fake" 1 Hz visual resolution:
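Zero-padding only appends zeros, so the padded M-point DFT is just the DTFT of the original 220 samples evaluated every Fs/M Hz. That's why it interpolates the spectrum without adding real resolution. Below is a minimal pure-Python sketch of picking the peak on that 1 Hz grid. It assumes two hypothetical tones at 5000 and 5010 Hz, which are not the actual test frequencies. Rather than running a full 44100-point FFT, it evaluates only the few hundred padded bins of interest directly. With bursts this short, the peak may land a Hz or so off the true frequency, because the sine's negative-frequency image leaks into it.

```python
import cmath
import math

FS = 44100  # sample rate in Hz
N = 220     # 5 ms burst
M = 44100   # zero-padded length, so padded bins are FS / M = 1 Hz apart


def burst(freq_hz, n=N):
    """n samples of a unit-amplitude sine, starting at phase 0."""
    return [math.sin(2 * math.pi * freq_hz * i / FS) for i in range(n)]


def padded_bin(x, k, m=M):
    """Bin k of the m-point DFT of x after zero-padding x to length m.

    The padded zeros add nothing to the sum, so only the real samples are
    visited. This is the DTFT of x evaluated at k * FS / m Hz.
    """
    step = -2j * math.pi * k / m
    return sum(v * cmath.exp(step * i) for i, v in enumerate(x))


def peak_frequency(x, lo_hz, hi_hz, m=M):
    """Frequency in Hz of the largest padded-DFT magnitude between lo_hz and hi_hz."""
    df = FS / m
    bins = range(int(lo_hz / df), int(hi_hz / df) + 1)
    best = max(bins, key=lambda k: abs(padded_bin(x, k, m)))
    return best * df


# Two made-up tones 10 Hz apart, each only 220 samples (5 ms) long.
for f in (5000, 5010):
    print(f, "Hz tone -> peak at", peak_frequency(burst(f), 4800, 5200), "Hz")
```

The two peaks come out roughly 10 Hz apart even though the unpadded bins are about 200 Hz wide. With a numerics library you would normally just call a real FFT with a larger transform length and let it do the padding.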

And here is the same spectrum unpadded (FFT taken over only the 5 ms / 220 samples of data), with a resolution of 200.45 Hz:

As you can see, a frequency that doesn't fall on the center of a bin leaks energy into adjacent bins. For example, #5 leaks the most into the lower bin because it's also the tone with the lowest frequency.
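The leakage can be reproduced with a plain 220-point DFT, no padding. Here is a minimal sketch with two made-up tones. One sits exactly on the center of bin 25 (25 × 44100/220 ≈ 5011.36 Hz). The other is at 5000 Hz, slightly below that center. The DFT is computed naively for just the five bins around bin 25, and `dft_bin` and `neighborhood` are illustrative names.

```python
import cmath
import math

FS = 44100
N = 220  # 5 ms of data, bin spacing FS / N ~ 200.45 Hz


def dft_bin(x, k):
    """Bin k of the plain (unpadded) len(x)-point DFT of x."""
    n = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))


def neighborhood(freq_hz, k0, width=2):
    """|X[k]| for bins k0 - width .. k0 + width of a 220-sample sine at freq_hz."""
    x = [math.sin(2 * math.pi * freq_hz * i / FS) for i in range(N)]
    return {k: abs(dft_bin(x, k)) for k in range(k0 - width, k0 + width + 1)}


centered = neighborhood(25 * FS / N, 25)  # exactly on bin 25 (~5011.36 Hz)
below = neighborhood(5000, 25)            # ~0.06 bins below bin 25's center

for k in sorted(centered):
    print(k, round(centered[k], 3), round(below[k], 3))
```

For the centered tone, bins 24 and 26 come out essentially zero. For the 5000 Hz tone both neighbors pick up energy, and the lower neighbor (bin 24) gets more than the upper one (bin 26) because the tone sits below the bin center.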

Good day.

Edited by xnor - 2/26/13 at 1:22pm