... portions deleted ...
No, not even close. But NVidia's marketing is amazing at making ordinary people believe this nonsense.
Jawed, where do you think the bottleneck is?
You got me curious to find out exactly how expensive 1MM taps is. It certainly isn't the bottleneck.
I don't have a CUDA dev kit to run tests on, but testing with a standard CPU that's 5 yrs old (my PC is an i7-4770K @ 3.5GHz), I can get:
138 xRT on 1 million taps at 1Fs (48kHz) and 24 bit PCM resolution.
62 xRT at 2Fs
28 xRT at 4Fs
I used an off-the-shelf free package (http://convolver.sourceforge.net/) that seems to crash at 8Fs. Output below.
The sublinear performance is due to memory access overhead, because I'm simulating 2Fs with 4 tracks (whereas in reality it's just 2 tracks at a higher sampling rate, so memory will not need to thrash). I did the extra tests to ensure that the trend is stable.
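For anyone who wants to cross-check the xRT figures against the logs, here is a minimal sketch of the arithmetic. The function name `times_real_time` is my own, and I'm assuming the tool divides the duration of the frames written by the wall-clock time; that reproduces the logged numbers to two significant figures (roughly 125, 62 and 28), but I haven't checked the tool's source.

```python
def times_real_time(frames_written, sample_rate_hz, elapsed_s):
    """Audio duration produced, divided by the wall-clock time it took."""
    audio_seconds = frames_written / sample_rate_hz
    return audio_seconds / elapsed_s

# (label, frames written, sample rate in Hz, elapsed seconds), copied from the logs
runs = [
    ("1Fs", 4_000_000, 48_000, 0.665),
    ("2Fs", 4_000_000, 48_000, 1.35),
    ("4Fs", 4_000_000, 48_000, 2.94),
]

for label, frames, rate, elapsed in runs:
    print(f"{label}: {times_real_time(frames, rate, elapsed):.1f} xRT")
```

Each doubling of the channel count slightly more than doubles the elapsed time (0.665s, 1.35s, 2.94s), which is the sublinear trend mentioned above.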
This is on a basic 4 core CPU. To make numbers round, let's call it 128 xRT at 1Fs. Getting to 2048Fs at 1 xRT needs 2048/128 = 16 times that throughput, so (assuming it scales linearly with core count) 16 x 4 = 64 cores.
I can imagine that a CUDA core is typically at half the clock speed, so say we double it to require 128 cores....
Or we can build in a lot of margin and say it requires.... 1024 cores... I'm being very generous today.
A consumer grade gaming GeForce GTX 1080 has 2560 cores. I still have 1500+ cores to spare for the oversampling processing, noise-shaper, etc.
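The back-of-envelope estimate above can be written out as a few lines of arithmetic. This is only a sketch under the same assumptions I made in the text: throughput scales linearly with both sampling rate and core count, and a CUDA core is worth about half a CPU core. Neither has been measured, and the function name `cores_needed` is just for illustration.

```python
def cores_needed(measured_xrt, measured_cores, target_fs_multiple, target_xrt=1.0):
    """Cores required, assuming perfectly linear scaling (an assumption, not a measurement)."""
    required_speedup = target_fs_multiple * target_xrt / measured_xrt
    return measured_cores * required_speedup

cpu_cores = cores_needed(measured_xrt=128, measured_cores=4, target_fs_multiple=2048)
cuda_cores = cpu_cores * 2      # assumed: a CUDA core runs at half the clock speed
generous_budget = 1024          # the padded figure, with lots of margin
gtx_1080_cores = 2560
spare = gtx_1080_cores - generous_budget

print(f"CPU-equivalent cores: {cpu_cores:.0f}")
print(f"CUDA cores at half clock: {cuda_cores:.0f}")
print(f"Spare cores on a GTX 1080 with the generous budget: {spare}")
```

Even the generous budget is 8x the half-clock estimate, so the conclusion doesn't hinge on the exact scaling factor.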
I can imagine that loading data at 2048Fs might be the bottleneck, but you're never really loading at greater than 8Fs and you're oversampling on the board to get to 2048Fs.
Backup data below. Both the 30sec wav and the FIR represented as a wav were randomly generated to be of the right length.
----
C:\Program Files (x86)\Convolver\Convolver>convolverCMD.exe 0 1 0 c:\temp\convolver\1fs.wav c:\temp\convolver\1fs30secs.wav c:\temp\convolver\out.wav
Using overlap-save convolution
Input file format: Stereo WAV (Microsoft) 48kHz 1440000 frames
Filter format: 2 Paths (Stereo to Stereo direct) 48kHz 2000000 taps Lag: 1000000 taps 21s Estimated gain: 35dB Peak gain: 94dB
Optimum attenuation: -45dB calculated in 3.77s
Using attenuation of 0dB (ie, scaling factor of 1)
Convolved and wrote 4000000 frames to c:\temp\convolver\out.wav in 0.665s (ie, at 1.3e+002 times real time)
C:\Program Files (x86)\Convolver\Convolver>convolverCMD.exe 0 1 0 c:\temp\convolver\2fs.wav c:\temp\convolver\2fs30secs.wav c:\temp\convolver\out.wav
Using overlap-save convolution
Input file format: 4-channel WAV (Microsoft) 48kHz 1440000 frames
Filter format: 4 Paths (4 channels to 4-channel direct) 48kHz 2000000 taps Lag: 1000000 taps 21s Estimated gain: 35dB Peak gain: 94dB
Optimum attenuation: -46dB calculated in 7.38s
Using attenuation of 0dB (ie, scaling factor of 1)
Convolved and wrote 4000000 frames to c:\temp\convolver\out.wav in 1.35s (ie, at 62 times real time)
C:\Program Files (x86)\Convolver\Convolver>convolverCMD.exe 0 1 0 c:\temp\convolver\4fs.wav c:\temp\convolver\4fs30secs.wav c:\temp\convolver\out.wav
Using overlap-save convolution
Input file format: 8-channel WAV (Microsoft) 48kHz 1440000 frames
Filter format: 8 Paths (8 channels to 8-channel direct) 48kHz 2000000 taps Lag: 1000000 taps 21s Estimated gain: 35dB Peak gain: 94dB
Optimum attenuation: -46dB calculated in 14.9s
Using attenuation of 0dB (ie, scaling factor of 1)
Convolved and wrote 4000000 frames to c:\temp\convolver\out.wav in 2.94s (ie, at 28 times real time)
C:\Program Files (x86)\Convolver\Convolver>convolverCMD.exe 0 1 0 c:\temp\convolver\8fs.wav c:\temp\convolver\8fs30secs.wav c:\temp\convolver\out.wav
Using overlap-save convolution
Input file format: 16-channel WAV (Microsoft) 48kHz 1440000 frames
Filter format: 16 Paths (16 channels to 16-channel direct) 48kHz 2000000 taps Lag: 1000000 taps 21s Estimated gain: 35dB Peak gain: 94dB
Standard exception: bad allocation
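The logs say "Using overlap-save convolution", so for anyone unfamiliar with it, here is a toy sketch of the technique. This is not Convolver's code: it is a pure-Python illustration with a small recursive radix-2 FFT, and all function names are mine. The idea is to filter in blocks through the FFT and discard the first M-1 samples of each block, which are corrupted by circular wrap-around. The work per output sample then grows roughly with the log of the FFT size instead of with the tap count, which is why 1 million taps is cheap.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two. No scaling applied."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def ifft(a):
    """Inverse FFT with the 1/n scaling applied once at the top level."""
    n = len(a)
    return [v / n for v in fft(a, invert=True)]

def overlap_save(x, h, n_fft):
    """Full linear convolution of x with FIR h using overlap-save blocks of n_fft samples."""
    m = len(h)
    assert n_fft >= m and n_fft & (n_fft - 1) == 0, "n_fft must be a power of two >= len(h)"
    hop = n_fft - m + 1                    # valid output samples per block
    out_len = len(x) + m - 1
    n_blocks = -(-out_len // hop)          # ceiling division
    # m-1 leading zeros supply the "saved" history for the first block;
    # trailing zeros flush the filter tail and fill out the last block.
    padded = [0.0] * (m - 1) + list(x) + [0.0] * (n_blocks * hop - len(x))
    h_spec = fft([complex(v) for v in h] + [0j] * (n_fft - m))
    y = []
    for b in range(n_blocks):
        block = padded[b * hop : b * hop + n_fft]
        spec = fft([complex(v) for v in block])
        circ = ifft([p * q for p, q in zip(spec, h_spec)])
        y.extend(v.real for v in circ[m - 1:])   # drop the m-1 aliased samples
    return y[:out_len]

def direct_convolution(x, h):
    """Reference O(len(x) * len(h)) time-domain convolution."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            y[i + j] += xv * hv
    return y

x = [float(i % 7) - 3.0 for i in range(50)]
h = [0.5, -1.0, 0.25, 2.0, -0.75]
fast = overlap_save(x, h, n_fft=16)
slow = direct_convolution(x, h)
print(max(abs(a - b) for a, b in zip(fast, slow)))   # difference is at rounding-error level
```

A real implementation would use an optimized FFT library and much larger blocks; the block structure is the same.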