Rob Watts
Member of the Trade: Chord Electronics
This discussion has sent me rummaging and I came across: Efficient Convolution without Input/Output Delay
A motivation for this technique is room simulation, which can involve many seconds' worth of impulse response, resulting in a convolution using tens or hundreds of thousands of taps.
Direct multiply-accumulate convolution is extremely expensive in terms of mathematical operations, but it is very low in latency.
Ordinary FFT/inverse-FFT based techniques result in a huge delay between the input and output. The delay is reduced by working in blocks, but using more, smaller blocks results in more mathematical operations.
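To make the trade-off concrete, here is a minimal sketch in plain Python comparing direct multiply-accumulate convolution with block-based FFT convolution. It uses overlap-add, which is one common block scheme and not necessarily the one in the paper. The function names, the `block` parameter and the small recursive FFT are my own illustrative choices, not taken from the paper.

```python
import cmath

def direct_convolve(x, h):
    """Direct multiply-accumulate: each output uses only samples already received."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        acc = 0.0
        for k in range(len(h)):
            if 0 <= n - k < len(x):
                acc += h[k] * x[n - k]
        y[n] = acc
    return y

def fft(a, inverse=False):
    """Unnormalised recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def overlap_add_convolve(x, h, block):
    """Block FFT convolution: a whole block of `block` input samples
    must arrive before any of its output can be computed."""
    size = 1
    while size < block + len(h) - 1:
        size *= 2  # FFT length large enough to avoid circular wrap-around
    H = fft([complex(v) for v in h] + [0j] * (size - len(h)))
    y = [0.0] * (len(x) + len(h) - 1)
    for start in range(0, len(x), block):
        seg = x[start:start + block]
        X = fft([complex(v) for v in seg] + [0j] * (size - len(seg)))
        out = fft([a * b for a, b in zip(X, H)], inverse=True)
        for i in range(len(seg) + len(h) - 1):
            y[start + i] += out[i].real / size  # overlap-add the block tails
    return y
```

Both functions give the same output to within floating-point rounding. The difference is that the block version cannot emit anything until `block` samples have been collected, which is where the input/output delay comes from.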
The paper presents a hybrid of multiply-accumulate with block-based FFT processing to substantially reduce the mathematical cost, while at the same time keeping a reasonable delay. Fig 6 is quite an eye-opener.
Is this the technique you've implemented for convolution, Rob? Or is this technique too heavy on internal bandwidth/memory to be applicable in an FPGA?
No, I always use the direct form. Within an FPGA one is generally SRAM-limited, not multiplier-limited (there are plenty of DSPs and fabric to make your own DSP cores). Moreover, if I need a lower multiplier count, I use a folded FIR, which halves the number of multipliers but gives the identical result. The direct form also absolutely guarantees perfect transient reconstruction accuracy under all conditions.
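As a rough illustration of the folded FIR idea, here is a minimal Python sketch. It assumes the coefficients are symmetric (the post does not say so explicitly, but folding relies on it) and uses integer samples and coefficients to stand in for fixed-point hardware arithmetic. The function names and the multiplier count returned are my own illustrative choices, not a description of any Chord implementation.

```python
def taps(x, n, length):
    """Delay-line contents at output n: x[n], x[n-1], ... (zeros before the start)."""
    return [x[n - k] if 0 <= n - k < len(x) else 0 for k in range(length)]

def direct_fir(x, h):
    """Direct form: one multiply per coefficient per output sample."""
    y = []
    for n in range(len(x) + len(h) - 1):
        d = taps(x, n, len(h))
        y.append(sum(c * s for c, s in zip(h, d)))
    return y, len(h)  # (output, multiplies per output sample)

def folded_fir(x, h):
    """Folded form: add the two samples that share a coefficient, then multiply once."""
    assert list(h) == list(h)[::-1], "folding assumes symmetric coefficients"
    half = len(h) // 2
    y = []
    for n in range(len(x) + len(h) - 1):
        d = taps(x, n, len(h))
        acc = sum(h[k] * (d[k] + d[-1 - k]) for k in range(half))
        if len(h) % 2:
            acc += h[half] * d[half]  # centre tap of an odd-length filter
        y.append(acc)
    return y, half + len(h) % 2

# Hypothetical integer test data
x = [3, -1, 4, 1, -5, 9, 2, -6]
h = [1, -3, 5, -3, 1]
y_direct, mults_direct = direct_fir(x, h)
y_folded, mults_folded = folded_fir(x, h)
```

With integer arithmetic the two outputs are exactly equal, while the folded version needs N/2 multiplies per output for an even tap count N, or (N+1)/2 for an odd one.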
With WTA filters the group delay is determined by the filter needs; you simply can't remove the delay without affecting the filter performance and hence SQ.
Rob