Is a radix-4 implementation faster than an equivalently well-coded radix-2 FFT? And if so, why would it be faster?
It depends. Theoretically, you can save a few multiplies with a radix-4 FFT: it has 1/4 the number of butterflies, and each radix-4 butterfly needs 3 complex mpy + 8 complex adds (if properly structured), whereas each radix-2 butterfly needs 1 mpy + 2 adds.
So in terms of multiplies it's a bit better; however, there is higher complexity in terms of code structure, exception handling, coefficient management, register management, digit-reverse addressing, etc.
So it's only an advantage if the number of multiplies is the limiting factor, which on most hardware these days is not the case.
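To make the per-butterfly numbers above concrete, here is a small sketch that multiplies them out for a few power-of-4 sizes. It assumes exactly the costs quoted in this answer (radix-2: 1 mpy + 2 adds; radix-4: 3 mpy + 8 adds) and does not special-case trivial twiddles such as $W^0$; the function name `op_counts` is purely illustrative.

```python
import math


def op_counts(n):
    """Return {"radix2": (mpy, add), "radix4": (mpy, add)} for an n-point FFT.

    Assumed costs per butterfly (complex operations, trivial twiddles not skipped):
      radix-2: 1 mpy + 2 adds, n/2 butterflies per pass, log2(n) passes
      radix-4: 3 mpy + 8 adds, n/4 butterflies per pass, log4(n) passes
    """
    m = int(math.log2(n))
    if (1 << m) != n or m % 2:
        raise ValueError("n must be a power of 4")
    bf2 = (n // 2) * m          # total radix-2 butterflies
    bf4 = (n // 4) * (m // 2)   # total radix-4 butterflies (1/4 as many)
    return {"radix2": (bf2 * 1, bf2 * 2), "radix4": (bf4 * 3, bf4 * 8)}


for n in (16, 256, 4096):
    c = op_counts(n)
    print(n, "radix-2:", c["radix2"], "radix-4:", c["radix4"])
```

The multiply count drops by a factor of 3/4 while the add count is identical, which is why the advantage only matters when multiplies are the bottleneck.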
Here you can find an explanation of the main differences between the two FFT algorithms. The tables at the end of the document show that, as the data size increases, the radix-4 FFT performs better than the radix-2.
The link was broken for me, but I believe it is supposed to go here – CMH12 Nov 10 '23 at 13:53
@CMH12 link fixed :) – Leos313 Nov 13 '23 at 09:52
A simple way of looking at a radix-4 FFT is to think of one radix-4 butterfly as containing 4 radix-2 butterflies: 2 butterflies in one pass and 2 butterflies in the following pass. The twiddle factors are the same, except that the complex twiddle factors for the butterflies are off by a phase difference of $\frac{\pi}{2}$. But all that means is swapping $\sin(\cdot)$ with $\cos(\cdot)$ and swapping some plus and minus signs. So your radix-4 FFT algorithm only needs to read in the 4 complex values once, load the complex twiddle once, do a bunch of arithmetic, and store the 4 results once. You do one radix-4 pass and you accomplish the same task as two radix-2 passes.
The net number of multiplications and additions, I think, is the same, but the radix-4 butterfly can be done entirely in the processor's register bank (I think there are about 16 floating-point registers; you need 8 for the real and imaginary parts of the 4 values, 2 for the sine and cosine twiddles, and maybe another one or two for scratch). This is faster than doing it in memory.
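The "one radix-4 butterfly = two radix-2 passes" view can be sketched as follows. This is a minimal decimation-in-time formulation, assuming the twiddles `w1`, `w2`, `w3` (hypothetical parameter names) have already been chosen by the surrounding FFT; with all three equal to 1 it is just a 4-point DFT. The internal $\frac{\pi}{2}$ twiddle is multiplication by $-j$, done with a swap and a sign flip instead of a real multiply.

```python
import cmath


def mul_neg_j(z):
    # (x + jy) * (-j) = y - jx: swap real/imag parts and negate one, no multiplies
    return complex(z.imag, -z.real)


def radix4_butterfly(x0, x1, x2, x3, w1=1, w2=1, w3=1):
    """One radix-4 DIT butterfly (illustrative sketch).

    The 3 complex multiplies are the twiddles on x1, x2, x3; everything
    else is 8 complex adds plus a -j rotation.
    """
    a, b, c, d = x0, x1 * w1, x2 * w2, x3 * w3
    # First radix-2 "pass": two butterflies
    t0, t1 = a + c, a - c
    t2, t3 = b + d, b - d
    # Second radix-2 "pass": two butterflies, one with twiddle -j
    u = mul_neg_j(t3)
    return (t0 + t2, t1 + u, t0 - t2, t1 - u)


def dft(x):
    # Direct O(n^2) DFT used only as a reference
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


x = [1 + 2j, -0.5 + 1j, 3 - 1j, 0.25 + 0.5j]
print(all(abs(p - q) < 1e-12 for p, q in zip(radix4_butterfly(*x), dft(x))))
```

All four inputs are loaded once, combined in a handful of temporaries, and written once, which is the register-friendly property described above.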
The radix-2 FFT (fft-2) has $\log_2 n$ layers, while the radix-4 FFT (fft-4) has $\log_4 n = \frac{1}{2}\log_2 n$ layers, i.e. half as many. In each layer, fft-2 performs $\frac{n}{2}$ multiplications, and fft-4 performs $\frac{3}{4}n$ ($\frac{n}{4}$ butterflies with 3 twiddle multiplications each). Like fft-2, fft-4 takes advantage of the fact that multiplying by $j$ or $-j$ just swaps the real and imaginary parts and changes the sign accordingly. In total, fft-2 needs $\frac{1}{2} n \log_2 n$ multiplications, while fft-4 needs $\frac{3}{4}n \cdot \frac{1}{2}\log_2 n = \frac{3}{8} n \log_2 n$. The total number of additions is essentially the same (per layer fft-4 does twice as many, $2n$ versus $n$, but it has half as many layers). So fft-4 is slightly better.
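The counts above can be checked empirically by instrumenting two textbook recursive FFTs. This is a sketch, not an optimized implementation: twiddles are computed on the fly with `cmath.exp` (real code would precompute a table), every twiddle multiply is counted including the trivial $W^0$, and `OpCounter`, `twiddle`, `fft2` and `fft4` are illustrative names.

```python
import cmath


class OpCounter:
    """Hypothetical helper: tallies complex twiddle multiplies and butterfly adds."""
    def __init__(self):
        self.mpy = 0
        self.add = 0


def twiddle(k, n):
    # W_n^k = exp(-2*pi*j*k/n); computed directly, not counted as a multiply
    return cmath.exp(-2j * cmath.pi * k / n)


def fft2(x, cnt):
    """Recursive radix-2 DIT FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft2(x[0::2], cnt), fft2(x[1::2], cnt)
    out, h = [0j] * n, n // 2
    for k in range(h):
        t = twiddle(k, n) * odd[k]
        cnt.mpy += 1
        out[k], out[k + h] = even[k] + t, even[k] - t
        cnt.add += 2
    return out


def fft4(x, cnt):
    """Recursive radix-4 DIT FFT; len(x) must be a power of 4."""
    n = len(x)
    if n == 1:
        return list(x)
    q = n // 4
    s = [fft4(x[r::4], cnt) for r in range(4)]
    out = [0j] * n
    for k in range(q):
        a = s[0][k]
        b, c, d = (twiddle(r * k, n) * s[r][k] for r in (1, 2, 3))
        cnt.mpy += 3
        t0, t1, t2, t3 = a + c, a - c, b + d, b - d
        u = complex(t3.imag, -t3.real)  # -j * t3: swap parts, flip a sign
        out[k], out[k + q] = t0 + t2, t1 + u
        out[k + 2 * q], out[k + 3 * q] = t0 - t2, t1 - u
        cnt.add += 8
    return out


x = [complex(i % 5, -(i % 3)) for i in range(64)]
c2, c4 = OpCounter(), OpCounter()
X2, X4 = fft2(x, c2), fft4(x, c4)
print(c2.mpy, c4.mpy)   # 192 144
print(c2.add, c4.add)   # 384 384
print(max(abs(p - q) for p, q in zip(X2, X4)) < 1e-9)
```

For $n = 64$ the multiply counts match $\frac{1}{2} n \log_2 n = 192$ and $\frac{3}{8} n \log_2 n = 144$, and both versions perform the same total number of complex additions.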
In radix-2, the number of samples must be a power of 2, whereas in radix-4 the number of samples must be a power of 4.
I would suggest explaining why that has an effect on algorithm speed, which is not obvious from the exponent value. – MBaz Jun 23 '18 at 00:12
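The size restriction comes from each radix-4 stage consuming a factor of 4, so a pure radix-4 FFT of length $N = 2^m$ needs $m$ to be even. One common way to handle the other powers of two is a mixed-radix plan: $\lfloor m/2 \rfloor$ radix-4 stages plus one leftover radix-2 stage. The sketch below only computes that plan; `radix_plan` and `is_power_of_four` are hypothetical helper names.

```python
def is_power_of_four(n):
    # A power of two whose single set bit sits at an even bit position
    return n > 0 and (n & (n - 1)) == 0 and (n.bit_length() - 1) % 2 == 0


def radix_plan(n):
    """Split n = 2**m into radix-4 stages plus, if m is odd, one radix-2 stage."""
    if n < 1 or (n & (n - 1)) != 0:
        raise ValueError("n must be a power of two")
    m = n.bit_length() - 1          # n == 2**m
    return {"radix4_stages": m // 2, "radix2_stages": m % 2}


for n in (256, 512, 1024, 2048):
    print(n, is_power_of_four(n), radix_plan(n))
```

A pure radix-4 FFT only applies when `radix2_stages` is 0; otherwise the leftover factor of 2 has to be handled by a radix-2 stage.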