Can a FFT be decomposed into multiple parallel channels?

Question

Suppose I have a data stream that is sampled at a certain frequency, i.e. 1 GHz. I now want to perform a FFT on that data stream but since the hardware I have is limited to a maximum clock frequency of 250 MHz the data arrives in packets of 4. In other words, each clock-cycle, 4 different samples arrive. Is it possible to use 4 parallel FFT's to accomplish this?

Now I know that nothing can stop me from simply doing that, so my question is if I loose any precision in frequency? Each FFT only operates on 250 MHz so my fear is that the highest frequency that I can safely measure w.r.t. to Shannons's theorem is only 125 MHz compared to 500 MHz

Does this answer your question? Algorithms for computing FFT in parallel — Dan Boschen, Jan 05 '22 at 19:18

Marcus Müller · Accepted Answer · 2022-01-05T17:04:56.937

4

So, decomposing a large DFT into smaller DFTs, where some computations are redudant, is exactly the trick the Cooley-Tukey algorithm uses. So, your FFT is already doing that internally, and I doubt you'll write a better FFT. 4 parallel FFTs aren't sufficient - you need a butterfly structure afterwards to combine them again. (your considerations with the four 250 MHz streams¹ spots the fact that your original signal has a bandwidth of 1 GHz, so if you just do an FFT on every fourth sample, you'll have aliasing; the butterfly effectively is a mathematical way to cancel that aliasing using the info from the other three FFTs.)

You might want to read the Radix-2 section of the wikipedia page on the Cooley-Tukey algorithm.

¹ what's wrong about your consideration is that your Nyquist bandwidth == sampling rate for all practical purposes, since the DFT deals with complex signals, not only real signals.

edited Jan 05 '22 at 17:04

answered Jan 05 '22 at 16:58

Marcus Müller

30,525
4
34
58

Thank you for the answer! I should have elaborated more; I currently have an FFT implementation that I can use, it just doesn‘t allow me to enter samples parallel. So if I understand you correctly, what I need to do is add a final butterfly-stage at the end of the chain? Could I also instead simply double the length and pad with zeroes or some other trickery like that? – Schottky Jan 05 '22 at 18:09
@Marcua- but is the OP’s data real or complex? If the data is real and there indeed is only one real channel per ADC then I think this point is mute since the resulting data will be complex conjugate symmetric and the OP’s consideration is not wrong. Or am I wrong? – Dan Boschen Jan 05 '22 at 21:13
1

you're not wrong! Let me think about this, I think the cancellation becomes easier, because it just becomes swapping real and imaginary part, but you're rightt. – Marcus Müller Jan 05 '22 at 22:22

Schottky · Answer 2 · 2022-01-09T09:30:25.983

I have assembled a little python script that displays @Marcus's solution. I wanted to share this as proof of concept on how this works for anyone that is stuck on the same problem that I was.

The reason I need to use four separate channels is because I am working on an FPGA implementation that includes an FFT. I cannot choose an implementation that processes data in parallel but the data that I get always arrives in packets of 4.

Any improvements to this method are appreciated

from numpy.fft import fft, fftshift
import numpy as np
import matplotlib.pyplot as plt
size = 2048
n = np.arange(size)
x = np.e ** (-2j * np.pi * 0.3 * n) # some arbitrary data
N = 4 # the amount of channels
x_i = [x[idx::N] for idx in range(N)] # The data, rearranged for the channels
def bf_2(even, odd):
    """
    2-point butterfly implementation
    """
    points = len(even) + len(odd)
    def w(k):
        return np.e ** (-2j * np.pi * k / points)
    twiddles = w(np.arange(points/2))
    odd = odd * twiddles
    even_ret = even + odd
    odd_ret = even - odd
    return np.concatenate([even_ret, odd_ret])
def bf_4(x0, x1, x2, x3):
    """
    four-point butterfly implementation
    """
    res_1 = bf_2(x0, x2) # equivalent to bit-reverse ordering
    res_2 = bf_2(x1, x3) # equivalent to bit-reverse ordering
    return bf_2(res_1, res_2)
Compute four separate FFT's
A_i = map(fft, x_i)
And re-assemble them using the butterfly-structure
x_restored = bf_4(*A_i)
mag = np.abs(x_restored)
f = np.linspace(-0.5, 0.5, len(mag))
plt.plot(f, fftshift(mag))
plt.show()

Can a FFT be decomposed into multiple parallel channels?

2 Answers2

Compute four separate FFT's

And re-assemble them using the butterfly-structure