1

Suppose I have a data stream that is sampled at a certain frequency, i.e. 1 GHz. I now want to perform a FFT on that data stream but since the hardware I have is limited to a maximum clock frequency of 250 MHz the data arrives in packets of 4. In other words, each clock-cycle, 4 different samples arrive. Is it possible to use 4 parallel FFT's to accomplish this?

Now I know that nothing can stop me from simply doing that, so my question is if I loose any precision in frequency? Each FFT only operates on 250 MHz so my fear is that the highest frequency that I can safely measure w.r.t. to Shannons's theorem is only 125 MHz compared to 500 MHz

Schottky
  • 113
  • 4

2 Answers2

4

So, decomposing a large DFT into smaller DFTs, where some computations are redudant, is exactly the trick the Cooley-Tukey algorithm uses. So, your FFT is already doing that internally, and I doubt you'll write a better FFT. 4 parallel FFTs aren't sufficient - you need a butterfly structure afterwards to combine them again. (your considerations with the four 250 MHz streams¹ spots the fact that your original signal has a bandwidth of 1 GHz, so if you just do an FFT on every fourth sample, you'll have aliasing; the butterfly effectively is a mathematical way to cancel that aliasing using the info from the other three FFTs.)

You might want to read the Radix-2 section of the wikipedia page on the Cooley-Tukey algorithm.


¹ what's wrong about your consideration is that your Nyquist bandwidth == sampling rate for all practical purposes, since the DFT deals with complex signals, not only real signals.

Marcus Müller
  • 30,525
  • 4
  • 34
  • 58
  • Thank you for the answer! I should have elaborated more; I currently have an FFT implementation that I can use, it just doesn‘t allow me to enter samples parallel. So if I understand you correctly, what I need to do is add a final butterfly-stage at the end of the chain? Could I also instead simply double the length and pad with zeroes or some other trickery like that? – Schottky Jan 05 '22 at 18:09
  • @Marcua- but is the OP’s data real or complex? If the data is real and there indeed is only one real channel per ADC then I think this point is mute since the resulting data will be complex conjugate symmetric and the OP’s consideration is not wrong. Or am I wrong? – Dan Boschen Jan 05 '22 at 21:13
  • 1
    you're not wrong! Let me think about this, I think the cancellation becomes easier, because it just becomes swapping real and imaginary part, but you're rightt. – Marcus Müller Jan 05 '22 at 22:22
0

I have assembled a little python script that displays @Marcus's solution. I wanted to share this as proof of concept on how this works for anyone that is stuck on the same problem that I was.

The reason I need to use four separate channels is because I am working on an FPGA implementation that includes an FFT. I cannot choose an implementation that processes data in parallel but the data that I get always arrives in packets of 4.

Any improvements to this method are appreciated

from numpy.fft import fft, fftshift
import numpy as np
import matplotlib.pyplot as plt

size = 2048 n = np.arange(size) x = np.e ** (-2j * np.pi * 0.3 * n) # some arbitrary data N = 4 # the amount of channels x_i = [x[idx::N] for idx in range(N)] # The data, rearranged for the channels

def bf_2(even, odd): """ 2-point butterfly implementation """ points = len(even) + len(odd) def w(k): return np.e ** (-2j * np.pi * k / points) twiddles = w(np.arange(points/2)) odd = odd * twiddles even_ret = even + odd odd_ret = even - odd return np.concatenate([even_ret, odd_ret])

def bf_4(x0, x1, x2, x3): """ four-point butterfly implementation """ res_1 = bf_2(x0, x2) # equivalent to bit-reverse ordering res_2 = bf_2(x1, x3) # equivalent to bit-reverse ordering return bf_2(res_1, res_2)

Compute four separate FFT's

A_i = map(fft, x_i)

And re-assemble them using the butterfly-structure

x_restored = bf_4(*A_i)

mag = np.abs(x_restored) f = np.linspace(-0.5, 0.5, len(mag)) plt.plot(f, fftshift(mag)) plt.show()

Schottky
  • 113
  • 4