0

I'm doing a Machine Learning project that incorporates many DSP elements in feature extraction. I am looking at the STFT of a piece of audio and trying to search for specific spectral features within it - for example, the exponential decay of the fundamental frequency of a kick drum - to determine what kind of sound the piece of audio is.

One thing I've looked into is finding the frequencies of all the local maxima of a DFT, unquantized. This includes being able to identify when the peak is "between" bins based on contextual information. A possible path to go down is to find all of the changes in direction within the spectrum, and then for N changes fit an $(N+1)$-degree polynomial using a regression, then finding all the zeroes of the derivative of this polynomial.

But does an algorithm for this already exist? It seems like something that might have been done before.

NmdMystery
  • 125
  • 5

1 Answers1

1

That sounds a lot like you shouldn't be using the DFT, but rather a parametric spectrum estimator.

For example, there's spectrum estimators that actually give you a function to evaluate at real-valued points rather than a vector of a sampled spectrum (your sentence "DFT, unquantized" doesn't make all that sense to me – the DFT is discrete, inherently, so what you describe as search algorithm only works if you do a massive DFT and then start somewhere in the and look for peaks from these points on).

My go-to example here is either

  • ESPRIT, which, given knowledge about the number of significant signals will just give you a vector of real-valued frequencies of peaks or
  • ROOT-MUSIC (dunno if that's the official term, it's what we used, based on [1]) which uses the pseudospectrum function generated by the MUSIC algorithm and finds the signals by looking for roots of a specifically crafted polynomial [1].

[1]: Barabell, Arthur J.: Improving the resolution performance of eigenstructure-based direction-finding algorithms. In: Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP ’83., Band 8, April 1983, available online.

Marcus Müller
  • 30,525
  • 4
  • 34
  • 58
  • I will definitely look into those methods, but would upsampling the FFT also be sufficient, or are there problems with that method? I know you can upsample by zero-padding the time signal, and that will give me the peaks as estimated by an N-FFT but with additional resolution between bins. – NmdMystery Mar 29 '17 at 17:45
  • well aside from the additional computational load of doing the larger FFT and searching for a max in more bins, and the question of how much that'll actually increase your accuracy (which does incorporate your signal model)... – Marcus Müller Mar 29 '17 at 18:34
  • So, I think I should've spelled this out more clearly: Zero-padding doesn't actually increase resolution, in interpolates. See: https://dsp.stackexchange.com/questions/37927/what-happens-when-n-increases-in-n-point-dft/37931#37931 – Marcus Müller Mar 30 '17 at 12:58
  • Right, the goal here is just to find the maximum from the information I have so interpolation is fine. I just want to avoid discretized frequencies in my model. – NmdMystery Mar 30 '17 at 20:08
  • @NmdMystery that's contradicting the fact that you're using DFT – discrete fourier transform. If you can't have discretized frequencies, you must not use the DFT, padded or not. – Marcus Müller Mar 30 '17 at 23:18
  • Sorry I meant quantized. If the peak is between bins, I don't want to just choose the center frequency of the bin. I want to interpolate to some higher sampling rate of the spectrum to avoid significant stair-stepping of the peaks. – NmdMystery Mar 31 '17 at 02:23
  • Discrete is a special case of quantized. And I stand by: you're really not looking for a longer dft! – Marcus Müller Mar 31 '17 at 07:30