Inverse STFT causing unwanted noise spikes?

Question

I have an audio signal. I apply an STFT frame by frame, apply a magnitude threshold, and then do an inverse STFT to reconstruct the signal. However, I get small unwanted noise spikes throughout the reconstructed audio.

In the time-frequency domain, I simply zero a bin if its magnitude exceeds a certain threshold. I also apply a Hann (Hanning) window before the STFT. What is going wrong?

The idea is roughly this:

for (int i = 0; i < total_samples; i += hopsize) {
    for (int j = 0; j < frame.size(); j++) {
        apply_hanning_window(frame[j]);
    }
    stft_frame = apply_STFT(frame);
    reconstructed_overlapping_frame = apply_INVERSE_STFT(stft_frame);

    reconstructed_audio.push(reconstructed_overlapping_frame);
}

apply_INVERSE_STFT(stft_frame) {
    for (int k = 0; k < stft_frame.size(); k++) {
        // magnitude threshold: zero any bin whose magnitude exceeds it
        if (stft_frame[k].magnitude > threshold) stft_frame[k] = 0;
        // a frequency threshold could also be applied here
    }
    filtered_frame = do_INVERSE_STFT(stft_frame);
    return filtered_frame;
}

Please let me know if I am doing anything wrong. Thanks.

score 3 · Answer 1 · answered Feb 02 '22 at 20:23

You are creating time domain aliasing.

For a static filter you would need to zero pad and use overlap-add or overlap-save. For a time variant filter, you should decrease hop size and do square root windowing on both the forward and inverse transform.

In general, zeroing lines in the spectrum is a bad idea (see Why is it a bad idea to filter by zeroing out FFT bins?), and time-variant filtering in the frequency domain is complicated and will require some trade-offs and tweaking.
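As a rough illustration of the square-root windowing suggestion, the sketch below checks that a periodic square-root Hann window, applied once before the forward transform and once after the inverse transform, overlap-adds to unity gain at 50% overlap. The helper names (`sqrt_hann`, `overlap_add_gain`, `max_gain_error`) are hypothetical and not from the question; this only verifies the window condition and does not perform any filtering.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Periodic square-root Hann window of length n (hypothetical helper).
std::vector<double> sqrt_hann(std::size_t n) {
    const double pi = std::acos(-1.0);
    std::vector<double> w(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Periodic Hann is 0.5 * (1 - cos(2*pi*i/n)); take its square root.
        const double phase = 2.0 * pi * static_cast<double>(i) / static_cast<double>(n);
        w[i] = std::sqrt(0.5 * (1.0 - std::cos(phase)));
    }
    return w;
}

// Overlap-add (analysis window * synthesis window) over `frames` frames and
// return the accumulated gain at every output sample.
std::vector<double> overlap_add_gain(const std::vector<double>& w,
                                     std::size_t hop, std::size_t frames) {
    std::vector<double> gain(hop * (frames - 1) + w.size(), 0.0);
    for (std::size_t f = 0; f < frames; ++f) {
        for (std::size_t i = 0; i < w.size(); ++i) {
            gain[f * hop + i] += w[i] * w[i];  // same window used twice
        }
    }
    return gain;
}

// Largest deviation from unity gain, skipping the first and last n samples
// where fewer frames overlap.
double max_gain_error(const std::vector<double>& gain, std::size_t n) {
    double err = 0.0;
    for (std::size_t i = n; i + n < gain.size(); ++i) {
        err = std::max(err, std::fabs(gain[i] - 1.0));
    }
    return err;
}
```

With a frame size of 1024 and a hop of 512, `max_gain_error(overlap_add_gain(sqrt_hann(1024), 512, 8), 1024)` should come out at rounding-error level. With a hop of 256 the same window sums to a constant 2 instead, so the output would need to be scaled by 0.5.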

Hilmar

It would be nice if you could elaborate a little. I understand that zeroing frequency bins can create beats if the range is high (few sample sizes). Currently, I apply the Hann window before the STFT, and after the inverse STFT I simply add the overlapping frame values. I checked that if I remove the threshold filter, the reconstructed audio is fine, so does that mean the windowing and overlap-adding parts are OK? I don't understand how reducing the hop size helps, or why I specifically have to use square-root windowing. And where should I zero pad? Thanks for the reply. – Nafiul Alam Fuji Feb 02 '22 at 21:06
My audio is mono, 16-bit, 16 kHz. My Hann window uses 50% overlap, with a frame size of 1024 samples and a hop size of 512 samples. – Nafiul Alam Fuji Feb 02 '22 at 21:29
Time variant frequency domain filtering is mathematically very complicated. You need to be able to manage time domain aliasing which is a function of how aggressive your filters are and how fast they change over time. – Hilmar Feb 02 '22 at 21:39
Since the reconstructed audio is fine without thresholding, does that mean the windowing function and the inverse STFT overlap-add are working? – Nafiul Alam Fuji Feb 02 '22 at 21:42
This only works if you don't do anything with the data in the frequency domain. As soon as you modify the spectrum in any way, you will get time domain aliasing – Hilmar Feb 02 '22 at 21:51
Is there any way I can detect and compensate for the aliasing in the time domain, or should I try a different approach in the frequency domain instead of zeroing? I am already using window functions and frame sizes that are powers of 2, so what more can I do or try to reduce this aliasing? – Nafiul Alam Fuji Feb 02 '22 at 21:56
Are you familiar with the concepts in here: https://ccrma.stanford.edu/~jos/sasp/sasp.html ? He covers most of the topics you will need to implement your application. – Hilmar Feb 02 '22 at 23:21
One more thing: your current audio is NOT "fine". It may sound fine, but if you compare input to output you will see significant differences. A 50% overlapping Hanning window is NOT perfect reconstruction. – Hilmar Feb 02 '22 at 23:29
So what would you suggest? Much lower overlap? The link you shared covers detailed spectral analysis of audio signals. Are you saying I have to read all of that first? (Thanks for the link.) It took me more than a week to grasp the Fourier transform, and now I am fearful about the spectral analysis link you just shared :3 – Nafiul Alam Fuji Feb 03 '22 at 02:07

hotpaw2 · Answer 2 · 2022-02-03T01:30:52.647

2

You are being bitten by circular convolution. The impulse response of zeroing a few bins is quite long, say of length M. The result of a linear convolution has length N+M-1, and that does NOT fit in a length N IFFT result. So the tail of the convolution wraps around and corrupts the beginning of each frame (and maybe the rest of it as well), which produces discontinuities between frames.

Try overlap-add/save/scrap FFT/IFFT filtering instead. See this question for some details. And you will need to determine the length of the impulse response of your frequency domain changes.
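To make the wrap-around concrete, here is a minimal sketch that filters one frame by multiplying spectra, once with enough zero padding and once without. It assumes a short, known impulse response `h` purely for illustration; zeroing bins corresponds to a much longer one. The naive `dft` helper and the names `fft_filter` and `linear_convolve` are hypothetical stand-ins, not a real FFT library.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Naive O(n^2) DFT: sign = -1 is the forward transform, sign = +1 is the
// inverse transform (scaled by 1/n). Stand-in for a real FFT routine.
std::vector<cd> dft(const std::vector<cd>& x, int sign) {
    const double pi = std::acos(-1.0);
    const std::size_t n = x.size();
    std::vector<cd> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cd acc = 0.0;
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = sign * 2.0 * pi * double((k * t) % n) / double(n);
            acc += x[t] * std::polar(1.0, angle);
        }
        out[k] = (sign > 0) ? acc / double(n) : acc;
    }
    return out;
}

// Filter x with h by multiplying spectra of length fft_len.
// Assumes fft_len >= x.size() and fft_len >= h.size().
// If fft_len < x.size() + h.size() - 1, the tail wraps around to the start.
std::vector<double> fft_filter(const std::vector<double>& x,
                               const std::vector<double>& h,
                               std::size_t fft_len) {
    std::vector<cd> xp(fft_len, 0.0), hp(fft_len, 0.0);  // zero padded
    for (std::size_t i = 0; i < x.size(); ++i) xp[i] = x[i];
    for (std::size_t i = 0; i < h.size(); ++i) hp[i] = h[i];
    std::vector<cd> X = dft(xp, -1);
    const std::vector<cd> H = dft(hp, -1);
    for (std::size_t k = 0; k < fft_len; ++k) X[k] *= H[k];
    const std::vector<cd> y = dft(X, +1);
    std::vector<double> out(fft_len);
    for (std::size_t i = 0; i < fft_len; ++i) out[i] = y[i].real();
    return out;
}

// Reference: direct linear convolution of length x.size() + h.size() - 1.
std::vector<double> linear_convolve(const std::vector<double>& x,
                                    const std::vector<double>& h) {
    std::vector<double> y(x.size() + h.size() - 1, 0.0);
    for (std::size_t i = 0; i < x.size(); ++i)
        for (std::size_t j = 0; j < h.size(); ++j)
            y[i + j] += x[i] * h[j];
    return y;
}
```

With N = 8 and M = 3, calling `fft_filter(x, h, 16)` should match `linear_convolve(x, h)` over its 10 samples, while `fft_filter(x, h, 8)` adds the last two samples of the linear result onto its first two outputs. In an overlap-add scheme the padded tail is added into the next block rather than being folded back into the current one.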

edited Feb 03 '22 at 01:30

answered Feb 03 '22 at 01:25

hotpaw2

So, after taking the STFT of 1024 real samples, if I apply a magnitude threshold (let's say I zero all indexes greater than 512, since those frequencies are duplicates) and then do an inverse FFT, will the resulting array size be 1024+512-1? Currently, I use 50% overlap of real frames (1024 samples), apply the window function, and when reconstructing after thresholding I just add the overlapping values of the inverse FFT. So this is not overlap-add? Instead, do I have to add extra zeros to the real samples, then apply the window function, apply the FFT, and do the inverse FFT after thresholding? – Nafiul Alam Fuji Feb 03 '22 at 03:02
If you zero half your array, it is no longer Hermitian symmetric, and thus the IFFT no longer represents real data. I don't think you want an imaginary result. You also can't filter without a lot of zero (or scrap) padding before the first FFT. – hotpaw2 Feb 06 '22 at 06:09
Suppose I add 256 zeros in front of and behind my 1024 samples (so my actual samples are between 256 and 1536) and then do the FFT. If I use a static filter to scale or zero some bins and then do the inverse FFT, will I get back my desired data at indexes 256 to 1536? Is this where zero padding comes into play? – Nafiul Alam Fuji Feb 06 '22 at 10:57
How to zero pad and do overlapping block convolution might be a good separate question, not just a comment. – hotpaw2 Feb 06 '22 at 18:22
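The Hermitian symmetry point from the comments above can be checked with a small sketch: zeroing a bin together with its mirror bin keeps the inverse transform real, whereas zeroing only the upper half of the spectrum leaves a complex result. The naive `dft` and the helpers `zero_bin_pair` and `max_imag` are hypothetical names used for illustration, and a 16-point test signal is assumed instead of a 1024-sample frame.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Naive O(n^2) DFT: sign = -1 is forward, sign = +1 is inverse (scaled by 1/n).
std::vector<cd> dft(const std::vector<cd>& x, int sign) {
    const double pi = std::acos(-1.0);
    const std::size_t n = x.size();
    std::vector<cd> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cd acc = 0.0;
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = sign * 2.0 * pi * double((k * t) % n) / double(n);
            acc += x[t] * std::polar(1.0, angle);
        }
        out[k] = (sign > 0) ? acc / double(n) : acc;
    }
    return out;
}

// Zero bin k together with its mirror bin (n - k) mod n, so the spectrum of
// a real signal stays Hermitian symmetric.
void zero_bin_pair(std::vector<cd>& spec, std::size_t k) {
    const std::size_t n = spec.size();
    spec[k] = 0.0;
    spec[(n - k) % n] = 0.0;
}

// Largest imaginary-part magnitude in a sequence; close to zero means the
// sequence is effectively real.
double max_imag(const std::vector<cd>& x) {
    double m = 0.0;
    for (const cd& v : x) m = std::max(m, std::fabs(v.imag()));
    return m;
}
```

For a real input with energy in bins 3 and 5, `zero_bin_pair(spec, 3)` followed by the inverse transform should give a negligible `max_imag`, while zeroing bins 9 through 15 only should not. Any per-bin gain or threshold decision therefore needs to be applied identically to both bins of a mirror pair.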
