PSD to FFT for audio

Question

I'm trying to convert a noisy input to an FFT (an STFT spectrogram) and extract the signal using a local average.

After the FFT, I found that the values at adjacent times for the same frequency, and at adjacent frequencies for the same time, are often negatively correlated. The absolute values of the FFT, however, turned out to be positively correlated. That raises the question of whether this absolute value can be converted back to an FFT. Is it possible?

It doesn't have to reproduce exactly the same FFT, but humans shouldn't be able to tell the two sounds apart. For example, if the FFT is treated as a floating-point matrix of (real, imaginary) pairs, multiplying it by a rotation matrix seems to give a sound that is hard to distinguish from the original.

import scipy.io.wavfile as wav
import scipy.signal
import numpy as np


def fft_to_psd_to_fft(fft):
    # fft to psd
    psd = abs(fft)
    # psd to fft
    ...  # what I want to know

    return fft


def rotation_transform(fft):
    # rotate every (real, imag) pair by the same random angle theta
    theta = np.random.randn()
    rot_fft = np.einsum("FTC,Cc->FTc", np.stack([fft.real, fft.imag], axis=-1), np.asarray(
        [[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]]
    ))
    return rot_fft[..., 0] + 1j * rot_fft[..., 1]


# wav to fft
samplerate, samples = wav.read(file_input_path)
frequencies, times, spectrogram = scipy.signal.stft(samples, samplerate, nperseg=511)

# what I want to know
spectrogram = fft_to_psd_to_fft(spectrogram)

# example case of using rotation matrix
spectrogram = rotation_transform(spectrogram)

# fft to wav (nperseg has to match the forward stft)
r_times, r_samples = scipy.signal.istft(spectrogram, samplerate, nperseg=511)
data = np.clip(r_samples, -2**31, 2**31 - 1)
data = data.astype(np.int32)
wav.write(file_output_path, samplerate, data)
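
The rotation in `rotation_transform` can be checked numerically. The sketch below assumes a small random complex array in place of a real spectrogram and passes `theta` in explicitly (both are illustrative choices, not part of the code above). It suggests that the rotation is the same as multiplying every bin by one constant phase factor `exp(-1j * theta)`, so the magnitude is untouched and only a common phase offset is added.

```python
import numpy as np


def rotation_transform(fft, theta):
    # Same einsum as in the question, with theta as an explicit argument (assumption).
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]])
    parts = np.stack([fft.real, fft.imag], axis=-1)
    out = np.einsum("FTC,Cc->FTc", parts, rot)
    return out[..., 0] + 1j * out[..., 1]


# Stand-in for a spectrogram: random complex values, shape (frequencies, times).
rng = np.random.default_rng(0)
fake_stft = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))
theta = 0.7

rotated = rotation_transform(fake_stft, theta)

# The rotation equals one global phase factor applied to every bin.
same_as_phase_factor = np.allclose(rotated, fake_stft * np.exp(-1j * theta))
# The magnitude (what abs(fft) keeps) is unchanged by it.
magnitude_unchanged = np.allclose(np.abs(rotated), np.abs(fake_stft))
```

This is a much milder change than `abs(fft)`, which discards a different phase in every bin rather than shifting all bins by the same angle.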

Your PSD (power spectral density) is, strictly speaking, an amplitude spectrum, not a power spectrum: power is (proportional to) magnitude squared, not just magnitude. — Marcus Müller, May 28 '22 at 11:25
This addresses what you want to do. The bad news is that taking the absolute value simply erases all the phase information. It's gone and cannot be recovered. There are more or less great algorithms to estimate the phase from context (in the case of the question I've linked to, the overlap between multiple abs(FFT)s), but "not noticeable by humans" is probably a bit too high of a requirement. I'd aim for "voice can still be understood"… — Marcus Müller, May 28 '22 at 11:28
Unfortunately, although I've used MATLAB before, I can't follow it now. However, this is exactly what I want to know. I knew that replacing a complex number with a real number loses information. However, I learned that there is not yet any good function to convert amplitude back to an FFT. Thank you. — tsp, May 28 '22 at 14:20
That's not what I wrote. There is information loss, but there are good methods to estimate the phase based on context. You should really read the answer that I linked to, and the three papers linked from that answer. — Marcus Müller, May 28 '22 at 14:25
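
Estimating phase from the overlap between neighbouring STFT frames, as described in the comments above, can be sketched with the Griffin-Lim iteration. This is a well-known technique swapped in for illustration; it is not confirmed to be the method from the linked answer. The helper name `griffin_lim`, its parameters `n_iter` and `seed`, and the 440 Hz test tone are all assumptions.

```python
import numpy as np
import scipy.signal


def griffin_lim(magnitude, samplerate, nperseg=512, n_iter=30, seed=0):
    """Estimate a complex STFT whose magnitude equals `magnitude` (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Start from uniformly random phase; only the magnitude is known.
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        # Going to the time domain and back makes overlapping frames agree.
        _, x = scipy.signal.istft(magnitude * phase, samplerate, nperseg=nperseg)
        _, _, spec = scipy.signal.stft(x, samplerate, nperseg=nperseg)
        # Keep the new phase, discard the new magnitude.
        n = min(spec.shape[1], magnitude.shape[1])
        phase[:, :n] = np.exp(1j * np.angle(spec[:, :n]))
    return magnitude * phase


# Assumed test input: one second of a 440 Hz tone sampled at 8 kHz.
samplerate = 8000
t = np.arange(samplerate) / samplerate
tone = np.sin(2 * np.pi * 440 * t)

_, _, original = scipy.signal.stft(tone, samplerate, nperseg=512)
estimate = griffin_lim(np.abs(original), samplerate)
_, reconstructed = scipy.signal.istft(estimate, samplerate, nperseg=512)
```

The returned STFT has the given magnitude by construction, but its phase is only an estimate, so `reconstructed` will generally differ from `tone` sample by sample even where the two sound alike. How close the result sounds depends on the signal, the overlap and the number of iterations.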
