Is there an implementation available somewhere for reconstructing audio from a spectrogram (e.g. based on this approach)?

I could not find anything, but I think there should be an implementation somewhere. Where can I find it?

Thank you

Kevin Meier

1 Answer

Here is an example in Matlab that I found some years ago. It has apparently been ported to Python by one of us (STFT ISTFT Matlab Python).

function d = stft(x, f, w, h)
% D = stft(X, F, W, H)                            Short-time Fourier transform.
%   Returns some frames of short-term Fourier transform of x.  Each 
%   column of the result is one F-point fft; each successive frame is 
%   offset by H points until X is exhausted.  Data is Hann-windowed 
%   at W pts.
%   See also 'istft.m'.
% dpwe 1994may05.  Uses built-in 'fft'
% $Header: /homes/dpwe/public_html/resources/matlab/RCS/stft.m,v 1.1 2002/02/13 16:15:55 dpwe Exp $

s = length(x);

if rem(w, 2) == 0   % force window to be odd-len
  w = w + 1;
end

halflen = (w-1)/2;
halff = f/2;   % midpoint of win
acthalflen = min(halff, halflen);

halfwin = 0.5 * ( 1 + cos( pi * (0:halflen)/halflen));
win = zeros(1, f);
win((halff+1):(halff+acthalflen)) = halfwin(1:acthalflen);
win((halff+1):-1:(halff-acthalflen+2)) = halfwin(1:acthalflen);

c = 1;

% pre-allocate output array
d = zeros((1+f/2),1+fix((s-f)/h));

for b = 0:h:(s-f)
  u = win.*x((b+1):(b+f));   % x must be a row vector
  t = fft(u);
  d(:,c) = t(1:(1+f/2))';    % ' is the conjugate transpose; istft undoes it
  c = c+1;
end;

and the inversion:

function x = istft(d, ftsize, w, h)
% X = istft(D, F, W, H)                   Inverse short-time Fourier transform.
%   Performs overlap-add resynthesis from the short-time Fourier transform 
%   data in D.  Each column of D is taken as the result of an F-point 
%   fft; each successive frame was offset by H points. Data is 
%   Hann-windowed at W pts.
% dpwe 1994may24.  Uses built-in 'ifft' etc.
% $Header: /homes/dpwe/public_html/resources/matlab/RCS/istft.m,v 1.1 2002/02/13 16:16:12 dpwe Exp $

s = size(d);
%if s(1) != (ftsize/2)+1
%  error('number of rows should be fftsize/2+1')
%end

cols = s(2);
xlen = ftsize + cols * (h);
x = zeros(1,xlen);

if rem(w, 2) == 0   % force window to be odd-len
  w = w + 1;
end

win = zeros(1, ftsize);

halff = ftsize/2;   % midpoint of win
halflen = (w-1)/2;
acthalflen = min(halff, halflen);

halfwin = 0.5 * ( 1 + cos( pi * (0:halflen)/halflen));
win((halff+1):(halff+acthalflen)) = halfwin(1:acthalflen);
win((halff+1):-1:(halff-acthalflen+2)) = halfwin(1:acthalflen);

for b = 0:h:(h*(cols-1))
  ft = d(:,1+b/h)';
  ft = [ft, conj(ft([((ftsize/2)):-1:2]))];
  px = real(ifft(ft));
  x((b+1):(b+ftsize)) = x((b+1):(b+ftsize))+px.*win;
end;
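
One detail worth checking before relying on the round trip: the window is applied once in stft and once more in istft, and istft does not divide by the overlap-add sum of the squared window. The sketch below is a Python transcription of the window construction only (the helper names `dpwe_window` and `ola_gain` are mine, not part of the original code). It suggests that with f = w = 256 and h = 64 the squared windows add up to a constant 1.5, so istft(stft(x)) should come back as roughly 1.5*x away from the edges.

```python
import math


def dpwe_window(f, w):
    """Rebuild the window from the Matlab code above, using 0-based indices."""
    if w % 2 == 0:          # force window to be odd-length, as in the original
        w += 1
    halflen = (w - 1) // 2
    halff = f // 2          # midpoint of the window
    act = min(halff, halflen)
    halfwin = [0.5 * (1 + math.cos(math.pi * k / halflen)) for k in range(halflen + 1)]
    win = [0.0] * f
    for k in range(act):
        win[halff + k] = halfwin[k]   # right half, peak at the midpoint
        win[halff - k] = halfwin[k]   # mirrored left half
    return win


def ola_gain(win, hop):
    """Sum of win^2 over all frames that cover one sample, per offset within a hop."""
    return [sum(win[t + j] ** 2 for j in range(0, len(win), hop)) for t in range(hop)]


win = dpwe_window(256, 256)
gains = ola_gain(win, 64)
print(min(gains), max(gains))  # both very close to 1.5
```

If an exact round trip matters, dividing the output of istft by this gain (or by the running sum of the squared window) is one way to compensate.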

EDIT: if you want to start from the STFT magnitude only, you run into the more complex problem of phase retrieval. A few papers to start from:
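
To make the phase retrieval problem concrete, here is a rough sketch of the classic Griffin-Lim iteration, which is one common baseline and not necessarily the method of the papers referred to here. It alternates between imposing the known magnitude and re-estimating the phase from the STFT of the current signal. Everything in it is an assumption for illustration: the function names, the periodic Hann window, the tiny frame size, and the naive DFT from the standard library (used only to keep the example dependency-free; a real implementation would use an FFT). Note that this `istft` normalizes by the summed squared window, unlike the Matlab version above.

```python
import cmath
import math
import random


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def stft(x, n, hop):
    """Hann-windowed DFT of each frame, keeping bins 0..n/2 (naive O(n^2) DFT)."""
    win = hann(n)
    frames = []
    for start in range(0, len(x) - n + 1, hop):
        seg = [win[t] * x[start + t] for t in range(n)]
        frames.append([
            sum(seg[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)
        ])
    return frames


def istft(frames, n, hop):
    """Windowed overlap-add, normalized by the running sum of win^2."""
    win = hann(n)
    length = n + hop * (len(frames) - 1)
    out = [0.0] * length
    norm = [0.0] * length
    for i, half in enumerate(frames):
        # Rebuild the full spectrum of a real frame by Hermitian symmetry.
        full = list(half) + [half[n - k].conjugate() for k in range(n // 2 + 1, n)]
        for t in range(n):
            val = sum(full[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            out[i * hop + t] += win[t] * val
            norm[i * hop + t] += win[t] ** 2
    return [o / w if w > 1e-12 else 0.0 for o, w in zip(out, norm)]


def spectral_error(est, mag):
    """Relative distance between |est| and the target magnitudes."""
    num = sum((abs(e) - m) ** 2 for fe, fm in zip(est, mag) for e, m in zip(fe, fm))
    den = sum(m ** 2 for fm in mag for m in fm)
    return math.sqrt(num / den)


def griffin_lim(mag, n, hop, iters=20, seed=0):
    """Estimate a signal whose STFT magnitude is close to mag."""
    rng = random.Random(seed)
    # Start from the target magnitude with random phase.
    spec = [[m * cmath.exp(2j * math.pi * rng.random()) for m in fm] for fm in mag]
    errors = []
    for _ in range(iters):
        x = istft(spec, n, hop)      # back to a time-domain signal
        est = stft(x, n, hop)        # its actual (consistent) STFT
        errors.append(spectral_error(est, mag))
        # Keep the estimated phase, restore the known magnitude.
        spec = [[m * (e / abs(e) if abs(e) > 1e-12 else 1.0) for e, m in zip(fe, fm)]
                for fe, fm in zip(est, mag)]
    return istft(spec, n, hop), errors


# Demo: discard the phase of a two-tone signal, then try to recover a signal.
n, hop = 32, 8
signal = [math.sin(2 * math.pi * 0.07 * t) + 0.5 * math.sin(2 * math.pi * 0.19 * t)
          for t in range(256)]
target = [[abs(c) for c in frame] for frame in stft(signal, n, hop)]
recon, errors = griffin_lim(target, n, hop)
print(errors[0], errors[-1])  # the mismatch should shrink over the iterations
```

The result matches the spectrogram only approximately, and the waveform can differ from the original (for instance in sign), since the magnitude alone does not pin the phase down.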

Laurent Duval
  • Is this really what the OP asked for? I understood the question as asking how to reconstruct the signal from the spectrogram, which usually means the magnitude of the STFT. – Jazzmaniac Feb 24 '17 at 19:38