22

I want to implement the Fast Cosine Transform. I read on wikipedia, that there is a fast version of the DCT which is similarly computed to the FFT. I tried to read the cited Makhoul* paper, for the FTPACK and FFTW implementations that are also used in Scipy, but I were not able to extract the actually algorithm. This is what I have so far:

FFT code:

def fft(x):
    if x.size ==1:
        return x
    N = x.size
    x0 = my_fft(x[0:N:2])
    x1 = my_fft(x[0+1:N:2])
    k = numpy.arange(N/2)
    e = numpy.exp(-2j*numpy.pi*k/N)
    l = x0 + x1 * e
    r = x0 - x1 * e  
    return numpy.hstack([l,r])

DCT code:

def dct(x):
    k = 0
    N = x.size
    xk = numpy.zeros(N)
    for k in range(N):     
        for n in range(N):
            xn = x[n]
            xk[k] += xn*numpy.cos(numpy.pi/N*(n+1/2.0)*k)
    return xk 

FCT trial:

def my_fct(x):
    if x.size ==1:
        return x
    N = x.size
    x0 = my_fct(x[0:N:2]) # have to be set to zero?
    x1 = my_fct(x[0+1:N:2])
    k = numpy.arange(N/2)
    n = # ???
    c = numpy.cos(numpy.pi/N*(n+1/2.0)*k)
    l = x0 #???
    r = x0 #???
    return numpy.hstack([l,r])

*J. Makhoul, "A fast cosine transform in one and two dimensions," IEEE Trans. Acoust. Speech Sig. Proc. 28 (1), 27-34 (1980).

jojeck
  • 11,107
  • 6
  • 38
  • 74
Framester
  • 321
  • 1
  • 2
  • 4

5 Answers5

31

I've been reading about this and there are multiple ways to do it, using different size N. My Matlab is rusty, so here they are in Python (N is length of input signal x, k is arange(N) = $[0, 1, 2, ..., N-1]$):

Type 2 DCT using 4N FFT and no shifts

Signal [a, b, c, d] becomes

[0, a, 0, b, 0, c, 0, d, 0, d, 0, c, 0, b, 0, a].

Then take the FFT to get the spectrum

[A, B, C, D, 0, -D, -C, -B, -A, -B, -C, -D, 0, D, C, B]

then throw away everything but the first [A, B, C, D], and you're done:

u = zeros(4 * N)
u[1:2*N:2] = x
u[2*N+1::2] = x[::-1]

U = fft(u)[:N]
return U.real

Type 2 DCT using 2N FFT mirrored (Makhoul)

[a, b, c, d] becomes [a, b, c, d, d, c, b, a]. Take the FFT of that to get [A, B, C, D, 0, D*, C*, B*], then throw away everything but [A, B, C, D] and multiply it by $e^{-j \pi k \over 2 N}$ (half-sample shift) to get the DCT:

y = empty(2*N)
y[:N] = x
y[N:] = x[::-1]

Y = fft(y)[:N]

Y *= exp(-1j*pi*k/(2*N))
return Y.real

Type 2 DCT using 2N FFT padded (Makhoul)

[a, b, c, d] becomes [a, b, c, d, 0, 0, 0, 0]. Take the FFT of that to get [A, B, C, D, E, D*, C*, B*], then throw away everything but [A, B, C, D] and multiply it by $2e^{-j \pi k \over 2 N}$ to get the DCT:

y = zeros(2*N)
y[:N] = x

Y = fft(y)[:N]

Y *= 2 * exp(-1j*pi*k/(2*N))
return Y.real

Type 2 DCT using N FFT (Makhoul)

Signal [a, b, c, d, e, f] becomes [a, c, e, f, d, b], then take the FFT to get [A, B, C, D, C*, B*]. Multiply by $2 e^{-j \pi k \over 2N}$ and then take the real part:

v = empty_like(x)
v[:(N-1)//2+1] = x[::2]

if N % 2: # odd length
    v[(N-1)//2+1:] = x[-2::-2]
else: # even length
    v[(N-1)//2+1:] = x[::-2]

V = fft(v)

V *= 2 * exp(-1j*pi*k/(2*N))
return V.real

On my machine, these are all roughly the same speed, since generating exp(-1j*pi*k/(2*N)) takes longer than the FFT. :D

In [99]: timeit dct2_4nfft(a)
10 loops, best of 3: 23.6 ms per loop

In [100]: timeit dct2_2nfft_1(a)
10 loops, best of 3: 20.1 ms per loop

In [101]: timeit dct2_2nfft_2(a)
10 loops, best of 3: 20.8 ms per loop

In [102]: timeit dct2_nfft(a)
100 loops, best of 3: 16.4 ms per loop

In [103]: timeit scipy.fftpack.dct(a, 2)
100 loops, best of 3: 3 ms per loop
endolith
  • 15,759
  • 8
  • 67
  • 118
  • 3
    Great answer, helped with my implementation a lot! Additional note: The last method "Type 2 DCT using N FFT" still works properly if the signal length is odd; the last element moves to the middle element. I have verified the math and code for this fact. – Nayuki Jul 04 '17 at 23:55
  • 1
    @Nayuki Are you generating exp(-1j*pi*k/(2*N)) or is there a shortcut to that step? – endolith Jul 06 '17 at 17:43
  • I am generating exp(-1j*pi*k/(2*N)) in my code, because a quarter-sample shift is necessary to make the DCT-to-DFT mapping work. What makes you ask? – Nayuki Jul 07 '17 at 06:36
  • Hi, how would this work for the Type III DCT, to compute the inverse of the DCT-II? – Jack H Jan 25 '18 at 20:55
  • Type 2 DCT using 2N FFT mirrored (Makhoul) is the way to go for 2D. – Eduardo Reis Sep 11 '20 at 21:53
  • Reading this again, on the Type 2 DCT using N FFT (Makhoul) we have [a, b, c, d, e, f] on the input, but the output there is no e, f samples. In other words, the input is size 6, and the output is size 4. Where e, f would como from? – Eduardo Reis Feb 04 '21 at 15:02
  • @EduardoReis I think it's for real input only – endolith Feb 04 '21 at 16:16
  • 2
    Just noticed that [A, B, C, D, C*, B*] is representing the output of the FFT, and not the output of the DCT. That was the root of the entire confusion in my thoughts. – Eduardo Reis Feb 04 '21 at 16:31
  • I keep coming back to this post, really useful! BTW What was the size of a when timing the different approaches??? – Eduardo Reis Apr 22 '21 at 12:30
  • @EduardoReis I don't know what a was, you'll have to run your own tests :) – endolith Apr 22 '21 at 14:52
  • I wonder if you could consider answering this related question calling for a step-by-step "manual" calculation of a single coefficient of a 2 x 2 matrix to match the Makhoul Python formula. Thanks a lot in advance. – Antoni Parellada May 05 '21 at 18:45
10

I'm not sure what method the paper you referenced uses, but I have derived this before and here is what I ended up with: (Assuming $x(n)$ is the input)

let

$y(n) = \Bigl\lbrace \begin{array}{ll} x(n), & n=0,1,...,N-1\\ x(2N - 1 - n), & n=N,N+1,...,2N-1 \end{array}$

The DCT is then given by

$C(k) = \mathrm{Re}\Bigl\lbrace e^{-j\pi \frac{k}{2N}} FFT\lbrace y(n)\rbrace\Bigr\rbrace$

So you basically create the $2N$ length sequence $y(n)$ where the first half is $x(n)$ and the second half is $x(n)$ reversed. Then just take the FFT and multiply that vector by a phase ramp. Finally take only the real part and you have the DCT.

Here's the code in MATLAB.

function C = fdct(x)
    N = length(x);
    y = zeros(1,2*N);
    y(1:N) = x;
    y(N+1:2*N) = fliplr(x);
    Y = fft(y);
    k=0:N-1;
    C = real(exp(-j.* pi.*k./(2*N)).*Y(1:N));

Edit:

Note: The DCT formula this is using is:

$C(k) = 2\sum_{n=0}^{N-1} x(n) \cos\left(\frac{\pi k}{2N}(2n+1) \right)$

There are several ways of scaling the summation so it may not match up exactly with other implementations. For instance, MATLAB uses:

$C(k) = w(k)\sum_{n=0}^{N-1} x(n) \cos\left(\frac{\pi k}{2N}(2n+1) \right)$

where $w(0) = \sqrt\frac{1}{N}$ and $w(1...N-1) = \sqrt{\frac{2}{N}}$

You can account for this by properly scaling the output.

Jason B
  • 741
  • 4
  • 7
  • 1
    y(n) is supposed to be N-length, not 2N-length. That's how you get the 4x computation speed, by calculating N-length DCT from N-length signal instead of 2N FFT from 2N signal. http://fourier.eng.hmc.edu/e161/lectures/dct/node2.html http://www-ee.uta.edu/dip/Courses/EE5355/Discrete%20class%201.pdf – endolith Aug 29 '13 at 18:57
2

For true scientific computing, the amount of memory usage is important too. Therefore the N point FFT is more attractive to me. This is only possible due to Hermitian Symmetry of the signal. The reference Makhoul is given here. And actually has the algorithm for calculating DCT and IDCT or DCT10 and DCT01.
http://ieeexplore.ieee.org/abstract/document/1163351/

JimBamFeng
  • 121
  • 1
1

I wrote python numpy/scipy functions for DCT and DST, this can be usefull if you want to use it with autodiff libraries like pytorch that have differentiable FFT but not DCT and DST (yet).

import numpy as np
import scipy.fftpack

def dct_with_fft(x): N = x.size v = np.empty_like(x) v[:(N-1)//2+1] = x[::2]

if N % 2: # odd length
    v[(N-1)//2+1:] = x[-2::-2]
else: # even length
    v[(N-1)//2+1:] = x[::-2]

V = scipy.fftpack.fft(v)

k = np.arange(N)
V *= 2 * np.exp(-1j*np.pi*k/(2*N))
return V.real


def dst_with_fft(x): N = x.size v = np.empty_like(x) v[:(N-1)//2+1] = x[::2]

if N % 2: # odd length
    v[(N-1)//2+1:] = -x[-2::-2]
else: # even length
    v[(N-1)//2+1:] = -x[::-2]

V = scipy.fftpack.fft(1j*v)

k = np.arange(N)
V *= 2 * np.exp(-1j*np.pi*k/(2*N))
return V.imag[::-1]

N = 5 x = np.random.rand(N)

print(dct_with_fft(x)) print(scipy.fft.dct(x))

print(dst_with_fft(x)) print(scipy.fft.dst(x))

Loui Ty
  • 11
  • 1
0

Beautiful! "Type 2 DCT using N FFT (Makhoul)" with considering "DCT scaling" is my favorite.

%dct via fft N-point
N = 4;
n = (0:1:(N-1))
k = (0:1:(N-1))
a    = [1 2 3 4]
a_oe = [1 3 4 2]
Afft = fft(a_oe)
W = 1*exp(-1i*1*pi*(k)/(2*N))
for i = 1:1:N
    if i == 1
        W(i) = W(i)*sqrt(1/N);
    else
        W(i) = W(i)*sqrt(2/N);
    end
end

A = W .* Afft A = real(A) check = A - dct(a) %should be all zero if correct

Mou
  • 1
  • 2