14

I have basic implementation knowledge of the 2D 8x8 DCT used in image and video compression. Whilst reading about Principal Component Analysis, I can see a lot of similarity, albeit PCA is clearly more generic. When I've read about the DCT previously, it was always presented in relation to the DFT. So my question is: how can the DCT be derived from a PCA perspective? (Even a hand-wavy explanation is sufficient.)

Many thanks

trican

1 Answer

21

The main difference between the DCT and PCA (more precisely, representing a dataset in the basis formed by the eigenvectors of its correlation matrix, also known as the Karhunen–Loève transform) is that PCA must be defined with respect to a given dataset (from which the correlation matrix is estimated), while the DCT is "absolute" and is defined only by the input size. This makes PCA an "adaptive" transform, while the DCT is data-independent.
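
To make the contrast concrete, here is a rough sketch in formulas (the notation is mine, not the original answer's). For zero-mean data vectors $\mathbf{x} \in \mathbb{R}^N$ with correlation matrix $R = \mathbb{E}[\mathbf{x}\mathbf{x}^T]$, the PCA/KLT basis is given by the eigendecomposition of $R$:

$$R = U \Lambda U^{T}, \qquad \mathbf{y}_{\mathrm{KLT}} = U^{T}\mathbf{x},$$

with the columns of $U$ ordered by decreasing eigenvalue; this basis changes whenever the dataset (and hence $R$) changes. The DCT-II basis, by contrast, is a fixed closed-form function of the size $N$ alone:

$$c_k[n] \propto \cos\!\left[\frac{\pi}{N}\left(n + \tfrac{1}{2}\right)k\right], \qquad n, k = 0, \dots, N-1,$$

up to normalization. A classical result (Ahmed, Natarajan and Rao's 1974 DCT paper) is that for a first-order Markov source the KLT basis approaches this DCT basis as the adjacent-sample correlation tends to 1, which is one way to "derive" the DCT from a PCA perspective.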

One might wonder why, given its adaptivity, PCA is not used more often in image or audio compression. There are two reasons:

  1. Imagine an encoder computing a PCA of a dataset and encoding the coefficients. To reconstruct the dataset, the decoder will need not only the coefficients themselves, but also the transform matrix (which depends on data the decoder does not have access to!). The DCT or any other data-independent transform might be less efficient at removing statistical dependencies in the input data, but its transform matrix is known in advance by both the coder and decoder, without the need to transmit it. A "good enough" transform that requires little side information is sometimes better than an optimal transform that requires an extra load of side information...

  2. Take a large collection of $N$ 8x8 tiles extracted from photos. Form an $N \times 64$ matrix with the luminosity values of these tiles. Compute a PCA on this data, and plot the principal components that are estimated. This is a very enlightening experiment! There is a very good chance that most of the higher-ranked eigenvectors will actually look like the kind of modulated sine-wave patterns of the DCT basis. This means that, for a sufficiently large and generic set of image tiles, the DCT is a very good approximation of the eigenbasis. The same has been verified for audio, where the eigenbasis for log signal energy in mel-spaced frequency bands, estimated on a large volume of audio recordings, is close to the DCT basis (hence the use of the DCT as a decorrelation transform when computing MFCCs).
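
For anyone who wants to reproduce the experiment in point 2, here is a rough sketch (my own code, not part of the original answer). It assumes SciPy's bundled sample photograph `scipy.datasets.face` is available (fetching it may require the optional `pooch` package); any grayscale photo would do. The script tiles the image into 8x8 blocks, estimates the PCA basis from the tiles' covariance, builds the orthonormal 2-D DCT-II basis, and reports how strongly each leading principal component lines up with its closest DCT basis function.

```python
# Rough sketch of the experiment above: PCA on a large set of 8x8 image
# tiles, compared against the 2-D DCT-II basis. Any grayscale photo works;
# scipy.datasets.face is just a convenient bundled sample (assumption).
import numpy as np
from scipy.datasets import face      # sample 768x1024 grayscale photo
from scipy.fft import idctn          # inverse 2-D DCT (type II here, orthonormal)

# 1. Collect N tiles of 8x8 luminosity values into an (N, 64) matrix.
img = face(gray=True).astype(float)
tiles = [img[r:r + 8, c:c + 8].ravel()
         for r in range(0, img.shape[0] - 7, 8)
         for c in range(0, img.shape[1] - 7, 8)]
X = np.asarray(tiles)                # shape (N, 64)

# 2. PCA: eigenvectors of the covariance matrix, sorted by decreasing eigenvalue.
X -= X.mean(axis=0)                  # center the data
cov = X.T @ X / len(X)               # (64, 64) covariance estimate
eigval, eigvec = np.linalg.eigh(cov) # eigh returns ascending eigenvalues
pca_basis = eigvec[:, np.argsort(eigval)[::-1]]   # column k = k-th principal component

# 3. 2-D DCT basis for 8x8 tiles: inverse-transform a unit impulse placed at
#    each coefficient position to obtain the corresponding basis image.
dct_basis = np.zeros((64, 64))
for k in range(64):
    coeffs = np.zeros((8, 8))
    coeffs[divmod(k, 8)] = 1.0
    dct_basis[:, k] = idctn(coeffs, norm='ortho').ravel()

# 4. Compare: each leading principal component should be close to (plus or
#    minus) one of the DCT basis functions, so these similarities should be high.
sim = np.abs(pca_basis.T @ dct_basis)   # |cosine| since both bases are unit-norm
print("closest |correlation| of the first 10 PCs to any DCT basis function:")
print(sim[:10].max(axis=1).round(2))
```

Plotting `pca_basis[:, k].reshape(8, 8)` next to the DCT tiles reproduces the "very enlightening experiment" described above.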

pichenettes
  • It is interesting, however: might not a different basis set be constructed based on the 'usual' statistics of images to begin with, and used instead of the DCT? I imagine such a basis would not be as good as PCA, but better than the DCT, no? – Spacey Feb 15 '13 at 14:41
  • @pichenettes - with regard to the DCT, what are the commonly seen images of increasing horizontal and vertical frequency (e.g. http://goo.gl/XLMt5)? Is it an image representation of the DCT basis functions? If that is the case, if I calculated the PCA / eigenvectors from the covariance matrix of these images - would this essentially give me the DCT coefficient matrix? – trican Feb 15 '13 at 15:15
  • Btw @pichenettes many thanks for your insightful answer. I was aware of point 1, but hadn't really considered point 2. – trican Feb 15 '13 at 15:17
  • @Mohammad: this is a good question, and I don't know the answer. I see advantages in using the DCT: easier to write specs (it's easier to print "our transform is this closed-form function" than "our transform is this 64x64 matrix published in the annex"), no standardization committee meetings about which dataset to train the transform on, fewer lookup tables to embed in decoders' ROM, and probably "symmetries" in the transform matrix that make hardware acceleration possible compared to a brute-force 64x64 matrix multiplication - these advantages might outweigh marginal compression gains. – pichenettes Feb 15 '13 at 15:57
  • @trican: the image you linked to represents the 2-D DCT basis for 8x8 tiles. Each of the 64 small tiles is a basis function. If you take a large collection of 8x8 tiles from actual images and perform a PCA on the data, the eigenbasis you will get will be quite similar to that. – pichenettes Feb 15 '13 at 16:02
  • @pichenettes Yes, makes sense. The DCT is standard and 'simple'. I imagine, however, that with (natural) images being sparse, there might be nice bases which occur very frequently, and where sparseness can be utilized. I am thinking out loud here, but yes, the DCT is standard and good enough for those purposes, for now. – Spacey Feb 15 '13 at 16:51
  • Thank you @pichenettes. So the 2D DCT is only "quite similar" to PCA (and similarly to the Karhunen–Loève transform and Hotelling transform, I guess) in terms of approaching optimality in a decorrelation sense. – trican Feb 15 '13 at 16:58
  • Even though this discussion has been going on for a long time, I have a question about exactly the above answer: as I understand it, the principal components are calculated on the centred data set. If I now want to project a new dataset onto the PCA basis vectors, I also have to first subtract the mean of the original dataset (and add it back after reconstruction). If the DCT is an approximation to the PCA of many images, wouldn't one also have to subtract a mean value here (presumably simply a mean grey), so that some pixels are positive and some negative? Wouldn't that improve the quality of – Mathias Leonhardt May 24 '22 at 11:03
  • @MathiasLeonhardt Welcome to SE.SP! This is not a discussion forum. If you have a new question, please ask it as such! Your "answer" here does not contain an answer to the original question which already has an accepted answer. – Peter K. May 24 '22 at 14:18