Downsampling an image by an integer factor

Question

When downsampling an image by an integer factor $n$, the obvious method is to set the pixels of the output image to the average of the corresponding $n \times n$ blocks in the input image.
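For reference, the block-averaging method described above can be sketched in Python as follows. This is a minimal illustration on a plain 2D list of grayscale values. The name `box_downsample` is hypothetical, and the sketch assumes the image height and width are exact multiples of $n$.

```python
def box_downsample(img, n):
    """Average each non-overlapping n x n block of a 2D list of numbers.

    Assumes the image height and width are exact multiples of n.
    """
    h, w = len(img), len(img[0])
    if h % n or w % n:
        raise ValueError("image dimensions must be multiples of n")
    out = []
    for by in range(0, h, n):
        row = []
        for bx in range(0, w, n):
            # Sum every input pixel in the block, then divide by the block area.
            total = sum(img[y][x] for y in range(by, by + n) for x in range(bx, bx + n))
            row.append(total / (n * n))
        out.append(row)
    return out

print(box_downsample([[0, 2, 4, 6],
                      [2, 4, 6, 8]], 2))  # [[2.0, 6.0]]
```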

I vaguely remember having read somewhere that this method is not optimal (sorry, I don't remember any details).

Is it true that there is a better method (and if so, where does the above method fail, although it seems "obviously" correct)? I do not know a lot about signal processing; this question just interests me.

If you don't have a signal processing background, the layman's explanation is: there are better methods for downsampling. Your downsampling algorithm will technically make the image smaller by a factor of N, BUT its output will be greatly degraded in quality when compared to better downsampling algorithms. — Trevor Boyd Smith, Sep 22 '11 at 15:00

score 17 · Answer 1 · edited Sep 19 '11 at 03:44

Downsampling an image reduces the number of samples that can represent the signal. In the frequency domain, when a signal is downsampled, the high-frequency portion of the signal will be aliased with the low-frequency portion. When applied to image processing, the desired outcome is to preserve only the low-frequency portion. In order to do this, the original image needs to be preprocessed (anti-alias filtered) to remove the high-frequency portion so that aliasing does not occur.
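As a small 1D illustration of aliasing (a sketch, not part of the original answer; the helper names are hypothetical): a stripe pattern alternating +1, -1 is at the highest frequency the samples can hold. Keeping every second sample without any filtering turns it into a flat, constant signal, i.e. the high frequency is aliased to zero frequency. Averaging each pair first, which is what the $n \times n$ block average does along one axis, removes the stripes instead.

```python
def decimate(signal, n):
    """Keep every n-th sample with no prefiltering (pure subsampling)."""
    return signal[::n]

def box_then_decimate(signal, n):
    """Average each run of n samples (a crude anti-alias filter), one output per run."""
    return [sum(signal[i:i + n]) / n for i in range(0, len(signal) - n + 1, n)]

# Highest representable frequency: a pattern alternating +1, -1.
stripes = [1 if k % 2 == 0 else -1 for k in range(8)]
print(decimate(stripes, 2))           # [1, 1, 1, 1]
print(box_then_decimate(stripes, 2))  # [0.0, 0.0, 0.0, 0.0]
```

The box average happens to null this particular frequency exactly for $n = 2$; for other high frequencies it only attenuates them, which is why it is called a crude filter here.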

The optimal digital filter for removing the high-frequency portion (with the sharpest cutoff) is the sinc function, because its frequency-domain representation is a nearly constant 1 over the entire low-frequency region and a nearly constant 0 over the entire high-frequency region.

$$\text{sinc}(x)=\frac{\sin(\pi x)}{\pi x}$$
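For an integer downsampling factor $n$, the sinc above has to be stretched so that its cutoff falls at the new (lower) Nyquist frequency. Assuming discrete-time frequency $\omega$ in radians per input sample, the ideal anti-alias filter and, for comparison, the magnitude response of the $n$-tap box average used in the question are:

$$h[k] = \frac{1}{n}\,\text{sinc}\!\left(\frac{k}{n}\right), \qquad H(\omega) = \begin{cases} 1, & |\omega| < \pi/n, \\ 0, & \pi/n < |\omega| \le \pi, \end{cases}$$

$$\left|H_{\text{box}}(\omega)\right| = \left|\frac{\sin(n\omega/2)}{n \sin(\omega/2)}\right|.$$

The box response is 1 at $\omega = 0$ but rolls off gradually instead of cutting off at $\pi/n$, and for $n > 2$ it also has sidelobes above its first zero at $2\pi/n$, so part of the energy that should be removed leaks through and aliases.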

The impulse response of the sinc filter is infinitely long. The Lanczos filter is a modified sinc filter that attenuates (windows) the sinc coefficients and truncates them once the values become insignificant.
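A minimal 1D sketch of Lanczos downsampling by an integer factor $n$, assuming the common window size `a = 3` (often called Lanczos3), pixel-centre alignment, and clamped edges. The function names are hypothetical; a 2D image would be processed by applying the 1D pass to rows and then to columns. The kernel is stretched by $n$ so that its cutoff moves to the lower output Nyquist frequency, and the weights are renormalized so that a flat image stays flat.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def lanczos(x, a=3):
    """Lanczos kernel: sinc(x) windowed by sinc(x / a), zero for |x| >= a."""
    if abs(x) >= a:
        return 0.0
    return sinc(x) * sinc(x / a)

def lanczos_downsample_1d(signal, n, a=3):
    """Downsample a 1D list by integer factor n with a Lanczos-a kernel stretched by n.

    Output sample j is centred at input coordinate (j + 0.5) * n - 0.5.
    Out-of-range indices are clamped; weights are renormalized to sum to 1.
    """
    last = len(signal) - 1
    out = []
    for j in range(len(signal) // n):
        centre = (j + 0.5) * n - 0.5
        # Only input samples with |i - centre| < a * n get a nonzero weight.
        lo = math.floor(centre - a * n) + 1
        hi = math.ceil(centre + a * n)
        acc = wsum = 0.0
        for i in range(lo, hi):
            w = lanczos((i - centre) / n, a)
            acc += w * signal[min(max(i, 0), last)]
            wsum += w
        out.append(acc / wsum)
    return out
```

Because this kernel has negative lobes, sharp edges can produce small over- and undershoots (ringing), which is the trade-off raised in the comments on this answer.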

However, being optimal in the frequency domain does not imply being optimal to the human eye. There are upsampling and downsampling methods that are not linear transformations but produce visually better results than linear ones.

With regard to the statement about $n \times n$, it is important to keep in mind that during image sampling, the choice of coordinate correspondence between the high-resolution signal and the low-resolution signal is not arbitrary, nor is it sufficient to align them to the same origin (0) on the real or discrete number line.

The minimum requirements on the coordinate correspondence are:

Upsampling an image containing arbitrary random values by an integer factor, then downsampling by the same integer factor, should result in the same image with minimal change numerically.
Upsampling/downsampling an image consisting of just one uniform value, followed by the opposite operation, should result in an image consisting of the same value uniformly, with minimal numerical deviations.
Repeatedly applying pairs of upsampling/downsampling should minimize the shift in image content as much as possible.
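A 1D sketch of these requirements under one common convention (an assumption; the answer does not name one): treat each pixel as covering a unit interval, so low-res pixel $j$ covers high-res pixels $jn$ through $jn + n - 1$ and its centre sits at high-res coordinate $(j + 0.5)\,n - 0.5$. With that convention, nearest-neighbour upsampling followed by block averaging returns the original values, and a uniform image stays uniform. The helper names are hypothetical.

```python
def upsample_nearest(signal, n):
    """Replicate each low-res sample n times (each low-res pixel covers n high-res pixels)."""
    return [v for v in signal for _ in range(n)]

def box_downsample_1d(signal, n):
    """Average non-overlapping runs of n samples (len(signal) assumed a multiple of n)."""
    return [sum(signal[i:i + n]) / n for i in range(0, len(signal) - n + 1, n)]

def lowres_centre_in_highres(j, n):
    """High-res coordinate of the centre of low-res pixel j, with pixel-centre alignment."""
    return (j + 0.5) * n - 0.5

n = 3
# Requirement 1: upsampling then downsampling returns the same values.
print(box_downsample_1d(upsample_nearest([3, 1, 4, 1, 5], n), n))  # [3.0, 1.0, 4.0, 1.0, 5.0]
# Requirement 2: a uniform image stays uniform.
print(box_downsample_1d(upsample_nearest([7] * 4, n), n))  # [7.0, 7.0, 7.0, 7.0]
# Naively aligning origins (low-res j at high-res j * n) is off by (n - 1) / 2 pixels.
print([lowres_centre_in_highres(j, n) - j * n for j in range(3)])  # [1.0, 1.0, 1.0]
```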

What do you mean by the transform of the sinc function being "nearly" 1 and 0 in the respective regions? — , Oct 06 '11 at 20:19
@Tim: Gibbs phenomenon near the cutoff frequency of the sinc filter. — rwong, Oct 06 '11 at 21:25
Sinc resampling only makes sense on signals which are perceived/processed in the frequency domain, such as audio. Images are perceived, at least approximately, in the spatial domain (this is debatable; it's possible that some perception of repeated patterns is in the frequency domain), and any frequency-domain-based transformation produces nasty distortions (ringing, etc.) in the spatial domain. Basically, any convolution/linear operation with any negative coefficients will produce nasty artifacts, and any with all nonnegative coefficients will produce blurring. — R.. GitHub STOP HELPING ICE, Jan 12 '14 at 22:29
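The comment's distinction between kernels with and without negative coefficients can be checked on a step edge. In this sketch the kernel values are chosen only for illustration and the helper name is hypothetical. A normalized kernel with negative lobes under- and overshoots the [0, 1] range of the input, while a nonnegative normalized kernel produces weighted averages of the input, so its output stays within [0, 1] but the edge gets spread out.

```python
def convolve_clamped(signal, kernel):
    """Convolve a 1D list with an odd-length symmetric kernel, clamping indices at the edges."""
    r = len(kernel) // 2
    last = len(signal) - 1
    return [sum(w * signal[min(max(i + k - r, 0), last)] for k, w in enumerate(kernel))
            for i in range(len(signal))]

step = [0.0] * 4 + [1.0] * 4
with_negative_lobes = [-0.1, 0.3, 0.6, 0.3, -0.1]  # sums to 1, sinc-like shape
nonnegative = [0.25, 0.5, 0.25]                      # sums to 1, all weights >= 0

ringing = convolve_clamped(step, with_negative_lobes)
blurred = convolve_clamped(step, nonnegative)
print(min(ringing), max(ringing))  # about -0.1 and 1.1: undershoot and overshoot
print(min(blurred), max(blurred))  # 0.0 and 1.0: no overshoot, but a softer edge
```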

score 6 · Answer 2 · answered Sep 19 '11 at 14:08

You're right that area averaging is pretty close to the "most correct" you can get, but the problem is inconsistent behavior when downscaling a sharp width-N line by a factor of N. If the location of the line is aligned modulo N, you'll get a sharp 1-pixel line, but if it's roughly N/2 mod N, you'll get a very blurred line (2 pixels wide at half intensity). This can look very bad, and with nonlinear gamma, it will even result in differences in intensity. (Ideally, all resampling should take place with the gamma removed, i.e. on a linear intensity scale, but in practice almost nobody does that because it's really expensive.)
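A 1D sketch of both effects described above (helper names hypothetical). The first part box-downsamples a width-N line by N, once aligned to a block boundary and once shifted by half a block. The second part assumes a simple power-law gamma of 2.2 (an approximation, not the exact sRGB transfer curve) and compares averaging a white and a black pixel directly in gamma-encoded values with averaging in linear intensity.

```python
def box_downsample_1d(signal, n):
    """Average non-overlapping runs of n samples."""
    return [sum(signal[i:i + n]) / n for i in range(0, len(signal) - n + 1, n)]

N = 4
aligned = [0.0] * 4 + [1.0] * N + [0.0] * 4  # line starts at a multiple of N
shifted = [0.0] * 6 + [1.0] * N + [0.0] * 2  # same line shifted by N / 2
print(box_downsample_1d(aligned, N))  # [0.0, 1.0, 0.0]
print(box_downsample_1d(shifted, N))  # [0.0, 0.5, 0.5]

GAMMA = 2.2  # assumed power-law display gamma

def average_encoded(values):
    """Naive: average the gamma-encoded pixel values directly."""
    return sum(values) / len(values)

def average_linear(values):
    """Decode to linear intensity, average, then re-encode."""
    linear_mean = sum(v ** GAMMA for v in values) / len(values)
    return linear_mean ** (1 / GAMMA)

print(average_encoded([1.0, 0.0]))  # 0.5, displayed at about 0.22 of full intensity
print(average_linear([1.0, 0.0]))   # about 0.73, displayed at 0.5 of full intensity
```

With naive encoded averaging, the shifted line's two 0.5 pixels emit about 2 × 0.22 ≈ 0.44 of the light of the aligned line's single full-intensity pixel, so the same line looks noticeably darker depending on where it falls.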

If you want to improve on this, you need to first accept the fact that it's impossible to reduce blurring in some cases, so the only way to get uniform output involves increasing the blurring. The ideal way is to use a Gaussian kernel with radius larger than N/2, rather than a step function, as the convolution kernel applied to the source image. A cheap way to tack on an approximation, however, if you already have your N-by-N area-averaging implementation, is just to apply a (1/4, 1/2, 1/4) blur convolution to the resulting downsampled image.
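Both suggestions can be sketched in 1D (names and parameter values are hypothetical). `post_blur` applies the (1/4, 1/2, 1/4) kernel to an already box-downsampled signal with clamped edges. `gaussian_downsample_1d` instead weights input samples by a Gaussian centred on each output pixel, assuming pixel-centre alignment and truncating the kernel at 3 sigma; the example uses sigma = N/2 purely as an illustrative choice.

```python
import math

def box_downsample_1d(signal, n):
    """Average non-overlapping runs of n samples."""
    return [sum(signal[i:i + n]) / n for i in range(0, len(signal) - n + 1, n)]

def post_blur(signal):
    """Cheap fix: convolve the downsampled signal with (1/4, 1/2, 1/4), clamping at the edges."""
    last = len(signal) - 1
    return [0.25 * signal[max(i - 1, 0)] + 0.5 * signal[i] + 0.25 * signal[min(i + 1, last)]
            for i in range(len(signal))]

def gaussian_downsample_1d(signal, n, sigma):
    """Weight input samples by a Gaussian centred on each output pixel (truncated at 3 sigma)."""
    radius = 3 * sigma
    last = len(signal) - 1
    out = []
    for j in range(len(signal) // n):
        centre = (j + 0.5) * n - 0.5
        acc = wsum = 0.0
        for i in range(math.ceil(centre - radius), math.floor(centre + radius) + 1):
            w = math.exp(-((i - centre) ** 2) / (2 * sigma ** 2))
            acc += w * signal[min(max(i, 0), last)]
            wsum += w
        out.append(acc / wsum)  # renormalize so a flat signal stays flat
    return out

N = 4
aligned = [0.0] * 8 + [1.0] * N + [0.0] * 8   # line starts at a multiple of N
shifted = [0.0] * 10 + [1.0] * N + [0.0] * 6  # same line shifted by N / 2

print(post_blur(box_downsample_1d(aligned, N)))  # [0.0, 0.25, 0.5, 0.25, 0.0]
print(post_blur(box_downsample_1d(shifted, N)))  # [0.0, 0.125, 0.375, 0.375, 0.125]
print(max(gaussian_downsample_1d(aligned, N, sigma=N / 2)))
print(max(gaussian_downsample_1d(shifted, N, sigma=N / 2)))
```

After the post-blur, the shifted line peaks at 0.375 against 0.5 for the aligned one (a ratio of 0.75), compared with 0.5 against 1.0 (a ratio of 0.5) for plain area averaging. The price, as the answer says, is a softer result in both cases.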
