3

Because each pixel in a typical digital camera records only one color, through use of a Bayer filter or a similar mechanism to achieve color vision, the true resolution of a ## megapixel sensor is actually lower than the nominal figure suggests, even after interpolation is applied.

Does anyone have any data on what the true effective resolution is and how this is determined? I saw multiple posts online which said it's reduced by 1/sqrt(2), but they offered no explanation of why this is the case. Assume an optimal interpolation method (or a typical "good" method that is commonly used).

Edit Jan 23, 2020, to add some clarification: what is in view is the true color resolution, and also how 1/sqrt(2) is obtained as that number.

g491
  • 153
  • 5

3 Answers

2

I saw multiple posts online which said it's reduced by 1 / sqrt(2), but they offered no explanation of why this is the case.

This one is easy to explain. The typical Bayer tile has two identical green-filtered photosites and one instance of each of red- and blue-filtered photosites. The green-filtered ones are usually on the diagonal.

Suppose the horizontal (and vertical) distance between neighboring photosites is a. We then have three lattices of identical photosites: two lattices with period 2a (red and blue, with lattice vectors horizontal and vertical in both), and one lattice with period a√2 (green, with lattice vectors pointing along the diagonals).

Now, suppose we have a camera without any antialiasing filter, and take a photo of a scene in perfect focus. If we are only interested in the green component, we can simply take our green-filtered photosites, rotate the raw image by 45°, and then (after correction for black level, nonlinearity, etc.) we get our (rotated) photo with a linear resolution √2 times smaller than that of the sensor itself.
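To make the green-lattice geometry concrete, here is a minimal sketch (the RGGB tile layout, the tiny 6×6 grid, and the use of NumPy are assumptions made purely for illustration) that locates the green photosites and confirms their nearest-neighbour spacing is a√2:

```python
# Minimal sketch: the green photosites of an RGGB Bayer tile form a diagonal
# lattice whose nearest-neighbour distance is a * sqrt(2).
import numpy as np

a = 1.0                              # pitch of the full photosite grid
rows, cols = np.mgrid[0:6, 0:6]      # a tiny 6x6 "sensor" for illustration

# In an RGGB tile, green sits wherever row + column is odd.
gy, gx = np.nonzero((rows + cols) % 2 == 1)
pts = np.stack([gy, gx], axis=1) * a

d = np.linalg.norm(pts - pts[0], axis=1)   # distances from one green site
print(d[d > 0].min(), np.sqrt(2) * a)      # both ~1.414: green pitch is a*sqrt(2)
```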

Of course, in real life we are interested in the full color, so we want to use the red and blue components too. And in real-life images the values of red- and blue-filtered photosites are correlated with the values of the neighboring green-filtered ones. Good demosaicing algorithms can take this into account and yield even better resolution, up to the native sensor resolution.

But this improvement is a gamble. You can easily get various artifacts that lower image quality down to half the resolution of the unfiltered sensor. So in practice the effective resolution depends on the scene and lies between ½× and 1× the resolution of the unfiltered sensor, with a good overall guess indeed being 1/√2 × the native resolution.
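To put rough numbers on that range, here is a small sketch (the hypothetical 6000×4000, 24 MP sensor is just an illustrative assumption, not something from the question):

```python
# Back-of-the-envelope effective pixel counts for the linear factors above,
# using a hypothetical 6000 x 4000 (24 MP) sensor.
import math

w, h = 6000, 4000
for label, f in [("worst case, 1/2 linear", 0.5),
                 ("rule of thumb, 1/sqrt(2)", 1 / math.sqrt(2)),
                 ("best case, native", 1.0)]:
    print(f"{label}: ~{(w * f) * (h * f) / 1e6:.0f} MP effective")
# worst case, 1/2 linear:   ~6 MP effective
# rule of thumb, 1/sqrt(2): ~12 MP effective
# best case, native:        ~24 MP effective
```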

Ruslan
  • 560
  • 3
  • 17
  • Why wouldn't the green-only image be 1/2 the resolution instead of sqrt(2) since exactly half the photosites are green-filtered? – g491 Jan 23 '20 at 04:41
  • @g491 well, that depends on what you understand by resolution: if it's "pixels per inch", i.e. linear resolution, then no, it's the square root. If it's "pixels per square inch", i.e. resolution in area, then yes, you'll have 1/2. – Ruslan Jan 23 '20 at 05:54
  • Wouldn't it be 1/2 in both scenarios since every other pixel in either dimension is green? Sorry if there's something obvious I'm missing. – g491 Jan 23 '20 at 07:36
  • 1
    @g491 p²=p only when p=0 or p=1. Note that 1/2 isn't a solution, so linear and areal resolution can't both drop by the same factor. That's one way to look at it. Another is to note that areal resolution is what you're citing, "how many pixels per unit area", which here is one green photosite per 2a², while linear resolution is "how far apart are the closest(!) pixels", which is √2·a. – Ruslan Jan 23 '20 at 08:32
  • How can a sensor w/ a colored filter array have LESS color accuracy/resolution than an unfiltered sensor? If we are talking about image resolution, the CFA causes a potential reduction in contrast (light loss) compared to an unfiltered sensor. If we are talking about color accuracy/resolution, then what are we comparing it to? A Foveon sensor, 3 chip video, the human eye? – Steven Kersting Jan 23 '20 at 14:30
  • @StevenKersting I was talking about spatial resolution, and compared it to that of an unfiltered sensor. In the worst case like e.g. monochromatically-lit scene (by a 440 nm "deep blue" light source, if we consider the particular sensitivity curves from the answer by Michael C), spatial resolution will be half (in each direction) of the resolution of an unfiltered sensor with the same pixel pitch. As for color accuracy, I'd compare it to the 3-chip camera, with the chips being the copies of a de-Bayered version of the same sensor. – Ruslan Jan 23 '20 at 14:37
  • @Ruslan, your answer was basically the same as mine in that the loss of spatial resolution is a monochrome (i.e. B&W, contrast) comparison; to which the OP replied he was concerned with "color resolution." I realize a 3 chip/beam splitter system has the greatest potential color accuracy/resolution; but is that a realistic comparison? It would seem to me that the best comparison would be to human vision as that is what all methods are trying to replicate in order to "accurately" reproduce the scene. – Steven Kersting Jan 23 '20 at 15:07
  • @StevenKersting resolution of human vision is the last thing you want to replicate. Do you really think that replicating a resolution profile like periphery-macula-fovea is useful for digital image capture? No, replication of human vision is only useful for color accuracy (spectral sensitivities of components, photopic vs. scotopic luminance, etc.), not image resolution. – Ruslan Jan 23 '20 at 15:11
  • @StevenKersting as for whether my answer is the same as yours — it's up to the OP to decide (and to ask for clarifications). The Question itself might have been poorly worded to be ambiguous, so it might need to be edited. – Ruslan Jan 23 '20 at 15:12
  • @Ruslan, I specifically referred to color accuracy/resolution – Steven Kersting Jan 23 '20 at 19:25
  • @Ruslan that makes sense on the sqrt(2) explanation - thanks – g491 Jan 23 '20 at 22:37
1

IF the filters in Bayer masks created three discrete color ranges in which any particular wavelength could only pass through a single filter, then the resolution would be 1/2 for the "green" filtered wavelengths and 1/4 for the "blue" and "red" filtered wavelengths.

IF the filters in Bayer masks created three discrete color ranges in which any particular wavelength could only pass through a single filter, then color reproduction that looks anything like what our eye/brain systems perceive would also be impossible.

This is because there is no such thing as "color" in wavelengths of light. Color is a perception constructed by an eye/brain system that detects certain wavelengths of light due to a chemical response in the retinas of those eyes. This perception of color comes from the brain comparing the differences in response to the same light by the three types of cones in human retinas. The responses of the three types of cones in the human retina have a LOT of overlap, particularly the 'M' (medium wavelength) and 'L' (long wavelength) cones.

[image: overlapping spectral response curves of the three types of cones in the human retina]

Please note that our "red" cones are most sensitive to light at wavelengths we typically call "yellow" rather than red. It is only in our trichromatic color reproduction systems (printing presses and electronic screens that use three "primary" colors to, hopefully, produce a similar response from our eye/brain systems) that "red" is sometimes a primary color.

If the filters in a Bayer mask did not also allow this overlapping of the response curves of each of the three filter colors, then our cameras could not interpolate color information from the results in the same way that our brains create color from the overlapping response of our retinal cones to various wavelengths of light.

Typical response curves of a modern Bayer masked digital sensor:

[image: typical spectral response curves of a modern Bayer-masked digital sensor]

Because of the way the human eye/brain system works, the range of wavelengths to which our "green" cones are most sensitive affects our perception of fine details/local contrast much more than the ranges of wavelengths to which our "blue" and "red (yellow)" cones are sensitive. Our best demosaicing algorithms take this into account, and the colors interpolated for each photosite are weighted to imitate the way our eye/brain systems do it, rather than just doing a simple "nearest neighbors" interpolation.
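As a rough illustration of why the "green" information dominates perceived detail, consider how much of the luminance signal it carries. The Rec. 709 luma weights below are standard published values used here only as an example; the answer above does not depend on these exact numbers:

```python
# Green carries most of the luminance (brightness) signal, and luminance is
# what drives our perception of fine detail. Rec. 709 luma coefficients:
R_W, G_W, B_W = 0.2126, 0.7152, 0.0722

def luma(r, g, b):
    """Relative luminance of a linear RGB triple."""
    return R_W * r + G_W * g + B_W * b

print(luma(1, 0, 0), luma(0, 1, 0), luma(0, 0, 1))  # 0.2126 0.7152 0.0722
# A unit step in green moves luminance ~3x more than red and ~10x more than blue.
```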

Keep in mind that since the peak colors of the Bayer mask's filters are not the same as the three primary colors in our RGB color reproduction systems, all three RGB channel values must be interpolated for every photosite, not just the "other two missing" colors.
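A hedged sketch of that idea: after demosaicing, the camera-native responses are mapped to output RGB by a 3×3 matrix, so every output channel is a mix of all three filtered measurements. The matrix values below are purely illustrative and are not taken from any real camera profile:

```python
import numpy as np

# Hypothetical camera-RGB -> output-RGB matrix (each row sums to 1 so that
# white is preserved); real values come from a camera-specific color profile.
cam_to_rgb = np.array([
    [ 1.60, -0.45, -0.15],
    [-0.20,  1.45, -0.25],
    [ 0.05, -0.55,  1.50],
])

cam = np.array([0.40, 0.55, 0.30])   # one demosaiced photosite, camera-native
print(cam_to_rgb @ cam)              # every output value mixes all three inputs
```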

Compare the peak sensitivities of the Bayer filters to the colors used in our RGB TV/monitor screens (a few of which also include a yellow channel):

[image: Bayer filter peak sensitivities compared with the RGB primaries used by displays]

So even though our camera sensors only have half of their photosites (a/k/a pixel wells) filtered with green, the information those "green" photosites record has a greater effect on our perception of fine details than the information recorded by the "blue" and "red (yellow)" photosites does. When all of this is combined, an optimally interpolated image from a Bayer-masked sensor produces the same perceived resolution as if we took a monochrome sensor with 1/√2 as many pixels, shot three images through three different color filters (centered on our RGB primary colors and covering all of the sensor's photosites), and combined those values to produce RGB values for each photosite.

Michael C
  • 175,039
  • 10
  • 209
  • 561
-1

There is no reduction in "true resolution." It is optimally 1 point per pixel (photosite), 1 line per pixel row/column, or 1 line pair per two pixel rows/columns (Nyquist), although it seldom reaches that in practice due to the lens, AA filter, etc.
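A minimal sketch of that Nyquist statement (a 1-D row of photosites and NumPy are assumed purely for illustration): a pattern of one line pair per two pixels is just barely representable, while anything finer aliases into a coarser pattern:

```python
import numpy as np

x = np.arange(16)                      # sample positions along one photosite row

at_nyquist = np.cos(np.pi * x)         # 1 line pair per 2 pixels
beyond     = np.cos(1.5 * np.pi * x)   # finer detail than Nyquist allows

print(at_nyquist.astype(int))          # +1, -1, +1, ... : still resolved
print(np.round(beyond, 1))             # repeats every 4 pixels instead: the fine
                                       # detail has aliased into a coarser pattern
```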

Photosites do not record only one color. They are filtered to short wavelengths (blue), medium wavelengths (green), and long wavelengths (red); and there tends to be a certain amount of overlap between them. It should come as no surprise that this is exactly how our eyes work, because that is what the design is based on.

I suspect what you have read is noting a reduction in the ability to resolve a B&W scene (test target) compared to an unfiltered sensor array. It's not necessarily a reduction in the ability to resolve the target, but rather a reduced ability to resolve it with the same amount of contrast, and that is due to the light rejection (absorption) the filter array causes; i.e., a lower-contrast result means a lower MTF50 (50% contrast) rating.
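To illustrate the contrast point with made-up intensity values (Michelson contrast is just one common way to quantify it): the same line pattern resolved with less contrast simply scores lower on an MTF-style measurement:

```python
def michelson_contrast(i_max, i_min):
    """Michelson contrast of a resolved bright/dark line pattern."""
    return (i_max - i_min) / (i_max + i_min)

print(michelson_contrast(200, 20))   # ~0.82: lines resolved at high contrast
print(michelson_contrast(140, 80))   # ~0.27: same lines, much lower contrast
```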

Steven Kersting
  • 17,087
  • 1
  • 11
  • 33
  • I should have clarified that I'm referring to a reduction in full color resolution, because each piece of the R/G/B spectrum isn't sampled at every photosite. The interpolation that's performed to make it seem as if it were is only so good, and it results in an effective full-color resolution drop that's unrelated to the contrast. – g491 Jan 22 '20 at 21:00
  • As I said, it isn't really true that each pixel only sees red, green, or blue. I found this question/answer that are basically the same: https://photo.stackexchange.com/questions/87528/how-much-light-and-resolution-is-lost-to-color-filter-arrays – Steven Kersting Jan 22 '20 at 21:16
  • @StevenKersting, No matter how you slice it, the fact is, a camera that uses a Bayer sensor only records one third as many bits of information as a hypothetical, three-chip camera that specifies the same pixel width, pixel height, and bit-depth. OP effectively is asking how the image quality from the Bayer-filtered camera compares with the image quality from the three-chip camera. – Solomon Slow Jan 22 '20 at 22:04
  • 1
    @SolomonSlow, I read it as "compared to an unfiltered sensor." And there is no bit depth at the sensor. 3 chip video cameras were used because the sensors were tiny so using 3 collected more/lost less light; and because the small sensors have greater DOF... but they are mostly not used these days. At best you could say a bayer array results in a potential reduction in color accuracy, but not a reduction in image resolution. – Steven Kersting Jan 23 '20 at 14:11
  • Re, "there is no bit depth at the sensor." I don't understand what you mean by that. Each light sensitive element in a sensor (what you called "photosites.") measures the amount of light that fell on it while the shutter (real or simulated) was open. The analog values captured by those photosites must be converted to digital form before the camera's CPU can store them or process them in any way. Each of those analog values is represented by a number that fits into so-many bits. (In my own camera, it's 14 bits.) That's what I meant when I said "bit depth." What did you mean? – Solomon Slow Jan 23 '20 at 14:21
  • Re, "3-chip" camera: I am fully aware that a 3-chip camera would be unwieldy and expensive and a total failure in today's market. That's why I said, "hypothetical three-chip camera." – Solomon Slow Jan 23 '20 at 14:23
  • @SolomonSlow, you are mostly correct... a photon discharges an electron and that voltage is converted to a digital value by the ADC. But it is the range of values that determines the bit depth... at low SNR levels (high ISO) 8bit may be more than enough; and only at the highest levels is 14bit ever required by any camera (few ever reach 14bit, but many exceed 12). The ADC conversion accuracy (12-14bit) and the file format (16bit raw) simply cannot cause a reduction from what the sensor generated... I.e. the "bit depth" is scene dependent, not really sensor dependent (other than max potential). – Steven Kersting Jan 23 '20 at 14:50
  • @SolomonSlow although 3-chip cameras are not too common, they do actually exist and are (were?) available (for whatever crazy price that might be). – Ruslan Jan 23 '20 at 15:19
  • @StevenKersting you seem to be very confident in the fact that CFA can't give real reduction in image resolution. But there do exist spatial artifacts resulting from demosaicing, e.g. zippering. – Ruslan Jan 23 '20 at 15:22
  • @Ruslan, yes, there can be demosaicing artifacts, but I would say that is more an issue with the algorithm (other than moire). And at low SNR levels the sensor's ability to distinguish low contrast details can be reduced; and the CFA causes a loss of light which reduces the SNR per pixel. So I think it's reasonable to state that the CFA can contribute to a loss of spatial resolution... but IDT it's a simple "does cause." – Steven Kersting Jan 23 '20 at 20:04