11

I have a project in which I'm trying to balance load between several computing devices. These devices are similar and each is supposed to have a camera segment. Each device captures an image and then processes it.

For simplicity I want to treat the load as the number of images waiting in the processing queue, but this requires the images to be the same size and have the same specifications.

So my question is: do images from the same camera have the same size? I know that compressing the images and converting them to .jpeg format will probably change their size, but how about raw images from the same camera? Do raw images have the same size?

Pablo
  • Speaking from the programming/computing side of this question, the image data that you're operating on will be as close to identically sized as makes no difference; once the images are decoded (from either raw or JPEG), the bitmap size will be the same, and the decoding steps are usually extremely fast. – chrylis -cautiouslyoptimistic- Jan 23 '19 at 09:05
  • @chrylis Thanks a lot, your comment was really helpful for me. – Pablo Jan 23 '19 at 13:10
  • If the images being taken are of similar composition and under similar conditions and settings (lighting, ISO, shutter speed, etc.), the images will be very close to the same size. If the pictures themselves vary a lot, so could the size. – JPhi1618 Jan 23 '19 at 18:43
  • Why do you "require" images be the same size? Just use the average expected file size, or if you're interested in worst case, use the largest expected size. – xiota Jan 24 '19 at 01:16
  • vtc b/c the OP isn't performing a task photographers would be expected to have any expertise in; the question is unanswerable without more information about the specific cameras involved; and OP could easily answer the question himself by taking and examining a few photos. – xiota Jan 24 '19 at 01:20
  • @xiota On the other hand, the basic question and its answers are certainly of practical value to photographers. That makes it on topic here. – Michael C Jan 24 '19 at 14:18
  • @xiota I have mentioned the reason in my post. I need to know if they are the same size, because I want to consider the load as the number of images waiting to be processed, which requires images to have similar sizes. About your second comment, unfortunately I'm not a photographer and all images I have captured are through my phone, which doesn't provide access to the raw format of my photos; this is why I asked my question here. – Pablo Jan 24 '19 at 22:24
  • Are you talking about processing or bandwidth? For processing, why does size matter? The usual approach in computer science when specifics are not exactly known is to use the average case. In the long run, it will average out. – xiota Jan 24 '19 at 22:37
  • Just have a central queue. Every device has a local short queue (2-3 items). When only one item is left in the local queue, it refills from the central queue. Everyone stays busy. – xiota Jan 24 '19 at 22:41

7 Answers

27

Many digital cameras use lossless compression with raw files. That means the size of raw files from the same camera is somewhat content dependent.

The more detail and different colors a scene contains, the larger the file will be. The more homogeneity a scene contains, the smaller the file will be. The degree of the differences will also be governed by differences in things such as noise in dark areas (noise usually adds to a file size by creating a greater number of unique brightness levels).

xiota
Michael C
  • Worth noting that Sonys had an infamous bug in their not-so-lossless compression, so many of us turn it off and write uncompressed raw, which is the same size every time. Converting those to DNG usually results in lossless 40-65% size reduction. – chrylis -cautiouslyoptimistic- Jan 23 '19 at 09:04
  • @chrylis There's a Sony specific answer below. Also, converting to DNG is an entirely different can of worms that is also highly dependent upon the codecs of the original raw files, what information they do or do not contain, and whether the end user wants/needs to use that portion of the information that is stripped from files when they are converted to DNG. That, IMHO, is a little much to go into in such a short and generic answer as this one (considering the OP did not specify a brand of camera/specific type of raw file). If you feel it is so vital, you could include it in your answer. – Michael C Jan 25 '19 at 11:52
27

A picture being worth a thousand spreadsheet cells, here is a histogram of the sizes of the RAW files from my camera for 2018 (EOS 70D, 20 Mpx). Sizes are in 1000's of K (not really MB).

[Image: histogram of RAW file sizes]

For the mathematically inclined:

Average:    24538
Median:     24300   
Std dev.:    2119
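
For anyone who wants to reproduce figures like these on their own files, here is a minimal Python sketch. It is not how the numbers above were produced; the function name `size_stats`, the `*.CR2` pattern, and the use of the sample standard deviation are assumptions made for illustration.

```python
from pathlib import Path
import statistics


def size_stats(folder, pattern="*.CR2"):
    """Return (mean, median, stdev) of the sizes in bytes of matching files.

    The folder is searched recursively. The pattern is a hypothetical
    default; adjust it to your camera's raw file extension.
    """
    sizes = [p.stat().st_size for p in Path(folder).rglob(pattern)]
    if len(sizes) < 2:
        raise ValueError("need at least two files to compute a standard deviation")
    # statistics.stdev is the sample standard deviation (n - 1 denominator).
    return statistics.mean(sizes), statistics.median(sizes), statistics.stdev(sizes)
```

Calling `size_stats("path/to/photos")` returns the three values in bytes; divide by 1024 if you want them in the same units as the table above.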
xenoid
  • Perhaps you could post the mean and SD? It would be informative – Azor Ahai -him- Jan 22 '19 at 22:25
  • Added the numbers – xenoid Jan 22 '19 at 22:52
  • Thanks a lot. Would you please clarify about what the x-axis and y-axis show? I'm not sure about what these values exactly are. – Pablo Jan 22 '19 at 23:03
  • X is size ('24' is for pics between 24000K and 25000K) and Y is the number of pics in the bucket. – xenoid Jan 22 '19 at 23:25
  • 1000s of KB = MB, 1024s of KiB -> MiB. Common operating systems report 1,000 byte KB and 1,000,000 byte MB, except for RAM, as is standard. – Dietrich Epp Jan 23 '19 at 14:08
  • If you want to split hairs, these are 1000's of KiB :) The shape of the histogram would remain the same... – xenoid Jan 23 '19 at 14:55
  • I would also include a copy of the smallest and largest photo that you took. – JonathanReez Jan 23 '19 at 23:07
  • thanks for posting numbers! it is interesting how little compression is done on RAWs – aaaaa says reinstate Monica Jan 24 '19 at 02:53
  • @DietrichEpp Good luck getting everyone to adopt that distinction on a worldwide scale! – Michael C Jan 24 '19 at 11:45
  • @aaaaaa Even more interesting is the difference in compression between brighter images and dimmer images. Bright images tend to have fewer unique brightness levels measured by the sensor than darker images do, even though shooting dimmer scenes at high ISO reduces the number of potential different brightness levels. (Because multiplying the amplification reduces the maximum value to 1/2, 1/4, 1/8, etc. of full well capacity.) – Michael C Jan 24 '19 at 11:46
  • @MichaelC Well, my smaller image is roughly 2Mpx of moon in 18Mpx of black sky, while the biggest one is an overexposed shot of a river at low tide with white silt and whitening bushes.... – xenoid Jan 24 '19 at 12:53
  • @xenoid The moon in a sky exposed to be totally black is a special case because almost the entire frame is essentially pixels with a value of (0,0,0). – Michael C Jan 24 '19 at 12:58
  • @MichaelC: Thank you! I'm glad that you agree that it's important to standardize units. – Dietrich Epp Jan 24 '19 at 14:31
  • @DietrichEpp I hope you don't think I'm holding my breath waiting on it. – Michael C Jan 24 '19 at 16:00
  • @MichaelC: Of course not! It’s a slow process to get everyone used to the same standards and on the same page. It’s important work, but not fast by any means. – Dietrich Epp Jan 24 '19 at 17:27
  • From your distribution graph (almost a normal, or bell-shaped, distribution) your camera is compressing raws loosely. The reason for your distribution is most probably that you tend to shoot photos with a centered histogram, meaning that you don’t expose for highlights, whatever you usually photograph has a lot of 18% grey (in color luminance), and you don’t tend to take either low-key or high-key photos. A good use of Python anyway – abetancort Feb 03 '19 at 02:29
  • @abetancort No Python here, just a bash one-liner (something like du 2018*/**/*.CR2 | sort -n | cut -f 1) with the output fed to LibreOffice calc. – xenoid Feb 03 '19 at 08:24
4

There are two main types of compression methods:

  1. lossless compression
  2. lossy compression

As you mentioned, JPEG is a lossy compression method which uses some mathematical tricks to save data, therefore losing picture information resulting in quality loss.

Basically, if you stored the color information for every pixel without any compression, every picture from the same camera would most likely be exactly the same size.

But as there exist lossless compression methods, you have the ability to save file size without losing any quality. The most basic example would be Run-length encoding where you can combine identical successive information and thus save the space you would need to store them one by one. For example you would store the information like "2 white, 3 black" instead of saying "white, white, black, black, black".

This results in pictures without much variance being compressed to relatively small file sizes, while this is not possible for those with a lot of variance in them.

This is why different raw pictures taken on the same camera will most likely result in different file sizes.
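
To make the run-length idea concrete, here is a minimal Python sketch. It only illustrates the "2 white, 3 black" example from above; the function name `run_length_encode` is made up for this illustration, and actual raw formats use more elaborate lossless schemes than plain run-length encoding.

```python
from itertools import groupby


def run_length_encode(pixels):
    """Collapse each run of identical values into a (count, value) pair."""
    return [(len(list(run)), value) for value, run in groupby(pixels)]


# A scene with little variance: two long-ish runs.
uniform = ["white", "white", "black", "black", "black"]
# A scene with a lot of variance: every pixel differs from its neighbour.
varied = ["white", "black", "white", "black", "white"]

print(run_length_encode(uniform))  # [(2, 'white'), (3, 'black')]
print(len(run_length_encode(varied)))  # 5
```

The uniform row shrinks from five entries to two, while the varied row still needs five entries, so nothing is saved. The same effect, scaled up, is what makes file size depend on picture content.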

xiota
sLaiN
4

A little extra info: If the raw file includes a preview (they generally do) that's likely to be jpeg compressed and will cause a small variation in file size.

Checking some raw CR2 files I shot yesterday (I keep an old Canon 350D in my desk), 3 shots of essentially the same scene vary by about 3%. I was fiddling with the lighting and used a very black background so one has both blown highlights and (almost) pure black, both of which compress well even losslessly.

However in terms of load balancing you're probably fine: averaged over a sensible number of images the load will be sufficiently similar unless your system is right on the edge, and transfer- or decompression-limited.
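
A rough way to see why averaging helps, sketched in Python. It assumes the sizes of queued images are independent of each other (an assumption, not something measured here), in which case the relative spread of the total shrinks with the square root of the queue length. The mean and standard deviation plugged in below are the Canon 70D figures quoted in another answer on this page, used purely as an illustration; `relative_spread_of_queue` is a hypothetical helper name.

```python
import math


def relative_spread_of_queue(mean_size, std_dev, n_images):
    """Relative standard deviation of the total size of n queued images.

    Assumes image sizes are independent and identically distributed.
    """
    if n_images < 1:
        raise ValueError("n_images must be at least 1")
    return (std_dev / mean_size) / math.sqrt(n_images)


# Illustrative figures only (mean and std. dev. from the 70D histogram answer).
for n in (1, 4, 16, 64):
    print(n, f"{relative_spread_of_queue(24538, 2119, n):.1%}")
```

Under that assumption, a queue four times as long has half the relative spread in total size, so counting images becomes a better proxy for load the longer the queues are.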

Chris H
3

This may be camera-dependent, but for my Canon EOS 7D Mark II, different raw images are definitely not the same size:

# ls -l *.cr2
-rwx------ 1 tew tew 23868042 Jan 21 10:59 20190121105920-6996.cr2
-rwx------ 1 tew tew 24408037 Jan 21 11:07 20190121110757-7002.cr2
-rwx------ 1 tew tew 25928707 Jan 21 11:08 20190121110823-7003.cr2
-rwx------ 1 tew tew 23777211 Jan 21 11:08 20190121110852-7004.cr2
-rwx------ 1 tew tew 25369539 Jan 21 11:09 20190121110922-7005.cr2
-rwx------ 1 tew tew 22675822 Jan 21 11:11 20190121111113-7006.cr2
-rwx------ 1 tew tew 23377077 Jan 21 11:11 20190121111119-7007.cr2

They are all pretty close in size, but there's definitely some variance, which is primarily due to compression of the raw sensor data as well as the metadata and embedded JPG preview image.
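
To put a number on "pretty close", here is a small Python sketch that takes the seven sizes from the listing above and reports the mean and the range relative to the mean. The helper name `size_spread` is made up for this example.

```python
def size_spread(sizes):
    """Return (mean, (max - min) / mean) for a list of file sizes in bytes."""
    mean = sum(sizes) / len(sizes)
    return mean, (max(sizes) - min(sizes)) / mean


# Sizes in bytes, copied from the ls -l output above.
sizes = [23868042, 24408037, 25928707, 23777211,
         25369539, 22675822, 23377077]

mean, spread = size_spread(sizes)
print(f"mean: {mean / 1e6:.1f} MB, range: {spread:.1%} of the mean")
# prints "mean: 24.2 MB, range: 13.4% of the mean"
```

Seven shots from one session is a small sample, so treat this as a rough indication rather than a bound.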

twalberg
  • Thanks. Since I don't have enough knowledge in photography, I wanted to know if this difference in size is so high that I can't consider load as the number of images? Does "23868042" mean 23.8 Megabyte? – Pablo Jan 22 '19 at 18:56
  • Correct - the 7D II has a 20.2 megapixel sensor, and the resulting raw images are generally between 19 and 36 megabytes, given my current collection of photos... – twalberg Jan 22 '19 at 19:10
0

Now, in case you are also interested in less popular brands, here is how Sony handles RAWs.

Currently used RAW files (file extension ".ARW") come in 2 types: 8 bits per pixel (called "Compressed RAW") and 16 bits per pixel ("Uncompressed RAW"). Some cameras are limited to 8 bit, the high end cameras can write either type.

Consequently, all RAW files from a given camera are nearly the same size: roughly as many megabytes as the sensor has megapixels (for 8 bits) or twice the megapixel count (for 16 bits). Actual file sizes fluctuate a bit because of the embedded JPEG preview, but the RAW data itself is always a constant size.
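
The arithmetic can be sketched in a few lines of Python. The 24-megapixel sensor is a hypothetical example, not a specific camera model, and the result covers only the raw pixel data, not the metadata or embedded preview; `expected_raw_megabytes` is a name invented for this illustration.

```python
def expected_raw_megabytes(megapixels, bits_per_pixel):
    """Approximate size of the raw pixel data in MB (10**6 bytes).

    Ignores metadata and the embedded JPEG preview.
    """
    if bits_per_pixel not in (8, 16):
        raise ValueError("this sketch only covers the 8-bit and 16-bit cases")
    return megapixels * bits_per_pixel / 8


# Hypothetical 24-megapixel sensor.
print(expected_raw_megabytes(24, 8))   # 24.0
print(expected_raw_megabytes(24, 16))  # 48.0
```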

szulat
  • Do Sony cameras really reduce bit depth of compressed raw files? What's the point of having "raw" files with the same color depth as JPEG? – xiota Jan 25 '19 at 20:13
  • @xiota 8 bits of file data for each pixel but not a literal 8 bit image - the effective bit depth depends on the local contrast and can be between 11 and 7 bits and then there is gamma curve (similar to JPEG) stretching the output to 13 bits. artifacts introduced by this compression algorithm are invisible in typical images and Sony owners usually don't even know these files are not really RAW. it's not a bad algorithm but it's a shame that no APS-C camera from Sony can shoot true RAW - there is no option to switch to uncompressed RAW for the rare cases where it makes a difference. – szulat Jan 25 '19 at 20:39
  • @xiota Raw files do not have any color depth. They are monochromatic luminance values. – Michael C Jan 29 '19 at 03:43
  • They contain "monochromatic" luminance values of light that has passed through color filters, hence representing color. It may not contain complete color information for each pixel, but it's still color depth. – xiota Jan 29 '19 at 05:25
0

The straightforward answer to your question:

  1. Raw files from the same camera, shot at the same resolution, most probably won’t have similar sizes if you are using raws that are either losslessly or lossily compressed.

  2. On the other hand, if they are not compressed, the size difference among them will be negligible for cameras with sensors with a large pixel count (>=30 megapixel and >=12 bit), and will mostly be due to the embedded JPEG preview in the raw files (lossily compressed by definition).

  3. If you shoot uncompressed raws and the size still varies significantly among them, your camera is compressing them and not telling you about it.
abetancort