2

I used Google Takeout to download all of my photos from Google Photos and realized that Google compresses these images 2-3x to give me free storage. This is great, but a lot of my images are stored at original size. Unfortunately, both 'google-compressed' images and 'original-high quality' images are stored with the jpg extension. I am wondering how to figure out which is which? Does Google add metadata tags to identify if they have been recompressed?

xiota
nitin

2 Answers

2

Google does add some tags to images that it recompresses, including images stored as "High Quality", which may be downsampled to 16MP or less. Images stored at "Original" quality appear to be kept unaltered. At this time, the following tags appear to be added or altered:

  • XMPToolkit = XMP Core 5.5.0
  • ImageUniqueID

The following command may list images that have been altered by Google:

exiftool -if '($XMPToolkit =~ /^XMP\ Core\ [\.\d]+$/) and ($ImageUniqueID)' \
   -s2 -q -FilePath -ext jpg .
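If you would rather post-process the metadata in a script, the same heuristic can be sketched in Python. This is only an illustration: it assumes you have exported the tags with something like `exiftool -json -XMPToolkit -ImageUniqueID -ext jpg .`, and the function names `likely_google_altered` and `filter_exiftool_json` are made up for this example.

```python
import json
import re

# Mirrors the -if condition above: a bare "XMP Core <version>" toolkit string.
GOOGLE_TOOLKIT = re.compile(r"^XMP Core [.\d]+$")


def likely_google_altered(tags):
    """Heuristic only: bare XMP Core toolkit string AND ImageUniqueID present."""
    toolkit = str(tags.get("XMPToolkit", ""))
    unique_id = tags.get("ImageUniqueID")
    return bool(GOOGLE_TOOLKIT.match(toolkit)) and bool(unique_id)


def filter_exiftool_json(json_text):
    """Take the text of exiftool's JSON output and return matching SourceFile paths."""
    records = json.loads(json_text)
    return [r.get("SourceFile") for r in records if likely_google_altered(r)]


# Hypothetical sample records, shaped like exiftool's JSON output.
sample = json.dumps([
    {"SourceFile": "a.jpg", "XMPToolkit": "XMP Core 5.5.0", "ImageUniqueID": "abc123"},
    {"SourceFile": "b.jpg", "XMPToolkit": "XMP Core 4.4.0-Exiv2", "ImageUniqueID": "def456"},
    {"SourceFile": "c.jpg", "XMPToolkit": "XMP Core 5.5.0"},
])

if __name__ == "__main__":
    print(filter_exiftool_json(sample))
```

The same caveats apply as for the exiftool command: a match suggests Google may have touched the file, but it does not prove it.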

Some caveats

  • The command may include images not altered by Google. Other programs may use the same, or similar, XMPToolkit strings that Google does, especially if they happen to use the same image-writing library that Google does. For instance, GIMP uses "XMP Core 4.4.0-Exiv2". Photoshop uses "Adobe XMP Core 5.3-c011 66.145661, 2012/02/06-14:56:27" (as noted by StarGeek).

  • The command may miss images altered by Google. This depends on how Google has changed their image processing over the years. For instance, it's not known (to me) when Google started using the ImageUniqueID tag. So some images may not have it set.

  • There are other tags that may be altered by Google, but they are not reliable to check because many JPEG images have them, including those straight from my camera (FujiFilm X-T20):

    • JPEGDigest
    • YCbCrSubSampling

Other options

You may also guess whether images have been altered by comparing file sizes or using tools like jpegjudge.

xiota
  • 1
    In the sample images I looked at here, the XMPToolKit tag had a value of XMP Core 5.5.0. You could list just those images with this command:
    exiftool -if "$XMPToolkit eq 'XMP Core 5.5.0'" -filename -ext jpg .
    (Reverse double/single quotes if on Linux/Mac)
    – StarGeek Aug 28 '18 at 23:24
  • Good point, it could be revised to $XMPToolkit=~/^XMP Core/ to ignore version numbers – StarGeek Aug 28 '18 at 23:33
  • @StarGeek Why is it not enough to just detect existence of the tag? – xiota Aug 28 '18 at 23:34
  • 1
    Because many programs, such as Lightroom and Photoshop, will write their own toolkit data there. One example I found had Adobe XMP Core 5.3-c011 66.145661, 2012/02/06-14:56:27 in that field. This can be used to filter out files which have definitely not been re-compressed. – StarGeek Aug 28 '18 at 23:39
  • Using the samples here I was able to figure out with exiftool that JPEGDigest and YCbCrSubSampling will also be changed by the recompression. This does not guarantee that a file was recompressed, but can be used to filter out files that were not recompressed. – StarGeek Aug 28 '18 at 23:47
  • Did you have some that were definitely recompressed but had values that weren't YCbCr4:2:0 (2 2) and Unknown and didn't start that way? Quite possible, my test sample consisted only of the three images that were on the HuffingtonPost article. – StarGeek Aug 28 '18 at 23:57
  • @StarGeek I created my own test image by uploading and redownloading from Google Photos. My original image also has JPEGDigest = Unknown (...) ; It is a straight from camera JPEG. Doing a diff on the exiftool output, Google also appears to add ImageUniqueID – xiota Aug 29 '18 at 00:08
  • @xiota Oooh, that's a good one to check on. I don't recall very many programs using that. Most use XMP tags for unique ids. – StarGeek Aug 29 '18 at 00:10
0

If you know the typical size of a jpeg from your camera, you would then be able to tell which images had additional compression added simply by their file size.
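A rough sketch of that idea in Python, assuming you have a few files you know are untouched originals to estimate the typical size from. The names `typical_size` and `flag_smaller_than_typical` and the default `ratio=0.5` are my own choices for illustration; the ratio is loosely based on the 2-3x compression mentioned in the question.

```python
from statistics import median


def typical_size(known_original_sizes):
    """Estimate the typical camera JPEG size (bytes) from known originals."""
    return median(known_original_sizes)


def flag_smaller_than_typical(sizes, typical_bytes, ratio=0.5):
    """Return the names of files smaller than ratio * typical_bytes, sorted.

    sizes maps file name -> size in bytes (e.g. gathered with os.path.getsize).
    ratio is an assumed cutoff, not a value documented by Google.
    """
    threshold = ratio * typical_bytes
    return sorted(name for name, size in sizes.items() if size < threshold)


# Hypothetical sizes for illustration only.
originals = [6_000_000, 5_800_000, 6_200_000]
sizes = {"a.jpg": 6_000_000, "b.jpg": 2_000_000, "c.jpg": 5_500_000}

if __name__ == "__main__":
    print(flag_smaller_than_typical(sizes, typical_size(originals)))
```

JPEG size also depends on image content and resolution, so this is a guess at best. It works better when all the photos come from the same camera at the same settings.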

frank