1

I have written a script that access cogs on GCS (google cloud storage), clip it with several vectors I have, and then uploading it back to GCS. I thought that the clipped image size should be the same/lower size after the clipping, however, I was surprise to see that even though the images are the same , It is now heavier.

for example. an image with size of 635.6 MB before the clipping, is 951 MB after.

Details about how I preformed the clipping:

  1. read data from GCS - vector data and each time one raster
  2. used rastio.mask.mask to mask the raster
  3. saved the raster locally with rasterio, using the following parameters:
    with rasterio.open(savedir /'clipped.tif', 
                                   'w',
                                   driver='cog',
                                   height=img_arr.shape[1],
                                   width=img_arr.shape[2],
                                   count=25,
                                   dtype=img_arr.dtype,
                                   crs=crs_img,
                                   nodata=None, 
                                   transform=transf) as dst:
        for n in np.arange(0,number_of_bands):
              dst.write(img[n,:,:],int(n+1))
              dst.set_band_description(int(n+1), band_names[n])   

  1. uploaded to GCS

the pixels of the two images (before and after) have the same values (where they are not clipped) . I thought maybe it has to do with image dtype/ saving as cog, but I need clue for understanding why I have this difference.

I have seen this post but I am not sure if it's the same on rasterio and if my solution will be just to add to "compress='lzw'" parameter, and how will it affect the images.

My goal is to understand what could increase the size of my GeoTIFFs

EDIT: Overviews based on user2856 comment, I have checked the overviews and I found out that the clipped image is saved with overview:

metadata = src.meta

Get overviews

overviews = src.overviews(1) # the first band

Print the metadata and overviews

print(metadata) print(overviews)

>>> {'driver': 'GTiff', 'dtype': 'float32', 'nodata': None, 'width': 3340, 'height': 3342, 'count': 22, 'crs': CRS.from_epsg(4326), 'transform': Affine(removed)} []

... metadata = clipped.meta

Get overviews

overviews = clipped.overviews(1) # first band

Print the metadata and overviews

print(metadata) print(overviews) >>> {'driver': 'GTiff', 'dtype': 'float32', 'nodata': None, 'width': 3340, 'height': 3342, 'count': 22, 'crs': CRS.from_epsg(4326), 'transform': Affine(removed)} [2, 4, 8]

I saw that there are ways to remove overviews after openin the image, still looking for a ways to save the image without the overviews.

Edit 3: based on user2856 comments, I have checked the profile:

src.profile
>>>
{'driver': 'GTiff', 'dtype': 'float32', 'nodata': None, 'width': 3340, 'height': 3342, 'count': 22, 'crs': CRS.from_epsg(4326), 'transform': Affine(removed), 'blockxsize': 256, 'blockysize': 256, 'tiled': True, 'compress': 'lzw', 'interleave': 'pixel'}

clipped.profile >>> {'driver': 'GTiff', 'dtype': 'float32', 'nodata': None, 'width': 3340, 'height': 3342, 'count': 22, 'crs': CRS.from_epsg(4326), 'transform': Affine(removed), 'blockxsize': 512, 'blockysize': 512, 'tiled': True, 'compress': 'lzw', 'interleave': 'pixel'}

I have immediately realized that the difference between the clipped image to the original image is that the clipped image has different blocksize (512 instead of 256) I was trying to force the clipped image to have 256 when I write it.:

    with rasterio.open(savedir /tiff_name, 
                                   'w',
                                   driver='cog',
                                   height=img.shape[1],
                                   width=img.shape[2],
                                   count=number_of_bands,
                                   dtype=img.dtype,
                                   crs=crs_img,
                                   nodata=None, # change if data has nodata value,
                                   blockxsize=256, 
                                   blockysize=256,
                                   transform=transf) as dst:
        for n in np.arange(0,number_of_bands):

              dst.write(img[n,:,:],int(n+1))
              dst.set_band_description(int(n+1), band_names[n])   

But when I was checking the size of the forced blocksize, it still was too large (954MB...) and it seems like blocksize doesn't really change when I print profile

ReutKeller
  • 2,139
  • 4
  • 30
  • 84
  • 4
    Compressed images tend to be always smaller than uncompressed images. Notice also that COG by default contains overview levels and they add about 30% to the file size (but with a good reason). – user30184 May 24 '23 at 15:07
  • @user30184 thank you for your comment. If I compress them, what will change in the image? how will it influence the pixels values? – ReutKeller May 25 '23 at 05:48
  • 1
    I just tried your code. The COG driver writes the output with LZW (lossless) compression by default. Your original image may not have overviews. – user2856 May 25 '23 at 08:49
  • @user2856 so that means that it's not about compression right? – ReutKeller May 31 '23 at 05:26
  • Depends on whether your version of the COG driver writes with compression. Why don't you check? rio info savedir/clipped.tif – user2856 May 31 '23 at 06:53
  • 2
    There's no way your question can be answered without info about your input and output rasters. Suggest you edit your question to include output of either rio info filename.tif and rio overview --ls filename.tif or gdalinfo filename.tif for both the input and output rasters. – user2856 May 31 '23 at 07:05
  • Please clarify what is "GCS". – Andrea Giudiceandrea Jun 03 '23 at 13:12
  • @AndreaGiudiceandrea Google cloud storage, sorry for the confusion. Added to the original post as well. – ReutKeller Jun 04 '23 at 05:33
  • @user2856 I have added information about overviews – ReutKeller Jun 07 '23 at 06:09
  • @Reut Dataset.meta is just the basic metadata, to see if it's compressed, use Dataset.profile - https://gis.stackexchange.com/a/283134/2856 – user2856 Jun 07 '23 at 06:14
  • 1
    @user2856 added this information + something I have tried based on it – ReutKeller Jun 07 '23 at 06:42
  • Ok, it's definitely the overviews. src.overviews(1) == [] and clipped.overviews(1) == [2, 4, 8]. So my question is... now that you know why, does it matter? I ask, because if you remove the overviews, it's not a valid COG, but just a geotiff. So why wouldn't you just write a geotiff in the first place using src.profile i.e. rasterio.open(savedir /'clipped.tif', 'w', **src.profile) as dst: – user2856 Jun 07 '23 at 06:57
  • @user2856 for your question - I have tried also to save it as gtiff, it's still very large - less than a cog but it's 936MB. The idea to write as cog is mainly for using it on GEE (cogs are required) . I have added also an attempt to reduce the blockxsize which also seems to have difference (edited the post). – ReutKeller Jun 07 '23 at 07:01

1 Answers1

0

As user2856 thought, I was able to get reasonable file size by defining overviews=None when writing the result clipped image using rasterio. It was something like this:

#img is ndarray
with rasterio.open(savedir /tiff_name, 
                                   'w',
                                   driver='cog',
                                   height=img.shape[1],
                                   width=img.shape[2],
                                   count=15,
                                   dtype=img.dtype,
                                   crs=crs_img,
                                   nodata=None,
                           #this is what I added - overviews
                               overviews=None,
                               transform=transf) as dst:

        for n in np.arange(0,15):

              dst.write(img[n,:,:],n+1)

ReutKeller
  • 2,139
  • 4
  • 30
  • 84