Increased raster file size after clipping and uploading to GCS

Question

I have written a script that access cogs on GCS (google cloud storage), clip it with several vectors I have, and then uploading it back to GCS. I thought that the clipped image size should be the same/lower size after the clipping, however, I was surprise to see that even though the images are the same , It is now heavier.

for example. an image with size of 635.6 MB before the clipping, is 951 MB after.

Details about how I preformed the clipping:

read data from GCS - vector data and each time one raster
used rastio.mask.mask to mask the raster
saved the raster locally with rasterio, using the following parameters:

    with rasterio.open(savedir /'clipped.tif', 
                                   'w',
                                   driver='cog',
                                   height=img_arr.shape[1],
                                   width=img_arr.shape[2],
                                   count=25,
                                   dtype=img_arr.dtype,
                                   crs=crs_img,
                                   nodata=None, 
                                   transform=transf) as dst:
        for n in np.arange(0,number_of_bands):
              dst.write(img[n,:,:],int(n+1))
              dst.set_band_description(int(n+1), band_names[n])

uploaded to GCS

the pixels of the two images (before and after) have the same values (where they are not clipped) . I thought maybe it has to do with image dtype/ saving as cog, but I need clue for understanding why I have this difference.

I have seen this post but I am not sure if it's the same on rasterio and if my solution will be just to add to "compress='lzw'" parameter, and how will it affect the images.

My goal is to understand what could increase the size of my GeoTIFFs

EDIT: Overviews based on user2856 comment, I have checked the overviews and I found out that the clipped image is saved with overview:

metadata = src.meta
Get overviews
overviews = src.overviews(1)  # the first band
Print the metadata and overviews
print(metadata)
print(overviews)
>>>
{'driver': 'GTiff', 'dtype': 'float32', 'nodata': None, 'width': 3340, 'height': 3342, 'count': 22, 'crs': CRS.from_epsg(4326), 'transform': Affine(removed)}
[]
...
metadata = clipped.meta
Get overviews
overviews = clipped.overviews(1)  # first band
Print the metadata and overviews
print(metadata)
print(overviews)
>>>
{'driver': 'GTiff', 'dtype': 'float32', 'nodata': None, 'width': 3340, 'height': 3342, 'count': 22, 'crs': CRS.from_epsg(4326), 'transform': Affine(removed)}
[2, 4, 8]

I saw that there are ways to remove overviews after openin the image, still looking for a ways to save the image without the overviews.

Edit 3: based on user2856 comments, I have checked the profile:

src.profile
>>>
{'driver': 'GTiff', 'dtype': 'float32', 'nodata': None, 'width': 3340, 'height': 3342, 'count': 22, 'crs': CRS.from_epsg(4326), 'transform': Affine(removed), 'blockxsize': 256, 'blockysize': 256, 'tiled': True, 'compress': 'lzw', 'interleave': 'pixel'}
clipped.profile
>>>
{'driver': 'GTiff', 'dtype': 'float32', 'nodata': None, 'width': 3340, 'height': 3342, 'count': 22, 'crs': CRS.from_epsg(4326), 'transform': Affine(removed), 'blockxsize': 512, 'blockysize': 512, 'tiled': True, 'compress': 'lzw', 'interleave': 'pixel'}

I have immediately realized that the difference between the clipped image to the original image is that the clipped image has different blocksize (512 instead of 256) I was trying to force the clipped image to have 256 when I write it.:

    with rasterio.open(savedir /tiff_name, 
                                   'w',
                                   driver='cog',
                                   height=img.shape[1],
                                   width=img.shape[2],
                                   count=number_of_bands,
                                   dtype=img.dtype,
                                   crs=crs_img,
                                   nodata=None, # change if data has nodata value,
                                   blockxsize=256, 
                                   blockysize=256,
                                   transform=transf) as dst:
        for n in np.arange(0,number_of_bands):

              dst.write(img[n,:,:],int(n+1))
              dst.set_band_description(int(n+1), band_names[n])

But when I was checking the size of the forced blocksize, it still was too large (954MB...) and it seems like blocksize doesn't really change when I print profile

Compressed images tend to be always smaller than uncompressed images. Notice also that COG by default contains overview levels and they add about 30% to the file size (but with a good reason). — user30184, May 24 '23 at 15:07
@user30184 thank you for your comment. If I compress them, what will change in the image? how will it influence the pixels values? — ReutKeller, May 25 '23 at 05:48
I just tried your code. The COG driver writes the output with LZW (lossless) compression by default. Your original image may not have overviews. — user2856, May 25 '23 at 08:49
@user2856 so that means that it's not about compression right? — ReutKeller, May 31 '23 at 05:26
Depends on whether your version of the COG driver writes with compression. Why don't you check? rio info savedir/clipped.tif — user2856, May 31 '23 at 06:53
There's no way your question can be answered without info about your input and output rasters. Suggest you edit your question to include output of either rio info filename.tif and rio overview --ls filename.tif or gdalinfo filename.tif for both the input and output rasters. — user2856, May 31 '23 at 07:05
@AndreaGiudiceandrea Google cloud storage, sorry for the confusion. Added to the original post as well. — ReutKeller, Jun 04 '23 at 05:33
@Reut Dataset.meta is just the basic metadata, to see if it's compressed, use Dataset.profile - https://gis.stackexchange.com/a/283134/2856 — user2856, Jun 07 '23 at 06:14
@user2856 added this information + something I have tried based on it — ReutKeller, Jun 07 '23 at 06:42
Ok, it's definitely the overviews. src.overviews(1) == [] and clipped.overviews(1) == [2, 4, 8]. So my question is... now that you know why, does it matter? I ask, because if you remove the overviews, it's not a valid COG, but just a geotiff. So why wouldn't you just write a geotiff in the first place using src.profile i.e. rasterio.open(savedir /'clipped.tif', 'w', **src.profile) as dst: — user2856, Jun 07 '23 at 06:57
@user2856 for your question - I have tried also to save it as gtiff, it's still very large - less than a cog but it's 936MB. The idea to write as cog is mainly for using it on GEE (cogs are required) . I have added also an attempt to reduce the blockxsize which also seems to have difference (edited the post). — ReutKeller, Jun 07 '23 at 07:01

ReutKeller · Accepted Answer · 2023-06-08T06:01:15.737

As user2856 thought, I was able to get reasonable file size by defining overviews=None when writing the result clipped image using rasterio. It was something like this:

#img is ndarray
with rasterio.open(savedir /tiff_name, 
                                   'w',
                                   driver='cog',
                                   height=img.shape[1],
                                   width=img.shape[2],
                                   count=15,
                                   dtype=img.dtype,
                                   crs=crs_img,
                                   nodata=None,
                           #this is what I added - overviews
                               overviews=None,
                               transform=transf) as dst:

        for n in np.arange(0,15):

              dst.write(img[n,:,:],n+1)

Increased raster file size after clipping and uploading to GCS

Get overviews

Print the metadata and overviews

Get overviews

Print the metadata and overviews

1 Answers1