
tl;dr: I can't get gdal_translate to use multiple cores. How to fix?

I am using gdalwarp followed by gdal_translate to process a large GeoTIFF: first I crop it to a polygon cutline and write a virtual raster, then I translate the .vrt to a .tif. I have followed suggestions from a few different answers on this site. First, I split the work into two steps to get better compression, following this answer about gdalwarp. Then I tried to speed up gdal_translate, following this answer about multithread support for gdal_translate. I am running this on a remote server with GDAL v2.2.2 installed; the OS is Ubuntu 16.04.6 LTS (Xenial Xerus).

This is my code.

gdalwarp -of VRT -crop_to_cutline \
  -cutline "${path}/counties_chesapeake_watershed.gpkg" \
  "${path}/bigraster.tif" "${path}/clippedraster.vrt"
gdal_translate -co COMPRESS=LZW -co NUM_THREADS=8 --config GDAL_CACHEMAX 512 \
  "${path}/clippedraster.vrt" "${path}/clippedraster.tif"

My issue is that I don't believe that gdal_translate is using multiple cores, though I've tried to specify this with NUM_THREADS and also to increase GDAL_CACHEMAX. This is a very large raster (~12GB, several hundred km extent at 1 m resolution) so it is running extremely slowly. Can anyone help me parallelize the compression done by gdal_translate so this will run faster?

qdread
  • A couple of suggestions: 1) Try writing the gdalwarp output as a physical tiled TIFF file (-co TILED=YES). 2) It would help if the input image were also tiled. 3) Write the final output from gdal_translate as tiled too. – user30184 Aug 05 '20 at 13:29
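The tiling suggestions in the comment above might look roughly like the sketch below. It is untested on this dataset and rests on several assumptions: the intermediate file name clipped_tiled.tif and the run helper are hypothetical, -multi and -wo NUM_THREADS=ALL_CPUS are added here to let gdalwarp use more than one core, and BIGTIFF=YES is added because an uncompressed intermediate at this extent would likely exceed 4 GB. By default the script only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
#!/bin/sh
# Sketch only: paths and the intermediate file name are assumptions.
path="${path:-/data}"

# Hypothetical helper: print the command, and run it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Step 1: clip to the cutline and write a physical, tiled GeoTIFF
# (uncompressed here, so it can be very large on disk).
run gdalwarp -of GTiff -multi -wo NUM_THREADS=ALL_CPUS \
  -crop_to_cutline -cutline "${path}/counties_chesapeake_watershed.gpkg" \
  -co TILED=YES -co BIGTIFF=YES \
  "${path}/bigraster.tif" "${path}/clipped_tiled.tif"

# Step 2: compress into a tiled output; NUM_THREADS applies to compression only.
run gdal_translate -co COMPRESS=LZW -co TILED=YES -co BIGTIFF=YES \
  -co NUM_THREADS=8 --config GDAL_CACHEMAX 512 \
  "${path}/clipped_tiled.tif" "${path}/clippedraster.tif"
```

Whether this is actually faster depends on how the input is laid out (striped or tiled) and on disk throughput, so it is worth timing on a small subset first.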

1 Answer


You're getting a speedup from NUM_THREADS, but only at the compression stage. gdal_translate cannot use multithreading for anything other than compression.

The GDAL_CACHEMAX config option is probably helping you more than the NUM_THREADS creation option.

  • Thanks, that's helpful to know. The gdal_translate command has been running for 5 days and is only 30% complete, so it really seems like I am doing this inefficiently. Is there any way to use mosaicking to make it faster, perhaps? – qdread May 04 '20 at 13:18
  • How big is the data you're working with? – shubhamgoel27 May 04 '20 at 13:50
  • The tif is approximately 12 GB in size, several hundred km in extent at 1 m resolution, roughly 250,000 pixels x 350,000 lines. – qdread May 04 '20 at 13:56
  • That's not that large. I believe you're using the best possible configuration using GDAL. If you have access to something like ArcGIS Pro, maybe that can speed up the process. – shubhamgoel27 May 04 '20 at 14:24