I am using gdalwarp to convert GeoPDFs of processed maps to GeoTIFFs (for later stitching into a larger GeoTIFF) using the following command:

gdalwarp -t_srs EPSG:28356 -r cubic -cutline "nsw_map_boundaries\20160506_nsw_map_bounds.geojson" -cwhere "name = '9030-4S SPRINGWOOD'" -crop_to_cutline -dstalpha "9030-4S SPRINGWOOD.pdf" "9030-4S SPRINGWOOD.tif"

The GeoPDFs have collars, so the cutline file contains the boundaries of the actual maps. EPSG:28356 is the projection of the map (GDA94 / MGA Zone 56).

Unfortunately this approach turns a 10MB PDF into a 70MB GeoTIFF! The warping also re-orients the map to align with the UTM grid.

The main reason for the size is that the output GeoTIFFs are in 32-bit format. The original PDF files only have around 30 distinct colours (see below), so it would be more efficient if the GeoTIFFs were in 8-bit paletted colour. I haven't been able to find a flag or setting to do this.
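
A quick Python sketch of the uncompressed arithmetic behind that size difference. With -dstalpha the output has four 8-bit bands (RGBA, 32 bits per pixel), whereas an 8-bit paletted image stores a single byte per pixel, and about 30 colours would need only 5 bits of index. The raster dimensions below are hypothetical, and real file sizes also depend on TIFF overhead and compression.

```python
import math


def bits_per_index(n_colours: int) -> int:
    """Smallest number of bits that can address n_colours palette entries."""
    return max(1, math.ceil(math.log2(n_colours)))


def uncompressed_bytes(width: int, height: int, bands: int, bits_per_band: int) -> int:
    """Raw pixel payload only: ignores TIFF headers, tiling overhead and compression."""
    return width * height * bands * bits_per_band // 8


if __name__ == "__main__":
    # Hypothetical raster dimensions, chosen only to illustrate the ratio.
    w, h = 4000, 4000
    rgba = uncompressed_bytes(w, h, bands=4, bits_per_band=8)      # -dstalpha output
    paletted = uncompressed_bytes(w, h, bands=1, bits_per_band=8)  # 8-bit paletted
    print(f"RGBA: {rgba / 1e6:.0f} MB, paletted: {paletted / 1e6:.0f} MB")
    print("bits needed for 30 colours:", bits_per_index(30))
```

So before any compression, going from RGBA to an 8-bit palette is a 4:1 saving on the pixel data alone.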

Is there a way of achieving this - either with gdalwarp, or other GDAL tools (or both)?

One constraint is that the GeoTIFFs do need to have transparency - either via alpha, or via a NoData value - for anything outside the cutline. The current gdalwarp command uses alpha (-dstalpha flag), but only because I couldn't easily get a NoData value to work.
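
Whichever value is used for NoData must never occur inside the map itself. For a single-band paletted output, that reduces to finding an index that no real pixel uses, as in this minimal sketch. The pixel indices are passed as a plain list here (reading them from the raster, e.g. with GDAL's Python bindings, is not shown), and the candidate order is an arbitrary choice.

```python
from collections import Counter
from typing import Iterable, Optional, Sequence


def pick_nodata_index(pixel_indices: Iterable[int],
                      candidates: Sequence[int] = (255, 254, 0)) -> Optional[int]:
    """Return the first candidate index that no valid pixel uses, or None."""
    counts = Counter(pixel_indices)
    for candidate in candidates:
        if counts[candidate] == 0:  # Counter returns 0 for missing keys
            return candidate
    return None


if __name__ == "__main__":
    # Hypothetical map interior that only uses palette indices 0..253.
    interior = [i % 254 for i in range(10_000)]
    print(pick_nodata_index(interior))  # 255 is unused, so it is returned
```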

Sample PDF file available from the NSW Topo Map Portal: https://portal.spatial.nsw.gov.au/download/NSWTopographicMaps/DTDB_GeoReferenced_Raster_CollarOn_161070/2017/25k/9030-4S+SPRINGWOOD.pdf

Sample cutline file with all map boundaries can be downloaded from https://maps.ozultimate.com/wiki/downloads (direct link)

Tom Brennan
  • You can do this in two steps with RGB2PCT.py (https://gdal.org/programs/rgb2pct.html); if you don't supply a palette file, it should choose its own. As for the alpha, you're causing that yourself with the -dstalpha switch; what value are you getting outside the cutline? You also don't need -t_srs, as you're not supplying a source SRS, nor do you need -r cubic, as no transformation is being done. You could use -co compress=lzw to reduce the size of the intermediate raster. – Michael Stimson Aug 07 '23 at 01:30
  • Unfortunately it doesn't look like rgb2pct will deal with the NoData/transparency issue? – Tom Brennan Aug 07 '23 at 01:49
  • There's a notable difference with and without -r cubic, so there must be some transformation going on - without that flag, there are quite a few artefacts. Without -dstalpha, the value outside the cutline is 0/0/0. With -dstalpha, it's 0/0/0/0 (4 bands). Using -co compress=lzw does reduce the size by about 50%, but with (say) 48 colours and compression, it could be around 7MB. – Tom Brennan Aug 07 '23 at 04:47
  • Do any legitimate instances of 0/0/0 exist in your data? What about -dstnodata 10 10 10? Cubic resampling has some idiosyncrasies that might cause problems later; I'm a bit surprised that resampling is being done, perhaps keep bilinear in the back of your mind just in case you do come afoul of the cubic method. – Michael Stimson Aug 07 '23 at 06:22
  • I think the resampling is due to the fact that the original PDF is aligned to true north, but any version processed by GDAL aligns to the UTM grid. What sort of issues do you hit with cubic resampling? – Tom Brennan Aug 07 '23 at 21:27
  • When there is an abrupt colour change, the cubic method makes the low value lower and the high value higher. You may not notice this because it's subtle, but if you're aiming for pseudocolor you will end up with numerous similar values. – Michael Stimson Aug 07 '23 at 23:53
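
The overshoot described in the last comment can be reproduced in one dimension. This sketch uses Keys' cubic convolution kernel with a = -0.5, a common choice for "cubic" resampling; that GDAL's cubic kernel is exactly this one is an assumption here. Interpolating a hard 0 to 100 edge gives values below 0 and above 100 next to the edge, while linear interpolation stays within range.

```python
import math
from typing import Sequence


def keys_kernel(x: float, a: float = -0.5) -> float:
    """Keys (1981) cubic convolution weight for a sample at distance x."""
    x = abs(x)
    if x <= 1.0:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2.0:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0


def _clamped(samples: Sequence[float], i: int) -> float:
    # Replicate the edge samples so the 4-tap kernel never reads out of range.
    return samples[min(max(i, 0), len(samples) - 1)]


def cubic_at(samples: Sequence[float], x: float) -> float:
    """Cubic convolution over the 4 samples surrounding position x."""
    i0 = math.floor(x)
    return sum(_clamped(samples, i) * keys_kernel(x - i) for i in range(i0 - 1, i0 + 3))


def linear_at(samples: Sequence[float], x: float) -> float:
    """Plain linear interpolation between the 2 nearest samples."""
    i0 = math.floor(x)
    t = x - i0
    return (1 - t) * _clamped(samples, i0) + t * _clamped(samples, i0 + 1)


if __name__ == "__main__":
    edge = [0, 0, 0, 100, 100, 100]  # abrupt colour change between samples 2 and 3
    for x in (1.75, 2.5, 3.25):
        print(f"x={x}: cubic={cubic_at(edge, x):8.3f}  linear={linear_at(edge, x):7.3f}")
```

Either side of the edge, cubic produces about -7 and about 107, i.e. values that were not in the original data. In a colour image those become new, slightly different shades, which is one way extra near-duplicate colours can appear before paletting.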

2 Answers

You need to specify the output compression type. Using (lossy) JPEG will get you a much smaller output tif (~12MB).

 gdalwarp -co compress=JPEG -co tiled=YES -ot Byte -t_srs EPSG:28356 -r cubic -cutline "9030-4S SPRINGWOOD.geojson" -crop_to_cutline -dstalpha "9030-4S SPRINGWOOD.pdf" "9030-4S SPRINGWOOD.tif"

Alternatively, a 3-stage process will get you a paletted (~9MB) image. There is one manual step: after converting to paletted with rgb2pct.py, you'll need to work out which value lies outside the cutline, so that you can assign it as NoData.

gdalwarp -overwrite -co tiled=YES -ot Byte -t_srs EPSG:28356 -r cubic -cutline "9030-4S SPRINGWOOD.geojson" -crop_to_cutline -dstalpha "9030-4S SPRINGWOOD.pdf" "9030-4S SPRINGWOOD_warp.tif"
rgb2pct.py "9030-4S SPRINGWOOD_warp.tif" "9030-4S SPRINGWOOD_PCT.tif"
gdal_translate -co compress=LZW -a_nodata 234 -co tiled=YES "9030-4S SPRINGWOOD_PCT.tif" "9030-4S SPRINGWOOD.tif"

user2856
  • Using -co compress=lzw does reduce the size by a bit over 50%, from 74MB to 30MB. But with (say) 48 paletted colours and compression, it could be around 7MB. The output type (-ot Byte) had no effect (files were binary identical when I ran with and without). – Tom Brennan Aug 07 '23 at 05:03
  • Quick test without doing anything for the collars: rgb2pct 9030-4S+SPRINGWOOD.pdf paletted.tif yields a 23 MB single-band paletted file, and compressing it with gdal_translate -co compress=LZW paletted.tif paletted_lzw.tif gives 12 MB file size. – user30184 Aug 07 '23 at 08:18
  • In terms of the 3 step method - if you first do gdalwarp without -dstalpha, then rgb2pct -n 254..., and finally gdalwarp -dstnodata 255, you can avoid the manual step - as 255 will be the NoData value – Tom Brennan Aug 08 '23 at 01:03
  • Sounds like you have it sorted :) Perhaps add an answer with your final process? – user2856 Aug 08 '23 at 01:36

There look to be a few different approaches, mostly variants on the same general theme. I'll attempt to summarise.

Resampling

Choosing a resampling method (-r <method> in gdalwarp) is recommended, as the original image is aligned to longitude (true north), whereas the processed images are aligned to the UTM grid and are thus rotated. Cubic resampling seems to give the best results. Bilinear results in more blurring. Lanczos is almost identical to cubic. Near (the default) is objectively sharper than cubic, but leaves noticeable artifacts due to the rotation.
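
To put a rough number on that rotation: the angle between true north and grid north (grid convergence) grows with distance from the zone's central meridian and with latitude. A common first-order approximation is gamma ~= (lon - lon0) * sin(lat). The sketch evaluates it for a hypothetical point; the coordinates are illustrative, not taken from the Springwood sheet. Even a rotation of a degree or so means output pixel centres no longer line up with source pixels, which is consistent with nearest neighbour leaving visible stair-step artifacts.

```python
import math


def utm_central_meridian(zone: int) -> float:
    """Central meridian (degrees east) of a UTM/MGA zone, e.g. zone 56 -> 153."""
    return -183.0 + 6.0 * zone


def approx_grid_convergence(lon_deg: float, lat_deg: float, zone: int) -> float:
    """First-order grid convergence in degrees: (lon - lon0) * sin(lat).

    Only the magnitude is used here; sign conventions differ between sources.
    """
    dlon = lon_deg - utm_central_meridian(zone)
    return dlon * math.sin(math.radians(lat_deg))


if __name__ == "__main__":
    # Hypothetical point about 2.5 degrees west of the zone 56 central meridian.
    gamma = approx_grid_convergence(150.5, -33.7, zone=56)
    print(f"true north vs grid north: about {abs(gamma):.2f} degrees")
```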

Approach 1 - JPEG compression

While the original question asked for reduction in colors, and the base image is essentially paletted, (lossy) JPEG compression actually does a decent job of both file size reduction (74MB->12MB) and quality. It also has the advantage of being achievable in one step.

gdalwarp -r cubic -dstalpha -co compress=JPEG -co tiled=YES -cutline "9030-4S SPRINGWOOD.geojson" -crop_to_cutline "9030-4S SPRINGWOOD.pdf" "9030-4S SPRINGWOOD.tif"

Approach 2 - rgb2pct.py & gdalwarp

Due to the transparency requirement, paletting is a 3-step process:

  1. A first pass with gdalwarp strips the collar off (to avoid processing colors in the collar) and applies resampling
  2. Then we reduce the colors to 254 (or fewer if desired) using rgb2pct
  3. Finally, a second pass with gdalwarp sets the NoData value to 255

The file size reduction is from 74MB->9MB. The color fidelity is worse than the JPEG version, but larger blocks of color are "cleaner" due to having fewer artifacts.

gdalwarp -r cubic -co compress=LZW -cutline "9030-4S SPRINGWOOD.geojson" -crop_to_cutline "9030-4S SPRINGWOOD.pdf" "9030-4S SPRINGWOOD_1.tif"

rgb2pct -n 254 "9030-4S SPRINGWOOD_1.tif" "9030-4S SPRINGWOOD_2.tif"

gdalwarp -co tiled=YES -co compress=LZW -dstnodata 255 -cutline "9030-4S SPRINGWOOD.geojson" -crop_to_cutline "9030-4S SPRINGWOOD_2.tif" "9030-4S SPRINGWOOD.tif"
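
The three commands can be generated for any sheet name with a small Python helper, which also makes the NoData reasoning explicit: rgb2pct -n N generates a colour table of at most N entries, so real pixels only take values 0 to N-1 and any index from N upwards is free. The function names are hypothetical, and depending on the GDAL install the script may be called rgb2pct or rgb2pct.py.

```python
import shlex
from typing import List


def warp_rgb(src: str, dst: str, cutline: str) -> List[str]:
    # Step 1: strip the collar and resample; no -dstalpha, so the output stays RGB.
    return ["gdalwarp", "-r", "cubic", "-co", "compress=LZW",
            "-cutline", cutline, "-crop_to_cutline", src, dst]


def to_paletted(src: str, dst: str, n_colours: int) -> List[str]:
    # Step 2: median-cut colour table with at most n_colours entries.
    return ["rgb2pct", "-n", str(n_colours), src, dst]


def mask_nodata(src: str, dst: str, cutline: str, nodata: int) -> List[str]:
    # Step 3: default (nearest) resampling, so palette indices are copied, not blended.
    return ["gdalwarp", "-co", "tiled=YES", "-co", "compress=LZW",
            "-dstnodata", str(nodata), "-cutline", cutline, "-crop_to_cutline", src, dst]


def paletted_pipeline(name: str, cutline: str, n_colours: int = 254,
                      nodata: int = 255) -> List[List[str]]:
    # Real pixels use indices 0..n_colours-1, so NoData must be >= n_colours.
    if not 2 <= n_colours <= nodata <= 255:
        raise ValueError("need 2 <= n_colours <= nodata <= 255")
    return [
        warp_rgb(f"{name}.pdf", f"{name}_1.tif", cutline),
        to_paletted(f"{name}_1.tif", f"{name}_2.tif", n_colours),
        mask_nodata(f"{name}_2.tif", f"{name}.tif", cutline, nodata),
    ]


if __name__ == "__main__":
    for cmd in paletted_pipeline("9030-4S SPRINGWOOD", "9030-4S SPRINGWOOD.geojson"):
        # Printed for inspection; subprocess.run(cmd, check=True) would execute it.
        print(shlex.join(cmd))
```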

Results for both approaches are fairly good at 100% zoom.

Other approaches

I also looked at a 2-step process just using rgb2pct.py & gdalwarp in that order. This fails due to resampling/rotating a paletted image - which gdalwarp warns about! Just don't...
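
Why warping after rgb2pct goes wrong: a palette index is a label, not a brightness, so interpolating between indices yields whatever colour happens to sit at the in-between index. This sketch uses a hypothetical 3-entry palette to contrast blending index numbers (roughly what non-nearest resampling of a paletted band amounts to) with blending the RGB values (what happens when warping is done before paletting).

```python
from typing import Dict, Tuple

RGB = Tuple[int, int, int]

# Hypothetical colour table; the order of entries is arbitrary, as in any real one.
PALETTE: Dict[int, RGB] = {
    0: (255, 255, 255),  # white paper
    1: (0, 90, 255),     # blue water
    2: (0, 0, 0),        # black text and linework
}


def lerp(a: float, b: float, t: float) -> float:
    return (1 - t) * a + t * b


def blend_indices(i0: int, i1: int, t: float) -> RGB:
    """Interpolate the index numbers, then look the result up in the palette."""
    return PALETTE[round(lerp(i0, i1, t))]


def blend_colours(i0: int, i1: int, t: float) -> RGB:
    """Interpolate the RGB values of the two entries instead."""
    c0, c1 = PALETTE[i0], PALETTE[i1]
    return tuple(round(lerp(a, b, t)) for a, b in zip(c0, c1))


if __name__ == "__main__":
    # Halfway between a white pixel and a black pixel:
    print("index blend:", blend_indices(0, 2, 0.5))  # (0, 90, 255): blue from nowhere
    print("RGB blend:  ", blend_colours(0, 2, 0.5))  # (128, 128, 128): a plausible grey
```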

In terms of improving quality, at the cost of size, there is also the option of upsizing in the first step. With a pixel size of 2 (-tr 2 2), which is about 2.1x, the resulting images are 31MB for approach 1 (JPG) and 34MB for approach 2 (paletting). The comparative comments about color fidelity/artifacts stand.
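
With -tr the pixel size is given in target SRS units (metres for EPSG:28356), and pixel count scales with the square of the linear factor: if "2.1x" refers to the linear scale, that is about 2.1 squared, roughly 4.4x as many pixels before compression. The sketch below assumes a hypothetical 10 km extent and a hypothetical native pixel size of 4.2 m, and the shapes are only approximate, since gdalwarp may adjust the extent when cropping.

```python
import math
from typing import Tuple


def output_shape(width_m: float, height_m: float, pixel_size_m: float) -> Tuple[int, int]:
    """Approximate raster size for an extent in metres at a given -tr pixel size."""
    return math.ceil(width_m / pixel_size_m), math.ceil(height_m / pixel_size_m)


def pixel_count_ratio(old_pixel_size_m: float, new_pixel_size_m: float) -> float:
    """Growth in pixel count when the pixel size shrinks (square of the linear factor)."""
    return (old_pixel_size_m / new_pixel_size_m) ** 2


if __name__ == "__main__":
    # Hypothetical 10 km x 10 km sheet, purely for illustration.
    for res in (4.2, 2.0):
        print(res, output_shape(10_000, 10_000, res))
    print(f"pixel count grows about {pixel_count_ratio(4.2, 2.0):.2f}x")
```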

Interestingly, using WEBP compression (-co compress=WEBP) in approach 1 gave a 4.5MB image at default resolution, and 11MB at a pixel size of 2. Quality appears fairly comparable to both the JPEG and palette approaches, at less than 50% of the size of either. The only downside is that WEBP compression is not as widely supported, which matters if you need to use the files outside GDAL (e.g. Photoshop can't handle it). If you can live with that, this might be as good an option as any:

gdalwarp -r cubic -dstalpha -co compress=WEBP -co tiled=YES -cutline "9030-4S SPRINGWOOD.geojson" -crop_to_cutline "9030-4S SPRINGWOOD.pdf" "9030-4S SPRINGWOOD.tif"

The compression level can be specified (the default is 75). Lossless WEBP can also be specified (18MB at default resolution).
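
The quality settings are GTiff creation options: WEBP_LEVEL (1-100, default 75) and WEBP_LOSSLESS=YES for WEBP, and JPEG_QUALITY (1-100, default 75) for the JPEG variant of approach 1. This sketch builds the one-step command with an explicit setting; the function name and its defaults are illustrative, and WEBP is only available if GDAL was built with libwebp.

```python
from typing import List


def lossy_warp_cmd(src: str, dst: str, cutline: str, codec: str = "WEBP",
                   quality: int = 75, lossless: bool = False) -> List[str]:
    """Build the one-step warp command with an explicit compression setting."""
    codec = codec.upper()
    if codec not in ("JPEG", "WEBP"):
        raise ValueError("codec must be JPEG or WEBP")
    if not 1 <= quality <= 100:
        raise ValueError("quality must be in 1..100")
    if lossless and codec != "WEBP":
        raise ValueError("lossless mode is only offered for WEBP here")
    cmd = ["gdalwarp", "-r", "cubic", "-dstalpha",
           "-co", f"compress={codec}", "-co", "tiled=YES"]
    if lossless:
        cmd += ["-co", "WEBP_LOSSLESS=YES"]
    elif codec == "WEBP":
        cmd += ["-co", f"WEBP_LEVEL={quality}"]
    else:
        cmd += ["-co", f"JPEG_QUALITY={quality}"]
    return cmd + ["-cutline", cutline, "-crop_to_cutline", src, dst]


if __name__ == "__main__":
    print(" ".join(lossy_warp_cmd("in.pdf", "out.tif", "bounds.geojson", quality=60)))
```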

Tom Brennan