gdal_calc.py outputs huge files

Question

I think this is a general enough question to apply to more than just gdal_calc, but perhaps not. When I run

gdal_calc.py -A map.tif --outfile=deforestation_00-10.tif --NoDataValue=0 --calc="A >= 7"

on a 38000 x 38000, 62.5 MB TIFF, the output is 1.37 GB (and still 38000 x 38000). I feel like I'm missing something big here, e.g. how TIFF data is stored. (The same thing happens when I run it using the raster calculator in QGIS.) Thanks in advance.

Bonus points: does anyone know how to include AND/OR logic in --calc="..." in order to evaluate the following? (map@1 >= 7 AND map@1 <= 9) OR ((map@1 >= 10 AND map@1 <= 12) * 2)

Are you aware that TIFF actually comprises multiple formats, including (but not requiring) compression? Because 38000 * 38000 / 2^30 ≈ 1.345, it is clear the output is uncompressed (1 byte per pixel) and the input is compressed. For an example of using Boolean operations in gdal_calc, please see this recent answer. – whuber, Aug 22 '13 at 21:00
Your "bonus points" line should probably have been asked as a separate question - that helps keep things organized and easy to search. — Richard, Aug 18 '14 at 22:08
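whuber's arithmetic can be checked with a few lines of Python. This is a sketch, assuming one band and that each row is padded to a whole byte (as uncompressed TIFF does); `uncompressed_size_bytes` is a hypothetical helper, not part of GDAL. TIFF headers and offsets add a little on top of this raw pixel payload.

```python
def uncompressed_size_bytes(width: int, height: int, bits_per_sample: int = 8) -> int:
    """Raw pixel payload of a single-band raster, assuming byte-padded rows."""
    bytes_per_row = (width * bits_per_sample + 7) // 8
    return bytes_per_row * height

size = uncompressed_size_bytes(38000, 38000)  # Byte output: 8 bits per pixel
print(size)                      # 1444000000
print(round(size / 2**30, 3))    # 1.345 (GiB)
```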

user2856 · Accepted Answer · 2013-08-23T01:28:28.153

Use the --co parameter to pass creation options to the output driver, for example to compress the output:

gdal_calc.py --co="COMPRESS=LZW" -A map.tif --outfile=deforestation_00-10.tif --NoDataValue=0 --calc="A >= 7"

For more compression options, see the GDAL GTiff format description.
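To get a feel for why a 0/1 raster shrinks so much under compression, here is a small sketch using Python's `zlib`, which implements DEFLATE (the same algorithm behind `COMPRESS=DEFLATE`; LZW is not in the Python standard library). The 2000 x 2000 raster is made up, and GDAL compresses per strip or tile with its own settings, so real ratios will differ.

```python
import zlib

# Hypothetical 2000 x 2000 boolean raster stored as one byte per pixel
# (like --type=Byte): 0 everywhere except a 500 x 500 block of 1s.
width, height = 2000, 2000
rows = []
for y in range(height):
    row = bytearray(width)  # all zeros
    if 750 <= y < 1250:
        row[750:1250] = b"\x01" * 500
    rows.append(bytes(row))
raw = b"".join(rows)

compressed = zlib.compress(raw, 6)  # DEFLATE at zlib's default level
print(len(raw))                     # 4000000
print(len(compressed) / len(raw))   # compressed size as a fraction of the raw size
```

Long runs of identical values are exactly what DEFLATE and LZW exploit, which is why the compressed input can be tiny while the uncompressed output is not.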

edited Aug 23 '13 at 01:28

answered Aug 22 '13 at 22:26

Mike T · Answer 2 · 2019-01-15T03:25:31.047

Filesize question

If the result data are Boolean True/False (1s and 0s), use --type=Byte with the creation option NBITS=1 to create a file with 1 bit per sample. This packs the uncompressed data 8 times smaller. Then, as @Luke has answered, also specify a compression method. There are a dozen different compression methods; another good one is COMPRESS=DEFLATE.

gdal_calc.py -A map.tif --outfile=out.tif --type=Byte --co="NBITS=1" --co="COMPRESS=DEFLATE" --calc="A >= 7"

With some example data that I have, I'm seeing a compression ratio of 0.15%. So I'd expect a result from a 1.37GB file to compress down to about 2MB.
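The size arithmetic above can be sketched in Python. This assumes a single band with rows padded to a whole byte, and `packed_size_bytes` is a hypothetical helper; the 0.15% ratio is the one quoted above and will vary a lot with the data.

```python
def packed_size_bytes(width: int, height: int, bits_per_sample: int) -> int:
    # Assumes each row is padded to a whole byte, as TIFF does for sub-byte samples.
    return (width * bits_per_sample + 7) // 8 * height

byte_size = packed_size_bytes(38000, 38000, 8)  # --type=Byte
bit_size = packed_size_bytes(38000, 38000, 1)   # --type=Byte --co="NBITS=1"
print(byte_size // bit_size)  # 8
print(bit_size)               # 180500000

# Applying the quoted 0.15% ratio (15 / 10000) to a 1.37 GB file, in bytes:
estimate = 1_370_000_000 * 15 // 10_000
print(estimate)               # 2055000, i.e. about 2 MB
```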

Bonus question

According to the help for --calc:

calculation in gdalnumeric syntax using +-/* or any numpy array functions (i.e. logical_and())

The expression is evaluated with Python's eval(), so NumPy functions such as logical_and() and logical_or() can be used. Try this:

--calc="logical_or(logical_and(A >= 7, A <= 9), logical_and(A >= 10, A <= 12))"
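A rough way to see what this expression does, without running GDAL, is to evaluate the same string against a NumPy array. The helper below is hypothetical and only mimics the idea that --calc can call NumPy functions by bare name with `A` bound to the band data; it is not gdal_calc.py's actual code, and the sample values 0 to 13 are made up.

```python
import numpy as np

def evaluate_calc(expr, **bands):
    # Hypothetical helper: put NumPy's public names in scope (like
    # "from numpy import *") and bind each band letter to an array.
    namespace = {k: v for k, v in vars(np).items() if not k.startswith("_")}
    namespace.update(bands)
    return eval(expr, namespace)

A = np.arange(0, 14)  # sample pixel values 0..13
result = evaluate_calc(
    "logical_or(logical_and(A >= 7, A <= 9), logical_and(A >= 10, A <= 12))", A=A)
print(result.astype(np.uint8))  # [0 0 0 0 0 0 0 1 1 1 1 1 1 0]
```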

I wasn't sure what the * 2 part of your original expression was for, so I left it out.

edited Jan 15 '19 at 03:25

answered Aug 23 '13 at 01:02

Hmm, I was trying to get output such that values 7 <= A <= 9 would evaluate to 1 and values 10 <= A <= 12 would evaluate to 2, using a raster mask as described here. However, although logical_and(A >= 10, A <= 12) * 2 evaluates to 2 for inputs 10, 11, 12, passing it as the second input to logical_or makes all inputs 7 <= A <= 12 return 1 (I guess because when `logical_or` sees 2, it reads it as `True`, which returns 1). – James Conkling Aug 24 '13 at 05:17
Update: apparently logical operations here are not evaluated the same way as elsewhere. E.g. in Python, False or 2 returns 2, while here it returns 1. I can get the result I need by running gdal_calc three times: once to produce the value 1 for 7, 8, 9; a second time to produce the value 2 for 10, 11, 12; and a third time to combine them. But I'm wondering if there's a better way. – James Conkling Aug 24 '13 at 05:34
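The behavior described in these comments can be reproduced in plain NumPy, assuming --calc hands the expression to NumPy as its help text says. Python's `or` returns one of its operands, while NumPy's `logical_or` always returns booleans, so the factor of 2 is lost. Because the two ranges never overlap, one possible single-pass alternative is to add weighted masks; the corresponding --calc string would presumably be "logical_and(A >= 7, A <= 9) + 2 * logical_and(A >= 10, A <= 12)", though this sketch has not been verified against gdal_calc.py itself. The sample values 0 to 13 are made up.

```python
import numpy as np

A = np.arange(0, 14)  # sample pixel values 0..13
low = np.logical_and(A >= 7, A <= 9)     # True for 7, 8, 9
high = np.logical_and(A >= 10, A <= 12)  # True for 10, 11, 12

# Python's `or` returns one of its operands...
print(False or 2)  # 2
# ...but NumPy's logical_or returns booleans, so 2 becomes True (1).
collapsed = np.logical_or(low, high * 2)
print(collapsed.astype(np.uint8))  # [0 0 0 0 0 0 0 1 1 1 1 1 1 0]

# The masks never overlap, so a weighted sum keeps distinct class codes.
classes = low * 1 + high * 2
print(classes)  # [0 0 0 0 0 0 0 1 1 1 2 2 2 0]
```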
