6

I managed to extract raster values in PostGIS based on a table of point geometries by following Point sampling raster with PostGIS

SELECT id, ST_Value(rastertable.rast, 1, pointtable.geom) as RasterValue
FROM rastertable, pointtable
WHERE ST_Intersects(rastertable.rast, pointtable.geom)
LIMIT 20

The above query takes roughly 20 seconds, which is one second per case. In the end I want to extract raster values for over 10 million points. With the current query that would take roughly 115 days; not really an option. How could I adjust the above query to increase the performance? Or do I maybe even need to change the method?

Edit: Using QGIS point sampling plugin also results in a 3 second per 10 point speed, reducing the waiting time to about 40 days. That still is not workable.

Version info: enter image description here

John Powell
  • 13,649
  • 5
  • 46
  • 62
LMB
  • 1,166
  • 8
  • 27
  • 2
    By complete luck I was facing this exact problem today. The solution in my case was to tile the raster, using -t switch, going from 1000 x 1000 pixels per input tile to 100 x 100. Query went from 28 minutes to a few seconds. I used raster2pgsql, for what it is worth. – John Powell Mar 01 '18 at 12:04
  • I'll give that a try in a minute. Currently trying raster>polygon in QGIS, followed by a spatial join. Will post back the performance results. – LMB Mar 01 '18 at 12:09
  • @JohnPowellakaBarça, Your method on 10mln records took 12 minutes on an M2 SSD, nice. If you provide it as an answer I will mark it as the answer. – LMB Mar 01 '18 at 13:27
  • Great. When I get home. I have 260 million rows of points and a 5m terrain coverage of the UK. Mine was also looking at taking weeks till I figured this out. I suspect it is to do with how raster in encoded as a blob which needs to be opened for Get value to work. Will investigate before writing up – John Powell Mar 01 '18 at 13:51

1 Answers1

9

I was facing the same issues as you, and simply going from 1000x1000 to 100x100 less to a speed up of several orders of magnitude. In order to tile your input raster use the -t switch in raster2pgsql, eg,

raster2pgsql -t 200x200 -I -s .......

My understanding is that ST_Value works as an offset into an array (2D or 3D, depending on bands) and that as each individual raster is being stored as a blob, you will need to open each one in order to find the pixel at x, y or that intersects with a geometry. So, it follows that smaller tiles lead to quicker lookups, especially as the index structure will be very evenly balanced boxes.

There is obviously a sweet spot for tile size, which will depend on which form of ST_Value you are using and the work load in question -- I would expect polygons to behave very differently to points. There are some good pointers in this blog post by Duncan Golicher, showing a continual decline in speed as tile size falls for point lookups and an initial decline with polygons, with a sweet spot at around 150, before increasing again -- which makes sense a priori.

Here are a couple of images from Duncan's post illustrating the point.

enter image description here

enter image description here

I found that occasionally I got an error from ST_Value about lookups being outside the raster bounds and eventually used ST_NearestValue instead, with the same performance characteristics.

John Powell
  • 13,649
  • 5
  • 46
  • 62