PostGIS: Using ST_CoveredBy (or similar) to speed up Difference on large datasets?

Question

I need to get the Difference of a single large-area polygon (input layer) and a very numerous set of polygons (overlay layer). The overlay layer has ~500k polygons, of which ~200k intersect the input layer. I am trying to use PostGIS to do this in the hope that it can be much faster than e.g. running a Difference algorithm in QGIS (which is unfeasibly slow).

I've got the basic approach to do this with ST_Difference from here, i.e. just run ST_Difference on a left join on ST_Intersects (using Coalesce to avoid losing polygons from the input layer that don't intersect with the overlay layer - although as it happens in this particular case there are no such polygons to lose as there is only a single input polygon which does intersect).

So my current query is:

SELECT COALESCE(ST_Difference(input.geom,overlay.geom),input.geom)
FROM input LEFT JOIN overlay ON ST_Intersects(input.geom,overlay.geom)

Unsurprisingly this query is taking a very long time, as it's calculating ~200k differences.

I've read about speeding up intersections enormously by skipping ST_Intersection where polygons are entirely contained within one another e.g. here and here. I.e. using ST_CoveredBy / ST_Within / ST_Contains etc. (I haven't completely sussed out the difference between these yet.)

I feel like I should be able to do something similar here, as the vast majority of the overlay polygons are entirely contained within the input polygon. But I can't get my head around the query code to use to do the same with ST_Difference.

we need to see the output of EXPLAIN on the query before we can say anything about it — Ian Turton, Mar 17 '21 at 10:48
@IanTurton can I get that output before the query completes? (I'm expecting it to take days to run...) — MDB, Mar 17 '21 at 11:05
Years back this had been a scenario that made me think outside the box (and PostGIS/GIS can be a sticky box...); for some client, I had to do this with +7.5M highly complex, but small polygons on an even more complex continental Eurasia Polygon. Turned out it takes less than 5 seconds to add the formatted coordinate strings of them all as inner rings to the WKT of the large Polygon... — geozelot, Mar 17 '21 at 11:39
@geozelot Revealing my ignorance I'm afraid I didn't fully understand that, but thank you. I've since found link and realised by far my biggest problem is that the input layer is one big polygon. Subdividing the input into 30,000 polygons (with st_subdivide) has taken runtime down to ~2 mins. I still have a mistake in the query because I'm getting all of the original (un--differenced) input polygons as well as the differenced ones, but I'll work that out! Thanks. — MDB, Mar 17 '21 at 12:17
The WKT format is a string representation of a geometry, and a Polygon is defined as a set of rings (outer, plus any amount of inner rings) which are made up by coordinates; making holes into a Polygon would thus mean to simply add some rings. Anyways, subdividing is the ultimate improvement, it wasn't clear from the question if your Polygon needs to stay as one. — geozelot, Mar 17 '21 at 13:54
Do you want to remove ALL the overlay polygons from the input polygon, or compute the difference of the input and EACH overlay polygon? — dr_jts, Mar 17 '21 at 17:34
@dr_jts I want to remove all the overlay polygons from the input polygon. You've seen (and helpfully commented) on a subsequent question of mine, which has led to getting the solution. I learnt a key step from this question though - the critical importance of subdividing the input so that bounding box checks can speed things up massively. Thanks. — MDB, Mar 17 '21 at 22:51
Subdividing the input will definitely increase performance, as noted. I'd be a bit worried that numeric roundoff would lead to small anomalies in the result, however (i.e. "gores" where the added intersections are not quite identical across two adjacent subdivided pieces). But this should happen rarely, and may not matter for your use case. — dr_jts, Mar 17 '21 at 23:58
it would be nice to see the SQL for this that you ended up with. — dr_jts, Mar 17 '21 at 23:58
@dr_jts I never did this in the end - it was fast enough just using the code provided by @JGH here. I don't actually know if using ST_CoveredBy (or similar) would make ST_Difference any faster - I suppose it depends on whether A - B is faster than ST_Difference(A,B) for the case when B is entirely contained within A. — MDB, Mar 19 '21 at 09:22

PostGIS: Using ST_CoveredBy (or similar) to speed up Difference on large datasets?

0 Answers0