6

I have two table of polygons, and I want to 'flatten' them into one table, and have the result tell me which areas are in one table of polygons, which in the other table of polygons, and which are in both.

I have had a go at doing this in PostGIS, and below present a self-contained example, but is there a better way?

DROP TABLE IF EXISTS a_table;
CREATE TEMP TABLE a_table AS (
    SELECT  'a1' as id, st_geomfromtext('POLYGON((50 50, 50 150, 150 150, 150 50, 50 50))') geom
    UNION ALL
    SELECT  'a2' as id, st_geomfromtext('POLYGON((150 150, 150 200, 200 200, 200 150, 150 150))') geom
    UNION ALL
    SELECT  'a3' as id, st_geomfromtext('POLYGON((10 10, 10 20, 20 20, 20 10, 10 10))') geom
    );

DROP TABLE IF EXISTS b_table; CREATE TEMP TABLE b_table AS ( SELECT 'b1' as id, st_geomfromtext('POLYGON((40 40, 40 140, 140 140, 140 40, 40 40))') geom UNION ALL SELECT 'b2' as id, st_geomfromtext('POLYGON((30 30, 30 40, 40 40, 40 30, 30 30))') geom UNION ALL SELECT 'b3' as id, st_geomfromtext('POLYGON((155 155, 155 175, 175 175, 175 155, 155 155))') geom );

DROP TABLE IF EXISTS intersects; CREATE TEMP TABLE intersects AS ( SELECT a_table.id aid, b_table.id bid, CASE WHEN st_within(a_table.geom, b_table.geom) THEN a_table.geom ELSE st_intersection(a_table.geom, b_table.geom) END AS geom FROM a_table INNER JOIN b_table ON (st_intersects(a_table.geom, b_table.geom) AND NOT st_touches(a_table.geom, b_table.geom)) );

DROP TABLE IF EXISTS non_intersect_diff; CREATE TEMP TABLE non_intersect_diff AS ( SELECT a_table.id aid, '' as bid, a_table.geom FROM a_table WHERE a_table.id NOT IN (SELECT aid FROM intersects GROUP BY aid) );

DROP TABLE IF EXISTS intersect_diff; CREATE TEMP TABLE intersect_diff AS ( WITH collect AS ( SELECT a_table.id aid, st_union(b_table.geom) geom FROM a_table INNER JOIN b_table ON (st_intersects(a_table.geom, b_table.geom) AND NOT st_touches(a_table.geom, b_table.geom)) GROUP BY a_table.id ) SELECT a_table.id as aid, '' as bid, st_difference(a_table.geom, collect.geom) geom FROM a_table INNER JOIN collect ON a_table.id = collect.aid);

DROP TABLE IF EXISTS non_intersect_diff_reverse; CREATE TEMP TABLE non_intersect_diff_reverse AS ( SELECT '' as aid, b_table.id as bid, b_table.geom FROM b_table WHERE b_table.id NOT IN (SELECT bid FROM intersects GROUP BY bid) );

DROP TABLE IF EXISTS intersect_diff_reverse; CREATE TEMP TABLE intersect_diff_reverse AS ( WITH collect AS ( SELECT b_table.id bid, st_union(a_table.geom) geom FROM b_table INNER JOIN a_table ON (st_intersects(b_table.geom, a_table.geom) AND NOT st_touches(b_table.geom, a_table.geom)) GROUP BY b_table.id ) SELECT '' as aid, b_table.id bid,
st_difference(b_table.geom, collect.geom) geom FROM b_table INNER JOIN collect ON b_table.id = collect.bid );

CREATE TEMP TABLE result AS ( SELECT * FROM intersects UNION ALL SELECT * FROM non_intersect_diff UNION ALL SELECT * FROM intersect_diff UNION ALL SELECT * FROM non_intersect_diff_reverse UNION ALL SELECT * FROM intersect_diff_reverse );

SELECT * FROM result

PolyGeo
  • 65,136
  • 29
  • 109
  • 338
TheRealJimShady
  • 523
  • 3
  • 9
  • 1
    See also this solution in the PostGIS wiki: https://trac.osgeo.org/postgis/wiki/UsersWikiExamplesOverlayTables – dr_jts Jun 11 '21 at 17:23
  • 1
    See also this answer to a similar question: https://gis.stackexchange.com/a/31562/14766 – dr_jts Jun 11 '21 at 17:29
  • 1
    Here is a full example of the Node-Polygonize approach: https://trac.osgeo.org/postgis/wiki/UsersWikiExamplesOverlayTables#WorkedExample – dr_jts Jun 11 '21 at 19:34
  • Wow @dr_jts . Thanks so much. This is a game-changer for the work I'm doing. I'm going to give it a go on some real-world data now. – TheRealJimShady Jun 11 '21 at 20:24
  • I'm actually going to need to 'composite' 6 polygon tables. I can use the my_overlay function that Chris Hodgson defines here ( https://trac.osgeo.org/postgis/wiki/UsersWikiExamplesOverlayTables#WorkedExample ) in a loop, using the result of the first as input to the next and so on. But again .... maybe there's a better way? – TheRealJimShady Jun 11 '21 at 20:25
  • 1
    The advantage of the node-polgonize approach is that it scales easily to N tables. Just extract the linework from all the tables, and then join the resultants back to each parent table. – dr_jts Jun 11 '21 at 20:38

1 Answers1

8

One way to do this is to create a polygonal coverage from the linework of the inputs, and then spatially join this back to recover the parent attributes. The advantage of using this technique is that it should be more robust in handling polygons which have very small overlaps or coincident linework. It also avoids the need to drop polygon intersection which result in lines.

  • Extract linework using ST_Boundary
  • Node/dissolve lines using ST_Union
  • Polygonize resultants using ST_Polygonize
  • Generate an interior point for each resultant using ST_PointOnSurface
  • Attach parent attribution by joining on interior points using ST_Contains
WITH poly_a(id, geom) AS (VALUES
    ( 'a1', 'POLYGON ((10 40, 30 40, 30 10, 10 10, 10 40))'::geometry ),
    ( 'a2', 'POLYGON ((70 10, 30 10, 30 90, 70 90, 70 10), (40 40, 60 40, 60 20, 40 20, 40 40), (40 80, 60 80, 60 60, 40 60, 40 80))'::geometry ),
    ( 'a3', 'POLYGON ((40 40, 60 40, 60 20, 40 20, 40 40))'::geometry )
)
,poly_b(id, geom) AS (VALUES
    ( 'b1', 'POLYGON ((90 70, 90 50, 50 50, 50 70, 90 70))'::geometry ),
    ( 'b2', 'POLYGON ((90 30, 50 30, 50 50, 90 50, 90 30))'::geometry ),
    ( 'b2', 'POLYGON ((90 10, 70 10, 70 30, 90 30, 90 10))'::geometry )
)
,lines AS ( 
  SELECT ST_Boundary(geom) AS geom FROM poly_a
  UNION ALL
  SELECT ST_Boundary(geom) AS geom FROM poly_b
)
,noded_lines AS ( SELECT ST_Union(geom) AS geom FROM lines ) 
,resultants AS (  
  SELECT geom, ST_PointOnSurface(geom) AS pip 
    FROM St_Dump(
           ( SELECT ST_Polygonize(geom) AS geom FROM noded_lines ))   
)
SELECT a.id AS ida, b.id AS idb, r.geom
  FROM resultants r
  LEFT JOIN poly_a a ON ST_Contains(a.geom, r.pip) 
  LEFT JOIN poly_b b ON ST_Contains(b.geom, r.pip)
  WHERE a.id IS NOT NULL OR b.id IS NOT NULL;
dr_jts
  • 5,038
  • 18
  • 15
  • This is a great answer. Thanks. Couple of follow-up questions. 1. I'm only interested in having single polygons in the result. Not lines or points, and multi-polygons should be split apart into single polygon rows. I guess I can post-process to achieve this using st_dump on the multi-polygons? Also I can perhaps removing anything with st_area = 0 ? Does that sound sensible? – TheRealJimShady Jun 11 '21 at 20:34
  • 1
    The output of polygonization is individual polygons. So there will be no lines or MultiPolygons in the output. And there should be no zero-area polygons, either. – dr_jts Jun 11 '21 at 20:39
  • Brilliant, thanks again. In terms of performance, do you think that there would be benefit to materialising some of the CTE tables, and adding an index to them? Then using them as input to the next stage. – TheRealJimShady Jun 11 '21 at 20:43
  • 1
    Since all the operations are table-wide, I don't think materializing would help. Even the final ST_Contains query uses the indexes on the input tables, but is a full scan of the resultants. – dr_jts Jun 11 '21 at 21:37
  • Do you think fine tuning any of the PostgreSQL parameters might help with the st_union part of the query @dr_jts ? I'm working on quite a large dataset, and it's pretty slow. I've read increasing work_mem setting might help. – TheRealJimShady Jun 14 '21 at 11:25
  • 1
    No, increasing work_mem has no effect on the performance or memory usage of spatial operations like ST_Union. They are performed by the GEOS subsytem, which allocates memory outside the pool managed by Postgres. – dr_jts Jun 14 '21 at 16:01
  • Got it, thank you. Anything I can do that you can think of? st_union been running for almost 3 hours now. Was wondering about trying to do the noding of the lines in groups or something and then union them after. Like a cascading union or something. – TheRealJimShady Jun 14 '21 at 16:38
  • 1
    What is the size of the input?! – dr_jts Jun 14 '21 at 17:04
  • 1
    Partioning probably won't help, unfortunately, since in the end you need to union the entire dataset of lines anyway. That is a downside to this approach over your intersection-based approach above (which although slower is at least incremental) – dr_jts Jun 14 '21 at 17:06
  • 1
    We're thinking about how to do the linework node/polygonize in an incremental way using window functions - but so far that is just thinkware. – dr_jts Jun 14 '21 at 17:07
  • "since in the end you need to union the entire dataset of lines anyway" -- that was what I figured unfortunately. – TheRealJimShady Jun 14 '21 at 17:07
  • I'm running this on six administrative boundary levels of Austria. 4-digit postcode, municipality, district, state, cresta high, cresta low. The intersection approach is SO problematic with self-intersections, topology errors etc it's a nightmare. This approach is simpler, cleaner, and has none of those issues. It's just the performance .... – TheRealJimShady Jun 14 '21 at 17:09
  • Wondered whether this might help a little: http://blog.cleverelephant.ca/2018/09/postgis-external-storage.html – TheRealJimShady Jun 14 '21 at 17:09
  • Also if st_memunion or st_unaryunion might be better in this case than normal st_union . – TheRealJimShady Jun 14 '21 at 17:11
  • 1
    I don't think either of those will help. In fact for linework ST_Union just uses the ST_MemUnion algorithm, since there's no benefit to cascading. – dr_jts Jun 14 '21 at 17:23
  • 1
    Can you partition the data by the largest admin units? – dr_jts Jun 14 '21 at 17:23
  • Yes I could do that. Not sure how, but I'll get googling! – TheRealJimShady Jun 14 '21 at 17:25