Getting the 'composite' of two polygon layers

Question

I have two table of polygons, and I want to 'flatten' them into one table, and have the result tell me which areas are in one table of polygons, which in the other table of polygons, and which are in both.

I have had a go at doing this in PostGIS, and below present a self-contained example, but is there a better way?

DROP TABLE IF EXISTS a_table;
CREATE TEMP TABLE a_table AS (
    SELECT  'a1' as id, st_geomfromtext('POLYGON((50 50, 50 150, 150 150, 150 50, 50 50))') geom
    UNION ALL
    SELECT  'a2' as id, st_geomfromtext('POLYGON((150 150, 150 200, 200 200, 200 150, 150 150))') geom
    UNION ALL
    SELECT  'a3' as id, st_geomfromtext('POLYGON((10 10, 10 20, 20 20, 20 10, 10 10))') geom
    );
DROP TABLE IF EXISTS b_table;
CREATE TEMP TABLE b_table AS (
    SELECT  'b1' as id, st_geomfromtext('POLYGON((40 40, 40 140, 140 140, 140 40, 40 40))') geom
    UNION ALL
    SELECT  'b2' as id, st_geomfromtext('POLYGON((30 30, 30 40, 40 40, 40 30, 30 30))') geom
    UNION ALL
    SELECT  'b3' as id, st_geomfromtext('POLYGON((155 155, 155 175, 175 175, 175 155, 155 155))') geom
    );
DROP TABLE IF EXISTS intersects;
CREATE TEMP TABLE intersects AS (
    SELECT      a_table.id aid,
                b_table.id bid,
                CASE WHEN   st_within(a_table.geom, b_table.geom) THEN a_table.geom
                ELSE        st_intersection(a_table.geom, b_table.geom)
                END AS      geom
    FROM        a_table
    INNER JOIN  b_table
    ON          (st_intersects(a_table.geom, b_table.geom)
    AND NOT     st_touches(a_table.geom, b_table.geom))
    );
DROP TABLE IF EXISTS non_intersect_diff;
CREATE TEMP TABLE non_intersect_diff AS (
    SELECT      a_table.id aid,
                '' as bid,
                a_table.geom
    FROM        a_table
    WHERE       a_table.id NOT IN (SELECT aid FROM intersects GROUP BY aid)
    );
DROP TABLE IF EXISTS intersect_diff;
CREATE TEMP TABLE intersect_diff AS (
    WITH collect AS (
        SELECT      a_table.id aid,
                    st_union(b_table.geom) geom
        FROM        a_table
        INNER JOIN  b_table
        ON          (st_intersects(a_table.geom, b_table.geom)
        AND NOT     st_touches(a_table.geom, b_table.geom))
        GROUP BY    a_table.id
        )
        SELECT      a_table.id as aid,
                    '' as bid,
                    st_difference(a_table.geom, collect.geom) geom
        FROM        a_table
        INNER JOIN  collect
        ON          a_table.id = collect.aid);
DROP TABLE IF EXISTS non_intersect_diff_reverse;
CREATE TEMP TABLE non_intersect_diff_reverse AS (
    SELECT      '' as aid,
                b_table.id as bid,
                b_table.geom
    FROM        b_table
    WHERE       b_table.id NOT IN (SELECT bid FROM intersects GROUP BY bid)
);
DROP TABLE IF EXISTS intersect_diff_reverse;
CREATE TEMP TABLE intersect_diff_reverse AS (
    WITH collect AS (
        SELECT      b_table.id bid,
                    st_union(a_table.geom) geom
        FROM        b_table
        INNER JOIN  a_table
        ON          (st_intersects(b_table.geom, a_table.geom)
        AND NOT     st_touches(b_table.geom, a_table.geom))
        GROUP BY    b_table.id
        )
        SELECT      '' as aid,
                    b_table.id bid,

                    st_difference(b_table.geom, collect.geom) geom
        FROM        b_table
        INNER JOIN  collect
        ON          b_table.id = collect.bid
);
CREATE TEMP TABLE result AS (
    SELECT * FROM intersects
    UNION ALL
    SELECT * FROM non_intersect_diff
    UNION ALL
    SELECT * FROM intersect_diff
    UNION ALL
    SELECT * FROM non_intersect_diff_reverse
    UNION ALL
    SELECT * FROM intersect_diff_reverse
);
SELECT * FROM result

See also this solution in the PostGIS wiki: https://trac.osgeo.org/postgis/wiki/UsersWikiExamplesOverlayTables — dr_jts, Jun 11 '21 at 17:23
See also this answer to a similar question: https://gis.stackexchange.com/a/31562/14766 — dr_jts, Jun 11 '21 at 17:29
Here is a full example of the Node-Polygonize approach: https://trac.osgeo.org/postgis/wiki/UsersWikiExamplesOverlayTables#WorkedExample — dr_jts, Jun 11 '21 at 19:34
Wow @dr_jts . Thanks so much. This is a game-changer for the work I'm doing. I'm going to give it a go on some real-world data now. — TheRealJimShady, Jun 11 '21 at 20:24
I'm actually going to need to 'composite' 6 polygon tables. I can use the my_overlay function that Chris Hodgson defines here ( https://trac.osgeo.org/postgis/wiki/UsersWikiExamplesOverlayTables#WorkedExample ) in a loop, using the result of the first as input to the next and so on. But again .... maybe there's a better way? — TheRealJimShady, Jun 11 '21 at 20:25
The advantage of the node-polgonize approach is that it scales easily to N tables. Just extract the linework from all the tables, and then join the resultants back to each parent table. — dr_jts, Jun 11 '21 at 20:38

dr_jts · Accepted Answer · 2021-06-11T20:08:50.323

8

One way to do this is to create a polygonal coverage from the linework of the inputs, and then spatially join this back to recover the parent attributes. The advantage of using this technique is that it should be more robust in handling polygons which have very small overlaps or coincident linework. It also avoids the need to drop polygon intersection which result in lines.

Extract linework using ST_Boundary
Node/dissolve lines using ST_Union
Polygonize resultants using ST_Polygonize
Generate an interior point for each resultant using ST_PointOnSurface
Attach parent attribution by joining on interior points using ST_Contains

WITH poly_a(id, geom) AS (VALUES
    ( 'a1', 'POLYGON ((10 40, 30 40, 30 10, 10 10, 10 40))'::geometry ),
    ( 'a2', 'POLYGON ((70 10, 30 10, 30 90, 70 90, 70 10), (40 40, 60 40, 60 20, 40 20, 40 40), (40 80, 60 80, 60 60, 40 60, 40 80))'::geometry ),
    ( 'a3', 'POLYGON ((40 40, 60 40, 60 20, 40 20, 40 40))'::geometry )
)
,poly_b(id, geom) AS (VALUES
    ( 'b1', 'POLYGON ((90 70, 90 50, 50 50, 50 70, 90 70))'::geometry ),
    ( 'b2', 'POLYGON ((90 30, 50 30, 50 50, 90 50, 90 30))'::geometry ),
    ( 'b2', 'POLYGON ((90 10, 70 10, 70 30, 90 30, 90 10))'::geometry )
)
,lines AS ( 
  SELECT ST_Boundary(geom) AS geom FROM poly_a
  UNION ALL
  SELECT ST_Boundary(geom) AS geom FROM poly_b
)
,noded_lines AS ( SELECT ST_Union(geom) AS geom FROM lines ) 
,resultants AS (  
  SELECT geom, ST_PointOnSurface(geom) AS pip 
    FROM St_Dump(
           ( SELECT ST_Polygonize(geom) AS geom FROM noded_lines ))   
)
SELECT a.id AS ida, b.id AS idb, r.geom
  FROM resultants r
  LEFT JOIN poly_a a ON ST_Contains(a.geom, r.pip) 
  LEFT JOIN poly_b b ON ST_Contains(b.geom, r.pip)
  WHERE a.id IS NOT NULL OR b.id IS NOT NULL;

edited Jun 11 '21 at 20:08

answered Jun 11 '21 at 19:48

dr_jts

5,038
18
15

This is a great answer. Thanks. Couple of follow-up questions. 1. I'm only interested in having single polygons in the result. Not lines or points, and multi-polygons should be split apart into single polygon rows. I guess I can post-process to achieve this using st_dump on the multi-polygons? Also I can perhaps removing anything with st_area = 0 ? Does that sound sensible? – TheRealJimShady Jun 11 '21 at 20:34
1

The output of polygonization is individual polygons. So there will be no lines or MultiPolygons in the output. And there should be no zero-area polygons, either. – dr_jts Jun 11 '21 at 20:39
Brilliant, thanks again. In terms of performance, do you think that there would be benefit to materialising some of the CTE tables, and adding an index to them? Then using them as input to the next stage. – TheRealJimShady Jun 11 '21 at 20:43
1

Since all the operations are table-wide, I don't think materializing would help. Even the final ST_Contains query uses the indexes on the input tables, but is a full scan of the resultants. – dr_jts Jun 11 '21 at 21:37
Do you think fine tuning any of the PostgreSQL parameters might help with the st_union part of the query @dr_jts ? I'm working on quite a large dataset, and it's pretty slow. I've read increasing work_mem setting might help. – TheRealJimShady Jun 14 '21 at 11:25
1

No, increasing work_mem has no effect on the performance or memory usage of spatial operations like ST_Union. They are performed by the GEOS subsytem, which allocates memory outside the pool managed by Postgres. – dr_jts Jun 14 '21 at 16:01
Got it, thank you. Anything I can do that you can think of? st_union been running for almost 3 hours now. Was wondering about trying to do the noding of the lines in groups or something and then union them after. Like a cascading union or something. – TheRealJimShady Jun 14 '21 at 16:38
1

What is the size of the input?! – dr_jts Jun 14 '21 at 17:04
1

Partioning probably won't help, unfortunately, since in the end you need to union the entire dataset of lines anyway. That is a downside to this approach over your intersection-based approach above (which although slower is at least incremental) – dr_jts Jun 14 '21 at 17:06
1

We're thinking about how to do the linework node/polygonize in an incremental way using window functions - but so far that is just thinkware. – dr_jts Jun 14 '21 at 17:07
"since in the end you need to union the entire dataset of lines anyway" -- that was what I figured unfortunately. – TheRealJimShady Jun 14 '21 at 17:07
I'm running this on six administrative boundary levels of Austria. 4-digit postcode, municipality, district, state, cresta high, cresta low. The intersection approach is SO problematic with self-intersections, topology errors etc it's a nightmare. This approach is simpler, cleaner, and has none of those issues. It's just the performance .... – TheRealJimShady Jun 14 '21 at 17:09
Wondered whether this might help a little: http://blog.cleverelephant.ca/2018/09/postgis-external-storage.html – TheRealJimShady Jun 14 '21 at 17:09
Also if st_memunion or st_unaryunion might be better in this case than normal st_union . – TheRealJimShady Jun 14 '21 at 17:11
1

I don't think either of those will help. In fact for linework ST_Union just uses the ST_MemUnion algorithm, since there's no benefit to cascading. – dr_jts Jun 14 '21 at 17:23
1

Can you partition the data by the largest admin units? – dr_jts Jun 14 '21 at 17:23
Yes I could do that. Not sure how, but I'll get googling! – TheRealJimShady Jun 14 '21 at 17:25
Let us continue this discussion in chat. – dr_jts Jun 15 '21 at 18:07

Getting the 'composite' of two polygon layers

1 Answers1

Linked