
I am very new to PostGIS. I read that it should work faster than QGIS, so I thought I'd give it a go. I want to completely dissolve a shapefile of 1.2m polygons in the same way as QGIS's built-in Dissolve tool does.

This is my current code, which is very basic (filetodissolve is the table):

SELECT ST_Union(geom)
FROM filetodissolve f;

I've been running this for 1h 30m now and it is showing no sign of finishing. Is there any way to speed this up?
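Before letting a full 1.2m-row union run for hours, one way to get a feel for the cost is the subset test suggested in the comments. The sketch below assumes the table is public.filetodissolve with the gid column mentioned in the comments; the 10000-row subset size is an arbitrary choice. EXPLAIN ANALYZE executes each query and reports its run time, so the two variants can be compared on the same rows.

```sql
-- Session-only setting; a bare number is read as kilobytes, so 50000 is roughly 50 MB.
-- Whether extra work_mem helps an ST_Union aggregate here is untested.
SET work_mem = 50000;

-- Time a plain ST_Union over a 10000-row subset (subset size is arbitrary)
EXPLAIN ANALYZE
SELECT ST_Union(geom)
FROM (SELECT geom FROM public.filetodissolve ORDER BY gid LIMIT 10000) AS subset;

-- Same subset with the ST_Buffer(ST_Collect(...), 0) variant, for comparison
EXPLAIN ANALYZE
SELECT ST_Buffer(ST_Collect(geom), 0)
FROM (SELECT geom FROM public.filetodissolve ORDER BY gid LIMIT 10000) AS subset;
```

Union cost does not necessarily grow linearly with the number of rows, so the subset timings are only a rough indication of how the full table will behave.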

Melanie Baker
  • @BERA What code do I need to change in that answer? Just where it says 'table'? – Melanie Baker Feb 02 '23 at 14:44
  • Yes, to the name of your table, including the schema, for example public.yourtablename – BERA Feb 02 '23 at 14:45
  • If you do not need an extremely accurate result, try this trick: https://gis.stackexchange.com/questions/222976/cleaning-large-shapefile-using-v-clean-in-order-to-dissolve-features. – user30184 Feb 02 '23 at 14:48
  • "PostGIS is faster than QGIS" is fake news, or at least sufficiently stark and unnuanced to not address reality. – Vince Feb 02 '23 at 14:54
  • @Vince Just looking for a quicker method/a method that actually works. Happy to hear other options. – Melanie Baker Feb 02 '23 at 14:56
  • @BERA will the code make much difference if most of my polygons overlap? – Melanie Baker Feb 02 '23 at 15:31
  • do you have an index? – Ian Turton Feb 02 '23 at 15:54
  • @IanTurton I have columns gid, fid and geom. – Melanie Baker Feb 02 '23 at 15:55
  • add a spatial index – Ian Turton Feb 02 '23 at 15:55
  • I found a suggestion to use ST_Buffer(ST_Collect(wkb_geometry), 0) in an old comment. That might also be worth trying. Using SET work_mem=50000; to give the query more memory was also suggested. If you test with 10000 or 100000 features, you will get preliminary results faster. – user30184 Feb 02 '23 at 15:55
  • @IanTurton how do I add a spatial index? – Melanie Baker Feb 02 '23 at 15:56
  • Search for 'add postgis gist index' – Kasper Feb 02 '23 at 16:21
  • Before adding a spatial index, it might be good to check whether it is missing: https://gis.stackexchange.com/questions/241599/finding-postgis-tables-that-are-missing-indexes. – user30184 Feb 02 '23 at 16:37
  • You haven't said anything about the complexity of the polygons to be unioned, nor about their connectivity. If processing everything in one go is too much, you have to break it down into smaller batches. This could be via clusters, as shown by Bera, or by using a grid and computing each quadrant separately. The more complex the polygons, the smaller the quadrant should be. Once done, do it again using the previously unioned polygons and a bigger quadrant. In any case, it is very important to work on nearby geometries. – JGH Feb 03 '23 at 14:31
  • What version of PostGIS are you using? More recent versions of PostGIS/GEOS might provide faster unioning, thanks to some improvements in the implementation. – dr_jts Feb 03 '23 at 16:39
  • Is there any way you can share the data? I'm curious to see what it looks like, and experiment with the union. – dr_jts Feb 03 '23 at 16:43
  • @dr_jts I don't think I can share the data due to its owner's data agreements, unfortunately, but thank you! – Melanie Baker Feb 06 '23 at 08:14
  • @dr_jts I've only just downloaded PostGIS; it's 3.3.2 – Melanie Baker Feb 06 '23 at 08:17
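For the index questions in the comments above, a possible sketch: list the indexes that already exist (the idea behind the linked "missing indexes" question), create a GIST index on geom only if it is absent, and report the PostGIS/GEOS versions dr_jts asks about. The table name public.filetodissolve and the index name filetodissolve_geom_idx are assumptions for illustration.

```sql
-- List the indexes that already exist on the table
SELECT indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'public'
  AND tablename = 'filetodissolve';

-- Create a spatial (GIST) index on geom. IF NOT EXISTS only checks the index
-- name, so skip this if the query above already shows a GIST index on geom.
CREATE INDEX IF NOT EXISTS filetodissolve_geom_idx
    ON public.filetodissolve USING GIST (geom);

-- Refresh planner statistics after building the index
ANALYZE public.filetodissolve;

-- Report the PostGIS, GEOS and related library versions
SELECT postgis_full_version();
```

A plain ST_Union over every row has no spatial filter for the index to serve, so the index is likely to matter more for queries that select or relate geometries spatially than for the original one-line union.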

1 Answer


This is basically the same answer as to this question. It uses ST_ClusterDBSCAN to assign each cluster of intersecting/adjacent polygons an id, then unions the polygons by that id:

create index table123_index on test.table123 using GIST(geom); --Make sure you have a spatial index

create table test.table123_dissolved as
with clusters as (
    select st_clusterdbscan(geom, 0, 2) over () cluster_id, geom
    from test.table123)

select st_union(geom) geom
from clusters
where cluster_id is not null --Where there are adjacent polygons that have been assigned a cluster id
group by cluster_id

union

select geom
from clusters
where cluster_id is null; --Polygons that are separate from all others get no cluster id

alter table test.table123_dissolved add column id serial;
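To see what the clustering step does, here is a minimal sketch on three hypothetical unit squares built with ST_MakeEnvelope (not from the original data). With eps = 0 and minpoints = 2, squares that touch should share a cluster id, while an isolated square should get NULL and so ends up in the second half of the UNION above.

```sql
-- Three hypothetical unit squares: 1 and 2 share an edge, 3 is isolated
WITH toy(id, geom) AS (
    VALUES
        (1, ST_MakeEnvelope(0, 0, 1, 1)),
        (2, ST_MakeEnvelope(1, 0, 2, 1)),
        (3, ST_MakeEnvelope(5, 5, 6, 6))
)
-- eps = 0: only geometries at distance 0 (touching or overlapping) count as neighbours;
-- minpoints = 2: a geometry with no neighbour is noise and gets a NULL cluster id
SELECT id,
       ST_ClusterDBSCAN(geom, 0, 2) OVER () AS cluster_id
FROM toy
ORDER BY id;
```

Squares 1 and 2 should come back with the same cluster_id and square 3 with NULL. If most polygons in a dataset touch one another, most rows will land in a single cluster, so that cluster's union will still be large.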

With my test data, it finishes in 120 s for 3 million features.

(I canceled Dissolve in QGIS after 20 min/35 % finished.)

BERA
  • Thank you. I have been running the code for 18 hours now. I have almost 1.3m polygons. The polygon layer is 1.2 GB and covers the extent of the UK. Will any of this affect the speed of the union? – Melanie Baker Feb 03 '23 at 08:52
  • Do you have large, complex multipolygons? Try converting them to single parts using ST_Dump. But first try my answer on a subset of your 1.3 million polygons to make sure it is working – BERA Feb 03 '23 at 09:05
  • No, I did a multipart-to-single-part conversion in QGIS before I loaded the file into PostGIS/pgAdmin4 – Melanie Baker Feb 03 '23 at 09:11
  • I don't know if it makes any difference, but you could try subdividing the geometries – BERA Feb 03 '23 at 09:14
  • This is a good approach if the data contains disjoint clumps of polygons. Probably won't help if all/many of the polygons are touching, though. – dr_jts Feb 03 '23 at 16:42
  • In addition to the previous comments, clustering the spatial index could also be tested - postgis.net/workshops/postgis-intro/clusterindex.html. – Brent Edwards Jun 09 '23 at 16:38
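dr_jts notes above that clustering probably won't help when most polygons touch. For that case, JGH's comment on the question suggests batching by a grid and then re-unioning with bigger quadrants. A rough sketch of that idea follows; the table names dissolve_stage1, dissolve_stage2 and dissolved are hypothetical, and the 10000 cell size assumes a projected CRS in metres (for example British National Grid for UK data). Each polygon is assigned to exactly one cell via ST_PointOnSurface, so polygons crossing cell edges are not duplicated; they are merged with their neighbours in a later stage. PostGIS's ST_Union aggregate already performs a spatially aware union internally, so whether this beats a single ST_Union depends on the data.

```sql
-- Stage 1: union polygons cell by cell (10 km cells, assuming metre units)
CREATE TABLE public.dissolve_stage1 AS
SELECT floor(ST_X(pt) / 10000) AS cx,
       floor(ST_Y(pt) / 10000) AS cy,
       ST_Union(geom) AS geom
FROM (SELECT geom, ST_PointOnSurface(geom) AS pt
      FROM public.filetodissolve) AS s
GROUP BY 1, 2;

-- Stage 2: merge the stage-1 results in coarser blocks of 4 x 4 cells
CREATE TABLE public.dissolve_stage2 AS
SELECT floor(cx / 4) AS gx,
       floor(cy / 4) AS gy,
       ST_Union(geom) AS geom
FROM public.dissolve_stage1
GROUP BY 1, 2;

-- Final stage: union the remaining pieces and split the result into single polygons
CREATE TABLE public.dissolved AS
SELECT (ST_Dump(ST_Union(geom))).geom AS geom
FROM public.dissolve_stage2;
```

Following JGH's advice, the more complex the polygons, the smaller the first-stage cell size should be; more intermediate stages can be added by repeating stage 2 with larger blocks.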