
We have many datasets, each containing about 150'000 polygons. We would like to dissolve each dataset down to about 2500 polygons, based on a given attribute.

We have done this successfully with the ArcGIS Dissolve tool, but we would like to move to an open-source solution. The problem is that every open-source solution we have tried so far is extremely slow. ArcGIS takes about one to two minutes to dissolve one dataset. With the other methods, it takes at least an hour, at which point I stop the process, since my guess is that it has crashed due to memory issues.

These are the methods we have tried so far:

  1. QGIS Dissolve tool (NOT the SAGA or GRASS tools). I use QGIS 2.18; would QGIS 3 make a big difference?
  2. GeoPandas/Shapely.
  3. ogr2ogr (as described here http://darrencope.com/2015/01/23/dissolve-shapefiles-using-ogr/)
  4. We would gladly try other tools such as GRASS, but we require this to be able to run on a machine for which we do not have admin rights. This means no installations or sudo commands.

Are there other tools or methods we can try? Are there any "tricks" worth trying? For example, would dissolving in tiles first and then dissolving the entire shapefile be more efficient?
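To make the tiling idea concrete, here is a minimal sketch of a two-stage dissolve: first union the features that share both an attribute value and a tile, then union those partial results per attribute value. The names `dissolve_in_tiles`, `tile_of` and `cell_union` are hypothetical, and the "geometries" are toy sets of raster cells so that the example runs without any GIS library. With real polygons, `union` could be something like `shapely.ops.unary_union`. Whether this is actually faster than a single pass depends on the union implementation and the data; the sketch only shows the structure.

```python
from collections import defaultdict


def dissolve_in_tiles(features, tile_of, union):
    """Two-stage dissolve (illustrative sketch, not a benchmarked method).

    features: iterable of (attribute_value, geometry) pairs
    tile_of:  function geometry -> hashable tile id (hypothetical helper)
    union:    function list_of_geometries -> single geometry
    """
    # Stage 1: bucket features by attribute value AND tile.
    buckets = defaultdict(list)
    for value, geom in features:
        buckets[(value, tile_of(geom))].append(geom)

    # Union each bucket, collecting the partial results per attribute value.
    partials = defaultdict(list)
    for (value, _tile), geoms in buckets.items():
        partials[value].append(union(geoms))

    # Stage 2: union the (far fewer) per-tile partial results.
    return {value: union(parts) for value, parts in partials.items()}


# Toy stand-ins: a "geometry" is a frozenset of (col, row) raster cells.
def cell_union(geoms):
    return frozenset().union(*geoms)


def tile_of(geom, size=2):
    # Assign a geometry to the tile containing its smallest cell.
    col, row = min(geom)
    return (col // size, row // size)


features = [
    ("forest", frozenset({(0, 0)})),
    ("forest", frozenset({(1, 0)})),
    ("forest", frozenset({(2, 0), (3, 0)})),
    ("water", frozenset({(0, 1)})),
]
result = dissolve_in_tiles(features, tile_of, cell_union)
```

Here the two small "forest" features in the first tile are merged in stage 1, and the result is merged with the "forest" feature from the neighbouring tile in stage 2.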

Chouroud
  • That is also my experience: dissolve is much faster in ArcGIS. You could try PostGIS, which is also very fast with a clever query. See the answer to "Only union/dissolve intersecting features to save time?" – BERA Nov 12 '18 at 21:06
  • I wonder if this workaround of rasterizing-polygonizing would be usable for your needs https://gis.stackexchange.com/questions/222976/cleaning-a-large-shapefile-in-order-to-dissolve-features/223008#223008. – user30184 Nov 12 '18 at 21:13
  • @user30184 good catch, but the shapefiles I have already came from a raster in the first place, so I don't see any benefits of repeating the cycle. – Chouroud Nov 12 '18 at 21:15
  • @BERA PostGIS may be a solution. I will admit we were really hoping to find a Shapely solution, since the rest of the backend is all done using GeoPandas. From your experience, are there any tricks that may speed up the GeoPandas dissolve? – Chouroud Nov 12 '18 at 21:20
  • I would suggest trying PostGIS as well: http://blog.cleverelephant.ca/2009/01/must-faster-unions-in-postgis-14.html. Perhaps QGIS and GDAL do not implement the fast route even though they do use GEOS. – user30184 Nov 12 '18 at 21:30
  • Try a newer QGIS too; there are always improvements to the speed of these routines – ndawson Nov 12 '18 at 23:32
  • I think you should focus your question on the software that you are most likely to use. If you want to ask about another software just ask it as a separate question. As per the [Tour] there should only be one question asked per question. – PolyGeo Nov 13 '18 at 01:27
  • @PolyGeo This question https://gis.stackexchange.com/questions/222976/cleaning-a-large-shapefile-in-order-to-dissolve-features/223008#223008 is quite similar to mine, and also asked for other software and methods to use. I really think asking about a specific dataset and a specific operation is really quite narrow already. The fact that I am open to hearing about more than one software method should not be held against me. – Chouroud Nov 13 '18 at 17:22
  • It's similar but not as broad. That one does not tag four large pieces of software and it was easy to turn one of its two questions into a statement. My issues with your question are the multiple questions at the end and that you have not focused it on the software that you are most likely to use (if you can get past where you are stuck with it). – PolyGeo Nov 13 '18 at 19:12
  • I second what @ndawson recommended: use the latest QGIS. I have version 3.4 and am impressed. The 64-bit version can use more memory. Also, create a spatial index under Properties > Source before running Dissolve. – klewis Nov 13 '18 at 23:41
  • @klewis You're using the native QGIS dissolve? Or one of the others (SAGA, GRASS)? – Chouroud Nov 14 '18 at 11:54
  • @Chouroud, I use the native Dissolve, testing with Shapefile and GeoPackage data. – klewis Nov 14 '18 at 14:01
  • Better late than never... Because I had the same problem processing larger geo files, I wrote a Python package that speeds this up, including for dissolve: geofileops – Pieter Jan 12 '24 at 08:28
