I'm trying to create a seamless vector dataset from an integer raster. "Seamless" as in: no gaps between features and no overlapping features.
The tools gdal_polygonize, pkpolygonize and terra::as.polygons (in R) all do pretty much exactly that, but they have a major flaw: they trace the raster cell edges exactly, resulting in overly complex features as soon as the boundaries include diagonals, curves, etc.
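To put a number on "overly complex": an outline that follows the cell edges exactly gets two corners for every step of a diagonal staircase. The NumPy-only sketch below counts those corners by inspecting the 2x2 block of cells around every grid point (1 or 3 cells of the class means a corner; a diagonal-only touch counts twice). The function name `outline_corner_count` and the test rasters are made up for illustration, and the sketch does not call any of the tools named above.

```python
import numpy as np

def outline_corner_count(mask: np.ndarray) -> int:
    """Count corners of the outline that follows the cell edges of `mask` exactly."""
    # Pad with background so cells on the raster border also get a full 2x2 window.
    m = np.pad(mask.astype(np.int8), 1)
    # The four cells around every grid point.
    tl, tr = m[:-1, :-1], m[:-1, 1:]
    bl, br = m[1:, :-1], m[1:, 1:]
    n = tl + tr + bl + br
    # Convex (1 cell) or concave (3 cells) corner.
    corners = np.count_nonzero((n == 1) | (n == 3))
    # Two cells touching only diagonally: the point is a corner of two ring passes.
    diagonal = (tl == br) & (tr == bl) & (tl != tr)
    return int(corners + 2 * np.count_nonzero(diagonal))

# A 100 x 100 triangle needs 3 vertices conceptually, but the cell-exact outline is a staircase.
triangle = np.tril(np.ones((100, 100), dtype=bool))
print(outline_corner_count(triangle))  # 202
square = np.ones((100, 100), dtype=bool)
print(outline_corner_count(square))    # 4
```

Axis-aligned edges cost nothing, while the diagonal costs two vertices per cell, which is why diagonals and curves blow up the vertex count.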
Cleaning up these features in a post-processing step (e.g. ogr2ogr -simplify) leads to topological errors (overlaps and gaps).
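Why per-feature simplification breaks the tiling can be shown on a toy partition. The pure-Python sketch below is hypothetical throughout: hand-written rings, a textbook Douglas-Peucker, and an assumed ring-splitting rule (this is not the code behind ogr2ogr). It compares (a) simplifying each polygon ring on its own with (b) simplifying every shared boundary arc once and rebuilding both polygons from the same simplified arcs. In (a), polygon A drops the junction vertex (1, 0) because it does not know that its neighbour depends on it. In (b), junctions are arc endpoints, which Douglas-Peucker always keeps.

```python
import math

def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0:  # degenerate segment: fall back to point distance
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / length

def douglas_peucker(line, tol):
    """Textbook Douglas-Peucker on an open polyline; both endpoints are always kept."""
    if len(line) < 3:
        return list(line)
    dists = [point_line_distance(p, line[0], line[-1]) for p in line[1:-1]]
    i = max(range(len(dists)), key=dists.__getitem__)
    if dists[i] <= tol:
        return [line[0], line[-1]]
    split = i + 1  # index of the farthest vertex in `line`
    return douglas_peucker(line[: split + 1], tol)[:-1] + douglas_peucker(line[split:], tol)

def ring_area(ring):
    """Shoelace area of a ring given without a repeated closing vertex."""
    pairs = zip(ring, ring[1:] + ring[:1])
    return abs(sum(x0 * y1 - x1 * y0 for (x0, y0), (x1, y1) in pairs)) / 2

def simplify_ring_independently(ring, tol):
    """(a) Per feature (assumed rule): split the ring at its first vertex and the
    vertex farthest from it, simplify both halves, ignore all neighbours."""
    x0, y0 = ring[0]
    far = max(range(1, len(ring)), key=lambda k: (ring[k][0] - x0) ** 2 + (ring[k][1] - y0) ** 2)
    first = douglas_peucker(ring[: far + 1], tol)
    second = douglas_peucker(ring[far:] + ring[:1], tol)
    return first + second[1:-1]

def build_ring(arcs, refs):
    """(b) Rebuild a ring from (arc name, direction) references."""
    ring = []
    for name, direction in refs:
        pts = arcs[name] if direction > 0 else arcs[name][::-1]
        ring.extend(pts[:-1])  # the last point is the next arc's first point
    return ring

# Hypothetical partition of the rectangle [0, 4] x [0, 2] into classes A and B,
# which share the staircase arc "S" from (1, 0) to (2, 2).
rings = {  # as a polygonizer might emit them, each ring starting at its own vertex
    "A": [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2), (0, 2)],
    "B": [(1, 0), (4, 0), (4, 2), (2, 2), (2, 1), (1, 1)],
}
arcs = {
    "S": [(1, 0), (1, 1), (2, 1), (2, 2)],
    "A_outer": [(2, 2), (0, 2), (0, 0), (1, 0)],
    "B_outer": [(1, 0), (4, 0), (4, 2), (2, 2)],
}
polygons = {"A": [("S", 1), ("A_outer", 1)], "B": [("B_outer", 1), ("S", -1)]}
TOL = 0.75

naive = {name: simplify_ring_independently(r, TOL) for name, r in rings.items()}
simple_arcs = {name: douglas_peucker(a, TOL) for name, a in arcs.items()}
shared_arc = {name: build_ring(simple_arcs, refs) for name, refs in polygons.items()}

print(sum(ring_area(r) for r in naive.values()))       # 7.0: a gap of area 1 opened up
print(sum(ring_area(r) for r in shared_arc.values()))  # 8.0: still tiles the 4 x 2 rectangle
```

Equal total area is only a quick check, not a proof that nothing overlaps. The arc-based variant also assumes the simplified arcs do not cross each other, and it presupposes that the shared arcs have already been extracted from the polygonized output, which is the hard part in practice.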
I'm very surprised that this turns out to be so hard to solve. I imagine this is a common problem in GIS workflows: e.g. the CORINE Land Cover dataset is based on satellite imagery, but they also produce a vector data product with exactly the properties I want: no gaps, no overlaps, and simplified geometries. How do they generate this vector data? Unfortunately, I could not find any information on this.
This answer to "Extract polygons from an image" talks about using scikit-image for this. Would this be the way to go?
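For reference, a minimal scikit-image sketch under stated assumptions: a made-up two-class raster (`classes`), `skimage.measure.find_contours` at level 0.5 on a padded mask of each class, then `skimage.measure.approximate_polygon` with a 1-pixel tolerance. Each class is outlined and simplified separately here, so nothing in this snippet forces neighbouring classes to end up with identical shared boundaries; that guarantee would have to come from an extra step.

```python
import numpy as np
from skimage import measure

# Hypothetical class raster: class 1 below/on the diagonal, class 2 above it.
classes = np.where(np.tril(np.ones((50, 50), dtype=bool)), 1, 2)

outlines, raw_lengths = {}, {}
for value in (1, 2):
    # Pad with background so every contour of the class is closed.
    mask = np.pad(classes == value, 1).astype(float)
    # Marching squares at level 0.5 runs half-way between cell centres.
    contour = max(measure.find_contours(mask, 0.5), key=len)
    raw_lengths[value] = len(contour)
    # Drop vertices closer than 1 pixel to the simplified line.
    outlines[value] = measure.approximate_polygon(contour, tolerance=1.0)
    print(value, raw_lengths[value], len(outlines[value]))
```

The contour coordinates are (row, col) in pixel space, shifted by the 1-cell padding; they would still need the raster's geotransform applied before being written to .gpkg or .shp.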
I'm looking for ways to solve this using Python, R or the command line. The output should be GDAL-compatible, preferably .gpkg or .shp.
The data I'm using can be downloaded here (4 kB):





gdal_contour, but this approach is for continuous variables and cannot handle skipping classes. My data is huge and I really don't want to create topological errors in the first place. – Ratnanil Feb 17 '22 at 08:38