11

Shapefiles can mix simple POLYGONs and MULTIPOLYGONs in the same data source. Spatial databases like PostGIS and SpatiaLite are strict, and will not put a POLYGON in a MULTIPOLYGON geometry column.

I've gotten used to using ST_Multi to fix this issue. But now I am trying to use GeoPandas to do some file processing, including converting from shapefile to GeoPackage (with a bunch of stuff in the middle), and I am running into this error:

gdf.to_file("garbage.gpkg", "GPKG")
ValueError: Record's geometry type does not match collection schema's 
   geometry type: 'MultiPolygon' != 'Polygon'

Is there a GeoPandas equivalent to ST_Multi that I can use to fix the geometry before saving to the GeoPackage or SpatiaLite format?

BERA
Lee Hachadoorian

2 Answers

21

TL;DR

Given a geopandas data frame gdf with mixed Polygons and MultiPolygons, the geometry column can be converted to Multi with the following:

from shapely.geometry import MultiPolygon, Polygon

gdf["geometry"] = [
    MultiPolygon([feature]) if isinstance(feature, Polygon) else feature
    for feature in gdf["geometry"]
]

More info:

There doesn't appear to be an equivalent of PostGIS's ST_Multi that can accept a Polygon or a MultiPolygon, and casts to Multi while not harming geometries that are already Multi.

The problem with casting to Multi without checking the input type

The MultiPolygon constructor requires a list of polygons. If the feature is a Polygon, it has to be wrapped in list brackets: MultiPolygon([feature]). Otherwise, MultiPolygon(feature) throws the following error:

TypeError: 'Polygon' object is not iterable

If the feature is already a MultiPolygon, MultiPolygon(feature) is harmless, but MultiPolygon([feature]) will extract only one polygon part from the multipart feature, and drop all others.

The solution

Hence, the type must be determined first, and MultiPolygon only applied to non-Multi features (e.g., simple Polygons). The list comprehension above:

  1. Extracts each feature with for feature in gdf["geometry"].
  2. Checks if it is a Polygon with if isinstance(feature, Polygon).
  3. Passes Polygons only to the MultiPolygon constructor with MultiPolygon([feature]).
  4. Returns the feature untouched with else feature for features which are already MultiPolygons.
  5. Assigns the result back to the geometry column with gdf["geometry"] =.
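
The same logic can also be pulled out into a small function, which makes it easier to test on its own. This is only a sketch: to_multi is a hypothetical helper name, not part of GeoPandas or Shapely, and it assumes the column holds only Polygons and MultiPolygons.

```python
from shapely.geometry import MultiPolygon, Polygon


def to_multi(geom):
    """Hypothetical helper: promote a Polygon to a one-part MultiPolygon."""
    if isinstance(geom, Polygon):
        # Wrap the single polygon in a list, as the constructor expects.
        return MultiPolygon([geom])
    # Anything else (e.g. an existing MultiPolygon) passes through untouched.
    return geom


square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
far_square = Polygon([(2, 2), (3, 2), (3, 3), (2, 3)])
two_parts = MultiPolygon([square, far_square])

converted = [to_multi(g) for g in (square, two_parts)]
print([g.geom_type for g in converted])   # both entries are MultiPolygon
print([len(g.geoms) for g in converted])  # part counts are preserved: 1 and 2
```

On a GeoDataFrame, the helper could presumably be applied with gdf["geometry"] = gdf["geometry"].apply(to_multi), which should give the same result as the list comprehension.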
Lee Hachadoorian
0

I've made an extension to GeoSeries, using the pandas extension API, that works similarly to ST_Multi in PostGIS. The extension is not really needed for this problem, but it is convenient if you want to add even more functionality to GeoSeries.

import pandas
import shapely.geometry
from geopandas import GeoSeries

Imports for the examples

from shapely.geometry import Point, MultiPoint, LineString, MultiLineString, Polygon, MultiPolygon, GeometryCollection

Adding a geoaccessor

For more info: https://pandas.pydata.org/docs/reference/api/pandas.api.extensions.register_series_accessor.html

@pandas.api.extensions.register_series_accessor("geom")
class GeoAccessor:
    def __init__(self, series: GeoSeries):
        self._s = series

    def make_multi(self, inplace: bool = False):
        geoms = []
        for i, s in self._s.items():
            # Getting the geometry type
            tp = s.geom_type
            if not tp[0:5] == 'Multi' and tp != 'GeometryCollection':
                # Getting the shapely Multi* class for the geometry type
                func = getattr(shapely.geometry, 'Multi' + tp)
                geoms.append(func([s]))
            else:
                geoms.append(s)
        # Keep the original index so the result lines up with the input
        r = GeoSeries(geoms, index=self._s.index)

        if inplace:
            self._s.iloc[:] = r
        else:
            return r


EXAMPLES

Points:

points = [Point([1, 2]), MultiPoint([[1, 2], [1, 1]])]
gs = GeoSeries(points)
gs.geom.make_multi(inplace=True)  # Demonstrating inplace
print(gs)

Lines:

lines = [
    LineString([[1, 2], [2, 3]]),
    MultiLineString([[[1, 2], [2, 3]], [[2, 3], [3, 4]]]),
]
gs = GeoSeries(lines)
print(gs.geom.make_multi())

Polygons:

polygon = [
    Polygon([[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]),
    MultiPolygon([
        [
            [[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]],
            [[[2, 2], [3, 2], [3, 3], [2, 3], [2, 2]]]
        ]
    ]),
]
gs = GeoSeries(polygon)
print(gs.geom.make_multi())

GeometryCollections:

gc = [
    GeometryCollection([points[0], lines[0], polygon[1]]),
    GeometryCollection([points[1], lines[1], polygon[0]]),
]
gs = GeoSeries(gc)
print(gs.geom.make_multi())

Mix:

gs = GeoSeries([points[0], lines[0], polygon[1]])
print(gs.geom.make_multi())