3

I am trying to estimate summary statistics of points located inside a polygon. Each points in the point layer is associated with N attributes. My target is to summarize (say mean, min,..) the attributes of points located within a polygon and populate the attribute fields of the corresponding polygon.

I am looking for a solution using GeoPandas or other Python libraries.

import geopandas as gpd  

gdf_points = gpd.read_file('/path_to_points.json')
gdf_polygon = gpd.read_file('/path_to_polygons.json')

dfsjoin = gpd.sjoin(gdf_polygon ,gdf_points)

Now, how can I summarize the stats for each attribute in the point layer and add it to the polygon shapefile? Which function can I use?

I am looking for something that is functionally equivalent to ESRI ArcGIS SpatialJoin_analysis with fieldmappings

PPR
  • 501
  • 1
  • 6
  • 18

1 Answers1

6

enter image description here

First of all it is necessary to determine the points that are contained in the polygons and which points in which polygons

points = gpd.read_file("points.shp")
points.head()
   id  value1 value2    geometry
0   1   300   300003    POINT (19.579 -18.625)
1   2   400   400003    POINT (80.639 -114.895)
2   3   500   500003    POINT (98.021 -70.326)
3   4   100   100003    POINT (118.522 -100.187)
4   5   200   200003    POINT (186.713 -35.562)
polys = gpd.read_file("polys.shp")
polys
   id     geometry
0   1   POLYGON ((51.223 -134.951, 50.777 -74.337, 106...
1   2   POLYGON ((223.706 -134.506, 228.163 -68.543, 3...
2   3   POLYGON ((151.058 -185.315, 167.994 -167.487, ...

Use a spatial join (as in More Efficient Spatial join in Python without QGIS, ArcGIS, PostGIS, etc for example)

from geopandas.tools import sjoin
points_polys = gpd.sjoin(points, polys, how="left")
points_polys.head()
 id_left value1 value2      geometry         index_right  id_right
0   1     300   300003  POINT (19.579 -18.625)  NaN        NaN
1   2     400   400003  POINT (80.639 -114.895) 0.0        1.0
2   3     500   500003  POINT (98.021 -70.326)  0.0        1.0
3   4     100   100003  POINT (118.522 -100.187)0.0        1.0
4   5     200   200003  POINT (186.713 -35.562) NaN        NaN

The points id 1,2,3 are contained in the polygon 1 (id_right), etc...
Control of the number of points contained in the polygons

print(points_polys.loc[points_polys.id_right == 1,'value1'].count())
3
print(points_polys.loc[points_polys.id_right == 2,'value1'].count())
2
print(points_polys.loc[points_polys.id_right == 3,'value1'].count())
6

To summarize the stats for each attribute in the point layer and add it to the polygon layer, group the points_polys by the id_right column (= polygons) and compute the mean, standard deviation, max and min of the attributes of each group of points (Naming returned columns in Pandas aggregate function)

stats_pt = points_polys.groupby('id_right')['value1','value2'].agg(['mean','std','max','min'])
stats_pt.columns = ["_".join(x) for x in result.columns.ravel()] # 
stats_pt 

        value1_mean value1_std value1_max value1_min value2_mean    value2_std    value2_max value2_min
id_right                                
1.0     333.333333  208.166600   500       100      333336.333333   208166.599947   500003    100003
2.0     735.000000   91.923882   800       670      735003.000000   91923.881554    800003    670003
3.0     36.333333    19.459359   60          7      36336.333333    19459.359359    60003       7003

It is also possible to use Named aggregations (Pandas in 2019 - let's see what's new!)

stats_pt  = points_polys.groupby('id_right').agg( 
       value1_mean = ('value1','mean'),
       value1_std  = ('value1','std'),
       value1_max  = ('value1','max'),
       value1_min  = ('value1','min'),
       value2_mean = ('value2','mean'),
       value2_std  = ('value2','std'),
       value2_max  = ('value2','max'),
       value1_min  = ('value2','min'))

Finally join this DataFrame to the polygon GeoDataFrame and save the resulting layer

import pandas as pd
result = pd.merge(polys, stats_pt , left_on='id',right_index=True,how='outer')
result
   id                   geometry               value1_mean  value1_std  value1_max  value1_min  value2_mean   value2_std      value2_max value2_min
 0  1   POLYGON ((51.223 -134.951, 50.77...     333.333333  208.166600    500         100       333336.333333   208166.599947 500003    100003
 1  2   POLYGON ((223.706 -134.506, 228.16...   735.000000  91.923882     800         670       735003.000000   91923.881554  800003    670003
 2  3   POLYGON ((151.058 -185.315, 167.99...   36.333333   19.459359      60           7       36336.333333    19459.359359  60003       7003


 result.to_file("stat_point_poly.shp")

With value1_std as label:

enter image description here

gene
  • 54,868
  • 3
  • 110
  • 187