4

I have a csv with two sets of latitude and longitude points, and a shapefile with espg 4326. I want to find which points are inside the shapefile. I've been following the guide here which uses a spatial index to speed up the search however I'm running into an error.

The csv looks like:

e6ed541a288f13745d2755a79372c0c4,fc707fee61baa13b40278a70c679b9a7,3fe94a002317b5f9259f82690aeea4cd,1,2016-11-01T02:03:18.000Z,2,2016-11-01T03:01:13.000Z,2,51.456,0.146,51.506,-0.125,false,false,EE,2,3,2,,,,
36e6a9d201be3957fb78be4dfaacb932,d3cbd057ca3bf69f6805d054b4a228c5,52720e003547c70561bf5e03b95aa99f,1,2016-11-01T16:21:45.467Z,2,2016-11-01T16:43:01.627Z,2,51.499,-0.008,51.506,-0.101,false,false,EE,1,1,1,,,,
f15a4077cc79217e620270a24c127ed4,581c63e4232ae5208f3ae23a09fff3ea,13f9896df61279c928f19721878fac41,1,2016-11-01T20:31:36.000Z,2,2016-11-01T21:21:10.000Z,2,51.529,-0.124,51.512,-0.092,false,false,EI,2,2,1,,,,
bcd71b06e56e2e023fa055a40ee03f20,0ed657e42b4d820eed2affd146067623,3fe94a002317b5f9259f82690aeea4cd,1,2016-11-01T22:58:42.000Z,2,2016-11-01T23:27:24.000Z,2,51.503,-0.113,51.505,-0.086,false,false,EE,2,3,2,,,,

My code so far is:

trip = pd.read_csv(r'test\TripRecordsReporttrips.csv', sep=',',header=None, usecols=[0, 4, 8, 9, 10, 11],names=['TripID', 'Date', 'StartLat', 'StartLon','EndLat','EndLon'])
geometry = [Point(xy) for xy in zip(trip['StartLon'], trip['StartLat'])]
geometry2 = [Point(xy) for xy in zip(trip['EndLon'], trip['EndLat'])]
trip = trip.drop(['StartLon', 'StartLat','EndLon','EndLat'], axis=1)
starts = GeoDataFrame(trip, crs=crs, geometry=geometry) #turn dataframe into geodataframe
ends = GeoDataFrame(trip, crs=crs, geometry=geometry2)
FRC1  = geopandas.GeoDataFrame.from_file('FRC1Shapefile.shp') # import FRC1 polygon

spatial_index = starts.sindex
possible_matches_index = list(spatial_index.intersection(FRC1.bounds))
possible_matches = starts.iloc[possible_matches_index]
precise_matches = possible_matches[possible_matches.intersects(FRC1)]

the error I'm getting is from RTree:

RTreeError: Coordinates must be in the form (minx, miny, maxx, maxy) or (x, y) for 2D indexes

EDIT: Here's an image of the points sjoin is selecting. enter image description here

PolyGeo
  • 65,136
  • 29
  • 109
  • 338
Joshua Kidd
  • 565
  • 1
  • 4
  • 15

1 Answers1

3

There are easier ways, look at More Efficient Spatial join in Python without QGIS, ArcGIS, PostGIS, etc or Check if a point falls within a multipolygon with Python.

1) Use directly a GeoPandas spatial-join with the command sjoin which incorporates a spatial index (rtree, line 29 of sjoin.py)

import pandas as pd
trip = pd.read_csv('TripRecordsReporttrips.csv', sep=',',header=None, usecols=[0, 4, 8, 9, 10, 11],names=['TripID', 'Date', 'StartLat', 'StartLon','EndLat','EndLon'])
from shapely.geometry import Point
geometry = [Point(xy) for xy in zip(trip['StartLon'], trip['StartLat'])]
geometry2 = [Point(xy) for xy in zip(trip['EndLon'], trip['EndLat'])]
trip = trip.drop(['StartLon', 'StartLat','EndLon','EndLat'], axis=1)
crs = {'init' :'epsg:4326'}
import geopandas as gp
starts = gp.GeoDataFrame(trip, crs=crs, geometry=geometry)
ends  = gp.GeoDataFrame(trip, crs=crs, geometry=geometry2)
starts.head()

enter image description here

 FRC1  = gp.read_file('polygons.shp')

enter image description here

2) Now use sjoin

from geopandas.tools import sjoin
pointInPolys = sjoin(starts, FRC1, how='left',op="within")

enter image description here

The point 0 is within polygon 2, the points 2, 3 are within polygon 1, the point 1 is not in a polygon

Eliminate the Nan values: point(s) is/are not in a polygon

pointInPolys = pointInPolys.dropna()
pointInPolys 

enter image description here

3) Conclusion

These 3 points are inside the shapefile

gene
  • 54,868
  • 3
  • 110
  • 187
  • I'm having an issue using your code, not all of the points within the polygon are selected. I've added a picture to my original question. Any ideas about what might be causing the issue? – Joshua Kidd Mar 03 '17 at 13:07
  • Nevermind, It was an issue with the end lat and lon, I've fixed it now – Joshua Kidd Mar 03 '17 at 15:31