
I have a dataframe of X,Y coordinates that represent points along the paths taken by several different entities. Pseudo-data here, but it is roughly of the form:

entity_id   lat   lon   time

1001        34.5  14.2  4:55 pm
1001        34.7  14.5  4:58 pm
1001        35.0  14.6  5:03 pm

1002        27.1  19.2  2:01 pm
1002        27.4  19.3  2:08 pm
1002        27.4  19.9  2:09 pm

What I would like to do is group these points by entity_id, and then arrange the points sequentially in time to create a LineString object for each entity_id. The output will be several lines/paths, with each corresponding to an entity_id.

I can do this by looping through each entity_id and each point in entity_id and using the instructions provided here, but is there a faster/more efficient way to do this leveraging GeoPandas or Shapely, perhaps with groupby?

PolyGeo
atkat12

1 Answer


I think I found an interim solution, which I'm posting in case it's useful for anyone:

import pandas as pd
import numpy as np
from geopandas import GeoDataFrame
from shapely.geometry import Point, LineString

# Zip the coordinates into a point object and convert to a GeoDataFrame
geometry = [Point(xy) for xy in zip(df.lon, df.lat)]
df = GeoDataFrame(df, geometry=geometry)

# Aggregate these points with the GroupBy
df = df.groupby(['entity_id'])['geometry'].apply(lambda x: LineString(x.tolist()))
df = GeoDataFrame(df, geometry='geometry')
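The question asks for the points to be ordered in time, and the snippet above relies on the rows already being in that order. Here is a minimal sketch of sorting first, assuming the times are same-day clock strings such as "4:55 pm" (the sample frame and the `%I:%M %p` format are my assumptions, not the real data):

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import LineString

# Hypothetical sample data, deliberately out of time order
df = pd.DataFrame({
    "entity_id": [1001, 1001, 1001, 1002, 1002, 1002],
    "lat": [35.0, 34.5, 34.7, 27.4, 27.1, 27.4],
    "lon": [14.6, 14.2, 14.5, 19.9, 19.2, 19.3],
    "time": ["5:03 pm", "4:55 pm", "4:58 pm", "2:09 pm", "2:01 pm", "2:08 pm"],
})

# Parse the clock strings so the sort is chronological rather than alphabetical
df["time"] = pd.to_datetime(df["time"], format="%I:%M %p")
df = df.sort_values(["entity_id", "time"])

# Build point geometries (x = lon, y = lat), then one LineString per entity
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df["lon"], df["lat"]))
lines = gdf.groupby("entity_id")["geometry"].apply(
    lambda pts: LineString(pts.tolist())
)

# Keep entity_id as a regular column in the result
lines = gpd.GeoDataFrame(lines.reset_index(), geometry="geometry")
```

groupby keeps the row order within each group, so sorting once up front should be enough. If the real timestamps span several days, they would need a full date as well.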

Note that if you have single-point trajectories in your data, you will have to discard these first or LineString will throw an error.
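One possible way to discard the single-point trajectories before grouping is to count the rows per entity_id and keep only entities with at least two. This is a sketch on made-up data (the frame and the variable names `sizes` and `multi` are illustrative):

```python
import pandas as pd
from shapely.geometry import Point, LineString

# Hypothetical sample data: entity 1002 has only one point
df = pd.DataFrame({
    "entity_id": [1001, 1001, 1002, 1003, 1003, 1003],
    "lat": [34.5, 34.7, 27.1, 40.0, 40.1, 40.2],
    "lon": [14.2, 14.5, 19.2, 10.0, 10.1, 10.2],
})
df["geometry"] = [Point(xy) for xy in zip(df["lon"], df["lat"])]

# Number of rows per entity, broadcast back onto every row
sizes = df.groupby("entity_id")["entity_id"].transform("size")

# Keep only entities with two or more points, then build the lines
multi = df[sizes > 1]
lines = multi.groupby("entity_id")["geometry"].apply(
    lambda pts: LineString(pts.tolist())
)
```

Here `lines` is indexed by entity_id and contains 1001 and 1003 only. It can be wrapped in a GeoDataFrame the same way as in the answer.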

This and this post were helpful in writing the GroupBy function.


Update: If you did not discard the single-point trajectories, you can instead use a conditional expression that keeps the lone Point as it is:

 df = df.groupby(['entity_id'])['geometry'].apply(lambda x: LineString(x.tolist()) if x.size > 1 else x.iloc[0])
atkat12
  • that is amazing! – Ufos Aug 30 '18 at 17:04
  • In case you only want the grouped dataset and to keep ID as a column, this should help: df.groupby('entity_id', as_index=False).agg({'geometry': lambda x: ...}) – Ufos Aug 30 '18 at 17:25
  • You can actually condense the first two commands (zip and gdf creation) into one: df = gpd.GeoDataFrame(df, geometry = gpd.points_from_xy(df.x, df.y)) – Felipe D. Sep 02 '20 at 21:31