79

This seems like a simple enough question, but I can't figure out how to convert a pandas DataFrame to a GeoDataFrame for a spatial join.

Here is an example of what my data looks like using df.head():

    Date/Time           Lat       Lon       ID
0   4/1/2014 0:11:00    40.7690   -73.9549  140
1   4/1/2014 0:17:00    40.7267   -74.0345  NaN

In fact, this dataframe was created from a CSV so if it's easier to read the CSV in directly as a GeoDataFrame that's fine too.

Taras
atkat12

3 Answers

139

Convert the DataFrame's content (e.g. Lat and Lon columns) into appropriate Shapely geometries first and then use them together with the original DataFrame to create a GeoDataFrame.

from geopandas import GeoDataFrame
from shapely.geometry import Point

geometry = [Point(xy) for xy in zip(df.Lon, df.Lat)]
df = df.drop(['Lon', 'Lat'], axis=1)
gdf = GeoDataFrame(df, crs="EPSG:4326", geometry=geometry)

Result:

    Date/Time           ID      geometry
0   4/1/2014 0:11:00    140     POINT (-73.95489999999999 40.769)
1   4/1/2014 0:17:00    NaN     POINT (-74.03449999999999 40.7267)
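For the CSV case mentioned in the question, the same steps can be applied right after reading the file. The following is only a minimal sketch: the CSV content is inlined through io.StringIO as a stand-in for a real file (an assumption for illustration), and the Lat/Lon columns are dropped only in the copy passed to GeoDataFrame so the original DataFrame stays intact.

```python
import io

import pandas as pd
from geopandas import GeoDataFrame
from shapely.geometry import Point

# Stand-in for a CSV file on disk; in practice pass the file path to read_csv.
csv_text = """Date/Time,Lat,Lon,ID
4/1/2014 0:11:00,40.7690,-73.9549,140
4/1/2014 0:17:00,40.7267,-74.0345,
"""

df = pd.read_csv(io.StringIO(csv_text))

# Shapely points take (x, y), i.e. (longitude, latitude).
geometry = [Point(xy) for xy in zip(df.Lon, df.Lat)]

# Drop the coordinate columns only in the copy handed to GeoDataFrame.
gdf = GeoDataFrame(df.drop(["Lon", "Lat"], axis=1), crs="EPSG:4326", geometry=geometry)
```

After this, df still has its Lat and Lon columns, while gdf carries the points in its geometry column.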

Since the geometries often come in the WKT format, I thought I'd include an example for that case as well:

import geopandas as gpd
import shapely.wkt

geometry = df['wktcolumn'].map(shapely.wkt.loads)
df = df.drop('wktcolumn', axis=1)
gdf = gpd.GeoDataFrame(df, crs="EPSG:4326", geometry=geometry)
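As a self-contained sketch of the WKT case: the sample rows below are made up for illustration, and GeoSeries.from_wkt is used in place of mapping shapely.wkt.loads. To my knowledge it is only available in more recent GeoPandas releases (0.9 and later), so treat that as an assumption about your installed version.

```python
import geopandas as gpd
import pandas as pd

# Illustrative data; 'wktcolumn' holds geometries as WKT strings.
df = pd.DataFrame(
    {
        "ID": [140, 141],
        "wktcolumn": ["POINT (-73.9549 40.769)", "POINT (-74.0345 40.7267)"],
    }
)

# Parse the whole column at once; the resulting GeoSeries keeps df's index.
geometry = gpd.GeoSeries.from_wkt(df["wktcolumn"], crs="EPSG:4326")

# The GeoDataFrame picks up the CRS from the geometry series.
gdf = gpd.GeoDataFrame(df.drop(columns="wktcolumn"), geometry=geometry)
```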

Martin Valgur
  • 1
    Thanks again! That's much simpler and runs very fast - much better than iterating through every row of the df at my n=500,000 :) – atkat12 Dec 16 '15 at 22:42
  • 7
    Gosh, thanks! I check this answer like every 2 days :) – Owen Dec 21 '16 at 16:25
  • 1
    you'd think this would be the first entry in the documentation! – Dominik May 14 '17 at 16:53
  • +1 for the shapely.wkt. It took me a while to figure this out! – StefanK Dec 12 '17 at 15:14
  • 1
    In order to avoid deleting lat/lon columns from the pandas df (in case you need to use it later), I would instead recommend dropping lat/lon in the creation of gdf like so gdf = GeoDataFrame(df.drop(['Lon', 'Lat'], axis=1), crs=crs, geometry=geometry) – Gene Burinsky May 27 '20 at 19:43
56

Update (December 2019): The official documentation at https://geopandas.readthedocs.io/en/latest/gallery/create_geopandas_from_pandas.html does it succinctly using geopandas.points_from_xy like so:

gdf = geopandas.GeoDataFrame(
    df, geometry=geopandas.points_from_xy(x=df.Longitude, y=df.Latitude)
)

You can also set a crs or z (e.g. elevation) value if you want.
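A minimal sketch of both options, assuming a hypothetical Elevation column next to Longitude and Latitude (the sample values are made up for illustration): the CRS goes to the GeoDataFrame constructor and the elevation to the z argument of points_from_xy.

```python
import geopandas
import pandas as pd

# Illustrative data; the Elevation column is an assumption, not part of the question.
df = pd.DataFrame(
    {
        "Longitude": [-73.9549, -74.0345],
        "Latitude": [40.7690, 40.7267],
        "Elevation": [10.0, 25.0],
    }
)

gdf = geopandas.GeoDataFrame(
    df,
    geometry=geopandas.points_from_xy(x=df.Longitude, y=df.Latitude, z=df.Elevation),
    crs="EPSG:4326",  # WGS 84 longitude/latitude
)
```

Each geometry is then a 3D point, and gdf.crs reports the assigned coordinate reference system.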


Old Method: Using shapely

One-liners! Plus some performance pointers for big-data people.

Given a pandas.DataFrame that has x Longitude and y Latitude like so:

df.head()
            x          y
0  229.617902 -73.133816
1  229.611157 -73.141299
2  229.609825 -73.142795
3  229.607159 -73.145782
4  229.605825 -73.147274

Let's convert the pandas.DataFrame into a geopandas.GeoDataFrame as follows:

Library imports and shapely speedups:

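import geopandas as gpd
import shapely.geometry  # import the submodule explicitly; plain "import shapely" may not load it
import shapely.speedups
shapely.speedups.enable()  # enabled by default since Shapely 1.6.0 (and unnecessary in Shapely 2.x)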

Code + benchmark times on a test dataset I have lying around:

# Martin's original version:
# %timeit 1.87 s ± 7.03 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
gdf = gpd.GeoDataFrame(df.drop(['x', 'y'], axis=1),
                       crs="EPSG:4326",  # replaces the deprecated {'init': 'epsg:4326'} form
                       geometry=[shapely.geometry.Point(xy) for xy in zip(df.x, df.y)])

# pandas apply method:
# %timeit 8.59 s ± 60.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
gdf = gpd.GeoDataFrame(df.drop(['x', 'y'], axis=1),
                       crs="EPSG:4326",
                       geometry=df.apply(lambda row: shapely.geometry.Point((row.x, row.y)), axis=1))

Using DataFrame.apply is considerably slower here (8.59 s versus 1.87 s), but it may be a better fit for some other workflows (e.g. on bigger datasets using the dask library).


weiji14
0

Here's a function taken from the internals of geopandas and slightly modified to handle a dataframe with a geometry/polygon column already in wkt format.

from geopandas import GeoDataFrame
import shapely
import shapely.wkb
import shapely.wkt


def df_to_geodf(df, geom_col="geom", crs=None, wkt=True):
    """
    Transforms a pandas DataFrame into a GeoDataFrame.
    The column 'geom_col' must be a geometry column in WKT (default)
    or WKB representation.
    To be used to convert df based on pd.read_sql to gdf.

    Parameters
    ----------
    df : DataFrame
        pandas DataFrame with geometry column in WKT or WKB representation.
    geom_col : string, default 'geom'
        column name to convert to shapely geometries
    crs : pyproj.CRS, optional
        CRS to use for the returned GeoDataFrame. The value can be anything
        accepted by pyproj.CRS.from_user_input(), such as an authority
        string (eg "EPSG:4326") or a WKT string.
        If not set, tries to determine CRS from the SRID associated with the
        first geometry in the database, and assigns that to all geometries.
    wkt : bool, default True
        If True, parse the column as WKT text; otherwise parse it as WKB.

    Returns
    -------
    GeoDataFrame
    """
    if geom_col not in df:
        raise ValueError("Query missing geometry column '{}'".format(geom_col))

    geoms = df[geom_col].dropna()

    if not geoms.empty:
        if wkt:
            load_geom = shapely.wkt.loads
        else:
            # Load from Python 3 binary.
            load_geom_bytes = shapely.wkb.loads

            def load_geom_text(x):
                """Load from binary encoded as text."""
                return shapely.wkb.loads(str(x), hex=True)

            if isinstance(geoms.iat[0], bytes):
                load_geom = load_geom_bytes
            else:
                load_geom = load_geom_text

        df[geom_col] = geoms = geoms.apply(load_geom)
        if crs is None:
            # Shapely 2.x public API; the original answer used the Shapely 1.x
            # internal call shapely.geos.lgeos.GEOSGetSRID(geoms.iat[0]._geom).
            srid = shapely.get_srid(geoms.iat[0])
            # if no defined SRID in geodatabase, returns SRID of 0
            if srid != 0:
                crs = "epsg:{}".format(srid)

    return GeoDataFrame(df, crs=crs, geometry=geom_col)

user3496060