Converting Pandas DataFrame to GeoDataFrame

Question

This seems like a simple enough question, but I can't figure out how to convert a Pandas DataFrame to a GeoDataFrame so that I can do a spatial join.

Here is an example of what my data looks like using df.head():

    Date/Time           Lat       Lon       ID
0   4/1/2014 0:11:00    40.7690   -73.9549  140
1   4/1/2014 0:17:00    40.7267   -74.0345  NaN

In fact, this dataframe was created from a CSV, so if it's easier to read the CSV in directly as a GeoDataFrame, that's fine too.

Martin Valgur · Accepted Answer · 2020-06-26T19:52:48.660


Convert the DataFrame's content (e.g. Lat and Lon columns) into appropriate Shapely geometries first and then use them together with the original DataFrame to create a GeoDataFrame.

from geopandas import GeoDataFrame
from shapely.geometry import Point
geometry = [Point(xy) for xy in zip(df.Lon, df.Lat)]
df = df.drop(['Lon', 'Lat'], axis=1)
gdf = GeoDataFrame(df, crs="EPSG:4326", geometry=geometry)

Result:

    Date/Time           ID      geometry
0   4/1/2014 0:11:00    140     POINT (-73.95489999999999 40.769)
1   4/1/2014 0:17:00    NaN     POINT (-74.03449999999999 40.7267)
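Since the question mentions that the data starts out as a CSV, here is a self-contained sketch of the same approach that reads the CSV with pandas first and then builds the Point geometries. The two sample rows from the question are inlined via io.StringIO purely for illustration; "your_file.csv" in the comment is a hypothetical path.

```python
import io

import pandas as pd
from geopandas import GeoDataFrame
from shapely.geometry import Point

# The two sample rows from the question, inlined for illustration;
# with a real file you would call pd.read_csv("your_file.csv") (hypothetical path).
csv_text = """Date/Time,Lat,Lon,ID
4/1/2014 0:11:00,40.7690,-73.9549,140
4/1/2014 0:17:00,40.7267,-74.0345,
"""
df = pd.read_csv(io.StringIO(csv_text))

# Shapely expects (x, y), i.e. (longitude, latitude)
geometry = [Point(xy) for xy in zip(df.Lon, df.Lat)]
gdf = GeoDataFrame(df.drop(["Lon", "Lat"], axis=1), crs="EPSG:4326", geometry=geometry)
print(gdf)
```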

Since geometries often come in WKT format, I thought I'd include an example for that case as well:

import geopandas as gpd
import shapely.wkt
geometry = df['wktcolumn'].map(shapely.wkt.loads)
df = df.drop('wktcolumn', axis=1)
gdf = gpd.GeoDataFrame(df, crs="EPSG:4326", geometry=geometry)
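Newer GeoPandas releases (0.9 and later, as far as I know) also provide GeoSeries.from_wkt, which parses a whole column of WKT strings in one call and can stand in for the map(shapely.wkt.loads) step. A minimal sketch, assuming a made-up DataFrame whose wktcolumn holds the two points from the question:

```python
import geopandas as gpd
import pandas as pd

# Hypothetical data: 'wktcolumn' holds geometries as WKT strings
df = pd.DataFrame({
    "name": ["a", "b"],
    "wktcolumn": ["POINT (-73.9549 40.769)", "POINT (-74.0345 40.7267)"],
})

# Parse the whole column at once; the Series index is carried over
geometry = gpd.GeoSeries.from_wkt(df["wktcolumn"], crs="EPSG:4326")

# The GeoDataFrame picks up the CRS from the GeoSeries
gdf = gpd.GeoDataFrame(df.drop(columns="wktcolumn"), geometry=geometry)
print(gdf)
```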

edited Jun 26 '20 at 19:52

answered Dec 16 '15 at 21:39

Martin Valgur



Thanks again! That's much simpler and runs very fast - much better than iterating through every row of the df at my n=500,000 :) – atkat12 Dec 16 '15 at 22:42

Gosh, thanks! I check this answer like every 2 days :) – Owen Dec 21 '16 at 16:25

You'd think this would be the first entry in the documentation! – Dominik May 14 '17 at 16:53
+1 for the shapely.wkt. It took me a while to figure this out! – StefanK Dec 12 '17 at 15:14

In order to avoid deleting the lat/lon columns from the pandas df (in case you need them later), I would instead recommend dropping lat/lon in the creation of gdf, like so: gdf = GeoDataFrame(df.drop(['Lon', 'Lat'], axis=1), crs=crs, geometry=geometry) – Gene Burinsky May 27 '20 at 19:43
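The difference this comment describes is easy to check: DataFrame.drop returns a new DataFrame, so passing it straight into the constructor leaves the original df untouched. A short sketch using the question's two sample rows:

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

df = pd.DataFrame({"Lat": [40.7690, 40.7267], "Lon": [-73.9549, -74.0345], "ID": [140, None]})

geometry = [Point(xy) for xy in zip(df.Lon, df.Lat)]
# drop() returns a copy, so df itself still has its Lat/Lon columns afterwards
gdf = gpd.GeoDataFrame(df.drop(["Lon", "Lat"], axis=1), crs="EPSG:4326", geometry=geometry)

print(df.columns.tolist())
print(gdf.columns.tolist())
```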

weiji14 · Answer 2 · 2019-12-29T22:53:16.377

Update (December 2019): The official documentation at https://geopandas.readthedocs.io/en/latest/gallery/create_geopandas_from_pandas.html does this succinctly using geopandas.points_from_xy, like so:

import geopandas

gdf = geopandas.GeoDataFrame(
    df, geometry=geopandas.points_from_xy(x=df.Longitude, y=df.Latitude)
)

You can also set a crs, or pass z values (e.g. elevation) to points_from_xy if you want 3D points.
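For example, a sketch that passes both a CRS and a z value; the Elevation column is hypothetical and only there to illustrate the z argument of points_from_xy(x, y, z=None, crs=None):

```python
import geopandas
import pandas as pd

# Hypothetical data; 'Elevation' only illustrates the z argument
df = pd.DataFrame({
    "Longitude": [-73.9549, -74.0345],
    "Latitude": [40.7690, 40.7267],
    "Elevation": [10.0, 12.5],
})

gdf = geopandas.GeoDataFrame(
    df,
    geometry=geopandas.points_from_xy(x=df.Longitude, y=df.Latitude, z=df.Elevation),
    crs="EPSG:4326",  # WGS 84 longitude/latitude
)
print(gdf.geometry.has_z)
```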

Old Method: Using shapely

One-liners! Plus some performance pointers for big-data people.

Given a pandas.DataFrame with x (longitude) and y (latitude) columns, like so:

df.head()
            x          y
0  229.617902 -73.133816
1  229.611157 -73.141299
2  229.609825 -73.142795
3  229.607159 -73.145782
4  229.605825 -73.147274

Let's convert the pandas.DataFrame into a geopandas.GeoDataFrame as follows:

Library imports and shapely speedups:

import geopandas as gpd
import shapely.geometry
import shapely.speedups
shapely.speedups.enable() # enabled by default from version 1.6.0; deprecated (no longer needed) in Shapely 2.0

Code + benchmark times on a test dataset I have lying around:

# crs={'init': 'epsg:4326'} is deprecated with pyproj 2+, so "EPSG:4326" is used below

#Martin's original version:
#%timeit 1.87 s ± 7.03 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
gdf = gpd.GeoDataFrame(df.drop(['x', 'y'], axis=1),
                       crs="EPSG:4326",
                       geometry=[shapely.geometry.Point(xy) for xy in zip(df.x, df.y)])

#Pandas apply method
#%timeit 8.59 s ± 60.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
gdf = gpd.GeoDataFrame(df.drop(['x', 'y'], axis=1),
                       crs="EPSG:4326",
                       geometry=df.apply(lambda row: shapely.geometry.Point((row.x, row.y)), axis=1))

Using pandas.apply is surprisingly slower (8.59 s vs 1.87 s here), but it may be a better fit for some other workflows (e.g. on bigger datasets using the dask library).
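To reproduce this comparison on your own machine, something along these lines should work; it also times geopandas.points_from_xy from the 2019 update. The synthetic DataFrame of random coordinates is an assumption for illustration, and absolute timings will depend on your hardware and on the installed Shapely/GeoPandas versions, so no expected numbers are given here.

```python
import timeit

import geopandas as gpd
import numpy as np
import pandas as pd
import shapely.geometry

# Synthetic test data (an assumption; substitute your own DataFrame)
rng = np.random.default_rng(0)
n = 10_000
df = pd.DataFrame({"x": rng.uniform(-180, 180, n), "y": rng.uniform(-90, 90, n)})


def with_zip():
    # List comprehension over zipped columns
    return gpd.GeoDataFrame(
        df.drop(["x", "y"], axis=1), crs="EPSG:4326",
        geometry=[shapely.geometry.Point(xy) for xy in zip(df.x, df.y)],
    )


def with_apply():
    # Row-wise DataFrame.apply
    return gpd.GeoDataFrame(
        df.drop(["x", "y"], axis=1), crs="EPSG:4326",
        geometry=df.apply(lambda row: shapely.geometry.Point((row.x, row.y)), axis=1),
    )


def with_points_from_xy():
    # Vectorized constructor
    return gpd.GeoDataFrame(
        df.drop(["x", "y"], axis=1), crs="EPSG:4326",
        geometry=gpd.points_from_xy(df.x, df.y),
    )


for fn in (with_zip, with_apply, with_points_from_xy):
    seconds = timeit.timeit(fn, number=3) / 3
    print(f"{fn.__name__}: {seconds:.3f} s per call")
```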

Credits to:

Making shapefile from Pandas dataframe? (for the pandas apply method)
Speed up row-wise point in polygon with Geopandas (for the speedup hint)

Some Work-In-Progress references (as of 2017) for handling big dask datasets:

Thanks for the comparison, indeed the zip version is way faster — MCMZL, Mar 27 '19 at 10:58

score 0 · Answer 3 · answered Feb 26 '22 at 17:20

Here's a function taken from the internals of geopandas and slightly modified to handle a dataframe whose geometry/polygon column is already in WKT format (WKB is still handled with wkt=False).

from geopandas import GeoDataFrame
import shapely
import shapely.wkb
import shapely.wkt


def df_to_geodf(df, geom_col="geom", crs=None, wkt=True):
    """
    Transforms a pandas DataFrame into a GeoDataFrame.
    The column 'geom_col' must be a geometry column in WKT representation
    (wkt=True) or WKB representation (wkt=False).
    To be used to convert df based on pd.read_sql to gdf.

    Parameters
    ----------
    df : DataFrame
        pandas DataFrame with geometry column in WKT or WKB representation.
    geom_col : string, default 'geom'
        column name to convert to shapely geometries
    crs : pyproj.CRS, optional
        CRS to use for the returned GeoDataFrame. The value can be anything accepted
        by :meth:`pyproj.CRS.from_user_input() <pyproj.crs.CRS.from_user_input>`,
        such as an authority string (eg "EPSG:4326") or a WKT string.
        If not set, tries to determine CRS from the SRID associated with the
        first geometry in the database, and assigns that to all geometries.
    wkt : bool, default True
        Parse geom_col as WKT if True, otherwise as WKB (raw bytes or hex text).

    Returns
    -------
    GeoDataFrame
    """
    if geom_col not in df:
        raise ValueError("Query missing geometry column '{}'".format(geom_col))

    df = df.copy()  # avoid modifying the caller's DataFrame
    geoms = df[geom_col].dropna()
    if not geoms.empty:
        if wkt:
            load_geom = shapely.wkt.loads
        elif isinstance(geoms.iat[0], bytes):
            # WKB as raw binary
            load_geom = shapely.wkb.loads
        else:
            # WKB encoded as hex text
            def load_geom(x):
                return shapely.wkb.loads(str(x), hex=True)

        df[geom_col] = geoms = geoms.apply(load_geom)
        if crs is None:
            # Shapely 2.0+ API; returns 0 if no SRID is defined in the geodatabase
            # (Shapely 1.x: shapely.geos.lgeos.GEOSGetSRID(geoms.iat[0]._geom))
            srid = shapely.get_srid(geoms.iat[0])
            if srid != 0:
                crs = "epsg:{}".format(srid)

    return GeoDataFrame(df, crs=crs, geometry=geom_col)
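If the column holds WKB, as in the pd.read_sql case this function targets, recent GeoPandas versions also offer GeoSeries.from_wkb, which may save writing the loader by hand (and for PostGIS sources, geopandas.read_postgis may skip the pandas step entirely). A minimal sketch; the WKB bytes are generated with shapely.wkb.dumps only to stand in for what a database query would return:

```python
import geopandas as gpd
import pandas as pd
import shapely.wkb
from shapely.geometry import Point

# Stand-in for a pd.read_sql result: a 'geom' column of raw WKB bytes
points = [Point(-73.9549, 40.769), Point(-74.0345, 40.7267)]
df = pd.DataFrame({"id": [1, 2], "geom": [shapely.wkb.dumps(p) for p in points]})

# Decode the whole column in one call and attach a CRS
geometry = gpd.GeoSeries.from_wkb(df["geom"], crs="EPSG:4326")
gdf = gpd.GeoDataFrame(df.drop(columns="geom"), geometry=geometry)
print(gdf)
```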
