Create a column 'geometry' of points with longitude and latitude data given in a pandas DataFrame

Question

I have loaded a .csv file into a pandas DataFrame and want to create a column named 'geometry' made up of Shapely points built from the lat and lon values in the data.

I would therefore like to create a Shapely point for each row from the 'lon' and 'lat' columns, either by zipping the two columns and building the points in a for-loop over the zipped object, or by using the apply method to call the Shapely Point constructor on each row.

This is what I tried:

import pandas as pd
from shapely.geometry import Point
fp = 'C:/Users/pku/Desktop/data/lat_lon.csv'
data = pd.read_csv(fp)
data.head()
dataframe = pd.DataFrame()
datafram['geometry'] = [Point(xy) for xy in zip(data.lon, data.lat)]
print(data['geometry'].head())

I had a bunch of errors...the last one being.

You set datafram['geometry'], not data['geometry'] – user2856, Jul 17 '21 at 06:13

score 7 · Accepted Answer · edited Jul 17 '21 at 08:56

When you read a CSV file with pandas, the result is already a pandas DataFrame, so why create a new one with dataframe = pd.DataFrame()?

data = pd.read_csv("lat_long.csv")
list(data.columns)
['lat', 'lon']
type(data)
<class 'pandas.core.frame.DataFrame'>
data.head()
     lat       lon
0  41.389474  2.156421
1  41.383093  2.181116
2  41.373258  2.159358
3  41.385252  2.168779
4  41.390692  2.148911

Now you can use a list comprehension:

data['geometry'] = [Point(xy) for xy in zip(data.lon, data.lat)] 
data.head()
     lat       lon                    geometry
0  41.389474  2.156421  POINT (2.156421 41.389474)
1  41.383093  2.181116  POINT (2.181116 41.383093)
2  41.373258  2.159358  POINT (2.159358 41.373258)
3  41.385252  2.168779  POINT (2.168779 41.385252)
4  41.390692  2.148911  POINT (2.148911 41.390692)

You can use the apply method:

data['geometry2'] = data.apply(lambda row: Point(row['lon'], row['lat']), axis=1)
data.head()
     lat       lon                    geometry                   geometry2
0  41.389474  2.156421  POINT (2.156421 41.389474)  POINT (2.156421 41.389474)
1  41.383093  2.181116  POINT (2.181116 41.383093)  POINT (2.181116 41.383093)
2  41.373258  2.159358  POINT (2.159358 41.373258)  POINT (2.159358 41.373258)
3  41.385252  2.168779  POINT (2.168779 41.385252)  POINT (2.168779 41.385252)
4  41.390692  2.148911  POINT (2.148911 41.390692)  POINT (2.148911 41.390692)
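
Both approaches can be checked without a CSV file. The following is a minimal sketch, assuming an in-memory DataFrame built from the first two rows shown earlier (the inline data is only a stand-in for lat_long.csv). It confirms that the list comprehension and apply produce equal points, with longitude as x and latitude as y.

```python
import pandas as pd
from shapely.geometry import Point

# Inline stand-in for the CSV: the first two rows from the output above.
data = pd.DataFrame({
    "lat": [41.389474, 41.383093],
    "lon": [2.156421, 2.181116],
})

# Option 1: list comprehension over the zipped columns (x = lon, y = lat).
data["geometry"] = [Point(xy) for xy in zip(data.lon, data.lat)]

# Option 2: call the Point constructor row by row with apply.
data["geometry2"] = data.apply(lambda row: Point(row["lon"], row["lat"]), axis=1)

# Compare the two columns point by point.
same = all(a.equals(b) for a, b in zip(data["geometry"], data["geometry2"]))
print(same)
print(data["geometry"].iloc[0].x, data["geometry"].iloc[0].y)
```

Note the argument order: Shapely expects (x, y), so longitude comes first even though the columns are usually read as "lat, lon".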

You can also use GeoPandas (from CSV to GeoDataFrame in two lines):

import geopandas as gpd
data = pd.read_csv("lat_long.csv")
gdf = gpd.GeoDataFrame(data, geometry=gpd.points_from_xy(data.lon, data.lat))
gdf.head()
     lat       lon                  geometry
0  41.389474  2.156421  POINT (2.15642 41.38947)
1  41.383093  2.181116  POINT (2.18112 41.38309)
2  41.373258  2.159358  POINT (2.15936 41.37326)
3  41.385252  2.168779  POINT (2.16878 41.38525)
4  41.390692  2.148911  POINT (2.14891 41.39069)
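
A GeoDataFrame built this way has no coordinate reference system attached. The sketch below assumes the lon/lat values are WGS 84 degrees (EPSG:4326), which the question does not state, and passes crs when constructing the GeoDataFrame. The inline rows again stand in for lat_long.csv.

```python
import geopandas as gpd
import pandas as pd

# Inline stand-in for the CSV: the first two rows from the output above.
data = pd.DataFrame({
    "lat": [41.389474, 41.383093],
    "lon": [2.156421, 2.181116],
})

# Assumption: coordinates are WGS 84 longitude/latitude (EPSG:4326).
gdf = gpd.GeoDataFrame(
    data,
    geometry=gpd.points_from_xy(data.lon, data.lat),
    crs="EPSG:4326",
)

print(gdf.crs)
print(gdf.geometry.iloc[0])
```

Setting the CRS up front matters mainly if the points will later be reprojected or combined with other layers; if the data uses a different reference system, substitute its code.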

Perfect! It worked fine. Happy to be on this platform. Thanks, families. — PKU, Jul 17 '21 at 10:29
If this answer is acceptable to you, you must close the question by accepting the answer please. — gene, Jul 17 '21 at 17:27
