I have loaded a .csv file into a pandas DataFrame and want to create a column named 'geometry' containing shapely Points built from the lat and lon values in the data. The data is shown below.

The loaded .csv data, made up of lat, lon, timestamp and userid.

I would therefore like to create a shapely Point on each row from the 'lon' and 'lat' columns: either zip the lon and lat columns and create the points with a for-loop over the zipped object, or use the apply method to call the shapely Point constructor on each row.

This is what I tried:

import pandas as pd

from shapely.geometry import Point

fp = 'C:/Users/pku/Desktop/data/lat_lon.csv'

data = pd.read_csv(fp)

data.head()

dataframe = pd.DataFrame()

datafram['geometry'] = [Point(xy) for xy in zip(data.lon, data.lat)]

print(data['geometry'].head())

I got a bunch of errors, the last one being:

the end of the error

Fezter
PKU

1 Answer

When you read a CSV file with pandas, the result is already a pandas DataFrame, so why create a new one with dataframe = pd.DataFrame()?

data = pd.read_csv("lat_long.csv")
list(data.columns)
['lat', 'lon']
type(data)
<class 'pandas.core.frame.DataFrame'>
data.head()
     lat       lon
0  41.389474  2.156421
1  41.383093  2.181116
2  41.373258  2.159358
3  41.385252  2.168779
4  41.390692  2.148911
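
To illustrate why the extra DataFrame matters, here is a minimal sketch. It assumes the question's code assigned the new column to the separate, empty frame and then looked it up on data. The in-memory CSV text and the plain strings standing in for shapely Points are placeholders for illustration, not part of the original data.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for the CSV file (two sample rows).
csv_text = "lat,lon\n41.389474,2.156421\n41.383093,2.181116\n"
data = pd.read_csv(io.StringIO(csv_text))

# The pattern from the question: a second, empty DataFrame gets the column.
dataframe = pd.DataFrame()
# Plain strings stand in for shapely Points to keep the sketch pandas-only.
dataframe["geometry"] = [f"POINT ({x} {y})" for x, y in zip(data.lon, data.lat)]

# The column lives on `dataframe`, not on `data`, so data['geometry']
# would raise a KeyError.
print("geometry" in dataframe.columns)  # True
print("geometry" in data.columns)       # False
```

Assigning the column to data directly avoids the problem.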

Now you can use a list comprehension:

data['geometry'] = [Point(xy) for xy in zip(data.lon, data.lat)] 
data.head()
     lat       lon                    geometry
0  41.389474  2.156421  POINT (2.156421 41.389474)
1  41.383093  2.181116  POINT (2.181116 41.383093)
2  41.373258  2.159358  POINT (2.159358 41.373258)
3  41.385252  2.168779  POINT (2.168779 41.385252)
4  41.390692  2.148911  POINT (2.148911 41.390692)

Or you can use the apply method:

data['geometry2'] = data.apply(lambda row: Point(row['lon'], row['lat']), axis=1)
data.head()
     lat       lon                    geometry                   geometry2
0  41.389474  2.156421  POINT (2.156421 41.389474)  POINT (2.156421 41.389474)
1  41.383093  2.181116  POINT (2.181116 41.383093)  POINT (2.181116 41.383093)
2  41.373258  2.159358  POINT (2.159358 41.373258)  POINT (2.159358 41.373258)
3  41.385252  2.168779  POINT (2.168779 41.385252)  POINT (2.168779 41.385252)
4  41.390692  2.148911  POINT (2.148911 41.390692)  POINT (2.148911 41.390692)
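
The two approaches above can be checked against each other in a self-contained sketch. The in-memory CSV text below is a stand-in for the real file (it reuses three of the sample rows), and the names by_zip, by_apply and same are only illustrative.

```python
import io

import pandas as pd
from shapely.geometry import Point

# Hypothetical in-memory stand-in for lat_long.csv.
csv_text = """lat,lon
41.389474,2.156421
41.383093,2.181116
41.373258,2.159358
"""
data = pd.read_csv(io.StringIO(csv_text))

# Option 1: list comprehension over (lon, lat) pairs.
# shapely expects x first, so longitude comes before latitude.
by_zip = [Point(lon, lat) for lon, lat in zip(data["lon"], data["lat"])]

# Option 2: row-wise apply, building the same Point from each row.
by_apply = data.apply(lambda row: Point(row["lon"], row["lat"]), axis=1)

# Store the result on the DataFrame that was read from the CSV.
data["geometry"] = by_zip

# Compare the two results geometry by geometry.
same = all(a.equals(b) for a, b in zip(by_zip, by_apply))
print(same)
print(data.head())
```

Both routes build one Point per row. The list comprehension avoids constructing a Series for every row, so it is usually the faster of the two on large files.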

You can also use GeoPandas (From CSV to GeoDataFrame in two lines)

import geopandas as gpd
data = pd.read_csv("lat_long.csv")
gdf = gpd.GeoDataFrame(data, geometry=gpd.points_from_xy(data.lon, data.lat))
gdf.head()
     lat       lon                  geometry
0  41.389474  2.156421  POINT (2.15642 41.38947)
1  41.383093  2.181116  POINT (2.18112 41.38309)
2  41.373258  2.159358  POINT (2.15936 41.37326)
3  41.385252  2.168779  POINT (2.16878 41.38525)
4  41.390692  2.148911  POINT (2.14891 41.39069)
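
As a hedged extension of the GeoPandas route, the sketch below also attaches a coordinate reference system. The crs="EPSG:4326" value is an assumption (that the lat/lon values are WGS84 degrees); the original data does not state its CRS. The in-memory CSV text is again a stand-in for the real file.

```python
import io

import geopandas as gpd
import pandas as pd

# Hypothetical in-memory stand-in for lat_long.csv.
csv_text = """lat,lon
41.389474,2.156421
41.383093,2.181116
41.373258,2.159358
"""
data = pd.read_csv(io.StringIO(csv_text))

# points_from_xy takes x (longitude) first, then y (latitude).
# crs="EPSG:4326" is an assumption: plain lat/lon in WGS84 degrees.
gdf = gpd.GeoDataFrame(
    data,
    geometry=gpd.points_from_xy(data["lon"], data["lat"]),
    crs="EPSG:4326",
)

print(gdf.head())
print(gdf.crs)
```

Setting the CRS up front makes later steps such as reprojecting with gdf.to_crs(...) or exporting to a spatial file format work without extra bookkeeping.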
BERA
gene
  • Perfect! It worked fine. Happy to be on this platform. Thanks, families. – PKU Jul 17 '21 at 10:29
  • If this answer is acceptable to you, you must close the question by accepting the answer please. – gene Jul 17 '21 at 17:27