1

GeoPandas distance calculations are non-intuitive: they only work if you convert your data to the right CRS.

Calculating distance over short distances has been explained here:

import geopandas as gpd
from shapely.geometry import Point
geom=[Point(xy) for xy in zip([117.454361,117.459880],[38.8459879,38.846255])]
gdf=gpd.GeoDataFrame(geometry=geom,crs={'init':'epsg:4326'})
gdf.to_crs(epsg=3310,inplace=True)
gdf.distance(gdf.shift())

The result is 479.450134meters.

However, calculating distance over different continents is not so easy. The available UTM systems are invariably local. Consider the following example calculating the difference between a point in Japan and a point in Russia:

geom=[Point(xy) for xy in zip([137.72000, 36.240000],[53.05199, 63.349449])]
gdf=gpd.GeoDataFrame(geometry=geom,crs={'init':'epsg:4326'})
gdf.to_crs(epsg=3310,inplace=True)
gdf.distance(gdf.shift())

This gives a distance of 1.202962e+07 meters, which is a bit far.

This user suggested using EPSG 32663. Unfortunately it doesn't help:

geom=[Point(xy) for xy in zip([137.72000, 36.240000],[53.05199, 63.349449])]
gdf=gpd.GeoDataFrame(geometry=geom,crs={'init':'epsg:4326'})
gdf.to_crs(epsg=32663,inplace=True)
gdf.distance(gdf.shift())

The result is 1.135471e+07 meters, still almost twice as far as the correct distance of about 6200km.

What is the correct way to do this calculation?

Vince
  • 20,017
  • 15
  • 45
  • 64
moblie
  • 113
  • 3
  • You could use https://geopy.readthedocs.io/en/latest/#module-geopy.distance. – TomazicM Apr 16 '23 at 18:03
  • I'm processing large amounts of data, so I need something which is extremely fast. GeoPandas vectorized operations fit the bill. Am I going to get the same level of performane with Geopy? – moblie Apr 18 '23 at 03:57

1 Answers1

1

GeoPandas only performs planimetric calculations, but it seems like you want to calculate the ellipsoidal distance between two points. So the correct way is probably to use a geodesic library instead.

However, there are projections that preserve the distances in one direction, such is the case of the Azimuthal equidistant projection. Centered on a pole, it preserves the distances in the direction of the meridians. In general, it preserves distances in the direction of vertical circles passing through the center of projection.

Therefore, you can define the CRS from a PROJ string (legacy, but useful in this case) that builds an azimuthal equidistant projection centered on one of the points (https://proj.org/operations/projections/aeqd.html):

import geopandas as gpd
from shapely.geometry import Point

geom = [Point(xy) for xy in zip([137.72000, 36.240000], [53.05199, 63.349449])] gdf = gpd.GeoDataFrame(geometry=geom, crs='EPSG:4326') gdf.to_crs( crs='+proj=aeqd +lat_0=53.05199 +lon_0=137.72000 +datum=WGS84', inplace=True ) gdf.distance(gdf.shift())

Gabriel De Luca
  • 14,289
  • 3
  • 20
  • 51
  • That's pretty good! But you're mixing latitude from one point with longitude of the other -- is that right? – moblie Apr 18 '23 at 06:07
  • Hi, [137.72000, 36.240000] are longitudes for first and second point, [53.05199, 63.349449] are latitudes for first and second point. That's how the list comprehension work with the iterator created by zip, and the always xy feature of geopandas. I may be wrong, but please check it. – Gabriel De Luca Apr 18 '23 at 06:22
  • My error, no coffee yet. – moblie Apr 18 '23 at 06:37
  • Would also be interested in a proper geodesic solution that is capable of very high processing speed (similar to Geopandas .distance()) – moblie Apr 18 '23 at 06:40
  • All implementations of Charles Karney algorithm: his own GeographicLib, of course, but also pyproj and GeoPy use the same algorithm. Any of them should be faster than projecting and using pythagoras, because projecting may not be that fast. Likewise, your question is explicit about doing it with GeoPandas. – Gabriel De Luca Apr 18 '23 at 06:52
  • Also, there should be a little difference in distance. The geodesic is the shortest on the ellipsoid. The radial, straight line of the equidistant azimuthal projection I'm not sure would travel the shortest path using an ellipsoidal datum. – Gabriel De Luca Apr 18 '23 at 06:56
  • the algorithm is not the issue: the key thing is that the implementation runs in compiled C code (fast) instead of pure python (slow). Geopandas is compiled (like Pandas). Geopy is (as far as I know) pure Python and not suitable for processing large amounts of data.

    There may be a way to, for example, vectorize the Karney algorithm and do it in GeoPandas. But I am open to doing it another way.

    – moblie Apr 18 '23 at 08:44
  • With GeoPandas you must build the points through Shapely (via GEOS), create the dataframe (via pandas), reproject (via pyproj), and just then calculate the flat distance (pythagoras). pyproj is a Python interface to PROJ, you pass it the coordinates and it returns the ellipsoidal distance between the coordinates. Time them, I don't think it's slower but I didn't measure it. – Gabriel De Luca Apr 18 '23 at 15:00