I need to calculate the distance between points of two different datasets, one with locations of towns, the other with locations of dams. Both cover different countries in Africa.
The more I read, the more confused I get. I'm using the Python ecosystem (Geopandas, Shapely, Fiona).
A) The first dataset is a .shp file with locations of towns as Points. This one is nice and provides the crs as epsg=4326 (which I understand is the code for WGS 84):
import geopandas as gpd
towns = gpd.read_file('towns.shp')  # Uses fiona to load
print(towns.crs)
print(towns.geometry.head(3))
#{'init': 'epsg:4326'}
#0 POINT (8.877318824300001 9.93427297769)
#1 POINT (9.163896418389999 9.47532028526)
B) The second one has locations of dams. It came as an Excel file with latitude and longitude in decimal degrees. It didn't give any information on the CRS.
#Omitting the excel loading of `dams`
print(dams.geometry.head(2))
print(dams.crs) # empty
#487 POINT (8.97333333333 9.76472222222)
#488 POINT (4.55305555556 8.44277777778)
What's the right way to measure the distance between points of these two datasets? (Let's assume for now they are both in WGS 84.)
I can calculate the distance between all towns and the first element of the dams:
# Distance function uses Shapely
print(towns.geometry.distance(dams.geometry.iloc[0]))  # iloc[0] is the first dam (index 487)
#0 0.194849
#1 0.346508
#2 1.046174
Is this just calculating the Euclidean distance on the raw lat/lon values, and hence very inaccurate?
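As a sanity check, here is a minimal sketch in plain Python that treats longitude and latitude as planar x/y. The coordinates are copied from the printouts above and the variable names are mine. If I did this right, it reproduces the first value in the output, which suggests the result is a planar distance expressed in degrees rather than in metres:

```python
import math

# Coordinates copied from the printed output above, as (lon, lat) in degrees.
town_0 = (8.877318824300001, 9.93427297769)
dam_487 = (8.97333333333, 9.76472222222)

# Plain planar (Euclidean) distance computed directly on the degree values.
planar_deg = math.hypot(town_0[0] - dam_487[0], town_0[1] - dam_487[1])
print(round(planar_deg, 6))
```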
What can I do for this workflow?
Should I transform the crs of the points to something that works for all of Africa and then take the Geopandas/Shapely distance function or would it be easier to keep the lat/lon (or WGS 84) and use a Haversine formula (or similar)?
To me, the Haversine route would somewhat defeat the purpose of using Geopandas.
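For reference, this is roughly the Haversine approach I have in mind, as a sketch only. It assumes both datasets really are WGS 84 lon/lat and approximates the Earth as a sphere with a mean radius. The function name `haversine_km` and the radius constant are my own, not part of Geopandas or Shapely:

```python
import math

# Assumption: spherical Earth with the mean radius in kilometres.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine formula: a is the squared half-chord length between the points.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# First town and first dam, coordinates copied from the printouts above.
d_km = haversine_km(8.877318824300001, 9.93427297769, 8.97333333333, 9.76472222222)
print(d_km)
```

This works on bare floats, so I would have to pull the x/y values out of each geometry myself, which is the part that feels like working around Geopandas.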