Update
Question now off-topic: I ran the ArcGIS processing again and, for an unknown reason, this time I obtained the 21.56 km length I was expecting.
This means the error came either from my ArcGIS model or from something the software did during execution.
In short:
What's the difference between ArcGIS Pro's Points To Line and its GeoPandas equivalent (using groupby() and apply() with LineString)?
On some data I get a different line length, while on other data the difference is insignificant.
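Both approaches build a polyline by joining consecutive points in the order they are supplied (ArcGIS's Points To Line also accepts an optional sort field), so if the point order differs between the two inputs, the length could change on some data and not on others. The sketch below is a hypothetical illustration of that effect only: it uses a spherical haversine distance with an assumed mean Earth radius of 6371 km, not the ellipsoidal geodesic that ArcGIS computes, and the coordinates are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius: a spherical approximation only


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two lon/lat points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def polyline_length_km(points):
    """Sum of segment lengths when consecutive (lon, lat) points are joined in order."""
    return sum(haversine_km(*p, *q) for p, q in zip(points, points[1:]))


# Hypothetical track: four fixes heading east along the 47th parallel
track = [(2.00, 47.0), (2.01, 47.0), (2.02, 47.0), (2.03, 47.0)]
# Same points, with two of them swapped
shuffled = [track[0], track[2], track[1], track[3]]

print(round(polyline_length_km(track), 3))
print(round(polyline_length_km(shuffled), 3))
```

The shuffled version backtracks, so it comes out longer even though the set of points is identical.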
Details of my case:
I have a pandas DataFrame containing about a thousand points with longitude and latitude, corresponding to a car drive recording. I separated them into five zones (actually seven, numbered 0 to 6, but zones 0 and 3 are empty). I want to get the length in kilometers of each zone.
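A per-zone geodesic length can also be computed directly from the longitude/latitude columns, without building lines or projecting. This is a sketch assuming pyproj ≥ 2.3 (which provides `Geod.line_length`), a hypothetical `Zone` column holding the zone number, `X`/`Y` holding WGS 84 longitude/latitude, and rows already sorted in driving order; the sample DataFrame is made up.

```python
import pandas as pd
from pyproj import Geod

# WGS 84 ellipsoid, the ellipsoid of ArcGIS's "GCS WGS 1984"
geod = Geod(ellps="WGS84")

# Hypothetical sample: X = longitude, Y = latitude, Zone = zone number
df = pd.DataFrame({
    "Zone": [1, 1, 1, 2, 2],
    "X": [2.00, 2.01, 2.02, 2.10, 2.10],
    "Y": [47.00, 47.00, 47.00, 47.00, 47.01],
})


def zone_length_km(group):
    # Geodesic length of the polyline through the points, in row order (metres -> km)
    return geod.line_length(group["X"].to_numpy(), group["Y"].to_numpy()) / 1000


lengths_km = {zone: zone_length_km(g) for zone, g in df.groupby("Zone")}
print(lengths_km)
```

Empty zones (like 0 and 3 here) simply do not appear as keys, since `groupby` only yields groups that have rows.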
Screenshot of the head of the table for zone 5 (the geometry column was added with the code below):

What I used to do in ArcGIS Pro:
- Points To Line,
- then Add Geometry Attributes on the created lines,
- selecting "Geodesic Length" as the property and kilometers as the unit.
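For comparison with ArcGIS's "Geodesic Length", pyproj can measure a shapely LineString on the WGS 84 ellipsoid directly, with no projection step. This is a sketch assuming pyproj ≥ 2.3 and shapely, and assuming ArcGIS measures on the ellipsoid of the data's geographic coordinate system; the line coordinates are made up.

```python
from pyproj import Geod
from shapely.geometry import LineString

geod = Geod(ellps="WGS84")  # ellipsoid of the WGS 1984 datum

# Hypothetical line in lon/lat order (x = longitude, y = latitude)
line = LineString([(0.0, 0.0), (1.0, 0.0)])

# geometry_length returns the geodesic length in metres
length_km = geod.geometry_length(line) / 1000
print(round(length_km, 3))  # one degree of longitude along the equator: 111.319
```

Applied to the lines built in GeoPandas, the same call can be mapped over the geometry column, e.g. `geo_df2.geometry.apply(geod.geometry_length) / 1000`, as long as the geometries are still in lon/lat.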
I'm trying to do the same in GeoPandas.
Using the instructions from "Turning GeoDataFrame of x,y coordinates into Linestrings using GROUPBY?", I have the "transforming points into lines" part covered.
Here is my code (still for zone 5):
```python
from shapely.geometry import Point, LineString
import geopandas as gpd

# One Point per GPS fix (X = longitude, Y = latitude)
geometry = [Point(xy) for xy in zip(df5.X, df5.Y)]
geo_df = gpd.GeoDataFrame(df5, geometry=geometry)

# Join each vehicle's points into a single LineString, in row order
geo_df2 = geo_df.groupby(['Immatriculation'])['geometry'].apply(lambda x: LineString(x.tolist()))
geo_df2 = gpd.GeoDataFrame(geo_df2, geometry='geometry')
```
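Because LineString joins points in row order, it may be worth sorting each vehicle's points by time before grouping. This sketch assumes a hypothetical `Timestamp` column (not shown in the original table) and GeoPandas ≥ 0.7 (for `points_from_xy` and string CRS codes); the sample rows and plate number are made up and deliberately stored out of order.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import LineString

# Hypothetical sample: rows stored out of driving order
df5 = pd.DataFrame({
    "Immatriculation": ["AB-123-CD"] * 3,
    "Timestamp": pd.to_datetime(
        ["2019-07-01 10:00:10", "2019-07-01 10:00:00", "2019-07-01 10:00:20"]
    ),
    "X": [2.01, 2.00, 2.02],
    "Y": [47.0, 47.0, 47.0],
})

# Sort by vehicle, then time, so each line follows the actual driving order
df5 = df5.sort_values(["Immatriculation", "Timestamp"])
geo_df = gpd.GeoDataFrame(df5, geometry=gpd.points_from_xy(df5.X, df5.Y), crs="EPSG:4326")

# One LineString per vehicle, built from the sorted points
line_by_vehicle = {
    plate: LineString([(p.x, p.y) for p in g.geometry])
    for plate, g in geo_df.groupby("Immatriculation")
}
lines = gpd.GeoDataFrame(
    {"Immatriculation": list(line_by_vehicle)},
    geometry=list(line_by_vehicle.values()),
    crs="EPSG:4326",
)
print(list(lines.geometry.iloc[0].coords))
```

Setting the CRS when the GeoDataFrame is created also avoids having to assign it afterwards.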
Then, to get the length in kilometers, I need to set a coordinate reference system and then reproject to a projected coordinate system:
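```python
# Declare the lines as WGS 84 longitude/latitude, then reproject to a metric CRS
geo_df2 = geo_df2.set_crs("EPSG:4326")
geo_df2 = geo_df2.to_crs("EPSG:3947")
# .length is in the units of the projected CRS (metres); convert to km
distance = [geo_df2.length.iloc[0] / 1000]
```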
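To see how much of a gap the projection step itself can introduce, the same line can be measured both ways. This is a sketch assuming GeoPandas ≥ 0.7 and pyproj ≥ 2.3, with a made-up line near 47° N, the latitude EPSG:3947 (RGF93 / CC47) is designed around.

```python
import geopandas as gpd
from pyproj import Geod
from shapely.geometry import LineString

# Hypothetical 3-vertex line in lon/lat (WGS 84)
line = LineString([(2.00, 47.00), (2.10, 47.05), (2.20, 47.00)])
gs = gpd.GeoSeries([line], crs="EPSG:4326")

# Planar length after projecting to EPSG:3947 (units: metres)
projected_km = gs.to_crs("EPSG:3947").length.iloc[0] / 1000

# Geodesic length on the WGS 84 ellipsoid (metres), no projection involved
geodesic_km = Geod(ellps="WGS84").geometry_length(line) / 1000

print(f"projected: {projected_km:.4f} km, geodesic: {geodesic_km:.4f} km")
print(f"relative difference: {abs(projected_km - geodesic_km) / geodesic_km:.2e}")
```

For a line inside the zone, the relative difference should be well under 0.1%, which is far smaller than 2 km out of about 21.56 km; under these assumptions, projection distortion alone is unlikely to account for a gap of that size.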
I'm starting with WGS 84 (EPSG:4326) because that's what ArcGIS uses for my data (both the points and the lines).
I chose to project to EPSG:3947 because it gives the results closest to ArcGIS's so far.
I realized that I get a difference of about 2 km for zone 5 between ArcGIS's geodesic length and the one I get with GeoPandas; given the precision I need, this is significant and I cannot ignore it.
And the difference actually appears before the CRS projection:
Where does this significant difference for ONE zone, and not the others, come from, and can it be overcome?
It's the very same data for zone 5 in both cases, so the difference doesn't come from the points.


I suppose I need to know which geodetic datums are used. In ArcGIS, the CRS is "GCS WGS 1984" with the datum "D WGS 1984".
How can I find out which datum transformation GeoPandas uses by default?
– ToddEmon Jul 09 '19 at 08:41
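In GeoPandas 0.7 and later, `.crs` is a `pyproj.CRS` object, so the datum and ellipsoid can be inspected directly, and pyproj's `Transformer` reports which operation it would pick between two CRSs. This is a sketch assuming pyproj ≥ 2.2; the printed names depend on the installed PROJ database version, so no expected output is shown.

```python
from pyproj import CRS, Transformer

# For a GeoDataFrame, the same object is available as geo_df2.crs
wgs84 = CRS.from_epsg(4326)
cc47 = CRS.from_epsg(3947)

# Datum and ellipsoid of each CRS
for crs in (wgs84, cc47):
    print(crs.name, "|", crs.datum.name, "|", crs.ellipsoid.name,
          crs.ellipsoid.semi_major_metre, crs.ellipsoid.inverse_flattening)

# The coordinate operation pyproj selects for 4326 -> 3947
# (always_xy=True means input is given as lon/lat)
transformer = Transformer.from_crs(wgs84, cc47, always_xy=True)
print(transformer.description)
```

Comparing the printed ellipsoids shows that EPSG:4326 uses WGS 84 while EPSG:3947 is based on GRS 1980; the two share the same semi-major axis and differ only very slightly in flattening.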