1

Update

Question off-topic because I launched the ArcGIS treatment again and for an unknown reason this time I obtained the 21.56km length I was expecting.

It means the error comes from either my ArcGIS model or whatever the software did while executing.

In short :

What's the difference beween ArcGIS Pro's Points To Line and its Geopandas equivalent (using groupby() apply LineString) ?

On some data I get a different value of the line length, while on some other data the difference is insignificant.

Details on my case :

I have a pandas DataFrame containing about a thousand points with longitude and latitude, corresponding to a car drive recording. I separated them in five zones (actually seven : from 0 to 6, but zones 0 and 3 are empty). I want to get the length in kilometers for each zone.

Screenshot of the head of the table corresponding to zone 5 (I added the geometry column with the code, see next) : Head of the df5 table

What I used to do was using ArcGIS Pro :

  • Points to Line
  • then Add Geometry Attributes on the created lines
  • I would select "Geodesic Length" and the unit (km).

I'm trying to do the same on Geopandas.

Using the instructions from Turning GeoDataFrame of x,y coordinates into Linestrings using GROUPBY?, I got the part "transforming points into lines" covered.

Here is my code (still for zone 5) :

from shapely.geometry import Point, LineString, shape
import geopandas as gpd

geometry = [Point(xy) for xy in zip(df5.X, df5.Y)] geo_df = gpd.GeoDataFrame(df5, geometry=geometry)

geo_df2 = geo_df.groupby(['Immatriculation'])['geometry'].apply(lambda x:LineString(x.tolist())) geo_df2 = gpd.GeoDataFrame(geo_df2, geometry='geometry')

Then if I want the length in kilometers, I need to set a coordinate reference system and then change it to a projected coordinate system :

geo_df2.crs={'init' :'epsg:4326'}
geo_df2=geo_df2.to_crs({'init': 'epsg:3947'})

distance = [geo_df2.length[0]/1000]

I'm starting with the WGS 84 (EPSG=4326) because that's what ArcGIS uses for my data (both the points and line).

I chose to project to EPSG=3947 because it's what gives me the closest results to ArcGIS's results so far.

I realized that I got about 2 km of difference for zone 5 between ArcGIS's geodesic length and the one I get with Geopandas - given the precision I need, this is significative and I cannot ignore this error.

difference ArcGIS/GeoP for each zone, in km

And we can see that it actually starts before the CRS projection :

difference ArcGIS/GeoP for each zone, in WGS84

Where does this significant difference for ONE zone and not the rest come from and can it be overcome?

It's the very same data for zone 5 in both cases so it doesn't come from the points.

ToddEmon
  • 113
  • 5
  • 1
    You're comparing outputs in different datums without datum transformation. 2km is the best you can hope for in some places. – Vince Jul 08 '19 at 17:31
  • I'm sorry but I don't understand your answer (limits of my English I guess). The datums are not different (same set of points). What do you mean by 'datum transformation' ? – ToddEmon Jul 08 '19 at 23:22
  • 1
    The geographic coordinate systems are based on different geodetic datums (reference ellipsoids). Without identical ellipsoids, datum transformation is necessary, and you don't seem to be applying one, which makes it likely that the math is all wrong. – Vince Jul 09 '19 at 00:04
  • How can I know what datum transformation is necessary ?

    I suppose I need to know which geodetic datums are used. In ArcGIS it uses the crs "GCS WGS 1984" with the datum "D WGS 1984".

    How can I know which datum transformation is used in Geopandas by default ?

    – ToddEmon Jul 09 '19 at 08:41
  • 1
    Indeed, that is the question. I'm an ArcGIS guy, though, so I'm not going to be much help on GeoPandas datum transformation. – Vince Jul 09 '19 at 11:40
  • 1
    Opened a new question to know GeoPandas default datum : https://gis.stackexchange.com/questions/328234/what-is-geopandas-default-datum – ToddEmon Jul 09 '19 at 14:25
  • I have closed this question as you have asked another, different, question that may solve your problem. If you get an answer to that question that leads to new information you can add to this one feel free to [edit] and we can look at reopening if necessary. – Midavalo Jul 10 '19 at 16:44

1 Answers1

0

I believe the key to your answer is that you are calculating the Geodesic length in ArcGIS and that is not what you are calculating in geopandas by re-projecting. To get the Geodesic length, you should use pyproj.Geod. An issue related to this is in pyproj here.

Here is an example of how you would calculate the Geodesic length with pyproj.Geod and geopandas with a dataset in the format similar to yours:

>>> import geopandas
>>> from shapely.geometry import Point, PolyLine
>>> gdf = geopandas.GeoDataFrame({"Immatriculation": ["AB101XY", "AB101XY"]}, geometry=[Point(-1.379083, 46.61430), Point(-1.379444, 46.61452)], crs="+init=epsg:4326")
>>> gdf2 = geopandas.GeoSeries(gdf.groupby(['Immatriculation']).geometry.apply(lambda x:LineString(x.tolist())))
>>> from pyproj import Geod
>>> geod = Geod(ellps="WGS84")
>>> def geod_distance_km(linestring):
...     x_coords, y_coords = linestring.xy
...     return sum(geod.inv(x_coords[:-1], y_coords[:-1], x_coords[1:], y_coords[1:])[2])/1000.0
... 
>>> gdf2.geometry.apply(geod_distance_km)
Immatriculation
AB101XY    0.036915949

More information:

snowman2
  • 7,321
  • 12
  • 29
  • 54
  • I applied your function to the zone 5 geodataframe with each ellipsoid available (from the list list(pyproj._list.get_ellps_map().keys) ), expecting to find a value around 19.66 km like ArcGIS', but all of the values are around 21.56 (from 21.53 to 21.63). It seems the difference lies somewhere else – ToddEmon Jul 10 '19 at 08:20