
I had a feature class made up of 1,700,000 polygons. I used GeoPandas to create a GeoDataFrame:

import geopandas as gpd

state = "MD"
state_gdb = r"C:\Projects\Pop_Alloc\{}_Data.gdb".format(state)
join_feat = "{}_Ftprnt_CB_Join".format(state)
bldg_feat_df = gpd.read_file(state_gdb, layer=join_feat)

No problem; it took maybe 5-10 minutes to run. I have another feature class; let's call it 'parcels'. It has around 2,600,000 polygon features. I tried to do the same thing: make a GeoDataFrame.

parcel_gdb = r"C:\Projects\Pop_Alloc\Parcels_by_state.gdb"
state_parcel = "{}_Parcels_merge".format(state)
parcel_feat_df = gpd.read_file(parcel_gdb, layer=state_parcel)

It has now been running for several hours. Is there a reason for this? Do I simply not have enough memory to create this GeoDataFrame? Is there a way to resolve this issue (a generator, chunking)?
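
For reference, here is the kind of chunked read I have in mind (a minimal sketch, assuming a GeoPandas version that supports the `rows` keyword of `read_file`, 0.7+; the chunk size and column subset are hypothetical):

import geopandas as gpd
import pandas as pd

# Minimal sketch: read the layer in slices via the `rows` keyword
# (GeoPandas >= 0.7) and keep only a hypothetical subset of columns
# per chunk to limit memory use. The chunk size is an arbitrary guess.
chunk_size = 200000
keep_cols = ['PARCEL_ID', 'geometry']

chunks = []
start = 0
while True:
    chunk = gpd.read_file(parcel_gdb, layer=state_parcel,
                          rows=slice(start, start + chunk_size))
    if len(chunk) == 0:
        break
    chunks.append(chunk[keep_cols])
    start += chunk_size

parcel_feat_df = gpd.GeoDataFrame(pd.concat(chunks, ignore_index=True))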

gwydion93
  • Is your machine's RAM full while running the process? – zwnk Jun 26 '19 at 18:22
  • How can I check? – gwydion93 Jun 26 '19 at 18:23
  • Task Manager on Windows; top in the console on Linux. – zwnk Jun 26 '19 at 18:40
  • It appears that there is some limit to the record count/size for which a GeoDataFrame will load. I found a workaround here (https://gis.stackexchange.com/questions/129414/only-read-specific-attribute-columns-of-a-shapefile-with-geopandas-fiona) and will post a solution as soon as I test it. – gwydion93 Jun 26 '19 at 19:35

1 Answer


I am still not 100% sure, but it appears that there is a limit to the file size, or to the number of features in a feature class or shapefile, when creating a GeoDataFrame in GeoPandas. My solution comes from the question linked in my comment above and involves using a generator to iterate through the file after opening it with Fiona, then creating a GeoDataFrame with only the columns you want. It still took about 10 minutes to run, but that is a vast improvement. Hope this helps:

import fiona
import geopandas as gpd

parcel_gdb = r"C:\Projects\Pop_Alloc\Parcels_by_state.gdb"
usecols = ['PARCEL_ID', 'LAND_USE_T', 'PROP_IND_T', 'STORY_NBR', 'BLD_UNITS']

def records(filename, cols):
    # Open the layer with Fiona and yield one feature at a time,
    # keeping only the id, geometry, and the requested attribute columns.
    with fiona.open(filename, layer='MD_Parcels_merge') as source:
        for feature in source:
            f = {k: feature[k] for k in ['id', 'geometry']}
            f['properties'] = {k: feature['properties'][k] for k in cols}
            yield f

parcels_feat_df = gpd.GeoDataFrame.from_features(records(parcel_gdb, usecols))
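
One caveat: GeoDataFrame.from_features does not carry over the layer's coordinate reference system, so the result has no CRS set. A small follow-up (assuming GeoPandas >= 0.7 for set_crs) could copy it from the source layer:

import fiona

# from_features leaves the CRS unset, so copy it from the source layer.
with fiona.open(parcel_gdb, layer='MD_Parcels_merge') as source:
    parcels_feat_df = parcels_feat_df.set_crs(source.crs)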
gwydion93