
I tried to read a GeoJSON file with Pandas, but I got a ValueError message:

'ValueError: Expected object or value'

Here's the approach I used:

import pandas as pd

geojsonPath = r"Z:\dems\address.geojson"
pd_json = pd.io.json.read_json(geojsonPath, lines=True)

pd_json.head()

Attached is an extract from the file

{
"type": "FeatureCollection",
"name": "cameron-addresses-county",
"crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } },
"features": [
{ "type": "Feature", "properties": { "X": -78.1422444, "Y": 41.3286117, "hash": "93dd7b7e3ee3e8af", "number": "501", "street": "CASTLE GARDEN RD", "unit": null, "city": null, "district": null, "region": null, "postcode": null, "id": 7579 }, "geometry": { "type": "Point", "coordinates": [ -78.1422444, 41.3286117 ] } },
{ "type": "Feature", "properties": { "X": -78.143584, "Y": 41.3284045, "hash": "853eb0c5f6e70fe3", "number": "64", "street": "BELDIN DR", "unit": null, "city": null, "district": null, "region": null, "postcode": null, "id": 4502 }, "geometry": { "type": "Point", "coordinates": [ -78.143584, 41.3284045 ] } },
{ "type": "Feature", "properties": { "X": -78.1711061, "Y": 41.3282128, "hash": "99a13ba635404d80", "number": "9760", "street": "MIX RUN RD", "unit": null, "city": null, "district": null, "region": null, "postcode": null, "id": 8448 }, "geometry": { "type": "Point", "coordinates": [ -78.1711061, 41.3282128 ] } },
{ "type": "Feature", "properties": { "X": -78.1429278, "Y": 41.3282883, "hash": "70319cf9e435b858", "number": null, "street": null, "unit": null, "city": null, "district": null, "region": null, "postcode": null, "id": null }, "geometry": { "type": "Point", "coordinates": [ -78.1429278, 41.3282883 ] } },
{ "type": "Feature", "properties": { "X": -78.1427173, "Y": 41.3282733, "hash": "759f051e7a587eb2", "number": "465", "street": "CASTLE GARDEN RD", "unit": null, "city": null, "district": null, "region": null, "postcode": null, "id": 6447 }, "geometry": { "type": "Point", "coordinates": [ -78.1427173, 41.3282733 ] } },
{ "type": "Feature", "properties": { "X": -78.1433463, "Y": 41.3282308, "hash": "9fbb571fc16a6cb2", "number": "61", "street": "BELDIN DR", "unit": null, "city": null, "district": null, "region": null, "postcode": null, "id": 4466 }, "geometry": { "type": "Point", "coordinates": [ -78.1433463, 41.3282308 ] } },
{ "type": "Feature", "properties": { "X": -78.1432403, "Y": 41.3282179, "hash": "8f837d813626f1e1", "number": null, "street": null, "unit": null, "city": null, "district": null, "region": null, "postcode": null, "id": null }, "geometry": { "type": "Point", "coordinates": [ -78.1432403, 41.3282179 ] } },
{ "type": "Feature", "properties": { "X": -78.1715165, "Y": 41.3280965, "hash": "5004ba87bd6e668b", "number": "9736", "street": "MIX RUN RD", "unit": null, "city": null, "district": null, "region": null, "postcode": null, "id": 7434 }, "geometry": { "type": "Point", "coordinates": [ -78.1715165, 41.3280965 ] } }
Taras
Edudzi

1 Answer


There are several things to keep in mind:

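  • Do not forget to close the GeoJSON with ]} so that both the "features" array and the outer object are terminated.
  • There is no need to call read_json() via pd.io.json.read_json; pd.read_json is enough, even though the function is defined under pandas/pandas/io/json/.
  • The "ValueError: Expected object or value" error is raised because, in JSON terms, what pandas receives through your geojsonPath variable is the right type but does not parse as valid JSON. Note that lines=True makes read_json() expect one JSON object per line, whereas a GeoJSON FeatureCollection is a single object spread over many lines.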
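The first and last points can be checked with the standard library alone. This is only a minimal sketch: the string below is a shortened copy of the extract from the question, and is_valid_json is a hypothetical helper name, not part of pandas or json.

```python
import json

# Shortened copy of the extract in the question: the "features" array
# and the outer object are never closed.
truncated = '''{
"type": "FeatureCollection",
"features": [
{ "type": "Feature", "properties": { "id": 7579 }, "geometry": { "type": "Point", "coordinates": [ -78.1422444, 41.3286117 ] } }
'''


def is_valid_json(text):
    """Return True if text parses as one complete JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


closed = truncated + "]}"

# The extract as posted is not valid JSON; it is once ]} is appended.
print(is_valid_json(truncated))
print(is_valid_json(closed))

# A line-oriented parser sees each line on its own, and the first
# line of a FeatureCollection is just "{".
first_line = closed.splitlines()[0]
print(is_valid_json(first_line))
```

Running the same check on the real file (read with open(...).read()) should tell you whether the problem is the file itself or the way it is being parsed.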

So, to get everything working you can either:

  1. As was commented by @SalimRodríguez, try to read your GeoJSON with GeoPandas

    Output data format: GeoDataFrame

    import geopandas as gpd

    absolute_path_to_file = 'C:/Documents/Python Scripts/address.geojson'
    addresses = gpd.read_file(absolute_path_to_file)

    print(addresses)

               X          Y  ...      id                    geometry
    0 -78.142244  41.328612  ...  7579.0  POINT (-78.14224 41.32861)
    1 -78.143584  41.328404  ...  4502.0  POINT (-78.14358 41.32840)
    2 -78.171106  41.328213  ...  8448.0  POINT (-78.17111 41.32821)
    3 -78.142928  41.328288  ...     NaN  POINT (-78.14293 41.32829)
    4 -78.142717  41.328273  ...  6447.0  POINT (-78.14272 41.32827)
    5 -78.143346  41.328231  ...  4466.0  POINT (-78.14335 41.32823)
    6 -78.143240  41.328218  ...     NaN  POINT (-78.14324 41.32822)
    7 -78.171516  41.328097  ...  7434.0  POINT (-78.17152 41.32810)

  2. If geometry is not important, you can skip it by parsing your GeoJSON as ordinary JSON and keeping only the properties of each feature

    Output data format: DataFrame

    import json
    import pandas as pd

    absolute_path_to_file = 'C:/Documents/Python Scripts/address.geojson'

    with open(absolute_path_to_file) as f:
        data = json.load(f)

    raw_data = [feature['properties'] for feature in data['features']]
    addresses = pd.DataFrame(raw_data)

    print(addresses)

               X          Y              hash  ...  region  postcode      id
    0 -78.142244  41.328612  93dd7b7e3ee3e8af  ...    None      None  7579.0
    1 -78.143584  41.328404  853eb0c5f6e70fe3  ...    None      None  4502.0
    2 -78.171106  41.328213  99a13ba635404d80  ...    None      None  8448.0
    3 -78.142928  41.328288  70319cf9e435b858  ...    None      None     NaN
    4 -78.142717  41.328273  759f051e7a587eb2  ...    None      None  6447.0
    5 -78.143346  41.328231  9fbb571fc16a6cb2  ...    None      None  4466.0
    6 -78.143240  41.328218  8f837d813626f1e1  ...    None      None     NaN
    7 -78.171516  41.328097  5004ba87bd6e668b  ...    None      None  7434.0

  3. If geometry still matters, then parse your GeoJSON as ordinary JSON in a slightly different way, building a shapely Point for each feature

    Output data format: DataFrame

    import json
    import pandas as pd
    from shapely.geometry import Point

    absolute_path_to_file = 'C:/Documents/Python Scripts/address.geojson'

    with open(absolute_path_to_file) as f:
        data = json.load(f)

    # The dict merge operator | requires Python 3.9 or later
    raw_data = [
        feature['properties'] | {'geometry': Point(feature['geometry']['coordinates'])}
        for feature in data['features']
    ]
    addresses = pd.DataFrame(raw_data)

    print(addresses)

               X          Y  ...      id                        geometry
    0 -78.142244  41.328612  ...  7579.0  POINT (-78.1422444 41.3286117)
    1 -78.143584  41.328404  ...  4502.0   POINT (-78.143584 41.3284045)
    2 -78.171106  41.328213  ...  8448.0  POINT (-78.1711061 41.3282128)
    3 -78.142928  41.328288  ...     NaN  POINT (-78.1429278 41.3282883)
    4 -78.142717  41.328273  ...  6447.0  POINT (-78.1427173 41.3282733)
    5 -78.143346  41.328231  ...  4466.0  POINT (-78.1433463 41.3282308)
    6 -78.143240  41.328218  ...     NaN  POINT (-78.1432403 41.3282179)
    7 -78.171516  41.328097  ...  7434.0  POINT (-78.1715165 41.3280965)
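As a variation on options (2) and (3), and only as a sketch: pd.json_normalize can flatten the features without shapely, keeping the coordinates as a plain list column. The inline data below is two features copied from the extract in the question with the properties shortened, and stripping the "properties." prefix is my own choice rather than anything the file requires.

```python
import pandas as pd

# Two features copied from the extract in the question (properties shortened).
data = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"number": "501", "street": "CASTLE GARDEN RD", "id": 7579},
         "geometry": {"type": "Point", "coordinates": [-78.1422444, 41.3286117]}},
        {"type": "Feature",
         "properties": {"number": "64", "street": "BELDIN DR", "id": 4502},
         "geometry": {"type": "Point", "coordinates": [-78.143584, 41.3284045]}},
    ],
}

# Nested dicts become dotted column names, e.g. "properties.street".
flat = pd.json_normalize(data["features"])

# Drop the "properties." prefix so attribute columns keep their original names.
flat.columns = flat.columns.str.replace("properties.", "", regex=False)

print(flat[["number", "street", "id", "geometry.coordinates"]])
```

With a real file, data would come from json.load(f) exactly as in options (2) and (3).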

If a GeoDataFrame is still needed as the final output, it can be built from either DataFrame (this requires import geopandas as gpd):

  • for option (2):

     gdf = gpd.GeoDataFrame(addresses, geometry=gpd.points_from_xy(addresses["X"], addresses["Y"]))
    
  • or for option (3):

     gdf = gpd.GeoDataFrame(addresses, geometry=addresses["geometry"])
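Note that a GeoDataFrame built this way has no CRS unless you pass one. Below is a small self-contained sketch for the option (2) route, using two rows copied from the extract in the question. Passing crs="EPSG:4326" is an assumption on my side: the file declares CRS84 (longitude, latitude), which is commonly treated as EPSG:4326 in GeoPandas workflows.

```python
import pandas as pd
import geopandas as gpd

# Two rows copied from the extract in the question, shaped like the option (2) output.
addresses = pd.DataFrame({
    "X": [-78.1422444, -78.143584],
    "Y": [41.3286117, 41.3284045],
    "street": ["CASTLE GARDEN RD", "BELDIN DR"],
})

gdf = gpd.GeoDataFrame(
    addresses,
    geometry=gpd.points_from_xy(addresses["X"], addresses["Y"]),
    crs="EPSG:4326",  # assumption: CRS84 lon/lat treated as EPSG:4326
)

print(gdf.crs)
print(gdf)
```

The same crs argument can be added to the option (3) call.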
    

Taras