Raster to GeoPandas

Question

I have written a little script to raster to GeoPandas dataframe:

with rasterio.open("INPUT FILE", 'r') as raster: 
    df = pd.DataFrame()
    puntos_calados_combinados=[]
    puntos=[]
    xmin,ymax=raster.transform* (0, 0)
    arr = raster.read(1)  # read all raster values
    for row in range(0,raster.width ):
        for col in range(0,raster.height):
            x=xmin+ (0.5*raster.transform[0] + row*raster.transform[0])
            y=ymax+ (0.5*raster.transform[4] + col*raster.transform[4])
            valor= arr[col,row]
            if arr[col,row]>0:
                x_y=[x,y]
                puntos_calados_combinados.append(x_y)

But it's extremely slow, any idea to improve it?

See also this question: https://gis.stackexchange.com/questions/379412/creating-geopandas-geodataframe-from-rasterio-features — NielsFlohr, Jan 27 '24 at 19:10

score 5 · Answer 1 · answered Jan 19 '21 at 01:24

5

Based on: https://gis.stackexchange.com/a/358057/144357

rds = rioxarray.open_rasterio("input.tif")
rds.name = "data"
df = rds.squeeze().to_dataframe().reset_index()
geometry = gpd.points_from_xy(df.x, df.y)
gdf = gpd.GeoDataFrame(df, crs=rds.rio.crs, geometry=geometry)

answered Jan 19 '21 at 01:24

snowman2

7,321
12
29
54

Also, the orig and my code masks out nodata (I'm assuming 0 is nodata in orig because of this line if arr[col,row]>0:). Is there a rioxarray way to mask out the nodata before dumping to dataframe, or would you suggest just using pandas to get rid of them? – user2856 Jan 20 '21 at 09:15
Either way you will need to filter them out as all grid points are generated. There is a masked=True kwarg you can use when opening. That makes the nodata nan. I would recommend filtering with pandas to remove those rows. – snowman2 Jan 20 '21 at 13:24

user2856 · Accepted Answer · 2021-01-20T10:18:25.020

Rather than looping through each pixel, you could try creating numpy arrays of X & Y coordinates then dumping those (plus the array of the raster values) into a DataFrame, then converting to a GeoDataFrame.

Below is a runnable example (Note, you do not want to do this with large rasters, you can run out of memory...):

import geopandas as gpd
import numpy as np
import pandas as pd
import rasterio as rio
with rio.Env():
    with rio.open('https://github.com/OSGeo/gdal/raw/master/autotest/gdrivers/data/float32.tif') as src:
        crs = src.crs
    # create 1D coordinate arrays (coordinates of the pixel center)
    xmin, ymax = np.around(src.xy(0.00, 0.00), 9)  # src.xy(0, 0)
    xmax, ymin = np.around(src.xy(src.height-1, src.width-1), 9)  # src.xy(src.width-1, src.height-1)
    x = np.linspace(xmin, xmax, src.width)
    y = np.linspace(ymax, ymin, src.height)  # max -&gt; min so coords are top -&gt; bottom



    # create 2D arrays
    xs, ys = np.meshgrid(x, y)
    zs = src.read(1)

    # Apply NoData mask
    mask = src.read_masks(1) &gt; 0
    xs, ys, zs = xs[mask], ys[mask], zs[mask]


data = {"X": pd.Series(xs.ravel()),
        "Y": pd.Series(ys.ravel()),
        "Z": pd.Series(zs.ravel())}
df = pd.DataFrame(data=data)
geometry = gpd.points_from_xy(df.X, df.Y)
gdf = gpd.GeoDataFrame(df, crs=crs, geometry=geometry)
print(gdf.head())

Output:

          X          Y      Z                        geometry
0  440750.0  3751290.0  107.0  POINT (440750.000 3751290.000)
1  440810.0  3751290.0  123.0  POINT (440810.000 3751290.000)
2  440870.0  3751290.0  132.0  POINT (440870.000 3751290.000)
3  440930.0  3751290.0  115.0  POINT (440930.000 3751290.000)
4  440990.0  3751290.0  132.0  POINT (440990.000 3751290.000)

No particular reason, mostly habit (often I need to customise the env), being explicit, copying boilerplate code and that's how the docs do it... Not specifically required here as rasterio.open will create a default env. — user2856, Jan 18 '21 at 11:10

score 0 · Answer 3 · answered Jun 03 '22 at 14:51

I have found this lib https://pypi.org/project/rastertodataframe/. It really works and satisfied my needs.

from rastertodataframe import raster_to_dataframe
raster_path = '/some/gdal/compatible/file.tif'
Extract all image pixels (no vector).
df = raster_to_dataframe(raster_path)

Raster to GeoPandas

3 Answers3

Extract all image pixels (no vector).

Linked