3

I have written a little script to raster to GeoPandas dataframe:

with rasterio.open("INPUT FILE", 'r') as raster: 
    df = pd.DataFrame()
    puntos_calados_combinados=[]
    puntos=[]
    xmin,ymax=raster.transform* (0, 0)
    arr = raster.read(1)  # read all raster values
    for row in range(0,raster.width ):
        for col in range(0,raster.height):
            x=xmin+ (0.5*raster.transform[0] + row*raster.transform[0])
            y=ymax+ (0.5*raster.transform[4] + col*raster.transform[4])
            valor= arr[col,row]
            if arr[col,row]>0:
                x_y=[x,y]
                puntos_calados_combinados.append(x_y)

But it's extremely slow, any idea to improve it?

Vince
  • 20,017
  • 15
  • 45
  • 64
Gregor D
  • 321
  • 2
  • 9
  • See also this question: https://gis.stackexchange.com/questions/379412/creating-geopandas-geodataframe-from-rasterio-features – NielsFlohr Jan 27 '24 at 19:10

3 Answers3

5

Based on: https://gis.stackexchange.com/a/358057/144357

rds = rioxarray.open_rasterio("input.tif")
rds.name = "data"
df = rds.squeeze().to_dataframe().reset_index()
geometry = gpd.points_from_xy(df.x, df.y)
gdf = gpd.GeoDataFrame(df, crs=rds.rio.crs, geometry=geometry)
snowman2
  • 7,321
  • 12
  • 29
  • 54
  • Also, the orig and my code masks out nodata (I'm assuming 0 is nodata in orig because of this line if arr[col,row]>0:). Is there a rioxarray way to mask out the nodata before dumping to dataframe, or would you suggest just using pandas to get rid of them? – user2856 Jan 20 '21 at 09:15
  • Either way you will need to filter them out as all grid points are generated. There is a masked=True kwarg you can use when opening. That makes the nodata nan. I would recommend filtering with pandas to remove those rows. – snowman2 Jan 20 '21 at 13:24
4

Rather than looping through each pixel, you could try creating numpy arrays of X & Y coordinates then dumping those (plus the array of the raster values) into a DataFrame, then converting to a GeoDataFrame.

Below is a runnable example (Note, you do not want to do this with large rasters, you can run out of memory...):

import geopandas as gpd
import numpy as np
import pandas as pd
import rasterio as rio

with rio.Env(): with rio.open('https://github.com/OSGeo/gdal/raw/master/autotest/gdrivers/data/float32.tif') as src: crs = src.crs

    # create 1D coordinate arrays (coordinates of the pixel center)
    xmin, ymax = np.around(src.xy(0.00, 0.00), 9)  # src.xy(0, 0)
    xmax, ymin = np.around(src.xy(src.height-1, src.width-1), 9)  # src.xy(src.width-1, src.height-1)
    x = np.linspace(xmin, xmax, src.width)
    y = np.linspace(ymax, ymin, src.height)  # max -> min so coords are top -> bottom



    # create 2D arrays
    xs, ys = np.meshgrid(x, y)
    zs = src.read(1)

    # Apply NoData mask
    mask = src.read_masks(1) > 0
    xs, ys, zs = xs[mask], ys[mask], zs[mask]

data = {"X": pd.Series(xs.ravel()), "Y": pd.Series(ys.ravel()), "Z": pd.Series(zs.ravel())}

df = pd.DataFrame(data=data) geometry = gpd.points_from_xy(df.X, df.Y) gdf = gpd.GeoDataFrame(df, crs=crs, geometry=geometry)

print(gdf.head())

Output:

          X          Y      Z                        geometry
0  440750.0  3751290.0  107.0  POINT (440750.000 3751290.000)
1  440810.0  3751290.0  123.0  POINT (440810.000 3751290.000)
2  440870.0  3751290.0  132.0  POINT (440870.000 3751290.000)
3  440930.0  3751290.0  115.0  POINT (440930.000 3751290.000)
4  440990.0  3751290.0  132.0  POINT (440990.000 3751290.000)
user2856
  • 65,736
  • 6
  • 115
  • 196
0

I have found this lib https://pypi.org/project/rastertodataframe/. It really works and satisfied my needs.

from rastertodataframe import raster_to_dataframe
raster_path = '/some/gdal/compatible/file.tif'

Extract all image pixels (no vector).

df = raster_to_dataframe(raster_path)

arnaldo
  • 101
  • 2