2

I have a GeoTIFF image that looks like the following

f = rxr.open_rasterio('myFile.tif')
f5 = f.sel(band=5)
f5.plot.imshow()

enter image description here

I would like to create a Pandas dataframe with the values of each pixel and the coordinates of the centroid of the pixels. How can I find the values of the centroid of each pixel and and two columns lat and lon to the following dataframe?

a = np.ravel(f5)
df = pd.DataFrame({'LSTDay':a})
df.head()
LSTDay

0 15631.0 1 15647.0 2 15624.0 3 15624.0 4 15590.0

Here the link to the GeoTIFF file

Vince
  • 20,017
  • 15
  • 45
  • 64
emax
  • 269
  • 4
  • 17

2 Answers2

6

Your raster (2015LSTDay.tif, band 5)

enter image description here

You can use rasterio with pandas:

import rasterio as rio
import pandas as pd
with rio.open('2015LSTDay.tif') as dataset:
    val = dataset.read(5) # band 5
    no_data=dataset1.nodata
    data = [(dataset1.xy(x,y)[0],dataset1.xy(x,y)[1],val[x,y]) for x,y in np.ndindex(val.shape) if val[x,y] != no_data]
    lon = [i[0] for i in data]
    lat = [i[1] for i in data]
    d = [i[2] for i in data]
    res = pd.DataFrame({"long":lon,'lat':lat,"data":v})
res.head()
       long       lat      val
0  12.307060  42.0571  15631.0
1  12.315384  42.0571  15647.0
2  12.323708  42.0571  15624.0
3  12.332033  42.0571  15624.0
4  12.340357  42.0571  15590.0

Result

import matplotlib.pyplot as plt
from rasterio.plot import show
fig, ax = plt.subplots(figsize=(8, 8))
show(dataset.read(5), transform=dataset.transform,ax=ax)
ax.plot(res.x,res.y,'ro', markersize=3)

enter image description here

You can use rasterio with GeoPandas:

import geopandas as gpd
from shapely.geometry import Point
with rio.open('2015LSTDay.tif') as dataset:
    val = dataset.read(5) # band 5
    no_data=dataset.nodata
    geometry = [Point(dataset.xy(x,y)[0],dataset.xy(x,y)[1]) for x,y in np.ndindex(val.shape) if val[x,y] != no_data]
    v = [val[x,y] for x,y in np.ndindex(val.shape) if val[x,y] != no_data]
    df = gpd.GeoDataFrame({'geometry':geometry,'data':v})
    df.crs = dataset.crs
df.head()
                geometry     data
0  POINT (12.30706 42.05710)  15631.0
1  POINT (12.31538 42.05710)  15647.0
2  POINT (12.32371 42.05710)  15624.0
3  POINT (12.33203 42.05710)  15624.0
4  POINT (12.34036 42.05710)  15590.0

Export to shapefile

df.to_file("points.shap")

You can also use rioxarray as suggested by snowman2:

import rioxarray
rds = rioxarray.open_rasterio("2015LSTDay.tif")
rds = rds.squeeze().drop("spatial_ref").drop("band")
rds.name = "data"
res = rds.to_dataframe().reset_index()
res.head(2)
    band    y        x     data
0     0  42.0571  12.307060  15228.0
1     0  42.0571  12.315384  15246.0

Band 5 only

gr = res.groupby(res.band)
gr.get_group('5').head()
        band     y        x     data
11550     5  42.0571  12.307060  15652.0
11551     5  42.0571  12.315384  15671.5
11552     5  42.0571  12.323708  15702.5
11553     5  42.0571  12.332033  15702.5
11554     5  42.0571  12.340357  15642.5
gene
  • 54,868
  • 3
  • 110
  • 187
  • thank you. Can we also generate a geopandas dataframe where the geometry is a rectangle, i.e. the size of the pixel? – emax Apr 22 '21 at 13:06
3
import rioxarray

rds = rioxarray.open_rasterio("file.tif") rds.to_dataframe()

snowman2
  • 7,321
  • 12
  • 29
  • 54
  • What is the relevance of the two links included at the beginning of your answer? If there are three questions for which the same answer applies have you considered whether they are perhaps duplicate questions? – PolyGeo Apr 22 '21 at 00:01