9

I have a raster image with 3 bands. I would like to convert this image to a CSV file in which each row is one pixel and each column is one band, so that I can easily see the three values of each pixel.

This is how I have tried to do it:

import rasterio
import rasterio.features
import rasterio.warp
from matplotlib import pyplot
from rasterio.plot import show
import pandas as pd
import numpy as np

img = rasterio.open("01032020.tif")
show(img, 0)

# read image
array = img.read()

# create np array
array = np.array(array)

# create pandas df
dataset = pd.DataFrame({'Column1': [array[0]], 'Column2': [array[1]], 'Column3': [array[2]]})
dataset

and also like this:

dataset = pd.DataFrame({'Column1': [array[0,:,:]], 'Column2': [array[1,:,:]],'Column3': [array[2:,:]]})

but I'm getting something weird, like this table: [image]

I have also tried:

index = [i for i in range(0, len(array[0]))]
dataset = pd.DataFrame({'Column1': array[0], 'Column2': array[1],'Column3': array[2]},index=index)
dataset

but then I only get as many rows as the image has rows, which is still not what I want: [image]

What am I doing wrong?

My goal

Get one pandas table where each row is a pixel, with 3 columns, one for each band.

StefanBrand_EOX
ReutKeller

3 Answers

6

Quick solution

pd.DataFrame(array.reshape([3,-1]).T)

Explanation

  1. Take the array of shape (3, x, y) and flatten the 2nd and 3rd dimensions. From the numpy docs: "One shape dimension can be -1. In this case, the value is inferred from the length of the array and remaining dimensions."
     reshaped_array = array.reshape([3,-1])
  2. Transpose the array to get an array of shape (x*y, 3)
     transposed_array = reshaped_array.T
  3. Build the DataFrame
     pd.DataFrame(transposed_array)
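The three steps above can be checked on a small synthetic array. This is only a sketch: the 3-band, 2x4 array built with np.arange is a made-up stand-in for img.read(), and the column names band1 to band3 are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for img.read(): 3 bands, 2 rows, 4 columns
array = np.arange(24).reshape(3, 2, 4)

# Flatten the rows and columns of each band, then transpose
# so that each row is one pixel and each column is one band
pixels = array.reshape(3, -1).T

df = pd.DataFrame(pixels, columns=["band1", "band2", "band3"])
print(df.shape)             # (8, 3)
print(df.iloc[0].tolist())  # [0, 8, 16]
```

The first row holds the values of the top-left pixel in all three bands, and the rows follow the image in row-major order (left to right, then top to bottom).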
StefanBrand_EOX
  • Thank you for your answer. Is there any way to preserve the coordinates? – ReutKeller Oct 30 '20 at 10:14
  • You will need one or two extra columns that store the index/indices of the original image. I think that's for a new question. -> https://gis.stackexchange.com/questions/ask – StefanBrand_EOX Oct 30 '20 at 10:32
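As a rough sketch of the idea in the comment above, the row and column index of each pixel can be stored in two extra columns. The array here is again a made-up stand-in for img.read(), and the column names row and col are assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for img.read(): 3 bands, 2 rows, 4 columns
array = np.arange(24).reshape(3, 2, 4)
bands, height, width = array.shape

# Row and column index of every pixel, as two (height, width) grids
rows, cols = np.indices((height, width))

df = pd.DataFrame(array.reshape(bands, -1).T, columns=["band1", "band2", "band3"])

# ravel() walks the grids in the same row-major order as reshape,
# so the indices line up with the pixel values
df.insert(0, "row", rows.ravel())
df.insert(1, "col", cols.ravel())
print(df.iloc[5].tolist())  # [1, 1, 5, 13, 21]
```

With a real rasterio dataset, the row and column indices could then be turned into map coordinates, for example with the dataset's xy(row, col) method.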
4

Or another simple solution with numpy ravel():

import pandas as pd
import rasterio as rio

src = rio.open('myraster.tif')
# number of bands
src.count
# 3

# read bands
array = src.read()
# convert to a DataFrame, one flattened band per column
df = pd.DataFrame()
df['band1'] = array[0].ravel()
df['band2'] = array[1].ravel()
df['band3'] = array[2].ravel()
df.head(2)
#    band1  band2  band3
# 0    250    249    254
# 1    250    249    254
df.tail(2)  # last two rows
#           band1  band2  band3
# 78609002    190    182    180
# 78609003    190    186    174
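The same ravel() approach can be written for any number of bands and followed by the CSV export the question asks for. The sketch below uses a small made-up array instead of src.read() and writes to an in-memory buffer; the index label pixel and the file name mentioned in the comment are assumptions.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for src.read(): 3 bands, 2 rows, 2 columns
array = np.array([
    [[250, 249], [190, 188]],
    [[249, 248], [182, 186]],
    [[254, 253], [180, 174]],
], dtype=np.uint8)

# One column per band, named band1, band2, ...
df = pd.DataFrame(
    {f"band{i}": band.ravel() for i, band in enumerate(array, start=1)}
)

# Written to an in-memory buffer here; pass a path such as "pixels.csv"
# to to_csv() instead to create a file on disk
buffer = io.StringIO()
df.to_csv(buffer, index_label="pixel")
csv_text = buffer.getvalue()
print(csv_text.splitlines()[0])  # pixel,band1,band2,band3
```

For a raster with tens of millions of pixels, as in the output above, the resulting CSV will be large, so it may be worth checking the file size on a subset first.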

Or

gene
1

You can check that here http://shreshai.blogspot.com/

The implementation is for a multiband raster and also keeps the coordinates

import numpy as np
import rasterio

with rasterio.open(RASTER_PATH) as src:
    # read all bands; the shape is (bands, rows, cols)
    image = src.read()
    bands, rows, cols = image.shape
    # flatten each band, then transpose so that each row is one pixel
    image1 = image.reshape(bands, -1).T
    print(image1.shape)
    # bounding box of the image
    l, b, r, t = src.bounds
    # pixel size as (x resolution, y resolution)
    res = src.res
    # upper-left corner coordinate of every pixel column and pixel row
    x = l + np.arange(cols) * res[0]
    y = t - np.arange(rows) * res[1]
    X, Y = np.meshgrid(x, y)
    print(X.shape)
    # flatten X and Y
    newX = X.flatten()
    newY = Y.flatten()
    print(newX.shape)
    # join the X, Y and band values: one row per pixel
    export = np.column_stack((newX, newY, image1))
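When flattening a (bands, rows, cols) array into one row per pixel, the order of the reshape matters. The small check below, on a made-up 2-band image, suggests that reshaping straight to (rows*cols, bands) does not group the band values of a pixel, while flattening each band and then transposing does.

```python
import numpy as np

# Hypothetical 2-band, 2x3 image: band 0 is all 1s, band 1 is all 2s
image = np.stack([np.full((2, 3), 1), np.full((2, 3), 2)])
bands, rows, cols = image.shape

# A direct reshape keeps the memory order (band after band),
# so a row mixes values from the same band instead of one pixel
direct = image.reshape(rows * cols, bands)

# Flatten each band first, then transpose: each row is one pixel
per_pixel = image.reshape(bands, -1).T

print(direct[0].tolist())     # [1, 1]
print(per_pixel[0].tolist())  # [1, 2]
```

Both results have shape (6, 2), so the problem does not show up in the printed shapes; only the values reveal it.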