What I've done is download the images as tifs from GEE (something you might have to do in pieces, given the size). I used the getDownloadURL() function because it is faster; for larger images, use Export.image.toDrive().
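If an area is too large for a single getDownloadURL() request, one way to download it in pieces is to split its bounding box into a grid and request each tile separately. Below is a minimal plain-Python sketch. `split_bbox` is a hypothetical helper, not part of the Earth Engine API, and the [xmin, ymin, xmax, ymax] box order is an assumption chosen to match what ee.Geometry.Rectangle accepts.

```python
def split_bbox(xmin, ymin, xmax, ymax, nx, ny):
    """Split a bounding box into an nx-by-ny grid of smaller boxes.

    Hypothetical helper: each returned box is [xmin, ymin, xmax, ymax],
    so every tile can be requested as its own download region.
    """
    dx = (xmax - xmin) / nx
    dy = (ymax - ymin) / ny
    tiles = []
    for j in range(ny):
        for i in range(nx):
            x0 = xmin + i * dx
            y0 = ymin + j * dy
            # Snap the last column/row to the exact outer edge so that
            # floating-point rounding never leaves a gap at the boundary
            x1 = xmax if i == nx - 1 else x0 + dx
            y1 = ymax if j == ny - 1 else y0 + dy
            tiles.append([x0, y0, x1, y1])
    return tiles


tiles = split_bbox(-90.0, 34.0, -88.0, 36.0, 2, 2)
print(len(tiles), tiles[0])
```

Each box could then be wrapped in ee.Geometry.Rectangle(box) and used as the region of its own download request. That step needs an authenticated Earth Engine session, so it is left out of the sketch. The downloaded tiles can be mosaicked back together afterwards, e.g. with rasterio.merge.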
As my bands are in separate tifs, I stack them into one tif using rasterio/GDAL.
I keep them in the output zip file to save on space.
```python
# Collect the paths of the single-band tifs inside the zip file and
# convert each name into a format readable by rasterio.open()
from pathlib import Path
from zipfile import ZipFile

import numpy as np
import rasterio

# Raw strings, so the backslashes in Windows paths are not read as escapes
stack_path = Path(r'C:\Users\stack.tif')
img_file = Path(r'C:\Users\LC08_023036_20130429')
zip_path = img_file.with_suffix('.zip')

with ZipFile(zip_path, 'r') as f:
    names = f.namelist()

# 'zip://<archive>!<member>' lets rasterio read a tif straight from the zip
file_list = ['zip://' + str(zip_path) + '!' + name
             for name in names if name.endswith('.tif')]

# Base the output profile on the first band, updated for a multi-band float32 stack
with rasterio.open(file_list[0]) as src0:
    meta = src0.meta.copy()
meta.update(count=len(file_list), dtype='float32')

# Read each layer, convert to float and write it to the stack (bands are 1-indexed)
with rasterio.open(stack_path, 'w', **meta) as dst:
    for idx, layer in enumerate(file_list, start=1):
        with rasterio.open(layer) as src:
            dst.write_band(idx, src.read(1).astype('float32'))
```
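The band order in the stack follows the order of `file_list`, which comes straight from `namelist()`, i.e. the archive's internal order. If you want a predictable band order, you could sort the names first; a plain string sort puts `B10` before `B2`, so this sketch uses a natural sort instead. `natural_key` and `tif_paths_in_zip` are hypothetical helpers, and the `scene.B1.tif`-style names used to exercise them are an assumption about how the tifs are named.

```python
import re
import zipfile
from pathlib import Path


def natural_key(name):
    """Sort key that compares digit runs numerically, so B2 sorts before B10."""
    # re.split with a capturing group puts the digit runs at the odd indices
    parts = re.split(r'([0-9]+)', name)
    return [int(tok) if i % 2 else tok for i, tok in enumerate(parts)]


def tif_paths_in_zip(zip_path):
    """Return rasterio 'zip://<archive>!<member>' paths for every .tif in a zip,
    in natural name order, without extracting anything."""
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        tifs = [n for n in zf.namelist() if n.lower().endswith('.tif')]
    return ['zip://' + str(zip_path) + '!' + n for n in sorted(tifs, key=natural_key)]
```

The returned list can be used directly as `file_list` in the stacking loop above.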
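As sklearn requires a 2D matrix, I reshape the stack into one.
Before that, the data must be transposed from rasterio's (bands, rows, columns) order into the (rows, columns, bands) order that scikit-image and similar libraries expect. See the rasterio interoperability docs.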
```python
import numpy as np
import rasterio

with rasterio.open(str(stack_path), 'r') as ds:
    data = ds.read()

# rasterio's read() returns (bands, rows, cols); move the band axis last
data = data.transpose((1, 2, 0))

# Convert the NoData values in the image to np.nan
data[data == -999999] = np.nan
data[np.isneginf(data)] = np.nan

# Reshape into a 2D array, where rows = pixels and cols = features/bands
data_vector = data.reshape([data.shape[0] * data.shape[1], data.shape[2]])

# Remove pixels (rows) that have a NaN in any band
data_vector = data_vector[~np.isnan(data_vector).any(axis=1)]
```
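To check the transpose/reshape logic without a real GeoTIFF, here is a small NumPy-only sketch on a synthetic (bands, rows, cols) array. `to_feature_matrix` is a hypothetical helper. Besides the feature matrix it also returns a (rows, cols) mask of the pixels that were kept; that mask is my addition, so that per-pixel results (e.g. sklearn cluster labels) can be written back into image form after the NaN rows have been dropped.

```python
import numpy as np


def to_feature_matrix(stack, nodata=-999999):
    """Turn a (bands, rows, cols) array into a (pixels, bands) matrix.

    Returns the matrix of valid pixels and a boolean (rows, cols) mask
    marking which pixels were kept.
    """
    data = stack.astype('float32')  # astype copies, so the input is untouched
    data[data == nodata] = np.nan
    data[np.isneginf(data)] = np.nan
    # (bands, rows, cols) -> (rows, cols, bands)
    img = data.transpose(1, 2, 0)
    rows, cols, bands = img.shape
    flat = img.reshape(rows * cols, bands)
    valid = ~np.isnan(flat).any(axis=1)
    return flat[valid], valid.reshape(rows, cols)


# 2 bands, 2 x 3 pixels; mark the top-left pixel of band 1 as NoData
stack = np.arange(12, dtype='float32').reshape(2, 2, 3)
stack[0, 0, 0] = -999999
X, mask = to_feature_matrix(stack)

# Write one value per kept pixel back into image shape (-1 = dropped pixel)
restored = np.full(mask.shape, -1.0)
restored[mask] = X[:, 0]
```

Boolean-mask assignment fills pixels in the same row-major order that `reshape` used, which is why the values land back on the right pixels.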
Downloading the files is cumbersome, but once you build a tif stacking and reshaping pipeline that runs over all of your files, the process is greatly streamlined.
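A sketch of the batch side of such a pipeline, assuming one zip per scene in a single download folder: find every zip, list its band tifs as rasterio `zip://` paths, and derive an output path for its stack. `plan_batch` and the `<scene>_stack.tif` naming are hypothetical. Each returned job could then be fed to the stacking loop and the reshape step above.

```python
import zipfile
from pathlib import Path


def plan_batch(folder, out_folder):
    """Pair every scene zip in `folder` with its band paths and an output path.

    Hypothetical layout: one zip per scene, one '<scene>_stack.tif' per zip.
    """
    jobs = []
    for zip_path in sorted(Path(folder).glob('*.zip')):
        with zipfile.ZipFile(zip_path) as zf:
            # Plain string sort; swap in a natural sort if band names need it
            tifs = sorted(n for n in zf.namelist() if n.lower().endswith('.tif'))
        bands = ['zip://' + str(zip_path) + '!' + n for n in tifs]
        stack_path = Path(out_folder) / (zip_path.stem + '_stack.tif')
        jobs.append((stack_path, bands))
    return jobs
```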
sampleRegions? It is specific to the AOI geometry, as opposed to sampleRectangle. – user88484 Jul 30 '21 at 11:56