6

I am processing a large amount of landsat data. Landsat scenes ship as a collection of tiffs- one file for each spectral band, plus a bunch of masks. I need to stack some of the bands into a single tiff for subsequent processing. I can easily do this in various programs, but would like to automate the process using GDAL in python. I will eventually need to automate the stacking of hundreds or thousands of images, so efficiency will be important.

I have written a script which is simple and perfectly functional but feels extremely hacky- I open the desired tiffs, convert them to numpy arrays, stack the arrays, and save the stacked array to a new geotiff with metadata taken from the input. Here is a simplified version of the code:

from osgeo import gdal, gdal_array
import numpy as np
b1 = gdal.Open("LT50250232011160PAC01_sr_band1.tif")
b2 = gdal.Open("LT50250232011160PAC01_sr_band2.tif")
array1 = b1.ReadAsArray()
array2 = b2.ReadAsArray()
stacked = np.array([array1,array2])
gdal_array.SaveArray(stacked.astype("int"), "b12_stacked.tif", "GTiff", gdal.Open("LT50250232011160PAC01_sr_band1.tif"))

If possible, I'd like to avoid converting the data to numpy arrays, and just stack the gdal.Dataset objects (b1 and b2). This would make the code cleaner and presumably more efficient. It seems like it should be straightforward to do, but I have not been able to find any information on how to do it.

Alternate approaches which I have come across in research would be to use gdal_merge.py or vrts (as suggested here). Neither of these strikes me as ideal; gdal_merge is designed for tasks much more complicated than mine and presumably involves a lot of overhead, while vrts are only usable in the command line. However, if the approach I outlined above is not feasible, would one of these appraoches be more efficient/pythonic than the hack I have already written?

Edit: my code needs to be run on a Windows system.

Joe
  • 435
  • 1
  • 5
  • 13
  • What do you mean that vrts are only usable in command line? Gdalbuildvrt http://www.gdal.org/gdalbuildvrt.html should be able to stack your bands to vrt within a second and after that you can use the vrt file just as if it was a multiband tiffs. Howerver, I have no experience on mask bands, only regular 7-band Landsat scenes. – user30184 Sep 29 '14 at 17:40
  • I mean that there is no gdalbuildvrt python function- I would need to use a subprocess, os.system or something like this. Which is not to say it's not an option- stacking gdal.Dataset objects just seems simpler, if it's possible. – Joe Sep 29 '14 at 18:20
  • 1
    There used to be gdal_vrtmerge.py. Could it be of some use for you? Some version is here http://trac.osgeo.org/gdal/attachment/ticket/1985/gdal_vrtmerge.py – user30184 Sep 29 '14 at 18:30
  • 2
    my suggestion is to use the code you have! maybe clean it up, add some options and have it accept commandline arguments and you have yourself a perfectly reasonable script to use over and over again. As you say, it does what you need. If your source files are produced by another process of your making, you can streamline the process by perhaps saving them as compressed numpy arrays (only if they're intermediate data, that is) or hdr/flt rasters (binary and fast to load). – user1269942 Sep 29 '14 at 20:55
  • @user30184, gdal_vrtmerge.py appears to be writing a vrt, and I do need a tiff at the end bot for subsequent processing and passing to collaborators. My python-fu is not great enough to pull it apart without considerable headache, but it could be useful to others with the same problem. – Joe Sep 30 '14 at 13:58
  • I appreciate all the ideas, but at this point I am leaning strongly towards just sticking with the approach I have now. I'd rather work with <10 lines of code that "feels hacky" than a much more complex solution that "feels clean". That said, if anyone ever happens upon a very simple solution I'd still love to hear it. – Joe Sep 30 '14 at 14:06
  • Reading in vrt in and writing out tif or whatever format you like is not hard at all for GDAL and it can also use the virtual raster as input for subsequent processes. Sometimes it saves a lot of time and disk space if GDAL writes out just a vrt file of some kilobytes instead of copying gigabytes of image data into a new physical file. But if the program you use later can't read vrt then you must write a new multi-band file anyway and your approach is fine. – user30184 Sep 30 '14 at 14:13
  • Just as a note here, as I have gone through a process similar to this... Using the numpy method is a lot faster than any other method I have found for stacking multispectral images and doing any manipulations. Trust in numpy! – A.A Aug 16 '17 at 11:16

1 Answers1

5

Although it would require another library (which currently is only available on OS X and Linux) you could use RSGISLib (http://rsgislib.org), which is built on top of GDAL to do this. There is a function to stack bands, as an example:

#!/usr/bin/env python
import rsgislib
from rsgislib import imageutils
# Create list of images
imageList = ['LC80430342013291LGN00_B1.TIF',
             'LC80430342013291LGN00_B2.TIF',
             'LC80430342013291LGN00_B3.TIF',
             'LC80430342013291LGN00_B4.TIF',
             'LC80430342013291LGN00_B5.TIF',
             'LC80430342013291LGN00_B6.TIF',
             'LC80430342013291LGN00_B7.TIF']
# Create list of band names           
bandNamesList = ['Coastal','Blue','Green','Red','NIR','SWIR1','SWIR2']
# Set output image
outputImage = 'LC80430342013291LGN00_stack.tif'
# Set format and type
gdalFormat = 'GTiff'
dataType = rsgislib.TYPE_16UINT
# Stack
imageutils.stackImageBands(imageList, bandNamesList, outputImage, None, 0, gdalFormat, dataType)

It also allows passing in a list of band names.

However, if you were going to install RSGISLib to stack the Landsat bands you could just install ARCSI, which would stack the bands and convert to radiance or reflectance (if you were looking to do this). More details are here: http://spectraldifferences.wordpress.com/2014/05/27/arcsi/

danclewley
  • 897
  • 5
  • 17
  • These both look like good resources, but unfortunately all of the computers in my lab run Windows. I could set up a virtual box, but I need to be able to pass this code around to my advisors and labmates, so I'm not sure that's wise- we run into enough compatibility/consistency issues as it is. – Joe Sep 30 '14 at 13:52