
I am creating IDF curves for work using Python and 30-minute rainfall-rate rasters. There are 24 rasters per day and about two decades' worth of data. I basically input lat/lon coordinates, and my Python script grabs the pixel value from each raster throughout the period and stores it in a DataFrame for statistical analysis. This currently takes about 5 hours to run per lat/lon coordinate, with the majority of the time spent opening each raster and extracting the pixel value. Are there any ways to speed this up? I am completely stuck and can't find whether GDAL has any built-in speed optimizations. I've included the function I use for extracting the pixel value below. Please let me know if you have any questions or suggestions, thanks!

from osgeo import gdal


def extract_point_values_from_raster(raster_filepath, lat_lon_str):
    """Return the raster value at a 'lat,lon' point, or None if unavailable."""
    lat_lon = lat_lon_str.split(',')
    lat = float(lat_lon[0])
    lon = float(lat_lon[1])

    raster_val = None

    try:
        src_ds = gdal.Open(raster_filepath)
        gt = src_ds.GetGeoTransform()
        rb = src_ds.GetRasterBand(1)

        # Raster extent from the geotransform and raster size
        # (gt[2] and gt[4] are rotation terms, not extents).
        x_min = gt[0]
        x_max = gt[0] + gt[1] * src_ds.RasterXSize
        y_max = gt[3]
        y_min = gt[3] + gt[5] * src_ds.RasterYSize

        if y_min < lat < y_max and x_min < lon < x_max:
            # Convert the geographic coordinates to pixel offsets.
            px = int((lon - gt[0]) / gt[1])
            py = int((lat - gt[3]) / gt[5])

            intval = rb.ReadAsArray(px, py, 1, 1)

            if intval is not None:
                raster_val = round(float(intval[0][0]), 2)

        src_ds = None  # close the dataset

    except Exception:
        raster_val = None

    return raster_val

JWB
  • Are you doing this for multiple coordinates? One approach would be to sample all of the coordinates from the raster while you have it open. I can post a numpy solution that might be helpful if that is the case. – Marcelo Villa Mar 25 '20 at 21:11
  • Thank you @MarceloVilla, but currently I am only analyzing one coordinate at a time! – JWB Mar 25 '20 at 21:36
  • You could look into: https://gis.stackexchange.com/questions/358036/extracting-data-from-a-raster, but I am not sure whether it would be faster or not. – snowman2 Apr 25 '20 at 04:17
  • Did you flush the cache? Might free up memory. – wingnut Apr 29 '21 at 07:10
  • Is the source dataset netCDF related? – huckfinn Dec 28 '21 at 01:21

1 Answer


Some of my initial questions/thoughts/things to experiment with:

  • Every time you call the function it re-opens the raster and then performs an intersection check. Presumably there may be multiple lat/lon queries against a given raster.

    • Can you pre-index all of the raster bounds (e.g. with an rtree), then query all of a raster's intersecting points at once? This index could be built once and then saved for later queries.
    • Or, can you lazily open a raster only when needed, but then keep it open in case it's needed again later? (See the first sketch after this list.)
  • Since it seems to be I/O-bound, make sure the rasters are located on the fastest storage you have access to, ideally local solid-state drives.

  • The raster format can have an impact on performance. For example, if a raster is interleaved by line, I believe GDAL has to read the entire line to retrieve your pixel of interest. If you're repeatedly reading this dataset, consider trying:

    • Converting to an internally-tiled format, if not already
    • Stacking so that you have one 24-band raster per day and experimenting to find the best interleave (BIP, perhaps?) (see the second sketch after this list)
  • Have you profiled the code to ensure that it's slowest where you think it is, and that there are no other significant bottlenecks? A minimal profiling sketch follows the list as well.
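
If all of the rasters share the same grid (same extent, resolution, and CRS), the geotransform arithmetic only has to happen once. Here is a minimal sketch of that idea; the function name and the shared-grid assumption are mine, not part of your original code:

from osgeo import gdal

def sample_point_across_rasters(raster_filepaths, lon, lat):
    """Read the pixel containing (lon, lat) from each raster in turn.

    Assumes every raster shares the same grid, so the pixel offset is
    computed once from the first file and reused for all the others.
    """
    values = []
    px = py = None

    for filepath in raster_filepaths:
        ds = gdal.Open(filepath)
        if ds is None:
            values.append(None)
            continue

        if px is None:
            gt = ds.GetGeoTransform()
            px = int((lon - gt[0]) / gt[1])
            py = int((lat - gt[3]) / gt[5])

        arr = ds.GetRasterBand(1).ReadAsArray(px, py, 1, 1)
        values.append(round(float(arr[0][0]), 2) if arr is not None else None)

        ds = None  # release the file handle promptly

    return values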
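
For the format experiments, here is one way to build an internally-tiled, pixel-interleaved daily stack with GDAL's Python bindings. The file names are hypothetical and the creation options are just a starting point to benchmark:

from osgeo import gdal

# Hypothetical list of one day's 30-minute rasters.
halfhourly_files = ["rain_20000101_0000.tif", "rain_20000101_0030.tif"]  # ..., 24 files total

# Stack the day's files into one multi-band virtual raster
# (separate=True puts each input file into its own band).
vrt = gdal.BuildVRT("day_stack.vrt", halfhourly_files, separate=True)

# Materialize it as a tiled, pixel-interleaved (BIP) GeoTIFF so that a
# single-pixel read only touches one small block of the file.
gdal.Translate(
    "day_stack.tif",
    vrt,
    format="GTiff",
    creationOptions=["TILED=YES", "INTERLEAVE=PIXEL", "COMPRESS=DEFLATE"],
)
vrt = None  # flush and close the VRT

With a 24-band file per day, a single dataset-level ReadAsArray(px, py, 1, 1) call returns the point's value for every band at once, so you open roughly 1/24th as many files.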
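
And on the profiling point, a minimal cProfile sketch; run_extraction() is a stand-in for whatever driver function loops over your raster files:

import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()
run_extraction()  # hypothetical driver that loops over the rasters
profiler.disable()

# Print the 20 most expensive calls by cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)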

mikewatt
  • Thanks for the reply! So I'm attempting to read through about 350,000 different raster files and getting the value at the same lat/lon from each file; I never use the same raster twice. This is also where the problem comes in with being unable to store the rasters on my fastest storage: I just don't have the space on that drive, so they're saved on a Synology NAS. I have profiled the code, though, and this is the spot taking the most time! – JWB Mar 25 '20 at 19:34
  • I see. For any given point query, will there be valid data in every one of those 350,000 rasters, or only a subset of them? In other words, does every file cover the exact same area? – mikewatt Mar 26 '20 at 18:35