2

General Problem: I have a table of rasters. The rasters are classified MSI images, so each pixel is an integer indicating the class of the pixel. For a number of regions, I am querying pixel counts in those regions, i.e., making a table like:

filename | total pixel count | pixels in class 0 | pixels in class 1 | ...

Specific Problem: My problem is that the script takes a long time, so I want to get the runtime down.

What I've tried: Full disclosure, I'm not well versed in postgresql. Also, note that in the two code snippets below, the most inner sub-selects are identical. Here's my first attempt:

EXPLAIN ANALYZE SELECT 
    filename AS filename, 
    ST_Count(rast,1) AS totalpixels,
    (ST_ValueCount(rast,1,false,ARRAY[0.0])).count AS nodata,
    (ST_ValueCount(rast,1,false,ARRAY[1.0])).count AS lowveg,
    (ST_ValueCount(rast,1,false,ARRAY[2.0])).count AS highveg,
    (ST_ValueCount(rast,1,false,ARRAY[12.0])).count AS clouds,
    (ST_ValueCount(rast,1,false,ARRAY[13.0])).count AS shadow
FROM 
( 
    SELECT 
        filename, 
        ST_Clip(rast,ST_GeomFromText('POLYGON ((125.229490000007 6.900509999999138, 125.2404900000019 6.900509999999138, 125.2404900000019 6.889510000004179, 125.229490000007 6.889510000004179, 125.229490000007 6.900509999999138))',4326)) AS rast 
    FROM 
        rasters 
    WHERE 
        ST_Intersects(rast,ST_GeomFromText('POLYGON ((125.229490000007 6.900509999999138, 125.2404900000019 6.900509999999138, 125.2404900000019 6.889510000004179, 125.229490000007 6.889510000004179, 125.229490000007 6.900509999999138))',4326)) 
) AS source_rasters;

This runs in 185 ms. I thought surely it's sub-optimal to ST_ValueCount so many times. So here's my improved attempt -- running it once and converting the SETOF result to an array so that I can index the values:

EXPLAIN ANALYZE SELECT 
    filename AS filename,
    totalpixels AS totalpixels,
    pxcnt[1] AS nodata,
    pxcnt[2] AS lowveg,
    pxcnt[3] AS highveg,
    pxcnt[4] AS clouds,
    pxcnt[5] AS shadow
FROM 
(
    SELECT 
        filename AS filename,
        ST_Count(rast,1,false) AS totalpixels,
        ARRAY( SELECT count FROM ST_ValueCount(rast,1,false,ARRAY[0.0,1.0,2.0,12.0,13.0]) ) AS pxcnt
    FROM 
    ( 
        SELECT 
            filename, 
            ST_Clip(rast,ST_GeomFromText('POLYGON ((125.229490000007 6.900509999999138, 125.2404900000019 6.900509999999138, 125.2404900000019 6.889510000004179, 125.229490000007 6.889510000004179, 125.229490000007 6.900509999999138))',4326)) AS rast 
        FROM 
            rasters 
        WHERE 
            ST_Intersects(rast,ST_GeomFromText('POLYGON ((125.229490000007 6.900509999999138, 125.2404900000019 6.900509999999138, 125.2404900000019 6.889510000004179, 125.229490000007 6.889510000004179, 125.229490000007 6.900509999999138))',4326)) 
    ) AS source_rasters
) AS f;

But it only reduced the runtime to 155 ms.

But then I thought maybe the ST_ValueCount is only a small portion of the total work cost, so this is all the improvement I could expect. However, if I reference just one result:

EXPLAIN ANALYZE SELECT 
    filename AS filename,
    totalpixels AS totalpixels,
    pxcnt[1] AS nodata
FROM 
...

It cuts the runtime down to 55 ms, which I don't understand because it seems to have done all the work of intersecting, clipping, counting, etc.

Question: So is there a faster way to unpack the results of ST_ValueCount, or an obvious way to speed this up in general?

Just for what it's worth, I've made other incremental improvements since I began, e.g., tiling the rasters was a big improvement. At this point, this seems like the most likely opportunity for a significant improvement, please let me know if I might be wrong about that.

user55937
  • 1,293
  • 11
  • 18

0 Answers0