I have a grid vector table that covers the entire world and want to calculate the land area within each grid cell using another table that contains a single polygon for all continents.
I tried the following query:
CREATE TABLE ch01.cell_area AS SELECT ogc_fid, a.gid, xcoord, ycoord, "col", "row", area, wkb_geometry
FROM ch01.priogrid AS a
LEFT JOIN (SELECT a.gid, ST_Area(ST_Intersection(a.wkb_geometry, b.wkb_geometry)::geography) AS area
FROM ch01.grid AS a, ch01.landmass AS b
WHERE st_intersects(a.wkb_geometry, b.wkb_geometry)) AS e
ON a.gid = e.gid;
However this is very slow. I interrupted the procedure after around 8 hours. The query plan looks as follows:
Hash Left Join (cost=34.13..38314.90 rows=70953 width=168)
Hash Cond: (a.gid = a_1.gid)
-> Seq Scan on priogrid a (cost=0.00..2360.53 rows=70953 width=160)
-> Hash (cost=33.83..33.83 rows=24 width=166)
-> Nested Loop (cost=4.33..33.83 rows=24 width=166)
-> Seq Scan on landmass b (cost=0.00..1.01 rows=1 width=32)
-> Bitmap Heap Scan on priogrid a_1 (cost=4.33..32.80 rows=2 width=134)
Recheck Cond: (wkb_geometry && b.wkb_geometry)
Filter: _st_intersects(wkb_geometry, b.wkb_geometry)
-> Bitmap Index Scan on gd_geom_idx (cost=0.00..4.33 rows=7 width=0)
Index Cond: (wkb_geometry && b.wkb_geometry)
I am using Postgis 2.3/Postgres 9.6.0 as local host on a relatively new macbook pro (2.8 GHz, 16 RAM).
I will need to speed things up as much as possible, because I am planning to do a series of these calculations. I have already indexed the geometry columns (using a brin index on the grid table). I am aware that intersections are costly, but I don't see how I can avoid them in this particular case.
I also tried this method (described here), but this does not seem to be much faster:
CREATE TABLE ch01.cell_area
AS SELECT a.gid,
CASE WHEN st_intersects(a.wkb_geometry, b.wkb_geometry)
THEN ST_Area(ST_Intersection(a.wkb_geometry, b.wkb_geometry)::geography)
ELSE NULL END AS area
FROM ch01.priogrid a
LEFT JOIN ch01.landmass b
ON st_intersects(a.wkb_geometry, b.wkb_geometry);
I am pretty new when it comes to improving query performance in general so any ideas on how I can do this are most welcome!
"BRIN is designed for handling very large tables in which certain columns have some natural correlation with their physical location within the table."
which I thought describes a grid-cell table pretty well
– guyus Mar 16 '17 at 14:25