1

I have a grid vector table that covers the entire world and want to calculate the land area within each grid cell using another table that contains a single polygon for all continents.

I tried the following query:

CREATE TABLE ch01.cell_area AS SELECT ogc_fid, a.gid, xcoord, ycoord, "col", "row", area, wkb_geometry
    FROM ch01.priogrid AS a
      LEFT JOIN (SELECT a.gid, ST_Area(ST_Intersection(a.wkb_geometry, b.wkb_geometry)::geography) AS area
                 FROM ch01.grid AS a, ch01.landmass AS b
                 WHERE st_intersects(a.wkb_geometry, b.wkb_geometry)) AS e
        ON a.gid = e.gid;

However this is very slow. I interrupted the procedure after around 8 hours. The query plan looks as follows:

Hash Left Join  (cost=34.13..38314.90 rows=70953 width=168)
Hash Cond: (a.gid = a_1.gid)
->  Seq Scan on priogrid a  (cost=0.00..2360.53 rows=70953 width=160)
->  Hash  (cost=33.83..33.83 rows=24 width=166)
      ->  Nested Loop  (cost=4.33..33.83 rows=24 width=166)
          ->  Seq Scan on landmass b  (cost=0.00..1.01 rows=1 width=32)
          ->  Bitmap Heap Scan on priogrid a_1  (cost=4.33..32.80 rows=2 width=134)
                Recheck Cond: (wkb_geometry && b.wkb_geometry)
                Filter: _st_intersects(wkb_geometry, b.wkb_geometry)
                ->  Bitmap Index Scan on gd_geom_idx  (cost=0.00..4.33 rows=7 width=0)
                      Index Cond: (wkb_geometry && b.wkb_geometry)

I am using Postgis 2.3/Postgres 9.6.0 as local host on a relatively new macbook pro (2.8 GHz, 16 RAM).

I will need to speed things up as much as possible, because I am planning to do a series of these calculations. I have already indexed the geometry columns (using a brin index on the grid table). I am aware that intersections are costly, but I don't see how I can avoid them in this particular case.

I also tried this method (described here), but this does not seem to be much faster:

CREATE TABLE ch01.cell_area 
AS SELECT a.gid,
 CASE WHEN st_intersects(a.wkb_geometry, b.wkb_geometry)
   THEN ST_Area(ST_Intersection(a.wkb_geometry, b.wkb_geometry)::geography) 
 ELSE NULL END AS area 
 FROM ch01.priogrid a 
 LEFT JOIN ch01.landmass b
   ON st_intersects(a.wkb_geometry, b.wkb_geometry);

I am pretty new when it comes to improving query performance in general so any ideas on how I can do this are most welcome!

guyus
  • 141
  • 7
  • Why Brin instead of Gist, as a matter of interest? – John Powell Mar 16 '17 at 13:22
  • I tried both although it does not seem to make a large difference. in the documentation it says:

    "BRIN is designed for handling very large tables in which certain columns have some natural correlation with their physical location within the table."

    which I thought describes a grid-cell table pretty well

    – guyus Mar 16 '17 at 14:25
  • Yes, it does, except that a grid is two-dimensional, whereas time, which is what I think Brin is particularly suited for, has one. I have never tried Brin for spatial data -- maybe it is time I did. I know some people have suggested using clustering on a geohash to achieve similar, but I think R-trees (well, at least their Gist implementation ion Postgres), have other advantages over geohashes and resulting disk layout. Sorry, this doesn't help in the slightest with your question, I was just interested in the Brin part :-) – John Powell Mar 16 '17 at 14:58

0 Answers0