
I am trying to perform an intersection between two layers:

  1. Polyline layer representing some roads (~5500 rows)
  2. Polygon layer representing irregularly shaped buffers around various points of interest (~47,000 rows)

Ultimately, what I'm trying to do is to clip the polylines to these many (sometimes overlapping) buffers, and then sum up the total length of roadway contained within each buffer.

The problem is that things are running SLOW. I'm not sure how long this should take, but I just aborted my query after more than 34 hours. I'm hoping that someone can either point out where I've made a mistake in my SQL query, or can point me to a better way of doing this.

CREATE TABLE clip_roads AS

SELECT 
  ST_Intersection(b.the_geom, z.the_geom) AS clip_geom,
  b.*

FROM 
  public."roads" b, 
  public."buffer1KM" z

WHERE ST_Intersects(b.the_geom, z.the_geom);


CREATE INDEX "clip_roads_clip_geom_gist"
  ON "clip_roads"
  USING gist
  (clip_geom);



CREATE TABLE buffer1km_join AS

SELECT
  z.name, z.the_geom,
  sum(ST_Length(b.clip_geom)) AS sum_length_m

FROM
  public."clip_roads" b,
  public."buffer1KM" z

WHERE
  ST_Contains(z.the_geom, b.the_geom)

GROUP BY z.name, z.the_geom;
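(For reference, the two statements collapse into a single pass like this -- an untested sketch using the same tables and columns, which skips the intermediate clip_roads table and sums the clipped lengths per buffer directly:

-- Clip and sum in one pass: intersect each road with each buffer
-- it crosses, then total the clipped road length per buffer.
CREATE TABLE buffer1km_join AS
SELECT
  z.gid,
  z.name,
  z.the_geom,
  Sum(ST_Length(ST_Intersection(b.the_geom, z.the_geom))) AS sum_length_m
FROM
  public."roads" b,
  public."buffer1KM" z
WHERE ST_Intersects(b.the_geom, z.the_geom)
GROUP BY z.gid, z.name, z.the_geom;

I don't know whether that would actually be faster, though.)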

I do have a GiST index created on the original roads table, and (just to be safe?) I created an index on the clipped geometry before doing the second table creation.

The query plan from PGAdmin III looks like this, though I'm afraid I don't have much skill in interpreting it:

"Nested Loop  (cost=0.00..29169.98 rows=35129 width=49364)"
"  Output: st_intersection(b.the_geom, z.the_geom), b.gid, b.geo_id, b.address_l, b.address_r, b.lf_name, b.lfn_id, b.lfn_name, b.lfn_type_c, b.lfn_type_d, b.lfn_dir_co, b.lfn_dir_de, b.lfn_desc, b.oe_flag_l, b.oe_flag_r, b.fcode_desc, b.fcode, b.fnode, b.tnode, b.metrd_num, b.lo_num_l, b.lo_n_suf_l, b.hi_num_l, b.hi_n_suf_l, b.lo_num_r, b.lo_n_suf_r, b.hi_num_r, b.hi_n_suf_r, b.juris_code, b.dir_code, b.dir_code_d, b.cp_type, b.length, b.the_geom"
"  Join Filter: _st_intersects(b.the_geom, z.the_geom)"
"  ->  Seq Scan on public."roads" b  (cost=0.00..306.72 rows=5472 width=918)"
"        Output: b.gid, b.geo_id, b.address_l, b.address_r, b.lf_name, b.lfn_id, b.lfn_name, b.lfn_type_c, b.lfn_type_d, b.lfn_dir_co, b.lfn_dir_de, b.lfn_desc, b.oe_flag_l, b.oe_flag_r, b.fcode_desc, b.fcode, b.fnode, b.tnode, b.metrd_num, b.lo_num_l, b.lo_n_suf_l, b.hi_num_l, b.hi_n_suf_l, b.lo_num_r, b.lo_n_suf_r, b.hi_num_r, b.hi_n_suf_r, b.juris_code, b.dir_code, b.dir_code_d, b.cp_type, b.length, b.the_geom"
"  ->  Index Scan using "buffer1KM_index_the_geom" on public."buffer1KM" z  (cost=0.00..3.41 rows=1 width=48446)"
"        Output: z.gid, z.objectid, z.facilityid, z.name, z.frombreak, z.tobreak, z.postal_cod, z.pc_area, z.ct_id, z.da_id, z.taz_id, z.edge_poly, z.cchs_0708, z.tts_06, z.the_geom"
"        Index Cond: (b.the_geom && z.the_geom)"

Is this operation just doomed to run for several days? I'm currently running this on PostGIS for Windows, but I could in theory throw more hardware at the problem by putting it up on Amazon EC2. However, I see that the query is only using one core at a time (is there a way to make it use more?).

Peter

4 Answers


Peter,

What version of PostGIS, GEOS, and PostgreSQL are you using?
Do a:

SELECT postgis_full_version(), version();

A lot of enhancements have been made between 1.4 and 1.5, and in GEOS 3.2+, for this kind of thing.

Also how many vertices do your polygons have?

Do a

SELECT Max(ST_NPoints(the_geom)) As maxp FROM sometable;

To get a sense of your worst-case scenario. Slowness like this is often caused by geometries that are too finely grained, in which case you might want to simplify first.
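If simplification is the route you take, something along these lines would do it -- a sketch only: ST_SimplifyPreserveTopology keeps the rings valid, and the tolerance of 100 is a placeholder in the layer's units that you would tune for your data:

-- Reduce vertex counts on the buffer polygons before intersecting.
-- Tolerance is in the geometry's units (metres assumed here).
UPDATE public."buffer1KM"
SET the_geom = ST_SimplifyPreserveTopology(the_geom, 100);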

Also have you made optimizations to your postgresql.conf file?
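As a rough starting point, something like the following -- illustrative values only, since the right numbers depend on your available RAM, and 32-bit Windows builds cap how far shared_buffers can usefully go:

# postgresql.conf -- illustrative values, not a recommendation
shared_buffers = 256MB        # limited on 32-bit Windows builds
work_mem = 64MB               # allocated per sort/hash, so be conservative
maintenance_work_mem = 256MB  # speeds up index builds like the GiST above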

Regina Obe
  • Hi LR1234567:

    "POSTGIS="1.5.2" GEOS="3.2.2-CAPI-1.6.2" PROJ="Rel. 4.6.1, 21 August 2008" LIBXML="2.7.6" USE_STATS";"PostgreSQL 9.0.3, compiled by Visual C++ build 1500, 32-bit"

    (running the other query now)

    – Peter Apr 15 '11 at 15:50
  • The Max query ran faster than I expected: maxp = 2030

    I suspect that's fairly fine grained?

    – Peter Apr 15 '11 at 15:53
  • 2,030 isn't bad, actually. It could be that you just have a lot of polygons that intersect; generally the intersection is the slowest part. Try doing a count of how many records actually intersect -- it could be huge. – Regina Obe Apr 15 '11 at 17:10
  • SELECT count(*)

    FROM public."roads" b, public."buffer1KM" z

    WHERE ST_Intersects(b.the_geom, z.the_geom);

    – Regina Obe Apr 15 '11 at 17:12
  • Is 910,978 huge? This is the nice thing about starting on a new tech--I have no normative expectations :-) – Peter Apr 15 '11 at 18:08
  • Simplifying the polygons a little has allowed the operation to complete in some sort of finite time. – Peter Apr 17 '11 at 04:05
  • Yep, that is pretty huge. You could have one polygon enveloping a lot of geometries. I suspect getting almost 1 million geometries out of 40,000 is not quite the answer you are looking for. – Regina Obe Apr 17 '11 at 17:04
  • You probably want to union your intersection, but before that, do a count of how many intersections you get per geometry: SELECT ST_Intersection(b.the_geom, z.the_geom) AS clip_geom, count(z.the_geom) FROM public."roads" b, public."buffer1KM" z ... GROUP BY z.the_geom, z.gid – Regina Obe Apr 17 '11 at 17:19

Helpful Stack Overflow answer: https://stackoverflow.com/questions/1162206/why-is-postgresql-so-slow-on-windows

Tuning postgres: http://wiki.postgresql.org/wiki/Performance_Optimization

From experience, I recommend running VACUUM ANALYZE.
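For example, against both input tables:

VACUUM ANALYZE public."roads";
VACUUM ANALYZE public."buffer1KM";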

Mapperz
  • Thank you, that sounds like good advice. Some of the Windows issues, such as the fork() penalty, shouldn't apply here because I'm running a single connection, right? Also, I have run VACUUM ANALYZE. I haven't dug into any performance optimization yet, though. – Peter Apr 15 '11 at 16:22
  • shared_buffers and work_mem generally make the most difference. For shared_buffers, you are a bit more limited in how much you can raise it on Windows than you are on Linux. – Regina Obe Apr 15 '11 at 17:10
  • shared_buffers was already set, but work_mem was not. I've added 1 GB of work_mem now. – Peter Apr 15 '11 at 18:15

Shameless plug :) It might help to read chapters 8 and 9 of our book, just hot off the presses. We cover a lot of these kinds of questions in those chapters.

http://www.postgis.us/chapter_08

http://www.postgis.us/chapter_09

Regina Obe
  • Links are broken, is this referring to PostGIS in Action or the PostGIS Cookbook? – HeikkiVesanto Apr 14 '14 at 21:19
  • 1
    ah you are right. Those were links to first edition of PostGIS in Action -- which were valid back then. When we introduced 2nd edition we had to change the link structure. Old links those referred to are now here: http://www.postgis.us/chapters_edition_1 – Regina Obe Apr 14 '14 at 23:32

See these two tips for optimizing spatial queries; they worked very well for me: http://kb.zillionics.com/optimize-spatial-query/

zgle
  • This answer would be better with some more detail, such as how to apply the tips in this particular scenario. – BradHards Nov 07 '13 at 07:25