
I am trying to perform an intersection between two layers:

  1. Polyline layer representing some roads (~5500 rows)
  2. Polygon layer representing irregularly shaped buffers around various points of interest (~47,000 rows)

Ultimately, what I'm trying to do is to clip the polylines to these many (sometimes overlapping) buffers, and then sum up the total length of roadway contained within each buffer.

The problem is that things are running SLOW. I'm not sure how long this should take, but I just aborted my query after more than 34 hours. I'm hoping that someone can either point out where I've made a mistake in my SQL query, or can point me to a better way of doing this.

CREATE TABLE clip_roads AS

SELECT 
  ST_Intersection(b.the_geom, z.the_geom) AS clip_geom,
  b.*

FROM 
  public."roads" b, 
  public."buffer1KM" z

WHERE ST_Intersects(b.the_geom, z.the_geom);


CREATE INDEX "clip_roads_clip_geom_gist"
  ON "clip_roads"
  USING gist
  (clip_geom);



CREATE TABLE buffer1km_join AS

SELECT
  z.name, z.the_geom,
  sum(ST_Length(b.clip_geom)) AS sum_length_m

FROM
  public."clip_roads" b,
  public."buffer1KM" z

WHERE
  ST_Contains(z.the_geom, b.the_geom)

GROUP BY z.name, z.the_geom;
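(For reference, the two statements collapse into a single pass like this -- an untested sketch using the same tables and columns, which skips the intermediate clip_roads table and sums the clipped lengths per buffer directly:

-- Clip and sum in one pass: intersect each road with each buffer
-- it crosses, then total the clipped road length per buffer.
CREATE TABLE buffer1km_join AS
SELECT
  z.gid,
  z.name,
  z.the_geom,
  Sum(ST_Length(ST_Intersection(b.the_geom, z.the_geom))) AS sum_length_m
FROM
  public."roads" b,
  public."buffer1KM" z
WHERE ST_Intersects(b.the_geom, z.the_geom)
GROUP BY z.gid, z.name, z.the_geom;

I don't know whether that would actually be faster, though.)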

I do have a GiST index created on the original roads table, and (just to be safe?) I created an index on the clipped geometry before doing the second table creation.

The query plan from PGAdmin III looks like this, though I'm afraid I don't have much skill in interpreting it:

"Nested Loop  (cost=0.00..29169.98 rows=35129 width=49364)"
"  Output: st_intersection(b.the_geom, z.the_geom), b.gid, b.geo_id, b.address_l, b.address_r, b.lf_name, b.lfn_id, b.lfn_name, b.lfn_type_c, b.lfn_type_d, b.lfn_dir_co, b.lfn_dir_de, b.lfn_desc, b.oe_flag_l, b.oe_flag_r, b.fcode_desc, b.fcode, b.fnode, b.tnode, b.metrd_num, b.lo_num_l, b.lo_n_suf_l, b.hi_num_l, b.hi_n_suf_l, b.lo_num_r, b.lo_n_suf_r, b.hi_num_r, b.hi_n_suf_r, b.juris_code, b.dir_code, b.dir_code_d, b.cp_type, b.length, b.the_geom"
"  Join Filter: _st_intersects(b.the_geom, z.the_geom)"
"  ->  Seq Scan on public."roads" b  (cost=0.00..306.72 rows=5472 width=918)"
"        Output: b.gid, b.geo_id, b.address_l, b.address_r, b.lf_name, b.lfn_id, b.lfn_name, b.lfn_type_c, b.lfn_type_d, b.lfn_dir_co, b.lfn_dir_de, b.lfn_desc, b.oe_flag_l, b.oe_flag_r, b.fcode_desc, b.fcode, b.fnode, b.tnode, b.metrd_num, b.lo_num_l, b.lo_n_suf_l, b.hi_num_l, b.hi_n_suf_l, b.lo_num_r, b.lo_n_suf_r, b.hi_num_r, b.hi_n_suf_r, b.juris_code, b.dir_code, b.dir_code_d, b.cp_type, b.length, b.the_geom"
"  ->  Index Scan using "buffer1KM_index_the_geom" on public."buffer1KM" z  (cost=0.00..3.41 rows=1 width=48446)"
"        Output: z.gid, z.objectid, z.facilityid, z.name, z.frombreak, z.tobreak, z.postal_cod, z.pc_area, z.ct_id, z.da_id, z.taz_id, z.edge_poly, z.cchs_0708, z.tts_06, z.the_geom"
"        Index Cond: (b.the_geom && z.the_geom)"

Is this operation just doomed to run for several days? I'm currently running this on PostGIS for Windows, but I could in theory throw more hardware at the problem by putting it up on Amazon EC2. However, I see that the query is only using one core at a time (is there a way to make it use more?).

Peter

4 Answers


Peter,

What version of PostGIS, GEOS, and PostgreSQL are you using?
Do a:

SELECT postgis_full_version(), version();

A lot of enhancements have been made between 1.4 and 1.5, and in GEOS 3.2+, for this kind of thing.

Also how many vertices do your polygons have?

Do a

SELECT Max(ST_NPoints(the_geom)) As maxp FROM sometable;

To get a sense of your worst-case scenario. Slowness like this is often caused by geometries that are too finely grained, in which case you might want to simplify first.
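If simplification is the route you take, something along these lines would do it -- a sketch only: ST_SimplifyPreserveTopology keeps the rings valid, and the tolerance of 100 is a placeholder in the layer's units that you would tune for your data:

-- Reduce vertex counts on the buffer polygons before intersecting.
-- Tolerance is in the geometry's units (metres assumed here).
UPDATE public."buffer1KM"
SET the_geom = ST_SimplifyPreserveTopology(the_geom, 100);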

Also have you made optimizations to your postgresql.conf file?
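As a rough starting point, something like the following -- illustrative values only, since the right numbers depend on your available RAM, and 32-bit Windows builds cap how far shared_buffers can usefully go:

# postgresql.conf -- illustrative values, not a recommendation
shared_buffers = 256MB        # limited on 32-bit Windows builds
work_mem = 64MB               # allocated per sort/hash, so be conservative
maintenance_work_mem = 256MB  # speeds up index builds like the GiST above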

Regina Obe
  • Hi LR1234567:

    "POSTGIS="1.5.2" GEOS="3.2.2-CAPI-1.6.2" PROJ="Rel. 4.6.1, 21 August 2008" LIBXML="2.7.6" USE_STATS";"PostgreSQL 9.0.3, compiled by Visual C++ build 1500, 32-bit"

    (running the other query now)

    – Peter Apr 15 '11 at 15:50
  • The Max query ran faster than I expected: maxp = 2030

    I suspect that's fairly fine grained?

    – Peter Apr 15 '11 at 15:53
  • 2,030 isn't bad, actually. It could be that you just have a lot of polygons that intersect; generally the intersection is the slowest part. Try doing a count of how many records actually intersect -- it could be huge. – Regina Obe Apr 15 '11 at 17:10
  • SELECT count(*)

    FROM public."roads" b, public."buffer1KM" z

    WHERE ST_Intersects(b.the_geom, z.the_geom);

    – Regina Obe Apr 15 '11 at 17:12
  • Is 910,978 huge? This is the nice thing about starting on a new tech--I have no normative expectations :-) – Peter Apr 15 '11 at 18:08
  • Simplifying the polygons a little has allowed the operation to complete in some sort of finite time. – Peter Apr 17 '11 at 04:05
  • Yep, that is pretty huge. You could have one polygon enveloping a lot of geometries. I suspect getting almost 1 million geometries out of 40,000 is not quite the answer you are looking for. – Regina Obe Apr 17 '11 at 17:04
  • You probably want to union your intersection, but before that, do a count of how many intersections you get per geometry: SELECT ST_Intersection(b.the_geom, z.the_geom) AS clip_geom, count(z.the_geom) FROM public."roads" b, public."buffer1KM" z ... GROUP BY z.the_geom, z.gid – Regina Obe Apr 17 '11 at 17:19

Helpful Stack Overflow answer: https://stackoverflow.com/questions/1162206/why-is-postgresql-so-slow-on-windows

Tuning postgres: http://wiki.postgresql.org/wiki/Performance_Optimization

From experience, I recommend running VACUUM ANALYZE.
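For example, against both input tables:

VACUUM ANALYZE public."roads";
VACUUM ANALYZE public."buffer1KM";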

Mapperz
  • Thank you, that sounds like good advice. Some of the Windows issues, such as the fork() penalty, shouldn't apply here because I'm running a single connection, right? Also, I have run VACUUM ANALYZE. I haven't dug into any performance optimization yet, though. – Peter Apr 15 '11 at 16:22
  • shared_buffers and work_mem generally make the most difference. For shared_buffers, you are a bit more limited in how much you can raise it on Windows than you are on Linux. – Regina Obe Apr 15 '11 at 17:10
  • shared_buffers was already set, but work_mem was not. I've added 1 GB of work_mem now. – Peter Apr 15 '11 at 18:15

Shameless plug :) It might help to read chapters 8 and 9 of our book, just hot off the presses. We cover a lot of these kinds of questions in those chapters.

http://www.postgis.us/chapter_08

http://www.postgis.us/chapter_09

Regina Obe
  • Links are broken, is this referring to PostGIS in Action or the PostGIS Cookbook? – HeikkiVesanto Apr 14 '14 at 21:19
  • 1
    ah you are right. Those were links to first edition of PostGIS in Action -- which were valid back then. When we introduced 2nd edition we had to change the link structure. Old links those referred to are now here: http://www.postgis.us/chapters_edition_1 – Regina Obe Apr 14 '14 at 23:32

See these two tips for optimizing spatial queries; they worked very well for me: http://kb.zillionics.com/optimize-spatial-query/

zgle
  • This answer would be better with some more detail, such as how to apply the tips in this particular scenario. – BradHards Nov 07 '13 at 07:25