Grouping connected linestrings in PostGIS?

Question

I have a table of streets I've selected based on a set of attributes (let's say it's speed_limit < 25). There are groups of streets that are locally contiguous; I'd like to group these sets of connected linestrings into GeometryCollections. In the image below, there would be two GeometryCollections: one with the red lines and one with the blue lines.

enter image description here

I tried running a couple of "dissolve, deaggregate" queries along the lines of:

SELECT (ST_Dump(st_union)).geom
FROM 
    (SELECT ST_Union(geom) FROM roads) sq

With everything I've tried, I either end up with a single feature (ST_Union) or my original geometry (ST_Dump of ST_Union).

Maybe it's possible to do this with some kind of WITH RECURSIVE magic?

Something doesn't look right with "(ST_Dump(st_union)).geom" — Martin F, Apr 25 '14 at 19:04
Since he didn't alias ST_Union(geom) the name of the new geom inherited the name of the function to become st_union. That's why it looks a little funny — Regina Obe, Apr 26 '14 at 04:23

score 20 · Accepted Answer · answered Apr 25 '14 at 23:31

So, by example. Here's a simple table with two connected groups of edges:

drop table lines;
create table lines ( id integer primary key, geom geometry(linestring) );
insert into lines (id, geom) values ( 1, 'LINESTRING(0 0, 0 1)');
insert into lines (id, geom) values ( 2, 'LINESTRING(0 1, 1 1)');
insert into lines (id, geom) values ( 3, 'LINESTRING(1 1, 1 2)');
insert into lines (id, geom) values ( 4, 'LINESTRING(1 2, 2 2)');
insert into lines (id, geom) values ( 11, 'LINESTRING(10 10, 10 11)');
insert into lines (id, geom) values ( 12, 'LINESTRING(10 11, 11 11)');
insert into lines (id, geom) values ( 13, 'LINESTRING(11 11, 11 12)');
insert into lines (id, geom) values ( 14, 'LINESTRING(11 12, 12 12)');
create index lines_gix on lines using gist(geom);

Now, here's a recursive function that, given the id of an edge, accumulates all the edges that touch:

CREATE OR REPLACE FUNCTION find_connected(integer) returns integer[] AS
$$
WITH RECURSIVE lines_r AS (
  SELECT ARRAY[id] AS idlist, geom, id
  FROM lines 
  WHERE id = $1
  UNION ALL
  SELECT array_append(lines_r.idlist, lines.id) AS idlist, 
         lines.geom AS geom, 
         lines.id AS id
  FROM lines, lines_r
  WHERE ST_Touches(lines.geom, lines_r.geom)
  AND NOT lines_r.idlist @> ARRAY[lines.id]
)
SELECT 
  array_agg(id) AS idlist
  FROM lines_r
$$ 
LANGUAGE 'sql';

That just leaves us needing to find, after each group is accumulated, the id of an edge that is not already part of a group. Which, tragically, requires a second recursive query.

WITH RECURSIVE groups_r AS (
  (SELECT find_connected(id) AS idlist, 
          find_connected(id) AS grouplist, 
          id FROM lines WHERE id = 1)
  UNION ALL
  (SELECT array_cat(groups_r.idlist,find_connected(lines.id)) AS idlist,
         find_connected(lines.id) AS grouplist,
         lines.id
  FROM lines, groups_r
  WHERE NOT idlist @> ARRAY[lines.id]
  LIMIT 1)
)
SELECT id, grouplist
FROM groups_r;

Which taken together return a nice set with the seed id and each group it accumulated. I leave it as an exercise to the reader to turn the arrays of id back into a query to create geometry for mapping.

 id |   grouplist   
----+---------------
  1 | {1,2,3,4}
 11 | {11,12,13,14}
(2 rows)

I think this code could be simpler, if the geometry type supporting hashing in PostgreSQL (when you write a simpler RCTE that doesn't involve accumulating arrays of ids, you get an error "All column datatypes must be hashable"), so there's a little enhancement request for me. — Paul Ramsey, Apr 25 '14 at 23:41
This is a really awesome approach. I'm noticing some odd results as I apply it to a larger test set; I'll see if I can reduce the issue to a simple example.
100 lines: 85 clusters, largest cluster=3, 0.03 s //// 200 lines: 144 clusters, largest cluster=9, 0.08 s ////
300 lines: 180 clusters, largest cluster=51, 0.16 s //// 400 lines: 188 clusters, largest cluster=41, 0.27 s ////
500 lines: 176 clusters, largest cluster=112, 0.56 s //// 600 lines: 143 clusters, largest cluster=449, 1.0 s //// 650 lines: 133 clusters, largest cluster=7601, 6.8 s — dbaston, Apr 26 '14 at 13:05
Adding this to the test data will cause duplicate IDs in the grouplist array: insert into lines (id, geom) values ( 15, 'LINESTRING(0 0, 10 10)');. Changing array_agg(id) in the function return to array_agg(DISTINCT id) seems to resolve the issue. — dbaston, Apr 26 '14 at 13:26
This is a good solution, so now how can we store the geometries in a table so that we can see the connected lines? — zakaria mouqcit, Oct 13 '17 at 10:10

score 7 · Answer 2 · answered Apr 26 '14 at 13:12

Here's an approach that uses a temporary table to incrementally aggregate clusters together. I don't really care for the temporary table approach, but this seems to perform quite well as the number of lines increases (I have 1.2 M lines in my input).

DO
$$
DECLARE
this_id bigint;
this_geom geometry;
cluster_id_match integer;

id_a bigint;
id_b bigint;

BEGIN
DROP TABLE IF EXISTS clusters;
CREATE TABLE clusters (cluster_id serial, ids bigint[], geom geometry);
CREATE INDEX ON clusters USING GIST(geom);

-- Iterate through linestrings, assigning each to a cluster (if there is an intersection)
-- or creating a new cluster (if there is not)
FOR this_id, this_geom IN SELECT id, geom FROM lines LOOP
  -- Look for an intersecting cluster.  (There may be more than one.)
  SELECT cluster_id FROM clusters WHERE ST_Intersects(this_geom, clusters.geom)
     LIMIT 1 INTO cluster_id_match;

  IF cluster_id_match IS NULL THEN
     -- Create a new cluster
     INSERT INTO clusters (ids, geom) VALUES (ARRAY[this_id], this_geom);
  ELSE
     -- Append line to existing cluster
     UPDATE clusters SET geom = ST_Union(this_geom, geom),
                          ids = array_prepend(this_id, ids)
      WHERE clusters.cluster_id = cluster_id_match;
  END IF;
END LOOP;

-- Iterate through the clusters, combining clusters that intersect each other
LOOP
    SELECT a.cluster_id, b.cluster_id FROM clusters a, clusters b 
     WHERE ST_Intersects(a.geom, b.geom)
       AND a.cluster_id < b.cluster_id
      INTO id_a, id_b;

    EXIT WHEN id_a IS NULL;
    -- Merge cluster A into cluster B
    UPDATE clusters a SET geom = ST_Union(a.geom, b.geom), ids = array_cat(a.ids, b.ids)
      FROM clusters b
     WHERE a.cluster_id = id_a AND b.cluster_id = id_b;

    -- Remove cluster B
    DELETE FROM clusters WHERE cluster_id = id_b;
END LOOP;
END;
$$ language plpgsql;

@zakariamouqcit Glad this worked for you! I wrote this answer before I wrote the ST_ClusterIntersecting function in PostGIS. If your data is small enough to fit into memory, I'd suggest checking that out for a more performant solution. — dbaston, Oct 13 '17 at 12:38
searching for this question brought me here. Tried the iterative and the st_clusterintersecting but found st_clusterDBScan to be the most apt. In case anyone else is brought here too. https://postgis.net/docs/manual-dev/ST_ClusterDBSCAN.html — D_C, Dec 24 '19 at 17:07
Agreed, ST_ClusterDBSCAN is almost always the best way to go for PostGIS 2.3+ — dbaston, Dec 24 '19 at 18:15

Grouping connected linestrings in PostGIS?

2 Answers2

Linked