2

Spatial clustering with PostGIS including attributes

I have a points table with some attributes like : year, type, etc. And I'd like to do a points clustering in PostGIS. I followed the subject in Spatial clustering with PostGIS

But I have the following problems:

first problem without considering the attributes: I got a table with points and I can visualise it in Qgis but not in ArcMap and the result has no SRID (the original table has srid :4326) and I'd like the output with geom as point or polygon (not geometry collection); second problem: I'd like to do a clustering for every year (2014,2015, 2016) and every type (a, b, c) and get the result in one table but I do not know if it is possible and how to do it?

CREATE TABLE table_clusterd AS (
    SELECT row_number() over () AS id,
      ST_NumGeometries(gc) as radius,
      gc AS geom_collection,
      ST_Centroid(gc) AS centroid,
      ST_MinimumBoundingCircle(gc) AS circle,
   FROM (
     SELECT unnest(ST_ClusterWithin(geom, 100)) gc
     FROM myTable
   ) 
f);
sam
  • 45
  • 5
  • It is easy to set the SRID, you just wrap the geometries with ST_SetSRID and you can recovered the points using ST_CollectionExtract. You simply use ST_ClusterWithin in combination with Group By to do it by attributes. – John Powell Mar 17 '16 at 08:45

1 Answers1

5

To answer your main question, to cluster by attributes, you simply use GROUP BY attribute. AS ST_ClusterWithin will operate on any geometry type, it returns a GeometryCollection. To only pull out points, use ST_CollectionExtract(unnest(ST_ClusterWithin(geom, dist)),1). Finally, you can call ST_SetSRID(geom, 4326) to convert back to 4326. Here is an example that clusters on attributes, a, b, c and years, 2014, 2015 and 2016.

WITH 
  testvals (att, year, geom) AS 
    (SELECT 
      ('[0:2]={a,b,c}'::text[])[trunc(random()*3)], 
      ('[0:2]={2014, 2015, 2016}'::int[])[trunc(random()*3)],     
      ST_MakePoint(random()*100, random()*100) 
    FROM generate_series(1, 1000)),
 clusters(att, year, geom) AS
   (SELECT 
      att,
      year,
      ST_CollectionExtract(unnest(ST_ClusterWithin(geom, 50)), 1)
   FROM testvals 
   GROUP BY att, year 
   ORDER BY att, year)
SELECT 
  row_number() over() AS id,
  att, 
  year, 
  ST_Numgeometries(geom) as num_geoms, 
  ST_AsText(
    ST_SetSRID(
        ST_Centroid(
          ST_MinimumBoundingCircle(geom))
   , 4326)
  ) 
FROM clusters;

To convert this in to a create table, simply remove the ST_AsText (that is for debugging) and the WITH clause and replace testvals with your own table name and SELECT from clusters.

This produces something like the following:

id | att | year | num_geoms | geom
----+-----+------+------------------+-----------------------------

1 | a | 2014 | 104 | POINT(47.8713526488699 47.5714471330078)

2 | a | 2015 | 133 | POINT(50.5456980698809 48.7293153893785)

3 | a | 2016 | 92 | POINT(51.6473020685052 48.6573579298721)

4 | b | 2014 | 95 | POINT(53.6228174459529 45.5578126914714)

etc. As I have used 1000 points spread over a 100 x 100 grid with the distance parameter of 50, there are 9 clusters produced, as you would expect, for each attribute, year combo. For real world data, you will likely get more clusters. Note, if there is only one point in the cluster, then ST_Centroid(ST_MinimumBoundingCircle(... will return EMTPY POINT, which you might need to handle.

This crazy looking construct,

('[0:2]={a,b,c}'::text[])[trunc(random()*3)]

is a quick way to randomly generate a series of a, b, and c. For more see this answer and generate_series is used like a loop to create x number of samples.

Depending on the distance parameter to ST_ClusterWithin, and the number of rows created by generate_series, the above demo will return anything from 9 rows (distance 100), ie, all combinations of a, b, c and 2014, 2015, 2016 to 100 rows (distance 1). Usage may vary.

John Powell
  • 13,649
  • 5
  • 46
  • 62
  • Thanks a lot for answer. Still have some problems with geometry and srid.. For make it easier how can I get the result as a point from the centroid of every cluster. If I use the following – sam Mar 17 '16 at 14:41
  • Create table myTable AS ( SELECT (ST_CollectionExtract(unnest(ST_ClusterWithin(geom, 10000)), 1)) as geom, row_number() over () AS id, att, year, SUM(att2) as att3, ST_Centroid(geom) AS centroid, ST_NumGeometries(geom) AS radius FROM myTable GROUP BY att, year ORDER BY att, year ); I got that I should use geom with the Group by.. But if I do a group by geom I got multipoint.. thanks for help and explaination (I’m still learning) – sam Mar 17 '16 at 14:43
  • You will get a multipoint, that is what the clustering does. – John Powell Mar 17 '16 at 14:59
  • Can I get the center of the circle of the cluster (like in the example http://gis.stackexchange.com/questions/11567/spatial-clustering-with-postgis) – sam Mar 17 '16 at 15:02
  • You could, yes, but that isn't what you asked :D You would need to wrap the returned row with ST_MinimumBoundingCircle and then call ST_Centroid. Would you like me to write that up? – John Powell Mar 17 '16 at 16:50
  • yes, please.. To be more clear: I have a points table and I'd like to do a clustering.. the result should be points created from the center of every cluster with the number of geometry included in this cluster and the other attribute, srid, primary key (). I did the following – sam Mar 17 '16 at 17:30
  • create table newTable as ( select ST_Centroid((ST_MinimumBoundingCircle(unnest(ST_ClusterWithin(geom, 10000))))) as geometry, row_number() over () AS id, att, year, ST_NumGeometries(unnest(ST_ClusterWithin(geom, 10000))) AS radius, SUM(att2) as att2 from myTable GROUP BY att, year ORDER BY att, year ) – sam Mar 17 '16 at 17:30
  • Updated the query and fixed a typo, which missed the , 4326 in the ST_SetSRID call. I think this is now closer to what you wanted. – John Powell Mar 18 '16 at 10:05
  • thanks a lot John.. It's works and I will test it with the real data. – sam Mar 18 '16 at 12:19
  • No problem, it was quite fun writing that :D. There are new clustering algorithms coming in Postgis 2.3, for what it is worth, DBScan and k-means natively. – John Powell Mar 18 '16 at 12:22
  • waiting for that.. I do a cluster with mapserver and it is much better.. But with a very big data, I wanted to test this function.. Thanks again – sam Mar 18 '16 at 12:30