4

I am maintaining a table which has geometry datatype and it's possible that the table can contain duplicate geometries. If I want to remove duplicate rows or select only non-duplicate one, will both the approach work same?

SELECT (ST_dump(ST_union(geom))).geom from mytable;

OR

DELETE FROM mytable a
USING mytable b 
WHERE a.id > b.id 
AND ST_Equals(a.geom, b.geom)
AND a.geom && b.geom;

SELECT geom from mytable;

Will both the queries work same when we call SELECT statement?

geozelot
  • 30,050
  • 4
  • 32
  • 56
postgis-user
  • 103
  • 4
  • Not sure how union behaves for duplicate geometries but the latter query is the correct way to do this. You can speed it up with a simple additional within condition https://gis.stackexchange.com/a/393163/6253 – Timothy Dalton May 05 '23 at 08:56

1 Answers1

4

TL;DR: Depends¹.

A flexible and generally better performing approach for both cases would be to use an EXISTS expression:

  • SELECT:
    SELECT
      t.*
    FROM
      <mytable> AS t
    WHERE
      NOT EXISTS (
        SELECT
          1
        FROM
          <mytable> AS _t
        WHERE
          _t.id < t.id
          AND
          ST_Equals(_t.geom, t.geom)
      )
    ;
    
  • DELETE:
    DELETE
    FROM
      <mytable> AS t
    WHERE
      EXISTS (
        SELECT
          1
        FROM
          <mytable> AS _t
        WHERE
          _t.id < t.id
          AND
          ST_Equals(_t.geom, t.geom)
      )
    ;
    

Note that ST_Equals makes use of a spatial index via ~= - no need for additional conditions.

The foremost advantage is avoiding an aggregation in the SELECT, leaving you with full access to rows themselves.

Both statements will keep the duplicate with the lowest id.


¹ As soon as there are overlaps between non-equal geometries, the ST_Union operation may have side-effects, and may return different geometric compositions.

More generally, being pedantically precise:

  • the ST_Union of equal geometries may result in a different vertex order than a selection by a.id > b.id
  • ST_Union recreates geometries, which may be subject to coordinate differences in the range of floating point imprecision

For POINT geometries this doesn't matter much. For other types, however, you will see a larger impact with higher geometric complexity - up to a point where you get (topo)logically deteriorated results.

Overall these consequences may not be an issue for your use case; yet, the ST_Union approach is not guaranteed to give you the same results as a simple SELECT by equality.

geozelot
  • 30,050
  • 4
  • 32
  • 56