Select subset that is spatially equally distributed

Question

Let's say I have a table point with ~10.000 rows. Each row contains a POINT(x, y).

I also have a table polygon with POLYGONS(...).

The points in the point table are unequally distributed in space. There are areas with a lot of points, there are areas with little points. I want to draw sample of the table points of let's say 100, that is as much as possible equally distributed in one of the polygons from table polygon.

point
id    geom    name
1     ...     point1
2     ...     point2
3     ...     ...
4     ...     ...
...   ...     ...
...   ...     ...
polygon
id    geom    name
1     ...     poly1
2     ...     poly2
3     ...     ...
4     ...     ...
...   ...     ...
...   ...     ...

What is the best way to do this with postgres/postgis?

This is a screenshot of the points and the underlying polygons. These are 97300 points. The speed of the query giving me the sample does not really matter, since I only draw this sample once.

@BERA Yes, lets say 100 points in Spain, spreaded as much as possible over the whole area of Spain. What do you mean by optimize the spread, or is just "spread out enough"? — four-eyes, Apr 16 '22 at 10:43
Select all points in, say, Spain -> ST_ClusterKMeans with 100 custers -> select the closest point to the cluster centroid per cluster. — geozelot, Apr 16 '22 at 10:43
I mean that there can be many different solutions/results, is it important to find the optimal — BERA, Apr 16 '22 at 10:45
@BERA Not really. A "rough" spatial equally distribution of the points is enough. Its not scientific... — four-eyes, Apr 16 '22 at 10:47
Compute a kernel density smoothing, then use the inverse of that as a weighted random sampler of the points. Not sure how you'd implement that in PostGIS without doing something in PL/R or PL/some_other_language though... — Spacedman, Apr 16 '22 at 11:29
@geozelot won't K-means clustering tend to lump outliers in with denser clusters? Whereas for a "least-dense" distribution outliers should be included. — dr_jts, Apr 17 '22 at 02:06
Seems like @Spacedman has already provided a possible solution to this: https://stackoverflow.com/a/22284026/2308716. It involves iteratively computing the Voronoi of the points and removing the point with smallest cell. But probably doesn't scale to large datasets. — dr_jts, Apr 17 '22 at 05:30
If the requirement was to create a subset of points with a minimum separation, this approach could be used: https://gis.stackexchange.com/a/364847/14766. But requiring a fixed number of result points is harder. Possibly a decremental Delaunay algorithm could be used, but hard to do this in raw PostGIS. — dr_jts, Apr 17 '22 at 05:33
@dr_jts yeah...scientifically, kmeans probably won't satisfy an unbiased, truly equal-distribution sample. On the other hand, I thnk it would produce a sample that represents the actual distribution quite naturally, plus having the most diverse spread between sample points. BUT: it seems I may have misread the actual outcome...how about finding the closest points to the cell centroids of a hexagon grid with 100 cells? Can't be much more equally spaced than that.. — geozelot, Apr 17 '22 at 08:37
The hard part is to create a grid which covers an irregularly shaped region. — dr_jts, Apr 17 '22 at 17:15

Select subset that is spatially equally distributed

0 Answers0