8

I have a point layer with real estate data, obtained after geocoding by addresses. In case of multi apartment housing all points with data for several apartments in one building has identical coordinates and are regarded by the "Topology Checker" as duplicated geometries.

That's what I have after checking for duplicates

As soon as my intention is to get an interpolation of values associated with that points I can't just drop that duplicates, but need to redistribute points randomly within the boundaries of the underlying building polygon. Another possible way - random redistribution along the boundary line of a polygon. Like on my drawing.

enter image description here

How can I do that?

Taras
  • 32,823
  • 4
  • 66
  • 137
Burtsev
  • 535
  • 2
  • 8
  • Are you aware of this famous answer from @Babel: https://gis.stackexchange.com/questions/380388/re-locating-points-randomly-at-certain-distance-in-qgis/380390#380390 ? P.S. It is only for a 1:1 point-polygon relation – Taras Jul 29 '21 at 05:45
  • 1
    @Taras, thank you. Several possible solution explained. Will try to follow... – Burtsev Jul 30 '21 at 03:22
  • 1
    @Taras : famous? That's great! I feel flattered... ;-) – Babel Aug 18 '21 at 07:44
  • Until now it is your best answer, is not it? That is why it is famous :) – Taras Aug 18 '21 at 07:46

3 Answers3

7

Here is how you can do this using a PyQGIS script. Open the Python-Console, copy paste the code, change your settings at the top of the code and hit run. You can also cancel the execution via QGIS-Task-Manger (video showing how to use it) if you want to. See the example-GIF below. Comments in the script explain how it works.

### SETTINGS ###
POINT_LAYERNAME = 'Points' # Name of input-point-layer
POLYGON_LAYERNAME = 'Buildings' # Name of input-polygon-layer
OVERLAYDISTANCE = 0 # tolerance-distance (in CRS units) to identify geometric duplicates (0 is an exact duplicate)

""" This script redistributes points inside a polygon randomly, if there are duplicate-geometry-points. Tested with QGIS 3.10.12

Hints:

  • Both input layer need to be in the same CRS
  • Size and shape of the polygons do have a massive impact on performace due to the bruteforce-method of creating random points inside them
  • Use the QGIS-Task-Manager to cancel the process at any time

"""

NO CHANGES NEEDED BELOW

import random from datetime import datetime

from qgis.core import ( QgsApplication, QgsTask, QgsMessageLog, ) MESSAGE_CATEGORY = 'Redistribute Points in Polygon'

class redistributePoints(QgsTask): def init(self, description, duration): super().init(description, QgsTask.CanCancel) self.duration = duration self.total = 0 self.iterations = 0 self.starttime = datetime.now() self.exception = None self.vl = QgsVectorLayer("Point", "RedistributedPoints", "memory") # create the new output layer

def run(self):
    QgsMessageLog.logMessage('Started task "{}"'.format(
                                 self.description()),
                             MESSAGE_CATEGORY, Qgis.Info)

    ### RUN ###
    points = QgsProject.instance().mapLayersByName(POINT_LAYERNAME)[0] # get the input-point-layer
    polygons = QgsProject.instance().mapLayersByName(POLYGON_LAYERNAME)[0] # get the input-polygon-layer
    self.total = points.featureCount() # count features for progress bar

    if points.sourceCrs() != polygons.sourceCrs(): # Check if both layers are in the same CRS, if not cancel the task
        self.exception = 'Pointlayer and Polygonlayer need to be in the same CRS'
        return False # cancel the process

    # Create the new temporary layer
    crs = points.sourceCrs() # read source CRS
    fields = points.fields() # read source Fields
    self.vl.setCrs(crs) # Set outputlayer CRS to source CRS
    pr = self.vl.dataProvider()
    pr.addAttributes(fields) # add source Fields to output layer
    self.vl.updateFields() # update the outputlayer
    #QgsProject.instance().addMapLayer(vl) # we cannot add a layer to the canvas this way when using qgstasks. Need to add them when the task is finished

    # Create the spatial indizes
    points_idx = QgsSpatialIndex(points.getFeatures())
    polygons_idx = QgsSpatialIndex(polygons.getFeatures())

    with edit(self.vl): # edit the ouput layer
        for i, point in enumerate(points.getFeatures()): # iterate over source layer
            self.setProgress(i/self.total*100) # update the progressbar
            pr.addFeature(point) # copy the current point to the output layer
            self.vl.updateFeature(point) # update it

            # create a list of nearest neighbors (their feature ids) from the current point
            nearestneighbors = points_idx.nearestNeighbor(point.geometry(), neighbors=1, maxDistance=OVERLAYDISTANCE)

            # if the list has more than one entry, this means that there is another point at the exact same place,
            # so basically a duplicate geometry. Only shift the point if that is the case. If there is no duplicate geometry,
            # leave the point where it is. This will speed up things.
            if len(nearestneighbors) > 1:

                # iterate over the polygons that intersect with the current point.
                # Using a spatial index will speed up iterations, as we do not have to make a point-in-polygon-test for all polygons.
                for id in polygons_idx.intersects(point.geometry().boundingBox()):

                    polygon = polygons.getFeature(id) # get the polygon-feature by its id

                    # get the polygons bounding box and read maxima amd minima
                    ext = polygon.geometry().boundingBox() 
                    xmin = ext.xMinimum()
                    xmax = ext.xMaximum()
                    ymin = ext.yMinimum()
                    ymax = ext.yMaximum()

                    # set the indicator for the new point to find inside the polygon to false
                    inside = False

                    # now brute-force points until we get a random one that is not only inside the polygons boundingbox
                    # but also inside the polygon itself
                    # this is the step needing the most performance currently, so if there is another method, let me know
                    # idea is taken from this answer: https://gis.stackexchange.com/a/259185/107424
                    while inside is False:
                        if self.isCanceled(): # give user the ability to cancel the whole process
                            return False # cancel the process
                        self.iterations += 1 # count iterations for statistical purposes :)
                        x = random.uniform(xmin, xmax) # create a random x coordinate inside the polygons boundingbox
                        y = random.uniform(ymin, ymax) # create a random y coordinate inside the polygons boundingbox
                        pt = QgsPointXY(x, y) # create a QGIS-Point out of the random coordinates
                        if QgsGeometry.fromPointXY(pt).within(polygon.geometry()): # Check if that point is inside the polygon
                            inside = True # if it does, return true and end the while-loop
                    point.setGeometry(QgsGeometry.fromPointXY(pt)) # change the geometry of the current point to the new random location inside the polygon
                    self.vl.updateFeature(point) # update the changes
    return True # done with no errors

def finished(self, result):
    endtime = datetime.now()
    runtime = endtime - self.starttime

    if result:
        # add the outputlayer to the canvas
        QgsProject.instance().addMapLayer(self.vl)

        # throwback a log message
        QgsMessageLog.logMessage(
            'Task "{name}" completed\n' \
            'Runtime: {runtime} (with {iterations} '\
            'iterations for {total} points)'.format(
              name = self.description(),
              runtime = runtime,
              iterations = self.iterations,
              total = self.total),
          MESSAGE_CATEGORY, Qgis.Success)
    else:
        if self.exception is None:
            QgsMessageLog.logMessage(
                'Task "{name}" not successful but without '\
                'exception (probably the task was manually '\
                'canceled by the user)'.format(
                    name=self.description()),
                MESSAGE_CATEGORY, Qgis.Warning)
        else:
            QgsMessageLog.logMessage(
                'Task "{name}" Exception: {exception}'.format(
                    name=self.description(),
                    exception=self.exception),
                MESSAGE_CATEGORY, Qgis.Critical)
            raise self.exception

def cancel(self):
    QgsMessageLog.logMessage(
        'Task "{name}" was canceled'.format(
            name=self.description()),
        MESSAGE_CATEGORY, Qgis.Info)
    super().cancel()

start the task as background process to keep QGIS responsive

see https://docs.qgis.org/3.16/en/docs/pyqgis_developer_cookbook/tasks.html

task = redistributePoints('Redistribute Points in Polygon', 5) QgsApplication.taskManager().addTask(task)

enter image description here

MrXsquared
  • 34,292
  • 21
  • 67
  • 117
  • for a while I'm getting an AssertionError, while trying to edit my point layer with this script. – Burtsev Jul 30 '21 at 02:49
  • 1
    @Burtsev I have completely rewritten my answer with a lot of performance improvements. – MrXsquared Jul 31 '21 at 16:25
  • 1
    Amazing code! including the sleep() function – Taras Aug 02 '21 at 07:38
  • 1
    @Taras thanks :) and thanks for the hint: Sleep is an unused import, just a leftover from the QGIS-Docs-Example for implementing tasks: https://docs.qgis.org/3.16/en/docs/pyqgis_developer_cookbook/tasks.html#extending-qgstask Just removed that one. – MrXsquared Aug 02 '21 at 12:09
  • 1
    @MRXsquared, your help has been helpful and comprehensive. At the same time I learned about AssertionError and the possibilities of TaskManager! – Burtsev Aug 03 '21 at 04:07
7

If you have a PostgreSQL BD with PostGIS, you can do the following query :

SELECT
  a.*,
  ST_GeneratePoints(b.geom, 1, floor(random() * 5000 + 1)::int) AS new_address_geom
FROM
  addresses a,
  buildings b
WHERE
  ST_CONTAINS(b.geom, a.geom)
;
  • It generates for each address point a new location within the building
  • ST_GeneratePoints documentation
J. Monticolo
  • 15,695
  • 1
  • 29
  • 64
  • 1
    PostGIS was not mentioned – Taras Aug 02 '21 at 14:26
  • 2
    Yes, but maybe someone has the same question and has its data on PostGIS, so there is this solution. – J. Monticolo Aug 02 '21 at 16:04
  • 1
    Yes, PostGIS has its magic – Taras Aug 02 '21 at 16:04
  • 1
    So, in PostGIS everything is much more simpler! – Burtsev Aug 03 '21 at 04:09
  • @J.Monticolo I would like to use this postgis solution, but I'm lost on the last argument for the ST_GeneratePoints for defining the number of points: (b, geom, 2, floor(random()*5000+1)::int). I have duplicate points as well, but not sure how to define the duplicate points. – MJM Mar 23 '23 at 20:15
  • 1
    @MJM: look at the function documentation. You specify 2 points inside each polygon, the last argument is the random seed for the point position inside the geometry. – J. Monticolo Mar 25 '23 at 19:04
3

Probably not as good as the approach offered by @MrXsquared, however, it does not require scripting and purely based on domestic tools&functions.

Let's assume we have five features in 'poly_test' (purplish) and ten in 'points' (orange) accordingly, see image below. As you can see there once in a while several point per same location.

input

Here are several steps that can be executed to achieve the desired output.

Step 1. Count points in polygon and concatenate a unique attribute of those points. Alternatively the "Count points in polygon" can be used.

In the Field calculator it can be done via array_length(overlay_contains('points', expression:="id")) for number of points and array_to_string(overlay_contains('points', expression:="id")) for concatenation of unique attribute

step_1

Step 2. Run the "Random points inside polygons" geolgorithm with specifying the "pois_num" field as one for 'Point count or density' (I used Points count for 'Sampling strategy')

step_2

If needed filter split polygons into ones with only one point and ones that contain more than one point.

Step 3. Apply the "Join attributes by location" to get attributes of polygons

step_3

Step 4. "Add autoincremental field" to make a new "id" i.e. index within each group (a set of points and a polygon feature). Polygons' "_id" was used for 'Group values by [optional]'

step_4

Step 5. Get back the "id"s of the original point layer using the array_get(string_to_array("_pois_ids"),"AUTO") in the Field Calculator. Other attributes from the original point layer can achieved further via a joining to the "id"

result

P.S. If the procedure must be repeated multiple times then the above workflow recommended to transform into a model


I also tried a "Virtual Layer" through Layer > Add Layer > Add/Edit Virtual Layer... with the following query

SELECT poly.id AS id1,
       poi.id AS id2,
       point_on_surface(poly.geometry) AS geom
FROM "poly_test" AS poly
CROSS JOIN "points" AS poi ON st_within(poi.geometry, poly.geometry)

Unfortunately it does not create a new point on surface for each point feature. The same will be expected using the "Point on surface" geoalgorithm.

no_luck

Taras
  • 32,823
  • 4
  • 66
  • 137
  • Not yet. I can give a try to create a model if needed – Taras Jul 31 '21 at 20:26
  • Unfortunately I was not able yet to create a model, because I could not pass the number of points in each polygon as a variable for a new step i.e. "Random points inside polygons" geolgorithm. Maybe you, @MrXsquared know how to implement it? These were not helpful: https://gis.stackexchange.com/questions/379549/qgis-use-of-a-model-output-in-the-same-model-as-a-layer-in-the-field-calculator and https://gis.stackexchange.com/questions/350142/connecting-number-parameter-to-field-calculator-in-qgis-modeler – Taras Aug 05 '21 at 05:24
  • Unfortunately I do not see that option in my Model. Can I somehow send it to you and you gonna prove it... – Taras Aug 11 '21 at 08:12