19

What methods are available in ArcGIS 10.2 to randomly subset a selection of points? For example, in the attached screenshot I am interested in keeping 20% of the selected points and deleting the rest.


Aaron
  • I don't think there is a default method for selecting random points from a layer. Have you tried a Python script or an add-in? – Marcin D Nov 21 '13 at 20:08

6 Answers

30

Here's a Python function that selects random features in a layer by percentage, ignoring any current selection:

def SelectRandomByPercent (layer, percent):
    #layer variable is the layer name in TOC
    #percent is percent as whole number  (0-100)
    if percent > 100:
        print "percent is greater than 100"
        return
    if percent < 0:
        print "percent is less than zero"
        return
    import random
    fc = arcpy.Describe (layer).catalogPath
    featureCount = float (arcpy.GetCount_management (fc).getOutput (0))
    count = int (featureCount * float (percent) / float (100))
    if not count:
        arcpy.SelectLayerByAttribute_management (layer, "CLEAR_SELECTION")
        return
    oids = [oid for oid, in arcpy.da.SearchCursor (fc, "OID@")]
    oidFldName = arcpy.Describe (layer).OIDFieldName
    path = arcpy.Describe (layer).path
    delimOidFld = arcpy.AddFieldDelimiters (path, oidFldName)
    randOids = random.sample (oids, count)
    oidsStr = ", ".join (map (str, randOids))
    sql = "{0} IN ({1})".format (delimOidFld, oidsStr)
    arcpy.SelectLayerByAttribute_management (layer, "", sql)

Copy and paste this into the Python window in ArcMap.

Then, in the same window, type SelectRandomByPercent ("layer", num), where layer is the name of your layer and num is the percentage as a whole number.
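For readers who want to check the sampling logic outside ArcMap, here is a minimal sketch of the two steps the function relies on: sizing the sample from a percentage and building the IN clause. It runs without arcpy. The names pick_random_oids and build_in_clause, the seed parameter, and the OBJECTID field name are assumptions made for illustration; they are not part of the answer's code.

```python
import random


def pick_random_oids(oids, percent, seed=None):
    """Return a random subset holding `percent` percent of `oids` (rounded down)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    oids = list(oids)
    count = int(len(oids) * percent / 100.0)
    # The seed is optional and only there to make a run repeatable.
    rng = random.Random(seed)
    return rng.sample(oids, count)


def build_in_clause(field, oids):
    """Build a where clause such as 'OBJECTID IN (3, 7, 12)'."""
    return "{0} IN ({1})".format(field, ", ".join(map(str, oids)))


# Hypothetical object IDs 1..10; 20 percent of 10 is 2 features.
chosen = pick_random_oids(range(1, 11), 20, seed=1)
where = build_in_clause("OBJECTID", chosen)
```

An empty sample would produce an invalid clause ("OBJECTID IN ()"), which is presumably why the function above clears the selection and returns early when the computed count is zero.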


A variation that draws the random subset from the current selection, as asked in the question:

def SelectRandomByPercent (layer, percent):
    #layer variable is the layer name in TOC
    #percent is percent as whole number  (0-100)
    if percent > 100:
        print "percent is greater than 100"
        return
    if percent < 0:
        print "percent is less than zero"
        return
    import random
    featureCount = float (arcpy.GetCount_management (layer).getOutput (0))
    count = int (featureCount * float (percent) / float (100))
    if not count:
        arcpy.SelectLayerByAttribute_management (layer, "CLEAR_SELECTION")
        return
    oids = [oid for oid, in arcpy.da.SearchCursor (layer, "OID@")]
    oidFldName = arcpy.Describe (layer).OIDFieldName
    path = arcpy.Describe (layer).path
    delimOidFld = arcpy.AddFieldDelimiters (path, oidFldName)
    randOids = random.sample (oids, count)
    oidsStr = ", ".join (map (str, randOids))
    sql = "{0} IN ({1})".format (delimOidFld, oidsStr)
    arcpy.SelectLayerByAttribute_management (layer, "", sql)

Finally, one more variation that selects features by count instead of percent:

def SelectRandomByCount (layer, count):
    import random
    layerCount = int (arcpy.GetCount_management (layer).getOutput (0))
    if layerCount < count:
        print "input count is greater than layer count"
        return
    oids = [oid for oid, in arcpy.da.SearchCursor (layer, "OID@")]
    oidFldName = arcpy.Describe (layer).OIDFieldName
    path = arcpy.Describe (layer).path
    delimOidFld = arcpy.AddFieldDelimiters (path, oidFldName)
    randOids = random.sample (oids, count)
    oidsStr = ", ".join (map (str, randOids))
    sql = "{0} IN ({1})".format (delimOidFld, oidsStr)
    arcpy.SelectLayerByAttribute_management (layer, "", sql)
Emil Brundage
  • Nice use of random.sample(). – Aaron Oct 26 '15 at 21:26
  • Thanks @Aaron. I updated the answer for a subset selection without exporting first. – Emil Brundage Oct 26 '15 at 21:33
  • +1. Are there any known limitations on string length for the sql parameter? – Paul Oct 28 '15 at 19:01
  • @Paul I just tested this code to select 100% of features with a layer that has nearly 4 million features, which resulted in a memory error. So while there doesn't appear to be a hard string limit, there is a dependency on memory. There is also an SQL item limit for Oracle SDE databases, which I've blogged about here: http://emilsarcpython.blogspot.com/2015/10/working-around-oracles-sql-where-clause.html – Emil Brundage Oct 28 '15 at 19:10
  • just used this. works great! – geoJshaun Aug 05 '17 at 00:09
  • 1
    Esri used this code in a blog https://support.esri.com/en/technical-article/000013141 – Emil Brundage Jul 16 '18 at 15:24
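On the Oracle item limit raised in the comments: Oracle rejects an IN list with more than 1,000 literal items (error ORA-01795). One common workaround is to split the IDs into several shorter IN lists joined with OR. The sketch below shows that idea only; it is not necessarily the approach taken in the linked blog post, and the function name build_chunked_in_clause and its chunk_size parameter are hypothetical. It does not help with the memory error reported for very large selections.

```python
def build_chunked_in_clause(field, oids, chunk_size=1000):
    """Join several IN lists with OR so no single list exceeds chunk_size items."""
    oids = list(oids)
    if not oids:
        raise ValueError("oids must not be empty")
    parts = []
    for start in range(0, len(oids), chunk_size):
        chunk = oids[start:start + chunk_size]
        parts.append("{0} IN ({1})".format(field, ", ".join(map(str, chunk))))
    # Wrap in parentheses so the clause can be combined safely with other conditions.
    return "(" + " OR ".join(parts) + ")"


# Small chunk size used here only to keep the example readable.
example = build_chunked_in_clause("OBJECTID", [1, 2, 3, 4, 5], chunk_size=2)
```

The resulting string could be passed as the where clause in place of the single IN list built in the functions above.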
14

Generally, I also recommend using the spatial ecology tools as discussed by blah238.

However, another method you could try would be to add an attribute called Random to store a random number.

Then, using the field calculator on that attribute, with the Python Parser, use the following codeblock:

import random
def rand():
  return random.random()

See image below:

This will create random values between 0 and 1. Then, if you want to select 20% of the features, you could select features where the Random value is less than 0.2. Of course, this will work better with many features. I created a feature class with only 7 features as a test and there were no values less than 0.2. However, it looks like you have plenty of features, so that shouldn't matter.
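To see why a threshold gives only roughly 20%, here is a small sketch in plain Python (no arcpy) that mimics the Random field with a list of values. It compares the threshold approach with sorting by the random value and taking the first 20%, which always returns the same number of features. The function names, the seed, and the 1,000-feature list are assumptions chosen for illustration.

```python
import random


def select_by_threshold(values, fraction):
    """Indexes whose random value falls below `fraction`; the count varies per run."""
    return [i for i, v in enumerate(values) if v < fraction]


def select_exact_fraction(values, fraction):
    """Indexes of the smallest random values, always int(n * fraction) of them."""
    count = int(len(values) * fraction)
    order = sorted(range(len(values)), key=lambda i: values[i])
    return order[:count]


# Stand-in for the Random field on 1,000 hypothetical features.
rng = random.Random(42)
random_field = [rng.random() for _ in range(1000)]

about_20_percent = select_by_threshold(random_field, 0.2)
exactly_20_percent = select_exact_fraction(random_field, 0.2)
```

With only a handful of features the threshold approach can easily return nothing, as in the 7-feature test described above, whereas the sorted approach would still return one feature (int(7 * 0.2) is 1).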


Fezter
  • This method will return on average 20% of the features, which in some cases would be preferred. But if you want 20% every time, you can do as suggested, then sort the features by the random value and select the first 20%. – Llaves Nov 22 '13 at 04:04
  • Esri used this process in a blog: https://support.esri.com/en/technical-article/000013141 – Emil Brundage Jul 16 '18 at 17:44
6

There is also an earlier Select features at random script from @StephenLead available for ArcGIS Desktop. Although written, I think, for ArcGIS 9.x, and last modified in 2008, I used it in about 2010 at 10.0, and it still worked well.

PolyGeo
5

You could try Hawth's Tools: http://www.spatialecology.com/htools/rndsel.php

Note that the existing selection is not honored so you would have to make a feature layer from the existing selection first.

blah238
  • Unfortunately, that version isn't compatible with ArcGIS 9.3 and above. Now it's called Geospatial Modelling Environment: http://www.spatialecology.com/gme – kenbuja Nov 21 '13 at 21:33
  • Good point, here is the equivalent command in GME: http://www.spatialecology.com/gme/rsample.htm – blah238 Nov 21 '13 at 21:43
  • The GME toolset does not work "within" ArcGIS, rather it is a stand alone tool – Ryan Garnett Sep 16 '15 at 20:08
3

Here's another random selection add-in for ArcGIS 10, the Sampling Design Tool. It will let you select 20% of the features in your dataset. However, like the Hawth's Tools mentioned by blah238, it does not honor an existing selection when making the random selection.

kenbuja
0

You could also use the Subset Features tool. According to the documentation:

Divides the original dataset into two parts: one part to be used to model the spatial structure and produce a surface, the other to be used to compare and validate the output surface.

One disadvantage is that you need the Geostatistical Analyst extension.

PolyGeo
Ernesto561