
I have over 8 million points in a shapefile. I would like to split these 8 million points into parts of 100'000 points each, in order to facilitate further calculations with these points.

In the end I need a table containing the r,g,b,nir values of every pixel of an image, at the location of these points AND within a specific radius around these points (buffer).

I intended to use the "Sample" tool to get this table. Unfortunately, Sample doesn't seem to handle millions of points: it gave me an empty table after 20 hours of processing when I tried to sample 2 million of my 8 million points, although it worked perfectly fine for 10'000 points. This is why I'm looking for a method to automatically split my 8 million points into smaller parts, over which I can then iterate in further steps to do the buffering, rasterizing and sampling.
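The splitting itself can be sketched in plain Python, independent of any GIS library: given the total point count and a chunk size, compute (start, stop) index ranges and process one range at a time. The numbers below are taken from the question; everything else is a hypothetical sketch.

```python
def chunk_ranges(total, size):
    """Yield (start, stop) index pairs covering [0, total) in steps of `size`.

    The last range may be shorter than `size` if the total does not
    divide evenly.
    """
    for start in range(0, total, size):
        yield (start, min(start + size, total))

# 8 million points in chunks of 100'000 -> 80 ranges
ranges = list(chunk_ranges(8000000, 100000))
```

Each (start, stop) pair can then drive one buffer/rasterize/sample pass, so no single step ever touches all 8 million points at once.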

The smaller shapefiles should still contain the attributes: x,y,Radii and NewID.

As I'm new to Python, I have no idea how this could be done, but I seek a Python solution if possible.

Sibir
  • Which GIS software are you using? – BERA May 09 '17 at 09:10
  • Sorry, completely forgot to mention that - ArcGIS 10.2 – Sibir May 09 '17 at 09:14
  • Have a look at ArcGIS' "focal statistics" tool. It sounds like this will get you the results you ultimately need for statistics within a moving window (i.e. your buffer distance), then extract those values back to your original points with the "extract values to points" tool. – Jae May 09 '17 at 09:46
  • Combining Python and GDAL (OGR) you can use SQL to select a range of points and export those as the smaller parts. – Tins May 09 '17 at 09:53
  • @BERA, I've reopened this question. – Fezter May 09 '17 at 11:24
  • Here is an R solution that may be useful: http://www.gisremotesensing.com/2012/10/extract-raster-values-from-points.html. Here is a post to open source solutions that may be more efficient than an ArcGIS solution: https://gis.stackexchange.com/q/3538/8104 – Aaron May 09 '17 at 11:54
  • @Aaron Thank you for the links. My problem is that I don't need the raster values from a point location but from a "polygon-location" (points with buffer around them). But as the "sample" tool only takes rasters or points as location inputs, I try to buffer and then rasterize my initial points in order to be able to perform the sampling. But ideas of how to do this more efficiently are very welcome. – Sibir May 09 '17 at 12:50
  • If you use "focal statistics" with a circular window (of your defined radius), you can extract stats for max, min, mean, whatever you choose, within a "buffer" around each cell location. You can then use "extract multivalues to points" to extract raster values for the original r,g,b, nir as well as the summary raster values derived from the focal statistics tool, which would extract summary stats based on your defined radius, only at the original point locations. If your points reside within a geodatabase, you should have no concerns working with millions of points. – Jae May 09 '17 at 13:26
  • Sample of a rasterize polygon will give a value for every pixel of the input rasters that intersects each rasterized polygon. And you are looking to do this for every pixel on the images? Any time you are doing a process on every location in an image, raster processing is always the way to go. What are you going to do with the table afterwards? If you are going to run statistics, then why not do that at the pixel level as Jae has advised? – Rex May 09 '17 at 13:38
  • Welcome to GIS SE! As a new user please take the [tour]. – Midavalo May 09 '17 at 14:10
  • A question asking for help writing code should show your own attempts at writing the code. Please [edit] your question to include a snippet of the code you've tried, and details of what happens when you try it. See How to create a Minimal, Complete, and Verifiable example. – Midavalo May 09 '17 at 14:11
  • @Rex Yes I know that. :) And I intend to do this for the image-pixels intersecting the rasterized polygons (my points with radius-buffers around them which I then rasterize) because I do need the image values to process them in Matlab. I indeed do calculate some statistics, but I also calculate stuff like ndvi, evi, skewness, entropy, homogeneity, histograms etc. And Jae's answer is great if I only needed the statistics and the values at the circle center, but I don't see how I can get the pixel values for all pixels intersecting my polygons. Or am I missing something? – Sibir May 09 '17 at 17:04
  • @Sibir I am confused about what you need on a pixel by pixel basis. Are you really looking to build 8 million histograms? – Rex May 09 '17 at 19:57
  • Focal statistics will give you basic descriptive statistics of the points within your window (i.e. buffer). If you make a grid of all of those statistics, then it gives you the basic building blocks for many of your other calculations. Then you can use raster calculator/map algebra to calculate those. NVDI and VDI would be calculated on a pixel by pixel basis so I am not sure why you need the matrix of values of neighbors that would be returned by the buffer/sample workflow. – Rex May 09 '17 at 19:58
  • @Rex Thanks for all your inputs. Unfortunately I really am looking to build 8 mio histograms etc. My points/polygons are trees which I'm trying to classify by creating distinctive descriptors for each tree. This worked really nice and pretty fast for my 20'000 test samples and now I'd like to do this for the rest of my trees. I'm working on large scale classification in satellite images. But extracting the image values in Matlab took extremely long, which is why I switched to sampling the image values in ArcGis and doing the rest of the calculations based on the sample table in Matlab. – Sibir May 09 '17 at 20:11
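The SQL-range approach suggested in the comments (selecting consecutive ID ranges and exporting each as a smaller part) can be sketched by generating one where-clause per chunk. Only the NewID field name comes from the question; the function itself is a hypothetical helper, and the resulting clauses would be passed to whatever selection tool you use (e.g. an OGR SQL query or an arcpy where_clause).

```python
def range_where_clauses(id_field, total, size):
    """Build one SQL where-clause per chunk of consecutive integer IDs."""
    clauses = []
    for start in range(0, total, size):
        stop = min(start + size, total)
        clauses.append('{0} >= {1} AND {0} < {2}'.format(id_field, start, stop))
    return clauses

# 80 clauses, one per chunk of 100'000 IDs
clauses = range_where_clauses('NewID', 8000000, 100000)
```

This assumes NewID is a gapless integer running from 0 upward; with gaps, each selection simply returns fewer than 100'000 points, which is harmless for this workflow.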

1 Answer


The code below will split a shapefile into separate shapefiles of a specified chunk size. Do you want shapefile outputs? I have not tried it on millions of points.

import arcpy
import os

input_shape = r"C:\infolder\points.shp"
outpath = r"C:\outfolder"
chunksize = 100000
fields_to_keep = ['x', 'y', 'Radii', 'NewID']

# Read all points (attributes plus geometry) into a list
pointlist = []
with arcpy.da.SearchCursor(input_shape, fields_to_keep + ['SHAPE@']) as scursor:
    for row in scursor:
        pointlist.append(row)

# Generator that splits the list of points into chunks of the specified
# size (the last chunk will contain what's left over)
def chunks(l, n):
    for i in xrange(0, len(l), n):  # xrange: ArcGIS 10.2 runs Python 2.7
        yield l[i:i + n]

x = chunks(pointlist, chunksize)

# For each chunk, create a shapefile and insert the points using an InsertCursor
counter = 1
fieldobjects = [f for f in arcpy.ListFields(input_shape) if f.name in fields_to_keep]
for y in x:
    outname = 'Pointset_{0}.shp'.format(counter)
    arcpy.CreateFeatureclass_management(out_path=outpath, out_name=outname,
                                        geometry_type='POINT',
                                        spatial_reference=arcpy.Describe(input_shape).spatialReference)
    for field in fieldobjects:
        arcpy.AddField_management(in_table=os.path.join(outpath, outname),
                                  field_name=field.name, field_type=field.type,
                                  field_length=field.length)
    cursor = arcpy.da.InsertCursor(os.path.join(outpath, outname), fields_to_keep + ['SHAPE@'])
    for point in y:
        cursor.insertRow(point)
    del cursor
    counter += 1
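The chunks generator is plain Python and can be checked in isolation before running the full arcpy workflow (range stands in for xrange outside Python 2; the integer rows are stand-ins for cursor rows):

```python
def chunks(l, n):
    """Split a list into consecutive slices of at most n items."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

rows = list(range(250000))          # stand-in for 250'000 cursor rows
parts = list(chunks(rows, 100000))  # two full chunks plus a remainder
```

Note that the workflow above still loads every point into pointlist first; if that list does not fit in memory, one option is to read the SearchCursor in fixed-size batches (e.g. with itertools.islice) and write each batch out before reading the next, so at most one chunk is held in memory at a time.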
BERA
  • Thank you very much for your help! I've edited my question and added the field names which I need for the further processing. – Sibir May 09 '17 at 12:40
  • I have changed the code so that the fields will remain – BERA May 09 '17 at 13:08
  • Unfortunately this error message always occurs (my computer has 32 GB RAM and I never had memory issues so far...): Runtime error Traceback (most recent call last): File "", line 11, in MemoryError Any ideas of how to solve this? – Sibir May 09 '17 at 20:00