1

I have a huge dataset (millions of points) and a irregular grid (quadtree) in shapefile format and I need to convert the points in countes over the grid provided. I managed to make it work, using shapely Polygon.contains and a naive double-loop that scales as O(n**2).

Obviously, the script is very slow. It takes already very long on thousands of points. Is there any other way to greatly reduce running time, without writing a lot more code?

What's the typical procedure to convert so many points to a grid system, and how is it called, so that I can search further information about it?

PolyGeo
  • 65,136
  • 29
  • 109
  • 338
themiurgo
  • 131
  • 1
  • 2

1 Answers1

2

Rather than checking every point against every polygon in your grid, you can use a spatial index to speed things up. The spatial index allows you to search for polygons with a bounding box that intersects the query bounding box. In your particular case, as the polygons in your quadtree will be square and your search for points, you don't even need to call contains.

There is some example code using Shapely and Rtree in this answer:

https://gis.stackexchange.com/a/144764/12420

Snorfalorpagus
  • 4,934
  • 3
  • 34
  • 51