Intersect Shapefiles using Shapely

Question

What I'd like to try and do is intersect some point buffers with census tracts, maintain their ID's and the tracts attributes.

I found a sample from http://www.macwright.org/2012/10/31/gis-with-python-shapely-fiona.html but can't exactly substitute the cascaded union for an intersect. I'm also having trouble getting the intersection function to import from shapely.ops.

from shapely.geometry import Point, mapping, shape
from fiona import collection
#from shapely.ops import intersection
import shapely.ops

bufSHP = 'data/h1_buf.shp'
intSHP = 'data/h1_buf_int_ct.shp'
ctSHP  = 'data/nyct2010.shp'
with collection(bufSHP, "r") as input:
    schema = input.schema.copy()
    with collection(intSHP, "w", "ESRI Shapefile", schema) as output:
        shapes = []
        for f in input:
            shapes.append(shape(f['geometry']))
        merged = shapes.intersection(ctSHP)
        output.write({
            'properties': {
                'uid': point['properties']['uid']
                },
            'geometry': mapping(merged)
            })

I've also got the code and sample shapefiles up here https://github.com/nygeog/questions/tree/master/shapely_intersect

UPDATE:

So for some reason using Shapely, the larger buffers (1 km) I lose some of the features (census tracts) in the intersect.

Also, since the Shapely responder said the OGR command is good to use I started looking for into that. Anyway, I was able to test cd'ing to the directory and it works great. However, I've been trying to experiment with it so I can loop through the many sized buffers and dif. census years. However, why is it that I cannot get the following script to work. Can I not substitute paths for the folder 'data' in the ogr2ogr response below?

import os
wp = '/Volumes/Echo/GIS/projects/naas/tasks/201411_geoprocessing/data/processing/'
folder = wp+'test'
theGDALcmd = 'ogr2ogr -sql "SELECT ST_Intersection(A.geometry, B.geometry) AS geometry, A.*, B.* FROM nyct2010 A, hr1000m B WHERE ST_Intersects(A.geometry, B.geometry)" -dialect SQLITE '+folder+' '+folder+' -nln hr1000m_int_ct10_ogr_pytest'
os.system(theGDALcmd)

The error I get is

sh: ogr2ogr: command not found

However, when I go to the terminal, I have access to ogr2ogr.

UPDATE2:

Mike T include some code so I can run this ogr2ogr command in python.

Totally love how you put your question and example neatly up on github! +1! This is pretty simple if you have access to a geodatabase, if all you're looking for is an answer for these specific layers. — janechii, Oct 22 '14 at 22:45

Mike T · Accepted Answer · 2014-10-23T20:40:33.027

18

While I'm a big user of both shapely and fiona, I wouldn't go this approach. This is a task of writing an effective SQL statement.

Using ogr2ogr with an SQLITE dialect, you can process this from a command line. Change directory to one before the shapefiles, so that all of the shapefiles are in one directory called data. OGR treats directories of shapefiles as datasets. Now try this command:

ogr2ogr -sql "SELECT ST_Intersection(A.geometry, B.geometry) AS geometry, A.*, B.* FROM nyct2010 A, h1_buf B WHERE ST_Intersects(A.geometry, B.geometry)" -dialect SQLITE data data -nln h1_buf_int_ct2

This makes h1_buf_int_ct2.shp with the same attributes as A and B, showing the intersections. You could also have done the buffering operation somewhere in there too.

The same SQL statement works with PostGIS too, so you know.

Here's how to do the same from within Python using ogr:

from osgeo import ogr
ogr.UseExceptions()
ogr_ds = ogr.Open('/path/to/data', True)  # Windows: r'C:\path\to\data'
SQL = """\
    SELECT ST_Intersection(A.geometry, B.geometry) AS geometry, A.*, B.*
    FROM nyct2010 A, h1_buf B
    WHERE ST_Intersects(A.geometry, B.geometry);
"""
layer = ogr_ds.ExecuteSQL(SQL, dialect='SQLITE')
# copy result back to datasource as a new shapefile
layer2 = ogr_ds.CopyLayer(layer, 'h1_buf_int_ct3')
# save, close
layer = layer2 = ogr_ds = None

edited Oct 23 '14 at 20:40

answered Oct 23 '14 at 03:30

Mike T

42,095
10
126
187

I tried this command and got the following error: FAILURE: Unable to open datasource `data' with the following drivers. Should I modify this code to read 'ESRI Shapefile' where 'data' is listed? – GIS Danny Oct 23 '14 at 15:06
Nice (first time I cd'd to data, but reread and it worked). Output seems the same as the above but did get this message: Warning 1: organizePolygons() received an unexpected geometry. Either a polygon with interior rings, or a polygon with less than 4 points, or a non-Polygon geometry. Return arguments as a collection. Warning 1: Geometry of polygon of fid 2016 cannot be translated to Simple Geometry. All polygons will be contained in a multi polygon. --- Should I worry about that message? – GIS Danny Oct 23 '14 at 15:12
Ahh, and I could probably put this in a python script by using os.system(theGDALCommandAbove) – GIS Danny Oct 23 '14 at 15:16
Nice answer too. I made an answer but found it was complicated to make the job with Shapely compared to a SQL statement too (PostGIS / Spatialite lover) – ThomasG77 Oct 23 '14 at 15:37
@GISDanny you don't need to use an os.system call, see updated answer – Mike T Oct 23 '14 at 20:40
Crazy timing! Also, I'm not sure what the warning is. Shapefiles sometimes allow geometries that are not regarded to be valid, and perhaps the source shapefile has some of these. – Mike T Oct 23 '14 at 20:46
Also, in general, when using ogr should all my shapefiles be in the same folder? (Sorry for all these questions, I'm quickly trying to move from arcpy to FOSS4G) – GIS Danny Oct 23 '14 at 20:47
@GISDanny The layers referenced in the SQL query need to be in the same directory, but the CopyLayer can write the result to a different folder and/or a different file format (e.g., to write to any other vector format listed here where "creation" is supported). – Mike T Oct 23 '14 at 20:54
I really appreciate the snippet (and that this is possible from the commandline) but I'm unable to get this working. Is it possible to include two inputs for ogr2ogr processes?
When I run this I get unable to open datasource data: ogr2ogr -sql "SELECT ST_Intersection(A.geometry, B.geometry) AS geometry, A.*, B.* FROM footprint_5degrees A, data_index B WHERE ST_Intersects(A.geometry, B.geometry)" -dialect SQLITE data data -nln h1_buf_int_ct2
– Charlie Lefrak Aug 25 '16 at 17:00
@CharlieHofmann you'd have to work out if you can read the datasource, e.g. ogrinfo -so data footprint_5degrees and similar for data_index. – Mike T Aug 29 '16 at 20:59
2

Thanks @MikeT. Worked it out by using a VRT to store both datasets as in the example here: http://gis.stackexchange.com/questions/147820/st-intersect-with-ogr2ogr-and-spatialite – Charlie Lefrak Aug 30 '16 at 10:55

score 5 · Answer 2 · answered Oct 23 '14 at 02:10

Some issues in your code:

you only use correctly one table as input whereas, you should use both input bufSHP and ctSHP
you want to make an intersection between a list of shape and a filename with shapes.intersection(ctSHP) whereas you have to do an intersection between two shape elements

See below a possibility,

I choose to use Rtree to optimize, based on this blog post

It works but any improvement on my code is also welcome...

import fiona
from shapely.geometry import shape, mapping
import rtree

bufSHP = 'data/h1_buf.shp'
intSHP = 'data/h1_buf_int_ct.shp'
ctSHP  = 'data/nyct2010.shp'

with fiona.open(bufSHP, 'r') as layer1:
    with fiona.open(ctSHP, 'r') as layer2:
        # We copy schema and add the  new property for the new resulting shp
        schema = layer2.schema.copy()
        schema['properties']['uid'] = 'int:10'
        # We open a first empty shp to write new content from both others shp
        with fiona.open(intSHP, 'w', 'ESRI Shapefile', schema) as layer3:
            index = rtree.index.Index()
            for feat1 in layer1:
                fid = int(feat1['id'])
                geom1 = shape(feat1['geometry'])
                index.insert(fid, geom1.bounds)

            for feat2 in layer2:
                geom2 = shape(feat2['geometry'])
                for fid in list(index.intersection(geom2.bounds)):
                    if fid != int(feat2['id']):
                        feat1 = layer1[fid]
                        geom1 = shape(feat1['geometry'])
                        if geom1.intersects(geom2):
                            # We take attributes from ctSHP
                            props = feat2['properties']
                            # Then append the uid attribute we want from the other shp
                            props['uid'] = feat1['properties']['uid']
                            # Add the content to the right schema in the new shp
                            layer3.write({
                                'properties': props,
                                'geometry': mapping(geom1.intersection(geom2))
                            })

This answer is awesome. All I had to do was install RTree. http://toblerity.org/rtree/install.html — GIS Danny, Oct 23 '14 at 15:04
How to intersect two geojson files using draw tool then find the intersection of the one feature with the other feature with respective to the draw ed plygon — user28536, Aug 30 '16 at 06:09
@ThomasG77 What is the fid != int(feat2['id']) check supposed to accomplish? Is this needed if the two layers definitely don't have features in common? — derNincompoop, Feb 07 '17 at 20:41

Intersect Shapefiles using Shapely

UPDATE:

UPDATE2:

2 Answers2

Linked

Related