
I'm looking to calculate focal statistics for each cell of a raster, within a neighborhood defined by specified criteria.

Background - I have three binary rasters, each representing a single vegetation type of interest. I'd like to calculate the percent coverage of each vegetation type within (e.g.) 20 km^2 of any cell in my study area (sum / total cells in neighborhood). The problem is that I can't use a simple circular or square neighborhood around each cell, because the search area used to calculate the sum would then incorporate areas outside my study area. This matters because the statistics will be used as inputs for a habitat model, and the areas outside my study area cannot be considered possible habitat - they're urbanized. Including them would give me erroneous statistics.

So, what I'm looking to do is write some Python code that will choose a neighborhood representing the n nearest cells that meet my criterion (n being the number of cells required to cover an area equal to my desired neighborhood size), the criterion being that they do not fall within an urbanized area. I'm thinking some form of cellular automaton could be used, though I've never worked with CA.
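A minimal sketch of the "n nearest available cells" idea, using a K-D tree over the centres of non-urban cells. All array names, sizes, and values here are hypothetical stand-ins for the real rasters, not actual data:

```python
import numpy as np
from scipy.spatial import cKDTree

cell_size = 30.0                          # m, Landsat-style cells
area_m2 = 20 * 1e6                        # 20 km^2 target neighborhood
n = int(round(area_m2 / cell_size**2))    # cells needed to cover that area

# Hypothetical stand-ins for the real binary rasters
rng = np.random.default_rng(0)
veg = rng.integers(0, 2, size=(200, 200))       # 1 where the vegetation type occurs
urban = rng.random((200, 200)) < 0.2            # True where urbanized (unavailable)

# Coordinates (in metres) of every available (non-urban) cell centre
rows, cols = np.nonzero(~urban)
coords = np.column_stack([cols, rows]) * cell_size
tree = cKDTree(coords)

def percent_cover(row, col):
    """Percent vegetation cover in the n nearest available cells."""
    _, idx = tree.query([col * cell_size, row * cell_size], k=n)
    return 100.0 * veg[rows[idx], cols[idx]].mean()

print(round(percent_cover(100, 100), 1))
```

Because the tree holds only available cells, the neighborhood automatically changes shape near urbanized areas while always covering the same number of cells.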

I guess what I'd like is something like starter code, or a point in the right direction.


REPLY TO COMMENT BELOW:

Let's say I'm calculating this statistic for a cell on the boundary of my study site. If I assign all areas outside the study area to zero (or ignore NoData), I get a statistic that represents roughly half of the areal coverage I'm interested in - percent coverage in a ~10 km^2 area instead of a 20 km^2 area. Since I'm studying home range sizes, this is important. The neighborhood has to change shape, because that is how the animal views/uses the landscape: if they need 20 km^2, they'll change the shape of their home territory. If I don't check Ignore NoData, the cell output will be NoData - and NoData is no help.
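The halved-coverage effect can be illustrated on a toy raster with a fixed square window (the numbers are illustrative only, not real data):

```python
import numpy as np

veg = np.ones((100, 100))                         # toy raster: 100% cover everywhere
padded = np.pad(veg, 50, constant_values=np.nan)  # NoData outside the study area

def focal_pct(arr, r, c, half=50):
    """Percent cover in a fixed 100x100-cell window, NoData counted as zero cover."""
    win = arr[r - half:r + half, c - half:c + half]
    return 100.0 * np.nansum(win) / win.size

print(focal_pct(padded, 100, 100))  # interior cell: full window inside the study area
print(focal_pct(padded, 50, 100))   # edge cell: half the window falls on NoData
```

The interior cell reports 100% cover, but the edge cell reports only 50% even though every study-area cell around it is vegetated, which is exactly the bias described above.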


"PROGRESS" AS OF 10/24/2014

Here is the code I've come up with so far using Shapely and Fiona:

import math

import fiona
from shapely.geometry import MultiPolygon, shape

traps = fiona.open('C:/Users/Curtis/Documents/ArcGIS/GIS_Data/occurrence/ss_occ.shp', 'r')

study_area = fiona.open('C:/Users/Curtis/Documents/ArcGIS/GIS_Data/Study_Area.shp', 'r')
for rec in study_area:  # the study area is a single-record polygon layer
    sa = shape(rec['geometry'])

grassland = fiona.open('C:/Users/Curtis/Documents/ArcGIS/GIS_Data/land_cover/polys_for_aa/class3_aa.shp', 'r')
gl = MultiPolygon([shape(rec['geometry']) for rec in grassland])

areaKM2 = 20
target_m2 = areaKM2 * 1e6  # target neighborhood area in m^2 (coordinates are in metres)
base_r = math.sqrt(target_m2 / math.pi)  # radius of a circle with the target area
with traps as trap_points:
    for point in trap_points:
        pt = shape(point['geometry'])
        r = base_r
        pt_buff = pt.buffer(r)
        avail_area = pt_buff.intersection(sa).area
        # grow the buffer until enough of it falls inside the study area
        while avail_area < target_m2:
            r += 10
            pt_buff = pt.buffer(r)
            avail_area = pt_buff.intersection(sa).area

        perc_cov = 100 * pt_buff.intersection(gl).area / avail_area
        print(perc_cov)

Unfortunately, it's INCREDIBLY slow.
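One likely bottleneck is intersecting every buffer against the full grassland MultiPolygon. A spatial index such as Shapely's STRtree can prefilter candidate polygons by bounding box first. This is a sketch (Shapely >= 2.0 API, hypothetical geometries), not something profiled against the code above:

```python
from shapely.geometry import Point, box
from shapely.strtree import STRtree

# Hypothetical stand-ins for the grassland polygons read with Fiona
patches = [box(x, y, x + 50, y + 50) for x in range(0, 5000, 100)
           for y in range(0, 5000, 100)]
tree = STRtree(patches)

def cover_area(pt_buff):
    """Area of grassland inside the buffer, intersecting only indexed candidates."""
    idx = tree.query(pt_buff)  # Shapely >= 2.0 returns integer indices
    return sum(patches[i].intersection(pt_buff).area for i in idx)

buf = Point(2500, 2500).buffer(500)
print(round(cover_area(buf)))
```

Instead of clipping one giant MultiPolygon per buffer, only the handful of patches whose bounding boxes touch the buffer are intersected, which is usually a large speedup.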

CSB
    that is an interesting problem. You could set all cells outside your study area to NoData but I don't know how you are ever going to get a neighborhood to adapt and keep the same 20 sq km size (it would have to change shape). – jbchurchill Oct 16 '14 at 19:25
  • @CSB jbchurchill is right, the best thing to do here is to assign NoData values outside of your study area. The Focal Stats tool can treat those nodata values appropriately. See 'Processing cells of NoData' here http://resources.arcgis.com/en/help/main/10.1/index.html#//009z000000r7000000 – WhiteboxDev Oct 16 '14 at 19:44
  • @WhiteboxDev - Your suggestion won't solve my issue. I'll edit the above and explain why that won't work. – CSB Oct 16 '14 at 21:02
  • Have you seen this post, which discusses using Focal Statistics with a variable radius (http://gis.stackexchange.com/questions/34306/focal-statistics-with-variable-radius)? This seems to be your issue - cells on the edge should have a large radius and consider only a semicircular neighborhood. Of course, depending on your cell size, you may have to create many, many rasters to choose from. – phloem Oct 16 '14 at 21:33
  • 1
    @CSB You're going to run into edge effects regardless of whether you use NoData and a shrunken neighbourhood or if you change the shape/placement of your neighbourhood to ensure size. At least with the former, you won't be oversampling/representing near-edge data in a non-transparent manner. This is part of the infamous Modifiable Areal Unit Problem. – WhiteboxDev Oct 16 '14 at 21:37
  • @CSB Okay, I can respect an answer that refers to the physical process for logic so long as you're aware that you'll still have edge effects. Now let me ask, how large are your raster images? Can you fit the pixels in a K-D tree and perform a nearest neighbour analysis on x number of points, where x is the number of cells required to have a 20 km^2 area? Do you have any sample data that you can post? – WhiteboxDev Oct 16 '14 at 21:44
  • @phloem Yes, I saw that post. But my study area has a few peninsula-shaped regions where a semicircle won't work. I think it would work in some areas, but I need a single solution, like a growable region. – CSB Oct 16 '14 at 23:45
  • @WhiteboxDev I am aware that there will be edge effects, but that's a task for another day!

    I'm not familiar with KD trees. The data I'm working with right now is a land cover classification derived from Landsat8, so the resolution is 30m x 30m. Does this sound like something that could work in a KD tree?

    – CSB Oct 16 '14 at 23:59
  • K-D trees are more about space partitioning and rapid search (it basically being a binary search tree, but in 2 or more dimensions), than neighborhood analysis. Having said that, there is a decent Python implementation on the wikipedia page. However, have you considered using Postgis? You can load the rasters and use vectors to define your neighbourhoods and only do raster summary queries for those part that intersect your neighbourhood vectors. I love Python, don't get me wrong, but the built-in raster/vector overlay ops Postgis makes possible are very powerful. Just my 2c. – John Powell Oct 17 '14 at 10:34
  • @JohnBarça I've never used PostGIS, but it sounds really close to what I'm looking for. Would it allow me to define my neighborhood as the nearest n cells with my vector file? – CSB Oct 17 '14 at 20:39
  • In short, yes. I can't give a clearer answer without knowing how the neighborhood would be defined, but in principle, yes. – John Powell Oct 18 '14 at 07:49
  • the neighborhood would be defined simply as the nearest n "available" cells. "available" cells would be defined as those that are not developed/urbanized. – CSB Oct 18 '14 at 17:22
  • Not sure what tools you have or how big your cell sizes are, but you could create a euclidean distance raster for a cell, using the vegetation rasters as a template (snap) raster. Then iterate through the euclidean raster, selecting for ever-increasing distance until you exceed your target area (don't start at 1...some minimum based on cell size and a circular summary area). You can use the selected cell count as a proxy for area, and use the final selection as a mask for your analysis. Loop this process for each cell...this looks like it may be what you have done...expect to wait.. :) – Mike Jul 14 '15 at 02:11
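Mike's distance-raster suggestion can be sketched in NumPy: sort cells by distance from the target cell, then take the nearest ones until enough available cells are selected. Hypothetical random arrays stand in for the real rasters:

```python
import numpy as np

cell = 30.0
target_cells = int(20e6 // cell**2)        # cells needed for ~20 km^2

# Hypothetical stand-ins for the real availability and vegetation rasters
rng = np.random.default_rng(1)
available = rng.random((300, 300)) < 0.8   # False where urbanized
veg = rng.integers(0, 2, size=(300, 300))

def neighborhood_mask(r0, c0):
    """Nearest available cells around (r0, c0) totalling target_cells."""
    rr, cc = np.ogrid[:300, :300]
    dist = np.hypot(rr - r0, cc - c0) * cell
    order = np.argsort(dist, axis=None)            # all cells, nearest first
    avail_flat = available.ravel()[order]
    # walk outward until target_cells available cells are accumulated
    cutoff = np.searchsorted(np.cumsum(avail_flat), target_cells) + 1
    mask = np.zeros(300 * 300, dtype=bool)
    mask[order[:cutoff]] = avail_flat[:cutoff]
    return mask.reshape(300, 300)

m = neighborhood_mask(150, 150)
print(round(100.0 * veg[m].mean(), 1))     # percent cover in the grown neighborhood
```

This is the raster analogue of the growing-buffer approach in the question: the selection radius effectively expands until the available area matches the target.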

1 Answer


The code above is the eventual, and imperfect, answer I came up with for this problem. In the end I thought the best approach was to use a circular neighborhood and calculate the area intersecting my "available" area. (A circular neighborhood gives the n ~closest cells anyway, so there's no need to get too fancy with cellular automata.) If the area was too small, I just grew the circle until it wasn't.

It worked well but, as I noted, it was very slow. See this thread for suggestions on how to speed it up: Maximizing Code Performance for Shapely. Following those suggestions led to this thread: Understanding the Use of Spatial Indexes. I never applied an R-tree, because I never ended up using the code - I found I could approach the problem from an entirely different angle and save myself a lot of time/energy, so I did. Maybe I'll finish it one day...

CSB
  • Reading your code it looks like there is a good chance using a spatial index could speed up the code, often dramatically. Doing intersections against a MultiPolygon like that is very slow. – Snorfalorpagus Dec 22 '15 at 16:55
  • @Snorfalorpagus Yes, if you look at the answer, I point to two other threads that are related to this question. Both discuss using a spatial index. – CSB Dec 22 '15 at 16:58