Moran’s I, a measure of spatial autocorrelation, is not a particularly robust statistic (it can be sensitive to skewed distributions of the spatial data attributes).

What are some more robust techniques for measuring spatial autocorrelation? I’m particularly interested in solutions that are readily available/implementable in a scripting language like R. If solutions apply to unique circumstances/data distributions, please specify those in your answer.


EDIT: I’m expanding the question with a few examples (in response to comments/answers to the original question).

It’s been suggested that permutation techniques (where a Moran’s I sampling distribution is generated using a Monte Carlo procedure) offer a robust solution. My understanding is that such a test eliminates the need to make assumptions about the distribution of Moran’s I (given that the test statistic can be influenced by the spatial structure of the dataset), but I fail to see how the permutation technique corrects for non-normally distributed attribute data. I offer two examples: one demonstrating the influence of skewed data on the local Moran’s I statistic, the other on the global Moran’s I, even under permutation tests.
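For concreteness, the permutation procedure I’m referring to can be sketched as follows (a toy pure-Python illustration of the general idea, not spdep’s actual implementation; all function names are my own): attribute values are repeatedly shuffled across locations, Moran’s I is recomputed for each shuffle, and the observed statistic is compared against the resulting reference distribution.

```python
import random

def morans_i(x, w):
    """Global Moran's I for attribute values x and an n-by-n spatial weight matrix w."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]                      # deviations from the mean
    s0 = sum(sum(row) for row in w)                  # sum of all weights
    num = sum(w[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))   # weighted cross-products
    den = sum(d * d for d in dev)
    return (n / s0) * num / den

def moran_perm_test(x, w, nsim=999, seed=1):
    """Monte Carlo p-value: randomly reassign the values to locations nsim times."""
    rng = random.Random(seed)
    observed = morans_i(x, w)
    xs = list(x)
    hits = 1                                         # the observed value counts as one realisation
    for _ in range(nsim):
        rng.shuffle(xs)                              # 'jumble up space'
        if morans_i(xs, w) >= observed:
            hits += 1
    return observed, hits / (nsim + 1)

# Toy example: four locations on a line (path graph), clustered values
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
obs, p = moran_perm_test([1, 1, 5, 5], w, nsim=999)
```

The toy dataset is far too small for a meaningful test (only a handful of distinct arrangements exist); it is only meant to show the mechanics of the procedure.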

I'll use Zhang et al.'s (2008) analysis as the first example. In their paper, they show the influence of the attribute data's distribution on the local Moran’s I using permutation tests (9999 simulations). I’ve reproduced the authors’ hotspot results for lead (Pb) concentrations (at the 5% significance level) in GeoDa, using the original data (left panel) and a log transformation of that same data (right panel). Boxplots of the original and log-transformed Pb concentrations are also presented. Here, the number of significant hot spots nearly doubles when the data are transformed; this example shows that the local statistic is sensitive to the attribute data's distribution, even when using Monte Carlo techniques!

[Figure: local Moran's I hotspot maps for the original (left) and log-transformed (right) Pb concentrations, with boxplots of each distribution]

The second example (simulated data) demonstrates the influence skewed data can have on the global Moran’s I, even when using permutation tests. An example, in R, follows:

library(spdep)
library(maptools)
NC <- readShapePoly(system.file("etc/shapes/sids.shp", package="spdep")[1],ID="FIPSNO", proj4string=CRS("+proj=longlat +ellps=clrk66"))
rn <- sapply(slot(NC, "polygons"), function(x) slot(x, "ID"))
NB <- read.gal(system.file("etc/weights/ncCR85.gal", package="spdep")[1], region.id=rn)
n  <- length(NB)
set.seed(4956)
x.norm <- rnorm(n) 
rho    <- 0.3          # autoregressive parameter
W      <- nb2listw(NB) # Generate spatial weights
# Generate autocorrelated datasets (one normally distributed the other skewed)
x.norm.auto <- invIrW(W, rho) %*% x.norm # Generate autocorrelated values
x.skew.auto <- exp(x.norm.auto) # Transform original data to create a 'skewed' version
# Run permutation tests
MCI.norm <- moran.mc(x.norm.auto, listw=W, nsim=9999)
MCI.skew <- moran.mc(x.skew.auto, listw=W, nsim=9999)
# Display p-values
MCI.norm$p.value;MCI.skew$p.value

Note the difference in p-values. The skewed data indicate that there is no clustering at the 5% significance level (p = 0.167), whereas the normally distributed data indicate that there is (p = 0.013).


Chaosheng Zhang, Lin Luo, Weilin Xu, Valerie Ledwith (2008). Use of local Moran's I and GIS to identify pollution hotspots of Pb in urban soils of Galway, Ireland. Science of The Total Environment, 398(1–3), 212–221.

MannyG
  • Do you have a reference for the sensitivity to skewed distributions? Are you interested in global tests of non-random spatial distribution or identifying local abnormal features? What is the distribution of the outcome of interest (a positive count variable)? – Andy W May 01 '12 at 15:03
  • @AndyW: 1) One reference to the test's sensitivity is Fortin and Dale's 'Spatial Analysis: A Guide for Ecologists' (p. 125); 2) I'm interested in solutions for both global and local tests; 3) I have no specific data distribution in mind. – MannyG May 01 '12 at 15:13
  • Andy, because Moran's I is based on weighted variance and covariance estimates, it will have the same sensitivity to outliers as those estimates do, which (as is well known) is considerable. This insight also points the way to many possible solutions to Manny's problem: substitute your favorite robust versions of estimates of dispersion and association to form a robust weighted correlation and you're off and running. – whuber May 01 '12 at 16:18
  • I have to say I'm still not convinced of the sensitivity (the cited reference by Manny makes no references to other literature either). If we are interested in simply a global test of spatial randomness, when we utilize permutation tests to generate the test distribution I don't see how the test distribution is sensitive to the original distribution of the data. – Andy W May 01 '12 at 18:05
  • Andy, here's an example of the distribution's influence on Moran's I using R: library(spdep);data(columbus);attach(columbus); MC1 <- moran.mc(PLUMB,colqueen,999); MC2 <- moran.mc(log(PLUMB),colqueen,999). Here, the same dataset is used with different distributions (PLUMB being heavily skewed, log(PLUMB) being less so) yet both MC1 and MC2 outputs differ. – MannyG May 01 '12 at 18:54
  • It sounds like you may be conflating several concepts here, @Andy. First, Manny wants to measure autocorrelation; he's not necessarily conducting a hypothesis test. Second, the question with hypothesis testing is best framed in terms of power rather than robustness. But (third) the concepts do have a connection: a robust test statistic will tend to maintain its power under a wide range of violations of distributional assumptions (such as contamination by outliers) whereas a non-robust test statistic may lose most or all of its power in those situations. – whuber May 01 '12 at 19:27
  • @MannyG I figure you have not received the answer you were looking for on this page. I have posted a similar question here (http://gis.stackexchange.com/questions/171127/algorithms-for-spatial-clustering). Have you reached any conclusion so far? Do you have any relevant references you might suggest? – FaCoffee Nov 22 '15 at 17:24
  • @FC84, I revisited this issue last year and wrote up a proposed solution. But it needs vetting. I plan to offer a (much) reduced version of that write-up as an answer here at some point. Feel free to glean what you can from what I have. But use it with caution! – MannyG Nov 23 '15 at 02:19
  • Have you published anything yet? I mean, a paper? There could be something interesting here: https://www.researchgate.net/publication/223786075_The_Moran_Coefficient_for_Non-Normal_Data – FaCoffee Nov 23 '15 at 13:59
  • Also, someone pointed out that a transformation could be applied to the original datasets to account for their non-normality. Being absolutely new to all this, is this grounds for further reasoning or is it just foolishness? What kind of transformation could be useful, if any? Your write-up is a couple of steps in the right direction, but I fail to see how it could be cited. Hints? – FaCoffee Nov 23 '15 at 19:43
  • @FC84, nothing's been published (other than what's on my web page). It might be a while (if ever) before I attempt publishing any of this. I would recommend running your data through a series of power transformations until you find a power that symmetrizes your data, if skew is a problem. – MannyG Nov 24 '15 at 00:26
  • Skewness is not a consistent feature of my maps. It appears, at times, but if I were to adjust for it I would end up having some new problems where skewness is not present. Also, I was wondering: could the effect played by the shape of the distribution be somehow lessened by the range of values involved in the maps? In your example your range is 2.99 to 7.15, while in my case it is 0.0 to 1.0. I think this is in agreement with @Andy W's last answer. – FaCoffee Nov 24 '15 at 09:17

1 Answer

(This is just too unwieldy at this point to turn into a comment)

This concerns local and global hypothesis tests (not a specific, sample-independent measure of autocorrelation). While I can appreciate that the Moran's I statistic is a biased estimate of correlation (interpreting it in the same terms as the Pearson correlation coefficient), I still don't see how the permutation hypothesis test is sensitive to the original distribution of the variable (in terms of either Type I or Type II errors).

Slightly adapting the code you provided in the comment (the spatial weights object colqueen was missing):

library(spdep)
data(columbus)
attach(columbus)

colqueen <- nb2listw(col.gal.nb, style="W") #weights object was missing in original comment
MC1 <- moran.mc(PLUMB,colqueen,999)
MC2 <- moran.mc(log(PLUMB),colqueen,999)
par(mfrow = c(2,2))
hist(PLUMB, main = "Histogram PLUMB")
hist(log(PLUMB), main = "Histogram log(PLUMB)")
plot(MC1, main = "999 perm. PLUMB")
plot(MC2, main = "999 perm. log(PLUMB)")

When one conducts permutation tests (in this instance, I like to think of it as jumbling up space), the hypothesis test of global spatial autocorrelation should not be affected by the distribution of the variable, as the simulated test distribution will in essence change with the distribution of the original variable. One could likely come up with more interesting simulations to demonstrate this, but as you can see in this example, the observed test statistic is well outside the generated distribution for both the original PLUMB and the logged PLUMB (which is much closer to a normal distribution), although the test distribution under the null for logged PLUMB does shift closer to symmetry about 0.

[Figure: histograms of PLUMB and log(PLUMB), with the corresponding 999-permutation Moran's I null distributions for each]
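To make the "jumbling up space" point concrete, here is a small self-contained simulation (a pure-Python sketch of my own, not spdep code; all names are illustrative): under the null of spatial randomness, the permutation test rejects at roughly the nominal 5% rate even for heavily skewed (log-normal) attribute data, because the permutation reference distribution is rebuilt from the data's own values and the test is exact by construction.

```python
import random

def grid_edges(rows, cols):
    """Rook-contiguity edges for a rows-by-cols grid (binary symmetric weights)."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                edges.append((i, i + 1))      # right neighbor
            if r + 1 < rows:
                edges.append((i, i + cols))   # lower neighbor
    return edges

def morans_i(x, edges):
    """Global Moran's I for binary symmetric weights given as an edge list."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    s0 = 2 * len(edges)                       # each edge contributes w_ij = w_ji = 1
    num = 2 * sum(dev[i] * dev[j] for i, j in edges)
    den = sum(d * d for d in dev)
    return (n / s0) * num / den

def perm_pvalue(x, edges, nsim, rng):
    """One-sided Monte Carlo p-value for positive spatial autocorrelation."""
    observed = morans_i(x, edges)
    xs = list(x)
    hits = 1                                  # observed value counts as one realisation
    for _ in range(nsim):
        rng.shuffle(xs)
        if morans_i(xs, edges) >= observed:
            hits += 1
    return hits / (nsim + 1)

# Under spatial randomness, heavily skewed (log-normal) attributes should still
# be rejected at roughly the nominal 5% rate.
rng = random.Random(7)
edges = grid_edges(4, 4)
reps, rejections = 200, 0
for _ in range(reps):
    x = [rng.lognormvariate(0, 1) for _ in range(16)]   # iid skewed values, no structure
    if perm_pvalue(x, edges, nsim=99, rng=rng) <= 0.05:
        rejections += 1
rate = rejections / reps    # close to 0.05 in expectation
```

This doesn't speak to power against real autocorrelation (Type II error), only to the size of the test under the null, which is the narrower claim made above.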

I was going to suggest this as an alternative anyway: transforming the distribution to be approximately normal. I was also going to suggest looking up resources on spatial filtering (and, similarly, the local and global Getis-Ord statistics), although I'm not sure this will help with a scale-free measure either (but it may perhaps be fruitful for hypothesis tests). I will post back later with potentially more literature of interest.
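As a sketch of the kind of robust variant whuber alludes to in the comments (an illustration only, not a vetted method): compute Moran's I on the ranks of the attribute values, analogous to replacing a Pearson with a Spearman correlation. Ranks are invariant under strictly increasing transformations, so skewness introduced by, say, exponentiation leaves the rank-based statistic unchanged, while the raw statistic changes.

```python
import math

def morans_i(x, w):
    """Global Moran's I for values x and an n-by-n weight matrix w."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    s0 = sum(sum(row) for row in w)
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * num / den

def to_ranks(x):
    """Ranks 1..n (assumes no ties, for simplicity)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0] * len(x)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks

def rank_morans_i(x, w):
    """A Spearman-style variant: Moran's I computed on the ranks of x."""
    return morans_i(to_ranks(x), w)

# Ranks are unchanged by the monotone transform exp(), so the rank-based
# statistic is identical for x and exp(x), unlike the raw Moran's I.
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
x = [0.2, 1.1, 3.0, 2.4]
skewed = [math.exp(v) for v in x]
```

Whether this retains enough power, and how it compares to transforming the data first, would still need checking against simulations like the ones above.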

Andy W
  • Thanks Andy for your detailed account. If I understand you correctly, you're implying that in a permutation test the test statistic (Moran's I) will not change relative to the resulting MC distribution, but this does not agree with my observations. For example, if we use the HOVAL variable in the same columbus dataset, the resulting MC Moran's I test p-value goes from 0.029 (with the original skewed data) to 0.004 (with the log-transformed data), indicating a widening gap between the MC distribution and the test statistic--not insignificant if we had set the threshold at 1%. – MannyG May 01 '12 at 20:57
  • Yes, you are interpreting my point correctly. It is certainly possible to find any particular run in which the results differ. The question becomes whether or not the error rates are the same under a variety of circumstances. – Andy W May 01 '12 at 21:22