
I am doing a classification of the z values over a single raster of dimensions x, y, z.

library(raster)

# 200 x 400 raster (80,000 cells) filled with random class values 1-100
r <- raster(nrows = 200, ncols = 400, xmn = -1, xmx = 1, ymn = -1, ymx = 1)
r[] <- sample(1:100, size = ncell(r), replace = TRUE)

I want to do K-Means with Spatial Constraints:

Any of the solutions presented here would suit me, but my computer is too slow for them.

There must be a way to better leverage the raster format to make the process more efficient.

Any clue?

Kadir Şahbaz
J Lal

1 Answer


The problem is that you need to have all of the data on hand to find a cluster solution; it is not something that you can do piecemeal. You may be best served coercing your raster stack to a SpatialPixelsDataFrame, adding the [x, y] coordinates, and clustering the data.frame in the @data slot. This will impose a naive spatial structure on the clustering solution. You could also start exploring true spatial constraints, but that is computationally expensive.
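For example, here is a minimal sketch of that coordinate-weighting idea applied to the single raster in the question. It is not part of the original answer: the object name spx, the spatial weight w, and k = 10 are illustrative assumptions (a larger w pushes the solution toward more spatially compact clusters).

library(raster)
library(cluster)

# single-band raster from the question
r <- raster(nrows = 200, ncols = 400, xmn = -1, xmx = 1, ymn = -1, ymx = 1)
r[] <- sample(1:100, size = ncell(r), replace = TRUE)

# coerce to sp and append the cell coordinates as extra columns
spx <- as(r, "SpatialPixelsDataFrame")
spx@data <- data.frame(coordinates(spx), spx@data)

# scale everything and up-weight the x/y columns to favour compact clusters
w <- 2                          # hypothetical spatial weight; tune to taste
d <- scale(spx@data)
d[, 1:2] <- d[, 1:2] * w

# clara = medoid clustering for large n; write the labels back and rasterize
spx@data <- data.frame(k = clara(d, k = 10)$clustering)
plot(raster(spx))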

Here is a simple example of creating some structured raster data, coercing to an sp class, clustering the data using clara (a large-data version of k-medoids), and then pulling the result back into a raster. However, you do need to be prepared for this being very computationally expensive on real raster data. Consider that a medium-sized raster with 2,500 rows/columns and 10 parameters will yield 62,500,000 data values.

library(raster)
library(spatialEco)
library(cluster)

r <- raster(nrows=250, ncols=250)
r[] <- runif(ncell(r))
r <- focal(r, gaussian.kernel(sigma = 8, n = 11), mean)

r1 <- raster(nrows=250, ncols=250)
r1[] <- pnorm(runif(ncell(r1)), mean = 0.5, sd = 2)
r1 <- focal(r1, gaussian.kernel(sigma = 8, n = 11), mean)

r2 <- raster(nrows=250, ncols=250)
r2[] <- qbinom(runif(ncell(r2)), 10, 1/3)
r2 <- focal(r2, gaussian.kernel(sigma = 8, n = 11), mean)

r <- stack(r, r1, r2)
dev.new(height=8, width=11)
plot(r)

Here we coerce to an sp raster object and add coordinates.

r <- as(r, "SpatialPixelsDataFrame")
r@data <- data.frame(coordinates(r), r@data)

Now we can center, scale, and cluster the data, assigning the result directly to the sp object. We can then coerce back to a RasterLayer object using raster::raster.

r@data <- data.frame(k = clara(scale(r@data), k = 10)$clustering)
r <- raster(r)
dev.new(height=8, width=11)
plot(r)

Please note that you should have some strategy in mind for finding supported clusters. Here is an example of using silhouette widths to find an optimal cluster solution and a supported k.

x <- rbind(cbind(rnorm(10,0,0.5), rnorm(10,0,0.5)), cbind(rnorm(15,5,0.5), rnorm(15,5,0.5)))

dev.new(height=8, width=11)
par(mfrow=c(2,2))
clust <- optimal.k(x, 20, plot=TRUE, cluster=TRUE)
plot(silhouette(clust), col = c('red', 'green'))
plot(clust, which.plots=1, main='K-Medoid fit')

With large data you could plausibly take a sampling approach on the @data slot (before it is overwritten with the cluster ids), following something like:

optimal.k(r@data[sample(1:nrow(r@data),
  round(nrow(r@data) * 0.05, 0)), ], nk = 10)
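Extending that sampling idea, one could fit clara on the subset and then label every remaining cell by its nearest medoid. The sketch below is not part of the original answer: spx stands for the SpatialPixelsDataFrame built above (before its @data slot is overwritten with cluster ids), and the 5% sample fraction and Euclidean nearest-medoid assignment are assumptions.

library(raster)
library(cluster)

# fit clara on a 5% sample of the scaled data
d   <- scale(spx@data)
idx <- sample(1:nrow(d), round(nrow(d) * 0.05, 0))
fit <- clara(d[idx, ], k = 10)

# label every cell by its nearest medoid (Euclidean distance in scaled space)
med <- fit$medoids
assign.k <- apply(d, 1, function(z) which.min(colSums((t(med) - z)^2)))

spx@data <- data.frame(k = assign.k)
plot(raster(spx))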
Jeffrey Evans
  • Thank you for your solution. I considered adding the coordinates to the clustering step but unfortunately this doesn't seem to constrain contiguity. I am happy, however, that you presented the cluster::clara as well as the spatialEco::optimal.k functions which will likely help me through the process. – J Lal Dec 07 '20 at 15:16
  • So, the rub here is that there is a difference between constrained and conditional approaches to solving for contiguous clusters. For multivariate spatially contiguous clusters you need an optimization approach or a way to assign large weights in the spatial adjacency. I would make the argument that merely weighting space does not necessarily provide an "optimal" solution to the multivariate problem. There is too much complexity occurring in the state space while finding an optimal k with weights. This is why it becomes an optimization problem best solved with something like simulated annealing or an MCMC. – Jeffrey Evans Dec 07 '20 at 16:43
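As an editorial illustration of the spatially constrained direction discussed in this comment thread, here is a minimal sketch using SKATER (minimum-spanning-tree partitioning) from the spdep package. It is not from the original thread, and the aggregation factor, the 8-nearest-neighbour graph, and the number of cuts are all assumptions.

library(raster)
library(spdep)

# 'r' is assumed to be the three-band stack built in the answer (before coercion);
# coarsen it first, since SKATER is expensive at full resolution
r.agg <- aggregate(r, fact = 5)
spx <- as(r.agg, "SpatialPixelsDataFrame")
xy  <- coordinates(spx)
d   <- scale(spx@data)

# neighbour graph on the pixel centres, with edge costs = attribute dissimilarity
nb   <- knn2nb(knearneigh(xy, k = 8), sym = TRUE)
cost <- nbcosts(nb, d)
lw   <- nb2listw(nb, glist = cost, style = "B")
mst  <- mstree(lw)

# cut the minimum spanning tree into 10 spatially contiguous clusters
sk <- skater(mst[, 1:2], d, ncuts = 9)
spx@data <- data.frame(k = sk$groups)
plot(raster(spx))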