0

This could be seen as a continuation of this question where the top accepted answer uses the spedp library in R in order to generate contiguous clusters of a raster image with multivariate data. In the code they provided (also below, with example dataset here, a raster un UTM Zone 35S containing the results of a supervised classification in 11 classes based on LANDSAT 30m imagery), as far as I understand it, a neighbor list is created with the dimensions of the raster, and then weights are calculated based on the euclidean distance and used to generate a Minimum Spanning Tree (MST) which then runs the SKATER algorithm in order to prune this tree based on feature distance. The user, unfortunately, did not provide a working example, nor the results of the code.

This is used effectively in tutorials for this package with polygons, but the problem with applying this to a raster seems to be that the MST for a raster, at least calculated with the methods available in spdep, will always provide a link where the next connected cell is the one below it until it hits an edge, so any possible pruning of the MST will result in your clusters being in a straight line rather than in a more natural shape.

An example clustering using the provided code

Finally to the question: Is there a way to circumvent this? Or is SKATER and its associated methods simply not suited for clustering raster data?

# 1. Load packages
packs <- list("tidyverse", "raster", "spdep", "parallel")
lapply(packs, require, character.only = T)

2. Load raster layers using raster()

layers <- brick("sample.img")

r1 <- layers[[1]] r2 <- layers[[2]] r3 <- layers[[3]]

And so on...

3. Define the number of regions (k + 1)

k <- 10

4. Merge explanatory variables in data frame

dat <- lapply(list(r1, r2, r3), values) %>% do.call(cbind, .) %>% as.data.frame(., stringsAsFactors = F) %>% magrittr::set_colnames(c("1984", "1993", "1997"))

5. Set up parallel framework for faster computation (optional)

For a non-parallel, single core execution skip steps 5 and 13

ncores <- detectCores() - 1 cl <- parallel::makeCluster(ncores, type = "PSOCK") set.coresOption(ncores) set.ClusterOption(cl)

6. Standardize variables

sdat <- scale(dat) %>% as.data.frame(., stringsAsFactors = F)

7. Create neighbor list object

raster_nb <- cell2nb(nrow(r1), ncol(r1), type = "queen") # you can alternatively set contiguity to "rook"

8. Subset cells

There are various reasons for which you might need to exclude some pixels to avoid errors in subsequent functions

One example is missing values in your raster layers (e.g. due to water bodies)

complete_pixels <- which(complete.cases(sdat)) raster_nb <- subset.nb(raster_nb, 1:length(raster_nb) %in% complete_pixels) sdat <- sdat[complete_pixels,]

9. Calculate dissimilarity between neighboring cells

lcosts <- nbcosts(raster_nb, sdat)

10. Calculate spatial weights based on dissimilarity between neighbors

raster_w <- nb2listw(raster_nb, lcosts, style = "B")

11. Obtain minimum spanning tree

raster_mst <- mstree(raster_w)

12. Run skater clustering algorithm

skater_clusters <- skater(raster_mst[,1:2], sdat, k)

13. Close parallel framework

stopCluster(cl)

14. Visualize results

Initialize an empty raster with the same dimensions

cluster_raster <- raster(layers) cluster_raster[] <- NA

Assign cluster IDs from SKATER results

cluster_raster[complete_pixels] <- skater_clusters$groups

Visualize

plot(cluster_raster, main="Cluster Assignments")

Phil
  • 3
  • 2
  • Can you make a simple reproducible example with sample or random data? Currently we don't have r1, r2, r3. From your plot it looks like the adjacency list is wrong. I can run all this with three random 10x10 rasters but also you've not shown how you made that stripy raster plot from the clustering output so that would help too. – Spacedman Mar 01 '24 at 12:04
  • Thanks for your comment @Spacedman, I've edited my question to include a google drive link and some extra code. I'm working with a multi layer raster where each layer represents the distribution of 11 numerical classes of a certain year. The data source is LANDSAT TM data with a resolution of 30x30 m, UTM Zone 35S which has undergone a supervised classification. You could probably reproduce a similar dataset by some rasters populated by random integers from 1-11.

    As for the plot, It would be amazing if I was simply generating that wrong.

    – Phil Mar 01 '24 at 13:24
  • Your "example dataset here" link wants to make an access request. The missing piece in the plot was how you were assigning the clusters back to the raster - I'll have another look now... – Spacedman Mar 01 '24 at 17:08
  • It produces sensible-looking clusters on a test 50x50x3 set of random rasters with 11 classes into 11 clusters for me. Takes a few minutes to compute even with all 7 cores though. I've even put a few NA pixels in to see if that was messing it up. No, I see clusters. – Spacedman Mar 01 '24 at 17:38
  • Thanks for continuing to look into this. I've allowed permission for the dataset, but it seems like you're not having any issues. Do you mind sharing your code and results? Maybe it has to do with the structure of the inputs. – Phil Mar 04 '24 at 07:55

1 Answers1

0

Your code, as posted above, runs unmodified and produces the following plot, here alongside rasters r1 r2 and r3 so we can tell if there's any relation to the input:

enter image description here

Looks reasonable to me - you can see some of the clusters relating to the input, and it looks nothing like your stripey image. Maybe its a problem with that particular input data.

The only change I had to make to your code was to force the number of cores to be an integer with ncores <- as.integer(detectCores() - 1)

Package versions are:

> packageVersion("spdep")
[1] ‘1.2.8’
> packageVersion("raster")
[1] ‘3.6.23’

And you don't need the tidyverse package bundle installed, code runs fine without it.

I notice that your stripy plot is not at the same location as the sample.img you shared, so I can't be sure this is the same input data that produced your stripy plot. Maybe its a problem with that particular input data.

Spacedman
  • 63,755
  • 5
  • 81
  • 115
  • Thanks for sharing your results. I was able to successfully reproduce them. It seems like the main difference was in the way that I had subset the input data I used in my example. I switched to subsetting my raster using QGIS and everything worked fine. – Phil Mar 04 '24 at 13:31
  • If you want to ask a new question about subsetting rasters correctly I'm sure I'll find it... – Spacedman Mar 04 '24 at 15:10