Are there alternatives to this construction of a minimum spanning tree of a raster for the SKATER algorithm?

Question

This could be seen as a continuation of this question where the top accepted answer uses the spedp library in R in order to generate contiguous clusters of a raster image with multivariate data. In the code they provided (also below, with example dataset here, a raster un UTM Zone 35S containing the results of a supervised classification in 11 classes based on LANDSAT 30m imagery), as far as I understand it, a neighbor list is created with the dimensions of the raster, and then weights are calculated based on the euclidean distance and used to generate a Minimum Spanning Tree (MST) which then runs the SKATER algorithm in order to prune this tree based on feature distance. The user, unfortunately, did not provide a working example, nor the results of the code.

This is used effectively in tutorials for this package with polygons, but the problem with applying this to a raster seems to be that the MST for a raster, at least calculated with the methods available in spdep, will always provide a link where the next connected cell is the one below it until it hits an edge, so any possible pruning of the MST will result in your clusters being in a straight line rather than in a more natural shape.

Finally to the question: Is there a way to circumvent this? Or is SKATER and its associated methods simply not suited for clustering raster data?

# 1. Load packages
packs <- list("tidyverse", "raster", "spdep", "parallel")
lapply(packs, require, character.only = T)
2. Load raster layers using raster()
layers <- brick("sample.img")
r1 <- layers[[1]]
r2 <- layers[[2]]
r3 <- layers[[3]]
And so on...
3. Define the number of regions (k + 1)
k <- 10
4. Merge explanatory variables in data frame
dat <- lapply(list(r1, r2, r3), values) %>%
   do.call(cbind, .) %>%
   as.data.frame(., stringsAsFactors = F) %>%
   magrittr::set_colnames(c("1984", "1993", "1997"))
5. Set up parallel framework for faster computation (optional)
For a non-parallel, single core execution skip steps 5 and 13
ncores <- detectCores() - 1
cl <- parallel::makeCluster(ncores, type = "PSOCK")
set.coresOption(ncores)
set.ClusterOption(cl)
6. Standardize variables
sdat <- scale(dat) %>%
   as.data.frame(., stringsAsFactors = F)
7. Create neighbor list object
raster_nb <- cell2nb(nrow(r1), ncol(r1), type = "queen")     # you can alternatively set contiguity to "rook"
8. Subset cells
There are various reasons for which you might need to exclude some pixels to avoid errors in subsequent functions
One example is missing values in your raster layers (e.g. due to water bodies)
complete_pixels <- which(complete.cases(sdat))
raster_nb <- subset.nb(raster_nb, 1:length(raster_nb) %in% complete_pixels)
sdat <- sdat[complete_pixels,]
9. Calculate dissimilarity between neighboring cells
lcosts <- nbcosts(raster_nb, sdat)
10. Calculate spatial weights based on dissimilarity between neighbors
raster_w <- nb2listw(raster_nb, lcosts, style = "B")
11. Obtain minimum spanning tree
raster_mst <- mstree(raster_w)
12. Run skater clustering algorithm
skater_clusters <- skater(raster_mst[,1:2], sdat, k)
13. Close parallel framework
stopCluster(cl)
14. Visualize results
Initialize an empty raster with the same dimensions
cluster_raster <- raster(layers)
cluster_raster[] <- NA
Assign cluster IDs from SKATER results
cluster_raster[complete_pixels] <- skater_clusters$groups
Visualize
plot(cluster_raster, main="Cluster Assignments")

Can you make a simple reproducible example with sample or random data? Currently we don't have r1, r2, r3. From your plot it looks like the adjacency list is wrong. I can run all this with three random 10x10 rasters but also you've not shown how you made that stripy raster plot from the clustering output so that would help too. — Spacedman, Mar 01 '24 at 12:04
Thanks for your comment @Spacedman, I've edited my question to include a google drive link and some extra code. I'm working with a multi layer raster where each layer represents the distribution of 11 numerical classes of a certain year. The data source is LANDSAT TM data with a resolution of 30x30 m, UTM Zone 35S which has undergone a supervised classification. You could probably reproduce a similar dataset by some rasters populated by random integers from 1-11.
As for the plot, It would be amazing if I was simply generating that wrong. — Phil, Mar 01 '24 at 13:24
Your "example dataset here" link wants to make an access request. The missing piece in the plot was how you were assigning the clusters back to the raster - I'll have another look now... — Spacedman, Mar 01 '24 at 17:08
It produces sensible-looking clusters on a test 50x50x3 set of random rasters with 11 classes into 11 clusters for me. Takes a few minutes to compute even with all 7 cores though. I've even put a few NA pixels in to see if that was messing it up. No, I see clusters. — Spacedman, Mar 01 '24 at 17:38
Thanks for continuing to look into this. I've allowed permission for the dataset, but it seems like you're not having any issues. Do you mind sharing your code and results? Maybe it has to do with the structure of the inputs. — Phil, Mar 04 '24 at 07:55

score 0 · Accepted Answer · answered Mar 04 '24 at 10:17

Your code, as posted above, runs unmodified and produces the following plot, here alongside rasters r1 r2 and r3 so we can tell if there's any relation to the input:

Looks reasonable to me - you can see some of the clusters relating to the input, and it looks nothing like your stripey image. Maybe its a problem with that particular input data.

The only change I had to make to your code was to force the number of cores to be an integer with ncores <- as.integer(detectCores() - 1)

Package versions are:

> packageVersion("spdep")
[1] ‘1.2.8’
> packageVersion("raster")
[1] ‘3.6.23’

And you don't need the tidyverse package bundle installed, code runs fine without it.

I notice that your stripy plot is not at the same location as the sample.img you shared, so I can't be sure this is the same input data that produced your stripy plot. Maybe its a problem with that particular input data.

Thanks for sharing your results. I was able to successfully reproduce them. It seems like the main difference was in the way that I had subset the input data I used in my example. I switched to subsetting my raster using QGIS and everything worked fine. — Phil, Mar 04 '24 at 13:31
If you want to ask a new question about subsetting rasters correctly I'm sure I'll find it... — Spacedman, Mar 04 '24 at 15:10

Are there alternatives to this construction of a minimum spanning tree of a raster for the SKATER algorithm?

2. Load raster layers using raster()

And so on...

3. Define the number of regions (k + 1)

4. Merge explanatory variables in data frame

5. Set up parallel framework for faster computation (optional)

For a non-parallel, single core execution skip steps 5 and 13

6. Standardize variables

7. Create neighbor list object

8. Subset cells

There are various reasons for which you might need to exclude some pixels to avoid errors in subsequent functions

One example is missing values in your raster layers (e.g. due to water bodies)

9. Calculate dissimilarity between neighboring cells

10. Calculate spatial weights based on dissimilarity between neighbors

11. Obtain minimum spanning tree

12. Run skater clustering algorithm

13. Close parallel framework

14. Visualize results

Initialize an empty raster with the same dimensions

Assign cluster IDs from SKATER results

Visualize

1 Answers1