RF classification seems to work but gives an error after 15 seconds

Question

This is my code. It reads a shapefile with training points and uses it to classify an image stack (xvars).

If I run the lines one by one, they work. Even if I run them all at once, they work and prediction starts, but after about 15 seconds it stops and gives an error.

As far as I understand, I don't need to supply the raster names for the classification, because they are read automatically.

My training shapefile (sw_trainshape.shp) has an attribute table with these columns:

FID  Shape  Class  ObjectID  x            y
1    Point  bush   1         481791,2429  5626286,6397
2    ...    ...    ...       ...          ...

My TIFF files are named band1, band2, band3 and so on. I have 6 bands.

I'm not very experienced yet, so I could use some help figuring out why my code doesn't classify.

ERROR:

"Loading required package: tcltk

Error in predict.randomForest(model, blockvals, ...) : variables in the training data missing in newdata"

Code:

setwd("D:/BA-Workspace/DOP_10/orthophotos_abcd/R/test_run_R/test_other")


library(sp)
library(rgdal)
library(raster)
library(randomForest)

# create list of rasters
rlist=list.files(getwd(), pattern="tif$", full.names=TRUE) 

# CREATE RASTER STACK
xvars <- stack(rlist)      

# READ Raster TRAINING DATA
sdata <- readOGR(dsn=getwd(), layer="sw_trainshape")

# ASSIGN RASTER VALUES TO TRAINING DATA
v <- as.data.frame(extract(xvars, sdata))
sdata@data = data.frame(sdata@data, v[match(rownames(sdata@data), rownames(v)),])

# RUN RF MODEL
rf.mdl <- randomForest(x=sdata@data[,3:ncol(sdata@data)], y=as.factor(sdata@data[,"Class"]),
                       ntree=501, importance=TRUE)

# CHECK ERROR CONVERGENCE
#plot(rf.mdl)

# PLOT mean decrease in accuracy VARIABLE IMPORTANCE
#varImpPlot(rf.mdl, type=1)

# PREDICT MODEL
predict(xvars, rf.mdl, filename="RfClassPred.img", type="response", 
    index=1, na.rm=TRUE, progress="window", overwrite=TRUE)

EDIT: added sdata@data.

Console:

> setwd("D:/BA-Workspace/DOP_10/orthophotos_abcd/R/test_run_R/test_other")
> 
> 
> library(sp)
> library(rgdal)
> library(raster)
> library(randomForest)
> 
> 
> # CREATE LIST OF RASTERS
> rlist=list.files(getwd(), pattern="tif$", full.names=TRUE) 
> 
> # CREATE RASTER STACK
> xvars <- stack(rlist)      
> 
> # READ Raster TRAINING DATA
> sdata <- readOGR(dsn=getwd(), layer="sw_trainshape")
OGR data source with driver: ESRI Shapefile 
Source: "D:/BA-Workspace/DOP_10/orthophotos_abcd/R/test_run_R/test_other", layer:     "sw_trainshape"
with 256 features and 10 fields
Feature type: wkbPoint with 2 dimensions
> 
> # ASSIGN RASTER VALUES TO TRAINING DATA
> v <- as.data.frame(extract(xvars, sdata))
> sdata@data = data.frame(sdata@data, v[match(rownames(sdata@data), rownames(v)),])
> 
> # RUN RF MODEL
> rf.mdl <- randomForest(x=sdata@data[,3:ncol(sdata@data)],     y=as.factor(sdata@data[,"Class"]),
+                        ntree=501, importance=TRUE)
> 
> # CHECK ERROR CONVERGENCE
> #plot(rf.mdl)
> 
> # PLOT mean decrease in accuracy VARIABLE IMPORTANCE
> #varImpPlot(rf.mdl, type=1)
> #setOldClass(SpatialPointsDataFrame)
> # PREDICT MODEL
> predict(xvars, rf.mdl, filename="RfClassPred.img", type="response", 
+         index=1, na.rm=TRUE, progress="window", overwrite=TRUE)
Error in predict.randomForest(model, blockvals, ...) : 
  variables in the training data missing in newdata

[Screenshot: columns of sdata@data]

Solution, thanks to TimSalabim:

setwd("D:/BA-Workspace/sw_west_aug/reduced_size/")


library(sp)
library(rgdal)
library(raster)
library(randomForest)


# CREATE LIST OF RASTERS
rlist=list.files(getwd(), pattern="tif$", full.names=TRUE) 

# CREATE RASTER STACK
xvars <- stack(rlist)

# ADD CELL-CENTRE X AND Y COORDINATES AS EXTRA LAYERS
x <- coordinates(xvars)[, 1]
y <- coordinates(xvars)[, 2]

x_rst <- y_rst <- xvars[[1]]
x_rst[] <- x
y_rst[] <- y

xvars <- stack(x_rst, y_rst, xvars)
# layer names must match the predictor column names used for training
names(xvars) <- c("X", "Y", "focal_1", "focal_2", "focal_3")

# READ TRAINING POINTS
sdata <- readOGR(dsn=getwd(), layer="training_west")

# ASSIGN RASTER VALUES TO TRAINING DATA
v <- as.data.frame(extract(xvars, sdata))
sdata@data = data.frame(sdata@data, v[match(rownames(sdata@data), rownames(v)),])

# drop attribute columns 5 and 6 so that columns 3:ncol match names(xvars)
sdata@data <- sdata@data[-c(5,6)]

# RUN RF MODEL
rf.mdl <- randomForest(x=sdata@data[,3:ncol(sdata@data)], y=as.factor(sdata@data[,"class"]),
                       ntree=501, importance=TRUE)

# CHECK ERROR CONVERGENCE
#plot(rf.mdl)
#sdata@data 

# PLOT mean decrease in accuracy VARIABLE IMPORTANCE
#varImpPlot(rf.mdl, type=1)
#setOldClass(SpatialPointsDataFrame)
# PREDICT MODEL
predict(xvars, rf.mdl, filename="RfClassPred.img", type="response", 
    index=1, na.rm=TRUE, progress="window", overwrite=TRUE)

If I click on sdata, it warns me about the 256 entries. I click anyway and it opens a text file containing only this: <S4 object of class structure("SpatialPointsDataFrame", package = "sp")> — steveomb, Aug 24 '14 at 16:44
The console then says: "Error in .External2(C_edit, name, file, title, editor) : unexpected '<' occurred on line 1 use a command like x <- edit() to recover In addition: Warning message: In edit.default(name, file, title, editor = defaultEditor) : deparse of an S4 object will not be source()able" — steveomb, Aug 24 '14 at 16:49
You need to be able to see what your dataframe looks like. To do this, you should be able to type on the command line: sdata@data. You will be interested in what the headers look like. — Aaron, Aug 24 '14 at 16:57
It is clear, as @Tim pointed out, that your raster stack and sdata@data are not equivalent. — Aaron, Aug 25 '14 at 13:37

score 3 · Accepted Answer · edited Aug 25 '14 at 14:02


You need to make sure that names(sdata@data[,3:ncol(sdata@data)]) and names(xvars) are exactly the same. Check this using

identical(names(sdata@data[,3:ncol(sdata@data)]), names(xvars))

If TRUE, your predict should run fine.
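To see which names are actually causing the error, it can help to list the model variables that have no layer of the same name. The sketch below uses a small synthetic stack and training table (all names and values are hypothetical stand-ins for xvars and sdata@data). It relies on predict.randomForest() stopping with this message whenever a training variable name is absent from the columns of newdata, and it calls that method directly on the cell values to reproduce the message quickly.

```r
library(raster)
library(randomForest)

set.seed(42)
# Hypothetical 3-band stack standing in for xvars
tmpl <- raster(nrows = 20, ncols = 20, xmn = 0, xmx = 20, ymn = 0, ymx = 20)
xvars <- stack(lapply(1:3, function(i) setValues(tmpl, runif(ncell(tmpl)))))
names(xvars) <- c("band1", "band2", "band3")

# Hypothetical training table that, like sdata@data, also carries x/y attribute columns
train <- data.frame(x = runif(30), y = runif(30),
                    band1 = runif(30), band2 = runif(30), band3 = runif(30))
cls <- factor(rep(c("bush", "grass"), 15))

# Training on every column turns x and y into model variables as well
rf.mdl <- randomForest(x = train, y = cls, ntree = 50)

# Model variables that have no layer of the same name in the stack
model_vars <- rownames(importance(rf.mdl))
missing_vars <- setdiff(model_vars, names(xvars))
print(missing_vars)

# Predicting on the stack's cell values reproduces the error message
msg <- tryCatch(predict(rf.mdl, newdata = as.data.frame(getValues(xvars))),
                error = function(e) conditionMessage(e))
print(msg)
```

Any name printed by setdiff() is a variable the model expects but the stack cannot supply; either add a layer with that name or drop the column before training.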

The edit-related warnings and errors are irrelevant: they come from trying to display a SpatialPointsDataFrame (an S4 class object) as a standard data.frame in RStudio.
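To look at the attribute table itself, index the data slot instead of opening the S4 object. A small sketch with a hypothetical two-point SpatialPointsDataFrame (the values are made up, loosely following the question's table):

```r
library(sp)

# Hypothetical points with attributes similar to the question's shapefile
xy <- cbind(c(481791.2429, 481810.5), c(5626286.6397, 5626300.1))
sdata <- SpatialPointsDataFrame(coords = xy,
                                data = data.frame(Class = c("bush", "grass"),
                                                  ObjectID = 1:2))

# The data slot is a plain data.frame holding only the attribute columns
attr_tab <- sdata@data
print(names(attr_tab))
print(head(attr_tab))
# In RStudio, View(sdata@data) opens the same table in the data viewer
```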

EDIT: It seems there are differences between your stack layer names and your sdata@data data frame. Make sure these are the same. If you would like to include x and y coordinates as layers in your stack (whether this makes sense depends on your objective), you could do it like this:

x <- coordinates(xvars)[, 1]
y <- coordinates(xvars)[, 2]

x_rst <- y_rst <- xvars[[1]]
x_rst[] <- x
y_rst[] <- y

Then you would need to add those to your stack at the appropriate position:

xvars <- stack(x_rst, y_rst, xvars)
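A self-contained version of the steps above on a hypothetical 4 x 5 raster, which also checks that the two new layers really hold different coordinates (column 1 and column 2 of coordinates()). Naming the coordinate layers x and y is an assumption chosen to match the question's attribute columns; use whatever names the training data actually has.

```r
library(raster)

set.seed(1)
# Hypothetical stand-in for xvars: 3 bands on a 4 x 5 grid
tmpl <- raster(nrows = 4, ncols = 5, xmn = 0, xmx = 5, ymn = 0, ymx = 4)
xvars <- stack(lapply(1:3, function(i) setValues(tmpl, runif(ncell(tmpl)))))
names(xvars) <- c("band1", "band2", "band3")

xy <- coordinates(xvars)   # one row per cell: cell-centre x and y
x_rst <- y_rst <- xvars[[1]]
x_rst[] <- xy[, 1]         # column 1: x coordinates
y_rst[] <- xy[, 2]         # column 2: y coordinates

# Coordinate layers first, then the bands; names must match the training columns
xvars <- stack(x_rst, y_rst, xvars)
names(xvars) <- c("x", "y", "band1", "band2", "band3")
print(names(xvars))
```

Because coordinates() builds a matrix with one row per cell in memory, this step may contribute to the memory use reported in the comments when the rasters are large.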

Note also that you have additional variables in your sdata@data data frame ("band1.1" etc.). I don't know where these come from; maybe you are merging something earlier? Again, for predict() to work properly, the layers of your stack and the columns of your training data need to be identical (in their names).
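One way to keep the training columns in line with the stack, whatever extra attributes such as band1.1 are present, is to select the predictor columns by layer name rather than by position (3:ncol). A sketch with a hypothetical attribute table:

```r
library(raster)

set.seed(2)
tmpl <- raster(nrows = 10, ncols = 10, xmn = 0, xmx = 10, ymn = 0, ymx = 10)
xvars <- stack(lapply(1:3, function(i) setValues(tmpl, runif(ncell(tmpl)))))
names(xvars) <- c("band1", "band2", "band3")

# Hypothetical sdata@data: attributes plus extracted values and a stray band1.1
n <- 20
sdata_df <- data.frame(Class = rep(c("bush", "grass"), n / 2), ObjectID = 1:n,
                       x = runif(n), y = runif(n),
                       band1 = runif(n), band2 = runif(n), band3 = runif(n),
                       band1.1 = runif(n))

# Keep exactly the columns that have a layer of the same name, in layer order
predictors <- sdata_df[, names(xvars)]
print(names(predictors))
```

The model would then be trained with something like randomForest(x = predictors, y = factor(sdata_df$Class)), so every model variable is guaranteed to exist as a layer.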

answered Aug 25 '14 at 08:06 by TimSalabim; edited Aug 25 '14 at 14:02 by Joseph

It says FALSE, although all names are found in both sdata and xvars. Could it be a problem that there are more entries in addition to the matching names? The names match, but do they have to be in the same order? I will add an image with the columns. – steveomb Aug 25 '14 at 08:31

It is not enough if the names are found in both, they need to be identical! – TimSalabim Aug 25 '14 at 11:48
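On the ordering question: identical() compares two name vectors element by element, so the same names in a different order give FALSE, while setequal() ignores order. Judging from the randomForest source, predict.randomForest() picks the training variables out of newdata by name, so the hard requirement is that every training variable has a layer of the same name; identical() is the stricter check that also rules out extra training columns. A small illustration with hypothetical name vectors:

```r
a  <- c("band1", "band2", "band3")            # e.g. names(xvars)
b  <- c("band3", "band1", "band2")            # same names, different order
cc <- c("x", "y", "band1", "band2", "band3")  # training columns with extras

identical(a, b)   # FALSE: order differs
setequal(a, b)    # TRUE: same set of names
setdiff(cc, a)    # "x" "y": training variables with no matching layer
```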
Since xvars is a RasterStack created with the stack() function, how can I add x and y to it? – steveomb Aug 25 '14 at 12:25
I'm afraid that didn't work. I tried to perform RF with the help of this thread: http://gis.stackexchange.com/questions/39021/how-to-perform-random-forest-land-classification?rq=1
They also use a shapefile with the same content, and everything seems to be the same. Where is the difference? Why do I need to make a new column in xvars for x and y when they don't? Isn't that done by the following code? v <- as.data.frame(extract(xvars, sdata)) sdata@data = data.frame(sdata@data, v[match(rownames(sdata@data), rownames(v)),])
– steveomb Aug 25 '14 at 16:52
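The extract()/data.frame() lines only append the band values as new columns next to whatever attributes the shapefile already has; they do not remove anything. With an attribute table like the question's (Class, ObjectID, x, y), columns 3:ncol therefore contain x and y as well as the bands. The sketch below uses hypothetical points and a synthetic 3-band stack. Its last step shows one possible, unconfirmed source of the band1.1 columns: running the merge line a second time, after which data.frame() makes the duplicated names unique.

```r
library(sp)
library(raster)

set.seed(3)
tmpl <- raster(nrows = 10, ncols = 10, xmn = 0, xmx = 10, ymn = 0, ymx = 10)
xvars <- stack(lapply(1:3, function(i) setValues(tmpl, runif(ncell(tmpl)))))
names(xvars) <- c("band1", "band2", "band3")

# Hypothetical training points whose attributes include their own x/y columns
xy <- cbind(c(1.5, 4.5, 8.5), c(2.5, 6.5, 9.5))
sdata <- SpatialPointsDataFrame(coords = xy,
                                data = data.frame(Class = c("bush", "grass", "bush"),
                                                  ObjectID = 1:3,
                                                  x = xy[, 1], y = xy[, 2]))

# The question's merge: band values are appended after the existing attributes
v <- as.data.frame(extract(xvars, sdata))
sdata@data <- data.frame(sdata@data, v[match(rownames(sdata@data), rownames(v)), ])
print(names(sdata@data)[3:ncol(sdata@data)])

# Merging a second time duplicates the band columns; data.frame() renames them
twice <- data.frame(sdata@data, v[match(rownames(sdata@data), rownames(v)), ])
print(names(twice))
```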

First of all, xvars is a RasterStack not a data frame so you're not adding columns but layers! Second, as I said, you don't need to create these x and y layers, but as you train your model using x and y, you will need them in your prediction too. If you don't want to use them, you need to remove them from your training data set. If you want to use them you need to have them in both the training data and the data used for the prediction. All in all, it still boils down to not having identical data in train() and predict(). – TimSalabim Aug 27 '14 at 10:18
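Putting it together, here is a minimal end-to-end sketch on synthetic data in which the training columns are exactly the stack's layer names, so raster's predict() can hand the cell values to predict.randomForest() without the error. The stack, points, class labels and sizes are all hypothetical; with real data, xvars and sdata come from the files as in the question.

```r
library(sp)
library(raster)
library(randomForest)

set.seed(7)
# Hypothetical 3-band stack
tmpl <- raster(nrows = 30, ncols = 30, xmn = 0, xmx = 30, ymn = 0, ymx = 30)
xvars <- stack(lapply(1:3, function(i) setValues(tmpl, runif(ncell(tmpl)))))
names(xvars) <- c("band1", "band2", "band3")

# Hypothetical training points with a class label
pts <- SpatialPointsDataFrame(coords = cbind(runif(60, 0, 30), runif(60, 0, 30)),
                              data = data.frame(Class = rep(c("bush", "grass"), 30)))

# Extracted columns are named after the layers; train on exactly those columns
train <- as.data.frame(extract(xvars, pts))
rf.mdl <- randomForest(x = train[, names(xvars)], y = factor(pts$Class), ntree = 100)

# Same variable names in the model and in the stack, so prediction can run
pred <- predict(xvars, rf.mdl, type = "response")
print(pred)
```

If x and y should be predictors too, they have to exist both as training columns and as identically named layers; otherwise they should be left out of train entirely.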
OK, I understood that part and tried to add the coordinates to xvars, but I changed your suggested code into the newly added version, because xvars still had the doubled raster (band1.1). Running the code brings up a new problem: my RAM seems to be too small. After 15 minutes of stacking (xvars <- stack(x, y, rlist)), it says "R session terminated". I checked my RAM usage and found that the stack function goes beyond 8.9 GB (of 8 GB total). Even with filtered bands (1st try: 5x5 mean, 2nd try: 13x13 mean, 3rd try: 21x21 mean) it didn't work because of RAM. – steveomb Aug 27 '14 at 11:01
So I have two questions left: 1. Are my changes useful? 2. What can I do to save RAM without reducing my useful pixel depth? – steveomb Aug 27 '14 at 11:03

According to the screenshot above, it is not xvars that contains a layer called "band1.1", but sdata. Anyway, the code you updated won't work this way, you need to first extract the appropriate coordinates (in your code x and y are identical) by supplying the column for x ([, 1]) and y ([, 2]) as in my suggestion. Then make sure your names are fine and if then you still run out of memory, you might want to find a bigger computer... I can help with that! – TimSalabim Aug 27 '14 at 11:47
solution added. – steveomb Aug 27 '14 at 13:50

