
I am looking for a robust way to fill in missing values in some rasters. They all have a single layer. Missing values range from single pixels to medium-sized patches. The rasters are around 1000 x 1000 pixels, and the largest patches are about 20 x 20 pixels.
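
For gaps of this size, a purely spatial baseline is worth having before reaching for a multivariate imputation model. The following is a minimal sketch in Python/NumPy (not R, and not taken from any package mentioned in this thread; the name `fill_gaps_neighbor_mean` is hypothetical). It repeatedly replaces each missing cell that touches valid data with the mean of its valid 3 x 3 neighbours, so a 20 x 20 patch is filled from its edges inward in about 10 passes. Because it uses only the layer itself, it will smooth over any real structure hidden inside large patches.

```python
import numpy as np

def fill_gaps_neighbor_mean(raster, max_iter=100):
    """Fill NaN cells with the mean of their valid 3x3 neighbours.

    Each pass fills only the NaN cells that touch at least one valid cell,
    so a patch is filled from its edges inward, one ring per pass.
    """
    filled = np.array(raster, dtype=float, copy=True)
    rows, cols = filled.shape
    for _ in range(max_iter):
        missing = np.isnan(filled)
        if not missing.any():
            break
        # Pad with NaN so border cells only average neighbours inside the raster.
        padded = np.pad(filled, 1, mode="constant", constant_values=np.nan)
        # The 8 shifted copies correspond to the 3x3 window minus its centre.
        neighbours = np.stack([
            padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)
        ])
        valid = ~np.isnan(neighbours)
        counts = valid.sum(axis=0)
        sums = np.where(valid, neighbours, 0.0).sum(axis=0)
        update = missing & (counts > 0)
        if not update.any():
            break  # nothing left that touches valid data (e.g. an all-NaN raster)
        # All updates in a pass use the previous pass's values only.
        filled[update] = sums[update] / counts[update]
    return filled


# Usage: a single missing pixel in a 5 x 5 ramp.
r = np.arange(25, dtype=float).reshape(5, 5)
r[2, 2] = np.nan
print(fill_gaps_neighbor_mean(r)[2, 2])  # 12.0
```

The simultaneous update (every cell in a pass reads only the previous pass's values) keeps the result independent of scan order.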

I'm tempted to use aregImpute in the Hmisc R package.

Has anyone used it for this purpose?

This approach seems very cool, but I think it is only meant to produce aesthetically pleasing corrections.

Detailed explanation of this:

All the rasters (I have 36 in total) share the same extent; they overlap and are aligned. Each raster is a different variable, gathered from various sources (remote sensing, topographic and climatological). The original rasters come in various resolutions, from 30 m up to 1 km. I resampled everything to 1 km using cubic convolution (all the variables are continuous).

I have another 1 km raster with data on a variable of interest at some sampled points. I trained a model on those points, using the other rasters as covariates, so that I can generate a full raster of that variable. Unfortunately, most covariate rasters have some missing values. There are not many, but I would like to eliminate the problem entirely.
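
The `aregImpute` idea is to predict each raster's missing cells from the other co-registered rasters. As a much simpler illustration of that cross-layer idea (a single least-squares fit, not `aregImpute`'s multiple imputation with additive models, bootstrapping and predictive mean matching), here is a sketch in Python/NumPy. The name `impute_layer_by_regression` is hypothetical, and the `(n_layers, rows, cols)` array layout is an assumption. Note that it treats pixels as independent and ignores the spatial correlation of the gaps.

```python
import numpy as np

def impute_layer_by_regression(stack, target):
    """Fill NaNs in stack[target] by least squares on the other layers.

    stack: array of shape (n_layers, rows, cols), all layers co-registered.
    Only cells where every other layer is valid can be predicted; cells where
    a predictor is also missing stay NaN. Needs some fully complete cells.
    """
    stack = np.asarray(stack, dtype=float)
    n_layers, rows, cols = stack.shape
    flat = stack.reshape(n_layers, -1).T          # one row per pixel
    y = flat[:, target]
    X = np.delete(flat, target, axis=1)           # the other layers
    predictors_ok = ~np.isnan(X).any(axis=1)
    train = predictors_ok & ~np.isnan(y)
    to_fill = predictors_ok & np.isnan(y)
    # Ordinary least squares with an intercept column.
    A = np.column_stack([np.ones(train.sum()), X[train]])
    coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
    y_new = y.copy()
    y_new[to_fill] = coef[0] + X[to_fill] @ coef[1:]
    out = stack.copy()
    out[target] = y_new.reshape(rows, cols)
    return out
```

With 36 layers this would be called once per layer that has gaps; cells where several layers are missing at once stay NaN and would need a second pass or a spatial fill.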

I would like to use R for this.

PolyGeo
JEquihua
  • What is the reason for the missingness and why are you filling in the values? (Both of these things matter in the selection of an appropriate solution.) What exactly do you mean by "robust"? (It has a technical statistical sense but it's not apparent yet how that would apply here.) – whuber Jun 26 '13 at 05:00
  • I am using the layers as covariates for a predictive model. The model does not handle missing values, so it simply skips any pixel that has a missing value in any of the rasters, leaving holes in my "predicted layer". Maybe "robust" was the wrong word, I apologize. What I am looking for is an imputation that preserves the underlying relationship between my covariates and my target variable. I'm not sure what this is called; the manifold assumption? – JEquihua Jun 26 '13 at 05:11
  • Depending on the variable, the missingness is caused by sensor faults, or by measurement errors that were replaced with a missing value. – JEquihua Jun 26 '13 at 05:25
  • Do your rasters overlap or not? If they do not overlap, or if the typical amount of overlap is only two or three rasters at any one point, then it would be difficult to get much value from aregImpute. Otherwise, that is a promising approach that would be even more attractive if you included spatial correlation terms in the model. – whuber Jun 26 '13 at 12:44
  • They all overlap perfectly. What do you mean by including spatial correlation terms in the model? One thing that worries me is that the original rasters are at a finer resolution than my target variable: my variable of interest is at 1 km, but some rasters are at 30 m. To train and then predict, I first resampled the rasters to 1 km. Would this come into play here? – JEquihua Jun 26 '13 at 15:03
  • Data missing due to sensing issues are always spatially correlated. I suspect that any reasonable method that accounts for this correlation, no matter how simple otherwise, would perform better than even the most sophisticated methods that neglect that correlation. The resampling could be an issue, but it is unclear what you have done. A more detailed explanation in your question would be welcome. (A good general principle is to perform your statistical analyses with original data rather than resampled data if you possibly can, to avoid artifacts of the resampling.) – whuber Jun 26 '13 at 15:59
  • Done, tell me if it helps you helping me :) – JEquihua Jun 26 '13 at 16:18
  • How would I include spatial correlation in the process? Could I just include the coordinates as variables? – JEquihua Jun 27 '13 at 16:14
  • Coordinates as variables would likely help less than using lagged values of the covariates. Start with a model in which the response Y at cell i is expressed in terms of the other rasters X_1, X_2, etc. as Y[i] = beta_1 X_1[i] + beta_2 X_2[i] + ..., then add terms of the form gamma_{j,k} X_j[i+k], where k ranges over a set of displacements. In other words, Y[i] is assumed to depend not only on the values of the X at the same location, but also on the values of the X at nearby locations. – whuber Jun 27 '13 at 16:26
  • How about using region growth algorithm? Please refer to this thread: http://gis.stackexchange.com/questions/9935/looking-for-a-region-growing-algorithm – EvilInside Aug 27 '13 at 23:00
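
The lagged-covariate model described in the comments can be sketched as follows in Python/NumPy. This is one possible reading of that comment, not code from it: `lagged_design` and `fit_lagged_model` are hypothetical names, covariates are assumed to be stacked as a `(n_vars, rows, cols)` array, and each displacement k is given as a `(row, col)` offset. Offset `(0, 0)` yields the same-location beta terms; every other offset yields gamma terms. Cells whose displaced neighbour falls off the edge of the raster, or that contain missing values, are left out of the fit.

```python
import numpy as np

def lagged_design(covariates, offsets):
    """Design matrix of covariate values at each cell i and at cells i + k.

    covariates: array of shape (n_vars, rows, cols).
    offsets: list of (dr, dc) displacements k; (0, 0) gives X_j[i] itself.
    Returns shape (rows * cols, len(offsets) * n_vars), offsets outermost.
    Entries whose displaced cell falls outside the raster are NaN.
    """
    covariates = np.asarray(covariates, dtype=float)
    n_vars, rows, cols = covariates.shape
    blocks = []
    for dr, dc in offsets:
        shifted = np.full_like(covariates, np.nan)
        # shifted[:, r, c] = covariates[:, r + dr, c + dc] wherever in bounds.
        src_r = slice(max(dr, 0), rows + min(dr, 0))
        src_c = slice(max(dc, 0), cols + min(dc, 0))
        dst_r = slice(max(-dr, 0), rows + min(-dr, 0))
        dst_c = slice(max(-dc, 0), cols + min(-dc, 0))
        shifted[:, dst_r, dst_c] = covariates[:, src_r, src_c]
        blocks.append(shifted.reshape(n_vars, -1).T)
    return np.hstack(blocks)


def fit_lagged_model(y, covariates, offsets):
    """Least-squares fit of Y[i] = b0 + sum over (j, k) of g_{j,k} X_j[i + k]."""
    X = lagged_design(covariates, offsets)
    yv = np.asarray(y, dtype=float).ravel()
    # Use only cells where the response and every lagged covariate are present.
    ok = ~np.isnan(yv) & ~np.isnan(X).any(axis=1)
    A = np.column_stack([np.ones(ok.sum()), X[ok]])
    coef, *_ = np.linalg.lstsq(A, yv[ok], rcond=None)
    return coef  # intercept, then one coefficient per (offset, variable)
```

For example, offsets `[(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]` add the four edge-adjacent neighbours of each cell. The coefficient vector is ordered as the intercept followed by one coefficient per (offset, variable) pair, with offsets outermost.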

1 Answer


I am the author of the R package gapfill, which is a flexible tool to predict missing values in spatio-temporal remote sensing data sets. https://CRAN.R-project.org/package=gapfill It could be helpful in your case.

For an overview of published methods to predict missing values in remote sensing data sets see Table 1 of the corresponding publication https://doi.org/10.1109/TGRS.2017.2785240.

Nairolf