I have a data frame with multiple columns and many rows. For each row, I want to count the NAs across all columns and store that count in a new variable (NACount). Something like this:

    Col1 Col2 Col3 Col4 NACount
    A    A    B    NA   1
    B    B    NA   NA   2

I built a loop to do this but my data set is huge so the loop takes forever! Here is my code:

    for (i in seq_len(nrow(dat))) {

      # positions of the missing values in row i
      temp = which(is.na(dat[i, ]))

      # number of NAs in row i
      dat$NACount[i] = length(temp)

    }

Please help me find an easier approach/way to do this!

Thanks so much!


1 Answer

Use rowSums:

    dat[["NACount"]] <- rowSums(is.na(dat))
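
As a minimal sketch of what that line does, here it is applied to the two example rows from the question. The data frame below is a hypothetical reconstruction of that example, not the asker's real data. `is.na(dat)` gives a logical matrix with one cell per value, and `rowSums` adds up the `TRUE`s in each row.

```r
# Hypothetical reconstruction of the two example rows from the question
dat <- data.frame(
  Col1 = c("A", "B"),
  Col2 = c("A", "B"),
  Col3 = c("B", NA),
  Col4 = c(NA, NA),
  stringsAsFactors = FALSE
)

# Logical matrix: TRUE wherever a cell is missing
missing_cells <- is.na(dat)

# Sum the TRUEs across each row to get the per-row NA count
dat[["NACount"]] <- rowSums(missing_cells)

print(dat)
```

Note that this only counts real missing values. A cell holding the string "NA" is not treated as missing by `is.na`.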

This is much faster than, say, apply (about six times faster at the median in the benchmark below):

    microbenchmark::microbenchmark(
      rowSums = rowSums(is.na(dat)),
      apply = apply(dat, 1, function(x) sum(is.na(x)))
    )

Output:

    Unit: microseconds
        expr     min       lq     mean  median       uq      max neval cld
     rowSums  78.033  88.4245 112.5160 106.839 116.1365  439.751   100  a 
       apply 632.643 657.8040 768.2667 674.395 725.2615 6124.064   100   b
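
The data behind those timings is not shown, so here is a rough sketch for trying the comparison yourself on synthetic data. The size, seed, and the name `dat_sim` are all assumptions made for illustration. It uses base R's `system.time` rather than microbenchmark so it runs without extra packages, and it first checks that both approaches give the same counts.

```r
# Assumed size and seed; chosen only for illustration
set.seed(1)
n_rows <- 10000
n_cols <- 4

# Synthetic character data with roughly one third of the cells missing
vals <- sample(c("A", "B", NA), n_rows * n_cols, replace = TRUE)
dat_sim <- as.data.frame(
  matrix(vals, nrow = n_rows, ncol = n_cols),
  stringsAsFactors = FALSE
)

# Per-row NA counts computed both ways
by_rowsums <- rowSums(is.na(dat_sim))
by_apply <- apply(dat_sim, 1, function(x) sum(is.na(x)))

# Both approaches should agree before their speed is compared
stopifnot(all(by_rowsums == by_apply))

# Rough single-run timings; these vary by machine
print(system.time(rowSums(is.na(dat_sim))))
print(system.time(apply(dat_sim, 1, function(x) sum(is.na(x)))))
```

A single `system.time` call is noisier than microbenchmark's repeated runs, so treat the numbers as a rough guide only.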