
I have a very big matrix and I want to create a list of matrices by reshaping each column vector of the big matrix into a matrix with the same specified dimensions (compatible, of course, with the vector length).

I have not found anything faster than apply() so far. As I have to do this hundreds of times, I need to speed up this piece of code.

library(Rfast)
bigmat <- matrnorm(25200, 9000)
iwant  <- apply(bigmat,2,function(x) matrix(x,ncol=9), simplify=FALSE)

length(iwant)
lapply(iwant, dim)

Do you have any advice?

Karolis Koncevičius
  • Maybe also have a look at [Assigning Rcpp objects into an Rcpp List yields duplicates of the last element](https://stackoverflow.com/questions/37502121). – GKi Apr 18 '23 at 19:49

3 Answers


An array may be more convenient than a list of matrices. Building it should be more than 5 times faster:

bigarray <- array(bigmat, dim=c(2800,9,9000))

You can access each matrix by indexing the third dimension:

bigarray[,,14]

And you can convert it to a list, if really necessary:

biglist <- asplit(bigarray, 3)
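
The reshape works because R stores matrices in column-major order: each column of bigmat is a contiguous block of the underlying vector, so setting new dimensions only relabels the indices. A minimal sketch on a small matrix, checking the array slices against matrices built column by column (the names `small`, `n_sub`, `arr`, `lst`, and `ref` are illustrative and not part of the original code):

```r
# Small stand-in for bigmat: 6 rows, 4 columns.
# Each column is reshaped into a 3 x 2 matrix (2 plays the role of 9).
small <- matrix(seq_len(24), nrow = 6)
n_sub <- 2

# Reference: reshape every column separately, as the apply() call in the question does.
ref <- lapply(seq_len(ncol(small)), function(j) matrix(small[, j], ncol = n_sub))

# One reshape for everything:
# (rows per sub-matrix, columns per sub-matrix, number of sub-matrices).
arr <- array(small, dim = c(nrow(small) / n_sub, n_sub, ncol(small)))

# Slice k of the array is the matrix built from column k.
lst <- lapply(seq_len(dim(arr)[3]), function(k) arr[, , k])
stopifnot(identical(lst, ref))
```

Deriving the dimensions from nrow() and ncol() instead of hard-coding 2800 and 9000 keeps the call valid if the input size changes.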
Karolis Koncevičius
  • I found this additional solution that seems slightly faster than previous attempts: library(Rfast); bigmat <- matrnorm(25200, 9000); iwant <- apply(bigmat, 2, function(x) matrix(x, ncol=9), simplify=F); length(iwant); lapply(iwant, dim); library(Morpho); prova <- Morpho::array2list(array(bigmat, dim=c(2800,9,9000))) – Paolo Piras Apr 18 '23 at 12:18

Unfortunately asplit is not faster.

L <- lapply(asplit(bigmat, 2), matrix, ncol=9)

Another option is to use as.data.frame.

lapply(unname(as.data.frame(bigmat)), matrix, ncol=9)

As neither showed an improvement, here is an Rcpp version.

Rcpp::cppFunction('Rcpp::List m2l(Rcpp::NumericMatrix A) {
  Rcpp::List output(A.ncol());
  for(int i = 0; i < A.ncol(); ++i) {
    // Copy column i into a fresh (nrow/9) x 9 matrix; assumes nrow is a multiple of 9.
    output[i] = Rcpp::NumericMatrix(A.nrow()/9, 9, A.column(i).begin());
  }
  return output;
}')

m2l(bigmat)

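A variant that allocates each matrix without initializing it and then fills it element by element:

Rcpp::cppFunction('Rcpp::List m2lB(Rcpp::NumericMatrix A) {
  Rcpp::List output(A.ncol());
  for(int i = 0; i < A.ncol(); ++i) {
    // Allocate an (nrow/9) x 9 matrix without zero-filling it.
    Rcpp::NumericMatrix B = Rcpp::no_init(A.nrow()/9, 9);
    // Copy column i of A into B in column-major order.
    for(int j = 0; j < A.nrow(); ++j) {
      B[j] = A(j, i);
    }
    output[i] = B;
  }
  return output;
}')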

m2lB(bigmat)
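
Both functions compute the row count with integer division (A.nrow()/9), so they rely on the number of rows being a multiple of 9. If it is not, m2l leaves out the trailing rows of each column and m2lB writes past the end of B. A possible R-side guard, sketched under the assumption that the check belongs in a small wrapper (`checked_split`, `fun`, and `n_sub` are hypothetical names, not part of the answer):

```r
# Hypothetical wrapper: validate the input before handing it to a splitter.
# `fun` stands for m2l or m2lB as defined above.
checked_split <- function(mat, fun, n_sub = 9L) {
  stopifnot(
    is.matrix(mat),
    is.numeric(mat),
    nrow(mat) %% n_sub == 0   # rows must divide evenly into n_sub sub-columns
  )
  fun(mat)
}

# Usage, once m2l has been compiled:
# res <- checked_split(bigmat, m2l)
```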

set.seed(0)
bigmat <- matrix(rnorm(252*90),252)

bench::mark(check=TRUE,
"apply" = apply(bigmat,2,function(x) matrix(x,ncol=9),simplify=FALSE),
"asplit" = lapply(asplit(bigmat, 2), matrix, ncol=9),
"asDF" = lapply(unname(as.data.frame(bigmat)), matrix, ncol=9),
"Rcpp" = m2l(bigmat),
"RcppB" = m2lB(bigmat),
"array" = `attributes<-`(asplit(`dim<-`(bigmat, c(dim(bigmat)[[1]]/9L, 9L, dim(bigmat)[[2]])), 3), NULL),
"lapply" = lapply(
    seq.int(ncol(m <- matrix(bigmat, nrow = nrow(bigmat) / 9)) / 9) - 1,
    function(k) m[, 9 * k + (1:9)]
  )
)
#  expression     min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
#  <bch:expr> <bch:t> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
#1 apply      268.4µs 287.5µs     3416.     638KB     45.8  1269    17      371ms
#2 asplit     320.8µs 364.8µs     2529.     817KB     43.5  1046    18      414ms
#3 asDF       260.8µs 278.7µs     3476.     459KB     32.8  1484    14      427ms
#4 Rcpp        31.2µs  43.1µs    22611.     185KB     81.4  9165    33      405ms
#5 RcppB       29.4µs  36.9µs    26443.     185KB     98.2  9963    37      377ms
#6 array      261.9µs 283.8µs     3446.     812KB     57.2  1266    21      367ms
#7 lapply       138µs   156µs     6342.     402KB     51.4  2714    22      428ms

In this benchmark, the Rcpp versions are roughly 6 to 8 times faster than the original apply variant by median time (43.1µs and 36.9µs versus 287.5µs).

GKi

We can use lapply as follows:

lapply(
  seq.int(ncol(m <- matrix(bigmat, nrow = nrow(bigmat) / 9)) / 9) - 1,
  function(k) m[, 9 * k + (1:9)]
)
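
Why this works: reshaping bigmat to nrow(bigmat) / 9 rows places the nine sub-columns of each original column side by side, so the k-th result is simply a block of nine consecutive columns of m. A small sketch with illustrative sizes (3 sub-columns instead of 9; `small`, `n_sub`, `blocks`, and `ref` are hypothetical names):

```r
# Small stand-in for bigmat: 6 rows, 4 columns.
# Each column is reshaped into a 2 x 3 matrix (3 plays the role of 9).
small <- matrix(seq_len(24), nrow = 6)
n_sub <- 3

# One reshape: 2 rows, 12 columns; columns 1:3 come from column 1 of small, and so on.
m <- matrix(small, nrow = nrow(small) / n_sub)

# Take consecutive blocks of n_sub columns.
blocks <- lapply(
  seq_len(ncol(small)) - 1,
  function(k) m[, n_sub * k + seq_len(n_sub)]
)

# Same result as reshaping every column separately.
ref <- lapply(seq_len(ncol(small)), function(j) matrix(small[, j], ncol = n_sub))
stopifnot(identical(blocks, ref))
```

If a block could ever have a single row or a single column, adding drop = FALSE to the subsetting keeps the result a matrix.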

The benchmark results (borrowing @GKi's benchmark code) are:

bench::mark(
  check = TRUE,
  "apply" = apply(bigmat, 2, function(x) matrix(x, ncol = 9), simplify = FALSE),
  "asplit" = lapply(asplit(bigmat, 2), matrix, ncol = 9),
  "asDF" = lapply(unname(as.data.frame(bigmat)), matrix, ncol = 9),
  "Rcpp" = m2l(bigmat),
  "lapply" = lapply(
    seq.int(ncol(m <- matrix(bigmat, nrow = nrow(bigmat) / 9)) / 9) - 1,
    function(k) m[, 9 * k + (1:9)]
  )
)

# A tibble: 5 × 13
  expression     min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
  <bch:expr> <bch:t> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
1 apply      310.2µs 446.8µs     2107.     639KB     36.8   744    13      353ms
2 asplit     350.1µs 445.5µs     2040.     817KB     42.7   717    15      352ms
3 asDF       279.7µs 391.8µs     2378.     459KB     29.8   879    11      370ms
4 Rcpp        38.8µs  59.3µs    15432.     185KB     72.3  4482    21      290ms
5 lapply       139µs 164.9µs     5505.     374KB     58.3  1700    18      309ms
# ℹ 4 more variables: result <list>, memory <list>, time <list>, gc <list>
ThomasIsCoding