What's happening here is largely the result of how Rcpp objects work.
In particular, CharacterVector acts as a pointer to a memory location.
By defining this memory location outside the for loop, the result is
a "global" pointer. That is, when an update to B occurs in the loop
this subsequently updates all variants of B that have been conveniently stored in the Rcpp::List. Hence, the repeated lines of "c" throughout
the list.
With this being said, it is a very, very, very bad idea to use .push_back() on any Rcpp data types as you will end up copying to and fro the ever expanding object. Copying will occur as Rcpp data types hide the underlying SEXP that controls the R object, which must be recreated. As a result, you should try one of the following approaches:
- Rearrange where the
Rcpp::CharacterVector is created to be inside the first for loop and preallocate Rcpp::List space.
- Switch to using only C++ standard library objects and convert at the end to the appropriate type.
std::list with std::vector<T> type T (i.e. std::string)
Rcpp::wrap(x) to return the correct object or modify the function return type from Rcpp::List to std::list<std::vector<T> >.
- Preallocate
Rcpp::List space and use std::vector<T> type T (i.e. std::string).
- Preallocate
Rcpp::List space and make a clone() of the Rcpp object before storing it in the list.
Option 1
Here we rearrange the function by moving the declaration for B into the
first loop, preallocate the list space, and access the output list normally.
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
Rcpp::List char_expand_list_rearrange(Rcpp::CharacterMatrix A) {
Rcpp::List output(A.nrow());
for(int i = 0; i < A.nrow(); i++) {
Rcpp::CharacterVector B(A.ncol());
for(int j = 0; j < A.ncol(); j++) {
B[j] = A(i, j);
}
output[i] = B;
}
return output;
}
Option 2
Here we removed Rcpp::CharacterVector in favor of std::vector<std::string> and substituted Rcpp::List for std::list<std::vector<std::string> >. At the end, we convert the standard object to an Rcpp::List via Rcpp::wrap().
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
Rcpp::List char_expand_std_to_list(Rcpp::CharacterMatrix A) {
std::vector<std::string> B(A.ncol());
std::list<std::vector<std::string> > o;
for(int i = 0 ;i < A.nrow(); i++) {
for(int j = 0; j < A.ncol(); j++) {
B[j] = A(i, j);
}
o.push_back(B);
}
return Rcpp::wrap(o);
}
Giving:
mat = structure(c("a", "b", "c", "a", "b", "c", "a", "b", "c"), .Dim = c(3L, 3L))
char_expand_std_to_list(mat)
# [[1]]
# [1] "a" "a" "a"
#
# [[2]]
# [1] "b" "b" "b"
#
# [[3]]
# [1] "c" "c" "c"
Option 3
Alternatively, you could aim to keep the Rcpp::List, but just declare the size
it is expecting ahead of time and still use a std::vector<T> element.
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
Rcpp::List char_expand_list_vec(Rcpp::CharacterMatrix A) {
std::vector<std::string> B(A.ncol());
Rcpp::List o(A.nrow());
for(int i = 0; i < A.nrow(); i++) {
for(int j = 0; j < A.ncol(); j++) {
B[j] = A(i, j);
}
o[i] = B;
}
return o;
}
Option 4
Lastly, with space predefined for a list, there is an explicit clone of the
data at each iteration.
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
Rcpp::List char_expand_list_clone(Rcpp::CharacterMatrix A) {
Rcpp::CharacterVector B(A.ncol());
Rcpp::List output(A.nrow());
for(int i = 0; i < A.nrow(); i++) {
for(int j = 0; j < A.ncol(); j++) {
B[j] = A(i, j);
}
output[i] = clone(B);
}
return output;
}
Benchmark
The benchmark results show that Option 1 with a rearrangement and preallocation
of space performs the best. The runner-up second is Option 4, which
involves cloning each vector before saving it into the Rcpp::List.
library("microbenchmark")
library("ggplot2")
mat = structure(c("a", "b", "c", "a", "b", "c", "a", "b", "c"), .Dim = c(3L, 3L))
micro_mat_to_list =
microbenchmark(char_expand_list_rearrange(mat),
char_expand_std_to_list(mat),
char_expand_list_vec(mat),
char_expand_list_clone(mat))
micro_mat_to_list
# Unit: microseconds
# expr min lq mean median uq max neval
# char_expand_list_rearrange(mat) 1.501 1.9255 3.22054 2.1965 4.8445 6.797 100
# char_expand_std_to_list(mat) 2.869 3.2035 4.90108 3.7740 6.4415 27.627 100
# char_expand_list_vec(mat) 1.948 2.2335 3.83939 2.7130 5.2585 24.814 100
# char_expand_list_clone(mat) 1.562 1.9225 3.60184 2.2370 4.8435 33.965 100
