似乎purrr
(0.2.2)软件包的当前版本是最快的解决方案:
by_row(x, function(v) list(v)[[1L]], .collate = "list")$.out
让我们比较最有趣的解决方案:
data("Batting", package = "Lahman")
x <- Batting[1:10000, 1:10]
library(benchr)
library(purrr)
benchmark(
split = split(x, seq_len(.row_names_info(x, 2L))),
mapply = .mapply(function(...) structure(list(...), class = "data.frame", row.names = 1L), x, NULL),
purrr = by_row(x, function(v) list(v)[[1L]], .collate = "list")$.out
)
结果:
Benchmark summary:
Time units : milliseconds
expr n.eval min lw.qu median mean up.qu max total relative
split 100 983.0 1060.0 1130.0 1130.0 1180.0 1450 113000 34.3
mapply 100 826.0 894.0 963.0 972.0 1030.0 1320 97200 29.3
purrr 100 24.1 28.6 32.9 44.9 40.5 183 4490 1.0
我们也可以通过以下方法获得相同的结果Rcpp
:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
List df2list(const DataFrame& x) {
std::size_t nrows = x.rows();
std::size_t ncols = x.cols();
CharacterVector nms = x.names();
List res(no_init(nrows));
for (std::size_t i = 0; i < nrows; ++i) {
List tmp(no_init(ncols));
for (std::size_t j = 0; j < ncols; ++j) {
switch(TYPEOF(x[j])) {
case INTSXP: {
if (Rf_isFactor(x[j])) {
IntegerVector t = as<IntegerVector>(x[j]);
RObject t2 = wrap(t[i]);
t2.attr("class") = "factor";
t2.attr("levels") = t.attr("levels");
tmp[j] = t2;
} else {
tmp[j] = as<IntegerVector>(x[j])[i];
}
break;
}
case LGLSXP: {
tmp[j] = as<LogicalVector>(x[j])[i];
break;
}
case CPLXSXP: {
tmp[j] = as<ComplexVector>(x[j])[i];
break;
}
case REALSXP: {
tmp[j] = as<NumericVector>(x[j])[i];
break;
}
case STRSXP: {
tmp[j] = as<std::string>(as<CharacterVector>(x[j])[i]);
break;
}
default: stop("Unsupported type '%s'.", type2name(x));
}
}
tmp.attr("class") = "data.frame";
tmp.attr("row.names") = 1;
tmp.attr("names") = nms;
res[i] = tmp;
}
res.attr("names") = x.attr("row.names");
return res;
}
现在与purrr
:
benchmark(
purrr = by_row(x, function(v) list(v)[[1L]], .collate = "list")$.out,
rcpp = df2list(x)
)
结果:
Benchmark summary:
Time units : milliseconds
expr n.eval min lw.qu median mean up.qu max total relative
purrr 100 25.2 29.8 37.5 43.4 44.2 159.0 4340 1.1
rcpp 100 19.0 27.9 34.3 35.8 37.2 93.8 3580 1.0
split
每个元素后都使用类型data.frame with 1 rows and N columns
而不是list of length N