What is the fastest way to perform multiple logical comparisons in R?
Consider, for example, the vector x:
set.seed(14)
x = sample(LETTERS[1:4], size=10, replace=TRUE)
I want to test whether each entry of x is either "A" or "B" (and nothing else). The following works:
x == "A" | x == "B"
[1] TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
The code above passes over the whole vector three times: once for x == "A", once for x == "B", and once more for |. Is there a way in R to loop only once, testing each element against both conditions in the same pass?
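To make the "three times" count concrete, here is a conceptual model in C++ (the language the Rcpp answer also uses). It is a sketch under stated assumptions, not R's internal implementation: std::vector<std::string> stands in for a character vector, std::vector<bool> for a logical vector, NA values are ignored, and the helper names eq, elementwise_or and three_pass are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Conceptual model only (an assumption, not R's C source): the R expression
// x == "A" | x == "B" is evaluated as three separate vectorized steps, each
// walking a full-length vector and allocating a full-length result.

// One pass over x: elementwise x == v.
std::vector<bool> eq(const std::vector<std::string>& x, const std::string& v) {
    std::vector<bool> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) out[i] = (x[i] == v);
    return out;
}

// One pass over two equal-length logical vectors: elementwise l | r.
std::vector<bool> elementwise_or(const std::vector<bool>& l,
                                 const std::vector<bool>& r) {
    assert(l.size() == r.size());
    std::vector<bool> out(l.size());
    for (std::size_t i = 0; i < l.size(); ++i) out[i] = l[i] || r[i];
    return out;
}

std::vector<bool> three_pass(const std::vector<std::string>& x) {
    const std::vector<bool> is_a = eq(x, "A");  // pass 1: x == "A"
    const std::vector<bool> is_b = eq(x, "B");  // pass 2: x == "B"
    return elementwise_or(is_a, is_b);          // pass 3: ... | ...
}
```

Each helper walks the whole input and returns a freshly allocated vector, which is where both the extra passes and the two intermediate vectors come from.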
#1
If your objective is just to make a single pass, that is pretty straightforward to write in Rcpp, even if you don't have much experience with C++:
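#include <Rcpp.h>

// Marks each element of x that equals a or b, using a single loop over x.
// [[Rcpp::export]]
Rcpp::LogicalVector single_pass(Rcpp::CharacterVector x, Rcpp::String a, Rcpp::String b) {
    R_xlen_t i = 0, n = x.size();
    Rcpp::LogicalVector result(n);
    for ( ; i < n; i++) {
        // Both comparisons are made on the same element in the same pass;
        // || skips the second one when the first already matched.
        result[i] = (x[i] == a || x[i] == b);
    }
    return result;
}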
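For readers who want to try the loop without an R toolchain, a minimal plain-C++ analogue of single_pass() might look like the following. The name single_pass_std and the use of std::vector<std::string> and std::vector<bool> in place of Rcpp's CharacterVector and LogicalVector are assumptions for illustration; NA handling is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical plain-C++ analogue of single_pass() (no Rcpp, no NA handling).
std::vector<bool> single_pass_std(const std::vector<std::string>& x,
                                  const std::string& a,
                                  const std::string& b) {
    std::vector<bool> result(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        // One pass: both tests are applied to the same element, and ||
        // short-circuits when x[i] already equals a.
        result[i] = (x[i] == a || x[i] == b);
    }
    return result;
}
```

For example, single_pass_std({"A", "C", "B"}, "A", "B") returns {true, false, true}: one result vector is allocated and x is traversed once.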
For an object as small as the one in your example, the slight overhead of .Call (presumably) masks the speed advantage of the Rcpp version:
r_fun <- function(X) X == "A" | X == "B"
##
cpp_fun <- function(X) single_pass(X, "A", "B")
##
all.equal(r_fun(x), cpp_fun(x))
#[1] TRUE
microbenchmark::microbenchmark(
r_fun(x), cpp_fun(x), times = 1000L)
#Unit: microseconds
#      expr   min    lq     mean median     uq    max neval
#  r_fun(x) 1.499 1.584 1.974156 1.6795 1.8535 37.903  1000
#cpp_fun(x) 1.860 2.334 3.042671 2.7450 3.1140 51.870  1000
But for larger vectors (which I assume is your real use case), it is considerably faster, roughly ten times by median time (about 7.6 ms versus 81 ms) in this run:
x2 <- sample(LETTERS, 10E5, replace = TRUE)  # 10E5 = 1e6 elements
##
all.equal(r_fun(x2), cpp_fun(x2))
# [1] TRUE
microbenchmark::microbenchmark(
r_fun(x2), cpp_fun(x2), times = 200L)
#Unit: milliseconds
#       expr       min        lq      mean    median        uq      max neval
#  r_fun(x2) 78.044518 79.344465 83.741901 80.999538 86.368627 149.5106   200
#cpp_fun(x2)  7.104929  7.201296  7.797983  7.605039  8.184628  10.7250   200
Here's a quick attempt at generalizing the above, if you have any use for it.
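The answer's own generalized version is not included in this text. As an independent, hypothetical sketch (the name single_pass_any and its signature are assumptions, not the author's code), one way to extend the single-pass idea to any number of target values in plain C++ is to load the targets into a hash set and do one lookup per element:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical generalization: flag each element of x that matches any of the
// given targets, still traversing x only once. The targets are put in a hash
// set so the per-element cost is one hash lookup rather than one comparison
// per target.
std::vector<bool> single_pass_any(const std::vector<std::string>& x,
                                  const std::vector<std::string>& targets) {
    const std::unordered_set<std::string> lookup(targets.begin(), targets.end());
    std::vector<bool> result(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        result[i] = lookup.count(x[i]) > 0;
    }
    return result;
}
```

With only two or three targets, a plain chain of == comparisons (as in single_pass) may well be just as fast; the hash set mainly helps keep the per-element cost from growing with the number of targets.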