A functional way to loop over an ordered set in R

Date: 2023-01-02 22:01:02

I'm trying to optimize an algorithm in R that runs over an ordered set of values and determines, for each value, whether there are values 'in the future' (further down the set) that are lower than the given value.

For example:

+-------+--------------------------------+
| Value | RestOfSeriesContainsLowerValue |
+-------+--------------------------------+
| 5     | true                           |
| 4     | true                           |
| 2     | true                           |
| 1     | false                          |
| 3     | true                           |
| 4     | true                           |
| 4     | true                           |
| 3     | true                           |
| 3     | true                           |
| 2     | false                          |
| 2     | false                          |
| 2     | false                          |
| 7     | false                          |
| 8     | false                          |
| 9     | false                          |
| ...   | ...                            |
+-------+--------------------------------+

The local minima are the values 1 and 2. Therefore RestOfSeriesContainsLowerValue evaluates to true for the first items in this set, since there is a lower value (1) further down the set.

After the 1, the 3 and 4 values evaluate to true, since the new local minimum (value 2) comes up further down the set.

We're currently using a for loop that runs over the set; in pseudocode:

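# `set` is assumed to be a numeric vector holding the ordered values
RestOfSeriesContainsLowerValue <- logical(length(set))
for (i in seq_along(set)) {
  # If set[i] is already the minimum of the remainder, nothing lower follows
  if (set[i] <= min(set[i:length(set)]))
    RestOfSeriesContainsLowerValue[i] <- FALSE
  else
    RestOfSeriesContainsLowerValue[i] <- TRUE
}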

However, this is not efficient enough. I'm looking for a set-based / functional way to write this in R but cannot get my head around it. Can I use lapply to do this?

1 Solution

Your pseudocode, written as functional R code using lapply:

f <- function(value) unlist(lapply(seq_along(value), function(i) if (value[i] <= min(value[i:length(value)])) FALSE else TRUE))

Vectorized code that achieves the same result:

f1 <- function(value) value > rev(cummin(rev(value)))
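
To see why the vectorized expression works, it can help to look at the intermediate running minimum. The sketch below applies it to the 15 values listed in the question's table, under the assumption that those values are the whole series (the table itself is truncated with "..."). The names `suffix_min` and `has_lower_later` are illustrative and not part of the original answer.

```r
# Values copied from the example table in the question
# (assumption: the series ends at 9; the original table is truncated)
value <- c(5, 4, 2, 1, 3, 4, 4, 3, 3, 2, 2, 2, 7, 8, 9)

# Reverse, take the cumulative minimum, reverse back:
# suffix_min[i] equals min(value[i:length(value)])
suffix_min <- rev(cummin(rev(value)))

# TRUE where some later element is strictly lower than value[i]
has_lower_later <- value > suffix_min

# Show the three columns side by side
print(data.frame(value, suffix_min, has_lower_later))
```

The `has_lower_later` column should line up with the RestOfSeriesContainsLowerValue column in the question, because a value is flagged only when it is strictly greater than the smallest value from its own position onward.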

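The speedup from the vectorized code grows with the sample size: it is about 10 times faster for n = 100, about 100 times faster for n = 1000, and around 1000 times faster for n = 10000. The lapply version recomputes min() over the remaining elements at every position, so its cost is quadratic in n, while cummin() makes a single pass.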

value <- sample(1:100, 1000, replace = TRUE)
microbenchmark::microbenchmark(f(value), f1(value), unit="relative")
#Unit: relative
#     expr      min       lq     mean   median       uq      max neval
# f(value) 172.3758 174.2449 124.1607 107.5502 104.8017 96.85548   100
#f1(value)   1.0000   1.0000   1.0000   1.0000   1.0000  1.00000   100
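
A benchmark like this is only meaningful if both functions return the same answer. The following is a minimal check, assuming the definitions of `f` and `f1` given above (repeated here so the block runs on its own); the seed and the name `same` are arbitrary choices made for this illustration.

```r
# Definitions repeated from the answer so this block is self-contained
f  <- function(value) unlist(lapply(seq_along(value), function(i) if (value[i] <= min(value[i:length(value)])) FALSE else TRUE))
f1 <- function(value) value > rev(cummin(rev(value)))

set.seed(42)  # arbitrary seed, used only to make the sample reproducible
value <- sample(1:100, 1000, replace = TRUE)

# Both functions return a plain logical vector, so identical() is a fair comparison
same <- identical(f(value), f1(value))
print(same)
```

If `same` is FALSE for some input, the timing comparison should not be trusted until the difference is understood.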
