Given a (dummy) vector
给出(虚拟)向量
index=log(seq(10,20,by=0.5))
I want to compute the running mean with centered window and with tapered windows at each end, i.e. that the first entry is left untouched, the second is the average of a window size of 3, and so on until the specified window size is reached.
我想计算带有居中窗口的运行平均值,每端有锥形窗口,即第一个条目保持不变,第二个是窗口大小为3的平均值,依此类推,直到达到指定的窗口大小。
The answers given here: Calculating moving average, seem to all produce a shorter vector cutting off the start and end where the window is too large, for example:
这里给出的答案:计算移动平均线,似乎都会产生一个较短的向量,切断窗口太大的起点和终点,例如:
ma <- function(x,n=5){filter(x,rep(1/n,n), sides=2)}
ma(index)
Time Series:
Start = 1
End = 21
Frequency = 1
[1] NA NA 2.395822 2.440451 2.483165 2.524124 2.563466 2.601315
[9] 2.637779 2.672957 2.706937 2.739798 2.771611 2.802441 2.832347 2.861383
[17] 2.889599 2.917039 2.943746 NA NA
same goes for
同样的
rollmean(index,5)
from the zoo package
来自动物园的包裹
Is there a quick way of implementing tapered windows without resorting to coding up loops?
有没有一种快速的方法来实现锥形窗口而不需要编写循环编码?
3 个解决方案
#1
6
The width
argument of zoo::rollapply
can be a numeric vector.
zoo :: rollapply的width参数可以是数字向量。
Hence, in your example, you can use:
因此,在您的示例中,您可以使用:
rollapply(index, c(1, 3, 5, rep(5, 15), 5, 3, 1), mean)
# [1] 2.302585 2.350619 2.395822 2.440451 2.483165 2.524124 2.563466 2.601315 2.637779 2.672957 2.706937 2.739798 2.771611 2.802441 2.832347 2.861383
# [17] 2.889599 2.917039 2.943746 2.970195 2.995732
And if n
is an odd integer, a general solution is:
如果n是一个奇整数,一般的解决方案是:
w <- c(seq(1, n, 2), rep(n, length(index) - n - 1), seq(n, 1, -2))
rollapply(index, w, mean)
Edit: If you care about performance, you can use a custom Rcpp function:
编辑:如果您关心性能,可以使用自定义Rcpp功能:
library(Rcpp)
cppFunction("NumericVector fasttapermean(NumericVector x, const int window = 5) {
const int n = x.size();
NumericVector y(n);
double s = x[0];
int w = 1;
for (int i = 0; i < n; i++) {
y[i] = s/w;
if (i < window/2) {
s += x[i + (w+1)/2] + x[i + (w+3)/2];
w += 2;
} else if (i > n - window/2 - 2) {
s -= x[i - (w-1)/2] + x[i - (w-3)/2];
w -= 2;
} else {
s += x[i + (w+1)/2] - x[i - (w-1)/2];
}
}
return y;
}")
New benchmark:
n <- 5
index <- log(seq(10, 200, by = .5))
w <- c(seq(1, n, 2), rep(n, length(index) - n - 1), seq(n, 1, -2))
bench::mark(
fasttapermean(index),
tapermean(index),
zoo::rollapply(index, w, mean)
)
# # A tibble: 3 x 14
# expression min mean median max `itr/sec` mem_alloc n_gc n_itr total_time result memory time gc
# <chr> <bch:tm> <bch:tm> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <bch:tm> <list> <list> <list> <list>
# 1 fasttapermean(index) 4.7us 5.94us 5.56us 67.6us 168264. 5.52KB 0 10000 59.4ms <dbl [381]> <Rprofmem [2 x 3]> <bch:tm> <tibble [10,000 x 3]>
# 2 tapermean(index) 53.9us 79.68us 91.08us 405.8us 12550. 37.99KB 3 5951 474.2ms <dbl [381]> <Rprofmem [16 x 3]> <bch:tm> <tibble [5,954 x 3]>
# 3 zoo::rollapply(index, w, mean) 12.8ms 15.42ms 14.31ms 29.2ms 64.9 100.58KB 8 23 354.7ms <dbl [381]> <Rprofmem [44 x 3]> <bch:tm> <tibble [31 x 3]>
However if you care about (extreme) precision you should use the rollapply
method because the built-in mean
algorithm of R is more accurate than the naive sum-and-divide approach.
但是,如果您关心(极端)精度,则应使用rollapply方法,因为R的内置平均算法比天真的求和除法更准确。
Also note that the rollapply
method is the only one that allows you to use na.rm = TRUE
if needed.
另请注意,rollapply方法是唯一允许您在需要时使用na.rm = TRUE的方法。
#2
8
As rollapply
can be quite slow, it is often worth writing a simple bespoke function...
由于rollapply可能非常慢,因此通常需要编写一个简单的定制函数...
tapermean <- function(x, width=5){
taper <- pmin(width,
2*(seq_along(x))-1,
2*rev(seq_along(x))-1) #set taper pattern
lower <- seq_along(x)-(taper-1)/2 #lower index for each mean
upper <- lower+taper #upper index for each mean
x <- c(0, cumsum(x)) #sum x once
return((x[upper]-x[lower])/taper)} #return means
This is over 200x faster than the rollapply
solution...
这比rollapply解决方案快200倍以上......
library(microbenchmark)
index <- log(seq(10,200,by=0.5)) #longer version for testing
w <- c(seq(1,5,2),rep(5,length(index)-5-1),seq(5,1,-2)) #as in Scarabees answer
microbenchmark(tapermean(index),
rollapply(index,w,mean))
Unit: microseconds
expr min lq mean median uq max neval
tapermean(index) 185.562 193.9405 246.4123 210.6965 284.548 590.197 100
rollapply(index,w,mean) 48213.027 49681.0715 52053.7352 50583.4320 51756.378 97187.538 100
I rest my case!
我休息一下!
#3
4
Similarly to zoo::rollapply()
, you can also use gtools::running()
and change the width
argument. However, interestingly enough, @Andrew Gustar's function is still faster.
与zoo :: rollapply()类似,您也可以使用gtools :: running()并更改width参数。然而,有趣的是,@ Andrew Gustar的功能仍然更快。
require(tidyverse)
require(gtools)
require(zoo)
require(rbenchmark)
index <- rep(log(seq(10,20,by=0.5)),100)
benchmark("rollapply" = {
rollapply(index, c(1, 3, 5, rep(5, 15), 5, 3, 1), mean)
},
"tapermean" = {
tapermean(index)
},
"running" = {
running(index, fun=mean, width=c(1, 3, 5, rep(5, 15), 5, 3, 1),
simplify=TRUE)
},
replications = 1000,
columns = c("test", "replications", "elapsed","user.self",
"sys.self"))
test replications elapsed user.self sys.self
1 rollapply 1000 17.67 17.57 0.01
3 running 1000 32.24 32.23 0.00
2 tapermean 1000 0.14 0.14 0.00
#1
6
The width
argument of zoo::rollapply
can be a numeric vector.
zoo :: rollapply的width参数可以是数字向量。
Hence, in your example, you can use:
因此,在您的示例中,您可以使用:
rollapply(index, c(1, 3, 5, rep(5, 15), 5, 3, 1), mean)
# [1] 2.302585 2.350619 2.395822 2.440451 2.483165 2.524124 2.563466 2.601315 2.637779 2.672957 2.706937 2.739798 2.771611 2.802441 2.832347 2.861383
# [17] 2.889599 2.917039 2.943746 2.970195 2.995732
And if n
is an odd integer, a general solution is:
如果n是一个奇整数,一般的解决方案是:
w <- c(seq(1, n, 2), rep(n, length(index) - n - 1), seq(n, 1, -2))
rollapply(index, w, mean)
Edit: If you care about performance, you can use a custom Rcpp function:
编辑:如果您关心性能,可以使用自定义Rcpp功能:
library(Rcpp)
cppFunction("NumericVector fasttapermean(NumericVector x, const int window = 5) {
const int n = x.size();
NumericVector y(n);
double s = x[0];
int w = 1;
for (int i = 0; i < n; i++) {
y[i] = s/w;
if (i < window/2) {
s += x[i + (w+1)/2] + x[i + (w+3)/2];
w += 2;
} else if (i > n - window/2 - 2) {
s -= x[i - (w-1)/2] + x[i - (w-3)/2];
w -= 2;
} else {
s += x[i + (w+1)/2] - x[i - (w-1)/2];
}
}
return y;
}")
New benchmark:
n <- 5
index <- log(seq(10, 200, by = .5))
w <- c(seq(1, n, 2), rep(n, length(index) - n - 1), seq(n, 1, -2))
bench::mark(
fasttapermean(index),
tapermean(index),
zoo::rollapply(index, w, mean)
)
# # A tibble: 3 x 14
# expression min mean median max `itr/sec` mem_alloc n_gc n_itr total_time result memory time gc
# <chr> <bch:tm> <bch:tm> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <bch:tm> <list> <list> <list> <list>
# 1 fasttapermean(index) 4.7us 5.94us 5.56us 67.6us 168264. 5.52KB 0 10000 59.4ms <dbl [381]> <Rprofmem [2 x 3]> <bch:tm> <tibble [10,000 x 3]>
# 2 tapermean(index) 53.9us 79.68us 91.08us 405.8us 12550. 37.99KB 3 5951 474.2ms <dbl [381]> <Rprofmem [16 x 3]> <bch:tm> <tibble [5,954 x 3]>
# 3 zoo::rollapply(index, w, mean) 12.8ms 15.42ms 14.31ms 29.2ms 64.9 100.58KB 8 23 354.7ms <dbl [381]> <Rprofmem [44 x 3]> <bch:tm> <tibble [31 x 3]>
However if you care about (extreme) precision you should use the rollapply
method because the built-in mean
algorithm of R is more accurate than the naive sum-and-divide approach.
但是,如果您关心(极端)精度,则应使用rollapply方法,因为R的内置平均算法比天真的求和除法更准确。
Also note that the rollapply
method is the only one that allows you to use na.rm = TRUE
if needed.
另请注意,rollapply方法是唯一允许您在需要时使用na.rm = TRUE的方法。
#2
8
As rollapply
can be quite slow, it is often worth writing a simple bespoke function...
由于rollapply可能非常慢,因此通常需要编写一个简单的定制函数...
tapermean <- function(x, width=5){
taper <- pmin(width,
2*(seq_along(x))-1,
2*rev(seq_along(x))-1) #set taper pattern
lower <- seq_along(x)-(taper-1)/2 #lower index for each mean
upper <- lower+taper #upper index for each mean
x <- c(0, cumsum(x)) #sum x once
return((x[upper]-x[lower])/taper)} #return means
This is over 200x faster than the rollapply
solution...
这比rollapply解决方案快200倍以上......
library(microbenchmark)
index <- log(seq(10,200,by=0.5)) #longer version for testing
w <- c(seq(1,5,2),rep(5,length(index)-5-1),seq(5,1,-2)) #as in Scarabees answer
microbenchmark(tapermean(index),
rollapply(index,w,mean))
Unit: microseconds
expr min lq mean median uq max neval
tapermean(index) 185.562 193.9405 246.4123 210.6965 284.548 590.197 100
rollapply(index,w,mean) 48213.027 49681.0715 52053.7352 50583.4320 51756.378 97187.538 100
I rest my case!
我休息一下!
#3
4
Similarly to zoo::rollapply()
, you can also use gtools::running()
and change the width
argument. However, interestingly enough, @Andrew Gustar's function is still faster.
与zoo :: rollapply()类似,您也可以使用gtools :: running()并更改width参数。然而,有趣的是,@ Andrew Gustar的功能仍然更快。
require(tidyverse)
require(gtools)
require(zoo)
require(rbenchmark)
index <- rep(log(seq(10,20,by=0.5)),100)
benchmark("rollapply" = {
rollapply(index, c(1, 3, 5, rep(5, 15), 5, 3, 1), mean)
},
"tapermean" = {
tapermean(index)
},
"running" = {
running(index, fun=mean, width=c(1, 3, 5, rep(5, 15), 5, 3, 1),
simplify=TRUE)
},
replications = 1000,
columns = c("test", "replications", "elapsed","user.self",
"sys.self"))
test replications elapsed user.self sys.self
1 rollapply 1000 17.67 17.57 0.01
3 running 1000 32.24 32.23 0.00
2 tapermean 1000 0.14 0.14 0.00