This question is related to this one, which helped me improve R's performance when computing the mean of a big array. Unfortunately, in this case I'm trying to apply something more complex (a quantile calculation).
I have a 4-D array with more than 40 million elements and I want to calculate the 66th percentile along a specific dimension. Here is the MATLAB code:
> n = randn(100, 50, 100, 20);
> tic; q = quantile(n, 0.66, 4); toc
Elapsed time is 0.440824 seconds.
Let's do something similar in R.
> n = array(rnorm(100*50*100*20), dim = c(100,50,100,20))
> start = Sys.time(); q = apply(n, 1:3, quantile, .66); print(Sys.time() - start)
Time difference of 1.600693 mins
I knew that MATLAB generally performs better than R, but in this case I don't know what to do. Maybe I just have to wait 2 minutes instead of one second... I hope someone can suggest a way to improve the running time. Thank you in advance.
UPDATE: I've applied some of the suggestions from the comments and reduced the running time:
> start = Sys.time(); q = apply(n, 1:3, quantile, .66, names = FALSE); print(Sys.time() - start)
Time difference of 33.42773 secs
We're still far from MATLAB's performance, but at least I've learnt something.
UPDATE: I'm adding here some improvements to the `quantile` function discussed here. The running time of the same code shown above has dropped from 33 to 5 seconds...
1 Answer
The RcppOctave package calls the GNU Octave API functions. If you don't already know GNU Octave, it is very similar to MATLAB and aims for complete compatibility.
This is nearly as fast as running MATLAB directly:
library(RcppOctave)
set.seed(1)
n = array(rnorm(100*50*100*20), dim = c(100,50,100,20))
system.time( s <- octave_style_quantile(n, .66, 4) )
## user system elapsed
## 0.526 0.048 0.574
# *R* `quantile` argument `type=5` is the method that matches matlab.
system.time( r <- apply(n, 1:3, quantile, .66, names = FALSE, type=5) )
## user system elapsed
## 23.308 0.029 23.327
dim(r)
## [1] 100 50 100
identical(r,s)
## [1] TRUE
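To make the `type=5` remark concrete, here is a small sketch of how R's default definition (`type = 7`) and `type = 5` can give different answers on the same data. The four-element vector is made up for illustration and is not from the question.

```r
# Illustrative data only; not taken from the question.
x <- c(1, 2, 3, 4)

# R's default definition (type = 7).
q7 <- quantile(x, 0.25, names = FALSE)

# Type 5: piecewise linear, with the k-th sorted value placed at
# probability (k - 0.5) / n.
q5 <- quantile(x, 0.25, names = FALSE, type = 5)

print(c(type7 = q7, type5 = q5))  # 1.75 and 1.50
```

So when comparing R results against MATLAB or Octave output, the `type` argument has to be set explicitly; otherwise small differences like the one above are expected.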
`octave_style_quantile`, used above, is a fairly straightforward translation of Octave's quantile.m to R:
octave_style_quantile <- function(x, p = NULL, dim = NULL) {
  if (is.null(p)) p <- c(0.00, 0.25, 0.50, 0.75, 1.00)
  if (is.null(dim)) {
    ## Find the first non-singleton dimension.
    dim <- which(dim(x) > 1)[1]
  }
  stopifnot(is.numeric(x) || is.logical(x),
            is.numeric(p),
            dim <= length(dim(x)))
  ## Set the permutation vector.
  perm <- seq_along(dim(x))
  perm[1] <- dim
  perm[dim] <- 1
  ## Permute dim to the 1st index.
  x <- aperm(x, perm)
  ## Save the size of the permuted x N-d array.
  sx <- dim(x)
  ## Reshape to a 2-d array.
  dim(x) <- c(sx[1], prod(sx[-1]))
  ## Calculate the quantiles.
  q <- .CallOctave("quantile", x, p)
  ## Return the shape to the original N-d array.
  dim(q) <- c(length(p), sx[-1])
  ## Permute the 1st index back to dim.
  q <- aperm(q, perm)
  ## Drop singleton dimensions.
  if (any(dim(q) == 1)) dim(q) <- dim(q)[-which(dim(q) == 1)]
  q
}
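If installing RcppOctave is not an option, the same reshape-to-2-D idea can be sketched in base R. This is only a sketch under stated assumptions and not part of the code above: the name `quantile_last_dim` is made up here, it handles a single probability along the last dimension only, and it uses R's default `type = 7` definition, so its results can differ slightly from MATLAB's. The per-cell `apply()` call is replaced by one vectorised `order()` that sorts all rows at once.

```r
# Hypothetical helper (name invented for this sketch): type-7 quantile
# along the LAST dimension of an array, for a single probability.
quantile_last_dim <- function(x, prob) {
  d <- dim(x)
  k <- d[length(d)]          # length of the dimension being reduced
  m <- matrix(x, ncol = k)   # one row per cell of the leading dimensions

  # Sort within rows in one call: order by row index first, then by value.
  # Refilling column-wise with k rows makes column j the sorted row j of m.
  s <- matrix(m[order(row(m), m)], nrow = k)

  # Type 7: linear interpolation at position h = (k - 1) * prob + 1.
  h  <- (k - 1) * prob + 1
  lo <- floor(h)
  hi <- min(lo + 1, k)
  q  <- s[lo, ] + (h - lo) * (s[hi, ] - s[lo, ])

  array(q, dim = d[-length(d)])
}

# Usage on a small array, checked against the apply() version.
set.seed(42)
a    <- array(rnorm(3 * 4 * 5 * 20), dim = c(3, 4, 5, 20))
fast <- quantile_last_dim(a, 0.66)
ref  <- apply(a, 1:3, quantile, 0.66, names = FALSE)
print(all.equal(fast, ref))
```

The comparison uses `all.equal` rather than `identical` because the interpolation is written in a slightly different algebraic form than in `quantile`, so the last bits of the floating-point results may differ. No timings are claimed here; measure on your own data with `system.time`.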