Fastest way to get the minimum from each column of a matrix?

Time: 2022-05-23 22:51:50

What is the fastest way to extract the min from each column in a matrix?


EDIT:

Moved all the benchmarks to the answer below.

Using a Tall, Short or Wide Matrix:

  ##  TEST DATA
  set.seed(1)
  matrix.inputs <- list(
        "Square Matrix"     = matrix(sample(seq(1e6), 4^2*1e4, TRUE), ncol=400),   #  400 x  400
        "Tall Matrix"       = matrix(sample(seq(1e6), 4^2*1e4, TRUE), nrow=4000),  # 4000 x   40
        "Wide-short Matrix" = matrix(sample(seq(1e6), 4^2*1e4, TRUE), ncol=4000),  #   40 x 4000
        "Wide-tall Matrix"  = matrix(sample(seq(1e6), 4^2*1e5, TRUE), ncol=4000),  #  400 x 4000
        "Tiny Sq Matrix"    = matrix(sample(seq(1e6), 4^2*1e2, TRUE), ncol=40)     #   40 x   40
  )

6 Answers

#1


8  

Here is one that is faster on square and wide matrices. It uses pmin on the rows of the matrix. (If you know a faster way of splitting the matrix into its rows, please feel free to edit)

do.call(pmin, lapply(1:nrow(mat), function(i)mat[i,]))

Using the same benchmark as @RicardoSaporta:

$`Square Matrix`
          test elapsed relative
3 pmin.on.rows   1.370    1.000
1          apl   1.455    1.062
2         cmin   2.075    1.515

$`Wide Matrix`
          test elapsed relative
3 pmin.on.rows   0.926    1.000
2         cmin   2.302    2.486
1          apl   5.058    5.462

$`Tall Matrix`
          test elapsed relative
1          apl   1.175    1.000
2         cmin   2.126    1.809
3 pmin.on.rows   5.813    4.947
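
The pmin-on-rows idea can be sanity-checked on a small matrix. This is a minimal sketch; the 3 x 4 matrix and the variable names are made up for illustration and are not part of the benchmark above.

```r
# Small non-square matrix so that row/column mix-ups would show up.
set.seed(42)
mat <- matrix(sample(100, 12, replace = TRUE), nrow = 3)  # 3 x 4

# Split the matrix into a list of its rows; each row has length ncol(mat).
rows <- lapply(seq_len(nrow(mat)), function(i) mat[i, ])

# pmin() takes the element-wise minimum across all rows,
# which is the minimum of each column.
col_min_pmin <- do.call(pmin, rows)

# Reference result from apply().
col_min_apply <- apply(mat, 2, min)

stopifnot(all(col_min_pmin == col_min_apply))
```

Note that pmin() receives one argument per row, so a tall matrix means a long argument list of short vectors. That may be part of why this approach trails apply() on the tall matrix in the timings above.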

#2


10  

The sos package is great for answering these sorts of questions.

library("sos")
findFn("colMins")
library("matrixStats")
?colMins

http://finzi.psych.upenn.edu/R/library/matrixStats/html/rowRanges.html

Oddly enough, in the one example I tried, colMins was slower than apply. Perhaps someone can point out what's funny about my example?

set.seed(101); z <- matrix(runif(1e6),nrow=1000)
library(rbenchmark)
benchmark(colMins(z),apply(z,2,min))
##               test replications elapsed relative user.self sys.self
## 2 apply(z, 2, min)          100  14.290     1.00     7.216    7.057
## 1       colMins(z)          100  25.585     1.79    15.509    9.852
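
Before comparing speed, it may help to confirm that the two calls agree. A minimal sketch follows, using a made-up 3 x 2 matrix; the matrixStats call is guarded so the snippet also runs where the package is not installed.

```r
x <- matrix(c(5, 2, 9,
              1, 7, 3), nrow = 3)   # 3 x 2, filled column by column

# Base R reference: minimum of each column.
base_min <- apply(x, 2, min)

# Use matrixStats only if it is installed, so the snippet runs either way.
if (requireNamespace("matrixStats", quietly = TRUE)) {
  pkg_min <- matrixStats::colMins(x)
  stopifnot(all(pkg_min == base_min))
}
```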

#3


5  

Update 2014-12-17:

colMins() et al. were made significantly faster in a recent version of matrixStats. Here's an updated benchmark summary using matrixStats 0.12.2 showing that it ("cmin") is ~5-20 times faster than the second fastest approach:

$`Square Matrix`
     test elapsed relative
2    cmin   0.216    1.000
1     apl   4.200   19.444
5 pmn.int   4.604   21.315
4     pmn   5.136   23.778
3    lapl  12.546   58.083

$`Tall Matrix`
     test elapsed relative
2    cmin   0.262    1.000
1     apl   3.006   11.473
5 pmn.int  18.605   71.011
3    lapl  22.798   87.015
4     pmn  27.583  105.279

$`Wide-short Matrix`
     test elapsed relative
2    cmin   0.346    1.000
5 pmn.int   3.766   10.884
4     pmn   3.955   11.431
3    lapl  13.393   38.708
1     apl  19.187   55.454

$`Wide-tall Matrix`
     test elapsed relative
2    cmin   5.591    1.000
5 pmn.int  39.466    7.059
4     pmn  40.265    7.202
1     apl  67.151   12.011
3    lapl 158.035   28.266

$`Tiny Sq Matrix`
     test elapsed relative
2    cmin   0.011    1.000
5 pmn.int   0.135   12.273
4     pmn   0.178   16.182
1     apl   0.202   18.364
3    lapl   0.269   24.455

Previous comment 2013-10-09:
FYI, since matrixStats v0.8.7 (2013-07-28), colMins() is roughly twice as fast as before. The reason is that the function previously utilized colRanges(), which explains the "surprisingly slow" results reported here. Same speed is seen for colMaxs(), rowMins() and rowMaxs().
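
To illustrate why going through a range computation costs extra, here is a rough base R analogue (a sketch only, not the package's actual implementation): computing the range finds both the minimum and the maximum, and the maximum is then thrown away. The example matrix is made up.

```r
x <- matrix(c(5, 2, 9,
              1, 7, 3), nrow = 3)   # 3 x 2

# "Compute the range, keep the minimum": range() finds both min and max,
# so half of the work is discarded.
via_range <- apply(x, 2, range)[1, ]
direct    <- apply(x, 2, min)
stopifnot(all(via_range == direct))

# The same relationship with matrixStats, if it is installed.
if (requireNamespace("matrixStats", quietly = TRUE)) {
  rng <- matrixStats::colRanges(x)   # one row per column of x: min, max
  stopifnot(all(rng[, 1] == matrixStats::colMins(x)))
}
```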

#4


3  

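# Group the elements by column; col(mat) works for any matrix shape.
# (Grouping by rep(1:nrow, each = ncol) matches the columns only when mat is square.)
lapply( split(mat, col(mat)), min)

# Check against apply(): no column differs.
which( ! apply(my.mat, 2, min, na.rm=TRUE) ==
        sapply( split(my.mat, col(my.mat)), min) )
# named integer(0)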
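
Because split() walks the matrix in column-major order, the grouping vector decides whether the pieces really are columns. The sketch below uses a made-up 3 x 4 matrix to show the difference between grouping by col(mat) and grouping by rep(1:nrow, each = ncol).

```r
mat <- matrix(1:12, nrow = 3)   # 3 x 4; columns are 1:3, 4:6, 7:9, 10:12

# col(mat) labels every element with its column index, so split()
# returns one vector per column regardless of the matrix shape.
by_col <- split(mat, col(mat))
mins   <- sapply(by_col, min)

# Grouping by rep(1:nrow, each = ncol) instead cuts the data into chunks of
# length ncol, which lines up with the columns only for square matrices.
chunks <- split(mat, rep(seq_len(nrow(mat)), each = ncol(mat)))
stopifnot(length(chunks) == nrow(mat))   # 3 groups, not 4

stopifnot(all(unname(mins) == apply(mat, 2, min)))
```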

#5


2  

Below is a collection of the answers thus far. This will be updated as more answers are contributed.

BENCHMARKS

  library(rbenchmark)
  library(matrixStats)  # for colMins


  list.of.tests <- list (
        ## Method 1: apply()  [original]
        apl = expression(apply(mat, 2, min, na.rm=TRUE)),

        ## Method 2:  matrixStats::colMins [contributed by @Ben Bolker ]
        cmin = expression(colMins(mat)),

        ## Method 3: lapply() + split()  [contributed by @DWin ]
        ## (this grouping matches the columns only when mat is square;
        ##  split(mat, col(mat)) groups by column for any shape)
        lapl = expression(lapply( split(mat, rep(1:dim(mat)[1], each=dim(mat)[2])), min)),

        ## Method 4: pmin() / pmin.int()  [contributed by @flodel ]
        pmn = expression(do.call(pmin, lapply(1:nrow(mat), function(i)mat[i,]))),
        pmn.int = expression(do.call(pmin.int, lapply(1:nrow(mat), function(i)mat[i,]))) #,

        ## Method 5: ????
        #  e5 = expression(  ???  ),
        )  


  (times <- 
        lapply(matrix.inputs, function(mat)
            do.call(benchmark, args=c(list.of.tests, replications=500, order="relative"))[, c("test", "elapsed", "relative")]
  ))



  ############################# 
  #$         RESULTS         $#
  #$_________________________$#
  #############################

  # $`Square Matrix`
  #      test elapsed relative
  # 5 pmn.int   2.842    1.000
  # 4     pmn   3.622    1.274
  # 1     apl   3.670    1.291
  # 2    cmin   5.826    2.050
  # 3    lapl  41.817   14.714  

  # $`Tall Matrix`
  #      test elapsed relative
  # 1     apl   2.622    1.000
  # 2    cmin   5.561    2.121
  # 5 pmn.int  11.264    4.296
  # 4     pmn  18.142    6.919
  # 3    lapl  48.637   18.550  

  # $`Wide-short Matrix`
  #      test elapsed relative
  # 5 pmn.int   2.909    1.000
  # 4     pmn   3.018    1.037
  # 2    cmin   6.361    2.187
  # 1     apl  15.765    5.419
  # 3    lapl  41.479   14.259  

  # $`Wide-tall Matrix`
  #      test elapsed relative
  # 5 pmn.int  20.917    1.000
  # 4     pmn  26.188    1.252
  # 1     apl  38.635    1.847
  # 2    cmin  64.557    3.086
  # 3    lapl 434.761   20.785  

  # $`Tiny Sq Matrix`
  #      test elapsed relative
  # 5 pmn.int   0.112    1.000
  # 2    cmin   0.149    1.330
  # 4     pmn   0.174    1.554
  # 1     apl   0.180    1.607
  # 3    lapl   0.509    4.545

#6


1  

mat[(1:ncol(mat)-1)*nrow(mat)+max.col(t(-mat))] seems pretty fast, and it's base R.
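
The one-liner packs three steps together: find the row of each column's minimum, turn that into a linear index, and subset. A minimal sketch that unpacks it, with a made-up 3 x 3 matrix and illustrative variable names:

```r
mat <- matrix(c(5, 2, 9,
                1, 7, 3,
                8, 6, 4), nrow = 3)   # columns: (5,2,9), (1,7,3), (8,6,4)

# Row index of the minimum within each column: negate so the minimum
# becomes the maximum, transpose so columns become rows for max.col().
row_of_min <- max.col(t(-mat))

# Convert (row, column) pairs into linear indices into the column-major data.
lin_idx <- (seq_len(ncol(mat)) - 1) * nrow(mat) + row_of_min

col_min <- mat[lin_idx]
stopifnot(all(col_min == apply(mat, 2, min)))
```

max.col() breaks ties at random by default, but tied positions hold the same value, so the extracted minimum is unaffected.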
