How to define a vectorized function in R

Time: 2022-05-12 21:21:48

As the title says, I'd like to know how to define a vectorized function in R.

  • Is it just a matter of using a loop inside the function?
  • Is this method efficient?
  • And what's the best practice?

2 answers

#1


26  

A loop at the R level is not vectorized. An R loop calls the same R code for each element of a vector, which is inefficient. Vectorized functions usually refer to those that take a vector and operate on the entire vector in an efficient way. Ultimately this will involve some form of loop, but as that loop is performed in a low-level language such as C, it can be highly efficient and tailored to the particular task.

Consider this silly function that adds the elements of two vectors pairwise:

sillyplus <- function(x, y) {
    out <- numeric(length = length(x))
    for(i in seq_along(x)) {
        out[i] <- x[i] + y[i]
    }
    out
}

It gives the right result

R> sillyplus(1:10, 1:10)
 [1]  2  4  6  8 10 12 14 16 18 20

It is vectorised in the sense that it can operate on entire vectors at once, but not in the sense I describe above, because it is exceptionally inefficient. + is already vectorised at the C level in R, so we really only need 1:10 + 1:10, not an explicit loop in R.
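
To make the efficiency point concrete, here is a rough timing sketch. The vector size (1e6) and the random inputs are my own choices for illustration, and the actual timings will vary by machine and R version, so no numbers are quoted here.

```r
# Same R-level loop as sillyplus above, repeated so this snippet runs on its own
sillyplus <- function(x, y) {
    out <- numeric(length = length(x))
    for (i in seq_along(x)) {
        out[i] <- x[i] + y[i]
    }
    out
}

set.seed(1)          # arbitrary seed, only for reproducibility
x <- runif(1e6)      # illustrative inputs, not from the original answer
y <- runif(1e6)

t_loop <- system.time(r_loop <- sillyplus(x, y))  # loop runs in R
t_vec  <- system.time(r_vec  <- x + y)            # loop runs in C

all.equal(r_loop, r_vec)   # both approaches compute the same values
rbind(loop = t_loop, vectorised = t_vec)[, "elapsed", drop = FALSE]
```

On most setups the `+` version should come out far faster, though the byte-code compiler in recent R versions narrows the gap somewhat.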

The usual way to write a vectorised function is to build it from existing R functions that are already vectorised. If you want to start from scratch, and the thing you want to do doesn't exist as a vectorised function in R (odd, but possible), then you will need to get your hands dirty: write the guts of the function in C and prepare a little wrapper in R that calls your C function with the vector of data you want it to work on. Functions like Vectorize() can fake vectorisation for R functions that are not vectorised, but the underlying R function is still called once per element.
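
As a small sketch of the Vectorize() route: `sign_label` below is a hypothetical scalar-only function I made up for illustration (it is not from the original answer). Its `if` condition expects a single value, so it cannot be applied to a whole vector directly.

```r
# Hypothetical scalar-only function: `if` needs a length-one condition
sign_label <- function(x) {
    if (x > 0) "positive" else "non-positive"
}

# Vectorize() wraps it with mapply(), so it is still called once per element
v_sign_label <- Vectorize(sign_label)
v_sign_label(c(-1, 2, 0))   # "non-positive" "positive" "non-positive"

# Where a truly vectorised equivalent exists, prefer it
ifelse(c(-1, 2, 0) > 0, "positive", "non-positive")
```

The wrapper is convenient, but it does not make the code any faster than an explicit loop; the `ifelse()` line is the kind of built-in alternative worth looking for first.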

C is not the only option here: Fortran is a possibility, as is C++, and thanks to Dirk Eddelbuettel & Romain Francois the latter is much easier to do now with the Rcpp package.
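
For a feel of what the C++ route looks like, here is a minimal sketch using Rcpp::cppFunction(). It assumes the Rcpp package and a working C++ toolchain are installed; `cplus` is a name I picked for illustration, and the function assumes both inputs have the same length.

```r
# Only runs if Rcpp is available; compiling also needs a C++ compiler
if (requireNamespace("Rcpp", quietly = TRUE)) {
    Rcpp::cppFunction('
    NumericVector cplus(NumericVector x, NumericVector y) {
        // assumption: x and y have equal length (no recycling here)
        int n = x.size();
        NumericVector out(n);
        for (int i = 0; i < n; ++i) {
            out[i] = x[i] + y[i];   // the loop runs in compiled code
        }
        return out;
    }')

    cplus(c(1, 2, 3), c(10, 20, 30))   # 11 22 33
}
```

This is the same pairwise addition as sillyplus, only with the loop moved into compiled code. For addition it is pointless, since + already does this, but the pattern carries over to operations that have no vectorised equivalent.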

#2


7  

A vectorized function typically returns a vector of the same length as one of its arguments. Generally one can get such a function by combining built-in functions like "+", cos or exp, which are themselves vectorized.

vecexpcos <- function(x) exp(cos(x))
vecexpcos( (1:10)*pi )
# [1] 0.3678794 2.7182818 0.3678794 2.7182818 0.3678794 2.7182818 0.3678794 2.7182818 0.3678794 2.7182818

If you need to use a function like sum, which collapses its input to a single value rather than working element by element, you may need to invoke mapply or Vectorize in order to get the desired behavior.
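
A quick sketch of what that looks like in practice. The helper name `sum_to` is hypothetical, chosen only for this example.

```r
# sum() collapses everything it is given into one number
sum(1:3, 4:6)                                  # 21

# mapply() calls sum() once per pair of elements instead
mapply(function(a, b) sum(a, b), 1:3, 4:6)     # 5 7 9

# Vectorize() builds the same kind of wrapper around a scalar function
sum_to <- Vectorize(function(n) sum(seq_len(n)))
sum_to(1:4)                                    # 1 3 6 10
```

Both wrappers still call the R function once per element, so they give vectorized behavior rather than vectorized speed.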
