In reference to this question, I was trying to figure out the simplest way to apply a list of functions to a list of values: basically, a nested lapply. For example, here we apply sd and mean to the built-in data set trees:
funs <- list(sd=sd, mean=mean)
sapply(funs, function(x) sapply(trees, x))
to get:
              sd     mean
Girth   3.138139 13.24839
Height  6.371813 76.00000
Volume 16.437846 30.17097
But I was hoping to avoid the inner function and have something like:
sapply(funs, sapply, X=trees)
which doesn't work because X matches the first sapply instead of the second. We can do it with functional::Curry:
sapply(funs, Curry(sapply, X=trees))
but I was hoping there was a clever way to do this with positional and name matching that I'm missing.
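To make the matching problem concrete, here is a small sketch. match.call() shows where each argument of the failing call lands, and partial() is a hypothetical stand-in for functional::Curry written only for this illustration (it is not part of base R or of any package mentioned here).

```r
funs <- list(sd = sd, mean = mean)

# Named arguments are matched first, so X = trees is claimed by the OUTER
# sapply; funs then fills FUN positionally and the inner sapply lands in `...`.
print(match.call(sapply, quote(sapply(funs, sapply, X = trees))))

# The call therefore fails: funs is a list, not a function usable as FUN.
res <- tryCatch(sapply(funs, sapply, X = trees), error = function(e) e)
failed <- inherits(res, "error")

# Hypothetical helper (an assumption, not a library function): fix some
# arguments of f now and supply the remaining ones later.
partial <- function(f, ...) {
  fixed <- list(...)
  function(...) do.call(f, c(fixed, list(...)))
}

# X = trees is now bound to the INNER sapply before the outer one sees it.
out <- sapply(funs, partial(sapply, X = trees))
print(out)
```

This is the same idea as the Curry call above: the named argument is captured ahead of time, so the outer sapply never gets a chance to match it.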
4 Answers
#1
18
Since mapply uses the ellipsis ... to pass vectors (atomic vectors or lists) rather than a named argument (X) as in sapply, lapply, etc., you don't need to name the parameter X = trees if you use mapply instead of sapply:
funs <- list(sd = sd, mean = mean)
x <- sapply(funs, function(x) sapply(trees, x))
y <- sapply(funs, mapply, trees)
> y
              sd     mean
Girth   3.138139 13.24839
Height  6.371813 76.00000
Volume 16.437846 30.17097
> identical(x, y)
[1] TRUE
You were one letter away from what you were looking for! :)
Note that I used a list for funs because I can't create a data frame of functions; trying to do so gave me an error.
> R.version.string
[1] "R version 3.1.3 (2015-03-09)"
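The difference shows up directly in the formal arguments of the two functions. A quick check using only base R (the variable y is just for illustration):

```r
# sapply() has a formal named X, so a named X = ... is bound by whichever
# sapply is being called first.
print(names(formals(sapply)))  # "X" "FUN" "..." "simplify" "USE.NAMES"

# mapply() has no X; the vectors to iterate over travel through `...`.
print(names(formals(mapply)))  # "FUN" "..." "MoreArgs" "SIMPLIFY" "USE.NAMES"

funs <- list(sd = sd, mean = mean)

# Each step of the outer sapply runs mapply(f, trees), that is,
# f applied to every column of trees.
y <- sapply(funs, mapply, trees)
print(y)
```

Because trees is passed positionally and mapply has no X to collide with, nothing needs to be named at all.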
#2
13
You're basically going to need an anonymous function of some sort, because there is no other way to distinguish named parameters to the two different sapply calls. You've already shown an explicit anonymous function and the Curry method. You could also use magrittr:
library(magrittr)
sapply(funs, . %>% sapply(trees, .))
# or .. funs %>% sapply(. %>% sapply(trees, .))
but the point is you need something there to do the splitting. The "problem" is that sapply dispatches to lapply, which is an internal function that seems determined to place the changing values at the beginning of the function call. You need something to reorder the parameters, and because the two calls share identical sets of parameter names, it's not possible to tease them apart without a helper function to take care of the disambiguation.
The mapply function does allow you to pass a list to MoreArgs, which offers a way around the named parameter conflict. It is intended to separate the items you vectorize over from those that are fixed. Thus you can do
mapply(sapply, funs, MoreArgs=list(X=trees))
#               sd     mean
# Girth   3.138139 13.24839
# Height  6.371813 76.00000
# Volume 16.437846 30.17097
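As a rough sketch of why this works: for each element f of funs, mapply in effect issues the call sapply(f, X = trees). The names bound, via_mapply and via_lambda below are only for illustration.

```r
funs <- list(sd = sd, mean = mean)

# match.call() shows how R binds the arguments of the per-element call:
# the named X takes trees, and the unnamed f falls into FUN.
bound <- match.call(sapply, quote(sapply(f, X = trees)))
print(bound)

# The MoreArgs version and the explicit anonymous function agree.
via_mapply <- mapply(sapply, funs, MoreArgs = list(X = trees))
via_lambda <- sapply(funs, function(f) sapply(trees, f))
print(all.equal(via_mapply, via_lambda))
```

The key point is that X = trees is handed to mapply inside a list, so it is never exposed to the argument matching of the outer call.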
#3
5
Another approach using purrr would be:
require(purrr)
funs <- list(sd=sd, mean=mean)
trees %>% map_df(~invoke_map(funs, ,.), .id="id")
Important: Note the empty second argument of invoke_map, which makes the data match by position. See the examples in ?purrr::invoke_map.
which gives you:
Source: local data frame [3 x 3]

      id        sd     mean
   <chr>     <dbl>    <dbl>
1  Girth  3.138139 13.24839
2 Height  6.371813 76.00000
3 Volume 16.437846 30.17097
Instead of row names, this approach gives you a column id containing the original column names.
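For comparison, here is a base R sketch that produces the same shape (an id column instead of row names). It may be handy because, as far as I know, invoke_map() has been deprecated in recent purrr releases (1.0.0 onward). The names m and df are only for illustration.

```r
funs <- list(sd = sd, mean = mean)

# Matrix of results: one row per column of trees, one column per function
m <- sapply(funs, function(f) sapply(trees, f))

# Move the row names into an explicit id column, as the purrr version does
df <- data.frame(id = rownames(m), m, row.names = NULL, stringsAsFactors = FALSE)
print(df)
```

This keeps the anonymous function from the question; it only changes how the result is presented.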
#4
0
Though neither as edifying nor as elegant as the solution presented by @Floo0, here is yet another take using tidyr and dplyr:
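library(dplyr)
library(tidyr)
# funs() builds the named list of summary functions; it has been soft-deprecated
# since dplyr 0.8.0, where a plain list(sd = sd, mean = mean) takes its place
fns <- funs(sd = sd, mean = mean)
trees %>%
  gather(property, value, everything()) %>%  # long form: one row per (column, value)
  group_by(property) %>%
  summarise_all(fns)                         # apply every function within each group
# A tibble: 3 x 3
#   property        sd     mean
#      <chr>     <dbl>    <dbl>
# 1    Girth  3.138139 13.24839
# 2   Height  6.371813 76.00000
# 3   Volume 16.437846 30.17097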
This sequence of operations does a decent job of signaling intent, at the cost of extra verbosity.
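The same reshape-then-summarise idea can be sketched in base R, assuming stack() is an acceptable stand-in for gather() and tapply() for group_by() plus summarise_all(). The names long and by_group are only for illustration.

```r
funs <- list(sd = sd, mean = mean)

# Long form: a `values` column plus an `ind` factor naming the source column
long <- stack(trees)

# For each function, summarise the values within each group of `ind`
by_group <- sapply(funs, function(f) tapply(long$values, long$ind, f))
print(by_group)
```

It is no shorter than the nested sapply from the question, but it mirrors the steps of the pipeline above one for one.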