When I need to apply several functions to several columns, aggregate by several grouping columns, and have the results bound into a data frame, I usually use aggregate() in the following manner:
# bogus functions
foo1 <- function(x){mean(x)*var(x)}
foo2 <- function(x){mean(x)/var(x)}
# for illustration purposes only
npk$block <- as.numeric(npk$block)
subdf <- aggregate(npk[, c("yield", "block")],
                   by = list(N = npk$N, P = npk$P),
                   FUN = function(x){c(col1 = foo1(x), col2 = foo2(x))})
To get the results into a flat, neatly ordered data frame, I then use:
df <- do.call(data.frame, subdf)
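For context, the extra step is needed because aggregate() stores a vector-valued FUN result as a single matrix column per input column. The following is a minimal, self-contained sketch of that behaviour using the built-in npk data; the names dat, agg and flat are chosen here for illustration and are not part of the original code.

```r
# Same bogus functions as above
foo1 <- function(x) mean(x) * var(x)
foo2 <- function(x) mean(x) / var(x)

dat <- npk                           # work on a copy (illustrative name)
dat$block <- as.numeric(dat$block)   # for illustration purposes only

agg <- aggregate(dat[, c("yield", "block")],
                 by = list(N = dat$N, P = dat$P),
                 FUN = function(x) c(col1 = foo1(x), col2 = foo2(x)))

dim(agg)              # 4 rows, 4 columns: N, P, yield, block
is.matrix(agg$yield)  # TRUE: "yield" is one column holding a 4 x 2 matrix

# data.frame() expands each matrix column into one column per matrix column
flat <- do.call(data.frame, agg)
names(flat)           # N, P, yield.col1, yield.col2, block.col1, block.col2
```

So do.call(data.frame, ...) does not reorder anything; it only splits the matrix columns into ordinary ones.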
Can I avoid the call to do.call() by using aggregate() more cleverly in this scenario, or shorten the whole process by using another base R solution from the start?
1 Answer
As @akrun suggested, dplyr's summarise_each is well-suited to the task.
library(dplyr)
npk %>%
  group_by(N, P) %>%
  summarise_each(funs(foo1, foo2), yield, block)
# Source: local data frame [4 x 6]
# Groups: N
#
#   N P yield_foo2 block_foo2 yield_foo1 block_foo1
# 1 0 0   2.432390          1   1099.583      12.25
# 2 0 1   1.245831          1   2205.361      12.25
# 3 1 0   1.399998          1   2504.727      12.25
# 4 1 1   2.172399          1   1451.309      12.25
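Note that summarise_each() and funs() have since been deprecated in dplyr. A rough equivalent with across() might look like the sketch below, assuming dplyr 1.0.0 or later; foo1 and foo2 are the functions from the question, res is a name chosen here, and the numeric conversion of block is repeated so that the snippet runs on its own.

```r
library(dplyr)

foo1 <- function(x) mean(x) * var(x)
foo2 <- function(x) mean(x) / var(x)

res <- npk %>%
  mutate(block = as.numeric(block)) %>%   # for illustration purposes only
  group_by(N, P) %>%
  summarise(across(c(yield, block), list(foo1 = foo1, foo2 = foo2)),
            .groups = "drop")

# Columns are named <column>_<function>:
# N, P, yield_foo1, yield_foo2, block_foo1, block_foo2
names(res)
```

The result is a tibble; wrapping it in as.data.frame() should give a plain data frame if that is preferred.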