dplyr: How to use group_by inside a function?

Time: 2021-03-29 22:31:03

I want to use the dplyr::group_by function inside another function, but I do not know how to pass the arguments to it.

Can someone provide a working example?

library(dplyr)
data(iris)
iris %.% group_by(Species) %.% summarise(n = n())
## Source: local data frame [3 x 2]
##      Species  n
## 1  virginica 50
## 2 versicolor 50
## 3     setosa 50

mytable0 <- function(x, ...) x %.% group_by(...) %.% summarise(n = n())
mytable0(iris, "Species") # OK
## Source: local data frame [3 x 2]
##      Species  n
## 1  virginica 50
## 2 versicolor 50
## 3     setosa 50

mytable1 <- function(x, key) x %.% group_by(as.name(key)) %.% summarise(n = n())
mytable1(iris, "Species") # Wrong!
# Error: unsupported type for column 'as.name(key)' (SYMSXP)

mytable2 <- function(x, key) x %.% group_by(key) %.% summarise(n = n())
mytable2(iris, "Species") # Wrong!
# Error: index out of bounds

3 answers

#1

For programming, group_by_ is the counterpart to group_by:

library(dplyr)

mytable <- function(x, ...) x %>% group_by_(...) %>% summarise(n = n())
mytable(iris, "Species")
# or iris %>% mytable("Species")

which gives:

     Species  n
1     setosa 50
2 versicolor 50
3  virginica 50
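The underscore verbs such as group_by_ have been deprecated since dplyr 0.7.0 in favour of tidy evaluation. A minimal sketch of the same string-based function for newer dplyr uses the .data pronoun instead. It assumes dplyr 1.0 or later (for the .groups argument of summarise()), and mytable_str is a hypothetical name:

```r
library(dplyr)

# Hypothetical helper: group by a column whose name arrives as a string.
# .data[[key]] looks the column up by name inside the data mask.
mytable_str <- function(x, key) {
  x %>%
    group_by(.data[[key]]) %>%
    summarise(n = n(), .groups = "drop")  # return an ungrouped result
}

res <- mytable_str(iris, "Species")
print(res)
```

Because .data[[key]] takes an ordinary character value, this form also works when the column name is computed at run time.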

Update: When this answer was written, dplyr used %.%, which is what the code above originally used. %>% is now favored, so the code has been changed to use it to keep the answer relevant.

Update 2: regroup is now deprecated; use group_by_ instead.

Update 3: As per Roberto's comment, group_by_(list(...)) becomes group_by_(...) in newer versions of dplyr.

Update 4: Added a minor variation suggested in the comments.

Update 5: With rlang/tidyeval it is now possible to do this:

library(rlang)
mytable <- function(x, ...) {
  group_ <- syms(c(...))  # combine all string arguments into one character vector of names
  x %>% 
    group_by(!!!group_) %>% 
    summarise(n = n())
}
mytable(iris, "Species")
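rlang::syms() turns a character vector into a list of symbols, and !!! splices that list into group_by() as separate arguments. A sketch that takes the grouping columns as a single character vector, assuming dplyr (1.0 or later, for .groups) and rlang are installed; count_by is a hypothetical name, and mtcars is used only because it has several discrete columns:

```r
library(dplyr)
library(rlang)

# Hypothetical helper: convert a character vector of column names into
# symbols and splice them into group_by() as separate grouping variables.
count_by <- function(x, keys) {
  x %>%
    group_by(!!!syms(keys)) %>%
    summarise(n = n(), .groups = "drop")
}

res <- count_by(mtcars, c("cyl", "gear"))
print(res)
```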

or passing Species unevaluated, i.e. no quotes around it:

library(rlang)
mytable <- function(x, ...) {
  group_ <- quos(...)
  x %>% 
    group_by(!!!group_) %>% 
    summarise(n = n())
}
mytable(iris, Species)
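Since group_by() itself captures its arguments with tidy evaluation, unquoted column names in ... can also be forwarded directly, which makes the quos()/!!! step unnecessary. A minimal sketch, assuming dplyr 1.0 or later; mytable_dots is a hypothetical name:

```r
library(dplyr)

# Hypothetical helper: forward unquoted column names straight to group_by().
mytable_dots <- function(x, ...) {
  x %>%
    group_by(...) %>%
    summarise(n = n(), .groups = "drop")
}

res <- mytable_dots(iris, Species)           # one grouping column
res2 <- mytable_dots(mtcars, cyl, gear)      # several grouping columns
print(res)
```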

#2

UPDATE: As of dplyr 0.7.0 you can use tidy eval to accomplish this.

See http://dplyr.tidyverse.org/articles/programming.html for more details.

library(tidyverse)
data("iris")

my_table <- function(df, group_var) {
  group_var <- enquo(group_var)      # Create quosure
  df %>% 
    group_by(!!group_var) %>%        # Use !! to unquote the quosure
    summarise(n = n())
}

my_table(iris, Species)

> my_table(iris, Species)
# A tibble: 3 x 2
     Species     n
      <fctr> <int>
1     setosa    50
2 versicolor    50
3  virginica    50
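rlang 0.4.0 introduced the {{ }} ("curly-curly") operator, which combines the enquo() and !! steps above into one. A minimal sketch, assuming rlang 0.4.0 or later and dplyr 1.0 or later; my_table2 is a hypothetical name:

```r
library(dplyr)

# Hypothetical helper: {{ group_var }} captures and unquotes the argument in one step.
my_table2 <- function(df, group_var) {
  df %>%
    group_by({{ group_var }}) %>%
    summarise(n = n(), .groups = "drop")
}

res <- my_table2(iris, Species)
print(res)
```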

#3

Ugly as they come, but she works:

mytable3 <- function(x, key) {
  my.call <- bquote(summarise(group_by(.(substitute(x)), NULL), n = n()))
  my.call[[2]][[3]] <- as.name(key)
  eval(my.call, parent.frame())
} 
mytable3(iris, "Species")
# Source: local data frame [3 x 2]
#
#      Species  n
# 1  virginica 50
# 2 versicolor 50
# 3     setosa 50

There are almost certainly cases that will cause this to break, but you get the idea. I don't think you can get around messing with the call. One other thing that did work but was even uglier is:

mytable4 <- function(x, key) summarise(group_by(x, x[[key]]), n = n())
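The call-building idea of mytable3 can be written more compactly by splicing the symbol in directly with bquote(). This is a sketch of the same technique under the assumption of dplyr 1.0 or later (for .groups), not a fix for the fragility mentioned above; mytable5 is a hypothetical name:

```r
library(dplyr)

# Hypothetical helper: .(as.name(key)) inserts the column name as a symbol,
# and eval() runs the constructed group_by() call in this function's frame.
mytable5 <- function(x, key) {
  grouped <- eval(bquote(group_by(x, .(as.name(key)))))
  summarise(grouped, n = n(), .groups = "drop")
}

res <- mytable5(iris, "Species")
print(res)
```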
