How do I parametrize function calls in dplyr 0.7?

Time: 2022-12-15 14:29:37

The release of dplyr 0.7 includes a major overhaul of programming with dplyr. I read this document carefully, and I am trying to understand how it will impact my use of dplyr.

Here is a common idiom I use when building reporting and aggregation functions with dplyr:

my_report <- function(data, grouping_vars) {
  data %>%
    group_by_(.dots=grouping_vars) %>%
    summarize(x_mean=mean(x), x_median=median(x), ...)
}

Here, grouping_vars is a vector of strings.

I like this idiom because I can pass in string vectors from other places, say a file or a Shiny app's reactive UI, and it's not too bad for interactive work either.

However, in the new "Programming with dplyr" vignette, I see no examples of how something like this can be done with the new dplyr. I only see examples showing that passing strings is no longer the correct approach and that I have to use quosures instead.

I'm happy to adopt quosures, but how exactly do I get from strings to the quosures expected by dplyr here? It doesn't seem feasible to expect the entire R ecosystem to provide quosures to dplyr - lots of times we're going to get strings and they'll have to be converted.

Here is an example showing what you're now supposed to do, and how my old idiom doesn't work:

library(dplyr)
grouping_vars <- quo(am)
mtcars %>%
  group_by(!!grouping_vars) %>%
  summarise(mean_cyl=mean(cyl))
#> # A tibble: 2 × 2
#>      am mean_cyl
#>   <dbl>    <dbl>
#> 1     0 6.947368
#> 2     1 5.076923

grouping_vars <- "am"
mtcars %>%
  group_by(!!grouping_vars) %>%
  summarise(mean_cyl=mean(cyl))
#> # A tibble: 1 × 2
#>   `"am"` mean_cyl
#>    <chr>    <dbl>
#> 1     am   6.1875

3 Answers

#1


11  

dplyr will have a specialized group_by function group_by_at to deal with multiple grouping variables. It would be much easier to use the new member of the _at family:

# using the pre-release 0.6.0

cols <- c("am","gear")

mtcars %>%
    group_by_at(.vars = cols) %>%
    summarise(mean_cyl=mean(cyl))

# Source: local data frame [4 x 3]
# Groups: am [?]
# 
# am  gear mean_cyl
# <dbl> <dbl>    <dbl>
# 1     0     3 7.466667
# 2     0     4 5.000000
# 3     1     4 4.500000
# 4     1     5 6.000000

The .vars argument accepts either a character or numeric vector, or a list of columns generated by vars():

.vars

A list of columns generated by vars(), or a character vector of column names, or a numeric vector of column positions.
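
The three accepted forms of .vars can be compared side by side. This is a minimal sketch, assuming a dplyr version in which group_by_at is available; the names by_name, by_vars and by_pos are made up for this illustration.

```r
library(dplyr)

# 1. Character vector of column names, as in the answer above
by_name <- mtcars %>%
  group_by_at(.vars = c("am", "gear")) %>%
  summarise(mean_cyl = mean(cyl))

# 2. Columns selected with vars(), which takes bare names
by_vars <- mtcars %>%
  group_by_at(vars(am, gear)) %>%
  summarise(mean_cyl = mean(cyl))

# 3. Numeric column positions, looked up here instead of hard-coded
pos <- match(c("am", "gear"), names(mtcars))
by_pos <- mtcars %>%
  group_by_at(pos) %>%
  summarise(mean_cyl = mean(cyl))
```

All three calls should produce the same four groups, so the character-vector form is enough when the column names already arrive as strings.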

#2


9  

Here's the quick and dirty reference I wrote for myself.

# install.packages("rlang")
library(tidyverse)

dat <- data.frame(cat = sample(LETTERS[1:2], 50, replace = TRUE),
                  cat2 = sample(LETTERS[3:4], 50, replace = TRUE),
                  value = rnorm(50))

Representing column names with strings

Convert strings to symbol objects using rlang::sym and rlang::syms.

summ_var <- "value"
group_vars <- c("cat", "cat2")

summ_sym <- rlang::sym(summ_var)  # convert a single string to a symbol
group_syms <- rlang::syms(group_vars)  # convert a character vector to a list of symbols

dat %>%
  group_by(!!!group_syms) %>%  # splice the list of symbols into the call
  summarize(summ = sum(!!summ_sym))  # unquote the single symbol into the call

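If you use !! or !!! outside of functions that support unquoting (such as the dplyr verbs), you will typically get an error, because plain R reads them as repeated negation.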
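
To see what the unquoting operators actually build, the same symbols can be passed to rlang::expr(), which supports !! and !!! but only constructs the call instead of running it. This is a small sketch for inspection; grp_call and sum_call are names invented here.

```r
library(rlang)

group_syms <- syms(c("cat", "cat2"))  # list of two symbols
summ_sym <- sym("value")              # a single symbol

# expr() performs the unquoting and returns the resulting call unevaluated
grp_call <- expr(group_by(!!!group_syms))
sum_call <- expr(summarize(summ = sum(!!summ_sym)))

print(grp_call)
#> group_by(cat, cat2)
print(sum_call)
#> summarize(summ = sum(value))
```

The printed calls are what group_by and summarize effectively receive, which is why the strings end up behaving like bare column names.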

The usage of rlang::sym and rlang::syms is identical inside functions.

summarize_by <- function(df, summ_var, group_vars) {

  summ_sym <- rlang::sym(summ_var)
  group_syms <- rlang::syms(group_vars)

  df %>%
    group_by(!!!group_syms) %>%
    summarize(summ = sum(!!summ_sym))
}

We can then call summarize_by with string arguments.

summarize_by(dat, "value", c("cat", "cat2"))
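
Applied to the my_report idiom from the question, the same pattern might look like the sketch below. It is only an illustration: the question does not say which column is summarised, so cyl from mtcars stands in for x here, and the elided summaries are left out.

```r
library(dplyr)

# Hypothetical rewrite of my_report that still takes a character vector
my_report <- function(data, grouping_vars) {
  data %>%
    group_by(!!!rlang::syms(grouping_vars)) %>%  # strings -> symbols -> spliced
    summarise(x_mean = mean(cyl), x_median = median(cyl))
}

report <- my_report(mtcars, c("am", "gear"))
```

The caller keeps passing strings, as with group_by_(.dots = grouping_vars), and the conversion to symbols stays inside the function.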

Using non-standard evaluation for column/variable names

summ_quo <- quo(value)  # capture a single variable for NSE
group_quos <- quos(cat, cat2)  # capture list of variables for NSE

dat %>%
  group_by(!!!group_quos) %>%  # use !!! with both quos and rlang::syms
  summarize(summ = sum(!!summ_quo))  # use !! with both quo and rlang::sym

Inside functions, use enquo rather than quo to capture what the caller passed for an argument. quos(...) still works there, because the dots are forwarded to it.

summarize_by <- function(df, summ_var, ...) {

  summ_quo <- enquo(summ_var)  # can only capture a single value!
  group_quos <- quos(...)  # captures multiple values, also inside functions!?

  df %>%
    group_by(!!!group_quos) %>%
    summarize(summ = sum(!!summ_quo))
}

And then our function call is

summarize_by(dat, value, cat, cat2)
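
The string interface and the bare-name interface should give the same result on the same data. The sketch below checks this; the two functions are renamed summarize_by_str and summarize_by_nse (names made up here) so that both can exist at once, and set.seed is added only to make the sample data reproducible.

```r
library(dplyr)

set.seed(1)
dat <- data.frame(cat = sample(LETTERS[1:2], 50, replace = TRUE),
                  cat2 = sample(LETTERS[3:4], 50, replace = TRUE),
                  value = rnorm(50))

# String interface: column names arrive as character vectors
summarize_by_str <- function(df, summ_var, group_vars) {
  df %>%
    group_by(!!!rlang::syms(group_vars)) %>%
    summarize(summ = sum(!!rlang::sym(summ_var)))
}

# Bare-name interface: column names arrive unquoted
summarize_by_nse <- function(df, summ_var, ...) {
  summ_quo <- enquo(summ_var)  # capture the caller's expression
  df %>%
    group_by(!!!quos(...)) %>%
    summarize(summ = sum(!!summ_quo))
}

res_str <- summarize_by_str(dat, "value", c("cat", "cat2"))
res_nse <- summarize_by_nse(dat, value, cat, cat2)

# Compare the summarised column of both variants
same <- isTRUE(all.equal(res_str$summ, res_nse$summ))
```

Which variant to expose is mostly a question of where the column names come from: strings suit files and Shiny inputs, bare names suit interactive use.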

#3


6  

If you want to group by possibly more than one column, you can use quos

grouping_vars <- quos(am, gear)
mtcars %>%
  group_by(!!!grouping_vars) %>%
  summarise(mean_cyl=mean(cyl))
#      am  gear mean_cyl
#   <dbl> <dbl>    <dbl>
# 1     0     3 7.466667
# 2     0     4 5.000000
# 3     1     4 4.500000
# 4     1     5 6.000000

Right now, there doesn't seem to be a great way to turn strings into quos. Here's one way that does work, though:

cols <- c("am","gear")
grouping_vars <- rlang::parse_quosures(paste(cols, collapse=";"))
mtcars %>%
  group_by(!!!grouping_vars) %>%
  summarise(mean_cyl=mean(cyl))
#      am  gear mean_cyl
#   <dbl> <dbl>    <dbl>
# 1     0     3 7.466667
# 2     0     4 5.000000
# 3     1     4 4.500000
# 4     1     5 6.000000
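
Parsing treats each string as R code, which may not be wanted when the strings are meant to be plain column names. As an alternative sketch, not part of the answer above, the strings can be turned into symbols and each symbol wrapped in a quosure with rlang::new_quosure; using the global environment here is an assumption made for the example.

```r
library(dplyr)

cols <- c("am", "gear")

# Turn each name into a symbol, then wrap it in a quosure
grouping_vars <- lapply(rlang::syms(cols), rlang::new_quosure, env = globalenv())

result <- mtcars %>%
  group_by(!!!grouping_vars) %>%
  summarise(mean_cyl = mean(cyl))
```

For plain column names the symbols from rlang::syms can also be spliced directly, so the quosure wrapper only matters when an environment has to travel with the expression.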
