I am trying to write a function in tidyverse/dplyr that I want to eventually use with lapply (or map). (I had been working on it to answer this question, but came upon an interesting result/dead end. Please don't mark this as a duplicate: this question is an extension of, and a departure from, the answers that you see there.)
Is there 1) a way to get a list of quoted variables to work inside a dplyr function (without using the deprecated SE_ functions), or is there 2) some way to feed a list of unquoted strings through lapply or map?
I have used the Programming with dplyr vignette to construct what I believe is the function most in line with the current standard for working with NSE.
The sample data:
sample_data <-
read.table(text = "REVENUEID AMOUNT YEAR REPORT_CODE PAYMENT_METHOD INBOUND_CHANNEL AMOUNT_CAT
1 rev-24985629 30 FY18 S Check Mail 25,50
2 rev-22812413 1 FY16 Q Other Canvassing 0.01,10
3 rev-23508794 100 FY17 Q Credit_card Web 100,250
4 rev-23506121 300 FY17 S Credit_card Mail 250,500
5 rev-23550444 100 FY17 S Credit_card Web 100,250
6 rev-21508672 25 FY14 J Check Mail 25,50
7 rev-24981769 500 FY18 S Credit_card Web 500,1e+03
8 rev-23503684 50 FY17 R Check Mail 50,75
9 rev-24982087 25 FY18 R Check Mail 25,50
10 rev-24979834 50 FY18 R Credit_card Web 50,75
", header = TRUE, stringsAsFactors = FALSE)
A report-generating function:
library(dplyr)

report <- function(report_cat){
  report_cat <- enquo(report_cat)
  sample_data %>%
    group_by(!!report_cat, YEAR) %>%
    summarize(num=n(),total=sum(AMOUNT)) %>%
    rename(REPORT_VALUE = !!report_cat) %>%
    mutate(REPORT_CATEGORY := as.character(quote(!!report_cat))[2])
}
Which works fine for generating a single report:
> report(REPORT_CODE)
# A tibble: 7 x 5
# Groups:   REPORT_VALUE [4]
  REPORT_VALUE YEAR    num total REPORT_CATEGORY
  <chr>        <chr> <int> <int> <chr>
1 J            FY14      1    25 REPORT_CODE
2 Q            FY16      1     1 REPORT_CODE
3 Q            FY17      1   100 REPORT_CODE
4 R            FY17      1    50 REPORT_CODE
5 R            FY18      2    75 REPORT_CODE
6 S            FY17      2   400 REPORT_CODE
7 S            FY18      2   530 REPORT_CODE
It is when I try to set up a list of all 4 reports to generate that everything breaks down. (Admittedly, the code required in the last line of the function, which returns a string with which to fill the column, should be clue enough that I have wandered off in the wrong direction.)
#the other reports
cat.list <- c("REPORT_CODE","PAYMENT_METHOD","INBOUND_CHANNEL","AMOUNT_CAT")
# Applying and Mapping attempts
lapply(cat.list, report)
map_df(cat.list, report)
Which results in:
> lapply(cat.list, report)
Error in (function (x, strict = TRUE) : the argument has already been evaluated
> map_df(cat.list, report)
Error in (function (x, strict = TRUE) : the argument has already been evaluated
I have also tried to convert the list of strings to names before handing it over to apply and map:
library(rlang)
cat.names <- lapply(cat.list, sym)
lapply(cat.names, report)
map_df(cat.names, report)
> lapply(cat.names, report)
Error in (function (x, strict = TRUE) : the argument has already been evaluated
> map_df(cat.names, report)
Error in (function (x, strict = TRUE) : the argument has already been evaluated
In any case, the reason I am asking this question is that I think I have written the function to the currently documented standards, yet I can see no way to use a member of the apply family, or even of the purrr::map family, with such a function. Short of rewriting the function to use names, as useR has done here https://*.com/a/47316151/5088194, is there a way to get this function to work with apply or map?
I am hoping to see this as a result:
# A tibble: 27 x 5
# Groups:   REPORT_VALUE [16]
   REPORT_VALUE YEAR    num total REPORT_CATEGORY
   <chr>        <chr> <int> <int> <chr>
 1 J            FY14      1    25 REPORT_CODE
 2 Q            FY16      1     1 REPORT_CODE
 3 Q            FY17      1   100 REPORT_CODE
 4 R            FY17      1    50 REPORT_CODE
 5 R            FY18      2    75 REPORT_CODE
 6 S            FY17      2   400 REPORT_CODE
 7 S            FY18      2   530 REPORT_CODE
 8 Check        FY14      1    25 PAYMENT_METHOD
 9 Check        FY17      1    50 PAYMENT_METHOD
10 Check        FY18      2    55 PAYMENT_METHOD
# ... with 17 more rows
3 solutions
#1
3
as.name will convert a string to a name, which can then be passed to report:
lapply(cat.list, function(x) do.call("report", list(as.name(x))))
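The difference between calling report on the strings directly and going through do.call can be seen in a minimal base R sketch. Here show_arg is a hypothetical stand-in for report (it is not in the original): it captures its argument unevaluated with substitute, which is only roughly analogous to what enquo does, so treat this as an illustration of the mechanism rather than of dplyr itself.

```r
# Hypothetical stand-in for report(): returns the expression it was called with.
show_arg <- function(x) deparse(substitute(x))

cols <- c("REPORT_CODE", "PAYMENT_METHOD")

# Called through an anonymous function, the captured expression is just `x`,
# not the column name held in x.
direct <- vapply(cols, function(x) show_arg(x), character(1))

# do.call() builds the call show_arg(REPORT_CODE), so the name itself is captured.
via_call <- vapply(cols, function(x) do.call("show_arg", list(as.name(x))),
                   character(1))

print(unname(direct))
print(unname(via_call))
```

The same reasoning suggests why the do.call form works with report: the function sees a bare column name in its call, just as if it had been typed by hand.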
character argument
An alternative is to rewrite report so that it accepts a character string argument:
report_ch <- function(colname) {
  report_cat <- rlang::sym(colname) # as.name(colname) would also work here
  sample_data %>%
    group_by(!!report_cat, YEAR) %>%
    summarize(num = n(), total = sum(AMOUNT)) %>%
    rename(REPORT_VALUE = !!report_cat) %>%
    mutate(REPORT_CATEGORY = colname)
}
lapply(cat.list, report_ch)
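A variation on the character-argument idea, sketched here under the assumption of a reasonably recent dplyr (1.0 or later, for the .groups argument), is to look the column up with the .data pronoun instead of converting the string to a symbol. The names report_data and toy are hypothetical and not part of the original answer; toy is a cut-down stand-in for sample_data so that the example runs on its own.

```r
library(dplyr)

# Hypothetical toy data, a cut-down stand-in for sample_data.
toy <- data.frame(
  REPORT_CODE    = c("S", "Q", "S"),
  PAYMENT_METHOD = c("Check", "Other", "Check"),
  YEAR           = c("FY18", "FY16", "FY18"),
  AMOUNT         = c(30, 1, 500),
  stringsAsFactors = FALSE
)

# report_data is a hypothetical name; colname is a string naming the grouping column.
report_data <- function(data, colname) {
  data %>%
    # .data[[colname]] looks the column up by its string name
    group_by(REPORT_VALUE = .data[[colname]], YEAR) %>%
    summarize(num = n(), total = sum(AMOUNT), .groups = "drop") %>%
    mutate(REPORT_CATEGORY = colname)
}

cols <- c("REPORT_CODE", "PAYMENT_METHOD")
all_reports <- bind_rows(lapply(cols, function(col) report_data(toy, col)))
print(all_reports)
```

Because the function takes plain strings, it can be handed to lapply or map without any quoting or unquoting at the call site.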
wrapr
An alternative approach is to rewrite report using the wrapr package, which is an alternative to rlang/tidyeval:
library(dplyr)
library(wrapr)
report_wrapr <- function(colname)
  let(c(COLNAME = colname),
      sample_data %>%
        group_by(COLNAME, YEAR) %>%
        summarize(num = n(), total = sum(AMOUNT)) %>%
        rename(REPORT_VALUE = COLNAME) %>%
        mutate(REPORT_CATEGORY = colname)
  )
lapply(cat.list, report_wrapr)
Of course, this whole problem would go away if you used a different framework, e.g.
plyr
library(plyr)
report_plyr <- function(colname)
  ddply(sample_data, c(REPORT_VALUE = colname, "YEAR"), function(x)
    data.frame(num = nrow(x), total = sum(x$AMOUNT), REPORT_CATEGORY = colname))
lapply(cat.list, report_plyr)
sqldf
library(sqldf)
report_sql <- function(colname, envir = parent.frame(), ...)
fn$sqldf("select [$colname] REPORT_VALUE,
YEAR,
count(*) num,
sum(AMOUNT) total,
'$colname' REPORT_CATEGORY
from sample_data
group by [$colname], YEAR", envir = envir, ...)
lapply(cat.list, report_sql)
base - by
report_base_by <- function(colname)
do.call("rbind",
by(sample_data, sample_data[c(colname, "YEAR")], function(x)
data.frame(REPORT_VALUE = x[1, colname],
YEAR = x$YEAR[1],
num = nrow(x),
total = sum(x$AMOUNT),
REPORT_CATEGORY = colname)
)
)
lapply(cat.list, report_base_by)
data.table
The data.table package provides another alternative, but that has already been covered by another answer.
Update: Added additional alternatives.
#2
2
I'm not really a dplyr aficionado, but for what it's worth, here is how you could achieve this using library(data.table) instead:
library(data.table)
setDT(sample_data) # converts sample_data to a data.table by reference
gen_report <- function(report_cat){
  sample_data[ , .(num = .N, total = sum(AMOUNT), REPORT_CATEGORY = report_cat),
               by = .(REPORT_VALUE = get(report_cat), YEAR)]
}
gen_report('REPORT_CODE')
lapply(cat.list, gen_report)
#3
2
Let me first point out that in your initial report function, you can use quo_name to convert the quosure into a string, which you can then use in mutate like the following:
library(dplyr)
library(rlang)
report <- function(report_cat){
  report_cat <- enquo(report_cat)
  sample_data %>%
    group_by(!!report_cat, YEAR) %>%
    summarize(num=n(),total=sum(AMOUNT)) %>%
    rename(REPORT_VALUE = !!report_cat) %>%
    mutate(REPORT_CATEGORY = quo_name(report_cat))
}
report(REPORT_CODE)
Now, to address your question of "how to feed a list of unquoted strings through lapply or map to make it work inside dplyr functions", I propose two ways of doing it.
1. Use rlang::sym to convert your strings to symbols, and unquote the result when feeding it into lapply or map
library(purrr)
cat.list <- c("REPORT_CODE","PAYMENT_METHOD","INBOUND_CHANNEL","AMOUNT_CAT")
map_df(cat.list, ~report(!!sym(.)))
or, with syms, you can parse all elements of a vector at once:
map_df(syms(cat.list), ~report(!!.))
Result:
# A tibble: 27 x 5
# Groups: REPORT_VALUE [16]
REPORT_VALUE YEAR num total REPORT_CATEGORY
<chr> <chr> <int> <int> <chr>
1 J FY14 1 25 REPORT_CODE
2 Q FY16 1 1 REPORT_CODE
3 Q FY17 1 100 REPORT_CODE
4 R FY17 1 50 REPORT_CODE
5 R FY18 2 75 REPORT_CODE
6 S FY17 2 400 REPORT_CODE
7 S FY18 2 530 REPORT_CODE
8 Check FY14 1 25 PAYMENT_METHOD
9 Check FY17 1 50 PAYMENT_METHOD
10 Check FY18 2 55 PAYMENT_METHOD
# ... with 17 more rows
2. Rewrite your report function by placing lapply or map inside so that report can do NSE
report <- function(...){
  report_cat <- quos(...)
  map_df(report_cat, function(x) sample_data %>%
           group_by(!!x, YEAR) %>%
           summarize(num=n(),total=sum(AMOUNT)) %>%
           rename(REPORT_VALUE = !!x) %>%
           mutate(REPORT_CATEGORY = quo_name(x)))
}
By placing map_df inside report, you can take advantage of quos, which converts ... to a list of quosures. They are then fed into map_df and unquoted one by one using !!.
report(REPORT_CODE, PAYMENT_METHOD, INBOUND_CHANNEL, AMOUNT_CAT)
Another advantage of writing it like this is that you can also supply a character vector of column names, convert it to symbols with syms, and splice them in using !!!, like the following:
report(!!!syms(cat.list))
Result:
# A tibble: 27 x 5
# Groups: REPORT_VALUE [16]
REPORT_VALUE YEAR num total REPORT_CATEGORY
<chr> <chr> <int> <int> <chr>
1 J FY14 1 25 REPORT_CODE
2 Q FY16 1 1 REPORT_CODE
3 Q FY17 1 100 REPORT_CODE
4 R FY17 1 50 REPORT_CODE
5 R FY18 2 75 REPORT_CODE
6 S FY17 2 400 REPORT_CODE
7 S FY18 2 530 REPORT_CODE
8 Check FY14 1 25 PAYMENT_METHOD
9 Check FY17 1 50 PAYMENT_METHOD
10 Check FY18 2 55 PAYMENT_METHOD
# ... with 17 more rows
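The quoting and splicing that approach 2 relies on can be seen in isolation with rlang alone. This is a minimal sketch: capture_names is a hypothetical helper (not part of the answer) that quotes its arguments with quos and returns each captured expression as a string via quo_name, the same function used above.

```r
library(rlang)

# capture_names is a hypothetical helper: it quotes every argument with quos()
# and returns each captured expression as a string.
capture_names <- function(...) {
  vapply(quos(...), quo_name, character(1), USE.NAMES = FALSE)
}

# Bare column names are captured without being evaluated.
bare <- capture_names(REPORT_CODE, PAYMENT_METHOD)

# Strings are first turned into symbols, then spliced in as separate arguments.
cols <- c("REPORT_CODE", "PAYMENT_METHOD")
spliced <- capture_names(!!!syms(cols))

print(bare)
print(spliced)
```

Both calls should yield the same column names, which is why report(REPORT_CODE, PAYMENT_METHOD, ...) and report(!!!syms(cat.list)) are interchangeable.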