String-based filtering in dplyr - NSE

Time: 2021-08-13 22:20:41

I'd like to use dplyr's new NSE notations (version >= 0.6) for a dynamic filter on my data. Let's say I have the following dummy dataset:

library(dplyr)
df <- tibble(x = 1:10, y = 10:1, z = 10 * runif(10))  # data_frame() is deprecated in favour of tibble()

If I now want to filter the column named in tofilter = "x" for values of at least 5, I know I can do:

df %>% 
  filter((!!rlang::sym(tofilter)) >= 5)
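
For comparison, here is a minimal sketch of the same filter written with the `.data` pronoun, which dplyr also provides for referring to a column by a name stored in a string. The smaller `df` and the names `via_sym` and `via_data` are only illustrative assumptions, not part of the original question.

```r
library(dplyr)

# Illustrative data; only the x and y columns from the question are needed here
df <- tibble(x = 1:10, y = 10:1)
tofilter <- "x"

# Option 1: turn the string into a symbol and unquote it
via_sym <- df %>% filter((!!rlang::sym(tofilter)) >= 5)

# Option 2: subset the .data pronoun with the string
via_data <- df %>% filter(.data[[tofilter]] >= 5)
```

Both calls should keep the same rows, so which one to use is mostly a matter of taste.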

Question 1

What if I want to dynamically change the filtering operator too? Let's say I have a Shiny app in which the user can choose, via a selectInput, whether to filter the data for values greater than 5, equal to 5, or lower than 5.

What I'd like to do is something along the lines of:

op = ">="
val = 5
filt_expr = paste("x", op, val)
df %>% 
  filter(filt_expr)

Obviously, this does not work. I have played a bit with rlang quosures, symbols, etc., but didn't quite find the right way to "quote" my inputs.

Question 2

Bonus question: what if I want to apply multiple filters? Do I need to loop, or can I create a list of filtering expressions and apply them all in one go?

An example of this is a Shiny App where the user can type multiple conditions he/she wants to apply to the data so that we have a dynamically changing list of the format:

filt_expr_list = list("x >= 5", "y <= 10", "z >= 2")

and we want to dynamically apply them all, so that the output is equivalent to:

df %>%
  filter(x >= 5, y <= 10, z >= 2)

I guess this is in a certain sense a subset of question 1, since once I know how to correctly quote the arguments I think I could do something like:

filt_expr = paste0(unlist(filt_expr_list), collapse = ", ")
df %>%
  filter(filt_expr)

but it would be nice to see whether there is a cleaner way.

1 solution

#1


What if I want to dynamically change the operator of the filtering too?

You can do it with tidy eval by unquoting a symbol representing the operator (note that I use expr() to illustrate the result of the unquoting):

library(rlang)  # provides expr() and sym()
lhs <- "foo"

# Storing the symbol `<` in `op`
op <- quote(`<`)

expr(`!!`(op)(!!sym(lhs), 5))
#> foo < 5
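
If the operator arrives as a string (as it would from a user input), one way to build the same expression is `rlang::call2()`, which constructs a call from a function name and its arguments. This is only a sketch: the names `op_string`, `cond` and `toy` are assumptions made for illustration.

```r
library(rlang)

lhs <- "foo"
op_string <- "<"   # hypothetical operator chosen by the user

# Build the call `foo < 5` from its parts
cond <- call2(op_string, sym(lhs), 5)

# Evaluate the expression against a small toy data frame
toy <- data.frame(foo = c(1, 10))
result <- eval_tidy(cond, data = toy)
```

The resulting `cond` can be unquoted inside `filter()` with `!!` like any other expression.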

However, it is cleaner to do it outside tidy eval with regular R code. Unquoting is only necessary when the symbol you unquote represents a column from the data frame, i.e. something that's not in the context. Here you can just store the operator in a variable and then call that variable in your filtering expression:

# Storing the function `<` in `op`
op <- `<`

expr(op(!!sym(lhs), 5))
#> op(foo, 5)
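
Applied to the Shiny scenario from the question, this approach might look like the sketch below. The helper `filter_by()` and its whitelist of operators are hypothetical, written here only to show the operator function being looked up from a string and then called inside `filter()`.

```r
library(dplyr)

df <- tibble(x = 1:10, y = 10:1)

# Hypothetical helper: filter `data` on the column named `col`,
# using an operator chosen from a fixed whitelist
filter_by <- function(data, col, op, val) {
  ops <- list(">" = `>`, ">=" = `>=`, "==" = `==`, "<=" = `<=`, "<" = `<`)
  if (!op %in% names(ops)) {
    stop("Unsupported operator: ", op)
  }
  op_fun <- ops[[op]]
  # The column is unquoted as a symbol; the value is unquoted so that a
  # column that happens to be called `val` cannot shadow it
  filter(data, op_fun(!!rlang::sym(col), !!val))
}

at_least_5 <- filter_by(df, "x", ">=", 5)
equal_5    <- filter_by(df, "x", "==", 5)
```

Restricting the operator to a whitelist avoids evaluating arbitrary text typed by the user.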

What if I want to apply multiple filters?

You save the expressions in a list and then you splice them in a call with !!!:

filters <- list(
  quote(x >= 5),
  quote(y <= 10),
  quote(z >= 2)
)

expr(df %>% filter(!!!filters))
#> df %>% filter(x >= 5, y <= 10, z >= 2)
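
If the conditions start out as strings, as in the question's `filt_expr_list`, they first have to be turned into expressions. A minimal sketch, assuming `rlang::parse_expr()` is acceptable for the input at hand (the `z` column is made deterministic here purely for illustration):

```r
library(dplyr)

df <- tibble(x = 1:10, y = 10:1, z = 1:10)
filt_expr_list <- list("x >= 5", "y <= 10", "z >= 2")

# Parse each string into an unevaluated expression.
# Caution: parsed text is later evaluated as R code, so this is only
# appropriate when the strings come from a trusted source.
filters <- lapply(filt_expr_list, rlang::parse_expr)

# Splice the list of expressions into a single filter() call
dynamic <- df %>% filter(!!!filters)

# The hand-written equivalent, for comparison
static <- df %>% filter(x >= 5, y <= 10, z >= 2)
```

No loop is needed: `!!!` passes each element of the list as a separate argument to `filter()`.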

Note: I said above that it is not necessary to unquote variables from the context, but it is still often a good idea to do so if you're writing a function that has the data frame as input. Since the data frame is variable, you don't know in advance what columns it contains. The columns will always have precedence over the objects you have defined in the environment. In the case here, this is not an issue because we are talking about a function, and R will keep looking for a function if it finds a similarly named non-function object in the data frame.
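
The precedence rule in the note can be seen in a small sketch. The data frame below deliberately contains a column called `val` that clashes with a variable of the same name; both names are assumptions chosen for illustration.

```r
library(dplyr)

# A data frame that happens to have a column named like our variable
df <- tibble(x = 1:10, val = 100)
val <- 5

# Without unquoting, `val` resolves to the column (all 100), so no row matches
masked <- df %>% filter(x >= val)

# With unquoting, the value 5 from the environment is inlined into the call
injected <- df %>% filter(x >= !!val)
```

Here `masked` ends up empty while `injected` keeps the rows with x of at least 5, which is why unquoting is the safer habit inside functions.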
