R: Using dplyr inside a function. Exception in eval(expr, envir, enclos): unknown column

I have created a function in R based on the kind help of @Jim M.

When I run the function I get the error: Error: unknown column 'rawdata'. In the debugger I get the message: Rcpp::exception in eval(expr, envir, enclos): unknown column 'rawdata'

However, in the Environment window I can see the 2 variables I passed to the function, and both contain data: rawdata is a factor with 7 levels and refdata is a factor with 28 levels.

function (refdata, rawdata)
{
  wordlist <- expand.grid(rawdata = rawdata, refdata = refdata,
                          stringsAsFactors = FALSE)
  wordlist %>%
    group_by(rawdata) %>%
    mutate(match_score = jarowinkler(rawdata, refdata)) %>%
    summarise(match = match_score[which.max(match_score)],
              matched_to = ref[which.max(match_score)])
}

1 Solution

#1

This is a common problem with functions that use NSE (non-standard evaluation). Such functions are very convenient in interactive programming but cause many problems in development, i.e. when you try to call them inside other functions. Because the expressions are not evaluated directly, R cannot find the objects in the environments it searches. I suggest you read here, preferably the scoping issues chapter, for more info.
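
The mechanism can be reproduced in base R without dplyr. The sketch below is only an illustration: `count_by_nse` and `wrapper` are hypothetical names, and the helper imitates NSE with `substitute()` and `eval()` rather than reproducing dplyr's internals.

```r
df <- data.frame(col1 = rep(c("a", "b"), each = 5), col2 = 1:10)

# Hypothetical NSE helper: captures the expression typed by the caller
# and evaluates it with the data frame's columns in scope.
count_by_nse <- function(data, column) {
  expr <- substitute(column)                  # unevaluated expression
  groups <- eval(expr, data, parent.frame())  # look in `data` first
  table(groups)
}

# Interactive-style call: the symbol `col1` is found inside `df`.
interactive_result <- count_by_nse(df, col1)

# Wrapped call: substitute() now captures the symbol `column`, which is
# not a column of `df`, so the lookup falls through to the wrapper's
# argument and finally to a non-existent object named `col1`.
wrapper <- function(column) count_by_nse(df, column)
wrapped_result <- tryCatch(wrapper(col1),
                           error = function(e) conditionMessage(e))
```

Here `interactive_result` holds the per-group counts, while `wrapped_result` holds an error message, even though both calls pass the same column name.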

First of all, you need to know that ALL the standard dplyr functions use NSE. Let's look at an example that approximates your problem:

Data:

df <- data.frame(col1 = rep(c('a','b'), each=5), col2 = runif(10))


> df
   col1       col2
1     a 0.03366446
2     a 0.46698763
3     a 0.34114682
4     a 0.92125387
5     a 0.94511394
6     b 0.67241460
7     b 0.38168131
8     b 0.91107090
9     b 0.15342089
10    b 0.60751868

Let's see how NSE makes our simple problem crash:

First of all the simple interactive case works:

df %>% group_by(col1) %>% summarise(count = n())

Source: local data frame [2 x 2]

  col1 count
1    a     5
2    b     5

Let's see what happens if I put it in a function:

lets_group <- function(column) {
  df %>% group_by(column) %>% summarise(count = n())
}

> lets_group(col1)
Error: index out of bounds

This is not the same error as yours, but it is also caused by NSE: exactly the same line of code worked outside the function.

Fortunately, there is a solution to your problem: standard evaluation (SE). Hadley also wrote versions of the dplyr functions that use standard evaluation. They have the same names as the normal functions plus an underscore (_) at the end.

Now look at how this will work:

# notice the formula operator (~) in the argument passed to summarise_
lets_group2 <- function(column) {
  df %>% group_by_(column) %>% summarise_(count = ~n())
}

This yields the following result:

# also notice the quotes around col1
> lets_group2('col1')
Source: local data frame [2 x 2]

  col1 count
1    a     5
2    b     5
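
In dplyr 0.7 and later the underscored verbs are deprecated in favour of tidy evaluation, so on a current installation the same idea would more likely be written as below. This is a sketch that assumes dplyr 1.0 or later; `lets_group3`, `lets_group4` and the extra `data` argument are hypothetical names added for illustration.

```r
library(dplyr)

df <- data.frame(col1 = rep(c("a", "b"), each = 5), col2 = runif(10))

# Unquoted column name: "embrace" the argument with {{ }} so that
# group_by() receives the column the caller typed.
lets_group3 <- function(data, column) {
  data %>% group_by({{ column }}) %>% summarise(count = n())
}

# Column name as a string: index the .data pronoun with [[ ]].
lets_group4 <- function(data, column) {
  data %>% group_by(.data[[column]]) %>% summarise(count = n())
}

res_unquoted <- lets_group3(df, col1)
res_string   <- lets_group4(df, "col1")
```

Both calls should return one row per level of col1 with its count, matching the output of lets_group2('col1').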

I cannot test your problem, but using SE instead of NSE should give you the results you want. For more info you can also read here
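
As an untested sketch of how this might look for the function in the question, the version below refers to columns explicitly through the `.data` pronoun, so they cannot be confused with the function arguments of the same name. Several things here are assumptions: a recent dplyr (1.0 or later), the function name `match_words`, the conversion of the factor inputs to character, the use of the refdata column where the question's code has `ref`, and the stand-in `similarity()` scorer built on `utils::adist()`, which replaces `jarowinkler()` only so that the example runs without extra packages.

```r
library(dplyr)

# Stand-in scorer (assumption): 1 - normalised edit distance, element-wise.
# Swap in jarowinkler() if the package providing it is installed.
similarity <- function(a, b) {
  dist <- mapply(function(x, y) adist(x, y)[1, 1], a, b, USE.NAMES = FALSE)
  1 - dist / pmax(nchar(a), nchar(b))
}

match_words <- function(refdata, rawdata, score = similarity) {
  # Factors are converted to character so that string functions accept them.
  wordlist <- expand.grid(rawdata = as.character(rawdata),
                          refdata = as.character(refdata),
                          stringsAsFactors = FALSE)
  wordlist %>%
    group_by(.data$rawdata) %>%
    mutate(match_score = score(.data$rawdata, .data$refdata)) %>%
    summarise(match = max(.data$match_score),
              matched_to = .data$refdata[which.max(.data$match_score)])
}

res <- match_words(refdata = factor(c("apple", "banana")),
                   rawdata = factor(c("appel", "banan")))
```

Each raw word ends up on one row together with its best score and the reference word that produced it.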
