I want to use dplyr's mutate() to create multiple new columns in a data frame. The column names and their contents should be generated dynamically.
Example data from iris:
require(dplyr)
data(iris)
iris <- tbl_df(iris)
I've created a function to mutate my new columns from the Petal.Width variable:
multipetal <- function(df, n) {
  varname <- paste("petal", n, sep=".")
  df <- mutate(df, varname = Petal.Width * n)  ## problem arises here
  df
}
Now I create a loop to build my columns:
for(i in 2:5) {
  iris <- multipetal(df=iris, n=i)
}
However, since mutate thinks varname is a literal variable name, the loop only creates one new variable (called varname) instead of four (called petal.2 - petal.5).
How can I get mutate() to use my dynamic name as the variable name?
7 answers
#1
90
Since you are dynamically building a variable name as a character value, it makes more sense to do the assignment using standard data.frame indexing, which allows character values for column names. For example:
multipetal <- function(df, n) {
  varname <- paste("petal", n, sep=".")
  df[[varname]] <- with(df, Petal.Width * n)
  df
}
The mutate function makes it very easy to name new columns via named parameters. But that assumes you know the name when you type the command. If you want to dynamically specify the column name, then you need to also build the named argument.
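To see why the named-argument form falls short, here is a minimal sketch using a small made-up data frame (df and the column name "doubled" are illustrative, not from the question). mutate() takes the argument name literally, so the string stored in varname is never consulted:

```r
library(dplyr)

# Illustrative data; not part of the original question
df <- data.frame(x = 1:3)
varname <- "doubled"

# The argument name is taken literally, so the new column is called "varname"
out <- mutate(df, varname = x * 2)
names(out)
```

The new column ends up named varname rather than doubled, which is the behaviour the question describes.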
As of dplyr 0.7, this is done by using := to dynamically assign parameter names. You can write your function as:
# --- dplyr version 0.7+ ---
multipetal <- function(df, n) {
  varname <- paste("petal", n, sep=".")
  mutate(df, !!varname := Petal.Width * n)
}
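A usage sketch for the function above, assuming dplyr 0.7 or later is installed. It runs the loop from the question on a copy of iris (the name iris2 is only for illustration) and lists the four new column names:

```r
library(dplyr)

multipetal <- function(df, n) {
  varname <- paste("petal", n, sep = ".")
  # !! unquotes the string so that its value becomes the column name
  mutate(df, !!varname := Petal.Width * n)
}

iris2 <- iris  # work on a copy so the built-in data set stays untouched
for (i in 2:5) {
  iris2 <- multipetal(df = iris2, n = i)
}

tail(names(iris2), 4)
```

Because the function returns the modified data frame, the loop has to reassign its result on each pass, just as in the question.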
For more information, see the documentation available from vignette("programming", "dplyr").
Slightly earlier versions of dplyr (>=0.3, <0.7) encouraged the use of "standard evaluation" alternatives to many of the functions. See the Non-standard evaluation vignette for more information (vignette("nse")).
So here, the answer is to use mutate_() rather than mutate() and do:
# --- dplyr version 0.3-0.5 ---
multipetal <- function(df, n) {
  varname <- paste("petal", n, sep=".")
  varval <- lazyeval::interp(~Petal.Width * n, n=n)
  mutate_(df, .dots=setNames(list(varval), varname))
}
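To make the role of lazyeval::interp() in this version clearer, the sketch below (assuming the lazyeval package is installed) builds the formula on its own. interp() substitutes the current value of n into the expression, while Petal.Width stays a symbol for mutate_() to look up in the data frame later:

```r
# Assumes the lazyeval package is installed
n <- 3
varval <- lazyeval::interp(~ Petal.Width * n, n = n)

# Right-hand side of the resulting one-sided formula
rhs <- varval[[2]]
deparse(rhs)
```

The deparsed right-hand side contains the number rather than the symbol n, which is why the function still works after its local n has gone out of scope.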
Older versions of dplyr
Note this is also possible in older versions of dplyr that existed when the question was originally posed. It requires careful use of quote and setNames:
# --- dplyr versions < 0.3 ---
multipetal <- function(df, n) {
  varname <- paste("petal", n, sep=".")
  pp <- c(quote(df), setNames(list(quote(Petal.Width * n)), varname))
  do.call("mutate", pp)
}
#2
28
In the new release of dplyr (0.6.0, expected in April 2017), we can also do an assignment (:=) and pass a variable as the column name by unquoting it (!!), so that the string it holds is used rather than the literal name of the variable:
library(dplyr)
multipetalN <- function(df, n){
  varname <- paste0("petal.", n)
  df %>%
    mutate(!!varname := Petal.Width * n)
}
data(iris)
iris1 <- tbl_df(iris)
iris2 <- tbl_df(iris)
for(i in 2:5) {
  iris2 <- multipetalN(df=iris2, n=i)
}
Checking the output against @MrFlick's multipetal applied to 'iris1' in the same loop:
identical(iris1, iris2)
#[1] TRUE
#3
9
Here's another version, and it's arguably a bit simpler.
multipetal <- function(df, n) {
  varname <- paste("petal", n, sep=".")
  df <- mutate_(df, .dots=setNames(paste0("Petal.Width*", n), varname))
  df
}
for(i in 2:5) {
  iris <- multipetal(df=iris, n=i)
}
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species petal.2 petal.3 petal.4 petal.5
1          5.1         3.5          1.4         0.2  setosa     0.4     0.6     0.8       1
2          4.9         3.0          1.4         0.2  setosa     0.4     0.6     0.8       1
3          4.7         3.2          1.3         0.2  setosa     0.4     0.6     0.8       1
4          4.6         3.1          1.5         0.2  setosa     0.4     0.6     0.8       1
5          5.0         3.6          1.4         0.2  setosa     0.4     0.6     0.8       1
6          5.4         3.9          1.7         0.4  setosa     0.8     1.2     1.6       2
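mutate_() has been deprecated since dplyr 0.7, so here is a hedged sketch of the same string-building idea in tidy eval. It assumes rlang is installed and uses rlang::parse_expr() to turn the string into an expression; the function name multipetal_str is made up for this example:

```r
library(dplyr)

# Hypothetical name; mirrors the string-based approach without mutate_()
multipetal_str <- function(df, n) {
  varname <- paste("petal", n, sep = ".")
  # Parse the string "Petal.Width*<n>" into an expression, then unquote it
  expr <- rlang::parse_expr(paste0("Petal.Width*", n))
  mutate(df, !!varname := !!expr)
}

res <- iris  # copy, so the built-in data set is left alone
for (i in 2:5) {
  res <- multipetal_str(df = res, n = i)
}
head(res)
```

Building code from strings is convenient here, but it only stays safe while the pasted pieces are values you control.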
#4
4
I am adding an answer that extends this a little: I came to this entry when searching for an answer, and it had almost what I needed, but I needed a bit more, which I got via @MrFlick's answer and the R lazyeval vignettes.
I wanted to make a function that takes a data frame and a vector of column names (as strings) whose columns should be converted from strings to Date objects. I couldn't figure out how to make as.Date() take an argument that is a string and convert it to a column, so I did it as shown below.
Below is how I did this via SE mutate (mutate_()) and the .dots argument. Criticisms that make this better are welcome.
library(dplyr)
dat <- data.frame(a="leave alone",
                  dt="2015-08-03 00:00:00",
                  dt2="2015-01-20 00:00:00")
# This function takes a dataframe and list of column names
# that have strings that need to be
# converted to dates in the data frame
convertSelectDates <- function(df, dtnames=character(0)) {
  for (col in dtnames) {
    varval <- sprintf("as.Date(%s)", col)
    df <- df %>% mutate_(.dots=setNames(list(varval), col))
  }
  return(df)
}
dat <- convertSelectDates(dat, c("dt", "dt2"))
dat %>% str
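In dplyr 1.0 and later, the same conversion can be sketched without mutate_(), assuming across() and all_of() are available in the installed version. The function name convert_dates is hypothetical:

```r
library(dplyr)

dat <- data.frame(a = "leave alone",
                  dt = "2015-08-03 00:00:00",
                  dt2 = "2015-01-20 00:00:00",
                  stringsAsFactors = FALSE)

# Hypothetical helper: convert the named character columns to Date
convert_dates <- function(df, dtnames = character(0)) {
  # all_of() selects exactly the columns named in the character vector
  mutate(df, across(all_of(dtnames), as.Date))
}

converted <- convert_dates(dat, c("dt", "dt2"))
str(converted)
```

This replaces the explicit loop with a single mutate() call and avoids building code from strings.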
#5
2
The pattern UQ(rlang::sym("some string here")) is really useful for working with strings and dplyr verbs.
Here's an example with mutate:
# add two values together
mutate_values <- function(new_name, name1, name2){
  mtcars %>%
    mutate(UQ(rlang::sym(new_name)) := UQ(rlang::sym(name1)) + UQ(rlang::sym(name2)))
}
mutate_values('test', 'mpg', 'cyl')
It works with other dplyr functions as well.
## select a column
select_name <- function(name){
  mtcars %>%
    select(!!name)
}
select_name('mpg')
## filter a column by a value
filter_values <- function(name, value){
  mtcars %>%
    filter(UQ(rlang::sym(name)) != value)
}
filter_values('gear', 4)
## transform a variable and then sort by it
arrange_values <- function(name, transform){
  mtcars %>%
    arrange(UQ(rlang::sym(name)) %>% UQ(rlang::sym(transform)))
}
arrange_values('mpg', 'sin')
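Later versions of rlang deprecate UQ() in favour of !!. As a hedged sketch for current versions, two of these helpers can be written with !! and the .data pronoun instead; the function names below are illustrative variants, not part of the original answer:

```r
library(dplyr)

# Add two columns; all three names are given as strings
mutate_values2 <- function(new_name, name1, name2) {
  mtcars %>%
    mutate(!!new_name := .data[[name1]] + .data[[name2]])
}

# Keep rows where the named column differs from `value`
filter_values2 <- function(name, value) {
  mtcars %>%
    filter(.data[[name]] != value)
}

added <- mutate_values2("test", "mpg", "cyl")
kept <- filter_values2("gear", 4)
```

.data[[name]] looks a column up by its string name, so no conversion to a symbol is needed on the right-hand side.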
#6
1
While I enjoy using dplyr interactively, I find it extraordinarily tricky to do this with dplyr because you have to jump through hoops with workarounds such as lazyeval::interp() and setNames.
Here is a simpler version using base R, which extends @MrFlick's solution; to me at least, it seems more intuitive to put the loop inside the function.
multipetal <- function(df, n) {
  for (i in 1:n){
    varname <- paste("petal", i, sep=".")
    df[[varname]] <- with(df, Petal.Width * i)
  }
  df
}
multipetal(iris, 3)
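If the explicit loop is not needed, a variant of the same base R idea can create all columns in one assignment. This is only a sketch; multipetal_all is a hypothetical name, and it assumes n is at least 1:

```r
# Hypothetical variant: build every column in a single assignment (assumes n >= 1)
multipetal_all <- function(df, n) {
  idx <- seq_len(n)
  varnames <- paste("petal", idx, sep = ".")
  # One list element per multiplier, assigned to the matching column names
  df[varnames] <- lapply(idx, function(i) df$Petal.Width * i)
  df
}

res <- multipetal_all(iris, 3)
head(res)
```

Assigning a list to several column names at once relies on standard data.frame indexing, so no extra packages are required.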
#7
0
You may enjoy the package friendlyeval, which presents a simplified tidy eval API and documentation for newer/casual dplyr users.
You are creating strings that you wish mutate to treat as column names. So using friendlyeval you could write:
library(dplyr)
library(friendlyeval)
multipetal <- function(df, n) {
  varname <- paste("petal", n, sep=".")
  df <- mutate(df, !!treat_string_as_col(varname) := Petal.Width * n)
  df
}
for(i in 2:5) {
  iris <- multipetal(df=iris, n=i)
}
Under the hood, this calls rlang functions that check that varname is legal as a column name.
friendlyeval code can be converted to equivalent plain tidy eval code at any time with an RStudio addin.