How can I simplify or perform the following operations using dplyr:
- Run a function on all data.frame names, like mutate_each(funs()) does for values, e.g.
names(iris) <- make.names(names(iris))
- Delete columns that do NOT exist (i.e. delete nothing), e.g.
iris %>% select(-matches("Width")) # ok
iris %>% select(-matches("X")) # returns empty data.frame, why?
- Add a new column by name (string), e.g.
iris %>% mutate_("newcol" = 0) # ok
x <- "newcol"
iris %>% mutate_(x = 0) # adds a column with name "x" instead of "newcol"
- Rename a data.frame colname that does not exist
names(iris)[names(iris)=="X"] <- "Y"
iris %>% rename(sl=Sepal.Length) # ok
iris %>% rename(Y=X) # error, instead of no change
2 Answers
#1
10
- I would use setNames for this:
iris %>% setNames(make.names(names(.)))
- Include everything() as an argument for select:
iris %>% select(-matches("Width"), everything())
iris %>% select(-matches("X"), everything())
- To my understanding there's no other shortcut than explicitly naming the string like you already do:
iris %>% mutate_("newcol" = 0)
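For readers on a newer dplyr, here is a minimal sketch of the same three operations. It assumes dplyr 1.0.0 or later (where rename_with() and any_of() are available) and is not part of the original answer; the names iris1 to iris4 are placeholders chosen for illustration.

```r
library(dplyr)

# 1. Apply a function to every column name.
iris1 <- iris %>% rename_with(make.names)

# 2. Drop columns by pattern or by name; if nothing matches, nothing is dropped.
iris2 <- iris %>% select(-matches("X"))
iris3 <- iris %>% select(-any_of("X"))

# 3. Create a column whose name is stored in a string.
x <- "newcol"
iris4 <- iris %>% mutate(!!x := 0)
```

The !! operator unquotes x, so the new column takes the value of the string rather than the literal name "x".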
#2
1
Points 1 through 3 are answered above. I came here because I had the same problem as number 4. Here is my solution:
df <- iris
Set a name key that maps each new name to the column to be renamed:
name_key <- c(
  sl = "Sepal.Length",
  sw = "Sepal.Width",
  Y = "X"
)
Set values not in the data frame to NA. This works better for my purpose. You could probably just remove them from name_key instead.
for (var in names(name_key)) {
  if (!(name_key[[var]] %in% names(df))) {
    name_key[var] <- NA
  }
}
Get a vector of the new names whose columns exist in the data frame.
cols <- names(name_key[!is.na(name_key)])
Rename columns
for (nm in names(name_key)) {
  # Entries set to NA above match no column, so they are skipped
  names(df)[names(df) == name_key[[nm]]] <- nm
}
Select columns
# all_of() marks cols as an external character vector (needs dplyr >= 1.0.0)
df2 <- df %>%
  select(all_of(cols))
I'm almost positive this can be done more elegantly, but this is what I have so far. Hope this helps, if you haven't solved it already!
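One possibly more compact route, sketched here under the assumption of dplyr 1.0.0 or later: any_of() accepts a named character vector and ignores entries whose column is missing, so the same name_key can drive both the rename and the select without loops. The object names renamed and df2 are illustrative only.

```r
library(dplyr)

# New name on the left, existing column on the right; "X" is not in iris.
name_key <- c(sl = "Sepal.Length", sw = "Sepal.Width", Y = "X")

# Rename the columns that exist and silently skip the ones that do not.
renamed <- iris %>% rename(any_of(name_key))

# Keep only the columns listed in the key, renaming them on the way.
df2 <- iris %>% select(any_of(name_key))
```

Swapping any_of() for all_of() makes the call fail when a listed column is missing, which may be preferable when a missing column indicates a real problem.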