I am trying to do the following and was wondering if there is an easier way to use dplyr to achieve this (I'm sure there is):
I want to compare the columns of a dataframe to a vector of names, and if the df does not contain a column corresponding to one of the names in the name vector, add that column to the df and populate its values with NAs.
E.g., in the MWE below:
df <- data.frame(cbind(c(1:6),c(11:16),c(10:15)))
colnames(df) <- c("A","B","C")
names <- c("A","B","C","D","E")
How do I use dplyr to create the two columns D and E (which are in names but not in df) and populate them with NAs?
There is no need for dplyr; this is a basic operation in base R. (By the way, try to avoid overriding built-in functions such as names in the future. The reason names() still works is that, when R looks up the function in a call, it skips non-function objects with that name (such as your character vector in the global environment) and finds base::names instead. It is still bad practice.)
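A small check of the masking point, using the question's own df and names. This is an illustrative sketch, not part of the original answer: it shows that the character vector shadows the name names as a value, while a call like names(df) still resolves to the base function.

```r
df <- data.frame(A = 1:6, B = 11:16, C = 10:15)
names <- c("A", "B", "C", "D", "E")   # a character vector now masks the name `names`

# In a call position R only considers functions, so this still finds base::names
print(names(df))

# The variable itself is the character vector, not a function
print(is.function(names))

rm(names)   # remove the masking variable so `names` refers to base::names again
```

Renaming the vector (for example to wanted, a hypothetical name) avoids the confusion entirely.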
df[setdiff(names, names(df))] <- NA
df
# A B C D E
# 1 1 11 10 NA NA
# 2 2 12 11 NA NA
# 3 3 13 12 NA NA
# 4 4 14 13 NA NA
# 5 5 15 14 NA NA
# 6 6 16 15 NA NA
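If you do want to stay inside a dplyr pipeline, one possible sketch (an assumption, not the answerer's method) builds a named list of NAs for the missing columns and splices it into mutate() with !!!. The vector is renamed to wanted (a hypothetical name) so base::names is not masked.

```r
library(dplyr)

df <- data.frame(A = 1:6, B = 11:16, C = 10:15)
wanted <- c("A", "B", "C", "D", "E")   # renamed from `names` to avoid masking base::names

# Names in `wanted` that df lacks, here "D" and "E"
missing_cols <- setdiff(wanted, names(df))

# Named list with one NA per missing column, e.g. list(D = NA, E = NA)
fill <- setNames(rep(list(NA), length(missing_cols)), missing_cols)

# `!!!` splices the list into mutate() as D = NA, E = NA;
# each length-1 NA is recycled to nrow(df)
out <- df %>% mutate(!!!fill)
print(out)
```

Like the base R version, the new columns are logical NA; the base R one-liner is still the simpler choice.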