I have two data frames whose column names I want to harmonize. Some names differ only in upper/lower case, some are already identical, and some are unique to one data frame. I want to keep the names of my first data frame, i.e. the variable names of the second data frame should be converted to the case used in the first data frame. For that reason, the typical toupper or tolower functions do not work.
Consider the following reproducible example:
# Data frame A
df_a <- data.frame(Col1 = rnorm(5),
                   cOL2 = rnorm(5),
                   col3 = rnorm(5),
                   COL4 = rnorm(5),
                   unique_a = rnorm(5))

# Data frame B
df_b <- data.frame(COL1 = rnorm(5),     # Should be converted to Col1
                   COL2 = rnorm(5),     # Should be converted to cOL2
                   col3 = rnorm(5),     # Should be kept as it is
                   COL4 = rnorm(5),     # Should be kept as it is
                   unique_b = rnorm(5)) # Should be kept as it is

# Vectors of column names
vec_a <- colnames(df_a)
vec_b <- colnames(df_b)

# If there is a match, vec_b should be converted to vec_a
# The final result should look as follows:
# vec_b
# [1] "Col1" "cOL2" "col3" "COL4" "unique_b"
Question: How could I convert the matching column names of data frame B to the column names of data frame A?
2 solutions
#1
You could use plyr::mapvalues:
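# Map the lower-cased names of df_b onto the original names of df_a
# and assign the result back to df_b
names(df_b) <- plyr::mapvalues(x = tolower(names(df_b)),
                               from = tolower(names(df_a)),
                               to = names(df_a),
                               warn_missing = FALSE)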
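One caveat: because x is lower-cased before mapping, a column that exists only in df_b and contains upper-case letters would come back in lower case. The following is a minimal base R sketch that avoids this by using a named lookup vector instead of plyr. The small data frames and the names lookup and new_names are made up for illustration, and it assumes the names of df_a are still unique after lower-casing.

```r
# Illustrative data: columns are reordered and df_b has a mixed-case unique name
df_a <- data.frame(Col1 = 1, cOL2 = 2, unique_a = 3)
df_b <- data.frame(COL2 = 1, COL1 = 2, Unique_B = 3)

# Lookup vector: lower-cased name -> original name in df_a
lookup <- setNames(names(df_a), tolower(names(df_a)))

# Look up every name of df_b; names without a counterpart give NA
new_names <- unname(lookup[tolower(names(df_b))])

# Keep the original df_b name wherever there was no match
no_match <- is.na(new_names)
new_names[no_match] <- names(df_b)[no_match]

names(df_b) <- new_names
names(df_b)
#[1] "cOL2" "Col1" "Unique_B"
```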
#2
One option is to use match on the names that are converted to a single case and then do the assignment:
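# Position in vec_a of each name in vec_b, ignoring case (NA if no match)
i1 <- match(toupper(vec_b), toupper(vec_a))
# Rename only the columns of df_b that have a counterpart in df_a;
# indexing this way also works when the column order differs
i2 <- !is.na(i1)
names(df_b)[i2] <- vec_a[i1[i2]]
names(df_b)
#[1] "Col1" "cOL2" "col3" "COL4" "unique_b"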
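If the same renaming is needed for several data frames, the match step can be wrapped in a small function. This is only a sketch: harmonize_names is a hypothetical helper name, not a function from any package, and the data frames below are small stand-ins with the columns deliberately in a different order to show that the result does not depend on column position.

```r
# harmonize_names() is a hypothetical helper, not part of any package.
# It renames columns of `target` to the case used in `reference`.
harmonize_names <- function(target, reference) {
  # Position in `reference` of each column of `target`, ignoring case
  idx <- match(tolower(names(target)), tolower(names(reference)))
  # Columns without a counterpart keep their original name
  hit <- !is.na(idx)
  names(target)[hit] <- names(reference)[idx[hit]]
  target
}

# Stand-in data with matching columns in a different order
df_a <- data.frame(Col1 = 1:2, cOL2 = 3:4, unique_a = 5:6)
df_b <- data.frame(unique_b = 1:2, COL2 = 3:4, COL1 = 5:6)

df_b <- harmonize_names(df_b, df_a)
names(df_b)
#[1] "unique_b" "cOL2" "Col1"
```

Returning the modified data frame rather than changing it in place keeps the helper free of side effects, so the caller decides whether to overwrite df_b.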