I need help with a merge (VLOOKUP-style) problem that I cannot solve. I have two data frames that I would like to merge, and their key columns have different names. My real datasets have many columns, which is why it is hard for me to come up with a solution. I have tried the merge function, but I cannot figure out how to use it on multiple columns with different names. I would like to specify the column names explicitly, using something like:
output <- merge(df1, df.vlookup, by.x=????, by.y=???, ) #just where I am today
Here is a very simplified example:
id<-c(2,4,6,8,10,12,14,16,18,20,22,24,26,28)
bike <- c(1,3,2,1,1,1,2,3,2,3,1,1,1,1)
size <- c(1,2,1,2,1,2,1,2,1,2,1,2,1,2)
color <-c (10,11,13,15,12,12,12,11,11,14,12,11,10,10)
price <- c(1,2,2,2,1,3,1,1,2,1,2,1,2,1)
df1 <- data.frame(id,bike,size,color,price)
> df1
   id bike size color price
1   2    1    1    10     1
2   4    3    2    11     2
3   6    2    1    13     2
4   8    1    2    15     2
5  10    1    1    12     1
6  12    1    2    12     3
7  14    2    1    12     1
8  16    3    2    11     1
9  18    2    1    11     2
10 20    3    2    14     1
11 22    1    1    12     2
12 24    1    2    11     1
13 26    1    1    10     2
14 28    1    2    10     1
b1<-c(1,2,3)
b2<-c("Alan", "CCM", "Basso")
s1 <- c(1,2)
s2 <- c("L","S")
c1<-c(10,11,12,13,14,15)
c2 <-c("black","blue","green","red","pink")
p1<- c(1,2,3)
p2<- c(1000,2000,3000)
#trick for making a dataframe with unequal vector length
na.pad <- function(x,len){
x[1:len]
}
makePaddedDataFrame <- function(l,...){
maxlen <- max(sapply(l,length))
data.frame(lapply(l,na.pad,len=maxlen),...)
}
df.vlookup <- makePaddedDataFrame(list(b1=b1,b2=b2,s1=s1,s2=s2,c1=c1,c2=c2,p1=p1,p2=p2))
> df.vlookup
  b1    b2 s1   s2 c1    c2 p1   p2
1  1  Alan  1    L 10 black  1 1000
2  2   CCM  2    S 11  blue  2 2000
3  3 Basso NA <NA> 12 green  3 3000
4 NA  <NA> NA <NA> 13   red NA   NA
5 NA  <NA> NA <NA> 14  pink NA   NA
6 NA  <NA> NA <NA> 15  <NA> NA   NA
Here is the data frame that I would like to end up with:
> df.final
   id bike    b2 size s2 color    c2 price
1   2    1  Alan    1  L    10 black     1
2   4    3 Basso    2  S    11  blue     2
3   6    2   CCM    1  L    13   red     2
4   8    1  Alan    2  S    15  #N/A     2
5  10    1  Alan    1  L    12 green     1
6  12    1  Alan    2  S    12 green     3
7  14    2   CCM    1  L    12 green     1
8  16    3 Basso    2  S    11  blue     1
9  18    2   CCM    1  L    11  blue     2
10 20    3 Basso    2  S    14  pink     1
11 22    1  Alan    1  L    12 green     2
12 24    1  Alan    2  S    11  blue     1
13 26    1  Alan    1  L    10 black     2
14 28    1  Alan    2  S    10 black     1
I would really appreciate some help with this.
1 solution
#1
I don't think a single data frame for lookup values is the right approach. What about using named vectors?
For example:
bike_names <- c("Alan" = 1, "CCM" = 2, "Basso" = 3)
df1$b2 <- names(bike_names[ df1$bike ])
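Note that `bike_names[ df1$bike ]` selects elements by position, which works here because the bike codes happen to be 1, 2, 3. For codes that are not consecutive integers starting at 1, such as the color codes 10 to 15, one alternative is to look up by value with `match()`. The following is a minimal sketch under that assumption; `color_names`, `color` and `c2` are illustrative names, not part of the original answer.

```r
# Named vector: names are the labels, values are the codes
color_names <- c("black" = 10, "blue" = 11, "green" = 12, "red" = 13, "pink" = 14)

# Codes to translate; 15 has no label in the lookup
color <- c(10, 11, 13, 15, 12)

# match() gives the position of each code in the lookup, or NA when absent,
# so unknown codes become NA instead of picking the wrong label
c2 <- names(color_names)[match(color, color_names)]
print(c2)
```

Codes without an entry in the lookup come back as NA, which corresponds to the #N/A cell in the desired output.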
If you use data frames, put each lookup table in its own data frame:
lookup <- list(
bike = data.frame( bike = c(1, 2, 3), bike_name = c("Alan", "CCM", "Basso")),
size = data.frame(size = c(1, 2), size_name = c("L", "S")),
color = data.frame(color = c(10, 11, 12, 13, 14, 15), color_name = c("black", "blue", "green", "red", "pink", NA)),
price = data.frame(price = c(1, 2, 3), price_name = c(1000, 2000, 3000))
)
And use it with merge:
Reduce(merge, c(data = list(df1), lookup))
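The call above relies on each lookup table sharing its key column name with df1, and by default merge() keeps only rows that match in both tables. If the key columns have different names, as in the question, `by.x` and `by.y` can name them explicitly, and `all.x = TRUE` keeps unmatched rows from the left table. A small sketch with cut-down data follows; the `bikes` and `colors` tables are illustrative stand-ins for the lookup columns in the question.

```r
# Cut-down versions of the question's data (illustrative only)
df1 <- data.frame(id = c(2, 4, 6, 8), bike = c(1, 3, 2, 1),
                  color = c(10, 11, 13, 15))
bikes <- data.frame(b1 = c(1, 2, 3), b2 = c("Alan", "CCM", "Basso"),
                    stringsAsFactors = FALSE)
colors <- data.frame(c1 = c(10, 11, 12, 13, 14),
                     c2 = c("black", "blue", "green", "red", "pink"),
                     stringsAsFactors = FALSE)

# by.x is the key column in the left table, by.y the key in the right table.
# all.x = TRUE keeps left rows without a match and fills their labels with NA.
out <- merge(df1, bikes, by.x = "bike", by.y = "b1", all.x = TRUE)
out <- merge(out, colors, by.x = "color", by.y = "c1", all.x = TRUE)

# merge() sorts by the key, so restore the original row and column order
out <- out[order(out$id), c("id", "bike", "b2", "color", "c2")]
print(out)
```

The row with color 15 is kept with c2 set to NA, because that code has no entry in `colors`.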
Or use dplyr and joins:
library(dplyr)
df1 %>%
left_join(lookup$bike, by = c("bike")) %>%
left_join(lookup$size, by = c("size")) %>%
left_join(lookup$color, by = c("color")) %>%
left_join(lookup$price, by = c("price"))
Update
But if you really want to start from the df.vlookup data frame, you can convert it to a list of data frames like this:
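lookup <- lapply(seq(1, to = ncol(df.vlookup), by = 2), function(i) {
  # Columns i and i + 1 form one (key, label) pair; the key is renamed to the
  # matching df1 column: pair 1 -> names(df1)[2], pair 2 -> names(df1)[3], ...
  setNames(df.vlookup[, c(i, i + 1)], c(names(df1)[(i + 1) / 2 + 1], names(df.vlookup)[i + 1]))
})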
And use it in a multiple merge:
Reduce(merge, c(data = list(df1), lookup))
NOTE: Creating the lookup list this way assumes that the columns of df.vlookup come in (key, label) pairs, in the same order as the key columns of df1, starting at the second column of df1.
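If relying on column order feels fragile, one option is to state the pairing explicitly. The sketch below assumes a hand-written `key_map` (a hypothetical name, not from the original answer) that says which df.vlookup columns belong to which df1 key. It also drops the NA rows introduced by padding.

```r
# Cut-down padded lookup table (illustrative only)
df.vlookup <- data.frame(
  b1 = c(1, 2, 3), b2 = c("Alan", "CCM", "Basso"),
  s1 = c(1, 2, NA), s2 = c("L", "S", NA),
  stringsAsFactors = FALSE
)

# Hypothetical map: df1 key column -> c(key column, label column) in df.vlookup
key_map <- list(bike = c("b1", "b2"), size = c("s1", "s2"))

lookup <- lapply(names(key_map), function(key) {
  cols <- key_map[[key]]
  # Keep only rows with a real key, discarding the NA padding
  tab <- df.vlookup[!is.na(df.vlookup[[cols[1]]]), cols]
  # Rename the key column so it matches the column name in df1
  setNames(tab, c(key, cols[2]))
})
names(lookup) <- names(key_map)
print(lookup$size)
```

The resulting list has the same shape as the lookup list built above, so it can be passed to the same Reduce(merge, ...) call.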