I gather data from 4 df's and would like to merge them by rownames. I am looking for an efficient way to do this. This is a simplified version of the data I have.
df1 <- data.frame(N = sample(seq(9, 27, 0.5), 40, replace = TRUE),
                  P = sample(seq(0.3, 4, 0.1), 40, replace = TRUE),
                  C = sample(seq(400, 500, 1), 40, replace = TRUE))
df2 <- data.frame(origin = sample(c("A", "B", "C", "D", "E"), 40,
                                  replace = TRUE),
                  foo1 = sample(c(TRUE, FALSE), 40, replace = TRUE),
                  X = sample(seq(145600, 148300, 100), 40, replace = TRUE),
                  Y = sample(seq(349800, 398600, 100), 40, replace = TRUE))
df3 <- matrix(sample(seq(0, 1, 0.01), 40), 40, 100)
df4 <- matrix(sample(seq(0, 1, 0.01), 40), 40, 100)
rownames(df1) <- paste("P", sprintf("%02d", c(1:40)), sep= "")
rownames(df2) <- rownames(df1)
rownames(df3) <- rownames(df1)
rownames(df4) <- rownames(df1)
This is what I would normally do:
# merge df1 and df2
dat <- merge(df1, df2, by= "row.names", all.x= F, all.y= F) #merge
rownames(dat) <- dat$Row.names #reset rownames
dat$Row.names <- NULL #remove added rownames col
# merge dat and df3
dat <- merge(dat, df3, by= "row.names", all.x= F, all.y= F) #merge
rownames(dat) <- dat$Row.names #reset rownames
dat$Row.names <- NULL #remove added rownames col
# merge dat and df4
dat <- merge(dat, df4, by= "row.names", all.x= F, all.y= F) #merge
rownames(dat) <- dat$Row.names #reset rownames
dat$Row.names <- NULL #remove added rownames col
As you can see, this requires a lot of code. My question is whether the same result can be achieved by simpler means. I tried the following, initially without success. UPDATE: this works now!
MyMerge <- function(x, y){
  df <- merge(x, y, by= "row.names", all.x= F, all.y= F)
  rownames(df) <- df$Row.names
  df$Row.names <- NULL
  return(df)
}
dat <- Reduce(MyMerge, list(df1, df2, df3, df4))
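To make the Reduce pattern above easier to check by eye, here is a minimal sketch on three tiny inputs. The objects a, b and m and the helper name merge_by_rownames are hypothetical stand-ins for df1..df4 and MyMerge; the matrix has no column names, so merge() labels its columns V1 and V2.

```r
# Hypothetical stand-ins for df1, df2 and df3; row names are the join key.
a <- data.frame(N = c(9.5, 12, 20), row.names = c("P01", "P02", "P03"))
b <- data.frame(origin = c("A", "B", "C"), row.names = c("P01", "P02", "P03"))
m <- matrix(c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6), nrow = 3,
            dimnames = list(c("P01", "P02", "P03"), NULL))

# Merge two objects on their row names, then move the key column
# that merge() adds ("Row.names") back into the row names.
merge_by_rownames <- function(x, y) {
  out <- merge(x, y, by = "row.names", all = FALSE)
  rownames(out) <- out$Row.names
  out$Row.names <- NULL
  out
}

# Reduce() applies the two-argument helper pairwise, left to right.
small <- Reduce(merge_by_rownames, list(a, b, m))
dim(small)
colnames(small)
```

Because merge() sorts the result by the key, the row order of the output follows the sorted row names rather than the order of the first input.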
Thanks in advance for any suggestions
4 Solutions
#1
34
join_all from plyr will probably do what you want. However, all inputs must be data frames, and the row names have to be added as a column first.
require(plyr)
df3 <- data.frame(df3)
df4 <- data.frame(df4)
df1$rn <- rownames(df1)
df2$rn <- rownames(df2)
df3$rn <- rownames(df3)
df4$rn <- rownames(df4)
df <- join_all(list(df1,df2,df3,df4), by = 'rn', type = 'full')
The type argument should help even if the row names vary and do not match. If you do not want the row names column afterwards:
df$rn <- NULL
#2
9
Building on your function, I came up with a version that merges several data frames by a specific key column (the name of the column). The resulting data frame includes all the variables of the merged data frames. If you want to keep only the rows that are common to all of them (so that no NA-filled rows are added), use all.x= FALSE, all.y= FALSE instead.
MyMerge <- function(x, y){
  df <- merge(x, y, by= "name of the common column", all.x= TRUE, all.y= TRUE)
  return(df)
}
new.df <- Reduce(MyMerge, list(df1, df2, df3, df4))
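As a rough illustration of how the all.x/all.y choice changes the result, the sketch below merges three small data frames on a key column. The column name id, the sample values and the helper merge_on_id are assumptions made up for this example, not part of the answer above.

```r
# Hypothetical inputs that share a key column "id" but only partly overlap.
x <- data.frame(id = c("P01", "P02", "P03"), N = c(9.5, 12, 20))
y <- data.frame(id = c("P02", "P03", "P04"), origin = c("B", "C", "D"))
z <- data.frame(id = c("P03", "P04", "P05"), foo1 = c(TRUE, FALSE, TRUE))

# Build a two-argument merge function for Reduce();
# keep_all = TRUE keeps unmatched rows (filled with NA), FALSE drops them.
merge_on_id <- function(keep_all) {
  function(a, b) merge(a, b, by = "id", all = keep_all)
}

outer_join <- Reduce(merge_on_id(TRUE), list(x, y, z))   # every id seen anywhere
inner_join <- Reduce(merge_on_id(FALSE), list(x, y, z))  # only ids present in all three

nrow(outer_join)
nrow(inner_join)
```

With all = TRUE, ids missing from one input get NA in that input's columns; with all = FALSE, only the ids present in every input survive.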
#3
4
Three lines of code will give you the exact same result:
dat2 <- cbind(df1, df2, df3, df4)
colnames(dat2)[-(1:7)] <- paste(paste('V', rep(1:100, 2),sep = ''),
rep(c('x', 'y'), each = 100), sep = c('.'))
all.equal(dat,dat2)
Ah, I see. Now I understand why you are going to so much trouble. A plain old for loop surely does the trick. Maybe there are even more clever solutions:
rn <- rownames(df1)
l <- list(df1, df2, df3, df4)
dat <- l[[1]]
for(i in 2:length(l)) {
  dat <- merge(dat, l[[i]], by= "row.names", all.x= F, all.y= F) [,-1]
  rownames(dat) <- rn
}
#4
4
I have been looking for the same function. After trying a couple of the options here and others elsewhere, the easiest for me was:
cbind.data.frame( df1,df2,df3,df4....)
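One caveat worth sketching: cbind binds rows by position, not by row name, so it only matches a merge when every input has the same rows in the same order. The example below uses made-up data frames a and b (hypothetical, not from the question) and shows one way to align the rows before binding, assuming both inputs contain the same set of row names.

```r
# Hypothetical inputs with the same row names in a different order.
a <- data.frame(N = c(9.5, 12, 20), row.names = c("P01", "P02", "P03"))
b <- data.frame(origin = c("C", "A", "B"), row.names = c("P03", "P01", "P02"))

# cbind() would pair rows by position, so check the order first.
same_order <- identical(rownames(a), rownames(b))

# Assumption: both inputs hold exactly the same set of row names.
stopifnot(setequal(rownames(a), rownames(b)))

# Reorder b to follow a's row names, then bind column-wise.
b_aligned <- b[rownames(a), , drop = FALSE]
bound <- cbind(a, b_aligned)
bound
```

If the row names differ between inputs, a merge-based approach is the safer choice, since cbind has no notion of a join key.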