I'd like to merge two data frames in R by both a date and a second ID variable. For example, here are two data frames (df1 and df2) that overlap partially but not completely:
df1 <- data.frame(ID=c(rep(1,5),rep(2,3),rep(3,7)),
                  Date=c(seq(as.Date("2011-07-01"), by=1, len=5),
                         seq(as.Date("2011-07-01"), by=1, len=3),
                         seq(as.Date("2011-07-01"), by=1, len=7)))
df2 <- data.frame(ID=c(rep(1,3),rep(2,2),rep(3,7)),
                  Date=c(seq(as.Date("2011-07-01"), by=1, len=3),
                         seq(as.Date("2011-07-01"), by=1, len=2),
                         seq(as.Date("2011-07-01"), by=1, len=7)),
                  var=c(rep(12,3),rep(5,2),rep(9,7)))
Yields:
> df1
ID Date
1 1 2011-07-01
2 1 2011-07-02
3 1 2011-07-03
4 1 2011-07-04
5 1 2011-07-05
> df2
ID Date var
1 1 2011-07-01 12
2 1 2011-07-02 12
3 1 2011-07-03 12
4 2 2011-07-01 5 , etc..
I want the equivalent of join_all(type="full") or merge(all=TRUE), so that NAs appear wherever a row has no match in the other data frame. Rows should be matched when both ID and Date are equal.
> df3
ID Date var
1 1 2011-07-01 12
2 1 2011-07-02 12
3 1 2011-07-03 12
4 1 2011-07-04 NA
5 1 2011-07-05 NA, etc.
Thanks!
2 Solutions
#1
1
It seems like you want something like this:
> merge(df1, df2, by=c("ID", "Date"), all = TRUE)
ID Date var
1 1 2011-07-01 12
2 1 2011-07-02 12
3 1 2011-07-03 12
4 1 2011-07-04 NA
5 1 2011-07-05 NA
6 2 2011-07-01 5
7 2 2011-07-02 5
8 2 2011-07-03 NA
9 3 2011-07-01 9
10 3 2011-07-02 9
11 3 2011-07-03 9
12 3 2011-07-04 9
13 3 2011-07-05 9
14 3 2011-07-06 9
15 3 2011-07-07 9
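To make the role of all = TRUE a bit more concrete, here is a minimal sketch that rebuilds the question's two data frames and compares the default (inner) merge with the full outer merge. The names inner and full are just illustrative, not from the original post.

```r
# Rebuild the example data from the question
df1 <- data.frame(
  ID   = c(rep(1, 5), rep(2, 3), rep(3, 7)),
  Date = c(seq(as.Date("2011-07-01"), by = 1, length.out = 5),
           seq(as.Date("2011-07-01"), by = 1, length.out = 3),
           seq(as.Date("2011-07-01"), by = 1, length.out = 7))
)
df2 <- data.frame(
  ID   = c(rep(1, 3), rep(2, 2), rep(3, 7)),
  Date = c(seq(as.Date("2011-07-01"), by = 1, length.out = 3),
           seq(as.Date("2011-07-01"), by = 1, length.out = 2),
           seq(as.Date("2011-07-01"), by = 1, length.out = 7)),
  var  = c(rep(12, 3), rep(5, 2), rep(9, 7))
)

# Default merge keeps only ID/Date pairs present in BOTH data frames
inner <- merge(df1, df2, by = c("ID", "Date"))

# all = TRUE keeps every ID/Date pair from either side, filling gaps with NA
full <- merge(df1, df2, by = c("ID", "Date"), all = TRUE)

nrow(inner)            # 12 rows: only the matched pairs
nrow(full)             # 15 rows: every pair from df1 and df2
sum(is.na(full$var))   # 3 rows had no partner in df2
```

If you only need to keep all rows of one side, all.x = TRUE or all.y = TRUE are the one-sided variants of the same argument.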
#2
0
merge is a useful base function, but I'd also suggest taking a look at the dplyr package. I'm quite sure you will bump into it soon.
library(dplyr)
left_join(df1,df2)
# Joining by: c("ID", "Date")
# ID Date var
# 1 1 15156 12
# 2 1 15157 12
# 3 1 15158 12
# 4 1 15159 NA
# 5 1 15160 NA
# 6 2 15156 5
# 7 2 15157 5
# 8 2 15158 NA
# 9 3 15156 9
# 10 3 15157 9
# 11 3 15158 9
# 12 3 15159 9
# 13 3 15160 9
# 14 3 15161 9
# 15 3 15162 9
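One caveat: left_join() only matches merge(all = TRUE) here because every ID/Date pair in df2 also occurs in df1. For the full outer join the question asks for, full_join() is probably the closer equivalent. Below is a rough sketch. The extra row (ID 4) and the name df2_extra are hypothetical, added only to show where the two joins differ. The numeric Date column in the output above looks like the dates printed as days since 1970-01-01 (15156 is 2011-07-01). Recent dplyr versions should keep the Date class, but that is an assumption about your installed version.

```r
library(dplyr)

# Rebuild the example data from the question
df1 <- data.frame(
  ID   = c(rep(1, 5), rep(2, 3), rep(3, 7)),
  Date = c(seq(as.Date("2011-07-01"), by = 1, length.out = 5),
           seq(as.Date("2011-07-01"), by = 1, length.out = 3),
           seq(as.Date("2011-07-01"), by = 1, length.out = 7))
)
df2 <- data.frame(
  ID   = c(rep(1, 3), rep(2, 2), rep(3, 7)),
  Date = c(seq(as.Date("2011-07-01"), by = 1, length.out = 3),
           seq(as.Date("2011-07-01"), by = 1, length.out = 2),
           seq(as.Date("2011-07-01"), by = 1, length.out = 7)),
  var  = c(rep(12, 3), rep(5, 2), rep(9, 7))
)

# Hypothetical extra row that exists only in the right-hand table
df2_extra <- rbind(df2, data.frame(ID = 4, Date = as.Date("2011-07-01"), var = 7))

# left_join keeps all rows of df1 and drops the unmatched ID 4 row
left_joined <- left_join(df1, df2_extra, by = c("ID", "Date"))

# full_join keeps unmatched rows from both sides, like merge(all = TRUE)
full_joined <- full_join(df1, df2_extra, by = c("ID", "Date"))

nrow(left_joined)   # 15
nrow(full_joined)   # 16
```

Passing by = c("ID", "Date") explicitly avoids relying on the automatic "Joining by" guess, which can pick up unintended columns if the two tables happen to share other names.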