I want to do the following join:
   A  B
0  a  z
1  b  y
2  c  x

   A  C   D
0  a  1  xy
1  b  1  xc
2  a  2  xv
3  c  2  xb

to

   A  B  C    D
0  a  z  1   xy
1  b  y  1   xc
2  c  x  1  NaN
3  a  z  2   xv
4  b  y  2  NaN
5  c  x  2   xb
So for every value in 'C' I want to join the whole first Dataframe to the second one without losing any rows of the first Frame. Is that possible?
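For anyone who wants to reproduce the problem, here is a minimal sketch that builds the two frames shown above. The names `d1` and `d2` are an assumption chosen to match the first answer below (the second answer calls them `df1` and `df2`); the question itself does not name them.

```python
import pandas as pd

# First frame from the question: one row per key in 'A'.
d1 = pd.DataFrame({"A": ["a", "b", "c"], "B": ["z", "y", "x"]})

# Second frame: only some (A, C) pairs are present, each with a value in 'D'.
d2 = pd.DataFrame({
    "A": ["a", "b", "a", "c"],
    "C": [1, 1, 2, 2],
    "D": ["xy", "xc", "xv", "xb"],
})

# The desired result repeats all of d1 once per distinct value of 'C'.
n_expected_rows = d2["C"].nunique() * len(d1)
print(n_expected_rows)
```

The row count is a quick sanity check for any solution: every value of 'C' should be paired with every row of the first frame.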
2 solutions
#1
2
join and reindex
d = d2.set_index(['A', 'C'])
# Reindex onto every (A, C) combination so that missing pairs show up as NaN rows
d = d.reindex(pd.MultiIndex.from_product(d.index.levels, names=d.index.names))
d.join(d1.set_index('A')).reset_index().sort_index(axis=1)

   A  B  C    D
0  a  z  1   xy
1  a  z  2   xv
2  b  y  1   xc
3  b  y  2  NaN
4  c  x  1  NaN
5  c  x  2   xb
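To make the reindex step easier to follow, here is a self-contained sketch of the same idea. It assumes `d1` and `d2` hold the question's data (the answer does not show how they were built); `full_index` and `result` are names introduced here for illustration only.

```python
import pandas as pd

# Assumed reconstruction of the question's frames.
d1 = pd.DataFrame({"A": ["a", "b", "c"], "B": ["z", "y", "x"]})
d2 = pd.DataFrame({
    "A": ["a", "b", "a", "c"],
    "C": [1, 1, 2, 2],
    "D": ["xy", "xc", "xv", "xb"],
})

d = d2.set_index(["A", "C"])

# Cartesian product of the unique 'A' values and unique 'C' values.
full_index = pd.MultiIndex.from_product(d.index.levels, names=d.index.names)

# Reindexing adds the missing (A, C) pairs with NaN in 'D'.
d = d.reindex(full_index)

# Joining on the 'A' level brings in 'B' from the first frame.
result = d.join(d1.set_index("A")).reset_index().sort_index(axis=1)
print(result)
```

The product of the index levels is what guarantees that no row of the first frame is lost for any value of 'C'.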
Rearrange some things to match OP exactly
d = d2.set_index(['C', 'A'])
d = d.reindex(pd.MultiIndex.from_product(d.index.levels, names=d.index.names))
# sort_index() orders the rows by (C, A) before the index is turned back into columns
d.join(d1.set_index('A')).sort_index().reset_index().sort_index(axis=1)

   A  B  C    D
0  a  z  1   xy
1  b  y  1   xc
2  c  x  1  NaN
3  a  z  2   xv
4  b  y  2  NaN
5  c  x  2   xb
Clever use of pd.concat
pd.concat(d.merge(d1.assign(C=i), how='outer') for i, d in d2.groupby('C'))
   A  B  C    D
0  a  z  1   xy
1  b  y  1   xc
4  c  x  1  NaN
2  a  z  2   xv
5  b  y  2  NaN
3  c  x  2   xb
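A self-contained sketch of the pd.concat idea follows, again assuming `d1` and `d2` are the question's frames. Two details here are additions of this sketch rather than part of the answer: `ignore_index=True` to get a clean 0..5 index, and an explicit column selection to fix the column order. Because `d1.assign(C=i)` already carries the group's 'C' value, the outer merge runs on both shared columns ('A' and 'C') and 'C' never becomes NaN.

```python
import pandas as pd

# Assumed reconstruction of the question's frames.
d1 = pd.DataFrame({"A": ["a", "b", "c"], "B": ["z", "y", "x"]})
d2 = pd.DataFrame({
    "A": ["a", "b", "a", "c"],
    "C": [1, 1, 2, 2],
    "D": ["xy", "xc", "xv", "xb"],
})

pieces = [
    # Outer merge on the shared columns 'A' and 'C' keeps every row of d1.
    grp.merge(d1.assign(C=key), how="outer")
    for key, grp in d2.groupby("C")
]

result = pd.concat(pieces, ignore_index=True)[["A", "B", "C", "D"]]
print(result)
```

Row order within each group can differ between pandas versions, so sort explicitly (for example with `sort_values(["C", "A"])`) if the exact order matters.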
#2
1
This will involve groupby + merge; I cannot guarantee the speed.
(df2.groupby('C')
    .apply(lambda x: x.merge(df1, on='A', how='outer')
                      .assign(C=lambda d: d['C'].ffill()))
    .reset_index(drop=True))
Out[954]:
   A    C    D  B
0  a  1.0   xy  z
1  b  1.0   xc  y
2  c  1.0  NaN  x
3  a  2.0   xv  z
4  c  2.0   xb  x
5  b  2.0  NaN  y
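The 'C' column shows up as 1.0 and 2.0 in this output because the outer merge introduces NaN into 'C' before it is filled, which turns the integer column into floats. The sketch below is a variation, not the answer's exact code: it uses a plain loop over the groups and fills 'C' with the group key instead of `apply` plus `ffill`, then casts back to the original dtype. It assumes `df1` and `df2` hold the question's data.

```python
import pandas as pd

# Assumed reconstruction of the question's frames.
df1 = pd.DataFrame({"A": ["a", "b", "c"], "B": ["z", "y", "x"]})
df2 = pd.DataFrame({
    "A": ["a", "b", "a", "c"],
    "C": [1, 1, 2, 2],
    "D": ["xy", "xc", "xv", "xb"],
})

pieces = []
for key, grp in df2.groupby("C"):
    merged = grp.merge(df1, on="A", how="outer")
    # Rows that exist only in df1 have NaN in 'C'; fill with the group key
    # and restore the original integer dtype.
    merged["C"] = merged["C"].fillna(key).astype(df2["C"].dtype)
    pieces.append(merged)

result = pd.concat(pieces, ignore_index=True)
print(result)
```

Filling with the group key does not depend on row order, whereas `ffill` relies on a row with a known 'C' appearing before any row that lacks one.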