How to join Pandas DataFrames and keep the left columns multiple times?

Time: 2021-07-14 22:58:08

I want to do the following join:

    A   B
0   a   z
1   b   y
2   c   x

    A   C   D
0   a   1   xy
1   b   1   xc
2   a   2   xv
3   c   2   xb

to

    A   B   C   D
0   a   z   1   xy
1   b   y   1   xc
2   c   x   1   NaN
3   a   z   2   xv
4   b   y   2   NaN
5   c   x   2   xb

So for every value in 'C', I want to join the whole first DataFrame to the second one without losing any rows of the first DataFrame. Is that possible?
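
For anyone who wants to reproduce this, the two input frames could be built roughly as follows. The variable names `d1` and `d2` are an assumption, chosen only because the first answer uses them; the values are copied from the tables above.

```python
import pandas as pd

# Left frame: one row per key in 'A'.
d1 = pd.DataFrame({"A": ["a", "b", "c"], "B": ["z", "y", "x"]})

# Right frame: several rows per 'C' value; not every 'A' key
# appears under every 'C' value.
d2 = pd.DataFrame(
    {
        "A": ["a", "b", "a", "c"],
        "C": [1, 1, 2, 2],
        "D": ["xy", "xc", "xv", "xb"],
    }
)
```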

2 Solutions

#1


join and reindex

d = d2.set_index(['A', 'C'])
d = d.reindex(pd.MultiIndex.from_product(d.index.levels, names=d.index.names))
d.join(d1.set_index('A')).reset_index().sort_index(axis=1)

   A  B  C    D
0  a  z  1   xy
1  a  z  2   xv
2  b  y  1   xc
3  b  y  2  NaN
4  c  x  1  NaN
5  c  x  2   xb

Rearrange a few things to match the OP's output exactly

d = d2.set_index(['C', 'A'])
d = d.reindex(pd.MultiIndex.from_product(d.index.levels, names=d.index.names))
d.join(d1.set_index('A')).sort_index().reset_index().sort_index(axis=1)

   A  B  C    D
0  a  z  1   xy
1  b  y  1   xc
2  c  x  1  NaN
3  a  z  2   xv
4  b  y  2  NaN
5  c  x  2   xb
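
A self-contained sketch of the reindex idea, in case the one-liners are hard to follow. It rebuilds the sample data from the question; the names `full_index` and `result` are made up for illustration. The final column selection replaces the `sort_index` on the columns, which is only a stylistic choice.

```python
import pandas as pd

d1 = pd.DataFrame({"A": ["a", "b", "c"], "B": ["z", "y", "x"]})
d2 = pd.DataFrame(
    {
        "A": ["a", "b", "a", "c"],
        "C": [1, 1, 2, 2],
        "D": ["xy", "xc", "xv", "xb"],
    }
)

# Index the right frame by (C, A) so every row is identified by that pair.
d = d2.set_index(["C", "A"])

# Build every (C, A) combination from the values present in each level.
full_index = pd.MultiIndex.from_product(d.index.levels, names=d.index.names)

# Reindexing adds the missing combinations, with NaN in 'D'.
d = d.reindex(full_index)

# Bring in 'B' by matching on the 'A' level, then flatten the index.
result = (
    d.join(d1.set_index("A"))
    .sort_index()
    .reset_index()[["A", "B", "C", "D"]]
)
print(result)
```

Note that this only produces the combinations of values that already occur in `d2`; a key that exists in `d1` but never in `d2` would not show up.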

Clever use of pd.concat

pd.concat(d.merge(d1.assign(C=i), how='outer') for i, d in d2.groupby('C'))

   A  B  C    D
0  a  z  1   xy
1  b  y  1   xc
4  c  x  1  NaN
2  a  z  2   xv
5  b  y  2  NaN
3  c  x  2   xb
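
The same `pd.concat` idea spelled out step by step, as a rough sketch on the sample data from the question. The names `pieces` and `result` are made up for illustration. The row order of an outer merge is not the same across pandas versions, so the sketch sorts explicitly at the end instead of relying on it.

```python
import pandas as pd

d1 = pd.DataFrame({"A": ["a", "b", "c"], "B": ["z", "y", "x"]})
d2 = pd.DataFrame(
    {
        "A": ["a", "b", "a", "c"],
        "C": [1, 1, 2, 2],
        "D": ["xy", "xc", "xv", "xb"],
    }
)

pieces = []
for c_value, group in d2.groupby("C"):
    # Stamp the current 'C' value onto a copy of the left frame, so the
    # outer merge on the shared columns ('A' and 'C') keeps every left row.
    left = d1.assign(C=c_value)
    pieces.append(group.merge(left, how="outer"))

result = (
    pd.concat(pieces, ignore_index=True)
    .sort_values(["C", "A"], ignore_index=True)[["A", "B", "C", "D"]]
)
print(result)
```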

#2


This involves groupby + merge; I cannot guarantee the speed.

df2.groupby('C').apply(lambda x: x.merge(df1, on='A', how='outer').assign(C=lambda d: d['C'].ffill())).reset_index(drop=True)
Out[954]: 
   A    C    D  B
0  a  1.0   xy  z
1  b  1.0   xc  y
2  c  1.0  NaN  x
3  a  2.0   xv  z
4  c  2.0   xb  x
5  b  2.0  NaN  y
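
A possible variant of the same groupby + merge idea, written as an explicit loop. This is a sketch and not the answer's exact method: recent pandas versions emit a deprecation warning when `groupby.apply` operates on the grouping column, and `ffill` only fills correctly if the first row of each merged group already has a 'C' value. Here the group key is written into 'C' directly, which also keeps the column as integers. The names `parts` and `result` are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["a", "b", "c"], "B": ["z", "y", "x"]})
df2 = pd.DataFrame(
    {
        "A": ["a", "b", "a", "c"],
        "C": [1, 1, 2, 2],
        "D": ["xy", "xc", "xv", "xb"],
    }
)

parts = []
for c_value, group in df2.groupby("C"):
    # The outer merge on 'A' keeps every row of df1; rows that exist only
    # in df1 get NaN in 'C' and 'D'.
    merged = group.merge(df1, on="A", how="outer")
    # Overwrite 'C' with the group key instead of forward-filling it.
    merged["C"] = c_value
    parts.append(merged)

result = pd.concat(parts, ignore_index=True).sort_values(
    ["C", "A"], ignore_index=True
)
print(result)
```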
