I want to combine two DataFrames whose index labels are sorted, but where each label appears a different number of times in each of the two frames.
import pandas as pd

frame1 = pd.DataFrame([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], index=['A', 'B', 'B', 'C', 'C', 'C', 'D', 'E', 'E', 'F'])
frame2 = pd.DataFrame([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], index=['A', 'A', 'B', 'C', 'C', 'D', 'D', 'E', 'F', 'F'])
frame1.columns = ['Hi']
frame2.columns = ['Bye']
frame1
Out[160]:
Hi
A 1
B 2
B 3
C 4
C 5
C 6
D 7
E 8
E 9
F 10
frame2
Out[161]:
Bye
A 1
A 2
B 3
C 4
C 5
D 6
D 7
E 8
F 9
F 10
Desired output:
Bye Hi
A 1.0 1.0
A 2.0 NaN
B 3.0 2.0
B NaN 3.0
C 4.0 4.0
C 5.0 5.0
C NaN 6.0
D 6.0 7.0
D 7.0 NaN
E 8.0 8.0
E NaN 9.0
F 9.0 10.0
F 10.0 NaN
I can't seem to find the right combination of concat or join to do this. Is there a way?
1 solution
OK, let's build a new key here using cumcount, which numbers each repeated index label by its order of occurrence:
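# Append the per-label occurrence number as a second index level
s1 = frame1.set_index(frame1.groupby(level=0).cumcount(), append=True)
s2 = frame2.set_index(frame2.groupby(level=0).cumcount(), append=True)
# axis must be passed by keyword in recent pandas; sort_index() keeps rows ordered by (label, occurrence)
pd.concat([s2, s1], axis=1).sort_index().reset_index(level=1, drop=True)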
Out[364]:
Bye Hi
A 1.0 1.0
A 2.0 NaN
B 3.0 2.0
B NaN 3.0
C 4.0 4.0
C 5.0 5.0
C NaN 6.0
D 6.0 7.0
D 7.0 NaN
E 8.0 8.0
E NaN 9.0
F 9.0 10.0
F 10.0 NaN
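To see why this works, it may help to look at the key that cumcount builds. The following is a minimal sketch using frame1 from the question; the names occurrence and keyed are illustrative and not part of the original answer.

```python
import pandas as pd

frame1 = pd.DataFrame(
    {"Hi": range(1, 11)},
    index=["A", "B", "B", "C", "C", "C", "D", "E", "E", "F"],
)

# Number each row within its index label: 0 for the first occurrence,
# 1 for the second, and so on.
occurrence = frame1.groupby(level=0).cumcount()
print(occurrence.tolist())  # [0, 0, 1, 0, 1, 2, 0, 0, 1, 0]

# Appending that counter as a second level makes every (label, occurrence)
# pair unique, so concat can align the two frames row by row.
keyed = frame1.set_index(occurrence, append=True)
print(keyed.index.is_unique)  # True
```

Because the combined index is unique, rows that exist in only one frame simply get NaN in the other frame's column.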
From piR (a great solution using a self-defined function):
def add_cumcount_level(df):
    return df.set_index(df.groupby(level=0).cumcount(), append=True)

pd.concat(map(add_cumcount_level, [frame1, frame2]), axis=1)
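Putting the pieces together, here is one possible self-contained version that also drops the helper level at the end. The function name concat_by_occurrence is hypothetical, and the explicit sort_index() call is an assumption added so the row order does not depend on how a given pandas version orders the outer-joined index.

```python
import pandas as pd


def concat_by_occurrence(frames):
    """Concatenate frames column-wise, aligning duplicate index labels
    by their order of occurrence within each frame."""
    keyed = [
        df.set_index(df.groupby(level=0).cumcount(), append=True)
        for df in frames
    ]
    combined = pd.concat(keyed, axis=1)
    # Order rows by (label, occurrence), then drop the helper level.
    return combined.sort_index().reset_index(level=1, drop=True)


frame1 = pd.DataFrame(
    {"Hi": range(1, 11)},
    index=["A", "B", "B", "C", "C", "C", "D", "E", "E", "F"],
)
frame2 = pd.DataFrame(
    {"Bye": range(1, 11)},
    index=["A", "A", "B", "C", "C", "D", "D", "E", "F", "F"],
)

result = concat_by_occurrence([frame2, frame1])
print(result)
```

Passing [frame2, frame1] puts the Bye column first, matching the desired output in the question.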