Merging two different dataframes with Pandas

I am new to pandas and need to complete the following task. Is there an efficient way to do it? There are two different dataframes, dfa and dfb:

I used this to merge them together:

df = pd.merge(dfa, dfb, left_on = ['a_retry','a_cca', 'a_rssif', 'a_lqif'], right_on = ['b_retry','b_cca', 'b_rssif', 'b_lqif'])

I got the df output:

However, this is not what I expected. It is fine that the merged dataframe contains all columns, but it should not have more rows than the smaller input (dfa). That means row 3 must be dropped. The expected result is:

How can I do that? Thanks.

#1

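This is expected, because the key combinations are duplicated across all 4 columns. pd.merge pairs every matching row of dfa with every matching row of dfb, so, for example, a key that appears twice in each frame produces 2 × 2 = 4 rows.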
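
A minimal sketch reproducing the effect, assuming small hypothetical frames (the question's actual data is not shown; the values and the `a_val`/`b_val` columns are invented for illustration):

```python
import pandas as pd

# Hypothetical sample data: the question's real dfa/dfb are not shown.
# The key (0, 1, -70, 100) occurs twice in each frame; a_val/b_val are made-up payload columns.
dfa = pd.DataFrame({
    'a_retry': [0, 0, 1],
    'a_cca': [1, 1, 0],
    'a_rssif': [-70, -70, -65],
    'a_lqif': [100, 100, 90],
    'a_val': ['x1', 'x2', 'x3'],
})
dfb = pd.DataFrame({
    'b_retry': [0, 0, 1],
    'b_cca': [1, 1, 0],
    'b_rssif': [-70, -70, -65],
    'b_lqif': [100, 100, 90],
    'b_val': ['y1', 'y2', 'y3'],
})

a_keys = ['a_retry', 'a_cca', 'a_rssif', 'a_lqif']
b_keys = ['b_retry', 'b_cca', 'b_rssif', 'b_lqif']

df = pd.merge(dfa, dfb, left_on=a_keys, right_on=b_keys)

# The duplicated key gives 2 * 2 = 4 rows, the unique key gives 1 row.
print(len(dfa), len(df))  # 3 5
```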

So you need to remove the duplicate rows with drop_duplicates:

# Keep only the first row for each key combination
dfa = dfa.drop_duplicates(subset=['a_retry', 'a_cca', 'a_rssif', 'a_lqif'])
dfb = dfb.drop_duplicates(subset=['b_retry', 'b_cca', 'b_rssif', 'b_lqif'])
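
A sketch of the drop_duplicates fix on hypothetical data (values and the `a_val`/`b_val` columns are assumptions). The `validate='one_to_one'` argument is an existing `pd.merge` option that raises `MergeError` if either side still has duplicate keys; it is an optional safety check added here, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data: the key (0, 1, -70, 100) is duplicated on both sides.
dfa = pd.DataFrame({
    'a_retry': [0, 0, 1], 'a_cca': [1, 1, 0],
    'a_rssif': [-70, -70, -65], 'a_lqif': [100, 100, 90],
    'a_val': ['x1', 'x2', 'x3'],
})
dfb = pd.DataFrame({
    'b_retry': [0, 0, 1], 'b_cca': [1, 1, 0],
    'b_rssif': [-70, -70, -65], 'b_lqif': [100, 100, 90],
    'b_val': ['y1', 'y2', 'y3'],
})

a_keys = ['a_retry', 'a_cca', 'a_rssif', 'a_lqif']
b_keys = ['b_retry', 'b_cca', 'b_rssif', 'b_lqif']

# Keep only the first row for each key combination on both sides.
dfa_u = dfa.drop_duplicates(subset=a_keys)
dfb_u = dfb.drop_duplicates(subset=b_keys)

# validate='one_to_one' raises MergeError if either side still has duplicate keys.
df = pd.merge(dfa_u, dfb_u, left_on=a_keys, right_on=b_keys, validate='one_to_one')
print(len(df))  # 2
```

Keep in mind that this also discards dfa's duplicated row (`x2`), so the result can end up with fewer rows than dfa.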

But if you need to match the duplicate rows one-to-one instead, you can add a helper column created with cumcount and use it as an additional merge key:

# Number repeated key combinations 0, 1, 2, ... within each group
dfa['new'] = dfa.groupby(['a_retry', 'a_cca', 'a_rssif', 'a_lqif']).cumcount()
dfb['new'] = dfb.groupby(['b_retry', 'b_cca', 'b_rssif', 'b_lqif']).cumcount()

# Merge on the original keys plus the counter, then drop the helper column
df = (pd.merge(dfa,
               dfb,
               left_on=['a_retry', 'a_cca', 'a_rssif', 'a_lqif', 'new'],
               right_on=['b_retry', 'b_cca', 'b_rssif', 'b_lqif', 'new'])
        .drop('new', axis=1))
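
A sketch of the cumcount approach on hypothetical data (values and the `a_val`/`b_val` columns are assumptions, not the question's data). The n-th occurrence of a key in dfa is paired with the n-th occurrence of the same key in dfb:

```python
import pandas as pd

# Hypothetical sample data: the key (0, 1, -70, 100) is duplicated on both sides.
dfa = pd.DataFrame({
    'a_retry': [0, 0, 1], 'a_cca': [1, 1, 0],
    'a_rssif': [-70, -70, -65], 'a_lqif': [100, 100, 90],
    'a_val': ['x1', 'x2', 'x3'],
})
dfb = pd.DataFrame({
    'b_retry': [0, 0, 1], 'b_cca': [1, 1, 0],
    'b_rssif': [-70, -70, -65], 'b_lqif': [100, 100, 90],
    'b_val': ['y1', 'y2', 'y3'],
})

a_keys = ['a_retry', 'a_cca', 'a_rssif', 'a_lqif']
b_keys = ['b_retry', 'b_cca', 'b_rssif', 'b_lqif']

# Occurrence counter within each key group: 0 for the first row, 1 for the second, ...
dfa['new'] = dfa.groupby(a_keys).cumcount()
dfb['new'] = dfb.groupby(b_keys).cumcount()

# 'new' has the same name on both sides, so pandas keeps a single 'new' column,
# which is dropped after the merge.
df = (pd.merge(dfa, dfb, left_on=a_keys + ['new'], right_on=b_keys + ['new'])
        .drop('new', axis=1))

print(sorted(zip(df['a_val'], df['b_val'])))  # [('x1', 'y1'), ('x2', 'y2'), ('x3', 'y3')]
```

Since every (key, new) combination is unique on both sides, the merge is one-to-one: the result can never have more rows than dfa, and duplicated rows are paired up rather than dropped.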
