I have two dataframes with the following column names:
frame_1:
event_id, date, time, county_ID
frame_2:
countyid, state
I would like to get a dataframe with the following columns by doing a left join on county_ID = countyid:
joined_dataframe
event_id, date, time, county, state
I cannot figure out how to do it if the columns on which I want to join are not the index. What's the easiest way? Thanks!
2 Answers
#1
72
You can use the left_on and right_on options as follows:
pd.merge(frame_1, frame_2, left_on = 'county_ID', right_on = 'countyid')
I was not sure from the question whether you wanted to keep every row of the left-hand dataframe, even when its key has no match in the right-hand one. If that is the case, the following will do that (the call above defaults to how='inner', so it keeps only rows whose key appears in both frames):
pd.merge(frame_1, frame_2, how = 'left', left_on = 'county_ID', right_on = 'countyid')
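To make the difference between the two calls concrete, here is a minimal sketch. The sample values are made up for illustration; only the column names come from the question. It also drops the redundant countyid column that the merge leaves behind, which is one way to get close to the column set the question asks for.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
frame_1 = pd.DataFrame({
    "event_id": [1, 2, 3],
    "date": ["2020-01-01", "2020-01-02", "2020-01-03"],
    "time": ["10:00", "11:30", "12:45"],
    "county_ID": [10, 20, 99],  # 99 has no match in frame_2
})
frame_2 = pd.DataFrame({
    "countyid": [10, 20, 30],
    "state": ["CA", "NY", "TX"],
})

# Default how='inner': the event with county_ID 99 is dropped.
inner = pd.merge(frame_1, frame_2, left_on="county_ID", right_on="countyid")

# how='left': every row of frame_1 is kept; state is NaN where nothing matched.
left = pd.merge(frame_1, frame_2, how="left",
                left_on="county_ID", right_on="countyid")

# Both key columns survive the merge, so drop the right-hand duplicate.
joined_dataframe = left.drop(columns="countyid")
```

Renaming countyid to county_ID in frame_2 before merging and then passing on='county_ID' would avoid the extra column altogether; which is tidier depends on whether the right-hand name matters elsewhere.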
#2
2
You need to make the join column of the right frame its index (here, countyid in frame_2):
frame_1.join(frame_2.set_index('countyid', verify_integrity=True),
             on='county_ID', how='left')
For your information, a left join in pandas gives unexpected results when the right frame has non-unique values in the join column: each left row is repeated once for every matching right row. See this bug.
So you need to verify integrity before joining by passing verify_integrity=True to set_index.
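A small sketch of what that check buys you, using made-up data in which the lookup table deliberately contains a duplicated key. The validate argument of pd.merge is not mentioned in the answer above; it is shown here only as an alternative way to get the same kind of safeguard without going through set_index.

```python
import pandas as pd

# Hypothetical data: the key 10 appears twice in the lookup table.
frame_1 = pd.DataFrame({"event_id": [1, 2], "county_ID": [10, 10]})
frame_2 = pd.DataFrame({"countyid": [10, 10], "state": ["CA", "NV"]})

# Without any check, each left row is repeated once per matching right row.
unchecked = pd.merge(frame_1, frame_2, how="left",
                     left_on="county_ID", right_on="countyid")

# set_index(..., verify_integrity=True) raises ValueError on duplicate keys.
try:
    frame_2.set_index("countyid", verify_integrity=True)
    index_check_failed = False
except ValueError:
    index_check_failed = True

# Alternative (an addition, not from the answer): validate='many_to_one'
# makes merge itself raise MergeError when right-hand keys are not unique.
try:
    pd.merge(frame_1, frame_2, how="left", left_on="county_ID",
             right_on="countyid", validate="many_to_one")
    merge_check_failed = False
except pd.errors.MergeError:
    merge_check_failed = True
```

Either check turns silent row duplication into an error you can act on, for example by deduplicating frame_2 before joining.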