Look up a string in a list of strings and create a new column in pandas

Time: 2021-04-18 22:56:43

I am new to Python and am trying to solve a performance issue. I have 2 data frames.


Data Frame 1:


col1        col2
holiday     party
party       party
bagel       snack
fruit       snack

Data Frame 2:


col1                            col2
bagel wednesday                 snack               
coffee for party                snack
holiday party                   party

Data Frame 1 has 2 columns. I need to look up DataFrame1.col1 in DataFrame2.col1 and create a new column, DataFrame2.col2, holding the matching DataFrame1.col2 value. Currently, I am achieving this using a loop and it is taking a very long time. I am looking for an efficient way to do this. Also, if I get multiple matches, I should always go with the first match found in DataFrame1. For example, "coffee for party" has 2 matches from DF1, snack and party, in which case "snack" should be picked from DF1.col2.


Thanks RL


1 solution

#1



I think you have to loop over the entries of df1, but not over all the rows of df2: df2.col1.str.contains will do that inner loop for you in an optimized manner.


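# Walk df1 bottom-up so rows nearer the top are written last and win (first match from df1).
for row in df1.iloc[::-1].itertuples(index=False):
    # regex=False treats the keyword as literal text rather than a regular expression
    df2.loc[df2.col1.str.contains(row.col1, regex=False), 'col3'] = row.col2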
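
The "first match from DataFrame1 wins" rule in the question can also be expressed as a matrix of match flags, taking the first True in each row. The following is only a minimal sketch under some assumptions: the helper name first_match_labels is hypothetical, the keywords are treated as literal substrings (not regular expressions), df1 is non-empty, and the result goes into a new column called col3. It builds one boolean array per df1 row, so memory grows with len(df1) * len(df2).

```python
import numpy as np
import pandas as pd


def first_match_labels(lookup: pd.DataFrame, target: pd.DataFrame) -> pd.Series:
    """For each target.col1, return lookup.col2 of the first lookup row
    whose col1 occurs as a substring (hypothetical helper)."""
    # One boolean column per lookup row, kept in lookup order.
    # regex=False: keywords are literal text; na=False: missing text never matches.
    masks = np.column_stack([
        target["col1"].str.contains(key, regex=False, na=False).to_numpy(dtype=bool)
        for key in lookup["col1"]
    ])
    first = masks.argmax(axis=1)    # position of the first True per row (0 if none)
    matched = masks.any(axis=1)     # rows with at least one keyword match
    labels = lookup["col2"].to_numpy()[first]
    # Rows without any match become NaN instead of silently taking label 0.
    return pd.Series(labels, index=target.index).where(matched)


# Sample data copied from the question
df1 = pd.DataFrame({
    "col1": ["holiday", "party", "bagel", "fruit"],
    "col2": ["party", "party", "snack", "snack"],
})
df2 = pd.DataFrame({
    "col1": ["bagel wednesday", "coffee for party", "holiday party"],
})

df2["col3"] = first_match_labels(df1, df2)
print(df2)
```

With the sample df1 exactly as shown in the question, only "party" occurs in "coffee for party", so that row gets "party"; "holiday party" matches both "holiday" and "party" and takes the label of "holiday" because it comes first in df1. Whether this is faster than the loop depends on the sizes of both frames, so it is worth timing on real data.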
