How can I find the rows of a three-column DataFrame whose col1 matches an entry in my Series? I need regex support because the Series uses * as a placeholder for anything in that field.
I have a pandas Series with data like this:
col1
joe\creed\found\match
matt\creed\*\not
adam\creed\notfound\match
I have another DataFrame with data like this:
col1 col2 col3
joe2\creed2\found\match 2 23
matt2\creed2\found2\not 2 23
adam\creed\notfound\match 2 23
matt\creed\found\not 2 23
I tried the following code without success.
for item in series:
    print(df[df.col1.str.contains(item, regex=True)])
and
for item in series:
    print(df[df.col1.isin([str(item)])])
My expected output is as follows:
col1 col2 col3
adam\creed\notfound\match 2 23
matt\creed\found\not 2 23
1 Solution
#1
You can do it this way:
Data:
In [163]: s
Out[163]:
0 joe\creed\found\match
1 matt\creed\*\not
2 adam\creed\notfound\match
Name: col1, dtype: object
In [164]: df
Out[164]:
col1 col2 col3
0 joe2\creed2\found\match 2 23
1 matt2\creed2\found2\not 2 23
2 adam\creed\notfound\match 2 23
3 matt\creed\found\not 2 23
Solution:
import re
# replacing '*' --> '[^\\]*' (in the escaped string: '\\\*' --> '[^\\\\]*')
# regex=True is needed on pandas >= 2.0, where str.replace defaults to literal replacement
pat = s.apply(re.escape).str.replace(r'\\\*', r'[^\\\\]*', regex=True).str.cat(sep='|')
# use the following line instead, if `s` is a DataFrame (not a Series):
#pat = s.col1.apply(re.escape).str.replace(r'\\\*', r'[^\\\\]*', regex=True).str.cat(sep='|')
In [161]: df[df.col1.str.contains(pat)]
Out[161]:
col1 col2 col3
2 adam\creed\notfound\match 2 23
3 matt\creed\found\not 2 23
In [162]: pat
Out[162]: 'joe\\\\creed\\\\found\\\\match|matt\\\\creed\\\\[^\\\\]*\\\\not|adam\\\\creed\\\\notfound\\\\match'
The main difficulty is correctly escaping all special characters (such as \) in the "search pattern" series.
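For reference, here is a self-contained sketch of the same idea that can be run as a script. It differs from the snippet above in two ways, both of which are my own assumptions rather than part of the original answer: the helper wildcard_to_regex (a hypothetical name) splits each entry on * before escaping, which avoids the second replace step, and the match uses str.fullmatch (available in pandas 1.1 and later) so that an entry has to match the whole of col1 instead of any substring of it. The sample data is copied from the question.

```python
import re

import pandas as pd

# Sample data from the question; raw strings keep the backslashes literal.
s = pd.Series(
    [r"joe\creed\found\match", r"matt\creed\*\not", r"adam\creed\notfound\match"],
    name="col1",
)
df = pd.DataFrame(
    {
        "col1": [
            r"joe2\creed2\found\match",
            r"matt2\creed2\found2\not",
            r"adam\creed\notfound\match",
            r"matt\creed\found\not",
        ],
        "col2": [2, 2, 2, 2],
        "col3": [23, 23, 23, 23],
    }
)


def wildcard_to_regex(value: str) -> str:
    """Hypothetical helper: turn one entry with '*' placeholders into a regex.

    Each literal piece is escaped with re.escape, and the pieces are joined
    with a pattern meaning "any run of characters other than a backslash",
    so '*' stays inside a single path field.
    """
    return r"[^\\]*".join(re.escape(part) for part in value.split("*"))


# One alternation over all entries, wrapped in a non-capturing group so the
# whole alternation (not just its last branch) is what gets matched.
pat = "(?:" + "|".join(wildcard_to_regex(v) for v in s) + ")"

# fullmatch requires the entire col1 value to match one of the entries.
result = df[df["col1"].str.fullmatch(pat)]
print(result)
```

If substring matches are what you want, swapping str.fullmatch for str.contains reproduces the behaviour of the original answer.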