Thanks for taking the time to read this.
I'm using Python pandas to merge two datasets on a column named 'title'. Some of the rows in one dataset have additional characters in the title cells, surrounded by parentheses, which causes the merge to fail on those cells. I'm trying to remove the parentheses and the values they contain using the code below; however, the merge still misses the updated rows.
Data sample, code and regex are below.
I'm assuming that the regex is incorrect - any thoughts?
import pandas as pd
data1 = pd.DataFrame({'id': ['a12bcde0'], 'title': ['company_a']})
data2 = pd.DataFrame({'serial_number': ['01a2b345','10ab2030'],'title':['company_a','company_a (123)']})
data2['title'].replace(regex=True,inplace=True,to_replace=r"\(.*\)",value=r'')
pd.merge(data1, data2, on='title')
1 solution
#1
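You're forgetting the whitespace before the opening parenthesis in your pattern: to_replace=r"\s\(.*\)". Without it, 'company_a (123)' becomes 'company_a ' with a trailing space, which still doesn't match 'company_a' in the other dataset.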
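As a minimal sketch of that fix applied to the sample data from the question: the version below uses `\s*` rather than `\s` (an assumption on my part, so that titles with no space before the parenthesis are also handled) and assigns the result back with `Series.str.replace` instead of relying on `inplace=True`. The name `merged` is just for illustration.

```python
import pandas as pd

data1 = pd.DataFrame({'id': ['a12bcde0'], 'title': ['company_a']})
data2 = pd.DataFrame({
    'serial_number': ['01a2b345', '10ab2030'],
    'title': ['company_a', 'company_a (123)'],
})

# The original pattern r"\(.*\)" leaves the space in front of the parenthesis,
# so the cleaned value would be 'company_a ' and would not match 'company_a'.
# Here the optional leading whitespace is removed together with "(...)".
data2['title'] = data2['title'].str.replace(r"\s*\(.*\)", "", regex=True)

# Both rows of data2 now share the key 'company_a' with data1.
merged = pd.merge(data1, data2, on='title')
print(merged)
```

If other stray whitespace is a concern, chaining `.str.strip()` after the replacement is a common extra safeguard before merging.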