I'm learning regular expressions and have hit a bit of a wall. I have the following DataFrame:
import pandas
item_data = pandas.DataFrame({'item': ['001', '002', '003'],
                              'description': ['Fishing,Hooks,12-inch', 'Fishing,Lines', 'Fish Eggs']})
For each description, I want to extract everything before the second comma (","). If there is no second comma, the original description is retained.
Results should look like this:
item_data = pandas.DataFrame({'item': ['001', '002', '003'],
                              'description': ['Fishing,Hooks,12-inch', 'Fishing,Lines', 'Fish Eggs'],
                              'new_description': ['Fishing,Hooks', 'Fishing,Lines', 'Fish Eggs']})
Any pointers would be much appreciated.
Thanks.
2 Solutions
#1
Using a regexp...
import re
re.sub(r"^([^,]*,[^,]*),.*$", r"\1", x)  # x is a single description string
The pattern means:
- `^` start of string
- `(` start capture
- `[^,]` anything but a comma
- `*` zero or more times
- `,` a comma
- `[^,]` anything but a comma
- `*` zero or more times
- `)` end of capture
- `,` another comma
- `.*` anything
- `$` end of string
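To make the breakdown above easier to check, here is a minimal sketch (not part of the original answer) that writes the same pattern in `re.VERBOSE` mode so each token sits next to its comment. The helper name `up_to_second_comma` is hypothetical.

```python
import re

# Same pattern as the answer, written with re.VERBOSE so whitespace and
# "#" comments inside the pattern are ignored by the regex engine.
PATTERN = re.compile(
    r"""
    ^           # start of string
    (           # start capture (group 1)
      [^,]*     # anything but a comma, zero or more times
      ,         # a comma
      [^,]*     # anything but a comma, zero or more times
    )           # end of capture
    ,           # another comma
    .*          # anything
    $           # end of string
    """,
    re.VERBOSE,
)


def up_to_second_comma(text: str) -> str:
    # Hypothetical helper: keep only group 1 when the pattern matches;
    # if it does not match, sub() returns the string unchanged.
    return PATTERN.sub(r"\1", text)


print(up_to_second_comma("Fishing,Hooks,12-inch"))  # Fishing,Hooks
print(up_to_second_comma("Fishing,Lines"))          # Fishing,Lines
print(up_to_second_comma("Fish Eggs"))              # Fish Eggs
```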
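Replacing the match with the contents of group 1 (\1) drops everything from the second comma onward. A description with fewer than two commas doesn't match the pattern at all, so re.sub returns it unchanged.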
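To fill the new column from the DataFrame in the question, one option (a sketch, not from the original answer) is pandas' `Series.str.replace` with `regex=True`, which applies the same substitution to every row and accepts the `\1` backreference in the replacement.

```python
import pandas

item_data = pandas.DataFrame({'item': ['001', '002', '003'],
                              'description': ['Fishing,Hooks,12-inch', 'Fishing,Lines', 'Fish Eggs']})

# Apply the answer's regex to every description; r"\1" keeps only group 1.
# Rows with fewer than two commas don't match and are left as-is.
item_data['new_description'] = item_data['description'].str.replace(
    r"^([^,]*,[^,]*),.*$", r"\1", regex=True
)
print(item_data)
```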
#2
new_description = [",".join(i.split(",")[:2]) for i in item_data['description']]
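The list comprehension above can also be expressed with pandas' vectorized string methods. This is a sketch under the assumption that chaining `.str.split`, `.str[:2]` (slicing each resulting list) and `.str.join` is acceptable; it is not taken from the original answer.

```python
import pandas

item_data = pandas.DataFrame({'item': ['001', '002', '003'],
                              'description': ['Fishing,Hooks,12-inch', 'Fishing,Lines', 'Fish Eggs']})

# Split each description on commas, keep at most the first two pieces,
# then join them back with a comma. A description without a comma
# becomes a one-element list and is rejoined unchanged.
item_data['new_description'] = (
    item_data['description'].str.split(',').str[:2].str.join(',')
)
print(item_data)
```

Unlike the regex, this approach needs no escaping and reads close to the plain-Python version, at the cost of building an intermediate list per row.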