Say I have a string like this, where items are separated by commas but there may also be commas within items that have parenthesized content:
(EDIT: Sorry, forgot to mention that some items may not have parenthesized content)
"Water, Titanium Dioxide (CI 77897), Black 2 (CI 77266), Iron Oxides (CI 77491, 77492, 77499), Ultramarines (CI 77007)"
How can I split the string on only those commas that are NOT within parentheses? That is:
["Water", "Titanium Dioxide (CI 77897)", "Black 2 (CI 77266)", "Iron Oxides (CI 77491, 77492, 77499)", "Ultramarines (CI 77007)"]
I think I'd have to use a regex, perhaps something like this:
([(]?)(.*?)([)]?)(,|$)
but I'm still trying to make it work.
4 solutions
#1
14
Use a negative lookahead to match only the commas that are not inside parentheses. The lookahead (?![^()]*\)) rejects a comma when a closing parenthesis follows it with no opening parenthesis in between. Splitting the input string on the matched commas gives the desired output.
,\s*(?![^()]*\))
DEMO
>>> import re
>>> s = "Water, Titanium Dioxide (CI 77897), Black 2 (CI 77266), Iron Oxides (CI 77491, 77492, 77499), Ultramarines (CI 77007)"
>>> re.split(r',\s*(?![^()]*\))', s)
['Water', 'Titanium Dioxide (CI 77897)', 'Black 2 (CI 77266)', 'Iron Oxides (CI 77491, 77492, 77499)', 'Ultramarines (CI 77007)']
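The lookahead above assumes parentheses are never nested; with input such as "a (b, (c, d), e), f" it would also split on the comma after "b". If nesting is possible, a small depth counter is one alternative. This is only a sketch, and the function name split_top_level is made up for illustration:

```python
def split_top_level(text, sep=","):
    """Split text on sep, ignoring separators inside (possibly nested) parentheses."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        if ch == sep and depth == 0:
            # Top-level separator: close the current item.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


s = "Water, Titanium Dioxide (CI 77897), Black 2 (CI 77266), Iron Oxides (CI 77491, 77492, 77499), Ultramarines (CI 77007)"
print(split_top_level(s))
```

For the sample string this gives the same five items as the re.split call; unlike the regex, it also keeps "a (b, (c, d), e)" together as one item.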
#2
2
You can just do it using str.replace and str.split. You may use any delimiter that does not occur in the string to replace "),".
a = "Titanium Dioxide (CI 77897), Black 2 (CI 77266), Iron Oxides (CI 77491, 77492, 77499), Ultramarines (CI 77007)"
a = a.replace('),', ')//').split('//')
print(a)
Output:
['Titanium Dioxide (CI 77897)', ' Black 2 (CI 77266)', ' Iron Oxides (CI 77491, 77492, 77499)', ' Ultramarines (CI 77007)']
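Note that this approach only splits after a closing parenthesis, so an item without parenthesized content (as mentioned in the question's edit) stays glued to the next item. A quick check, with the leading spaces stripped as well:

```python
s = "Water, Titanium Dioxide (CI 77897), Black 2 (CI 77266)"

# Same replace/split trick as above, plus strip() to drop the leading spaces.
parts = [p.strip() for p in s.replace("),", ")//").split("//")]
print(parts)
# ['Water, Titanium Dioxide (CI 77897)', 'Black 2 (CI 77266)']
```

"Water" is not separated from "Titanium Dioxide (CI 77897)", so this only works when every item ends with a parenthesized part.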
#3
0
Try this regex:
[^()]*\([^()]*\),?
Code:
>>> import re
>>> x = "Titanium Dioxide (CI 77897), Black 2 (CI 77266), Iron Oxides (CI 77491, 77492, 77499), Ultramarines (CI 77007)"
>>> re.findall(r"[^()]*\([^()]*\),?", x)
['Titanium Dioxide (CI 77897),', ' Black 2 (CI 77266),', ' Iron Oxides (CI 77491, 77492, 77499),', ' Ultramarines (CI 77007)']
See how the regex works: http://regex101.com/r/pS9oV3/1
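The matches above still carry the trailing comma and a leading space. One way to clean them up, sketched here with str.strip (the variable name items is just for illustration):

```python
import re

x = "Titanium Dioxide (CI 77897), Black 2 (CI 77266), Iron Oxides (CI 77491, 77492, 77499), Ultramarines (CI 77007)"

# strip(" ,") removes spaces and commas from both ends of each match.
items = [m.strip(" ,") for m in re.findall(r"[^()]*\([^()]*\),?", x)]
print(items)
# ['Titanium Dioxide (CI 77897)', 'Black 2 (CI 77266)', 'Iron Oxides (CI 77491, 77492, 77499)', 'Ultramarines (CI 77007)']
```

Like the pattern itself, this assumes every item contains a parenthesized part.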
#4
0
Using regex, this can be done easily with the findall function.
import re
s = "Titanium Dioxide (CI 77897), Black 2 (CI 77266), Iron Oxides (CI 77491, 77492, 77499), Ultramarines (CI 77007)"
re.findall(r"\w.*?\(.*?\)", s) # returns what you want
Use http://www.regexr.com/ if you want to understand regex better. The Python documentation for the re module is here: https://docs.python.org/2/library/re.html
EDIT: I modified the regex to also accept items without parentheses: \w[^,(]*(?:\(.*?\))?
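For reference, here is the modified pattern applied to the full string from the question, including the item without parentheses:

```python
import re

s = "Water, Titanium Dioxide (CI 77897), Black 2 (CI 77266), Iron Oxides (CI 77491, 77492, 77499), Ultramarines (CI 77007)"

# \w[^,(]*      : an item starts at a word character and runs up to a comma or "("
# (?:\(.*?\))?  : optionally followed by one parenthesized part (non-capturing,
#                 so findall returns the whole match)
result = re.findall(r"\w[^,(]*(?:\(.*?\))?", s)
print(result)
# ['Water', 'Titanium Dioxide (CI 77897)', 'Black 2 (CI 77266)', 'Iron Oxides (CI 77491, 77492, 77499)', 'Ultramarines (CI 77007)']
```

The lazy .*? stops at the first closing parenthesis, so this assumes parentheses are not nested.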