How to split by commas that are not within parentheses?

Time: 2021-08-21 21:42:40

Say I have a string like this, where items are separated by commas but there may also be commas within items that have parenthesized content:

(EDIT: Sorry, forgot to mention that some items may not have parenthesized content)

"Water, Titanium Dioxide (CI 77897), Black 2 (CI 77266), Iron Oxides (CI 77491, 77492, 77499), Ultramarines (CI 77007)"

How can I split the string by only those commas that are NOT within parentheses? i.e:

["Water", "Titanium Dioxide (CI 77897)", "Black 2 (CI 77266)", "Iron Oxides (CI 77491, 77492, 77499)", "Ultramarines (CI 77007)"]

I think I'd have to use a regex, perhaps something like this:

([(]?)(.*?)([)]?)(,|$)

but I'm still trying to make it work.

4 answers

#1

Use a negative lookahead to match all the commas that are not inside parentheses. Splitting the input string on the matched commas gives the desired output.

,\s*(?![^()]*\))
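The same pattern can be written with re.VERBOSE so that each piece carries a comment. This only restates the regex above for readability and is not a different technique. The name SPLIT_COMMA is made up for the example.

```python
import re

# The answer's pattern, spelled out piece by piece with re.VERBOSE.
SPLIT_COMMA = re.compile(r"""
    ,\s*            # a comma plus any whitespace after it
    (?!             # ...but only if it is NOT followed by:
        [^()]*      #   a run of non-parenthesis characters
        \)          #   that ends at a closing parenthesis
    )               # (i.e. the comma is not inside an open "(...)")
""", re.VERBOSE)

s = "Water, Titanium Dioxide (CI 77897), Black 2 (CI 77266), Iron Oxides (CI 77491, 77492, 77499), Ultramarines (CI 77007)"
print(SPLIT_COMMA.split(s))
```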

>>> import re
>>> s = "Water, Titanium Dioxide (CI 77897), Black 2 (CI 77266), Iron Oxides (CI 77491, 77492, 77499), Ultramarines (CI 77007)"
>>> re.split(r',\s*(?![^()]*\))', s)
['Water', 'Titanium Dioxide (CI 77897)', 'Black 2 (CI 77266)', 'Iron Oxides (CI 77491, 77492, 77499)', 'Ultramarines (CI 77007)']
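The lookahead only checks that no ")" appears before the next "(". That means it assumes parentheses are never nested: on "Mix (a, (b, c), d)" it splits after "a". If nesting can occur, one alternative (a plain depth counter rather than a regex, and not part of the original answer) is to scan the string and split only on commas seen at depth zero. Below is a minimal sketch. split_top_level is a hypothetical helper name, and it assumes a single-character separator.

```python
import re

def split_top_level(text, sep=","):
    """Split text on a single-character sep, ignoring any sep inside (possibly nested) parentheses."""
    parts = []
    current = []
    depth = 0  # how many "(" are currently open
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)  # ignore a stray ")" rather than going negative
        if ch == sep and depth == 0:
            # Top-level separator: close off the current item.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())  # the last item has no trailing separator
    return parts

nested = "Mix (a, (b, c), d)"
print(re.split(r',\s*(?![^()]*\))', nested))  # ['Mix (a', '(b, c), d)']
print(split_top_level(nested))                # ['Mix (a, (b, c), d)']

s = "Water, Titanium Dioxide (CI 77897), Iron Oxides (CI 77491, 77492, 77499)"
print(split_top_level(s))
# ['Water', 'Titanium Dioxide (CI 77897)', 'Iron Oxides (CI 77491, 77492, 77499)']
```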

#2

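You can do it with just str.replace and str.split. Replace each ")," with ")" followed by a marker, then split on the marker. Any marker that does not otherwise occur in the string works in place of "//".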

a = "Titanium Dioxide (CI 77897), Black 2 (CI 77266), Iron Oxides (CI 77491, 77492, 77499), Ultramarines (CI 77007)"
# Mark each ")," boundary with "//", then split on the marker.
a = a.replace('),', ')//').split('//')
print(a)

Output:

['Titanium Dioxide (CI 77897)', ' Black 2 (CI 77266)', ' Iron Oxides (CI 77491, 77492, 77499)', ' Ultramarines (CI 77007)']
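The items keep the space that followed each comma. Also, because only ")," is treated as a separator, an item without parentheses (such as "Water" in the edited question) stays attached to the item after it. The small sketch below trims the spaces and shows that limitation.

```python
s = "Water, Titanium Dioxide (CI 77897), Black 2 (CI 77266)"
# Mark each ")," boundary with "//", split on the marker, then trim spaces.
parts = [p.strip() for p in s.replace("),", ")//").split("//")]
print(parts)
# ['Water, Titanium Dioxide (CI 77897)', 'Black 2 (CI 77266)']
```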

#3

Try the regex

[^()]*\([^()]*\),?

Code:

>>> import re
>>> x = "Titanium Dioxide (CI 77897), Black 2 (CI 77266), Iron Oxides (CI 77491, 77492, 77499), Ultramarines (CI 77007)"
>>> re.findall(r"[^()]*\([^()]*\),?", x)
['Titanium Dioxide (CI 77897),', ' Black 2 (CI 77266),', ' Iron Oxides (CI 77491, 77492, 77499),', ' Ultramarines (CI 77007)']

See how the regex works: http://regex101.com/r/pS9oV3/1
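The matches keep a trailing comma and a leading space. One way to clean them up is to strip both characters from each match. This assumes items never legitimately begin or end with a comma or space. Like the previous answer, this pattern needs every item to contain parentheses, so "Water" from the edited question would be absorbed into the next match.

```python
import re

x = "Titanium Dioxide (CI 77897), Black 2 (CI 77266), Iron Oxides (CI 77491, 77492, 77499), Ultramarines (CI 77007)"
# Each match still carries a leading space and/or trailing comma; strip both.
items = [m.strip(" ,") for m in re.findall(r"[^()]*\([^()]*\),?", x)]
print(items)
# ['Titanium Dioxide (CI 77897)', 'Black 2 (CI 77266)', 'Iron Oxides (CI 77491, 77492, 77499)', 'Ultramarines (CI 77007)']
```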

#4

With regex, this can be done easily using re.findall.

import re
s = "Titanium Dioxide (CI 77897), Black 2 (CI 77266), Iron Oxides (CI 77491, 77492, 77499), Ultramarines (CI 77007)"
re.findall(r"\w.*?\(.*?\)", s) # returns what you want

Use http://www.regexr.com/ if you want to understand regex better. The Python documentation for the re module is at https://docs.python.org/2/library/re.html

EDIT: I modified the regex to also accept items without parentheses: \w[^,(]*(?:\(.*?\))?
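Here is a quick run of the edited pattern against the full string from the question, including the "Water" item that has no parentheses, with each piece of the pattern commented. It still assumes non-nested parentheses and that every item starts with a word character.

```python
import re

s = "Water, Titanium Dioxide (CI 77897), Black 2 (CI 77266), Iron Oxides (CI 77491, 77492, 77499), Ultramarines (CI 77007)"
# \w            an item starts at a word character (so ", " between items is skipped)
# [^,(]*        then runs up to the next comma or opening parenthesis
# (?:\(.*?\))?  then optionally takes one "(...)" group, stopping at the first ")"
items = re.findall(r"\w[^,(]*(?:\(.*?\))?", s)
print(items)
# ['Water', 'Titanium Dioxide (CI 77897)', 'Black 2 (CI 77266)', 'Iron Oxides (CI 77491, 77492, 77499)', 'Ultramarines (CI 77007)']
```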
