How to split on commas that are not inside parentheses?

Say I have a string like this, where items are separated by commas but there may also be commas within items that have parenthesized content:

(EDIT: Sorry, forgot to mention that some items may not have parenthesized content)

"Water, Titanium Dioxide (CI 77897), Black 2 (CI 77266), Iron Oxides (CI 77491, 77492, 77499), Ultramarines (CI 77007)"

How can I split the string on only those commas that are NOT within parentheses? I.e.:

["Water", "Titanium Dioxide (CI 77897)", "Black 2 (CI 77266)", "Iron Oxides (CI 77491, 77492, 77499)", "Ultramarines (CI 77007)"]

I think I'd have to use a regex, perhaps something like this:

([(]?)(.*?)([)]?)(,|$)

but I'm still trying to make it work.

4 Solutions

#1

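Use a negative lookahead to match only the commas that are not inside parentheses: a comma is rejected if a closing parenthesis follows it with no other parenthesis in between. Splitting the input string on the matched commas gives the desired output. This assumes the parentheses are balanced and not nested.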

,\s*(?![^()]*\))

DEMO

>>> import re
>>> s = "Water, Titanium Dioxide (CI 77897), Black 2 (CI 77266), Iron Oxides (CI 77491, 77492, 77499), Ultramarines (CI 77007)"
>>> re.split(r',\s*(?![^()]*\))', s)
['Water', 'Titanium Dioxide (CI 77897)', 'Black 2 (CI 77266)', 'Iron Oxides (CI 77491, 77492, 77499)', 'Ultramarines (CI 77007)']
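The lookahead pattern relies on parentheses never being nested: in an input such as "a (b, (c, d)), e" it would also split at the comma after b. If nesting is possible, one alternative is a plain character scan that tracks the nesting depth. The following is a minimal sketch and not part of the original answer; the function name split_top_level is made up for illustration.

```python
def split_top_level(text, sep=","):
    """Split text on sep, ignoring any sep found inside parentheses.

    Hypothetical helper: tracks nesting depth instead of using a regex.
    """
    parts = []
    current = []
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            # Clamp at zero so a stray ")" does not break later splits.
            depth = max(depth - 1, 0)
        if ch == sep and depth == 0:
            # Top-level separator: close the current item.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


s = ("Water, Titanium Dioxide (CI 77897), Black 2 (CI 77266), "
     "Iron Oxides (CI 77491, 77492, 77499), Ultramarines (CI 77007)")
print(split_top_level(s))
print(split_top_level("a (b, (c, d)), e"))  # ['a (b, (c, d))', 'e']
```

For the string in the question this should give the same five items as the re.split call above; the scan only matters when parentheses can nest.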

#2

You can do this with just str.replace and str.split: replace every "),"  with ")" followed by a marker, then split on the marker. Any marker works as long as it does not occur in the input.

a = "Titanium Dioxide (CI 77897), Black 2 (CI 77266), Iron Oxides (CI 77491, 77492, 77499), Ultramarines (CI 77007)"
a = a.replace('),', ')//').split('//')
print(a)

Output:

['Titanium Dioxide (CI 77897)', ' Black 2 (CI 77266)', ' Iron Oxides (CI 77491, 77492, 77499)', ' Ultramarines (CI 77007)']
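Two things are worth noting about this approach: the pieces keep their leading space, and the split only happens after a closing parenthesis, so an item with no parentheses (which the question says can occur) stays glued to the item after it. A small sketch of both points, where the helper name split_after_paren and the default marker are assumptions made for illustration:

```python
def split_after_paren(text, marker="//"):
    """Split after each '),' and trim whitespace from the pieces.

    marker is an assumed placeholder; it must not occur in text.
    """
    pieces = text.replace("),", ")" + marker).split(marker)
    return [piece.strip() for piece in pieces]


with_parens = "Titanium Dioxide (CI 77897), Black 2 (CI 77266)"
mixed = "Water, Titanium Dioxide (CI 77897), Black 2 (CI 77266)"

print(split_after_paren(with_parens))
# ['Titanium Dioxide (CI 77897)', 'Black 2 (CI 77266)']

print(split_after_paren(mixed))
# ['Water, Titanium Dioxide (CI 77897)', 'Black 2 (CI 77266)']
```

In the second call "Water" is not separated, because no "),"  precedes the comma that follows it.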

#3

Try the regex

[^()]*\([^()]*\),?

code:

>>> import re
>>> x = "Titanium Dioxide (CI 77897), Black 2 (CI 77266), Iron Oxides (CI 77491, 77492, 77499), Ultramarines (CI 77007)"
>>> re.findall(r"[^()]*\([^()]*\),?", x)
['Titanium Dioxide (CI 77897),', ' Black 2 (CI 77266),', ' Iron Oxides (CI 77491, 77492, 77499),', ' Ultramarines (CI 77007)']

See how the regex works: http://regex101.com/r/pS9oV3/1
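The matches returned by this pattern still carry the trailing comma and a leading space. One way to tidy them up is to strip those characters from each match. This is only a sketch; the names ITEM and find_items are made up for illustration.

```python
import re

# Same pattern as in the answer: text, one parenthesized group, optional comma.
ITEM = re.compile(r"[^()]*\([^()]*\),?")


def find_items(text):
    """Return each match with surrounding spaces and commas removed."""
    return [match.strip(" ,") for match in ITEM.findall(text)]


x = ("Titanium Dioxide (CI 77897), Black 2 (CI 77266), "
     "Iron Oxides (CI 77491, 77492, 77499), Ultramarines (CI 77007)")
print(find_items(x))
```

Because the pattern requires a parenthesized group in every match, an item without parentheses, such as "Water", would be merged into the item that follows it.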

#4

Using regex, this can be done easily with the findall function.

import re
s = "Titanium Dioxide (CI 77897), Black 2 (CI 77266), Iron Oxides (CI 77491, 77492, 77499), Ultramarines (CI 77007)"
re.findall(r"\w.*?\(.*?\)", s)  # returns the four items as separate strings

Use http://www.regexr.com/ if you want to understand regex better. The Python documentation for the re module is here: https://docs.python.org/2/library/re.html

EDIT: I modified the regex to also accept items without parentheses: \w[^,(]*(?:\(.*?\))?
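As a quick check, the modified pattern can be run against the full string from the question, which begins with an item ("Water") that has no parentheses. The variable names below are illustrative only.

```python
import re

# A word character, then anything up to a comma or "(",
# then an optional parenthesized group (non-nested).
PATTERN = re.compile(r"\w[^,(]*(?:\(.*?\))?")

s = ("Water, Titanium Dioxide (CI 77897), Black 2 (CI 77266), "
     "Iron Oxides (CI 77491, 77492, 77499), Ultramarines (CI 77007)")

items = PATTERN.findall(s)
print(items)
# ['Water', 'Titanium Dioxide (CI 77897)', 'Black 2 (CI 77266)',
#  'Iron Oxides (CI 77491, 77492, 77499)', 'Ultramarines (CI 77007)']
```

Each match starts at a word character, so the comma and space between items are skipped without any extra stripping.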
