I have a string with 3 lines:
a VARCHAR(20),
b FLOAT, c FLOAT,
d NUMBER(38,0), e NUMBER(38,0)
I need to split the string into an array on the comma delimiter, but ignore commas inside parentheses.
Final output is array with 5 elements:
s_arr = ['a VARCHAR(20)', 'b FLOAT', 'c FLOAT', 'd NUMBER(38,0)', 'e NUMBER(38,0)']
So far I have s_arr = s.split(",")
How can I achieve that?
4 Answers
#1
2
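You may use the pattern ,(?![^\(]*[\)]) with re.split and a list comprehension. The negative lookahead skips any comma that is followed by a closing parenthesis with no opening parenthesis in between: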
import re

s = '''
a VARCHAR(20),
b FLOAT, c FLOAT,
d NUMBER(38,0), e NUMBER(38,0)
'''
[i.strip() for i in re.split(r',(?![^\(]*[\)])', s)]
# ['a VARCHAR(20)', 'b FLOAT', 'c FLOAT', 'd NUMBER(38,0)', 'e NUMBER(38,0)']
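One possible alternative if the parentheses can nest: the lookahead above only inspects the next parenthesis after each comma, so an input such as F(1, G(2)) would still be split at the inner comma. The sketch below tracks parenthesis depth instead. The name split_top_level is made up for illustration, and the code assumes parentheses are the only grouping characters (no quoted strings).

```python
def split_top_level(text, delimiter=","):
    """Split text on delimiter, ignoring delimiters inside parentheses."""
    parts = []
    current = []
    depth = 0  # number of parentheses currently open
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)  # tolerate a stray ")"
        if ch == delimiter and depth == 0:
            # top-level delimiter: close the current piece
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:  # skip an empty piece after a trailing delimiter
        parts.append(tail)
    return parts


s = '''
a VARCHAR(20),
b FLOAT, c FLOAT,
d NUMBER(38,0), e NUMBER(38,0)
'''
print(split_top_level(s))
# ['a VARCHAR(20)', 'b FLOAT', 'c FLOAT', 'd NUMBER(38,0)', 'e NUMBER(38,0)']
```

It is a single pass over the string, so it stays linear in the input length, at the cost of being longer than the one-line regex.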
#2
1
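Use a regular expression to split on multiple delimiters:
import re

stringToSplit = '''a VARCHAR(20),
b FLOAT, c FLOAT,
d NUMBER(38,0), e NUMBER(38,0)'''
# split on "comma + space" or "comma + newline"
re.split(', |,\n', stringToSplit)
# ['a VARCHAR(20)', 'b FLOAT', 'c FLOAT', 'd NUMBER(38,0)', 'e NUMBER(38,0)']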
This works because your string doesn't have any spaces or newlines after the commas inside the parentheses, such as the one in NUMBER(38,0).
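To make that assumption concrete, here is a small check of the same pattern on two inputs. The variable names and the second, spaced-out input are invented for illustration and do not come from the question.

```python
import re

pattern = ', |,\n'

compact = "d NUMBER(38,0), e NUMBER(38,0)"
spaced = "d NUMBER(38, 0), e NUMBER(38, 0)"  # hypothetical input with a space inside the parentheses

print(re.split(pattern, compact))
# ['d NUMBER(38,0)', 'e NUMBER(38,0)']
print(re.split(pattern, spaced))
# ['d NUMBER(38', '0)', 'e NUMBER(38', '0)']
```

As soon as a comma inside parentheses is followed by a space, it is indistinguishable from a delimiter for this pattern, so the approach only suits input whose formatting you control.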
#3
0
If you know more about the data, you can easily avoid all the weird parsing by doing this:
a.replace(", ", "@").replace(",\n", "@").split("@")
This replaces each delimiter with a different character and then splits on that character. It assumes that every delimiter comma is followed by a space or a newline, and that "@" does not occur anywhere in the data. Not the most elegant, but it will handle most cases if you're in a bind.
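A minimal sketch of the same idea wrapped in a function, with a guard for the case where the placeholder already occurs in the input. The function name split_with_sentinel and its sentinel parameter are hypothetical, chosen only for this example.

```python
def split_with_sentinel(text, sentinel="@"):
    """Replace ', ' and ',\\n' with a sentinel, then split on the sentinel."""
    if sentinel in text:
        # the trick breaks if the placeholder is already part of the data
        raise ValueError("sentinel occurs in the input; choose another one")
    marked = text.replace(", ", sentinel).replace(",\n", sentinel)
    return [part.strip() for part in marked.split(sentinel)]


a = "a VARCHAR(20),\nb FLOAT, c FLOAT,\nd NUMBER(38,0), e NUMBER(38,0)"
print(split_with_sentinel(a))
# ['a VARCHAR(20)', 'b FLOAT', 'c FLOAT', 'd NUMBER(38,0)', 'e NUMBER(38,0)']
```

If "@" can appear in your data, passing a character that cannot, for example sentinel="\x00", keeps the approach usable.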
#4
0
Using list comprehensions and string methods:
Given
s = """\
a VARCHAR(20),
b FLOAT, c FLOAT,
d NUMBER(38,0), e NUMBER(38,0)
"""
Code
[z.strip() for y in [x.split(", ") for x in s.split(",\n")] for z in y]
# ['a VARCHAR(20)', 'b FLOAT', 'c FLOAT', 'd NUMBER(38,0)', 'e NUMBER(38,0)']
Alternatively
[z.strip(",") for y in [x.split(", ") for x in s.splitlines()] for z in y]
# ['a VARCHAR(20)', 'b FLOAT', 'c FLOAT', 'd NUMBER(38,0)', 'e NUMBER(38,0)']
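If the nested comprehension is hard to read, the second variant can be spelled out step by step. This is only a sketch of the same logic using itertools.chain.from_iterable for the flattening; the names lines, pieces and columns are made up here.

```python
from itertools import chain

s = """\
a VARCHAR(20),
b FLOAT, c FLOAT,
d NUMBER(38,0), e NUMBER(38,0)
"""

lines = s.splitlines()  # one entry per physical line
# split every line on ", " and flatten the per-line lists into one stream
pieces = chain.from_iterable(line.split(", ") for line in lines)
# drop the comma left at the end of a line
columns = [piece.strip(",") for piece in pieces]
print(columns)
# ['a VARCHAR(20)', 'b FLOAT', 'c FLOAT', 'd NUMBER(38,0)', 'e NUMBER(38,0)']
```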