I want to split a string into a list on anything that is not a digit or a dot. The built-in split method only splits on a separator you name explicitly; it has no way to say "split on everything except these characters". Is a regex the best route to take in this situation?
For example, given the string "10.23, 10.13.21; 10.1 10.5 and 10.23.32"
This should return the list ['10.23', '10.13.21', '10.1', '10.5', '10.23.32']
As such, I believe the best regex to use in this situation would be [\d\.]+
Is this the best way to handle such a case?
3 answers
#1
9
If you were thinking of re.findall, note that you can also use re.split with a negated version of your character class:
In [1]: import re
In [2]: s = "10.23, 10.13.21; 10.1 10.5 and 10.23.32"
In [3]: re.split(r'[^\d\.]+', s)
Out[3]: ['10.23', '10.13.21', '10.1', '10.5', '10.23.32']
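One difference between the two approaches may be worth keeping in mind: re.split returns empty strings when the input starts or ends with a separator, while re.findall with the asker's original pattern does not. The following is a minimal sketch of that behaviour; the variable names and the padded example string are illustrative assumptions, not part of the original question.

```python
import re

s = "10.23, 10.13.21; 10.1 10.5 and 10.23.32"

# Match runs of digits and dots directly. Inside a character class the
# dot is literal, so it does not need a backslash.
found = re.findall(r'[\d.]+', s)

# Hypothetical input that begins and ends with separator characters.
padded = "versions: " + s + "!"

# Splitting on runs of "anything else" leaves an empty string at each end.
raw_parts = re.split(r'[^\d.]+', padded)

# Filter the empty strings out to get the same result as findall.
parts = [p for p in raw_parts if p]

print(found)
print(raw_parts)
print(parts)
```

If the input may have leading or trailing text, re.findall avoids the filtering step.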
#2
2
If you want a solution other than regex, you could use str.translate to turn everything other than '.0123456789' into whitespace, and then call split():
In [69]: mystr
Out[69]: '10.23, 10.13.21; 10.1 10.5 and 10.23.32'
In [70]: mystr.translate(' '*46 + '. ' + '0123456789' + ' '*198).split()
Out[70]: ['10.23', '10.13.21', '10.1', '10.5', '10.23.32']
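The 256-character table above is written in the Python 2 style, where the table is indexed by byte value. In Python 3, str.translate takes a mapping from code points to replacements, so the same idea can be sketched as below. The helper name split_numbers is hypothetical, and building the table from the characters of the input is just one assumed way of doing it.

```python
import string

def split_numbers(text):
    """Split text into runs of ASCII digits and dots (hypothetical helper)."""
    keep = set(string.digits + '.')
    # Map every character that occurs in the text, other than the kept
    # ones, to a space. Keys must be code points, hence ord().
    table = {ord(c): ' ' for c in set(text) - keep}
    # After translation only digits, dots and whitespace remain, so a
    # plain split() on whitespace yields the numbers.
    return text.translate(table).split()

print(split_numbers('10.23, 10.13.21; 10.1 10.5 and 10.23.32'))
```

Because the table is built from the input itself, this also handles non-ASCII separators; note that string.digits covers only the ASCII digits 0-9.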
Hope this helps
#3
2
An arguably more readable form of what @inspectorG4dget proposed:
>>> import string
>>> s = '10.23, 10.13.21; 10.1 10.5 and 10.23.32'
>>> ''.join(c if c in set(string.digits + '.') else ' ' for c in s).split()
['10.23', '10.13.21', '10.1', '10.5', '10.23.32']
This way you avoid regular expressions, which is often a good idea when it can be done this easily.