I want to split a string like:
'aaabbccccabbb'
into
['aaa', 'bb', 'cccc', 'a', 'bbb']
What's an elegant way to do this in Python? If it makes it easier, it can be assumed that the string will only contain a's, b's and c's.
4 solutions
#1
26
That is the use case for itertools.groupby :)
>>> from itertools import groupby
>>> s = 'aaabbccccabbb'
>>> [''.join(y) for _,y in groupby(s)]
['aaa', 'bb', 'cccc', 'a', 'bbb']
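As a small sketch of what groupby yields here: each item is a (key, group) pair, where the key is the repeated character and the group is an iterator over that run. The helper name `runs` below is made up for illustration and is not part of the answer above.

```python
from itertools import groupby

def runs(s):
    """Return (character, run length) pairs for consecutive runs in s."""
    # groupby yields (key, group_iterator); with no key function,
    # the key is simply the item itself, so equal neighbours are grouped.
    return [(key, len(list(group))) for key, group in groupby(s)]

print(runs('aaabbccccabbb'))
# [('a', 3), ('b', 2), ('c', 4), ('a', 1), ('b', 3)]
```

Joining each group with ''.join instead of counting it gives the list of substrings asked for in the question.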
#2
3
You can write a generator, without trying to be clever just to keep it short at the cost of readability:
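def yield_same(string):
    it_str = iter(string)
    # next() works on Python 2.6+ and 3; the "" default handles empty input
    result = next(it_str, "")
    for next_chr in it_str:
        if next_chr != result[0]:
            # the run ended: emit it and start collecting a new one
            yield result
            result = ""
        result += next_chr
    if result:
        yield result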
>>> list(yield_same("aaaaaabcbcdcdccccccdddddd"))
['aaaaaa', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'cccccc', 'dddddd']
Edit: OK, so there is itertools.groupby, which probably does something like this.
#3
2
Here's the best way I could find using regex:
import re
print([a for a, b in re.findall(r"((\w)\2*)", s)])
#4
1
>>> import re
>>> s = 'aaabbccccabbb'
>>> [m.group() for m in re.finditer(r'(\w)(\1*)',s)]
['aaa', 'bb', 'cccc', 'a', 'bbb']
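One caveat with the regex approaches: `\w` only matches letters, digits and underscore, so runs of spaces or punctuation are silently skipped. A minimal sketch, assuming you want every character grouped, swaps `\w` for `.` and passes re.DOTALL so newlines are included too. The function name `split_runs` is hypothetical.

```python
import re

def split_runs(s):
    """Split s into runs of identical consecutive characters using a regex."""
    # (.) captures any single character (DOTALL lets '.' match newlines too);
    # \1* then matches zero or more repeats of that same character.
    return [m.group() for m in re.finditer(r'(.)\1*', s, flags=re.DOTALL)]

print(split_runs('aaabbccccabbb'))  # ['aaa', 'bb', 'cccc', 'a', 'bbb']
print(split_runs('aa  !!b'))        # ['aa', '  ', '!!', 'b']
```

For input that is known to contain only a's, b's and c's, as in the question, the `\w` version behaves the same.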