How to split this string with Python?

Time: 2021-05-21 21:46:29

I have strings that look like this example: "AAABBBCDEEEEBBBAA"

Any character is possible in the string.

I want to split it into a list like: ['AAA','BBB','C','D','EEEE','BBB','AA']

so that every continuous run of the same character becomes a separate element of the resulting list.

I know that I can iterate over the characters in the string and check whether each pair at positions i and i-1 contains the same character, etc., but is there a simpler solution out there?

4 solutions

#1 (score: 9)

>>> from itertools import groupby
>>> [''.join(g) for k, g in groupby('AAAABBBCCD')]
['AAAA', 'BBB', 'CC', 'D']
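The following sketch shows what groupby actually returns. The variable names are only illustrative. Each item is a (key, group) pair. The key is the repeated character, and the group is an iterator over that run. The itertools documentation notes that the groups share the underlying iterator with groupby. Each group should therefore be consumed, for example with ''.join, before moving on to the next pair.

```python
from itertools import groupby

s = 'AAABBBCDEEEEBBBAA'

# Each item is (key, group): key is the run's character, group iterates over the run.
pairs = [(k, ''.join(g)) for k, g in groupby(s)]
print(pairs)  # [('A', 'AAA'), ('B', 'BBB'), ('C', 'C'), ('D', 'D'), ('E', 'EEEE'), ('B', 'BBB'), ('A', 'AA')]

# Counting the items in each group instead of joining them gives a run-length encoding.
rle = [(k, sum(1 for _ in g)) for k, g in groupby(s)]
print(rle)  # [('A', 3), ('B', 3), ('C', 1), ('D', 1), ('E', 4), ('B', 3), ('A', 2)]
```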

And with plain string manipulation:

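>>> s = 'AAABBBCDEEEEBBBAA'
>>> a = []; S = ""; p = ""    # a: result, S: current run, p: previous char
>>> for c in s:
...     if c != p:            # a new run starts: save the previous one
...         a.append(S)
...         S = ""
...     S = S + c
...     p = c
...
>>> a.append(S)               # save the final run
>>> a
['', 'AAA', 'BBB', 'C', 'D', 'EEEE', 'BBB', 'AA']
>>> list(filter(None, a))     # drop the empty string added on the first iteration
['AAA', 'BBB', 'C', 'D', 'EEEE', 'BBB', 'AA']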

#2 (score: 15)

We could use a regex:

>>> import re
>>> r = re.compile(r'(.)\1*', re.DOTALL)   # DOTALL: '.' also matches '\n'
>>> [m.group() for m in r.finditer('AAABBBCDEEEEBBBAA')]
['AAA', 'BBB', 'C', 'D', 'EEEE', 'BBB', 'AA']
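Two details of the regex approach are easy to trip over. The sketch below illustrates both with a made-up input. First, by default '.' does not match a newline. Because any character may appear in the string, the re.DOTALL flag is needed. Second, re.findall returns only the captured group (.) when the pattern has exactly one group. That is why the answer uses finditer with m.group() instead.

```python
import re

text = 'aa\n\nbb'

# Without re.DOTALL, '.' does not match '\n', so the newline run is silently skipped.
plain = [m.group() for m in re.finditer(r'(.)\1*', text)]
# With re.DOTALL, '.' matches every character, newlines included.
dotall = [m.group() for m in re.finditer(r'(.)\1*', text, re.DOTALL)]
print(plain)   # ['aa', 'bb']
print(dotall)  # ['aa', '\n\n', 'bb']

# findall returns the captured group, not the whole match: one character per run.
firsts = re.findall(r'(.)\1*', 'AAABBBCDEEEEBBBAA', re.DOTALL)
print(firsts)  # ['A', 'B', 'C', 'D', 'E', 'B', 'A']
```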

Alternatively, we could use itertools.groupby.

>>> import itertools
>>> [''.join(g) for k, g in itertools.groupby('AAABBBCDEEEEBBBAA')]
['AAA', 'BBB', 'C', 'D', 'EEEE', 'BBB', 'AA']

timeit shows that the regex is faster for this particular string (measured on Python 2.6 and Python 3.1). That is not very surprising: a regex engine is specialized for strings, whereas groupby is a generic function that works on any iterable.
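Here is a sketch for repeating the comparison on your own machine and input. The function names and the repetition count are illustrative. No timings are claimed here, because results may depend on the Python version and on the shape of the input (many short runs versus a few long ones).

```python
import re
import timeit
from itertools import groupby

S = 'AAABBBCDEEEEBBBAA'
RUN = re.compile(r'(.)\1*', re.DOTALL)  # compiled once, outside the timed code

def with_regex(s=S):
    return [m.group() for m in RUN.finditer(s)]

def with_groupby(s=S):
    return [''.join(g) for _, g in groupby(s)]

# Both approaches must agree before their speed is worth comparing.
assert with_regex() == with_groupby()

for func in (with_regex, with_groupby):
    # timeit.timeit accepts a zero-argument callable; number is the repetition count.
    seconds = timeit.timeit(func, number=10_000)
    print(f'{func.__name__}: {seconds:.4f} s')
```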

#3 (score: 3)

import itertools
s = "AAABBBCDEEEEBBBAA"
["".join(chars) for _, chars in itertools.groupby(s)]
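Because groupby is generic, the same idea works on any iterable, not just strings. The sketch below uses split_runs_generic, a hypothetical helper name. The optional key argument of groupby decides which neighbouring items count as "the same". For example, passing str.lower produces case-insensitive runs.

```python
from itertools import groupby

def split_runs_generic(items, key=None):
    """Group consecutive equal items of any iterable into lists (hypothetical helper).

    key, if given, is applied to each item to decide which items are equal.
    """
    return [list(group) for _, group in groupby(items, key=key)]

print(split_runs_generic([1, 1, 2, 3, 3, 3, 1]))  # [[1, 1], [2], [3, 3, 3], [1]]

# Case-insensitive runs of a string, joined back into substrings
print([''.join(run) for run in split_runs_generic('aAbBBc', key=str.lower)])  # ['aA', 'bBB', 'c']
```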

#4 (score: 0)

Just another way of solving your problem:

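#!/usr/bin/env python3

string = 'AAABBBCDEEEEBBBAA'
memory = ''   # the run currently being built
runs = []     # finished runs
for index, element in enumerate(string):
    if index > 0 and element == string[index - 1]:
        memory += element          # same character as before: extend the run
    else:
        if memory:
            runs.append(memory)    # character changed: close the previous run
        memory = element
if memory:
    runs.append(memory)            # the last run is only closed after the loop

print(runs)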
