Splitting a string on many different characters

Time: 2021-05-06 21:38:38

I'd like to split a string using one or more separator characters.

E.g. "a b.c", split on " " and "." would give the list ["a", "b", "c"].

At the moment, I can't see anything in the standard library to do this, and my own attempts are a bit clumsy. E.g.

def my_split(string_L, split_chars):
    # Accept a single string as well as a list of strings.
    if isinstance(string_L, str):
        string_L = [string_L]
    try:
        split_char = split_chars[0]
    except IndexError:
        # No separators left, so the pieces are final.
        return string_L

    res = []
    for s in string_L:
        res.extend(s.split(split_char))
    # Recurse on the remaining separators.
    return my_split(res, split_chars[1:])

print(my_split("a b.c", [' ', '.']))

Horrible! Any better suggestions?

4 solutions

#1


37  

>>> import re
>>> re.split('[ .]', 'a b.c')
['a', 'b', 'c']
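
If the separators are not known in advance, the character class can be built from them. The following is a minimal sketch, assuming a hypothetical helper name `split_on_chars` (it is not part of the standard library). `re.escape` guards against separators such as `]` or `^` that are special inside a character class. Dropping the empty strings that appear between adjacent separators is a design choice here, not something the answer above specifies.

```python
import re

def split_on_chars(text, seps):
    """Split text on any single character in seps (hypothetical helper)."""
    if not seps:
        # Nothing to split on: return the text unchanged as a one-item list.
        return [text]
    # Escape every separator so characters like ']' or '^' are taken literally.
    pattern = "[" + "".join(re.escape(c) for c in seps) + "]"
    # re.split yields '' between adjacent separators; drop those pieces.
    return [piece for piece in re.split(pattern, text) if piece]

print(split_on_chars("a b.c", " ."))   # ['a', 'b', 'c']
print(split_on_chars("a, b", ", "))    # ['a', 'b']
```

If the empty strings matter (for example, to detect missing fields), remove the `if piece` filter.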

#2


2  

This one replaces all of the separators with the first separator in the list, and then "splits" using that character.

def split(string, divs):
    for d in divs[1:]:
        string = string.replace(d, divs[0])
    return string.split(divs[0])

output:

>>> split("a b.c", " .")
['a', 'b', 'c']

>>> split("a b.c", ".")
['a b', 'c']

I do like that 're' solution though.

#3


2  

Solution without re:

from itertools import groupby
sep = ' .,'
s = 'a b.c,d'
# Group consecutive characters by "is this a separator?" and keep the non-separator runs.
print([''.join(g) for k, g in groupby(s, sep.__contains__) if not k])

An explanation is here https://*.com/a/19211729/2468006
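
For readers who do not want to follow the link, here is a rough sketch of what the `groupby` one-liner is doing, written for Python 3. `groupby` collects consecutive characters that share the same key, and the key here is whether the character is a separator. The names `runs` and `parts` are illustrative only.

```python
from itertools import groupby

sep = " .,"
s = "a b.c,d"

# The key is True for separator characters and False for everything else.
# groupby yields one (key, group) pair per run of equal keys.
runs = [(is_sep, "".join(group)) for is_sep, group in groupby(s, sep.__contains__)]
print(runs)
# [(False, 'a'), (True, ' '), (False, 'b'), (True, '.'), (False, 'c'), (True, ','), (False, 'd')]

# Keeping only the non-separator runs gives the split result.
parts = [run for is_sep, run in runs if not is_sep]
print(parts)  # ['a', 'b', 'c', 'd']
```

Because a run of adjacent separators forms a single group, this approach never produces empty strings, unlike `str.split` with an explicit separator.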

#4


1  

Not very fast but does the job:

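def my_split(text, seps):
    # Turn every separator into seps[0] (replacing seps[0] with itself is a no-op),
    # then split once on that character.
    for sep in seps:
        text = text.replace(sep, seps[0])
    return text.split(seps[0])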
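
Whether this is actually slow depends on the input, so it may be worth measuring. Below is a sketch using `timeit`, assuming made-up sample text and the hypothetical names `replace_split` and `regex_split`. It makes no claim about which approach wins on your data.

```python
import re
import timeit

def replace_split(text, seps):
    # Same idea as my_split above: normalise every separator to seps[0], then split once.
    for sep in seps:
        text = text.replace(sep, seps[0])
    return text.split(seps[0])

def regex_split(text, seps):
    # Character-class split, as in answer #1; re.escape keeps the separators literal.
    return re.split("[" + re.escape(seps) + "]", text)

text = "a b.c " * 1000  # made-up sample input
seps = " ."

# Both approaches should agree before their timings are compared.
assert replace_split(text, seps) == regex_split(text, seps)

for func in (replace_split, regex_split):
    seconds = timeit.timeit(lambda: func(text, seps), number=200)
    print(f"{func.__name__}: {seconds:.4f} s for 200 runs")
```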
