Splitting a string on line breaks or periods with Python regular expressions

Time: 2021-10-17 20:27:49

I have a string:

"""Hello. It's good to meet you.
My name is Bob."""

I'm trying to find the best way to split this into a list divided by periods and linebreaks:

["Hello", "It's good to meet you", "My name is Bob"]

I'm pretty sure I should use regular expressions, but, having no experience with them, I'm struggling to figure out how to do this.

5 answers

#1


17  

You don't need regex.

>>> txt = """Hello. It's good to meet you.
... My name is Bob."""
>>> txt.split('.')
['Hello', " It's good to meet you", '\nMy name is Bob', '']
>>> [x for x in map(str.strip, txt.split('.')) if x]
['Hello', "It's good to meet you", 'My name is Bob']
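
Note that splitting on `'.'` alone only handles the line break here because a period happens to precede it. A minimal sketch, assuming some lines may end without a period, that splits on either character with `re.split` (the helper name `split_on_periods_and_newlines` is made up for illustration):

```python
import re

def split_on_periods_and_newlines(text):
    # Split on any run of periods and/or newlines, then strip
    # surrounding whitespace and drop empty pieces.
    parts = re.split(r"[.\n]+", text)
    return [p.strip() for p in parts if p.strip()]

# Same text as above, but the second sentence has no closing period.
txt = """Hello. It's good to meet you
My name is Bob."""
print(split_on_periods_and_newlines(txt))
# ['Hello', "It's good to meet you", 'My name is Bob']
```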

#2


2  

For your example, it would suffice to split on dots, optionally followed by whitespace (and to ignore empty results):

>>> s = """Hello. It's good to meet you.
... My name is Bob."""
>>> import re
>>> re.split(r"\.\s*", s)
['Hello', "It's good to meet you", 'My name is Bob', '']

In real life, you'd have to handle Mr. Orange, Dr. Greene and George W. Bush, though...
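
One rough way to cope with such cases is to guard the dot with negative lookbehinds. This is only a sketch: the abbreviation list and the function name `split_sentences` are assumptions for illustration, and the single-capital rule will also refuse to split after a sentence that really ends in one capital letter.

```python
import re

# Hypothetical, deliberately incomplete list of abbreviations to protect.
ABBREVIATIONS = ("Mr", "Mrs", "Ms", "Dr")

def split_sentences(text, abbreviations=ABBREVIATIONS):
    # One fixed-width negative lookbehind per abbreviation, so a period
    # right after "Mr", "Dr", ... is not treated as a sentence end.
    guards = "".join(rf"(?<!\b{re.escape(a)})" for a in abbreviations)
    # Also skip periods after a single capital letter (initials like "W.").
    guards += r"(?<!\b[A-Z])"
    pattern = re.compile(guards + r"\.\s*")
    # Drop the empty string left behind by a trailing period.
    return [part for part in pattern.split(text) if part]

text = "Mr. Orange met Dr. Greene. George W. Bush left."
print(split_sentences(text))
# ['Mr. Orange met Dr. Greene', 'George W. Bush left']
```

Python requires each lookbehind to be fixed-width, which is why the abbreviations are chained as separate assertions instead of being combined into one alternation of different lengths.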

#3


1  

>>> s = """Hello. It's good to meet you.
... My name is Bob."""
>>> import re
>>> p = re.compile(r'[^\s\.][^\.\n]+')
>>> p.findall(s)
['Hello', "It's good to meet you", 'My name is Bob']
>>> s = "Hello. #It's good to meet you # .'"
>>> p.findall(s)
['Hello', "#It's good to meet you # "]

#4


0  

You can use this split:

re.split(r"(?<!^)\s*[.\n]+\s*(?!$)", s)
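
Running this pattern on the string from the question suggests that the final period is kept, because `(?!$)` prevents a match at the very end. A small sketch that shows the raw result and one possible cleanup step (the cleanup is an addition, not part of the answer above):

```python
import re

s = """Hello. It's good to meet you.
My name is Bob."""

# The (?!$) lookahead stops the pattern from matching at the very end,
# so the last period stays attached to the final piece.
parts = re.split(r"(?<!^)\s*[.\n]+\s*(?!$)", s)
print(parts)
# ['Hello', "It's good to meet you", 'My name is Bob.']

# One way to tidy up: strip leftover periods and whitespace per piece.
cleaned = [p.strip(". \n") for p in parts if p.strip(". \n")]
print(cleaned)
# ['Hello', "It's good to meet you", 'My name is Bob']
```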

#5


0  

Mine:

re.findall(r'(?=\S)[^.\n]+(?<=\S)', su)
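
To see what the two lookarounds contribute, here is a short sketch on a deliberately messy input (the sample string is an assumption, chosen to include stray whitespace around the separators):

```python
import re

s = "Hello.   It's good to meet you   .\n  My name is Bob.  "

# (?=\S)   : a match must start on a non-whitespace character
# [^.\n]+  : then take everything up to the next period or newline
# (?<=\S)  : and backtrack until it ends on a non-whitespace character
pieces = re.findall(r"(?=\S)[^.\n]+(?<=\S)", s)
print(pieces)
# ['Hello', "It's good to meet you", 'My name is Bob']
```

Because the trimming happens inside the pattern, no separate `strip()` pass or empty-string filter is needed.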
