Python - split a sentence after words, but with at most n characters per result

Time: 2022-01-19 21:36:46

I want to display some text on a scrolling display with a width of 16 characters. To improve readability I want to flip through the text, not by simply splitting at every 16th character, but by splitting at the end of a word or punctuation mark, before the 16-character limit is exceeded.

Example:

text = 'Hello, this is an example of text shown in the scrolling display. Bla, bla, bla!'

This text should be converted into a list of strings of at most 16 characters each:

result = ['Hello, this is ', 'an example of ', 'text shown in ', 'the scrolling ', 'display. Bla, ', 'bla, bla!']

I started with the regex re.split('(\W+)', text) to get a list of every element (word, punctuation), but I can't work out how to combine them.

Can you help me, or at least give me some hints?

Thank you!

3 solutions

#1


14  

I'd look at the textwrap module:

>>> text = 'Hello, this is an example of text shown in the scrolling display. Bla, bla, bla!'
>>> from textwrap import wrap
>>> wrap(text, 16)
['Hello, this is', 'an example of', 'text shown in', 'the scrolling', 'display. Bla,', 'bla, bla!']

There are lots of options you can play with in the TextWrapper, for example:

>>> from textwrap import TextWrapper
>>> w = TextWrapper(16, break_long_words=True)
>>> w.wrap("this_is_a_really_long_word")
['this_is_a_really', '_long_word']
>>> w = TextWrapper(16, break_long_words=False)
>>> w.wrap("this_is_a_really_long_word")
['this_is_a_really_long_word']
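
To feed the wrapped lines to the display, something like the following might do. It is only a sketch: `frames` is a made-up helper name, and padding each line to the full width with `ljust` is an assumption about what the display expects. Note that `wrap` drops the trailing space the question shows in its expected result.

```python
from textwrap import wrap

def frames(text, width=16):
    """Return lines of at most `width` characters, each padded to exactly `width`."""
    # wrap() breaks on whitespace and strips the space at the end of each line;
    # ljust() pads with spaces (an assumption about the display, not a requirement)
    return [line.ljust(width) for line in wrap(text, width)]

text = 'Hello, this is an example of text shown in the scrolling display. Bla, bla, bla!'
for frame in frames(text):
    print(repr(frame))
```

If the display handles short lines itself, the `ljust` call can simply be dropped.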

#2


3  

As DSM suggested, look at textwrap. If you prefer to stick with regular expressions, the following will get you part of the way there:

In [10]: re.findall(r'.{,16}\b', text)
Out[10]: 
['Hello, this is ',
 'an example of ',
 'text shown in ',
 'the scrolling ',
 'display. Bla, ',
 'bla, bla',
 '']

(Note the missing exclamation mark and the empty string at the end though.)
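
If you would rather combine the pieces by hand, which is where the question got stuck, a greedy loop avoids both problems. This is a rough sketch rather than a polished solution: `split_chunks` is a hypothetical name, the tokens come from `\S+\s*` (a word with its punctuation plus the whitespace after it) instead of the `(\W+)` split from the question, and a token longer than the width is assumed to be kept whole.

```python
import re

def split_chunks(text, width=16):
    """Greedily pack whitespace-delimited tokens into chunks of at most `width` characters."""
    chunks, current = [], ''
    # each token is a run of non-space characters plus any whitespace that follows it
    for token in re.findall(r'\S+\s*', text):
        # start a new chunk when adding the token would exceed the width
        if current and len(current) + len(token) > width:
            chunks.append(current)
            current = ''
        current += token
    if current:
        chunks.append(current)
    return chunks

text = 'Hello, this is an example of text shown in the scrolling display. Bla, bla, bla!'
print(split_chunks(text))
# ['Hello, this is ', 'an example of ', 'text shown in ',
#  'the scrolling ', 'display. Bla, ', 'bla, bla!']
```

Because the trailing whitespace is counted against the width, every chunk stays within 16 characters unless a single token is longer than that.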

#3


2  

Using regex:

>>> import re
>>> from pprint import pprint
>>> text = 'Hello, this is an example of text shown in the scrolling display. Bla, bla, bla!'
>>> pprint(re.findall(r'.{1,16}(?:\s+|$)', text))
['Hello, this is ',
 'an example of ',
 'text shown in ',
 'the scrolling ',
 'display. Bla, ',
 'bla, bla!']
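
Two edge cases of this pattern are worth checking before relying on it. The sketch below is illustrative only; `PATTERN` and `regex_chunks` are names made up here, and stripping the trailing whitespace is one possible workaround, not part of the answer above.

```python
import re

PATTERN = r'.{1,16}(?:\s+|$)'

def regex_chunks(text):
    """Apply the pattern and drop the trailing whitespace from each chunk."""
    return [chunk.rstrip() for chunk in re.findall(PATTERN, text)]

# The trailing whitespace is matched outside the 16-character budget,
# so a raw chunk can come out longer than 16 characters:
raw = re.findall(PATTERN, 'abcdefghijklmnop qrst')
print([len(chunk) for chunk in raw])               # [17, 4]
print(regex_chunks('abcdefghijklmnop qrst'))       # ['abcdefghijklmnop', 'qrst']

# A word longer than 16 characters silently loses its beginning:
print(regex_chunks('this_is_a_really_long_word'))  # ['really_long_word']
```

If long words can occur in the input, textwrap with break_long_words is probably the safer choice.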
