Replace all single-character words in a string using Python regex

I am trying to remove all the single-character words from a string.

input: "This is a big car and it has a spacious seats"

my output should be:

output: "This is big car and it has spacious seats"

Here I am using the expression

import re
re.compile('\b(?<=)[a-z](?=)\b')

This matches only the first single character in the string ...

Any help would be appreciated. Thanks in advance.

4 solutions

#1

Edit: I have just seen that this was first suggested in the comments by Wiktor Stribiżew. Credit to him; I had not seen his comment when I posted this.

You can also use re.sub() to remove single characters (assuming you only want to remove alphabetical characters). The following replaces every standalone alphabetical character with an empty string:

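import re

# "text" is used instead of "input" to avoid shadowing the built-in input()
text = "This is a big car and it has a spacious seats"

# \b[a-zA-Z]\b matches a letter that forms a whole word on its own
output = re.sub(r"\b[a-zA-Z]\b", "", text)

print(output)
# This is  big car and it has  spacious seats   (the removed letters leave doubled spaces)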
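The result above still contains doubled spaces where the letters were removed. One possible way to tidy that up is a second substitution that collapses whitespace. This is only a sketch; the function name remove_single_letters is made up for illustration and is not part of the original answer.

```python
import re

def remove_single_letters(text: str) -> str:
    """Remove standalone ASCII letters, then tidy the leftover whitespace."""
    # Drop any letter that forms a whole word on its own.
    stripped = re.sub(r"\b[a-zA-Z]\b", "", text)
    # Collapse the runs of whitespace left behind and trim both ends.
    return re.sub(r"\s+", " ", stripped).strip()

print(remove_single_letters("This is a big car and it has a spacious seats"))
# This is big car and it has spacious seats
```

Note that \b treats an apostrophe as a word boundary, so a contraction such as "it's" would lose its trailing "s" with this pattern.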

You can learn more about using regular expressions when replacing strings here: How to input a regex in string.replace?

#2

Here's one way to do it by splitting the string and filtering out single-letter tokens using len and str.isalpha:

>>> s = "1 . This is a big car and it has a spacious seats"
>>> ' '.join(i for i in s.split() if not (i.isalpha() and len(i)==1))
'1 . This is big car and it has spacious seats'
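The same split-and-filter idea can be wrapped in a small function. The sketch below uses a hypothetical name, drop_single_letter_tokens, and shows two side effects worth knowing about: single digits and punctuation marks are kept because str.isalpha() is false for them, and any run of whitespace is normalized to one space because str.split() with no argument splits on all whitespace.

```python
def drop_single_letter_tokens(text: str) -> str:
    """Drop whitespace-separated tokens that are exactly one letter long."""
    tokens = text.split()  # splits on any run of whitespace
    kept = [t for t in tokens if not (len(t) == 1 and t.isalpha())]
    return " ".join(kept)

print(drop_single_letter_tokens("1 . This is a big car and it has a spacious seats"))
# 1 . This is big car and it has spacious seats
```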

#3

EDIT:

You can use:

import re
input_string = "This is a big car and it has a spacious seats"
str_without_single_chars = re.sub(r'(?:^| )\w(?:$| )', ' ', input_string).strip()

or (which, as was brought to my attention, doesn't meet the specification, because it also drops every word of two or three characters):

input_string = "This is a big car and it has a spacious seats"
' '.join(w for w in input_string.split() if len(w)>3)
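One caveat with the regex version above: because the pattern consumes the spaces on both sides of a match, the second of two adjacent single characters is skipped. A possible variation, sketched below under the assumption that "single character" means one \w character surrounded by whitespace or the string edges, uses lookarounds so the surrounding whitespace is checked but not consumed. The function name remove_single_word_chars is hypothetical.

```python
import re

def remove_single_word_chars(text: str) -> str:
    """Remove lone \\w characters delimited by whitespace or the string edges."""
    # (?<!\S) and (?!\S) assert "no non-space neighbour" without consuming it,
    # so adjacent single characters are each matched.
    stripped = re.sub(r"(?<!\S)\w(?!\S)", "", text)
    # Re-join on single spaces to remove the gaps left behind.
    return " ".join(stripped.split())

# The consuming pattern leaves "b" in place:
print(re.sub(r"(?:^| )\w(?:$| )", " ", "go a b now").strip())
# go b now

# The lookaround version removes both:
print(remove_single_word_chars("go a b now"))
# go now
```

Keep in mind that \w also matches digits and underscores, so a lone "1" is removed as well; use [a-zA-Z] instead if only letters should go.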

#4

A fast way to remove words, characters, strings or anything else between two known tags or two known characters in a string is to use re directly (the module is implemented in C), first converting the delimiters into comment markers, as shown below.

var = re.sub('<script>', '<!--', var)
var = re.sub('</script>', '-->', var)
#And finally
var = re.sub('<!--.*?-->', '', var)

It removes everything between the markers and can be faster and simpler than Beautiful Soup for this kind of task. Batch files are where the "" got their beginnings; they were only borrowed for use with batch and HTML from native C. Python's regular expressions differ little from those of other languages, so why iterate many times when a single pass can match the whole chunk at once? The same approach works with individual characters as delimiters.

var = re.sub(r'\[', '<!--', var)
var = re.sub(r'\]', '-->', var)
# And finally
var = re.sub('<!--.*?-->', '', var)  # removes the markers and everything between them

And you do not need Beautiful Soup. You can also scrape data using these patterns if you understand how this works.
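The three substitutions above can also be sketched as a single pass. The helper below, remove_between, is a hypothetical name and not part of the original answer; it assumes literal start and end delimiters, escapes them with re.escape, and passes re.DOTALL so that "." also matches newlines, which the snippets above do not handle.

```python
import re

def remove_between(text: str, start: str, end: str) -> str:
    """Remove each start...end span, delimiters included (non-greedy)."""
    # re.escape makes delimiters such as "[" match literally.
    pattern = re.escape(start) + r".*?" + re.escape(end)
    # re.DOTALL lets "." cross line breaks inside the span.
    return re.sub(pattern, "", text, flags=re.DOTALL)

print(remove_between("a<script>x\ny</script>b", "<script>", "</script>"))
# ab
print(remove_between("keep [drop] keep", "[", "]"))
# keep  keep
```

This is plain text matching, not HTML parsing: a tag written with attributes, such as a script tag with a type, would not match the literal "<script>" delimiter.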
