I need to remove all non-letter characters from the beginning and from the end of a word, but keep them if they appear between two letters.
For example:
'123foo456' --> 'foo'
'2foo1c#BAR' --> 'foo1c#BAR'
I tried using re.sub(), but I couldn't write the regex.
6 solutions
#1
5
Like this?
re.sub('^[^a-zA-Z]*|[^a-zA-Z]*$', '', s)
s is the input string.
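A minimal sketch of this approach, assuming "letter" means an ASCII letter; the function name strip_non_letters is only illustrative. Both character classes are written a-zA-Z here, because a range spelled A-z would also cover the characters [ \ ] ^ _ and the backtick, so a leading underscore would not be stripped.

```python
import re

def strip_non_letters(s):
    # Remove runs of characters that are not ASCII letters,
    # but only where the run touches the start or the end of the string.
    return re.sub(r'^[^a-zA-Z]*|[^a-zA-Z]*$', '', s)

print(strip_non_letters('123foo456'))   # foo
print(strip_non_letters('2foo1c#BAR'))  # foo1c#BAR
print(strip_non_letters('_foo_'))       # foo
```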
#2
6
You could use str.strip for this:
In [1]: import string
In [4]: '123foo456'.strip(string.digits)
Out[4]: 'foo'
In [5]: '2foo1c#BAR'.strip(string.digits)
Out[5]: 'foo1c#BAR'
As Matt points out in the comments (thanks, Matt), this removes digits only. To remove any non-letter character, first define what you mean by a non-letter (the snippet below relies on Python 2's string module):
In [22]: allchars = string.maketrans('', '')
In [23]: nonletter = allchars.translate(allchars, string.letters)
and then strip:
In [18]: '2foo1c#BAR'.strip(nonletter)
Out[18]: 'foo1c#BAR'
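string.maketrans and string.letters no longer exist in Python 3, so here is a rough Python 3 equivalent of the same idea. It assumes that "non-letter" means any of the first 256 code points that is not an ASCII letter; widen or narrow that set to suit your data.

```python
import string

# Assumed definition of "non-letter": every Latin-1 character
# that is not in string.ascii_letters.
nonletter = ''.join(c for c in map(chr, range(256))
                    if c not in string.ascii_letters)

print('123foo456'.strip(nonletter))   # foo
print('2foo1c#BAR'.strip(nonletter))  # foo1c#BAR
```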
#3
2
With your two examples, I was able to create a regex using Python's non-greedy syntax as described here. I broke up the input into three parts: leading non-letters, the part to keep (matched lazily), then non-letters until the end. Here's a test run:
1:[123] 2:[foo] 3:[456]
1:[2] 2:[foo1c#BAR] 3:[]
Here's the regular expression:
^([^A-Za-z]*)(.*?)([^A-Za-z]*)$
And mo.group(2) is what you want, where mo is the MatchObject.
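For reference, a short sketch of how that test run could be reproduced; the helper name middle is made up for this example.

```python
import re

pattern = re.compile(r'^([^A-Za-z]*)(.*?)([^A-Za-z]*)$')

def middle(word):
    # Group 2 is the lazily matched part between the leading
    # and trailing non-letter runs.
    mo = pattern.match(word)
    return mo.group(2)

for word in ('123foo456', '2foo1c#BAR'):
    mo = pattern.match(word)
    print('1:[%s] 2:[%s] 3:[%s]' % mo.groups())
```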
#4
2
To be Unicode compatible:
^\PL+|\PL+$
\PL stands for "not a letter".
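Note that Python's built-in re module does not understand \PL; the third-party regex package does accept Unicode property escapes such as \P{L}. If you would rather stay in the standard library, one possible sketch is to trim with str.isalpha(), which is Unicode-aware in Python 3. The function name below is hypothetical, and isalpha() is only an approximation of "letter" that is close to, but not defined as, the \pL property.

```python
def strip_non_letters_unicode(word):
    # Move inward from both ends until a letter (per str.isalpha) is found.
    start, end = 0, len(word)
    while start < end and not word[start].isalpha():
        start += 1
    while end > start and not word[end - 1].isalpha():
        end -= 1
    return word[start:end]

print(strip_non_letters_unicode('123foo456'))   # foo
print(strip_non_letters_unicode('2foo1c#BAR'))  # foo1c#BAR
print(strip_non_letters_unicode('¡héllo!'))     # héllo
```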
#5
0
Try this:
re.sub(r'^[^a-zA-Z]*(.*?)[^a-zA-Z]*$', r'\1', string)
The round brackets capture everything between the non-letter runs at the beginning and end of the string. The ? makes the .* lazy, so that it does not also swallow the non-letters at the end. The replacement then simply substitutes the captured group for the whole match.
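One detail worth checking when trying this in Python: the backreference in the replacement needs to be a raw string (or written as '\\1'). A plain '\1' is an octal escape for the control character with code 1, so it never reaches re.sub as a group reference. A small sketch, with the variable names chosen only for this example:

```python
import re

word = '123foo456'
pattern = r'^[^a-zA-Z]*(.*?)[^a-zA-Z]*$'

plain = re.sub(pattern, '\1', word)   # replacement is the single character chr(1)
raw = re.sub(pattern, r'\1', word)    # replacement is a reference to group 1

print(repr(plain))  # '\x01'
print(repr(raw))    # 'foo'
```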
#6
0
result = re.sub('(.*?)([a-z].*[a-z])(.*)', '\\2', '23WERT#3T67', flags=re.IGNORECASE)