How do I match a name using a regular expression?

Time: 2021-06-27 21:20:40

I am new to Python. I want to write a regular expression to validate names. The input string may contain a-z, A-Z, 0-9, and '_', but it must start with a-z or A-Z (not 0-9 or '_'). I tried, but none of my attempts matched correctly.

If the input string satisfies the regular expression, I can proceed; otherwise I discard the string.

3 solutions

#1


6  

Here's an answer to your question:

Assuming you want _ (not -), this should do the job:

>>> import re
>>> tests = ["a", "A", "a1", "a_1", "1a", "_a", "a\n", "", "z_"]
>>> for test in tests:
...     print(repr(test), bool(re.match(r"[A-Za-z]\w*\Z", test)))
...
'a' True
'A' True
'a1' True
'a_1' True
'1a' False
'_a' False
'a\n' False
'' False
'z_' True
>>>

Stoutly resist the temptation to use $; here's why:

Hello, hello, using $ is WRONG, use \Z instead:

>>> re.match(r"[a-zA-Z][\w-]*$", "A")
<re.Match object; span=(0, 1), match='A'>
>>> re.match(r"[a-zA-Z][\w-]*$", "A\n")
<re.Match object; span=(0, 1), match='A'>  # WRONG; SHOULDN'T MATCH
>>>

>>> re.match(r"[a-zA-Z][\w-]*\Z", "A")
<re.Match object; span=(0, 1), match='A'>
>>> re.match(r"[a-zA-Z][\w-]*\Z", "A\n")
>>> # CORRECT: NO MATCH

The Fine Manual says:

'$'
Matches the end of the string or just before the newline at the end of the string [my emphasis], and in MULTILINE mode also matches before a newline. foo matches both ‘foo’ and ‘foobar’, while the regular expression foo$ matches only ‘foo’. More interestingly, searching for foo.$ in 'foo1\nfoo2\n' matches ‘foo2’ normally, but ‘foo1’ in MULTILINE mode; searching for a single $ in 'foo\n' will find two (empty) matches: one just before the newline, and one at the end of the string.

and

\Z
Matches only at the end of the string.
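
The manual's examples are easy to check for yourself. Here is a small sketch that reproduces them; the variable names are mine, not the manual's:

```python
import re

text = "foo1\nfoo2\n"

# Without flags, $ matches only at the very end of the string
# or just before a newline that ends the string.
default_hit = re.search(r"foo.$", text).group()

# With MULTILINE, $ also matches before every newline.
multiline_hit = re.search(r"foo.$", text, re.MULTILINE).group()

# A bare $ finds two empty matches in 'foo\n':
# one just before the newline and one at the end.
dollar_positions = [m.start() for m in re.finditer(r"$", "foo\n")]

# \Z matches only at the true end of the string.
z_positions = [m.start() for m in re.finditer(r"\Z", "foo\n")]

print(default_hit, multiline_hit, dollar_positions, z_positions)
```

The second position reported for $ is the one that lets a name with a trailing newline slip through validation.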

=== And now for something completely different ===

>>> import string
>>> letters = set(string.ascii_letters)
>>> ok_chars = letters | set(string.digits + "_")
>>>
>>> def is_valid_name(strg):
...     # bool() makes the empty string return False rather than ''
...     return bool(strg) and strg[0] in letters and all(c in ok_chars for c in strg)
...
>>> for test in tests:
...     print(repr(test), repr(is_valid_name(test)))
...
'a' True
'A' True
'a1' True
'a_1' True
'1a' False
'_a' False
'a\n' False
'' False
'z_' True
>>>
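
If you are on Python 3, two things may be worth knowing. For str patterns, \w matches Unicode word characters by default, so a name like 'aé' would pass the regex above; the re.ASCII flag limits \w to [A-Za-z0-9_]. Also, re.fullmatch (Python 3.4+) requires the whole string to match, so no end anchor is needed. A minimal sketch, where NAME_RE and is_valid_name_re are names I made up for illustration:

```python
import re

# Hypothetical names for illustration. re.ASCII restricts \w to [A-Za-z0-9_].
NAME_RE = re.compile(r"[A-Za-z]\w*", re.ASCII)


def is_valid_name_re(name):
    """Return True if name starts with a letter and otherwise
    contains only ASCII letters, digits, and underscores."""
    # fullmatch must consume the entire string, trailing newline included.
    return NAME_RE.fullmatch(name) is not None


tests = ["a", "A", "a1", "a_1", "1a", "_a", "a\n", "", "z_", "a\u00e9"]
results = {t: is_valid_name_re(t) for t in tests}
for t in tests:
    print(repr(t), results[t])
```

This should agree with the set-based function on the original test strings, and it also rejects the non-ASCII case.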

#2


4  

>>> import re

>>> re.match(r"[a-zA-Z][\w-]*$", "A")
<re.Match object; span=(0, 1), match='A'>

>>> re.match(r"[a-zA-Z][\w-]*$", "A_B")
<re.Match object; span=(0, 3), match='A_B'>

>>> re.match(r"[a-zA-Z][\w-]*$", "0A")
>>>
>>> re.match(r"[a-zA-Z][\w-]*$", "!A_B")
>>>

Note: The OP said the string cannot start with 0-9 or "_", but apparently "_" can appear elsewhere in the text. That's why I am using \w.

Note 2: If you don't want to match a string that ends with \n, you could use \Z instead of $, as John Machin mentioned.
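
One more difference between the patterns in this thread: the character class [\w-] also accepts a hyphen, which the question did not list as an allowed character. A quick sketch comparing the two classes (the variable names are only for illustration):

```python
import re

# [\w-] allows a literal hyphen in addition to word characters.
with_hyphen = re.compile(r"[a-zA-Z][\w-]*\Z")
# \w alone allows only letters, digits, and underscore.
without_hyphen = re.compile(r"[a-zA-Z]\w*\Z")

comparison = {}
for candidate in ["A_B", "A-B"]:
    comparison[candidate] = (
        bool(with_hyphen.match(candidate)),
        bool(without_hyphen.match(candidate)),
    )
    print(candidate, comparison[candidate])
```

If hyphens should be rejected, the plain \w version is the one to use.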

#3


-1  

Here's a non-regex way:

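import string

mystring = "abcadsf123"
# Allowed characters per the question: letters, digits, and underscore
allowed = string.ascii_letters + string.digits + "_"

if not mystring or mystring[0] not in string.ascii_letters:
    # Covers the empty string as well as a leading digit or underscore
    print("%s does not start with a letter" % mystring)
elif all(c in allowed for c in mystring):
    print("%s ok" % mystring)
else:
    print("%s not ok" % mystring)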
