Is there an easy way to test whether a regex matches an entire string in Python? I thought that putting $ at the end would do this, but it turns out that $ doesn't work when the string has a trailing newline.
For example, the following returns a match, even though that's not what I want:
re.match(r'\w+$', 'foo\n')
4 answers
#1
4
You can use \Z:

\Z
Matches only at the end of the string.
In [5]: re.match(r'\w+\Z', 'foo\n')
In [6]: re.match(r'\w+\Z', 'foo')
Out[6]: <_sre.SRE_Match object; span=(0, 3), match='foo'>
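For comparison, Python 3.4 and later also provide re.fullmatch, which the answer above does not mention but which requires the pattern to cover the whole string without any extra anchor. A minimal sketch, using the question's pattern and inputs; the variable names are only illustrative:

```python
import re

pattern = r'\w+'

# \Z anchors at the very end of the string, so a trailing newline blocks the match.
z_with_newline = re.match(pattern + r'\Z', 'foo\n')
z_plain = re.match(pattern + r'\Z', 'foo')

# re.fullmatch (Python 3.4+) requires the whole string to match the pattern.
full_with_newline = re.fullmatch(pattern, 'foo\n')
full_plain = re.fullmatch(pattern, 'foo')

print(z_with_newline, full_with_newline)    # both are None
print(z_plain.group(), full_plain.group())  # both matched 'foo'
```

Either form rejects 'foo\n'. Which one reads better is mostly a matter of whether the anchor should live in the pattern or in the call.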
#2
2
You can use a negative lookahead assertion to require that the $ is not followed by a trailing newline:
>>> re.match(r'\w+$(?!\n)', 'foo\n')
>>> re.match(r'\w+$(?!\n)', 'foo')
<_sre.SRE_Match object; span=(0, 3), match='foo'>
re.MULTILINE is not relevant here; the OP has it turned off and the regex is still matching. The problem is that $ always matches right before the trailing newline, as the documentation says:

When [re.MULTILINE is] specified, the pattern character '^' matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character '$' matches at the end of the string and at the end of each line (immediately preceding each newline). By default, '^' matches only at the beginning of the string, and '$' only at the end of the string and immediately before the newline (if any) at the end of the string.
I have experimentally verified that this works correctly with re.X enabled.
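The behaviour described in the quoted documentation can be checked directly. The sketch below is only an illustration: the sample strings and variable names are made up for the demonstration and do not come from the answer.

```python
import re

# Without re.MULTILINE, $ still matches just before a newline that ends the string.
default_trailing = re.search(r'foo$', 'foo\n')

# Without re.MULTILINE, $ does not match before a newline in the middle of the string.
default_inner = re.search(r'foo$', 'foo\nbar')

# With re.MULTILINE, $ matches at the end of every line.
multiline_inner = re.search(r'foo$', 'foo\nbar', re.MULTILINE)

# The lookahead rejects the position that is followed by a newline.
lookahead_trailing = re.match(r'\w+$(?!\n)', 'foo\n')
lookahead_plain = re.match(r'\w+$(?!\n)', 'foo')

# The same pattern under re.X (re.VERBOSE), where whitespace in the pattern is ignored.
verbose_trailing = re.match(r'\w+ $ (?!\n)', 'foo\n', re.X)

print(default_trailing, default_inner, multiline_inner)
print(lookahead_trailing, lookahead_plain, verbose_trailing)
```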
#3
2
To test whether you matched the entire string, just check whether the match is as long as the entire string:

import re

# mystring is the string being tested; ".*" always matches, so m is never None here
m = re.match(r".*", mystring)
start, stop = m.span()
if stop - start == len(mystring):
    print("The entire string matched")
Note: This is independent of the question (which you didn't ask) of how to match a trailing newline.
#4
0
Based on @alexis's answer, a function that checks for a full match could look like this:
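def fullMatch(matchObject, fullString):
    # No match at all means the string was not fully matched.
    if matchObject is None:
        return False
    # The match covers the whole string only if its length equals the string's length.
    start, stop = matchObject.span()
    return stop - start == len(fullString)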
Here fullString is the string to which you apply the regex, and matchObject is the result of matchObject = re.match(yourRegex, fullString).
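As a usage sketch, the helper can be applied to the pattern from the question. The function is repeated so that the snippet runs on its own; the test strings and the results dictionary are illustrative assumptions, not part of the answer.

```python
import re

def fullMatch(matchObject, fullString):
    # Same logic as the helper above, repeated so this snippet is self-contained.
    if matchObject is None:
        return False
    start, stop = matchObject.span()
    return stop - start == len(fullString)

pattern = r'\w+$'

# Check each sample string against the pattern and record whether it matched in full.
results = {s: fullMatch(re.match(pattern, s), s) for s in ('foo', 'foo\n', 'foo bar')}
print(results)
```

Because re.match always starts matching at position 0, comparing the span length with len(fullString) is equivalent to checking that the match ends at the end of the string.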