Python regex: matching only the entire string

Date: 2021-08-14 19:25:42

Is there any easy way to test whether a regex matches an entire string in Python? I thought that putting $ at the end would do this, but it turns out that $ doesn't work in the case of trailing newlines.

For example, the following returns a match, even though that's not what I want.

re.match(r'\w+$', 'foo\n')

4 answers

#1


4  

You can use \Z:

\Z

Matches only at the end of the string.

In [5]: re.match(r'\w+\Z', 'foo\n')

In [6]: re.match(r'\w+\Z', 'foo')
Out[6]: <_sre.SRE_Match object; span=(0, 3), match='foo'>
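
As an aside that the answer above does not mention: on Python 3.4 and later, the standard library also provides re.fullmatch, which succeeds only when the pattern matches the whole string. A minimal sketch comparing it with \Z, where the sample strings and variable names are chosen purely for illustration:

```python
import re

pattern = r'\w+'

# re.fullmatch requires the pattern to consume the entire string,
# so a trailing newline makes the match fail.
with_newline = re.fullmatch(pattern, 'foo\n')
without_newline = re.fullmatch(pattern, 'foo')

# Appending \Z to the pattern gives the same result with re.match.
z_with_newline = re.match(pattern + r'\Z', 'foo\n')
z_without_newline = re.match(pattern + r'\Z', 'foo')

print(with_newline, z_with_newline)  # None None
print(without_newline.group(), z_without_newline.group())  # foo foo
```

Which of the two to use is mostly a matter of taste; re.fullmatch keeps the anchoring out of the pattern itself.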

#2


2  

You can use a negative lookahead assertion to require that the $ is not followed by a trailing newline:

>>> re.match(r'\w+$(?!\n)', 'foo\n')
>>> re.match(r'\w+$(?!\n)', 'foo')
<_sre.SRE_Match object; span=(0, 3), match='foo'>

re.MULTILINE is not relevant here; OP has it turned off and the regex is still matching. The problem is that $ always matches right before the trailing newline:

When [re.MULTILINE is] specified, the pattern character '^' matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character '$' matches at the end of the string and at the end of each line (immediately preceding each newline). By default, '^' matches only at the beginning of the string, and '$' only at the end of the string and immediately before the newline (if any) at the end of the string.

I have experimentally verified that this works correctly with re.X enabled.
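
The behaviour described above can be checked with a short script. This is only a sketch: the variable names and the verbose-mode layout of the pattern are my own, not taken from the answer.

```python
import re

text = 'foo\n'

# By default, $ matches at the very end and also just before a final newline.
plain_dollar = re.match(r'\w+$', text)

# The negative lookahead rejects the position just before the newline.
guarded = re.match(r'\w+$(?!\n)', text)

# The same pattern in verbose mode (re.X), where whitespace and
# comments inside the pattern are ignored.
guarded_verbose = re.match(r'''
    \w+      # one or more word characters
    $        # end of string, or just before a final newline
    (?!\n)   # but not if a newline follows
''', text, re.X)

print(plain_dollar is not None)  # True
print(guarded is None)           # True
print(guarded_verbose is None)   # True
```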

#3


2  

To test whether you matched the entire string, just check if the matched string is as long as the entire string:

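import re

mystring = "foo"  # example input
m = re.match(r".*", mystring)
# re.match returns None when the pattern does not match, so check that first
if m is not None and m.end() - m.start() == len(mystring):
    print("The entire string matched")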

Note: This is independent of the question (which you didn't ask) of how to match a trailing newline.
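
One caveat, offered as an observation beyond the original answer: re.match returns the first match the engine finds, which is not always the longest possible one. The length check can therefore report failure for a pattern that is able to cover the whole string. A small sketch, where spans_whole_string is a hypothetical helper wrapping the check above and the patterns are chosen purely for illustration:

```python
import re

def spans_whole_string(pattern, s):
    """Length check from the answer above, wrapped in a hypothetical helper."""
    m = re.match(pattern, s)
    return m is not None and m.end() - m.start() == len(s)

# Alternation tries 'a' first and stops there, so only one character matches.
print(spans_whole_string(r'a|ab', 'ab'))          # False
# Anchoring with \Z forces the engine to backtrack into the second branch.
print(re.match(r'(?:a|ab)\Z', 'ab') is not None)  # True
# A greedy pattern that cannot stop early behaves as expected.
print(spans_whole_string(r'\w+', 'foo'))          # True
```

For patterns with alternation or lazy quantifiers, anchoring the pattern is therefore the safer choice.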

#4


0  

Based on @alexis's answer, a function that checks for a full match could look like this:

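def fullMatch(matchObject, fullString):
    # re.match returns None when the pattern does not match at all
    if matchObject is None:
        return False
    start, stop = matchObject.span()
    # A full match spans exactly as many characters as the string has
    return stop - start == len(fullString)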

Here fullString is the string you apply the regex to, and matchObject is the result of matchObject = re.match(yourRegex, fullString).
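
A short usage sketch for the helper above. The function is repeated so that the snippet runs on its own, and the sample pattern and input strings are assumptions chosen for illustration:

```python
import re

def fullMatch(matchObject, fullString):
    # Same logic as the helper above: no match, or a match that is
    # shorter than the string, does not count as a full match.
    if matchObject is None:
        return False
    start, stop = matchObject.span()
    return stop - start == len(fullString)

yourRegex = r'\w+$'

for fullString in ['foo', 'foo\n', '!!']:
    matchObject = re.match(yourRegex, fullString)
    print(repr(fullString), fullMatch(matchObject, fullString))
```

With this pattern only 'foo' is reported as a full match: 'foo\n' matches but leaves the newline uncovered, and '!!' does not match at all.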
