If I have the following string, 'some numbers 66666666666666666667867866 and serial 151283917503423 and 8888888', and I want to find 15-digit numbers (so only 151283917503423), how do I make sure it doesn't match part of the bigger number? I also need to handle the case where the string is just '151283917503423', so I cannot rely on the number having spaces on both sides.
import re
serial = re.compile('[0-9]{15}')
serial.findall('some numbers 66666666666666666667867866 and serial 151283917503423 and 8888888')
This returns a 15-digit piece of 66666666666666666667867866 as well as 151283917503423, but I only want the latter.
4 Solutions
#1
5
Use word boundaries:
serial = re.compile(r'\b[0-9]{15}\b')
\b Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. Note that formally, \b is defined as the boundary between a \w and a \W character (or vice versa), or between \w and the beginning/end of the string, so the precise set of characters deemed to be alphanumeric depends on the values of the UNICODE and LOCALE flags. For example, r'\bfoo\b' matches 'foo', 'foo.', '(foo)', 'bar foo baz' but not 'foobar' or 'foo3'. Inside a character range, \b represents the backspace character, for compatibility with Python’s string literals.
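As a rough illustration of the behaviour described above, the following sketch runs the word-boundary pattern against the question's string and a few extra inputs. The extra inputs and the variable names are made up for this example; they are not from the original post.

```python
import re

# \b sits between a word character (\w) and a non-word character,
# or between a word character and the start/end of the string.
serial = re.compile(r'\b[0-9]{15}\b')

text = ('some numbers 66666666666666666667867866 and serial '
        '151283917503423 and 8888888')

in_sentence = serial.findall(text)                 # ['151283917503423']
alone = serial.findall('151283917503423')          # ['151283917503423']
in_parens = serial.findall('(151283917503423).')   # ['151283917503423']

# Letters and underscores are word characters too, so there is no
# boundary between 'N' and '1' and nothing is found here.
after_letters = serial.findall('SN151283917503423')  # []

print(in_sentence, alone, in_parens, after_letters)
```

Note the last case: if serials can be glued to letters or underscores, \b will skip them.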
#2
4
You need to use word boundaries to ensure you don't match unwanted text on either side of your match:
>>> serial = re.compile(r'\b\d{15}\b')
>>> serial.findall('some numbers 66666666666666666667867866 and serial 151283917503423 and 8888888')
['151283917503423']
#3
2
Include word boundaries. Let s be your string. You can use
>>> re.findall(r'\b\d{15}\b', s)
['151283917503423']
where \b asserts a word boundary (^\w|\w$|\W\w|\w\W)
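To see where those boundary positions fall, one option is to list every index at which \b matches. This is only a small sketch; the sample string 'ab cd' and the variable name are chosen for illustration.

```python
import re

# \b is zero-width, so each match is an empty string; only its position matters.
sample = 'ab cd'
boundaries = [m.start() for m in re.finditer(r'\b', sample)]

# 0: start of string before 'a'   (^\w)
# 2: between 'b' and the space    (\w\W)
# 3: between the space and 'c'    (\W\w)
# 5: end of string after 'd'      (\w$)
print(boundaries)  # [0, 2, 3, 5]
```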
#4
1
Since word boundaries \b contain 2 assertions each, I would use a single assertion instead.
(?<![0-9])[0-9]{15}(?![0-9])
This should be quicker?
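A minimal sketch of the lookaround pattern in Python, compared with the \b version. The 'SN..._rev2' input, the variable names, and the timing loop are assumptions added for illustration. Whether the lookaround form is actually faster depends on the input and the Python version, so the timings are printed rather than asserted.

```python
import re
import timeit

# Lookbehind/lookahead only require that the neighbours are not digits.
digits_only = re.compile(r'(?<![0-9])[0-9]{15}(?![0-9])')
word_bounded = re.compile(r'\b[0-9]{15}\b')

text = ('some numbers 66666666666666666667867866 and serial '
        '151283917503423 and 8888888')
lookaround_hits = digits_only.findall(text)   # ['151283917503423']

# The two patterns differ when the number touches letters or underscores.
glued = 'SN151283917503423_rev2'
glued_lookaround = digits_only.findall(glued)   # ['151283917503423']
glued_boundary = word_bounded.findall(glued)    # []

# Rough timing comparison; results vary by machine and input.
t_lookaround = timeit.timeit(lambda: digits_only.findall(text), number=10000)
t_boundary = timeit.timeit(lambda: word_bounded.findall(text), number=10000)
print(lookaround_hits, glued_lookaround, glued_boundary)
print(f'lookaround: {t_lookaround:.4f}s  word boundary: {t_boundary:.4f}s')
```

Beyond speed, the practical difference is the one shown with the glued string: the lookaround version accepts a 15-digit run next to letters, while \b rejects it.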