How does \b work when using regular expressions?

Time: 2021-06-22 15:44:53

If I have a sentence and want to display the word (or all the words) that follow a particular word, for example the word fox after brown in The quick brown fox jumps over the lazy dog, I know I can use a positive lookbehind, e.g. (?<=brown\s)(\w+). However, I don't quite understand the use of \b in (?<=\bbrown\s)(\w+). I am using http://gskinner.com/RegExr/ as my tester.

7 answers

#1


15  

\b is a zero-width assertion. That means it does not match a character; it matches a position with one thing on the left side and another thing on the right side.

The word boundary \b matches on a change from a \w (a word character) to a \W (a non-word character), or from a \W to a \w. The start and end of the string count as non-word for this purpose.
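To make the "position, not character" idea concrete, here is a minimal sketch in Python (the question does not name a language, so Python's re module is an assumption, and boundary_positions is a hypothetical helper name). It lists every index where \b matches and shows the characters on either side.

```python
import re

def boundary_positions(text):
    # \b consumes no characters, so each match is an empty string at some index
    return [m.start() for m in re.finditer(r"\b", text)]

text = "The quick brown fox"
positions = boundary_positions(text)
print(positions)

for i in positions:
    left = text[i - 1] if i > 0 else ""          # "" stands for start of string
    right = text[i] if i < len(text) else ""     # "" stands for end of string
    print(i, repr(left), repr(right))
```

At every reported index, exactly one of the two neighbours is a word character.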

Which characters are included in \w depends on your regex flavor. At a minimum it contains all ASCII letters, all ASCII digits, and the underscore. If your regex engine supports Unicode, \w may also contain every character that has the Unicode property Letter or Number.

\W is every character that is NOT in \w.
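As an illustration of how the definition of \w changes where \b matches, here is a small sketch assuming Python 3, where str patterns use Unicode-aware \w by default and the re.ASCII flag restricts \w to [a-zA-Z0-9_]. The sample string is made up for the example.

```python
import re

word = "\u00e9brown"  # "ébrown", written with an escape to avoid encoding ambiguity

# Unicode mode (the default): é is a word character, so there is no boundary before b
unicode_match = re.search(r"\bbrown", word)

# ASCII mode: é is a non-word character, so a boundary exists between é and b
ascii_match = re.search(r"\bbrown", word, flags=re.ASCII)

print(unicode_match)
print(ascii_match)
```

Other engines make different choices, so it is worth checking the documentation of the flavor you use.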

\bbrown\s

will match here


The quick brown fox
         ^^

but not here


The quick bbbbrown fox

because there is no word boundary between the b and brown, i.e. no change from a non-word character to a word character: both characters are in \w.

When the regex engine reaches a \b, it looks at the next character, which is the b of brown. Now the \b knows what is on its right side: a word character (the b). It then needs to look back: for the \b to be true, there must be a non-word character before the b. If there is a space (which is not in \w), then the \b before the b is true. But if there is another b, it is false, and so \bbrown does not match "bbrown".

The regex brown would match in both strings, "quick brown" and "bbrown", whereas the regex \bbrown matches only in "quick brown" and NOT in "bbrown".
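The comparison between brown and \bbrown can be checked with a short sketch, again assuming Python's re module. The two sample strings are the ones used in this answer.

```python
import re

samples = ["The quick brown fox", "The quick bbbbrown fox"]

results = {}
for s in samples:
    plain = re.search(r"brown", s) is not None      # no boundary required
    bounded = re.search(r"\bbrown", s) is not None  # boundary required before b
    results[s] = (plain, bounded)
    print(f"{s!r}: brown={plain}, \\bbrown={bounded}")
```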

For more details, see www.regular-expressions.info.

#2


2  

The \b token is kind of special. It doesn't actually match a character. Instead, it matches any position that lies at the boundary of a word (where "word" in this case is anything that matches \w+). So the pattern (?<=brown\s)(\w+) would match fox in "bbbbrown fox", but (?<=\bbrown\s)(\w+) wouldn't, since the position just before "brown" is in the middle of a word, not at its boundary.
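The two lookbehind patterns from the question can be compared side by side. This is a sketch assuming Python, whose re module accepts \b inside a lookbehind because the assertion has zero width; word_after is a hypothetical helper name.

```python
import re

def word_after(pattern, text):
    # Return the first captured word, or None if the pattern does not match
    m = re.search(pattern, text)
    return m.group(1) if m else None

without_b = r"(?<=brown\s)(\w+)"
with_b = r"(?<=\bbrown\s)(\w+)"

for text in ["The quick brown fox", "The quick bbbbrown fox"]:
    print(repr(text), word_after(without_b, text), word_after(with_b, text))
```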

#3


1  

\b guarantees that brown starts on a word boundary, effectively excluding strings like

blackandbrown

#4


1  

You don't need a lookbehind; you can simply use:

(\bbrown\s)(\w+)
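A brief sketch of how this version behaves, assuming Python: the whole match now includes "brown " because nothing is hidden in a lookbehind, so the following word has to be read from the second group.

```python
import re

m = re.search(r"(\bbrown\s)(\w+)", "The quick brown fox")

whole = m.group(0)   # everything the pattern consumed
prefix = m.group(1)  # the word brown plus the whitespace after it
word = m.group(2)    # the word that follows

print(repr(whole), repr(prefix), repr(word))
```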

#5


1  

\b is a "word boundary": the position between the start or end of a word and the adjacent "non-word" characters (or the start or end of the string).

Its main use is to simplify the selection of a whole word, so \bbrown\s will match brown at the start of a line or after a non-word character such as a space, but not inside 99brown or _brown, because digits and the underscore are word characters.

It's more or less equivalent to "\W*", except when capturing: "\b" matches the position at the start of the word rather than the non-word character preceding or following the word.
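Whether brown starts on a boundary depends on the character in front of it. The following sketch, assuming Python and a few made-up inputs, checks \bbrown\s against several prefixes.

```python
import re

inputs = ["brown fox", "a brown fox", "99brown fox", "_brown fox", "-brown fox"]

# Digits and the underscore are word characters, so no boundary follows them
matched = {s: re.search(r"\bbrown\s", s) is not None for s in inputs}

for s, ok in matched.items():
    print(f"{s!r}: {ok}")
```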

#6


1  

\b is a zero-width match of a word boundary.

(Either the start or the end of a word, where "word" is defined as \w+.)

Note: "zero width" means that if the \b is within a regex that matches, it does not add any characters to the text captured by that match. I.e., the regex \bfoo\b, when matched, will capture just "foo": although the \b contributed to the way that foo was matched (i.e. as a whole word), it didn't contribute any characters.
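The zero-width behaviour can be seen in a short sketch, assuming Python and a made-up test string: the boundaries decide which occurrences qualify, but the matched text and its span cover only the three letters.

```python
import re

text = "foo food foo_ a foo"

# Only standalone occurrences qualify: "food" and "foo_" continue the word
hits = [(m.group(), m.span()) for m in re.finditer(r"\bfoo\b", text)]
print(hits)
```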

#7


0  

A word boundary is a position that is either preceded by a word character and not followed by one, or followed by a word character and not preceded by one. It's equivalent to this:


(?<=\w)(?!\w)|(?=\w)(?<!\w)

...or it's supposed to be. See this question for everything you ever wanted to know about word boundaries. ;)

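As a rough check of the equivalence, the following sketch compares the positions matched by \b with those matched by the expanded lookaround form. It assumes Python's re module and a handful of made-up sample strings; agreement on these samples is not a proof, and other engines may behave differently.

```python
import re

BOUNDARY = re.compile(r"\b")
EXPANDED = re.compile(r"(?<=\w)(?!\w)|(?=\w)(?<!\w)")

def positions(pattern, text):
    # Both patterns are zero-width, so each match is just an index
    return [m.start() for m in pattern.finditer(text)]

samples = ["The quick brown fox", "bbbbrown fox", "99brown _brown", "a", "  ", ""]
agreement = {s: positions(BOUNDARY, s) == positions(EXPANDED, s) for s in samples}

for s, same in agreement.items():
    print(repr(s), same)
```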
