The difference between \b and \s in regular expressions

Time: 2022-05-27 20:13:30

I was learning regular expressions in iOS and saw this tutorial: http://www.raywenderlich.com/30288/nsregularexpression-tutorial-and-cheat-sheet

It says this about \b:

\b matches word boundary characters such as spaces and punctuation. to\b will match the "to" in "to the moon" and "to!", but it will not match "tomorrow". \b is handy for "whole word" type matching.

And this about \s:

\s matches whitespace characters such as spaces, tabs, and newlines. hello\s will match "hello " in "Well, hello there!".

I have two questions on this:

1) What is the difference between \s and \b? When should I use which?

2) "\b is handy for 'whole word' type matching": I don't understand what this means.

I need some guidance on these two points.

4 answers

#1


18  

\b Boundary characters

\b matches the boundary itself but not the boundary character (like a comma or period). It has no length of its own, but it can be used to find, for example, an e at the end of a word.

For example, take the sentence "Hello there, this is one test. Testing".

The regex e\b matches an e only if it is at the end of a word (that is, followed by a word boundary). Notice in the image below that the e in "test" and "Testing" didn't match, since that e is not followed by a boundary.

[Image 1: e\b matches in the example sentence]

\s Whitespace

\s, on the other hand, matches the actual whitespace characters (like spaces and tabs). In the same sentence it matches each of the spaces between the words.

[Image 2: \s matches in the example sentence]


Edit

Since \b doesn't make much sense alone, I showed how to use it as e\b (above). The OP asked (in a comment) what e\s would match compared to e\b, to better explain the difference between \b and \s.

In the same string there is only one match for e\s, while there were two matches for e\b, since the comma is not whitespace. Note that the e\s match (image 3) includes the whitespace character, whereas the e\b match doesn't (image 1).

[Image 3: e\s matches in the example sentence]
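
The comparison described in this answer can be reproduced in a few lines. This is only a minimal sketch: it uses Python's re module as a stand-in (the question is about NSRegularExpression), and the variable names are made up for illustration.

```python
import re

text = "Hello there, this is one test. Testing"

# e\b: an "e" followed by a word boundary; only the "e" itself is matched.
boundary = [(m.start(), m.group()) for m in re.finditer(r"e\b", text)]

# e\s: an "e" followed by a whitespace character; the whitespace is part of the match.
whitespace = [(m.start(), m.group()) for m in re.finditer(r"e\s", text)]

print(boundary)    # [(10, 'e'), (23, 'e')]  -> the final e of "there" and "one"
print(whitespace)  # [(23, 'e ')]            -> only "one", and the space is included
```

The e\b matches are one character long, while the e\s match is two characters long because the space is consumed.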

#2


2  

\b is zero-width. That is, it doesn't actually match any character. Meanwhile, \s does match a character. This is an important distinction for capturing and more complicated regular expressions.

For example, say you're trying to match numbers that begin with multiple zeros, like 007 or 000101101. You might try:

0+\d*

But see, that would also match inside 1007 and 101000101101, since nothing stops the match from starting in the middle of a number! So then, you might try:

\s0+\d*

But see how that wouldn't match a 007 at the beginning of the string (because there's no space character before it)? And when it does match, the space becomes part of the match. Using \b allows you to get the "whole word (or number)":

\b0+\d*
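
To see the three attempts side by side, here is a small sketch, again assuming Python's re module rather than NSRegularExpression. The sample string and the dictionary keys are invented for this illustration.

```python
import re

text = "007 1007 000101101 x 42"

patterns = {
    "plain": r"0+\d*",       # may start matching in the middle of a number
    "space": r"\s0+\d*",     # needs a whitespace character, and consumes it
    "boundary": r"\b0+\d*",  # needs a word boundary, which consumes nothing
}

results = {name: re.findall(pattern, text) for name, pattern in patterns.items()}

print(results["plain"])     # ['007', '007', '000101101']  (second one is inside 1007)
print(results["space"])     # [' 000101101']               (misses the leading 007)
print(results["boundary"])  # ['007', '000101101']
```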

#3


2  

  • \b matches a word boundary. It is a zero-width assertion, meaning it does not match a character; it matches a position where a certain condition is true.

    \b is related to \w. \w defines "word characters": letters, digits, and underscores. So \b matches at a change from a word character to a non-word character, or the other way round. That means it matches at the start and end of a word, but not the character before or after the word.

  • \s is a predefined character class that matches any whitespace character.

See and try out what \bFoo\b matches here on Regexr

See and try out what \sFoo\s matches here on Regexr
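
The same comparison can also be tried locally. The following is a minimal sketch using Python's re module (an assumption, since the question concerns NSRegularExpression); the sample sentence is made up for illustration.

```python
import re

text = "Foo, a Foo and Foobar"

# \bFoo\b: "Foo" as a whole word; the boundaries add nothing to the match span.
whole_word = [m.span() for m in re.finditer(r"\bFoo\b", text)]

# \sFoo\s: "Foo" with a whitespace character on each side, both included in the span.
spaced = [m.span() for m in re.finditer(r"\sFoo\s", text)]

print(whole_word)  # [(0, 3), (7, 10)]
print(spaced)      # [(6, 11)]
```

\bFoo\b finds the Foo at the start of the string and the one followed by a space, but not the one inside "Foobar". \sFoo\s finds only the Foo that has whitespace on both sides, and its span is two characters wider.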

#4


0  

\b matches at the position between a word character (a letter, digit, or underscore) and anything else, without including any character in the match.

\s matches only whitespace characters.

For example, \b would match between a word character and any of these: "!?,.@#$%^&*()+ " (but not "_", which counts as a word character).

$text = "Hello, Yo! moo .";
$regex = '~o\b~';
preg_match_all($regex, $text, $matches);
// Matches three o's: the final o of "Hello", "Yo", and "moo".

$text = "Hello, Yo! moo .";
$regex = '~o\s~';
preg_match_all($regex, $text, $matches);
// Matches only the final o of "moo", together with the space after it.
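
One detail in the character list above is worth checking: the underscore belongs to \w, so there is no word boundary between a letter and "_". A short sketch, assuming Python's re module and a made-up sample string:

```python
import re

text = "foo_bar foo-bar foo"

# Start positions of every "o" that is followed by a word boundary.
starts = [m.start() for m in re.finditer(r"o\b", text)]

print(starts)  # [10, 18]
```

The o before "_" is skipped, the o before "-" matches, and so does the o at the very end of the string, where the boundary is the end of the input rather than a character.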
