What is the difference between square brackets and parentheses in a regex?

Here is a regular expression I created to use in JavaScript:

var reg_num = /^(7|8|9)\d{9}$/

Here is another one suggested by my team member.

var reg_num = /^[7|8|9][\d]{9}$/

The rule is to validate a phone number:

It should consist of exactly ten digits.
The first digit must be 7, 8, or 9.

3 answers

#1

These regexes are equivalent (for matching purposes):

/^(7|8|9)\d{9}$/
/^[789]\d{9}$/
/^[7-9]\d{9}$/

The explanation:

(a|b|c) is a regex "OR" and means "a or b or c". The parentheses, needed here to limit the scope of the OR, also capture the digit. To be strictly equivalent, you would write (?:7|8|9) to make it a non-capturing group.

[abc] is a "character class" that means "any single character from a, b, or c" (a character class may use ranges, e.g. [a-d] = [abcd]).

The reason these regexes are similar is that a character class is a shorthand for an "or" (but only for single characters). In an alternation, you can also do something like (abc|def) which does not translate to a character class.
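To make the comparison above concrete, here is a minimal sketch in JavaScript. The sample inputs and variable names are made up for illustration; only the patterns come from the answer. It checks that the three patterns agree on a few strings, shows the capturing difference, and shows a multi-character alternation that a character class cannot express.

```javascript
// The three patterns from the answer; they should accept the same strings.
const patterns = [/^(7|8|9)\d{9}$/, /^[789]\d{9}$/, /^[7-9]\d{9}$/];

// Hypothetical sample inputs: one valid number, a wrong first digit,
// a string that is too short, and a string that is too long.
const samples = ["9123456789", "6123456789", "912345678", "91234567890"];

// For each sample, collect the verdict of every pattern.
const results = samples.map((s) => patterns.map((re) => re.test(s)));
console.log(results);

// Capturing vs. non-capturing: only the first form records the leading digit.
const captured = "9123456789".match(/^(7|8|9)\d{9}$/);
const nonCaptured = "9123456789".match(/^(?:7|8|9)\d{9}$/);
console.log(captured[1]);        // the captured first digit
console.log(nonCaptured.length); // 1: the full match only, no groups

// Alternation can choose between whole strings; a class matches one character.
const words = /^(abc|def)$/;
const chars = /^[abcdef]$/;
console.log(words.test("abc"), chars.test("abc"));
```

Only the first sample is accepted, and it is accepted by all three patterns, which is what "equivalent for matching purposes" means here.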

#2

Your teammate's suggestion is almost right, apart from one mistake. Once you see why it is a mistake, you will never forget it. Start with your own pattern, then compare it with your teammate's.

/^(7|8|9)\d{9}$/

What this does:

^ and $ are anchors. They assert that the subpattern between them must match the entire string, not just a section of it.
() denotes a capturing group.
7|8|9 matches any one of 7, 8, or 9. It does this with alternation, which is what the pipe operator | does: it tries each alternative in turn. A backtracking engine may backtrack between alternatives: if the first alternative fails, the engine has to return to the position where the alternation started before trying the next one, whereas a character class can be tested in a single step. See this match on a regex engine with optimizations disabled:

Pattern: (r|f)at
Match string: carat

Pattern: [rf]at
Match string: carat

\d{9} matches exactly nine digits. \d is a shorthand character class that matches any single digit.

/^[7|8|9][\d]{9}$/

Look at what it does:

^ and $ are anchors here as well.
[7|8|9] is a character class. It matches any single character from the set 7, 8, 9, or |, so the | was added by mistake. It matches without backtracking.
[\d] is a character class that contains only the shorthand \d. Wrapping a single shorthand in a character class is a bad idea, by the way, since the extra layer of abstraction can slow down the match, but this is an implementation detail that applies to only a few regex implementations. JavaScript is not one of them, but it does make the subpattern slightly longer.
{9} indicates that the preceding single construct is repeated exactly nine times.
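The effect of the stray | in the character class can be checked directly. This is a small sketch; the variable names and the test strings are hypothetical, chosen only to exercise the two patterns discussed above.

```javascript
// The teammate's pattern: inside [], the | is a literal character.
const buggy = /^[7|8|9][\d]{9}$/;
// The corrected character-class form.
const fixed = /^[789]\d{9}$/;

// Hypothetical input: a vertical bar followed by nine digits.
const withPipe = "|123456789";
// Hypothetical input: a well-formed ten-digit number.
const valid = "7123456789";

console.log(buggy.test(withPipe)); // true: the bar is accepted as a "first digit"
console.log(fixed.test(withPipe)); // false
console.log(buggy.test(valid), fixed.test(valid)); // true true
```

Both patterns agree on ordinary phone numbers, which is why the mistake is easy to miss in casual testing.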

The best choice is /^[789]\d{9}$/. The pattern /^(7|8|9)\d{9}$/ captures unnecessarily, which costs some performance in most regex implementations (JavaScript is one of them; since the question uses the var keyword, this is presumably JavaScript). PHP, whose preg functions run on PCRE, can optimize the backtracking away, but we are not in PHP here. So using a character class [] instead of an alternation | gives a performance benefit: the match does not backtrack, and therefore it both matches and fails faster than your original regular expression.

#3

The first 2 examples act very differently if you are REPLACING them by something. If you match on this:

str = str.replace(/^(7|8|9)/ig,'');

you would replace 7 or 8 or 9 by the empty string.

If you match on this:

str = str.replace(/^[7|8|9]/ig,'');

you will replace 7 or 8 or 9 OR THE VERTICAL BAR!!!! by the empty string.

I just found this out the hard way.
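The two replacements can be compared side by side. This is a sketch with hypothetical helper names and inputs; the i and g flags from the snippets above are dropped because digits have no case and a ^ anchor without the m flag can match only once, so they should not change the result.

```javascript
// Strip a leading 7, 8, or 9 using a group (alternation).
const stripWithGroup = (s) => s.replace(/^(7|8|9)/, "");
// Strip a leading character using the class that also contains "|".
const stripWithClass = (s) => s.replace(/^[7|8|9]/, "");

// On a leading digit, the two behave the same.
console.log(stripWithGroup("7abc")); // "abc"
console.log(stripWithClass("7abc")); // "abc"

// On a leading vertical bar, only the class version removes it.
console.log(stripWithGroup("|abc")); // "|abc"
console.log(stripWithClass("|abc")); // "abc"
```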
