What does this part of the regular expression add?

Date: 2021-04-06 01:58:41

I came across this regular expression in the jQuery source code:

...
rmozilla = /(mozilla)(?:.*? rv:([\w.]+))?/,
...

I was wondering why it was rather complicated. I'm especially interested in the reason behind the second part:

(?:.*? rv:([\w.]+))?

I did some research but I could not figure out what this part of the regular expression adds.

(?:)      group that matches but does not capture
.*?       any number of any characters, as few as possible (lazy)
 rv:      a literal space followed by rv:
([\w.]+)  one or more word characters or dots, captured
?         the whole group appears 0 or 1 time

In particular, that last ? doesn't make much sense to me: the second part matches whether or not such a substring is present. Through some trial and error, the regular expression does not seem to behave any differently from just:

/(mozilla)/

Could someone shed some light on what the second part of the regular expression is supposed to do? What does it constrain? Which string fails this regex but passes /(mozilla)/, or the other way round?

5 answers

#1


4  

The two regexes would match the same strings, but would store different information in their capturing groups.

For the string: mozilla asdf rv:sadf

/(mozilla)(?:.*? rv:([\w.]+))?/
$0 = 'mozilla asdf rv:sadf'
$1 = 'mozilla'
$2 = 'sadf'

/(mozilla)/
$0 = 'mozilla'
$1 = 'mozilla'
$2 = ''
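
A quick way to check this in JavaScript, the language the pattern comes from, is to run both regexes through RegExp.prototype.exec and compare the returned arrays. In this minimal sketch match[n] plays the role of $n above; note that exec reports a group that does not exist as undefined rather than an empty string:

```javascript
// Compare what the two patterns capture for the same input.
const withVersion = /(mozilla)(?:.*? rv:([\w.]+))?/;
const plain = /(mozilla)/;

const input = "mozilla asdf rv:sadf";

const m1 = withVersion.exec(input);
const m2 = plain.exec(input);

// Both patterns match the same input...
console.log(m1 !== null, m2 !== null); // true true

// ...but they capture different things.
console.log(m1[0]); // mozilla asdf rv:sadf
console.log(m1[1]); // mozilla
console.log(m1[2]); // sadf

console.log(m2[0]); // mozilla
console.log(m2[1]); // mozilla
console.log(m2[2]); // undefined: /(mozilla)/ has no second group
```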

#2


2  

Note: I now notice that this answer might be a bit out of scope. I will still leave it for further information, but if you think it is too much out of scope, just comment and I will remove it.

@arnaud is right: it is used to get the version. Here is the code where the expression is used:

uaMatch: function( ua ) {
    ua = ua.toLowerCase();

    var match = rwebkit.exec( ua ) ||
                ropera.exec( ua ) ||
                rmsie.exec( ua ) ||
                ua.indexOf("compatible") < 0 && rmozilla.exec( ua ) ||
                [];

    return { browser: match[1] || "", version: match[2] || "0" };
},

You can see that the function returns the version if one is found and the string "0" if not. This might be necessary for some browsers, or it may just be provided as additional information for developers.
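
To see the version fallback in isolation, here is a minimal sketch that mirrors only the rmozilla branch of uaMatch. The helper name mozillaMatch is hypothetical (it is not part of jQuery), and the user-agent strings are illustrative examples:

```javascript
// Pattern from the question (jQuery's rmozilla).
const rmozilla = /(mozilla)(?:.*? rv:([\w.]+))?/;

// Hypothetical helper: mirrors only the Mozilla branch of uaMatch.
function mozillaMatch(userAgent) {
  const ua = userAgent.toLowerCase();
  // Like uaMatch, skip this branch for UAs containing "compatible" (e.g. old IE).
  const match = (ua.indexOf("compatible") < 0 && rmozilla.exec(ua)) || [];
  // Group 2 (the rv: version) is optional, so fall back to "0" when it is absent.
  return { browser: match[1] || "", version: match[2] || "0" };
}

// Gecko-style UA with an rv: token: the second group supplies the version.
console.log(mozillaMatch("Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1"));

// No " rv:" token: the optional group is skipped and the version falls back to "0".
console.log(mozillaMatch("Mozilla/5.0 (X11; Linux x86_64)"));
```

Without the optional second group the browser would still be detected, but version would always be "0".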

The function is called here:

browserMatch = jQuery.uaMatch( userAgent );
if ( browserMatch.browser ) {
    jQuery.browser[ browserMatch.browser ] = true;
    jQuery.browser.version = browserMatch.version;
}

#3


2  

First, I'd like to clarify the difference between:

.*? - non-greedy match
.* - greedy match

The non-greedy quantifier matches as few characters as possible (while still letting the rest of the pattern match), and the greedy one matches as many as possible.
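
The difference only becomes visible when the input contains more than one " rv:". The sketch below runs the question's pattern with a lazy .*? and with a greedy .* on a made-up string containing two rv: tokens:

```javascript
// Same pattern, once with a lazy .*? and once with a greedy .*
const lazy = /(mozilla)(?:.*? rv:([\w.]+))?/;
const greedy = /(mozilla)(?:.* rv:([\w.]+))?/;

// Made-up input containing two rv: tokens.
const twoTokens = "mozilla rv:1.0 foo rv:2.0";

// .*? expands one character at a time, so it stops at the FIRST " rv:".
console.log(lazy.exec(twoTokens)[2]); // 1.0

// .* first consumes the rest of the string, then backtracks to the LAST " rv:".
console.log(greedy.exec(twoTokens)[2]); // 2.0
```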

Given the string:

mozilla some text here rv:abc xyz

The regex will return both 'mozilla' and 'abc'. But if the 'rv:' doesn't exist, the regex will still return 'mozilla'.
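
Both cases can be checked directly in JavaScript. This sketch uses the sample string above and the same string without an rv: token:

```javascript
const pattern = /(mozilla)(?:.*? rv:([\w.]+))?/;

// With " rv:" present, the optional group takes part and group 2 is filled.
const withRv = pattern.exec("mozilla some text here rv:abc xyz");
console.log(withRv[1], withRv[2]); // mozilla abc

// Without it, the engine gives up on the optional group and still matches "mozilla".
const withoutRv = pattern.exec("mozilla some text here");
console.log(withoutRv[1], withoutRv[2]); // mozilla undefined
```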

#4


1  

The ([\w.]+) inside of (?:.*? rv:([\w.]+)) is capturing, so maybe this regex was used to get the revision number in the past (however, it seems that jQuery currently only checks whether the regex matches).

#5


0  

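(pat) is a capturing group: it matches the contained pattern and stores what it matched. (?:pat) is a non-capturing group: it groups the pattern, for example so that a quantifier applies to all of it, but does not store the match. It is not a negation the way the character class [^ ] negates [ ]; in JavaScript a negative lookahead is written (?!pat). . matches any character except a line terminator, and * is a quantifier meaning zero or more times; it can also be written as {0,} (but those three additional characters may well result in an earlier death of your keyboard!). The ? placed directly after * is not a separate quantifier here: it makes * lazy (non-greedy). rv: is matched literally, together with the space in front of it.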

([\w.]+) is another capturing group nested inside the parent group, and the final ? after the parent's closing parenthesis makes the whole parent group optional: it may match zero or one time. [\w.] is a character class matching a single character that is either a word character (\w, i.e. [a-zA-Z0-9_]) or a literal dot, and the + quantifier means one or more such characters.
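
Both points (capturing vs. non-capturing groups, and [\w.] as a single-character class) can be checked with a short JavaScript sketch; the sample strings are illustrative:

```javascript
// A capturing group adds an entry to the match array; (?:...) does not.
const capturing = /(mozilla)( rv:)/;
const nonCapturing = /(mozilla)(?: rv:)/;
const sample = "mozilla rv:2.0";

console.log(capturing.exec(sample).length);    // 3 (full match + two groups)
console.log(nonCapturing.exec(sample).length); // 2 (full match + one group)

// [\w.] matches ONE character that is a word character or a dot,
// so [\w.]+ matches a run of them, such as a dotted version number.
const versionRun = /^[\w.]+$/;
console.log(versionRun.test("1.9.2")); // true
console.log(versionRun.test("2.0b5")); // true
console.log(versionRun.test("1.9 2")); // false: a space is neither \w nor a dot
```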

To reverse-engineer the meaning of a pattern, read it from left to right in a text editor and replace each sub-expression with some literal text that it would match. Then take a step back and consider what the regex might have been written for.
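
That kind of probing can also be done programmatically. The helper below, probe, is a hypothetical name (not part of jQuery); it runs the pattern from the question against a few hand-picked strings and reports whether each one matched and what each group captured:

```javascript
// Hypothetical helper: run a pattern against sample strings and collect the groups.
function probe(re, samples) {
  return samples.map((s) => {
    const m = re.exec(s);
    return { input: s, matched: m !== null, groups: m ? m.slice(1) : [] };
  });
}

const results = probe(/(mozilla)(?:.*? rv:([\w.]+))?/, [
  "mozilla",              // optional group absent
  "mozilla rv:1.0",       // optional group present
  "mozilla/5.0 (rv:1.0)", // no space before rv:, so group 2 is not captured
  "firefox",              // no "mozilla" at all, so no match
]);

for (const r of results) {
  console.log(r.input, "=>", r.matched, r.groups);
}
```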
