What does (^?)* mean in this regex?

Time: 2021-01-22 00:33:55

I have this regex:

^(^?)*\?(.*)$

If I understand correctly, this is the breakdown of what it does:

  • ^ - start matching from the beginning of the string
  • (^?)* - I don't know, but it stores it in $1
  • \? - matches a question mark
  • (.*)$ - matches anything until the end of the string

So what does (^?)* mean?

4 solutions

#1


20  

The (^?) is simply looking for the literal character ^. The ^ character in a regex pattern only has special meaning when used as the first character of the pattern or as the first character in a character class []. Outside those two positions the ^ is interpreted literally, meaning it looks for the ^ character in the input string.

Note: Whether ^ outside of the first position and the character class position is interpreted literally is specific to the regex engine. I'm not familiar enough with Lua to state which it does.
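
For Lua specifically, the three roles of the caret can be checked directly. This is a minimal sketch using only the standard string library; the subject string "a^b" is made up for illustration.

```lua
-- Hypothetical subject string, chosen only to illustrate the caret's roles.
local s = "a^b"

-- The caret is in the middle of the pattern, so it matches a literal "^".
local first, last = string.find(s, "a^b")
print(first, last)               --> 1   3

-- The caret is the first character, so it anchors the match at the start.
print(string.find(s, "^b"))      --> nil (the subject does not start with "b")

-- A caret that opens a set negates the set.
print(string.match(s, "[^a]+"))  --> ^b
```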

#2


7  

Lua does not have a conventional regexp language; it has Lua patterns in its place. While they look a lot like regexps, Lua patterns are a distinct language of their own that has a simpler set of rules and, most importantly, lacks grouping and alternation features.

Interpreted as a Lua pattern, the example will surprise a longtime regexp user, since so many details are different.

Lua patterns are described in PiL, and at first glance are similar enough to a conventional regexp to cause confusion. The biggest differences are probably these: there is no alternation operator |, parentheses are only used to mark captures, quantifiers (?, -, +, and *) only apply to a single character or character class, and % is the escape character, not \. A big clue that this example was probably not written with Lua in mind is the lack of the Lua pattern quoting character % applied to any (or ideally, all) of the non-alphanumeric characters in the pattern string, and the suspicious use of \?, which smells like a conventional regexp trying to match a single literal ?.
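
The differences listed above are easy to observe with string.match. The following is a small sketch with made-up subject strings; it only uses the standard string library.

```lua
-- Parentheses only capture. A "+" after ")" has nothing to quantify,
-- so "(ab)+" means: capture "ab", then match a literal "+".
print(string.match("abab", "(ab)+"))       --> nil
print(string.match("ab+", "(ab)+"))        --> ab

-- There is no alternation: "|" is an ordinary character.
print(string.match("cat", "cat|dog"))      --> nil
print(string.match("cat|dog", "cat|dog"))  --> cat|dog

-- "%" is the escape character, so "%?" matches a literal question mark.
print(string.match("why?", "y%?"))         --> y?
```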

The simple answer to the question asked is: (^?)* is not a recommended form, and would match ^* or *, capturing the presence or absence of the caret. If that were the intended effect, then I would write it as (%^?)%* to make that clearer.
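
As a quick check of that claim, the sketch below runs both spellings against a few made-up subjects and prints the first capture from each. The variable names are illustrative only.

```lua
local original = "(^?)*"    -- the fragment as written in the question
local explicit = "(%^?)%*"  -- the same fragment with the literals escaped

-- Hypothetical subjects: with caret, without caret, with a prefix, no asterisk.
local subjects = { "^*", "*", "x*", "^" }

for _, subject in ipairs(subjects) do
  -- Each call yields the first capture, or nil when nothing matches.
  local a = string.match(subject, original)
  local b = string.match(subject, explicit)
  print(subject, a, b)
end
```

Both patterns capture "^" for the first subject, the empty string for the second and third, and fail on the last one because there is no asterisk to match.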

To see why this is the case, let's take the pattern given and analyze it as a Lua pattern. The entire pattern is:

^(^?)*\?(.*)$

Handed to string.match(), it would be interpreted as follows:

^ anchors the match to the beginning of the string.

( marks the beginning of the first capture.

^ is not at the beginning of the pattern or of a character class, so it matches a literal ^ character. For clarity that should likely have been written as %^.

? matches exactly zero or one of the previous character.

) marks the end of the first capture.

* is not after something that can be quantified, so it matches a literal * character. For clarity that should likely have been written as %*.

\ in a pattern matches itself; it is not an escape character in the pattern language. However, it is an escape character in a Lua short string literal, making the following character not special to the string literal parser, which in this case is moot because the ? that follows was not special to it in any case. So if the pattern were enclosed in double or single quotes, the \ would be absorbed by string parsing (that is the Lua 5.1 behavior; Lua 5.2 and later reject \? as an invalid escape sequence). If written in a long string (as [[^(^?)*\?(.*)$]]), the backslash would survive the string parser and appear in the pattern.

? matches exactly zero or one of the previous character.

( marks the beginning of the second capture.

. matches any character at all, effectively a synonym for the class [\000-\255] (remember, in Lua numeric escapes are in decimal, not octal as in C).

* matches zero or more of the previous character, greedily.

) marks the end of the second capture.

$ anchors the pattern to the end of the string.

So it matches and captures an optional ^ at the beginning of the string, followed by *, then an optional \ which is not captured, and captures the entire rest of the string. string.match would return two strings on success (either or both of which might be zero length), or nil on failure.
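
That summary can be tried out as follows. This is a sketch with made-up subject strings; the pattern is written as a long string so the backslash reaches the pattern matcher unchanged.

```lua
-- Long string: no escape processing, so the pattern text is exactly as posted.
local pattern = [[^(^?)*\?(.*)$]]

-- Caret, asterisk, one backslash, then the rest.
print(string.match("^*\\rest", pattern))  -- captures "^" and "rest"

-- No caret and no backslash: the first capture is the empty string.
print(string.match("*rest", pattern))     -- captures "" and "rest"

-- Both captures may be short or empty.
print(string.match("^*", pattern))        -- captures "^" and ""

-- A regexp user might expect this to match, but there is no "*" to consume.
print(string.match("?query", pattern))    --> nil
```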

Edit: I've fixed some typos and corrected an error in my answer, noticed by Egor in a comment. I forgot that in patterns, special symbols lose their specialness when in a spot where they can't apply. That makes the first asterisk match a literal asterisk rather than be an error. The effect of that correction cascades through most of the answer.

Note that if you really want a true regexp in Lua, there are libraries available that will provide it. That said, the built-in pattern language is quite powerful. If it is not sufficient, then you might be best off adopting a full parser and using LPeg, which can do everything a regexp can and more. It even comes with a module that provides a complete regexp syntax that is translated into an LPeg grammar for execution.

#3


2  

In this case, (^?) refers to the string "^", meaning the literal character ^, as Jared has said. Check out regexlib for any further deciphering.

For all your Regex needs: http://regexlib.com/CheatSheet.aspx

#4


1  

It looks to me like the intent of the creator of the expression was to match any number of ^ before the question mark, but to capture only the first instance of ^. However, it may not be a valid expression depending on the engine, as others have stated.
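
If that guess about the intent is right, one way to express it as a Lua pattern might look like the sketch below. The pattern is a hypothetical rewrite, not something taken from the original code, and the subject strings are made up.

```lua
-- Hypothetical rewrite of the guessed intent:
--   (%^?)  capture at most one leading caret
--   %^*    skip any further carets without capturing them
--   %?     require a literal question mark
--   (.*)   capture the rest of the string
local pattern = "^(%^?)%^*%?(.*)$"

print(string.match("^^^?rest", pattern))  -- captures "^" and "rest"
print(string.match("?rest", pattern))     -- captures "" and "rest"
print(string.match("rest", pattern))      --> nil (no question mark)
```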
