C++11 regex, multiline: why does the group in ([^\0]+\n)?some_text capture the whole input in match[1]?

Time: 2022-02-18 16:38:58

I'm trying to understand regular expressions better. I'm using Visual Studio 2010. Take for example this expression. In Visual Studio 2010 you can't skip over newlines with [\s\S] so I've heard it's ok to use [^\0]. In the expression I want to match a line but only if it is line 3.

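#include <iostream>
#include <regex>
using namespace std;

int main()
{
    cmatch match; // holds the sub-matches for a const char* subject

    if(regex_search("line 1\nline 2\nline 3\n",
        match,
        regex("^([^\\0]+\\n)?line (3)\\n")))
    {
        cout << "match.length(): " << match.length() << endl;

        for(unsigned i = 0; i < match.size(); ++i)
        {
            cout << "match[" << i <<"]: \"" << match[i] << "\"" << endl;
        }
    }
}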

Please note the above code won't work with gcc < 4.9 or ideone (since it uses gcc < 4.9).

In Visual Studio 2010 the code returns:

match.length(): 21
match[0]: "line 1
line 2
line 3
"
match[1]: "line 1
line 2
line 3
"
match[2]: "3"

I'm sure there are better ways to match lines, but my question is just: why did the match[1] group match the whole input? I figured the regex would read line 1\nline 2\n for match[1] and stop, since line 3 comes after it in the regex. Is there a name for this behavior in regular expressions, or is it a bug?

Thanks and if you have edit powers you're welcome to edit this so it's easier to understand.

2 solutions

#1


For the record, this works in Visual Studio and finds the third line, returning "line 3":

^(?<=(?:[^\n]+\n){2})[^\n]+

As for your expression,

^([^\0]+\n)?line (3)\n

We have to decide if you are trying to match in Visual Studio's Find function or by making a console program in Visual Studio. These are two very different cases.

A. In Visual Studio's Find Function

In Visual Studio's Find function, if you make a text file like this:

line 1
line 2
line 3

your regex will not match. Why? Because after line 3 you cannot find \n in a Visual Studio file. Instead, at the line break, you find \r\n which is the standard Windows line break.

Adding the \r fixes it:

^([^\0]+\n)?line (3)\r\n

That being said, this regex matches "line 3" on any line, not just on line 3, for the simple reason that the greedy [^\0]+ eats up all the characters, including the newlines, then backtracks until the rest of the pattern can match: the \n that ends the group, then line 3, then the final line break. If you wanted to use [^\0] instead of [^\n], this should make sure you match line 3:

^(?<=([^\0]+?\n){2})line 3\r\n
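
On the line-ending point above: if the same text may arrive with either \n or \r\n line breaks, one option is to make the \r optional rather than hard-coding it. The following is a minimal sketch for a C++ program using std::regex, not for the Find dialog; the function name has_line3 is hypothetical and only serves the illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `input` contains a line that reads exactly
// "line 3", terminated by either "\n" or "\r\n".
bool has_line3(const std::string& input)
{
    // (?:^|\n) requires the text to start at the beginning of the input or
    // right after a line break; \r?\n accepts both line-ending styles.
    static const std::regex re(R"((?:^|\n)line 3\r?\n)");
    return std::regex_search(input, re);
}
```

Calling has_line3("line 1\r\nline 2\r\nline 3\r\n") and has_line3("line 1\nline 2\nline 3\n") both return true, while "line 30\n" is rejected because the line break must follow the 3 directly.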

B. In a Console App built in Visual Studio

If you feed a console app your string "line 1\nline 2\nline 3\n", then your original regex matches. However, it matches all three lines, for the reason mentioned above: the greedy [^\0]+ eats up all the characters, including the newlines, then backtracks until the rest of the pattern can match (the \n that ends the group, then line 3, then the final \n).

Here, if you only want line 3 and use [^\0], you can use this for instance:

^(?<=([^\0]+?\n){2})line 3\n
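
One caveat for the console-app case: the default ECMAScript grammar of std::regex has no lookbehind, so a pattern containing (?<= ... ) is likely to be rejected with a std::regex_error rather than matched. A lookbehind-free way to get at the third line is to consume the first two lines and capture what follows. This is only a sketch under that assumption; third_line is a hypothetical helper name, and it uses [^\n] rather than [^\0].

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text of the third line of `input`,
// or an empty string if there is no non-empty third line.
std::string third_line(const std::string& input)
{
    // Skip exactly two newline-terminated lines, then capture the next one.
    static const std::regex re(R"((?:[^\n]*\n){2}([^\n]+))");
    std::smatch m;
    // match_continuous pins the match to the start of the input, so the
    // result does not depend on how an implementation treats ^.
    if (!std::regex_search(input, m, re,
                           std::regex_constants::match_continuous))
        return std::string();
    return m[1].str();
}
```

With the string from the question, third_line("line 1\nline 2\nline 3\n") returns "line 3". Because the match is pinned to the start of the input, text that merely looks like "line 3" on a later line is not picked up.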

#2


I'm pretty sure the match[1] result I get in Visual Studio 2010 is due to a bug.

In Visual Studio 2012 and 2013 and gcc 4.9.0 (20140405) the code returns what I expect:

match.length(): 21
match[0]: "line 1
line 2
line 3
"
match[1]: "line 1
line 2
"
match[2]: "3"

Online regular expression testers RegExr and Regex Hero show the same thing.

In Visual Studio 2010, I can make the expression work properly by making it "lazy", that is, by adding a question mark after the plus sign: "^([^\\0]+?\\n)?line (3)\\n". (That's a string literal, so each backslash is escaped with a backslash.) Although it works now (though differently, because a lazy quantifier finds the closest match), I'm sure it's better to just use the latest Visual Studio.
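
To see how the greedy and lazy forms differ, it helps to use an input where "line 3" occurs twice. The sketch below assumes a current standard library that accepts [\s\S], which it uses in place of [^\0] (the question reports that Visual Studio 2010 did not accept it); first_group, greedy and lazy are hypothetical names for this illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text captured by group 1 of `pattern`
// in `input`, or an empty string if there is no match.
std::string first_group(const std::string& pattern, const std::string& input)
{
    const std::regex re(pattern);
    std::smatch m;
    if (!std::regex_search(input, m, re))
        return std::string();
    return m[1].str();
}

// Greedy: + takes as much as it can, then backtracks, so group 1 runs up
// to the last "line 3" in the input.
const char* const greedy = R"(^([\s\S]+\n)?line (3)\n)";

// Lazy: +? takes as little as it can, so group 1 stops before the first
// "line 3" in the input.
const char* const lazy = R"(^([\s\S]+?\n)?line (3)\n)";
```

For the input "a\nline 3\nb\nline 3\n", first_group(greedy, ...) returns "a\nline 3\nb\n" while first_group(lazy, ...) returns "a\n". For the question's input, where "line 3" occurs only once, a conforming engine gives "line 1\nline 2\n" for the greedy form.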

clang-503.0.40 has a different but related bug where it can't process "[^\0]*".
