I'm trying to understand regular expressions better. I'm using Visual Studio 2010, where you can't skip over newlines with [\s\S], so I've heard it's OK to use [^\0] instead. Take this expression as an example: I want to match a line, but only if it is line 3.
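#include <iostream>
#include <regex>
using namespace std;

int main()
{
    cmatch match; // match results for a C-string subject
    if (regex_search("line 1\nline 2\nline 3\n",
                     match,
                     regex("^([^\\0]+\\n)?line (3)\\n")))
    {
        cout << "match.length(): " << match.length() << endl;
        // print the whole match (index 0) and every capture group
        for (unsigned i = 0; i < match.size(); ++i)
        {
            cout << "match[" << i << "]: \"" << match[i] << "\"" << endl;
        }
    }
}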
Please note the above code won't work with gcc < 4.9 or ideone (since it uses gcc < 4.9).
In Visual Studio 2010 the code returns:
match.length(): 21
match[0]: "line 1
line 2
line 3
"
match[1]: "line 1
line 2
line 3
"
match[2]: "3"
I'm sure there are better ways to match lines, but my question is just: why did the match[1] group match the whole input? I figured the regex would read line 1\nline 2\n for match[1] and stop, since line 3 comes after it in the regex. Is there a word for this in regular expressions, or is it a bug?
Thanks and if you have edit powers you're welcome to edit this so it's easier to understand.
2 Answers
#1
1
For the record, this works in Visual Studio and finds the third line, returning "line 3":
^(?<=(?:[^\n]+\n){2})[^\n]+
As for your expression,
^([^\0]+\n)?line (3)\n
We have to decide whether you are trying to match in Visual Studio's Find function or in a console program built with Visual Studio. These are two very different cases.
A. In Visual Studio's Find Function
In Visual Studio's Find function, if you make a text file like this:
line 1
line 2
line 3
your regex will not match. Why? Because in a Visual Studio file, line 3 is not followed directly by \n. Instead, the line break is \r\n, which is the standard Windows line break.
Adding the \r fixes it:
^([^\0]+\n)?line (3)\r\n
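If you use the same expression from C++ on text that may carry either kind of line ending, one option is to make the carriage return optional with \r?. This is only a minimal sketch: the helper name finds_line3 is made up for illustration, and it assumes a standard library whose std::regex accepts [^\0] (as the question notes, gcc before 4.9 does not qualify).

```cpp
#include <regex>
#include <string>

// Hypothetical helper, not part of the original post: returns true when the
// expression finds "line 3" followed by either LF or CRLF.
bool finds_line3(const std::string& text)
{
    // \r? makes the carriage return optional, so both line endings are accepted.
    static const std::regex re("^([^\\0]+\\n)?line (3)\\r?\\n");
    return std::regex_search(text, re);
}
```

With this, both "line 1\r\nline 2\r\nline 3\r\n" and "line 1\nline 2\nline 3\n" are found, while text that has no line 3 is not.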
That being said, this regex matches a line 3 wherever it occurs, not only on the third line, for the simple reason that [^\0]+ eats up all the characters, including the newlines, then backtracks until it stands just before the newline that precedes line 3, at which stage the remaining tokens (\n, line 3 and the final line break) complete the match. If you want to keep [^\0] instead of [^\n], this is one way to match line 3:
^(?<=([^\0]+?\n){2})line 3\r\n
B. In a Console App built in Visual Studio
If you feed a console app your string "line 1\nline 2\nline 3\n", then your original regex matches. However, it matches all three lines, for the reason mentioned above: [^\0]+ eats up all the characters, including the newlines, then backtracks until it stands just before the newline that precedes line 3, at which stage the \n, line 3 and \n tokens complete the match.
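To see the result of that backtracking for yourself in a console program, you can dump every capture group. A minimal sketch follows; the helper name groups is hypothetical, and it assumes a conforming std::regex implementation.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns match[0], match[1], ... for the first match of
// `pattern` in `text`, or an empty vector when nothing matches.
std::vector<std::string> groups(const std::string& pattern, const std::string& text)
{
    std::smatch m;
    std::vector<std::string> out;
    if (std::regex_search(text, m, std::regex(pattern)))
    {
        for (const auto& sub : m)
            out.push_back(sub.str());
    }
    return out;
}
```

On a conforming implementation, groups("^([^\\0]+\\n)?line (3)\\n", "line 1\nline 2\nline 3\n") should give the whole input for match[0], "line 1\nline 2\n" for match[1] and "3" for match[2]. Feeding it "a\nb\nc\nd\nline 3\n" also matches, with match[1] holding the four lines in front, which shows the expression does not care which line number line 3 is on.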
Here, if you only want line 3 and want to use [^\0], you can use this, for instance:
^(?<=([^\0]+?\n){2})line 3\n
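One caveat for the console case: the ECMAScript grammar that std::regex uses by default has lookahead but no lookbehind, so a (?<=...) pattern needs an engine that supports it, such as .NET. If the goal is simply "give me the third line" with std::regex, one sketch that avoids lookbehind is to skip two newline-terminated lines and capture the next one. The helper name third_line and the choice of [^\n] over [^\0] are my assumptions, not part of the expressions above.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: stores the third line of `text` in `out` and returns
// true, or returns false when there is no non-empty third line.
bool third_line(const std::string& text, std::string& out)
{
    // Skip exactly two newline-terminated lines, then capture the next one.
    static const std::regex re("(?:[^\\n]*\\n){2}([^\\n]+)");
    std::smatch m;
    // match_continuous pins the match to the very start of `text`.
    if (!std::regex_search(text, m, re, std::regex_constants::match_continuous))
        return false;
    out = m[1].str();
    return true;
}
```

Because [^\n] cannot cross a line break, the two skipped lines really are lines 1 and 2, so the capture is tied to the line number rather than to the text "line 3".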
#2
1
I'm pretty sure the match[1] result I get in Visual Studio 2010 is due to a bug.
In Visual Studio 2012 and 2013 and gcc 4.9.0 (20140405) the code returns what I expect:
match.length(): 21
match[0]: "line 1
line 2
line 3
"
match[1]: "line 1
line 2
"
match[2]: "3"
Online regular expression testers RegExr and Regex Hero show the same thing.
In Visual Studio 2010, I can make the expression work properly by making it "lazy", that is, by adding a question mark after the plus sign: "^([^\\0]+?\\n)?line (3)\\n". (That's a string literal, so each backslash is escaped with a backslash.) It works now, though differently: being lazy, it finds the closest match. Still, I'm sure it's better to just use the latest Visual Studio.
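The difference between the greedy and the lazy version only shows up when line 3 occurs more than once. Here is a small sketch to compare them; the helper name group1 and the sample text with two line 3 entries are made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns what capture group 1 holds for the first match
// of `pattern` in `text` (an empty string if there is no match).
std::string group1(const std::string& pattern, const std::string& text)
{
    std::smatch m;
    if (!std::regex_search(text, m, std::regex(pattern)))
        return std::string();
    return m[1].str();
}
```

With the text "line 1\nline 3\nline 2\nline 3\n", the greedy pattern "^([^\\0]+\\n)?line (3)\\n" should leave "line 1\nline 3\nline 2\n" in group 1, because it backtracks from the end and stops at the last line 3. The lazy pattern "^([^\\0]+?\\n)?line (3)\\n" should leave just "line 1\n", because it grows from the start and stops at the first line 3.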
clang-503.0.40 has a different but related bug where it can't process "[^\0]*".