Regex to extract consecutive single-line comments when they contain a specific string

Time: 2022-09-13 16:14:50

Consider an SQL file as follows that contains a number of single-line comments:

-- I'm a little teapot
<<< not a comment >>>
-- some random junk
-- random Mary had a
-- little lamb random
-- more random junk
<<< not a comment >>>

Using regex, I want to match the string Mary.*?lamb and extract all the consecutive single-line comments directly above and below it.

The expected output would be:

-- some random junk
-- random Mary had a
-- little lamb random
-- more random junk

I was trying something along these lines but had no luck.

(--[\S\t\x20]*\n)*?(--[\S\t\x20]*?Mary.*?lamb[\S\t\x20]*?\n)(--[\S\t\x20]*\n)*

1 solution

#1


Maybe you can try something like this:

^((?:--(?:(?!Mary)[^\n])*[\r\n]{1,2})*)(--[^\n]+?Mary[\s\S]+?lamb[^\n]+[\r\n]{1,2})((?:--(?:(?!Mary)[^\n])*[\r\n]{1,2})*)

regex101 demo

And since it's Java, you will have to double the backslashes in the regex above when you write it as a string literal:

^((?:--(?:(?!Mary)[^\\n])*[\\r\\n]{1,2})*)(--[^\\n]+?Mary[\\s\\S]+?lamb[^\\n]+[\\r\\n]{1,2})((?:--(?:(?!Mary)[^\\n])*[\\r\\n]{1,2})*)

I'm using [\\r\\n]{1,2} because I can't be sure whether the file uses \n, \r, or \r\n as its line ending, and this works in all three cases. It can also match two newlines in a row, but since the next line still has to start with --, that is fine.
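
For reference, here is a minimal sketch of how the escaped pattern might be used from Java. The class name CommentBlockExtractor and the extract method are made up for illustration. Pattern.MULTILINE is my own assumption: without it, ^ only matches at the very start of the input, and the sample file would not match because its first line is an unrelated comment. Note also that the pattern as written expects a line terminator after the last comment line and at least one character after lamb on its line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class CommentBlockExtractor {

    // The regex from the answer, with backslashes doubled for a Java string literal.
    // MULTILINE is an assumption so that ^ can match at the start of any line.
    static final Pattern BLOCK = Pattern.compile(
          "^((?:--(?:(?!Mary)[^\\n])*[\\r\\n]{1,2})*)"            // comment lines above (no "Mary")
        + "(--[^\\n]+?Mary[\\s\\S]+?lamb[^\\n]+[\\r\\n]{1,2})"    // the line(s) holding Mary ... lamb
        + "((?:--(?:(?!Mary)[^\\n])*[\\r\\n]{1,2})*)",            // comment lines below (no "Mary")
        Pattern.MULTILINE);

    // Returns the first matching comment block, or null if there is none.
    static String extract(String sql) {
        Matcher m = BLOCK.matcher(sql);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String sql =
              "-- I'm a little teapot\n"
            + "<<< not a comment >>>\n"
            + "-- some random junk\n"
            + "-- random Mary had a\n"
            + "-- little lamb random\n"
            + "-- more random junk\n"
            + "<<< not a comment >>>\n";

        // Prints the four consecutive comment lines around "Mary ... lamb".
        System.out.print(extract(sql));
    }
}
```

Group 1 holds the comment lines above, group 2 the lines containing Mary ... lamb, and group 3 the comment lines below, so m.group(1), m.group(2) and m.group(3) can be read separately if you need the pieces rather than the whole block.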
