Non-capturing group in Java RegEx

Time: 2021-06-21 22:33:13

I have written some code, but it doesn't work correctly. Below you can find my RegEx, my input, and the output I expect. I am using a non-capturing group because I want to read the text until I reach the word "Bundle", but I don't want to include that word in the captured text. I don't know what I have done wrong that keeps it from working.

Here is my code:

Pattern pattern = Pattern.compile(
        "((Bundle\\s+Components)|(Included\\s+Components))\\s+(.*?)(?:Bundle)", Pattern.DOTALL);

Matcher matcher = pattern.matcher(tableInformation);

while (matcher.find()) {
    String bundleComponents = matcher.group();
    System.out.println(bundleComponents);
}

Here are the examples: Example 1:

Bundle Components bla blah\blabla?!()\\ANY CHARACTER IS POSSIBLE HERE, EVEN LINEBREAK,blah blah
Bundle Type

Example 2:

 Included Components
    blah blah, like above,
    Bundle Type

Output I expect for Ex. 1:

Bundle Components bla blah\blabla?!()\\ANY CHARACTER IS POSSIBLE HERE, EVEN LINEBREAK,blah blah

Output I expect for Ex. 2:

Included Components
blah blah, like above,

What I get as the output for Ex. 1:

 Bundle Components bla blah\blabla?!()\\ANY CHARACTER IS POSSIBLE HERE, EVEN LINEBREAK,blah blah
    Bundle Type

What I get as the output for Ex. 2:

Included Components
blah blah, like above,
Bundle Type

2 Solutions

#1


1  

The full match (what matcher.group() returns) contains everything the regex matched, including text matched by non-capturing groups. A non-capturing group only means that no numbered group is created for it; the text is still consumed. To leave "Bundle" out, read the appropriate capturing group instead of the full match. The other solution is to use a positive lookahead instead of the non-capturing group. Check the regex below. I also removed some unnecessary (IMO) groups.

(?:Bundle\s+Components|Included\s+Components)\s+.*?(?=Bundle)

It produces only one result: the full match.

Demo

PS: With this solution, the line break just before "Bundle" is included in the match as well.
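
A minimal Java sketch of the lookahead approach, using the pattern from this answer with backslashes doubled for a Java string literal. The class name LookaheadDemo and the method extractSections are hypothetical, and the trim() call is an addition that is not part of the answer; it is there only to drop the trailing line break mentioned in the PS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original question or answer.
class LookaheadDemo {
    // (?=Bundle) asserts that "Bundle" follows, but does not consume it,
    // so it never becomes part of the full match.
    private static final Pattern SECTION = Pattern.compile(
            "(?:Bundle\\s+Components|Included\\s+Components)\\s+.*?(?=Bundle)",
            Pattern.DOTALL);

    static List<String> extractSections(String text) {
        List<String> sections = new ArrayList<>();
        Matcher matcher = SECTION.matcher(text);
        while (matcher.find()) {
            // group() is the full match; trim() (an assumption, see above)
            // removes the whitespace left in front of the next "Bundle".
            sections.add(matcher.group().trim());
        }
        return sections;
    }

    public static void main(String[] args) {
        String input = "Included Components\n    blah blah, like above,\n    Bundle Type";
        for (String section : extractSections(input)) {
            System.out.println(section);
        }
    }
}
```

With the input from Example 2, the returned section ends at "like above," and no longer contains "Bundle Type".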

#2


1  

You can do this with a positive lookahead, because the pattern inside the lookahead group is not included in the match:

((?:Bundle\\s+Components)|(?:Included\\s+Components))\\s+(.*?)(?=Bundle)

(not tested)
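
A small sketch that exercises this pattern and reads its two capturing groups separately: group(1) is the heading and group(2) is the text up to "Bundle". The class name CaptureGroupDemo and the method extractParts are hypothetical, and trim() is an added assumption to drop the whitespace that precedes "Bundle".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original question or answer.
class CaptureGroupDemo {
    // group(1): the heading; group(2): the text before "Bundle".
    // The (?:...) groups inside group(1) are not numbered.
    private static final Pattern SECTION = Pattern.compile(
            "((?:Bundle\\s+Components)|(?:Included\\s+Components))\\s+(.*?)(?=Bundle)",
            Pattern.DOTALL);

    // Returns one {heading, body} pair per match.
    static List<String[]> extractParts(String text) {
        List<String[]> parts = new ArrayList<>();
        Matcher matcher = SECTION.matcher(text);
        while (matcher.find()) {
            String heading = matcher.group(1);
            String body = matcher.group(2).trim(); // trim() is an assumption
            parts.add(new String[] {heading, body});
        }
        return parts;
    }

    public static void main(String[] args) {
        String input = "Included Components\n    blah blah, like above,\n    Bundle Type";
        for (String[] part : extractParts(input)) {
            System.out.println("heading: " + part[0]);
            System.out.println("body:    " + part[1]);
        }
    }
}
```

Reading numbered groups this way also works without a lookahead, since text matched outside the capturing groups never appears in group(1) or group(2).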
