Confused about Matcher groups in Java regex

I have the following line:

String typeName = "ABC:xxxxx;";

I need to fetch the word ABC.

I wrote the following code snippet:

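import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Capture everything before the colon in group 1.
Pattern pattern4 = Pattern.compile("(.*):");
Matcher matcher = pattern4.matcher(typeName);

String nameStr = "";
if (matcher.find()) {
    nameStr = matcher.group(1);
}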

If I use group(0) I get ABC:, but if I use group(1) I get ABC, so I want to know:

What do the 0 and 1 mean? It would be great if someone could explain this with good examples.

The regex pattern contains a :, so why does the group(1) result omit it? Does group 1 capture everything matched inside the parentheses?

So, if I use two pairs of parentheses, such as \\s*(\d*)(.*): will there be two groups? Will group(1) return the (\d*) part and group(2) return the (.*) part?

The code snippet is only meant to illustrate my confusion. It is not the code I am actually dealing with. The example above could be done much more easily with String.split().

3 Solutions

#1

Capturing and grouping

A capturing group (pattern) creates a group that has the capturing property.

A related construct that you will often see (and use) is (?:pattern), which creates a group without the capturing property, hence the name non-capturing group.

A group is usually used when you need to repeat a sequence of patterns, e.g. (\.\w+)+, or to specify where alternation should take effect, e.g. ^(0*1|1*0)$ (^, then 0*1 or 1*0, then $) versus ^0*1|1*0$ (^0*1 or 1*0$).

A capturing group, apart from grouping, also records the text matched by the pattern inside it. Using your example, (.*):, .* matches ABC and : matches :, and since .* is inside the capturing group (.*), the text ABC is recorded for capturing group 1.
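
To make the difference concrete, here is a minimal sketch that runs the pattern from the question and prints group 0 and group 1. The class name CaptureDemo and the helper firstGroup are made up for illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureDemo {
    // Returns the text of the requested group from the first match,
    // or null if the regex does not match anywhere in the input.
    static String firstGroup(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String typeName = "ABC:xxxxx;";

        // Group 0 is the whole match, including the colon.
        System.out.println(firstGroup("(.*):", typeName, 0)); // ABC:

        // Group 1 is only what .* matched inside the parentheses.
        System.out.println(firstGroup("(.*):", typeName, 1)); // ABC

        // A non-capturing group still groups, but records nothing,
        // so the pattern has no capturing groups at all.
        System.out.println(Pattern.compile("(?:.*):").matcher(typeName).groupCount()); // 0
    }
}
```

The colon is part of the match (group 0) but sits outside the parentheses, which is why group 1 does not contain it.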

Group number

The whole pattern is defined to be group number 0, so group(0) returns the entire match.

Capturing groups in the pattern are numbered starting from 1. The numbers are assigned in the order of the opening parentheses of the capturing groups. As an example, here are all 5 capturing groups in the pattern below:

(group)(?:non-capturing-group)(g(?:ro|u)p( (nested)inside)(another)group)(?=assertion)
|     |                       |          | |      |      ||       |     |
1-----1                       |          | 4------4      |5-------5     |
                              |          3---------------3              |
                              2-----------------------------------------2
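
The same numbering rule can be checked on a smaller pattern. The sketch below is only an illustration: the class GroupNumberDemo, the helper groups, and the pattern ((a)(?:x)(b(c))) are invented for this example and are not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberDemo {
    // Returns groups 0..groupCount() for a full match,
    // or an empty array if the input does not match the regex.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // Opening parentheses, left to right:
        //   1 = ((a)(?:x)(b(c)))   2 = (a)   3 = (b(c))   4 = (c)
        // (?:x) groups its content but is skipped by the numbering.
        String[] g = groups("((a)(?:x)(b(c)))", "axbc");
        for (int i = 0; i < g.length; i++) {
            System.out.println("group(" + i + "): " + g[i]);
        }
        // group(0): axbc
        // group(1): axbc
        // group(2): a
        // group(3): bc
        // group(4): c
    }
}
```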

The group numbers are used in back-references \n in the pattern and $n in the replacement string.

In other regex flavors (PCRE, Perl), they can also be used in subroutine calls.

You can access the text matched by a given group with Matcher.group(int group). The group numbers can be determined by the rule stated above.

In some regex flavors (PCRE, Perl), there is a branch reset feature, which allows you to use the same number for capturing groups in different branches of an alternation.
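
As a rough illustration of \n and $n in Java, the sketch below uses String.replaceAll, which accepts the same regex and replacement syntax. The class BackReferenceDemo, both helper methods, and the sample strings are assumptions made up for this example.

```java
class BackReferenceDemo {
    // \1 in the pattern must match the same text that group 1 captured,
    // so this collapses a word that is immediately repeated.
    static String collapseRepeatedWord(String input) {
        return input.replaceAll("\\b(\\w+) \\1\\b", "$1");
    }

    // $1 and $2 in the replacement string insert the captured text,
    // so this swaps the two sides of a key=value pair.
    static String swapPair(String input) {
        return input.replaceAll("(\\w+)=(\\w+)", "$2=$1");
    }

    public static void main(String[] args) {
        System.out.println(collapseRepeatedWord("hello hello world")); // hello world
        System.out.println(swapPair("key=value"));                     // value=key
    }
}
```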

Group name

Since Java 7, you can define a named capturing group (?<name>pattern), and you can access the matched content with Matcher.group(String name). The regex is longer, but the code is more meaningful, since it indicates what you are trying to match or extract with the regex.

The group names are used in back-references \k<name> in the pattern and ${name} in the replacement string.

Named capturing groups are still numbered with the same numbering scheme, so they can also be accessed via Matcher.group(int group).

Internally, Java's implementation just maps from the name to the group number. Therefore, you cannot use the same name for 2 different capturing groups.
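
A short sketch of named groups, applied to the string from the question. The group names type and rest, the class NamedGroupDemo, and the helper failsToCompile are hypothetical names chosen for this example; it assumes Java 7 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class NamedGroupDemo {
    static final Pattern TYPE = Pattern.compile("(?<type>[A-Z]+):(?<rest>.*)");

    // True if the regex is rejected when it is compiled.
    static boolean failsToCompile(String regex) {
        try {
            Pattern.compile(regex);
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Matcher m = TYPE.matcher("ABC:xxxxx;");
        if (m.matches()) {
            System.out.println(m.group("type")); // ABC
            System.out.println(m.group(1));      // ABC (same group, accessed by number)
            System.out.println(m.group("rest")); // xxxxx;
        }

        // ${name} in a replacement string refers to the named group.
        System.out.println("ABC:xxxxx;".replaceAll("(?<type>[A-Z]+):", "${type}=")); // ABC=xxxxx;

        // \k<name> in the pattern is a back-reference to the named group.
        System.out.println("abab".matches("(?<pair>ab)\\k<pair>")); // true

        // Reusing a name for a second capturing group is a syntax error in Java.
        System.out.println(failsToCompile("(?<a>x)(?<a>y)")); // true
    }
}
```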

#2

For The Rest Of Us

Here is a simple and clear example of how this works:

Regex: ([a-zA-Z0-9]+)([\s]+)([a-zA-Z ]+)([\s]+)([0-9]+)

String: "!* UserName10 John Smith 01123 *!"

group(0): UserName10 John Smith 01123
group(1): UserName10
group(2):  
group(3): John Smith
group(4):  
group(5): 01123

As you can see, I have created FIVE groups, each enclosed in parentheses.

I included the !* and *! on either side to make it clearer. Note that none of those characters are in the regex, so they do not appear in the results. group(0) simply gives you the entire matched string (everything matched by all of my search criteria together). Group 1 stops right before the first space because the space character is not included in its character class. Groups 2 and 4 are simply the whitespace, which in this case is literally a single space character, but could also be a tab, a line feed, etc. Group 3 includes the space because I put it in its character class.
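
The listing above can be reproduced with a few lines of Java. This is a sketch only: the class UserLineDemo and the helper groups are invented names, and the brackets in the printed output are added so the single-space groups are visible.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UserLineDemo {
    static final Pattern LINE =
            Pattern.compile("([a-zA-Z0-9]+)([\\s]+)([a-zA-Z ]+)([\\s]+)([0-9]+)");

    // Returns groups 0..5 of the first match, or an empty array if nothing matches.
    static String[] groups(String input) {
        Matcher m = LINE.matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] g = groups("!* UserName10 John Smith 01123 *!");
        for (int i = 0; i < g.length; i++) {
            System.out.println("group(" + i + "): [" + g[i] + "]");
        }
        // group(0): [UserName10 John Smith 01123]
        // group(1): [UserName10]
        // group(2): [ ]
        // group(3): [John Smith]
        // group(4): [ ]
        // group(5): [01123]
    }
}
```

Group 3 first grabs the trailing space as well, then gives it back so that group 4, which needs at least one whitespace character, can match.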

Hope this makes sense.

#3

Parentheses () are used to group regex phrases.

group(1) contains the string matched by the part between the parentheses, which is .* in this case.

group(0) contains the whole matched string.

If you had more groups (that is, more (...) pairs), their matches would be placed in the groups with the next indexes (2, 3 and so on).
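
For instance, the two-group pattern mentioned in the question behaves as follows. This is a minimal sketch: the class TwoGroupsDemo, the helper firstMatch, and the input " 42ABC:xxxxx;" are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoGroupsDemo {
    static final Pattern P = Pattern.compile("\\s*(\\d*)(.*):");

    // Returns groups 0..2 of the first match, or an empty array if nothing matches.
    static String[] firstMatch(String input) {
        Matcher m = P.matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        return new String[] { m.group(0), m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] g = firstMatch(" 42ABC:xxxxx;");
        System.out.println("[" + g[0] + "]"); // [ 42ABC:]
        System.out.println("[" + g[1] + "]"); // [42]
        System.out.println("[" + g[2] + "]"); // [ABC]

        // With no digits in the input, group 1 matches the empty string.
        String[] h = firstMatch("ABC:xxxxx;");
        System.out.println("[" + h[1] + "]"); // []
        System.out.println("[" + h[2] + "]"); // [ABC]
    }
}
```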
