Java regex equivalent of the PCRE (etc.) shorthand `\K`?

Time: 2022-05-11 15:45:21

Perl regex and PCRE (Perl-Compatible Regular Expressions), among others, have the shorthand \K, which discards everything matched to its left from the reported match (capturing groups are unaffected). Java doesn't support it, so what is Java's equivalent?

1 answer

#1


There is no direct equivalent. However, you can always re-write such patterns using capturing groups.

If you take a closer look at the \K operator and its limitations, you will see that you can replace it with capturing groups.

See the rexegg.com \K reference:

In the middle of a pattern, \K says "reset the beginning of the reported match to this point". Anything that was matched before the \K goes unreported, a bit like in a lookbehind.

The key difference between \K and a lookbehind is that in PCRE, a lookbehind does not allow you to use quantifiers: the length of what you look for must be fixed. On the other hand, \K can be dropped anywhere in a pattern, so you are free to have any quantifiers you like before the \K.

However, all this means that the pattern before \K is still a consuming pattern: the regex engine adds the matched text to the match value and advances its index while matching, and \K merely drops the text matched so far from the reported match while keeping the index where it is. This means that \K is no better than capturing groups.
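
To make the "consuming" point concrete, here is a minimal sketch comparing a consuming prefix with a lookbehind on the same input. The class name `ConsumingPrefixDemo` and the helper `findAll` are made up for this illustration; the pattern `\d(\d)` stands in for what `\d\K\d` would do in PCRE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConsumingPrefixDemo {
    // Hypothetical helper: collects the given group (0 = whole match) from every successive match.
    static List<String> findAll(String regex, int group, String input) {
        List<String> results = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            results.add(m.group(group));
        }
        return results;
    }

    public static void main(String[] args) {
        String input = "1234";
        // Consuming prefix (the Java stand-in for \d\K\d): each match uses up two digits.
        System.out.println(findAll("\\d(\\d)", 1, input));    // [2, 4]
        // Lookbehind: the preceding digit is only checked, never consumed.
        System.out.println(findAll("(?<=\\d)\\d", 0, input)); // [2, 3, 4]
    }
}
```

Because the prefix is consumed, the digit `3` can never be reported by the first pattern: it has already been used as the prefix of the second match.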

So, a PCRE/Onigmo pattern such as `value\s*=\s*\K\d+` would translate into this Java code:

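    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String s = "Min value = 5000 km";
    // Capture the digits in group 1 instead of resetting the match start with \K
    Matcher m = Pattern.compile("value\\s*=\\s*(\\d+)").matcher(s);
    if (m.find()) {
        System.out.println(m.group(1)); // prints 5000
    }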
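
\K is often used in replacements and to get the position of the "kept" part of the match. As a rough sketch of how both could be covered with groups in Java: the class `KeepOutReplaceDemo`, its two helpers, and the group name `prefix` are all invented for this example, while `replaceFirst`, `Matcher.quoteReplacement`, `start(int)` and `end(int)` are standard `java.util.regex` calls.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeepOutReplaceDemo {
    // Hypothetical helper: replaces only the number after "value =", leaving the prefix intact.
    static String replaceValue(String input, String newValue) {
        Pattern p = Pattern.compile("(?<prefix>value\\s*=\\s*)\\d+");
        // ${prefix} puts back the text that \K would have kept out of the match.
        return p.matcher(input).replaceFirst("${prefix}" + Matcher.quoteReplacement(newValue));
    }

    // Hypothetical helper: returns {start, end} of the digits (the span a \K match
    // would report), or null when there is no match.
    static int[] valueSpan(String input) {
        Matcher m = Pattern.compile("value\\s*=\\s*(\\d+)").matcher(input);
        return m.find() ? new int[] { m.start(1), m.end(1) } : null;
    }

    public static void main(String[] args) {
        String s = "Min value = 5000 km";
        System.out.println(replaceValue(s, "42"));    // Min value = 42 km
        int[] span = valueSpan(s);
        System.out.println(span[0] + ".." + span[1]); // 12..16
    }
}
```

A named group is used in the replacement so that a digit-leading replacement value cannot be misread as part of a numeric group reference.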

There is an alternative, but it can only be used with smaller, simpler patterns, namely a constrained-width lookbehind:

Java accepts quantifiers within lookbehind, as long as the length of the matching strings falls within a pre-determined range. For instance, (?<=cats?) is valid because it can only match strings of three or four characters. Likewise, (?<=A{1,10}) is valid.
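
As a small sketch of the two lookbehinds mentioned above (the class `BoundedLookbehindDemo`, its helper methods, and the sample sentences are assumptions made for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundedLookbehindDemo {
    // Hypothetical helper: returns the word that follows "cat" or "cats".
    static List<String> wordsAfterCat(String input) {
        List<String> words = new ArrayList<>();
        // (?<=cats?) only ever looks back 3 or 4 characters, so Java accepts it.
        Matcher m = Pattern.compile("(?<=cats?)\\s+(\\w+)").matcher(input);
        while (m.find()) {
            words.add(m.group(1));
        }
        return words;
    }

    // Hypothetical helper: true if a 'B' is preceded by one to ten 'A's.
    static boolean hasBAfterAs(String input) {
        return Pattern.compile("(?<=A{1,10})B").matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wordsAfterCat("my cats sleep; a cat purrs")); // [sleep, purrs]
        System.out.println(hasBAfterAs("AAAB")); // true
        System.out.println(hasBAfterAs("B"));    // false
    }
}
```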

So, this will also work:

    // Bounded quantifiers ({0,10}) give the lookbehind a known maximum width
    m = Pattern.compile("(?<=value\\s{0,10}=\\s{0,10})\\d+").matcher(s);
    if (m.find()) {
        System.out.println(m.group()); // prints 5000
    }
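
The price of the `{0,10}` bounds is that input with wider spacing is silently missed, whereas the capturing-group version has no such limit. A minimal sketch of that difference follows; the class `LookbehindLimitDemo`, its helpers, and the 12-space test string are assumptions for this example.

```java
import java.util.Collections;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindLimitDemo {
    static final Pattern GROUP = Pattern.compile("value\\s*=\\s*(\\d+)");
    static final Pattern LOOKBEHIND = Pattern.compile("(?<=value\\s{0,10}=\\s{0,10})\\d+");

    // Returns the number found via the capturing group, or null if nothing matches.
    static String viaGroup(String input) {
        Matcher m = GROUP.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    // Returns the number found via the bounded lookbehind, or null if nothing matches.
    static String viaLookbehind(String input) {
        Matcher m = LOOKBEHIND.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String narrow = "Min value = 5000 km";
        // 12 spaces before '=', more than the lookbehind's {0,10} allows
        String wide = "value" + String.join("", Collections.nCopies(12, " ")) + "= 7";

        System.out.println(viaGroup(narrow));      // 5000
        System.out.println(viaLookbehind(narrow)); // 5000
        System.out.println(viaGroup(wide));        // 7
        System.out.println(viaLookbehind(wide));   // null
    }
}
```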

See the Java demo.
