Is there a regex that keeps matching for as long as possible?

Time: 2021-09-07 21:39:51

Is there a convenient way to write a regex that will try to match as much of the regex as possible?

Example:

my $re = qr/a ([a-z]+) (\d+)/;

match_longest($re, "a") => ()
match_longest($re, "a word") => ("word")
match_longest($re, "a word 123") => ("word", "123")
match_longest($re, "a 123") => ()

That is, $re is treated as a sequence of regular expressions, and match_longest attempts to match as much of this sequence as possible. In a sense, matching never fails; it is only a question of how much of the sequence matched. Once one regex in the sequence fails to match, the parts that did not match yield undef.

I know I could write a function which takes a sequence of regexes and creates a single regex to do the job of match_longest. Here's an outline of the idea:

Suppose you have three regexes: $r1, $r2 and $r3. The single regex to perform the job of match_longest would have the following structure:

$r = $r1 $r2 $r3 | $r1 $r2 | $r1

Unfortunately, this is quadratic in the number of regexes. Is it possible to be more efficient?

3 Answers

#1


5  

You can use the regex

$r = ($r1 ($r2 ($r3)?)?)?

which contains each regex only once. You may also want to use non-capturing groups (?:...) for the wrappers, so that they do not shift the numbering of the capture groups in your original regular expressions.
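
A minimal sketch of how the nested form could be built for any number of parts, assuming the pattern from the question is split into four pieces ("a ", the word, the separating space, the digits). The helper name build_nested and this body for match_longest are hypothetical, not an existing API, and anchoring at the start of the string with ^ is also an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: nest the parts as (?:p1(?:p2(?:p3)?)?)? so that each
# part appears exactly once and the wrappers add no capture groups.
sub build_nested {
    my @parts = @_;
    my $nested = '';
    for my $part (reverse @parts) {
        $nested = "(?:$part$nested)?";
    }
    return qr/^$nested/;
}

# Hypothetical match_longest: return only the captures that took part in
# the match. Assumes the pattern contains at least one capture group.
sub match_longest {
    my ($re, $text) = @_;
    my @captures = $text =~ $re;    # non-participating groups are undef
    return grep { defined } @captures;
}

# The question's pattern, split into its sequence of parts (an assumption).
my $re = build_nested(qr/a /, qr/([a-z]+)/, qr/ /, qr/(\d+)/);

my @none = match_longest($re, "a");            # ()
my @word = match_longest($re, "a word");       # ("word")
my @both = match_longest($re, "a word 123");   # ("word", "123")
my @skip = match_longest($re, "a 123");        # ()
```

Because every wrapper group is optional, the match itself always succeeds; only the number of defined captures changes. Each part appears once, so the pattern grows linearly with the number of parts.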

#2


2  

If I understand the question, using nested groups with ? should work:

my $re = qr/a ((\w+) (\d+)?)?/;

#3


0  

This particular case can be written like this:

m/a (?:(\w+)(?: (\d+))?)?/
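
As a rough check, the pattern above can be run against the inputs from the question. The helper captures_of and the [a-z]+ variant are assumptions made for this sketch. The variant is included because \w also matches digits, so with \w+ the input "a 123" captures "123" as the word, whereas the question expects no captures for that input.

```perl
use strict;
use warnings;

# The pattern from this answer, plus a variant (an assumption of this
# sketch) that uses the question's [a-z]+ so digits cannot be the word.
my $re_w  = qr/a (?:(\w+)(?: (\d+))?)?/;
my $re_az = qr/a (?:([a-z]+)(?: (\d+))?)?/;

# Hypothetical helper: list the captures that took part in the match.
sub captures_of {
    my ($re, $text) = @_;
    return grep { defined } ($text =~ $re);
}

for my $text ("a", "a word", "a word 123", "a 123") {
    my @w  = captures_of($re_w,  $text);
    my @az = captures_of($re_az, $text);
    print "$text => \\w+: (@w)  [a-z]+: (@az)\n";
}
```

With the [a-z]+ variant the four inputs give (), ("word"), ("word", "123") and (), which is what the question asks for. With \w+ only the last input differs, giving ("123").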
