How can I capture the matched group of a regex alternation with split?

Time: 2021-08-30 21:40:16

I have a string

my $foo = 'one#two#three!four#five#six';

from which I want to extract the parts that are separated by either a # or a !. This is easy enough with split:

my @parts = split /#|!/, $foo;
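For reference, a minimal runnable sketch of this first step. It only reuses the sample string `$foo` from the question and prints the fields that `split` returns:

```perl
use strict;
use warnings;

my $foo = 'one#two#three!four#five#six';

# Split on either separator; the separators themselves are discarded
my @parts = split /#|!/, $foo;

print join(',', @parts), "\n";    # prints "one,two,three,four,five,six"
```

The character class `/[#!]/` matches the same single characters and is an equivalent way to write this pattern.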

An additional requirement is that I also need to capture the exclamation marks. So I tried

my @parts = split /#|(!)/, $foo;

This, however, returns an extra element for every separator: undef where a # was matched, and the exclamation mark where a ! was matched (which is also clearly stated in the specification of split).
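To make the extra elements visible, here is a small sketch that runs the capturing `split` on the question's `$foo` and prints undef explicitly (the `'undef'` placeholder string is only for display):

```perl
use strict;
use warnings;

my $foo = 'one#two#three!four#five#six';

# The capture group adds one extra field per separator:
# '!' when the (!) branch matched, undef when the '#' branch matched
my @parts = split /#|(!)/, $foo;

# Print undef as a visible placeholder instead of triggering warnings
print join(',', map { defined($_) ? $_ : 'undef' } @parts), "\n";
# prints "one,undef,two,undef,three,!,four,undef,five,undef,six"
```

Five separators produce five extra elements, so the list has 11 entries instead of 6.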

So, I weed out the unwanted undef values with grep:

my @parts = grep { defined } split /#|(!)/, $foo;

This does what I want.
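A runnable version of the `grep` workaround, again using only `$foo` from the question, shows the resulting list:

```perl
use strict;
use warnings;

my $foo = 'one#two#three!four#five#six';

# Drop the undef placeholders produced whenever the (!) group did not match
my @parts = grep { defined } split /#|(!)/, $foo;

print join(',', @parts), "\n";    # prints "one,two,three,!,four,five,six"
```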

Yet I was wondering if I can change the regular expression in a way so that I don't have to also invoke grep.

2 solutions

#1

When you use split, you cannot skip the empty captures once a match is found: each match always yields as many captured values as there are capturing groups in the regular expression, and a group that did not take part in the match yields undef. You may use a matching approach here, though:

my @parts = $foo =~ /[^!#]+|!/g;

This way, you repeatedly (/g) match either one or more characters other than ! and # (the [^!#]+ alternative) or a single exclamation mark.
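A minimal sketch of the matching approach, reusing `$foo` from the question. Because the pattern has no capturing groups, `m//g` in list context returns every whole matched substring:

```perl
use strict;
use warnings;

my $foo = 'one#two#three!four#five#six';

# Each match is either a run of characters that are neither '!' nor '#',
# or a single '!'; the '#' characters are never matched and so are skipped
my @parts = $foo =~ /[^!#]+|!/g;

print join(',', @parts), "\n";    # prints "one,two,three,!,four,five,six"
```

One behavioural difference to keep in mind: consecutive separators such as `##` yield an empty field with `split` (which `grep { defined }` keeps, since an empty string is defined), whereas this pattern simply produces no element for them.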

#2

Use "empty string followed by an exclamation mark or empty string preceded by an exclamation mark" in place of your second alternative:

my @parts = split /#|(?=!)|(?<=!)/, $foo;

Demo: https://ideone.com/6pA1wx
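The lookaround version can be checked the same way. This sketch reuses `$foo` from the question; since the pattern contains no capturing groups, `split` adds no undef fields, and the zero-width matches on either side of each `!` leave it as a field of its own:

```perl
use strict;
use warnings;

my $foo = 'one#two#three!four#five#six';

# '#' is consumed as a separator; (?=!) matches the empty string just
# before a '!' and (?<=!) the empty string just after it
my @parts = split /#|(?=!)|(?<=!)/, $foo;

print join(',', @parts), "\n";    # prints "one,two,three,!,four,five,six"
```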
