Is there a Perl equivalent of Python's re.findall / re.finditer (iterating over regex results)?

Date: 2021-04-18 22:33:35

In Python, compiled regex patterns have a findall method that does the following:

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
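
To make the three return shapes described above concrete, here is a small sketch. The sample sentence is borrowed from one of the answers below; the variable names are only illustrative.

```python
import re

text = "I had a sandwich for lunch"

# No groups: a list of the full matched strings.
no_groups = re.findall(r"[aeiou][nrs]", text)

# One group: a list containing that group's text for each match.
one_group = re.findall(r"([aeiou])[nrs]", text)

# More than one group: a list of tuples, one tuple per match.
two_groups = re.findall(r"([aeiou])([nrs])", text)

print(no_groups)   # ['an', 'or', 'un']
print(one_group)   # ['a', 'o', 'u']
print(two_groups)  # [('a', 'n'), ('o', 'r'), ('u', 'n')]
```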

What's the canonical way of doing this in Perl? A naive algorithm I can think of is along the lines of "while a search and replace with the empty string is successful, do [suite]". I'm hoping there's a nicer way. :-)

Thanks in advance!

3 Answers

#1


13  

Use the /g modifier in your match. From the perlop manual:

The "/g" modifier specifies global pattern matching--that is, matching as many times as possible within the string. How it behaves depends on the context. In list context, it returns a list of the substrings matched by any capturing parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern.

In scalar context, each execution of "m//g" finds the next match, returning true if it matches, and false if there is no further match. The position after the last match can be read or set using the pos() function; see "pos" in perlfunc. A failed match normally resets the search position to the beginning of the string, but you can avoid that by adding the "/c" modifier (e.g. "m//gc"). Modifying the target string also resets the search position.
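
For readers coming from Python, a rough analogue of this "resume from the last position" behaviour is to pass a start offset to a compiled pattern's search method. This is only a sketch for comparison, not a description of how Perl works internally; the names vowel, pos and found are made up for the example, and the pattern is chosen so that it can never match the empty string (an empty match would need extra handling to avoid looping forever).

```python
import re

vowel = re.compile(r"[aeiou]")
text = "foobarbaz"

pos = 0        # plays a role similar to Perl's pos(): where the next search starts
found = []
while True:
    m = vowel.search(text, pos)   # search from the current position onward
    if m is None:
        break                     # no further match, like m//g returning false
    found.append((m.group(), m.end()))
    pos = m.end()                 # continue just after this match

print(found)  # [('o', 2), ('o', 3), ('a', 5), ('a', 8)]
```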

#2


7  

To build on Chris's answer, the most direct equivalent is to put the //g match in the condition of a while loop, for example:

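my @matches;
# In scalar context, each m//g match resumes where the previous one ended.
while ( 'foobarbaz' =~ m/([aeiou])/g )
{
    push @matches, $1;    # $1 holds the vowel captured by this match
}
# @matches is now ('o', 'o', 'a', 'a')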

Here is a quick Python session for comparison:

>>> import re
>>> re.findall(r'([aeiou])([nrs])','I had a sandwich for lunch')
[('a', 'n'), ('o', 'r'), ('u', 'n')]

To get something comparable in Perl, the construct could be something like:

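my $matches = [];    # array reference holding one [$1, $2] pair per match
while ( 'I had a sandwich for lunch' =~ m/([aeiou])([nrs])/g )
{
    push @$matches, [$1,$2];    # keep both captures together, like a Python tuple
}
# $matches is now [ ['a','n'], ['o','r'], ['u','n'] ]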

In general, though, whatever you intended to do with each match can probably be done inside the while loop itself.
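
On the Python side, the counterpart of doing the work inside the loop is re.finditer, the other function named in the question. It yields one match object at a time instead of building the whole list first. A minimal sketch, reusing the pattern and sentence from above; recording the start offset is just an illustrative choice of per-match work.

```python
import re

text = "I had a sandwich for lunch"
pairs = []

# finditer yields one match object per non-overlapping match, left to right,
# much as each pass of the Perl while loop exposes $1 and $2.
for m in re.finditer(r"([aeiou])([nrs])", text):
    pairs.append((m.group(1), m.group(2), m.start()))

print(pairs)  # [('a', 'n', 9), ('o', 'r', 18), ('u', 'n', 22)]
```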

#3


2  

Nice beginner reference with similar content to @kyle's answer: Perl Tutorial: Using regular expressions
