What do "lazy" and "greedy" mean in the context of regular expressions?

Time: 2021-04-27 15:45:40

Could someone explain these two terms in an understandable way?

11 answers

#1


429  

Greedy will consume as much as possible. From http://www.regular-expressions.info/repeat.html we see the example of trying to match HTML tags with <.+>. Suppose you have the following:

<em>Hello World</em>

You may think that <.+> (. means any non newline character and + means one or more) would only match the <em> and the </em>, when in reality it will be very greedy, and go from the first < to the last >. This means it will match <em>Hello World</em> instead of what you wanted.

Making it lazy (<.+?>) will prevent this. By adding the ? after the +, we tell it to repeat as few times as possible, so the first > it comes across, is where we want to stop the matching.
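
A quick way to check this is to run both patterns over the sample string. The sketch below uses Python's re module purely for illustration (the answer itself is not tied to a language), and the variable names are made up for the example.

```python
import re

html = "<em>Hello World</em>"

# Greedy: .+ runs to the end of the string, then backs off to the last '>'.
greedy = re.findall(r"<.+>", html)

# Lazy: .+? grows one character at a time and stops at the first '>'.
lazy = re.findall(r"<.+?>", html)

print(greedy)  # ['<em>Hello World</em>']
print(lazy)    # ['<em>', '</em>']
```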

I'd encourage you to download RegExr, a great tool that will help you explore Regular Expressions - I use it all the time.

#2


208  

'Greedy' means matching the longest possible string.

'Lazy' means matching the shortest possible string.

For example, the greedy h.+l matches 'hell' in 'hello' but the lazy h.+?l matches 'hel'.
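
The same comparison as a runnable sketch, assuming Python's re module as the engine (any backtracking engine with lazy quantifiers should behave the same way for these two patterns):

```python
import re

word = "hello"

# Greedy: .+ swallows "ello", then backs off until an 'l' can follow.
greedy = re.search(r"h.+l", word).group()

# Lazy: .+? takes just "e", and the very next character is already an 'l'.
lazy = re.search(r"h.+?l", word).group()

print(greedy)  # hell
print(lazy)    # hel
```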

#3


57  

+-------------------+-----------------+------------------------------+
| Greedy quantifier | Lazy quantifier |        Description           |
+-------------------+-----------------+------------------------------+
| *                 | *?              | Star Quantifier: 0 or more   |
| +                 | +?              | Plus Quantifier: 1 or more   |
| ?                 | ??              | Optional Quantifier: 0 or 1  |
| {n}               | {n}?            | Quantifier: exactly n        |
| {n,}              | {n,}?           | Quantifier: n or more        |
| {n,m}             | {n,m}?          | Quantifier: between n and m  |
+-------------------+-----------------+------------------------------+
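
To see how the pairs in the table differ in practice, the sketch below applies each greedy form and its lazy counterpart to the same text. It assumes Python's re module; the sample text "aaaa" and the concrete bounds (2, 1 and 3) are arbitrary choices for illustration.

```python
import re

text = "aaaa"

# (greedy, lazy) pairs from the table, with 'a' as the repeated token.
pairs = [
    (r"a*", r"a*?"),
    (r"a+", r"a+?"),
    (r"a?", r"a??"),
    (r"a{2}", r"a{2}?"),
    (r"a{2,}", r"a{2,}?"),
    (r"a{1,3}", r"a{1,3}?"),
]

results = {}
for greedy, lazy in pairs:
    # re.match anchors at the start, so only the quantifier's appetite differs.
    greedy_match = re.match(greedy, text).group()
    lazy_match = re.match(lazy, text).group()
    results[greedy] = (greedy_match, lazy_match)
    print(f"{greedy!r} -> {greedy_match!r}, {lazy!r} -> {lazy_match!r}")
```

With nothing after the quantifier, the lazy forms settle for their minimum (an empty string for *? and ??), while {n}? matches exactly the same text as {n} because there is no range to choose from.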

Add a ? to a quantifier to make it ungreedy, i.e., lazy.

Example:
test string : stackoverflow
greedy reg expression : s.*o output: stackoverflo
lazy reg expression : s.*?o output: stacko

#4


43  

Greedy means your expression will match as large a group as possible; lazy means it will match the smallest group possible. For this string:

abcdefghijklmc

and this expression:

a.*c

A greedy match will match the whole string, and a lazy match will match just the first abc.
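
In code, the lazy variant of the expression is written a.*?c. A minimal sketch, assuming Python's re module:

```python
import re

text = "abcdefghijklmc"

# Greedy: .* runs to the end, so the match ends at the last 'c'.
greedy = re.search(r"a.*c", text).group()

# Lazy: .*? stops at the first 'c' it can reach.
lazy = re.search(r"a.*?c", text).group()

print(greedy)  # abcdefghijklmc
print(lazy)    # abc
```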

#5


8  

Taken from www.regular-expressions.info:

Greediness: A greedy quantifier first tries to repeat the token as many times as possible, and gradually gives up matches as the engine backtracks to find an overall match.

Laziness: A lazy quantifier first repeats the token as few times as required, and gradually expands the match as the engine backtracks through the regex to find an overall match.

#6


8  

As far as I know, most regex engines are greedy by default. Adding a question mark at the end of a quantifier enables lazy matching.

As @Andre S mentioned in a comment:

  • Greedy: Keep searching until the condition is no longer satisfied.
  • Lazy: Stop searching once the condition is satisfied.

Refer to the example below for what is greedy and what is lazy.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        String money = "100000000999";

        // Greedy: 0* grabs every zero that follows "100".
        String greedyRegex = "100(0*)";
        Pattern pattern = Pattern.compile(greedyRegex);
        Matcher matcher = pattern.matcher(money);
        while (matcher.find()) {
            System.out.println("I'm greedy and I want " + matcher.group() + " dollars. This is the most I can get.");
        }

        // Lazy: 0*? takes no zeros at all, because nothing after it requires any.
        String lazyRegex = "100(0*?)";
        pattern = Pattern.compile(lazyRegex);
        matcher = pattern.matcher(money);
        while (matcher.find()) {
            System.out.println("I'm too lazy to get so much money, only " + matcher.group() + " dollars is enough for me");
        }
    }
}


The result is:

I'm greedy and I want 100000000 dollars. This is the most I can get.

I'm too lazy to get so much money, only 100 dollars is enough for me

#7


5  

From Regular expression

The standard quantifiers in regular expressions are greedy, meaning they match as much as they can, only giving back as necessary to match the remainder of the regex.

By using a lazy quantifier, the expression tries the minimal match first.
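
The "giving back as necessary" part becomes visible when a second token has to match after the quantifier. The sketch below is an illustration only: it assumes Python's re module, and the two-group pattern is invented for the example rather than taken from the quoted text.

```python
import re

digits = "12345"

# Greedy: \d+ first takes all five digits, then gives back exactly one
# so that the trailing \d has something to match.
greedy = re.match(r"(\d+)(\d)", digits).groups()

# Lazy: \d+? tries the minimal match (one digit) first, and that already works.
lazy = re.match(r"(\d+?)(\d)", digits).groups()

print(greedy)  # ('1234', '5')
print(lazy)    # ('1', '2')
```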

#8


2  

Best shown by example. Take the string 192.168.1.1 and the greedy regex \b.+\b. You might think this would give you the first octet, but it actually matches the whole string. Why? Because .+ is greedy, and a greedy match consumes every character in '192.168.1.1' until it reaches the end of the string. This is the important bit: only then does it start to backtrack one character at a time until it finds a match for the third token (\b). Here the end of the string is already a word boundary, so the whole string is returned.

If the string were a 4GB text file and 192.168.1.1 was at the start, you could easily see how this backtracking would cause an issue.

To make a regex non-greedy (lazy), put a question mark after the greedy quantifier, e.g. *?, ?? or +?. What happens now is that token 2 (.+?) finds a match, the regex moves along one character and then tries the next token (\b) rather than token 2 (.+?). So it creeps along gingerly.
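
Both variants can be tried on the IP address from this answer. This is a sketch assuming Python's re module, where \b and lazy quantifiers behave as described above:

```python
import re

ip = "192.168.1.1"

# Greedy: .+ runs to the end of the string, where \b already matches.
greedy = re.search(r"\b.+\b", ip).group()

# Lazy: .+? creeps forward and stops at the first word boundary, after "192".
lazy = re.search(r"\b.+?\b", ip).group()

print(greedy)  # 192.168.1.1
print(lazy)    # 192
```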

#9


1  

Greedy matching. The default behavior of regular expressions is to be greedy. That means a quantifier tries to extract as much as possible while still conforming to the pattern, even when a smaller part would have been syntactically sufficient.

Example:

import re
text = "<body>Regex Greedy Matching Example </body>"
re.findall('<.*>', text)
#> ['<body>Regex Greedy Matching Example </body>']

Instead of matching till the first occurrence of ‘>’, it extracted the whole string. This is the default greedy or ‘take it all’ behavior of regex.

Lazy matching, on the other hand, ‘takes as little as possible’. This can be effected by adding a ? after the quantifier.

Example:

re.findall('<.*?>', text)
#> ['<body>', '</body>']

If you want only the first match to be retrieved, use the search method instead.

re.search('<.*?>', text).group()
#> '<body>'

Source: Python Regex Examples

#10


1  

Greedy means the quantifier keeps consuming text that fits your pattern until none of it is left and it can go no further.

Lazy means it stops as soon as it encounters the first match for the pattern you requested.

One common example that I often encounter is the \s*-\s*? in the regex ([0-9]{2}\s*-\s*?[0-9]{7})

The first \s* is greedy because of the *: after the digits it consumes as many whitespace characters as possible and then looks for a dash character "-". The second \s*?, by contrast, is lazy because of the *?: it consumes as few whitespace characters as it can, starting with none, and takes another one only when the following [0-9]{7} cannot match yet.
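
A small experiment makes the behavior of the two whitespace runs visible. The sketch assumes Python's re module; the capture groups around \s* and \s*? and the sample input are additions for illustration only and are not part of the regex quoted above.

```python
import re

# The regex from the answer, with each whitespace run captured separately
# (the two inner groups are hypothetical, added only to inspect the result).
pattern = re.compile(r"[0-9]{2}(\s*)-(\s*?)[0-9]{7}")

match = pattern.search("12  -  3456789")
before_dash, after_dash = match.groups()

print(repr(match.group()))  # '12  -  3456789'
print(repr(before_dash))    # '  '
print(repr(after_dash))     # '  '
```

The lazy \s*? still ends up with both spaces after the dash, because the digits that follow cannot match until the whitespace has been consumed. Laziness changes the order in which the engine tries the options, not whether the overall pattern has to match.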

#11


-2  

Try to understand the following behavior:

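using System;
using System.Text.RegularExpressions;

var input = "0014.2";

// r1 requires at least one digit on each side; r2 allows zero digits.
Regex r1 = new Regex("\\d+.{0,1}\\d+");
Regex r2 = new Regex("\\d*.{0,1}\\d*");

Console.WriteLine(r1.Match(input).Value); // "0014.2"
Console.WriteLine(r2.Match(input).Value); // "0014.2"

// One leading space: r2 already succeeds at index 0, with . taking the space.
input = " 0014.2";

Console.WriteLine(r1.Match(input).Value); // "0014.2"
Console.WriteLine(r2.Match(input).Value); // " 0014"

// Two leading spaces: r2 matches the first space and no digits.
input = "  0014.2";

Console.WriteLine(r1.Match(input).Value); // "0014.2"
Console.WriteLine(r2.Match(input).Value); // " "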
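
The answer gives no explanation, so here is one possible reading together with a sketch that reproduces the experiment. It is a port to Python's re module, on the assumption that it treats these simple patterns the same way as the .NET engine. Every quantifier in both patterns is greedy; the difference is that r2 can succeed with zero digits, so it matches at the very first position, where . consumes a leading space. The engine returns the leftmost match before greediness gets a chance to prefer a longer one further right.

```python
import re

r1 = re.compile(r"\d+.{0,1}\d+")  # needs at least one digit on each side
r2 = re.compile(r"\d*.{0,1}\d*")  # every part may match nothing

inputs = ["0014.2", " 0014.2", "  0014.2"]

# For each input, record what each pattern matches first.
results = [(r1.search(s).group(), r2.search(s).group()) for s in inputs]

for s, (m1, m2) in zip(inputs, results):
    print(repr(s), repr(m1), repr(m2))
# '0014.2' '0014.2' '0014.2'
# ' 0014.2' '0014.2' ' 0014'
# '  0014.2' '0014.2' ' '
```

In the last case the r2 match is a single space rather than an empty string: .{0,1} greedily takes the first space, and \d* then matches nothing because the next character is another space.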
