Perl-grep stops matching after 32k characters

Time: 2021-02-21 08:56:16

My perl-grep statement is not capturing all the elements of a large match (~32k characters), but it has no trouble with smaller matches.

The grep command I want to use in order to grab "allowed [ < TEXT > ]":

    grep -P '(?si)^\s*allowed\s*\[.*?\]' file.txt

For some reason, if the file is large-ish, the dot stops matching lines. Therefore the above grep doesn't match anything because '.*?\]' can't eat enough to find the ']'.

    grep -P '(?si)^\s*allowed\s*\[.*' bigFile.txt | wc
    1883 1883 32764

But it can still consume the entire file using .*:

    grep -P '(?si).*' bigFile.txt | wc
    10003 10003 178910

BigFile.txt:

    allowed
    [
        com.bar.baz1
        com.bar.baz2
        ....
        com.bar.baz10000
    ]

As you can see, the BigFile should be matched in its entirety. Instead it stops after about 32k characters, about at line 1880.

I am using grep 2.5.1. My best guess is that this version of grep can only match about 2^15 = 32768 characters within a single pattern match...

For comparison, on another machine running grep 2.6.3, the following works fine:

    grep -Pzo '(?si)^\s*allowed\s*\[.*?\]' bigFile.txt
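For anyone who wants to reproduce this, here is a minimal sketch. It assumes a GNU grep built with PCRE support (`-P`) and a `seq` that accepts `-f`; the temporary file and the variable names are made up for illustration. It builds a file shaped like BigFile.txt and counts how many entries end up inside the `-Pzo` match:

```shell
#!/bin/sh
# Build a stand-in for BigFile.txt: "allowed", "[", 10000 entries, "]".
tmp=$(mktemp)
{
    printf 'allowed\n[\n'
    seq -f '    com.bar.baz%g' 1 10000
    printf ']\n'
} > "$tmp"

# -z makes grep treat the input as NUL-separated records, so the whole
# file is one record and (?s) lets the dot cross newlines.
# -o prints only the matched text; tr turns the trailing NUL into a newline.
matched_lines=$(grep -Pzo '(?si)^\s*allowed\s*\[.*?\]' "$tmp" \
    | tr '\0' '\n' \
    | grep -c 'com\.bar\.baz')

echo "entries inside the match: $matched_lines"
rm -f "$tmp"
```

If the count comes back lower than 10000, the match was cut short in the same way as described in the question.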

1 Solution

#1

You're using a non-greedy operator in one command:

    grep -P '(?si)^\s*allowed\s*\[.*?\]' file.txt
                                   ^^

and a greedy operator in the other:

    grep -P '(?si)^\s*allowed\s*\[.*' bigFile.txt | wc
                                   ^

This may cause differences in how grep matches your file.
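To see the greedy/non-greedy difference in isolation, here is a small sketch on a one-line sample. The sample string and variable names are invented for illustration, and it assumes a grep with `-P` support:

```shell
#!/bin/sh
# A tiny stand-in for file.txt with two bracketed blocks on one line.
sample='allowed [ a ] other [ b ]'

# Non-greedy: .*? stops at the first closing bracket.
lazy=$(printf '%s\n' "$sample" | grep -Po 'allowed\s*\[.*?\]')

# Greedy: .* runs on to the last closing bracket on the line.
greedy=$(printf '%s\n' "$sample" | grep -Po 'allowed\s*\[.*\]')

echo "lazy:   $lazy"
echo "greedy: $greedy"
```

The non-greedy form prints `allowed [ a ]`, while the greedy form prints the whole line. Note that the greedy command in the question has no closing `\]` at all, so it never needs to find the bracket in order to succeed.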
