Regex question: match a sequence exactly n times at random positions

Time: 2021-09-23 23:25:43

I have a regex question, take for example:

  1. ...AAABZBZBCCCDDD...
  2. ...BZBZBDDDBZBZBCCC...

I am looking for a regular expression that matches lines in which BZBZB occurs exactly n times. So, if I wanted to match the sequence only once, I should get only the first line as output.

The string occurs at random places in the text, and the regex should be compatible with grep or egrep.

Thanks in advance.

3 solutions

#1


8  

grep '\(.*BZBZB\)\{5\}' matches lines in which BZBZB appears 5 times, but it also matches lines in which it appears more than 5 times, because grep only checks whether some substring of the line matches. grep's basic and extended regular expressions have no way to negate a string (only characters), so this cannot be done with a single command unless, for example, you know that the characters in the string to be matched are not used elsewhere in the line.

However, you can do this in two grep commands:

cat temp.txt | grep '\(.*BZBZB\)\{5\}' | grep -v '\(.*BZBZB\)\{6\}'

will return lines in which BZBZB appears exactly 5 times. (Basically, it's doing a positive check for 5 or more times and then a negative check for six or more times.)
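
The same two-step idea can be generalized to any n. The following is a minimal sketch, assuming a grep that supports -E; the function name exact_count is hypothetical, n is assumed to be at least 1, and the pattern is assumed to be an extended regular expression that is safe to repeat inside a group. Occurrences are counted without overlap, so BZBZBZB counts as one BZBZB.

```shell
#!/bin/sh
# exact_count N PATTERN (hypothetical helper, not part of grep):
# print the lines from stdin that contain PATTERN exactly N times.
exact_count() {
    n=$1
    pat=$2
    # First grep keeps lines with N or more occurrences,
    # second grep drops lines with N+1 or more occurrences.
    grep -E "(.*(${pat})){${n}}" | grep -Ev "(.*(${pat})){$((n + 1))}"
}

# Usage with the two sample lines from the question:
printf '%s\n' 'AAABZBZBCCCDDD' 'BZBZBDDDBZBZBCCC' | exact_count 1 'BZBZB'
# prints: AAABZBZBCCCDDD
```

Reading from stdin keeps the helper usable at the end of any pipeline, for example `exact_count 5 'BZBZB' < temp.txt`.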

#2


1  

From the grep man page:

   -m NUM, --max-count=NUM
    Stop  reading  a file after NUM matching lines.  If the input is
    standard input from a regular file, and NUM matching  lines  are
    output,  grep  ensures  that the standard input is positioned to
    just after the last matching line before exiting, regardless  of
    the  presence of trailing context lines.  This enables a calling
    process to resume a search.  When grep stops after NUM  matching
    lines,  it  outputs  any trailing context lines.  When the -c or
    --count option is also  used,  grep  does  not  output  a  count
    greater  than NUM.  When the -v or --invert-match option is also
    used, grep stops after outputting NUM non-matching lines.

So we need two grep expressions:

grep -e "BZ" -o
grep -e "BZ" -m n

The first one finds all instances of "BZ" in the input string and, because of -o, prints only the matched text without the surrounding content, each instance on its own line. The second one reads those lines and stops after n matching lines have been found.

$ echo "ABZABZABX" | grep -e "BZ" -o | grep -e "BZ" -m 1
BZ
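
Note that -m n caps the number of output lines; it does not by itself select lines by how often the string occurs in them. One possible way to build on grep -o for the original question is to count the matches per line and keep only the lines whose count equals n. This is a rough sketch under that assumption; the function name lines_with_count is hypothetical, and matches are counted without overlap.

```shell
#!/bin/sh
# lines_with_count N PATTERN (hypothetical helper):
# print the lines from stdin in which PATTERN occurs exactly N times.
lines_with_count() {
    n=$1
    pat=$2
    while IFS= read -r line; do
        # grep -o prints each match on its own line; wc -l counts them.
        # tr strips the padding that some wc implementations add.
        count=$(printf '%s\n' "$line" | grep -o -e "$pat" | wc -l | tr -d ' ')
        if [ "$count" -eq "$n" ]; then
            printf '%s\n' "$line"
        fi
    done
}

# Usage with the two sample lines from the question:
printf '%s\n' 'AAABZBZBCCCDDD' 'BZBZBDDDBZBZBCCC' | lines_with_count 2 'BZBZB'
# prints: BZBZBDDDBZBZBCCC
```

This starts one grep per input line, so it is only reasonable for small inputs.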

Hopefully that is what you needed.

#3


0  

It's ugly, but if your grep supports lookahead assertions, this should work:

/^(((?!BZBZB).)*BZBZB){5}((?!BZBZB).)*$/

Edit: the {5} above is the n from the question. It looks like GNU grep supports Perl-style assertions through the -P option.
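
As a rough illustration of the -P route, the pattern above can be wrapped in a small shell function. This is a sketch, assuming a GNU grep built with PCRE support (-P is not available in every build); the function name exact_count_pcre is hypothetical, and the pattern is assumed to be a plain string without regex metacharacters.

```shell
#!/bin/sh
# exact_count_pcre N PATTERN (hypothetical helper):
# print the lines from stdin that contain PATTERN exactly N times,
# using a single grep -P call with negative lookahead.
exact_count_pcre() {
    n=$1
    pat=$2
    # ((?!PAT).)* consumes characters that do not start an occurrence of PAT,
    # so the whole line must be N occurrences separated by PAT-free text.
    grep -P "^(((?!${pat}).)*${pat}){${n}}((?!${pat}).)*\$"
}

# Only run the example when this grep actually accepts -P.
if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
    printf '%s\n' 'AAABZBZBCCCDDD' 'BZBZBDDDBZBZBCCC' | exact_count_pcre 2 'BZBZB'
    # prints: BZBZBDDDBZBZBCCC
else
    echo 'this grep was built without -P support' >&2
fi
```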

Perl sample

use strict;  
use warnings;  

my @strary = (  
  'this is BZBZB BZBZB BZBZB and 4 BZBZB then 5 BZBZB and done',  
  'BZBZBBZBZBBZBZBBZBZBBZBZBBZBZBBZBZBBZBZB BZBZB  BZBZB',  
  'BZBZBBZBZBBZBZBBZBZBBZBZB 1',  
  'BZBZBZBBZBZBBZBZBBZBZBBZBZBBZBZB 2',  
);  

my @result = grep /^(((?!BZBZB).)*BZBZB){5}((?!BZBZB).)*$/,  @strary;  

for (@result) {  
   print "Found: '$_'\n";  
}  

Output

Found: 'this is BZBZB BZBZB BZBZB and 4 BZBZB then 5 BZBZB and done'
Found: 'BZBZBBZBZBBZBZBBZBZBBZBZB 1'
