awk: extracting multiple groups from each line

Date: 2021-04-24 07:38:34

How do I perform an action on all matching groups when the pattern matches multiple times in a line?

To illustrate, I want to search for /Hello! (\d+)/ and use the captured numbers, for example to print them or sum them. Given the input

abcHello! 200 300 Hello! Hello! 400z3
ads
Hello! 0

If I decided to print them out, I'd expect the output of

200
400
0

4 Solutions

#1


10  

This uses only simple, portable syntax that every awk (nawk, mawk, gawk, etc.) supports.

{
    while (match($0, /Hello! [0-9]+/)) {
        pattern = substr($0, RSTART, RLENGTH);
        sub(/Hello! /, "", pattern);
        print pattern;
        $0 = substr($0, RSTART + RLENGTH);
    }
}
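
The question also mentions summing the numbers. A minimal sketch of that variant, adapting the same match() loop, might look like the following. The function name sum_hello is hypothetical, and the loop works on a copy of the line instead of overwriting $0; it assumes the prefix is always the 7 characters of "Hello! ".

```sh
# sum_hello (hypothetical name): reads stdin, prints the sum of all
# numbers that directly follow "Hello! ". Uses only POSIX awk features.
sum_hello() {
    awk '
    {
        line = $0
        while (match(line, /Hello! [0-9]+/)) {
            # skip the 7 characters of "Hello! " to keep only the digits
            total += substr(line, RSTART + 7, RLENGTH - 7)
            # continue searching after the current match
            line = substr(line, RSTART + RLENGTH)
        }
    }
    END { print total + 0 }
    '
}

# sample input from the question
printf 'abcHello! 200 300 Hello! Hello! 400z3\nads\nHello! 0\n' | sum_hello
```

For the sample input this should print 600 (200 + 400 + 0). The "+ 0" in the END block makes the result print as 0 rather than an empty line when nothing matched.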

#2


1  

GNU awk (a multi-character RS is an extension beyond POSIX awk):

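awk 'BEGIN { RS = "Hello! " }
NR > 1 {                       # record 1 is the text before the first "Hello! "
    gsub(/[^0-9].*/, "", $1)   # keep only the leading digits of the first field
    if ($1 != "") {
        print $1
    }
}' file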

#3


1  

This uses gawk syntax (the three-argument form of match()). It also works when no fixed text can serve as a record separator, as long as the pattern does not need to match across linefeeds:

 {
     pattern = "([a-g]+|[h-z]+)"
     while (match($0, pattern, arr))
     {
         val = arr[1]
         print val
         sub(pattern, "")
     }
 }

#4


0  

gawk's match() reports only the first match in a string, so a single call cannot return every occurrence unless you know exactly how many times the pattern repeats.

Given this, you have to iterate "manually" over all matches in the same line. For your example input, it would be:

{
  from = 1
  pos = match( $0, /Hello! ([0-9]+)/, val )
  while( 0 < pos )
  {
    print val[1]
    # pos is relative to the substring that starts at "from"
    from += pos - 1 + val[0, "length"]
    pos = match( substr( $0, from ), /Hello! ([0-9]+)/, val )
  }
}

If the pattern must match across a linefeed, you have to modify the input record separator, RS.
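
One way to do that, sketched here under assumptions, is POSIX paragraph mode (RS = ""), where blank lines separate records and newlines stay inside $0. The function name extract_multiline and the choice to allow spaces or newlines between "Hello!" and the number are illustrative, not part of the original answer.

```sh
# extract_multiline (hypothetical name): prints every number that follows
# "Hello!", even when a linefeed sits between the two.
extract_multiline() {
    awk '
    BEGIN { RS = "" }   # paragraph mode: blank lines separate records
    {
        rest = $0
        while (match(rest, /Hello!( |\n)+[0-9]+/)) {
            hit = substr(rest, RSTART, RLENGTH)
            sub(/^Hello!( |\n)+/, "", hit)   # drop the prefix, keep the digits
            print hit
            rest = substr(rest, RSTART + RLENGTH)
        }
    }
    '
}

printf 'Hello!\n200 Hello! 300\n\nads Hello!\n7\n' | extract_multiline
```

With this input the sketch should print 200, 300 and 7 on separate lines. A match still cannot span a blank line, since that is where paragraph mode splits records.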
