awk string processing: the match function and capture groups

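I learned this roughly ten years ago and only ever jotted down a few lines about it, without organizing it properly. Today I am finally writing it up, in the hope that it helps people who are new to awk.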

The match expression

Let's start with a close look at the manual.

match(s, r [, a])

Return the position in s where the regular expression r occurs, or zero if r is not present, and set the values of RSTART and RLENGTH.

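In other words, match returns the position (counting from 1) of the first match of the regular expression r in the string s, or 0 if r does not match, and it sets the global variables RSTART and RLENGTH. When nothing matches, RSTART is 0 and RLENGTH is -1.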
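A minimal sketch of the return value and the two side-effect variables described above, using only POSIX awk features. The helper name show_match and the sample strings are made up for illustration.

```shell
# show_match is a hypothetical helper: it prints the return value of match()
# together with RSTART and RLENGTH for the string passed as $1.
show_match() {
  printf '%s\n' "$1" | awk '{
    pos = match($0, /[0-9]+/)    # position of the first run of digits, or 0
    print "pos=" pos, "RSTART=" RSTART, "RLENGTH=" RLENGTH
  }'
}

show_match qt998    # a match: the return value equals RSTART
show_match qtdp     # no match: return value and RSTART are 0, RLENGTH is -1
```

Because the return value is 0 on failure, match() can be used directly as a condition, as in if (match(...)).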

Note that the argument order is the same as for the ~ operator: str ~ re.

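Concretely, the order is the same as for the ~ (match) operator in the examples below: the string on the left, the regular expression on the right. Only the form changes, from ~ to match(s, r).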

h@d:~$ echo qt998dp698 | awk '{if($0 ~ /998/)print "yqr,xf 998!"}'
yqr,xf 998!
h@d:~$ echo qt998dp698 | awk '{if(match($0,/998/))print "yqr,xf 998!"}'
yqr,xf 998!

If array a is provided, a is cleared and then elements 1 through n are filled with the portions of s that match the corresponding parenthesized subexpression in r.

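That is, if the array a is supplied, it is cleared first, and then a[1] through a[n] are filled with the parts of s matched by the 1st through nth parenthesized subexpressions of r. Note that this third argument is a gawk extension; it is not part of POSIX awk.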

The zero’th element of a contains the portion of s matched by the entire regular expression r.

a[0] holds the text matched by the entire regular expression.

Subscripts a[n, "start"], and a[n, "length"] provide the starting index in the string and length respectively, of each matching substring.

a[n, "start"] and a[n, "length"] give, respectively, the starting position in the original string and the length of the nth matched substring.
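To make the "start" and "length" subscripts concrete, here is a small sketch. It assumes gawk is installed, since the array argument is a gawk extension; the helper name group_offsets and the pattern are made up for illustration.

```shell
# group_offsets is a hypothetical helper. It prints, for the whole match (0)
# and for each of the two groups, the text, its start position, and its length.
group_offsets() {
  # The array argument of match() is a gawk extension, so skip without gawk.
  command -v gawk >/dev/null 2>&1 || { echo "gawk not found, skipping" >&2; return 0; }
  printf '%s\n' "$1" | gawk '{
    if (match($0, /([a-z]+)([0-9]+)/, a)) {
      for (i = 0; i <= 2; i++)
        print i, a[i], a[i, "start"], a[i, "length"]
    }
  }'
}

group_offsets qt998
```

For qt998 this should print one line per index: the whole match qt998 starting at 1 with length 5, then qt at 1 with length 2, then 998 at 3 with length 3.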

Usage

Getting the position of a match

h@d:~$ echo qt998 | awk '{match($0,/[0-9]+/);print "RSTART",RSTART,"LENGTH",RLENGTH}'
RSTART 3 LENGTH 3
h@nb:~$ echo qt998dp698 | awk '{match($0,/[0-9]+/);print "RSTART",RSTART,"LENGTH",RLENGTH}'
RSTART 3 LENGTH 3
h@d:~$ echo qt99844dp698 | awk '{match($0,/[0-9]+/);print "RSTART",RSTART,"LENGTH",RLENGTH}'
RSTART 3 LENGTH 5

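As you can see, RSTART and RLENGTH describe only the first match: match() matches once and cannot find multiple matches directly.

Of course, you can write your own loop to match repeatedly.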

h@d:~$ echo qt998dp698 | awk '{str = $0; while (match(str, /[0-9]+/)) { print "RSTART:",RSTART,"RLENGTH:",RLENGTH,"Result is:",substr(str, RSTART, RLENGTH);str = substr(str, RSTART + RLENGTH)}}'
RSTART: 3 RLENGTH: 3 Result is: 998
RSTART: 3 RLENGTH: 3 Result is: 698
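Note that in the loop above RSTART is relative to whatever is left of the string after each cut, which is why both lines report 3. If you want positions in the original string, one option is to keep a running offset. The following is only a sketch using POSIX awk features; the helper name all_matches and the variable offset are made up for illustration.

```shell
# all_matches is a hypothetical helper that lists every run of digits in $1
# along with its position in the ORIGINAL string.
all_matches() {
  printf '%s\n' "$1" | awk '{
    str = $0
    offset = 0                          # characters already cut off the front
    while (match(str, /[0-9]+/)) {
      print "start:", offset + RSTART, "length:", RLENGTH, "text:", substr(str, RSTART, RLENGTH)
      offset += RSTART + RLENGTH - 1    # everything up to the end of this match
      str = substr(str, RSTART + RLENGTH)
    }
  }'
}

all_matches qt998dp698
```

This loop relies on the pattern never matching the empty string. With a pattern that can match nothing (for example /[0-9]*/), the string would never shrink and the loop would not terminate.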

Capture groups, which are somewhat like \1 in sed

h@d:~$ echo qt998 | sed  -nE 's/([a-z]+)([0-9]+)/\2/p'
998
h@d:~$ echo qt998 | awk '{match($0,/[0-9]+/,a);print a[0]}'
998
h@d:~$ echo qt998dp698 | awk '{match($0,/([0-9]+)([a-z]+)([0-9]+)/,a)}END{print a[0],a[1],a[3]}'
998dp698 998 698
