21. A Brief Introduction to awk

awk -F ":" 'BEGIN{}; { };END{} ' files

awk [options] 'script' var=value file(s)

awk [options] -f scriptfile var=value file(s)

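awk makes it easy to process every column of every line in structured text. For example, the following pipeline normalizes the separators, keeps the first 100 lines, and prints field 6 of each line whose first field matches hello and whose 11th field does not match worl:

section=$(sed -n '1,$p' "${everyname}" | sed 's/,/ /g' | sed 's/: /:/g' | head -n 100 | awk '{if (($1 ~ "hello") && ($11 !~ "worl")) {section6=$6; print section6}}')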

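1. Introduction

awk provides a programming language instead of just editor commands. It reads the contents of a file, splits each line into fields on a given separator (set with the FS variable or the -F option), and prints the fields you select, optionally subject to a condition.

With awk we can: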

■ Define variables to store data.

■ Use arithmetic and string operators to operate on data.

■ Use structured programming concepts, such as if-then statements and loops, to add logic to your data processing.

■ Generate formatted reports by extracting data elements within the data file and repositioning them in another order or format.
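
The capabilities listed above can be seen together in one small script. This is only a minimal sketch: the sample data, the 80-point threshold, and the names total and label are invented for illustration.

```shell
# Hypothetical sample data: a name and a score separated by a colon.
data='alice:90
bob:72
carol:85'

report=$(printf '%s\n' "$data" | awk -F: '
{
    total += $2                                  # a variable keeps a running sum
    if ($2 >= 80) { label = "high" } else { label = "low" }
    printf "%-6s %3d %s\n", $1, $2, label        # fields laid out as a report
}
END { printf "average %.1f\n", total / NR }')

printf '%s\n' "$report"
```

The END block runs after the last record, so NR there holds the total number of lines read.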

awk has many options; the most commonly used are:

-F fs

Specify the field separator that delimits the data fields in a line.

-f file

Specify the name of the file to read the program from.
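
As a quick illustration of these two options used together, the sketch below stores a one-line program in a temporary file and runs it with a colon as the separator. The program text and the sample input line are assumptions made up for this example.

```shell
# Write a one-line awk program to a temporary file (illustrative program text).
prog=$(mktemp)
printf '%s\n' '{ print $1, $3 }' > "$prog"

# -F sets the field separator to ":" and -f reads the program from the file.
fields=$(printf 'root:x:0:0\n' | awk -F: -f "$prog")
printf '%s\n' "$fields"

rm -f "$prog"
```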

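2. Some variables

ARGC Number of command-line arguments

ARGV Array of command-line arguments

FILENAME Name of the current input file

FNR Record number within the current file

FS Input field separator; defaults to a single space, which makes awk split on runs of spaces and tabs

NF Number of fields in the current record

NR Number of records (lines) read so far

RS Record separator (a newline by default)

OFS Output field separator; defaults to a space

ORS Output record separator; defaults to a newline

Two operators are also used throughout:

~ matches

!~ does not match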
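
To see how several of these variables behave across more than one input file, here is a small sketch. The two file names and their contents are hypothetical, created in a temporary directory only for this demonstration.

```shell
# Two hypothetical input files in a scratch directory.
dir=$(mktemp -d)
printf 'a b\nc d e\n' > "$dir/one.txt"
printf 'f\n' > "$dir/two.txt"

# OFS is set in BEGIN, so print joins its arguments with "|".
vars=$(cd "$dir" && awk 'BEGIN { OFS = "|" } { print FILENAME, NR, FNR, NF, $1 }' one.txt two.txt)
printf '%s\n' "$vars"

rm -rf "$dir"
```

NR keeps counting across files, while FNR starts again at 1 when the second file is opened.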

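3. Notes

1) A pattern can be a regular expression enclosed in / /.

2) BEGIN holds the action to run before any input is processed (it often sets FS, OFS, and so on); END holds the action to run after all input has been processed.

3) Conditions and loops: if/else (with next and exit); for, do, and while (with continue and break).

4) Arithmetic operators: + - * / % ^. Math functions such as sin and int. String functions such as length, index, gsub, and substr.

5) Arrays and associative arrays: a[1]; a[$1]; a[$0]; a[b[i]]

6) The BEGIN and END blocks are optional; each runs only once, at the start and at the end respectively. Separate multiple statements with semicolons.

7) The gawk program defines data as records and data fields. A record is a line of data (delimited by the newline character by default), and a data field is a separate data element within the line (delimited by a whitespace character, such as a space or tab, by default).

8) Further options:

-v var=value Define a variable and its value for use in the gawk program.

-mf N Specify the maximum number of fields to process in the data file.

-mr N Specify the maximum record size in the data file.

The -mf and -mr options come from older awk implementations; the gawk manual describes them as ignored, since gawk has no predefined limits.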
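
Several of the notes above (associative arrays, a for loop, string functions, and -v) can be combined in a short sketch. The log lines and the names action and seen are hypothetical, chosen only for illustration.

```shell
# Hypothetical log lines: a user name followed by an action.
log='alice login
bob login
alice logout
alice login'

counts=$(printf '%s\n' "$log" | awk -v action="login" '
$2 == action { seen[$1]++ }            # associative array keyed by field 1
END {
    for (user in seen)                 # iteration order is unspecified
        print toupper(substr(user, 1, 1)) substr(user, 2), seen[user]
}' | sort)

printf '%s\n' "$counts"
```

The output is piped through sort because awk does not guarantee any particular order when iterating over an array with for (key in array).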

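4. Usage examples

1) awk '/101/' file prints the lines of file that contain 101.

2) cat header.h | tail -n 5 | awk '{print NR,NF,$0}' prints the record number, the field count, and the text of each of the last five lines:

1 3 #define CONFIG_FILE "./config"

2 0

3 3 using namespace std;

4 0

5 1 #endif

3) awk -F '[ :\t|]' '{print $1}' file

The separator is given as a regular expression; here a space, a colon, a tab, and | all act as field separators.

4) awk '$1 ~ /101/ {print $1}' file prints the first field of each line (record) whose first field matches 101.

5) awk 'BEGIN {system("printf \"Input your name: \""); getline d; print "\nYour name is",d,"\b!\n"}' reads a line from the user, making the script interactive. (printf is used for the prompt because the \c escape of echo is not supported by every shell.)

awk '{ i=1; while(i<=NF) {print NF,$i; i++}}' file loops over the fields of each line, printing the field count next to each field.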

6) Running a script read from a file

$ cat script2

{ print $5 "'s userid is " $1 }

$ gawk -F":" -f script2 /etc/passwd

7) Assigning variables on the command line

$ cat script1

BEGIN{FS=","}

{print $n}

$ gawk -f script1 n=2 data1

$ gawk -f script1 n=3 data1
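
The command-line assignment can be tried end to end with the sketch below. The contents of data1 are not shown in this article, so the two comma-separated lines here are hypothetical stand-ins, and plain awk is used instead of gawk so the sketch runs with any POSIX awk.

```shell
# Hypothetical stand-ins for the script1 and data1 files.
dir=$(mktemp -d)
printf '%s\n' 'BEGIN{FS=","}' '{print $n}' > "$dir/script1"
printf 'data11,data12,data13\ndata21,data22,data23\n' > "$dir/data1"

# n is assigned when awk reaches that argument, just before it opens data1.
second=$(awk -f "$dir/script1" n=2 "$dir/data1")
third=$(awk -f "$dir/script1" n=3 "$dir/data1")
printf '%s\n%s\n' "$second" "$third"

rm -rf "$dir"
```

A variable assigned this way is not yet set while the BEGIN block runs; the -v option is the way to make a value available there.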

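8) Using regular expressions

The regular expression must appear before the left brace of the program script that it controls:

gawk 'BEGIN{FS=","} /test/{print $1}' data1

This reads data1 line by line with the field separator set to ",", finds the lines that match test, and prints their first field.

9) The matching operator

$1 ~ /^data/

This expression filters records where the first data field starts with the text data.

10) Mathematical expressions

$ gawk -F: '$4 == 0{print $1}' /etc/passwd

This displays the first data field value for all lines that contain the value 0 in the fourth data field.

11) awk also has if, for, while, and similar control structures.

12) Assigning a variable inside the script

awk 'BEGIN{test="hello";print test}'

hello

13) Patterns

/regular expression/, for example /test/

Relational expressions

Pattern-matching expressions, using ~

14) $ awk '/root/,/mysql/' test prints every line from the first line matching root through the next line matching mysql. If mysql never appears after root, printing continues to the end of the file; if root never appears, nothing is printed.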

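15) $ awk '/^(no|so)/' test prints every line that begins with no or so.

$ awk '/^[ns]/{print $1}' test prints the first field of each record that begins with n or s.

$ awk '$1 ~ /[0-9][0-9]$/{print $1}' test prints the first field when it ends with two digits.

$ awk '$1 == 100 || $2 < 50' test prints the line if the first field equals 100 or the second field is less than 50.

$ awk '$1 != 10' test prints the line if the first field is not equal to 10.

$ awk '/test/{print $1 + 10}' test prints the first field plus 10 for each record that matches the regular expression test.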
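
The three kinds of pattern used in this section (a range, a regular expression, and a relational expression) can be compared on one small file. The file contents below are hypothetical; only the file name test follows the examples in this section.

```shell
# Hypothetical sample file named "test".
dir=$(mktemp -d)
cat > "$dir/test" <<'EOF'
north 100 30
root 12 80
south 10 60
mysql 45 20
east 7 90
EOF

# Range pattern: from the line matching root through the line matching mysql.
range=$(awk '/root/,/mysql/ { print $1 }' "$dir/test")

# Regular-expression pattern: records that begin with n or s.
starts=$(awk '/^[ns]/ { print $1 }' "$dir/test")

# Relational pattern: field 2 equals 100, or field 3 is below 50.
relational=$(awk '$2 == 100 || $3 < 50 { print $1 }' "$dir/test")

printf 'range: %s\n' "$(printf '%s' "$range" | tr '\n' ' ')"
printf 'starts: %s\n' "$(printf '%s' "$starts" | tr '\n' ' ')"
printf 'relational: %s\n' "$(printf '%s' "$relational" | tr '\n' ' ')"

rm -rf "$dir"
```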

