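Shell Programming: Text Processing

Date: 2021-03-08 17:19:28

cut: extracting specified columns

cut splits each line on a delimiter character (-d) and prints only the selected fields (-f):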

[root@iz8vbbqbnh4ug2q9so5jflz logs]# cat  localhost_access_log.--.txt
140.205.201.30 - - [/Dec/::: +] "GET / HTTP/1.1" -
140.205.201.30 - - [/Dec/::: +] "GET /rs-status HTTP/1.1" -
140.205.201.30 - - [/Dec/::: +] "GET /phpmyadmin/ HTTP/1.1" -
140.205.201.30 - - [/Dec/::: +] "POST /phpmyadmin/ HTTP/1.1" -
140.205.201.30 - - [/Dec/::: +] "GET /phpmyadmin/ HTTP/1.1" -
140.205.201.30 - - [/Dec/::: +] "POST /phpmyadmin/ HTTP/1.1" -
140.205.201.30 - - [/Dec/::: +] "GET /phpmyadmin/ HTTP/1.1" -
140.205.201.30 - - [/Dec/::: +] "POST /phpmyadmin/ HTTP/1.1" -
140.205.201.30 - - [/Dec/::: +] "GET /phpmyadmin/ HTTP/1.1" -
140.205.201.30 - - [/Dec/::: +] "POST /phpmyadmin/ HTTP/1.1" -
140.205.201.30 - - [/Dec/::: +] "GET /phpmyadmin/ HTTP/1.1" -
140.205.201.30 - - [/Dec/::: +] "POST /phpmyadmin/ HTTP/1.1" -
140.205.201.30 - - [/Dec/::: +] "GET /phpmyadmin/ HTTP/1.1" -
140.205.201.30 - - [/Dec/::: +] "POST /phpmyadmin/ HTTP/1.1" -
140.205.201.30 - - [/Dec/::: +] "GET /phpmyadmin/ HTTP/1.1" -
140.205.201.30 - - [/Dec/::: +] "POST /phpmyadmin/ HTTP/1.1" -
140.205.201.30 - - [/Dec/::: +] "GET /phpmyadmin/ HTTP/1.1" -
140.205.201.30 - - [/Dec/::: +] "POST /phpmyadmin/ HTTP/1.1" -
140.205.201.30 - - [/Dec/::: +] "GET /phpmyadmin/ HTTP/1.1" -
140.205.201.30 - - [/Dec/::: +] "POST /phpmyadmin/ HTTP/1.1" -
140.205.201.30 - - [/Dec/::: +] "GET /phpmyadmin/ HTTP/1.1" -
140.205.201.30 - - [/Dec/::: +] "POST /phpmyadmin/ HTTP/1.1" -
140.205.201.30 - - [/Dec/::: +] "GET /ganglia/index.php HTTP/1.1" -
164.132.91.1 - - [/Dec/::: +] "GET / HTTP/1.1" -
114.215.45.101 - - [/Dec/::: +] "GET / HTTP/1.1" -
140.205.201.30 - - [/Dec/::: +] "GET /index.php HTTP/1.1" -
140.205.201.30 - - [/Dec/::: +] "GET /jobs/ HTTP/1.1" -
[root@iz8vbbqbnh4ug2q9so5jflz logs]# cat  localhost_access_log.--.txt |cut -d ' ' -f
"GET
"GET
"GET
"POST
"GET
"POST
"GET
"POST
"GET
"POST
"GET
"POST
"GET
"POST
"GET
"POST
"GET
"POST
"GET
"POST
"GET
"POST
"GET
"GET
"GET
"GET
"GET

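More columns can be selected with a comma-separated list, and a trailing dash (N-) selects everything from column N to the end of the line: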

[root@iz8vbbqbnh4ug2q9so5jflz logs]# cat  localhost_access_log.--.txt |cut -d ' ' -f ,,
- - [/Dec/:::
- - [/Dec/:::
- - [/Dec/:::
- - [/Dec/:::
- - [/Dec/:::
- - [/Dec/:::
- - [/Dec/:::
- - [/Dec/:::
- - [/Dec/:::
- - [/Dec/:::
- - [/Dec/:::
- - [/Dec/:::
- - [/Dec/:::
- - [/Dec/:::
- - [/Dec/:::
- - [/Dec/:::
- - [/Dec/:::
- - [/Dec/:::
- - [/Dec/:::
- - [/Dec/:::
- - [/Dec/:::
- - [/Dec/:::
- - [/Dec/:::
- - [/Dec/:::
- - [/Dec/:::
- - [/Dec/:::
- - [/Dec/:::
[root@iz8vbbqbnh4ug2q9so5jflz logs]# cat  localhost_access_log.--.txt |cut -d ' ' -f ,,-
- - "GET / HTTP/1.1" -
- - "GET /rs-status HTTP/1.1" -
- - "GET /phpmyadmin/ HTTP/1.1" -
- - "POST /phpmyadmin/ HTTP/1.1" -
- - "GET /phpmyadmin/ HTTP/1.1" -
- - "POST /phpmyadmin/ HTTP/1.1" -
- - "GET /phpmyadmin/ HTTP/1.1" -
- - "POST /phpmyadmin/ HTTP/1.1" -
- - "GET /phpmyadmin/ HTTP/1.1" -
- - "POST /phpmyadmin/ HTTP/1.1" -
- - "GET /phpmyadmin/ HTTP/1.1" -
- - "POST /phpmyadmin/ HTTP/1.1" -
- - "GET /phpmyadmin/ HTTP/1.1" -
- - "POST /phpmyadmin/ HTTP/1.1" -
- - "GET /phpmyadmin/ HTTP/1.1" -
- - "POST /phpmyadmin/ HTTP/1.1" -
- - "GET /phpmyadmin/ HTTP/1.1" -
- - "POST /phpmyadmin/ HTTP/1.1" -
- - "GET /phpmyadmin/ HTTP/1.1" -
- - "POST /phpmyadmin/ HTTP/1.1" -
- - "GET /phpmyadmin/ HTTP/1.1" -
- - "POST /phpmyadmin/ HTTP/1.1" -
- - "GET /ganglia/index.php HTTP/1.1" -
- - "GET / HTTP/1.1" -
- - "GET / HTTP/1.1" -
- - "GET /index.php HTTP/1.1" -
- - "GET /jobs/ HTTP/1.1" -
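The field-list forms used above can be tried on a small self-contained input. This is a minimal sketch, assuming two invented log lines in the same general layout; the sample_log function and its data are hypothetical and not taken from the original log:

```shell
# Hypothetical sample data: two invented lines in a Tomcat-style access log layout.
sample_log() {
  printf '%s\n' \
    '10.0.0.1 - - [08/Dec/2016:10:00:00 +0800] "GET /index.php HTTP/1.1" 200 1024' \
    '10.0.0.2 - - [08/Dec/2016:10:00:01 +0800] "POST /login HTTP/1.1" 500 87'
}

# A single field: the client address.
sample_log | cut -d ' ' -f 1

# A comma-separated list: address and request method (field 6 keeps its leading quote).
sample_log | cut -d ' ' -f 1,6

# An open-ended range: field 6 through the end of the line.
sample_log | cut -d ' ' -f 6-
```

Because cut treats every single space as a delimiter, two consecutive spaces would produce an empty field; for input with irregular spacing, awk is usually the easier tool.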

sort: sorting by column

For example, to sort a Tomcat access log by response size:

cat localhost_access_log.--.txt |sort -t ' ' -k 

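-t : specifies the field separator

-k : specifies the column to sort by (the key runs from that column to the end of the line unless an end column is also given, as in -k 10,10)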

114.241.108.197 - - [/Dec/::: +] "GET /js/plugin/jquery-file-upload/js/vendor/tmpl.min.js HTTP/1.1"
114.241.108.197 - - [/Dec/::: +] "GET /js/plugin/jquery-file-upload/js/vendor/tmpl.min.js HTTP/1.1"
114.241.108.197 - - [/Dec/::: +] "GET /js/plugin/jquery-file-upload/js/vendor/tmpl.min.js HTTP/1.1"
223.72.82.98 - - [/Dec/::: +] "GET /js/plugin/jquery-file-upload/js/vendor/tmpl.min.js HTTP/1.1"
59.108.217.106 - - [/Dec/::: +] "GET /js/plugin/jquery-file-upload/js/vendor/tmpl.min.js HTTP/1.1"
59.108.217.106 - - [/Dec/::: +] "GET /js/plugin/jquery-file-upload/js/vendor/tmpl.min.js HTTP/1.1"
114.241.108.197 - - [/Dec/::: +] "GET /img/logo-pale.png HTTP/1.1"
114.241.108.197 - - [/Dec/::: +] "GET /img/logo-pale.png HTTP/1.1"
114.241.108.197 - - [/Dec/::: +] "GET /img/logo-pale.png HTTP/1.1"
223.72.82.98 - - [/Dec/::: +] "GET /img/logo-pale.png HTTP/1.1"
59.108.217.106 - - [/Dec/::: +] "GET /img/logo-pale.png HTTP/1.1"
59.108.217.106 - - [/Dec/::: +] "GET /img/logo-pale.png HTTP/1.1"
59.108.217.106 - - [/Dec/::: +] "GET /img/logo-pale.png HTTP/1.1"
114.241.108.197 - - [/Dec/::: +] "GET /interview/detail.do?manageKey=15ba76c6fbeeccd2f8df875379ac88e9&targetPanel=dialog HTTP/1.1"
59.108.217.106 - - [/Dec/::: +] "GET /interview/detail.do?manageKey=15ba76c6fbeeccd2f8df875379ac88e9&targetPanel=dialog HTTP/1.1"
59.108.217.106 - - [/Dec/::: +] "GET /interview/detail.do?manageKey=15ba76c6fbeeccd2f8df875379ac88e9&targetPanel=dialog HTTP/1.1"

Sorting has a direction. The default is ascending; to sort in descending order, append r to the column number:

cat localhost_access_log.--.txt |sort -t ' ' -k 10r

Finally, note that sort compares keys as strings in lexicographic order by default. To compare them by numeric value, append n:

 cat localhost_access_log.--.txt |sort -t ' ' -k 10n
114.241.108.197 - - [/Dec/::: +] "GET /css/smartadmin-production.css HTTP/1.1"
114.241.108.197 - - [/Dec/::: +] "GET /css/smartadmin-production.css HTTP/1.1"
114.241.108.197 - - [/Dec/::: +] "GET /css/smartadmin-production.css HTTP/1.1"
223.72.82.98 - - [/Dec/::: +] "GET /css/smartadmin-production.css HTTP/1.1"
59.108.217.106 - - [/Dec/::: +] "GET /css/smartadmin-production.css HTTP/1.1"
59.108.217.106 - - [/Dec/::: +] "GET /css/smartadmin-production.css HTTP/1.1"
59.108.217.106 - - [/Dec/::: +] "GET /css/smartadmin-production.css HTTP/1.1"
112.65.193.14 - - [/Dec/::: +] "GET /js/jqueryui/1.10.3/jquery-ui.min.js HTTP/1.1"
114.241.108.197 - - [/Dec/::: +] "GET /js/jqueryui/1.10.3/jquery-ui.min.js HTTP/1.1"
114.241.108.197 - - [/Dec/::: +] "GET /js/jqueryui/1.10.3/jquery-ui.min.js HTTP/1.1"
114.241.108.197 - - [/Dec/::: +] "GET /js/jqueryui/1.10.3/jquery-ui.min.js HTTP/1.1"
223.72.82.98 - - [/Dec/::: +] "GET /js/jqueryui/1.10.3/jquery-ui.min.js HTTP/1.1"
59.108.217.106 - - [/Dec/::: +] "GET /js/jqueryui/1.10.3/jquery-ui.min.js HTTP/1.1"
59.108.217.106 - - [/Dec/::: +] "GET /js/jqueryui/1.10.3/jquery-ui.min.js HTTP/1.1"
59.108.217.106 - - [/Dec/::: +] "GET /js/jqueryui/1.10.3/jquery-ui.min.js HTTP/1.1"

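Because the numeric sort is ascending, the largest responses come last, so the largest static resource on this site is the jquery-ui.min.js file.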
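The difference between string, numeric, and descending order is easiest to see on a few lines. This is a minimal sketch, assuming three invented log lines whose tenth field is the response size; sample_log and its values are hypothetical:

```shell
# Hypothetical sample data: field 10 is the response size in bytes.
sample_log() {
  printf '%s\n' \
    '10.0.0.1 - - [08/Dec/2016:10:00:00 +0800] "GET /a.css HTTP/1.1" 200 9' \
    '10.0.0.1 - - [08/Dec/2016:10:00:01 +0800] "GET /b.js HTTP/1.1" 200 100' \
    '10.0.0.1 - - [08/Dec/2016:10:00:02 +0800] "GET /c.png HTTP/1.1" 200 25'
}

# String comparison on field 10 only: "100" sorts before "25", which sorts before "9".
sample_log | LC_ALL=C sort -t ' ' -k 10,10

# Numeric comparison: 9, 25, 100.
sample_log | sort -t ' ' -k 10,10n

# Numeric and descending: 100, 25, 9.
sample_log | sort -t ' ' -k 10,10nr
```

Writing the key as 10,10 limits the comparison to the tenth field. Here that field is the last one, so -k 10 would behave the same, but the explicit end column avoids surprises when sorting on a middle column.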

uniq: removing duplicate lines

 cat localhost_access_log.--.txt |cut -d ' ' -f , |sort -t ' ' -k 2n,|uniq
223.72.82.98
59.108.217.106
114.241.108.197
223.72.82.98
59.108.217.106
114.241.108.197
223.72.82.98
59.108.217.106
112.65.193.14
114.241.108.197
223.72.82.98
59.108.217.106
114.241.108.197
223.72.82.98
59.108.217.106
112.65.193.14
114.241.108.197
223.72.82.98
59.108.217.106
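uniq only collapses lines that are identical and adjacent, which is why it is normally placed after sort. This is a minimal sketch, assuming a hypothetical list of client addresses produced by a helper named sample_ips:

```shell
# Hypothetical sample data: five addresses, two distinct values, never adjacent.
sample_ips() {
  printf '%s\n' 10.0.0.2 10.0.0.1 10.0.0.2 10.0.0.1 10.0.0.2
}

# Unsorted input: no two neighbouring lines are equal, so nothing is removed.
sample_ips | uniq

# Sorted input: equal lines become adjacent and collapse to one line each.
sample_ips | sort | uniq

# -c prefixes each line with its count; the second sort puts the busiest address first.
sample_ips | sort | uniq -c | sort -k 1,1nr
```

If only the distinct values are needed, sort -u gives the same result as sort | uniq in one step.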

wc: counting lines, words, and characters

[root@iZ25klm6k7uZ logs]# wc -l localhost_access_log.--.txt  # count lines
localhost_access_log.--.txt
[root@iZ25klm6k7uZ logs]# wc -w localhost_access_log.--.txt  # count words
localhost_access_log.--.txt
[root@iZ25klm6k7uZ logs]# wc -m localhost_access_log.--.txt  # count characters
localhost_access_log.--.txt
[root@iZ25klm6k7uZ logs]#
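The three counters can be checked against an input small enough to count by hand. This is a minimal sketch, assuming a hypothetical two-line text produced by a helper named sample_text:

```shell
# Hypothetical two-line input: "one two" and "three".
sample_text() {
  printf 'one two\nthree\n'
}

sample_text | wc -l   # lines (counts newline characters)
sample_text | wc -w   # whitespace-separated words
sample_text | wc -m   # characters; wc -c would count bytes instead
```

When wc reads standard input it prints only the number, without a file name, which is convenient when the result is captured in a script.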

sed: searching with regular expressions

Use sed to find log entries containing the status code 500:

[root@iZ25klm6k7uZ logs]# sed -n '/\b500\b/p' localhost_access_log.--.txt
119.127.17.97 - - [/Dec/::: +] "POST /interview/add.do HTTP/1.1"
119.127.17.97 - - [/Dec/::: +] "POST /interview/add.do HTTP/1.1"
119.127.17.97 - - [/Dec/::: +] "POST /interview/add.do HTTP/1.1"
119.127.17.97 - - [/Dec/::: +] "POST /interview/add.do HTTP/1.1"
119.127.17.97 - - [/Dec/::: +] "POST /interview/add.do HTTP/1.1"
119.127.17.97 - - [/Dec/::: +] "POST /interview/add.do HTTP/1.1"
119.127.17.97 - - [/Dec/::: +] "POST /interview/add.do HTTP/1.1"
119.127.17.97 - - [/Dec/::: +] "POST /interview/add.do HTTP/1.1"
59.108.217.106 - - [/Dec/::: +] "POST /interview/add.do HTTP/1.1"

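Note: the -n option suppresses sed's default output and the p command prints the lines that match, so together they print only the matching lines. The \b word-boundary escape is a GNU sed extension.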
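The effect of -n, and the risk of matching 500 anywhere in the line, can be seen on a small input. This is a minimal sketch, assuming three invented log lines (sample_log is hypothetical), and it anchors the pattern to the closing quote of the request instead of using \b so that it does not depend on GNU sed:

```shell
# Hypothetical sample data: line 1 has status 200 but a response size of 500.
sample_log() {
  printf '%s\n' \
    '10.0.0.1 - - [08/Dec/2016:10:00:00 +0800] "GET /index.php HTTP/1.1" 200 500' \
    '10.0.0.2 - - [08/Dec/2016:10:00:01 +0800] "POST /login HTTP/1.1" 500 87' \
    '10.0.0.3 - - [08/Dec/2016:10:00:02 +0800] "GET /jobs/ HTTP/1.1" 200 1024'
}

# Without -n, sed echoes every line and p prints each match a second time.
sample_log | sed '/" 500 /p'

# With -n, only the lines selected by p are printed.
sample_log | sed -n '/" 500 /p'

# An unanchored pattern also matches the line whose size happens to be 500.
sample_log | sed -n '/500/p'
```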

awk: regular expression matching

Use awk to find log entries with status 500:

awk '($9 ~ /500/)' localhost_access_log.--.txt 

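The output is the same as in the sed example above.

By default awk splits fields on runs of whitespace such as spaces and tabs. To use a different separator, specify it with -F.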
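The two splitting modes can be compared directly. This is a minimal sketch on invented input; the colon-delimited records are hypothetical:

```shell
# Default splitting: any run of spaces or tabs separates fields.
printf 'a   b\tc\n' | awk '{ print NF, $2 }'

# Explicit separator: split hypothetical colon-delimited records on ":".
printf '%s\n' 'alice:x:1000' 'bob:x:1001' | awk -F ':' '{ print $1, $3 }'
```

NF is awk's built-in count of fields in the current line, so the first command confirms that three fields were found even though the spacing is irregular.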

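The real strength of awk is that it is programmable. A program has the following form:

awk 'pattern { action }'

When the action is omitted, awk prints the matching line. For example, the search for status 500 above can be written out in full as: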

[root@iZ25klm6k7uZ logs]# awk -F ' ' '($9 ~ /500/){print }' localhost_access_log.--.txt
119.127.17.97 - - [/Dec/::: +] "POST /interview/add.do HTTP/1.1"
119.127.17.97 - - [/Dec/::: +] "POST /interview/add.do HTTP/1.1"
119.127.17.97 - - [/Dec/::: +] "POST /interview/add.do HTTP/1.1"
119.127.17.97 - - [/Dec/::: +] "POST /interview/add.do HTTP/1.1"
119.127.17.97 - - [/Dec/::: +] "POST /interview/add.do HTTP/1.1"
119.127.17.97 - - [/Dec/::: +] "POST /interview/add.do HTTP/1.1"
119.127.17.97 - - [/Dec/::: +] "POST /interview/add.do HTTP/1.1"
119.127.17.97 - - [/Dec/::: +] "POST /interview/add.do HTTP/1.1"
59.108.217.106 - - [/Dec/::: +] "POST /interview/add.do HTTP/1.1"

To find entries with status 500 or 404 in one pass:

awk -F ' ' '($9 ~ /500/ || $9 ~ /404/){print $1,$6,$7,$9}' localhost_access_log.--.txt

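Or use regex alternation in a single pattern (this variant also matches 400 and prints a different set of fields):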

awk -F ' ' '($9 ~ /500|404|400/){print $1,"-",$4,"-",$6,"-",$9}' localhost_access_log.--.txt
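Since the status is a whole field, it can also be compared exactly, and the same pattern { action } structure extends to simple aggregation. This is a minimal sketch, assuming four invented log lines in which field 9 is the status code; sample_log is hypothetical:

```shell
# Hypothetical sample data: statuses 200, 500, 404, 500 in field 9.
sample_log() {
  printf '%s\n' \
    '10.0.0.1 - - [08/Dec/2016:10:00:00 +0800] "GET /index.php HTTP/1.1" 200 1024' \
    '10.0.0.2 - - [08/Dec/2016:10:00:01 +0800] "POST /login HTTP/1.1" 500 87' \
    '10.0.0.3 - - [08/Dec/2016:10:00:02 +0800] "GET /missing HTTP/1.1" 404 12' \
    '10.0.0.2 - - [08/Dec/2016:10:00:03 +0800] "POST /login HTTP/1.1" 500 87'
}

# Exact comparison on the status field instead of a regex match.
sample_log | awk '$9 == 500 || $9 == 404 { print $1, $7, $9 }'

# Count requests per status code with an associative array, printed in END.
sample_log | awk '{ count[$9]++ } END { for (s in count) print s, count[s] }' | sort
```

The trailing sort is there because awk does not guarantee any particular order when iterating over an array with for (s in count).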