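On Linux, processing log files is a routine task: match a pattern in the log, extract some data, and then run calculations on that data.
Suppose a log file test.log has the content below, and we need to extract xxxxx from each Uin=xxxxx. Because the lines do not share a fixed column layout, awk is awkward to use here; filtering with sed is simpler.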
ssh -p36000 -ltest 34.34.34.34 'cd /home/oicq/log; grep NoLoginSconnUinSendMsg talk_srv.log'
NoLoginSconnUinSendMsg Uin=957995 ShmCmd=2 shSeq=50207
NoLoginSconnUinSendMsg Uin=957995 ShmCmd=2 shSeq=61326
NoLoginSconnUinSendMsg Uin=957995 ShmCmd=2 shSeq=3490
NoLoginSconnUinSendMsg Uin=957995 ShmCmd=2 shSeq=29543
NoLoginSconnUinSendMsg Uin=957995 ShmCmd=2 shSeq=25852
NoLoginSconnUinSendMsg Uin=957995 ShmCmd=2 shSeq=25035
NoLoginSconnUinSendMsg Uin=957995 ShmCmd=2 shSeq=7185
NoLoginSconnUinSendMsg Uin=957995 ShmCmd=2 shSeq=51694
NoLoginSconnUinSendMsg Uin=957995 ShmCmd=2 shSeq=16633
[2010-08-24 14:24:56] (DealMsgFromCrm)NoLoginSconnUinSendMsg Uin=1597748713 ShmCmd=4 shSeq=1698
[2010-08-24 14:26:00] (DealMsgFromCrm)NoLoginSconnUinSendMsg Uin=284815483 ShmCmd=4 shSeq=3166
[2010-08-24 14:27:26] (DealMsgFromCrm)NoLoginSconnUinSendMsg Uin=284815483 ShmCmd=4 shSeq=44776
[2010-08-24 14:27:27] (DealMsgFromCrm)NoLoginSconnUinSendMsg Uin=284815483 ShmCmd=4 shSeq=41575
[2010-08-24 14:29:45] (DealMsgFromCrm)NoLoginSconnUinSendMsg Uin=402300830 ShmCmd=4 shSeq=51399
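Command (-n suppresses the default output, -r enables extended regular expressions, \1 is the captured run of digits, and the p flag prints only the lines where the substitution succeeded):
sed -rn 's/.*Uin=([0-9]+).*/\1/p' test.log | uniq
Output: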
957995
1597748713
284815483
402300830
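One caveat: uniq only collapses adjacent duplicate lines, so the command above prints each Uin once only because identical Uins happen to be consecutive in this log. A minimal sketch that removes duplicates wherever they occur while keeping first-seen order follows. The function name unique_uins is made up for illustration, the sample lines are taken from the log above but reordered, and -E is used as the portable spelling of GNU sed's -r.

```shell
# unique_uins [FILE...]: print each Uin once, in order of first appearance.
# Reads standard input when no file is given. (Hypothetical helper name.)
unique_uins() {
    # seen[$0]++ is 0 (false) the first time a line appears, so !seen[$0]++
    # is true only for the first occurrence and awk prints just that one.
    sed -En 's/.*Uin=([0-9]+).*/\1/p' "$@" | awk '!seen[$0]++'
}

# The same Uin on non-adjacent lines is still reported once.
unique_uins <<'EOF'
NoLoginSconnUinSendMsg Uin=957995 ShmCmd=2 shSeq=50207
[2010-08-24 14:26:00] (DealMsgFromCrm)NoLoginSconnUinSendMsg Uin=284815483 ShmCmd=4 shSeq=3166
NoLoginSconnUinSendMsg Uin=957995 ShmCmd=2 shSeq=61326
EOF
# prints:
# 957995
# 284815483
```

If the original order does not matter, piping through sort -u instead of awk gives the same set of values.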
The output can then be processed further.
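As one example of such further processing, the sketch below counts how many matching log lines each Uin produced. The function name count_uins is hypothetical, and the sample input is a handful of lines copied from the log above.

```shell
# count_uins [FILE...]: print "count uin" pairs, most frequent first.
# Reads standard input when no file is given. (Hypothetical helper name.)
count_uins() {
    # 1. sed extracts the digits after Uin=
    # 2. sort groups identical Uins so that uniq -c can count them
    # 3. sort -rn puts the highest count first
    # 4. awk drops the leading padding that uniq -c adds
    sed -En 's/.*Uin=([0-9]+).*/\1/p' "$@" |
        sort |
        uniq -c |
        sort -rn |
        awk '{print $1, $2}'
}

count_uins <<'EOF'
NoLoginSconnUinSendMsg Uin=957995 ShmCmd=2 shSeq=50207
NoLoginSconnUinSendMsg Uin=957995 ShmCmd=2 shSeq=61326
[2010-08-24 14:24:56] (DealMsgFromCrm)NoLoginSconnUinSendMsg Uin=1597748713 ShmCmd=4 shSeq=1698
[2010-08-24 14:26:00] (DealMsgFromCrm)NoLoginSconnUinSendMsg Uin=284815483 ShmCmd=4 shSeq=3166
[2010-08-24 14:27:26] (DealMsgFromCrm)NoLoginSconnUinSendMsg Uin=284815483 ShmCmd=4 shSeq=44776
[2010-08-24 14:27:27] (DealMsgFromCrm)NoLoginSconnUinSendMsg Uin=284815483 ShmCmd=4 shSeq=41575
EOF
# prints:
# 3 284815483
# 2 957995
# 1 1597748713
```

To run it against a real file, pass the file name instead of a here-document, for example count_uins test.log.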
If the log file has a regular format, awk can compute the statistics directly. Performance logs, for example, often look like this:
[2009-09-20 15:50:48] PID[18883] PkgRecv[10438] PkgSent[10437] ErrPkgSent[500]
[2009-09-20 15:51:48] PID[18883] PkgRecv[10735] PkgSent[10734] ErrPkgSent[523]
[2009-09-20 15:52:48] PID[18883] PkgRecv[10937] PkgSent[10934] ErrPkgSent[508]
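To total the PkgRecv values, assuming the log file is named perform.log, use:
sed -e 's/\[/ /g' -e 's/\]/ /g' perform.log | awk '{sum+=$6} END{print sum}'
How it works: sed replaces every '[' and ']' with a space, which makes the PkgRecv value the sixth whitespace-separated field. awk then adds $6 to sum on every line and prints the total in its END block.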
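The same total can also be computed without sed by letting awk split each line on the brackets and look the counter up by name rather than by position. This is only a sketch of an alternative, not the method described above; the function name sum_counter is hypothetical, and it assumes every counter is written as Name[value] as in the sample.

```shell
# sum_counter NAME [FILE...]: add up the values of NAME[...] over all lines.
# Reads standard input when no file is given. (Hypothetical helper name.)
sum_counter() {
    counter=$1
    shift
    # -F'[][]' makes both [ and ] field separators, so the fields alternate
    # between a label (such as " PkgRecv") and the value that follows it.
    awk -F'[][]' -v name="$counter" '
        {
            for (i = 1; i < NF; i++) {
                label = $i
                gsub(/ /, "", label)          # strip the spaces around the label
                if (label == name) sum += $(i + 1)
            }
        }
        END { print sum + 0 }                 # print 0 when nothing matched
    ' "$@"
}

sum_counter PkgRecv <<'EOF'
[2009-09-20 15:50:48] PID[18883] PkgRecv[10438] PkgSent[10437] ErrPkgSent[500]
[2009-09-20 15:51:48] PID[18883] PkgRecv[10735] PkgSent[10734] ErrPkgSent[523]
[2009-09-20 15:52:48] PID[18883] PkgRecv[10937] PkgSent[10934] ErrPkgSent[508]
EOF
# prints: 32110
```

Matching on the label means the command keeps working if fields are added or reordered, and the same function can total PkgSent or ErrPkgSent by changing the first argument.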