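Environment: CentOS 6.7, Python 3.6
Requirement: extract keywords from the log and analyze the data, reporting the total count per minute and a per-keyword (name=*****) breakdown within each minute.
Log excerpt: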
<2017-10-29 21:53:43> <WARN> related characters name=tryxxxx111001 count=3
<2017-10-29 21:53:43> <WARN> related characters name=tryxxxx111002 count=1
<2017-10-29 21:53:43> <WARN> related characters name=tryxxxx111003 count=43
<2017-10-29 21:54:53> <WARN> related characters name=tryxxxx111001 count=1
<2017-10-29 21:54:54> <WARN> related characters name=tryxxxx111002 count=3
<2017-10-29 21:54:54> <WARN> related characters name=tryxxxx111003 count=12
<2017-10-29 21:55:03> <WARN> related characters name=tryxxxx111001 count=2
<2017-10-29 21:55:03> <WARN> related characters name=tryxxxx111001 count=3
<2017-10-29 21:55:03> <WARN> related characters name=tryxxxx111000 count=2
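Before looking at the full program, it may help to see what needs to be pulled out of a single line. The following is a minimal sketch, assuming a raw-string equivalent of the regular expression used in the program; the names PATTERN and sample are only illustrative.

```python
import re

# Raw-string form of the pattern: group 1 is the timestamp up to the minute,
# group 2 is the value after name=, group 3 is the number after count=.
PATTERN = re.compile(r'<(.*):.*related characters name=(.*) count=(\d+)')

# One line taken from the log excerpt above.
sample = '<2017-10-29 21:55:03> <WARN> related characters name=tryxxxx111000 count=2'

match = PATTERN.search(sample)
minute, name, count = match.groups()
print(minute, name, int(count))
```

Because the first group is greedy, it stops at the last colon of the timestamp, so the seconds are dropped and lines from the same minute share one key.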
The program is as follows:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# filename: loganalysis.py
import re
import sys

filename = sys.argv[1]
countdict = {}

with open(filename, 'r') as f:
    for line in f:
        # Extract the key fields: the first (.*) captures the timestamp up to
        # the minute, the second (.*) captures the name value (for example
        # tryxxxx111000), and (\d+) captures the number after count=.
        catchlist = re.findall(r'<(.*):.*related characters name=(.*) count=(\d+)', line)
        if catchlist:
            realtime = catchlist[0][0]
            name = catchlist[0][1]
            count = int(catchlist[0][2])
            # Store the results in a dict shaped like {time: {name: count}}.
            if realtime in countdict:
                if name in countdict[realtime]:
                    countdict[realtime][name] += count
                else:
                    countdict[realtime][name] = count
            else:
                countdict[realtime] = {name: count}

# Print the per-name counts for each minute, followed by the minute total.
for a in countdict:
    finalcount = 0
    for b in countdict[a]:
        print(a, b, ' ', countdict[a][b])
        finalcount += countdict[a][b]
    print('time:', a, 'total:', finalcount)
Run it with: python3 loganalysis.py logfilename
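The nested membership checks that build the {time: {name: count}} dictionary can also be written with the standard library's collections module. This is only a sketch of an alternative, not the program above: the function name summarize and the in-memory sample list are assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

PATTERN = re.compile(r'<(.*):.*related characters name=(.*) count=(\d+)')


def summarize(lines):
    """Return {minute: Counter({name: count})} for the lines that match."""
    per_minute = defaultdict(Counter)
    for line in lines:
        match = PATTERN.search(line)
        if match:
            minute, name, count = match.groups()
            # Missing keys start at 0, so no explicit "in" checks are needed.
            per_minute[minute][name] += int(count)
    return per_minute


# The three 21:55 lines from the log excerpt.
sample = [
    '<2017-10-29 21:55:03> <WARN> related characters name=tryxxxx111001 count=2',
    '<2017-10-29 21:55:03> <WARN> related characters name=tryxxxx111001 count=3',
    '<2017-10-29 21:55:03> <WARN> related characters name=tryxxxx111000 count=2',
]

result = summarize(sample)
for minute, names in result.items():
    for name, count in names.items():
        print(minute, name, ' ', count)
    print('time:', minute, 'total:', sum(names.values()))
```

For these three lines, tryxxxx111001 adds up to 5 and tryxxxx111000 to 2, giving a total of 7 for minute 21:55.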