I need to create a regex that extracts a specific field from a large log file. I have written one, but it is not perfect because the logs contain different types of entries.
I have attached a screenshot. There are two different types of log entry, and I want to extract the value from real=value.
The problem is that multiple real=value pairs are present in each extracted line, and I want to get only the first occurrence.
My Regex:
CMS-concurrent-abortable-preclean:\s.*real=(?P<cms_abortable_preclean>\d+\.\d+)\s
Screen Shot: Sample data and Regex command
Sample Data:
2017-05-16T13:21:47.420+0200: 5.114: [GC (Allocation Failure) 2017-05-16T13:21:47.420+0200: 5.114: [ParNew2017-05-16T13:21:47.461+0200: 5.155: [CMS-concurrent-abortable-preclean: 0.120/0.735 secs] [Times: user=1.17 sys=0.12, real=0.73 secs] : 886080K->110720K(996800K), 0.3158400 secs] 886080K->161751K(6180736K), 0.3168208 secs] [Times: user=0.33 sys=0.10, real=0.32 secs]
1.583: [CMS-concurrent-abortable-preclean: 0.052/0.171 secs] [Times: user=0.20 sys=0.01, real=0.17 secs] CMS: abort preclean due to time 8077.162: [CMS-concurrent-abortable-preclean: 4.850/5.566 secs] [Times: user=5.92 sys=0.02, real=5.57 secs]
I want to extract the fields shown in bold.
2 Answers
#1
0
You could use a regex like this:
^.*?Times.*?real=(\d+(?:\.\d+))
The idea is to capture the first real belonging to Times for each line.
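To see why the lazy quantifier matters, here is a minimal Python sketch (not part of the original answer) that runs both the question's greedy pattern and the lazy pattern above against the two sample lines. The variable names are made up for illustration, and the named group from the question is replaced with a plain group to keep the comparison short.

```python
import re

# The two sample lines from the question, copied verbatim.
LINES = [
    "2017-05-16T13:21:47.420+0200: 5.114: [GC (Allocation Failure) "
    "2017-05-16T13:21:47.420+0200: 5.114: [ParNew2017-05-16T13:21:47.461+0200: 5.155: "
    "[CMS-concurrent-abortable-preclean: 0.120/0.735 secs] "
    "[Times: user=1.17 sys=0.12, real=0.73 secs] : 886080K->110720K(996800K), "
    "0.3158400 secs] 886080K->161751K(6180736K), 0.3168208 secs] "
    "[Times: user=0.33 sys=0.10, real=0.32 secs]",
    "1.583: [CMS-concurrent-abortable-preclean: 0.052/0.171 secs] "
    "[Times: user=0.20 sys=0.01, real=0.17 secs] CMS: abort preclean due to time "
    "8077.162: [CMS-concurrent-abortable-preclean: 4.850/5.566 secs] "
    "[Times: user=5.92 sys=0.02, real=5.57 secs]",
]

# Greedy ".*" (as in the question): runs to the end of the line and
# backtracks, so it lands on the LAST "real=" in the line.
GREEDY = re.compile(r"CMS-concurrent-abortable-preclean:\s.*real=(\d+\.\d+)\s")

# Lazy ".*?" (as in this answer): stops at the FIRST "real=" after "Times".
LAZY = re.compile(r"^.*?Times.*?real=(\d+(?:\.\d+))")

greedy_values = [GREEDY.search(line).group(1) for line in LINES]
lazy_values = [LAZY.search(line).group(1) for line in LINES]

print("greedy:", greedy_values)
print("lazy:  ", lazy_values)
```

The greedy pattern returns the last real value on each line, while the lazy one returns the first, which is what the question asks for.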
#2
0
One approach:
(CMS-concurrent-abortable-preclean:).*?(?=.*:)(?<=real=)(\d+\.\d+)
Demo
The first group matches: CMS-concurrent-abortable-preclean:
The second group, (\d+\.\d+), captures the real value.
Fast test:
perl -lne 'print "$1 and $2" while/(CMS-concurrent-abortable-preclean:).*?(?=.*:)(?<=real=)(\d+\.\d+)/g;' file
the output:
CMS-concurrent-abortable-preclean: and 0.73
CMS-concurrent-abortable-preclean: and 0.17
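For comparison, a rough Python counterpart to the quick test above. This is only a sketch and not the answer's own method: it swaps the lookaround pattern for a simpler lazy match anchored on the CMS-concurrent-abortable-preclean label, and relies on re.search returning only the first match per line. The helper name first_real is hypothetical; the group name cms_abortable_preclean is taken from the question.

```python
import re

# Lazy ".*?" after the label stops at the first "real=" that follows it.
PATTERN = re.compile(
    r"CMS-concurrent-abortable-preclean:.*?"
    r"real=(?P<cms_abortable_preclean>\d+\.\d+)"
)


def first_real(line):
    """Return the first real value after the preclean label, or None."""
    match = PATTERN.search(line)  # search() yields only the first match
    if match is None:
        return None
    return float(match.group("cms_abortable_preclean"))


# The two sample lines from the question, copied verbatim.
SAMPLE = [
    "2017-05-16T13:21:47.420+0200: 5.114: [GC (Allocation Failure) "
    "2017-05-16T13:21:47.420+0200: 5.114: [ParNew2017-05-16T13:21:47.461+0200: 5.155: "
    "[CMS-concurrent-abortable-preclean: 0.120/0.735 secs] "
    "[Times: user=1.17 sys=0.12, real=0.73 secs] : 886080K->110720K(996800K), "
    "0.3158400 secs] 886080K->161751K(6180736K), 0.3168208 secs] "
    "[Times: user=0.33 sys=0.10, real=0.32 secs]",
    "1.583: [CMS-concurrent-abortable-preclean: 0.052/0.171 secs] "
    "[Times: user=0.20 sys=0.01, real=0.17 secs] CMS: abort preclean due to time "
    "8077.162: [CMS-concurrent-abortable-preclean: 4.850/5.566 secs] "
    "[Times: user=5.92 sys=0.02, real=5.57 secs]",
]

for line in SAMPLE:
    print(first_real(line))
```

On the second sample line this reports only the first entry, so the later real=5.57 is skipped, in line with the Perl output shown above.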