I have a log file that I am trying to scan for patterns, counting how many times certain patterns appear. The log looks like this:
11298 [out] [worker:83] data has been rebuilt.
11299 [out] [worker:83] END Building data for foo
11299 [out] [worker:83] END Building data for bar
11300 [out] [worker:83] BEGIN Building data for baz
11301 [err] [worker:83] Putin bombed Syria
I am interested in all the [out] lines that contain END, BEGIN or rebuilt (I need to count each of these separately). So I thought the following regex
(out.*END)*
would match the pattern "out] anything here END" multiple times, but it only gives me the first instance of out in my file and then stops. Can someone point me in the right direction?
I am doing this in MATLAB with the syntax regexp(txt,expr,'start')
1 answer
Here's the documentation: http://www.mathworks.com/help/matlab/ref/regexp.html
regexp returns all matches by default.
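To illustrate (a minimal sketch on a made-up string, not the log from the question): unless the 'once' option is given, regexp returns the start index of every match.

```matlab
% Made-up example string containing "out" twice.
str = 'out END out END';

% Default behaviour: the start index of every match is returned.
allStarts = regexp(str, 'out', 'start');          % [1 9]

% With the 'once' option, regexp stops after the first match.
firstStart = regexp(str, 'out', 'start', 'once'); % 1

disp(allStarts)
disp(firstStart)
```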
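Try playing with the options and outkeys (passed as the third and later arguments). This one looks like it could help: 'dotexceptnewline'. By default . also matches newline characters, so your greedy .* probably matches the whole thing, from the first out to the last END. With 'dotexceptnewline', . stops at line ends, so each match stays within one line.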
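A minimal sketch of that difference, using the sample lines from the question typed in as one newline-separated char array txt (an assumption; in practice txt would come from the log file, e.g. via fileread):

```matlab
% Sample log lines from the question, joined with newline characters.
txt = sprintf(['11298 [out] [worker:83] data has been rebuilt.\n' ...
               '11299 [out] [worker:83] END Building data for foo\n' ...
               '11299 [out] [worker:83] END Building data for bar\n' ...
               '11300 [out] [worker:83] BEGIN Building data for baz\n' ...
               '11301 [err] [worker:83] Putin bombed Syria']);

% Default: '.' also matches newlines, so the greedy '.*' runs from the
% first "out" to the last "END" anywhere in the string: one long match.
greedy = regexp(txt, 'out.*END', 'match');

% 'dotexceptnewline': '.' cannot cross a line break, so every match
% is confined to a single line.
perLine = regexp(txt, 'out.*END', 'match', 'dotexceptnewline');

fprintf('default: %d match(es)\n', numel(greedy));           % 1
fprintf('dotexceptnewline: %d match(es)\n', numel(perLine)); % 2
```

The default call returns a single match running from the out on the first line to the END on the third; the 'dotexceptnewline' call returns the two END lines as separate matches.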
Try using the outkey 'match' instead of 'start'.
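One way to combine 'match' with line-based options to count each keyword separately is sketched below. The sample text is the log excerpt from the question typed in directly (an assumption, as is the struct used to hold the counts). 'lineanchors' lets ^ match at the start of every line and 'dotexceptnewline' keeps each match on one line, so numel of the 'match' output is the number of [out] lines containing the keyword.

```matlab
% Sample log lines from the question, joined with newline characters.
txt = sprintf(['11298 [out] [worker:83] data has been rebuilt.\n' ...
               '11299 [out] [worker:83] END Building data for foo\n' ...
               '11299 [out] [worker:83] END Building data for bar\n' ...
               '11300 [out] [worker:83] BEGIN Building data for baz\n' ...
               '11301 [err] [worker:83] Putin bombed Syria']);

keywords = {'END', 'BEGIN', 'rebuilt'};
counts = struct();   % hypothetical container: one field per keyword
for k = 1:numel(keywords)
    % ^ matches at each line start ('lineanchors'); '.' cannot cross a
    % newline ('dotexceptnewline'), so each match is one "[out]" line
    % that contains the keyword somewhere after the tag.
    expr = ['^\d+ \[out\].*' keywords{k}];
    hits = regexp(txt, expr, 'match', 'lineanchors', 'dotexceptnewline');
    counts.(keywords{k}) = numel(hits);
end

disp(counts)   % fields: END = 2, BEGIN = 1, rebuilt = 1
```

Because each match must begin at a line start and cannot cross a newline, every line contributes at most one match, so these are line counts rather than occurrence counts. The keywords are matched as plain, case-sensitive substrings, so a word that merely contains END would also be counted.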
Also check that your results aren't being truncated by mistake.