How do I match a regular expression multiple times?

Time: 2022-09-13 12:20:30

I have a log file that I am trying to scan for patterns, counting the number of times each pattern is seen. The log looks like this:


11298 [out] [worker:83] data has been rebuilt.

11299 [out] [worker:83] END Building data for foo

11299 [out] [worker:83] END Building data for bar

11300 [out] [worker:83] BEGIN Building data for baz

11301 [err] [worker:83] Putin bombed Syria


I am interested in all the lines tagged [out] that contain END, BEGIN or rebuilt (I need to count each of them separately). So I thought the following regex

(out.*END)*

would match the pattern "out] anything here END" multiple times, but it only gives me the first instance of "out" in my file and then stops. Can someone point me in the right direction?

I am doing this in MATLAB with the syntax regexp(txt,expr,'start')

1 answer

#1


Here's the documentation: http://www.mathworks.com/help/matlab/ref/regexp.html

regexp by default returns all matches.

Try experimenting with the options and outkeys (passed as the third and later arguments). The 'dotexceptnewline' option looks like it could help: by default the dot also matches newlines, and since your regex is greedy it probably matches the whole thing at once (from the first out to the last END).
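
To illustrate the point about greediness, here is a minimal sketch that compares the default behaviour with 'dotexceptnewline'. The sample text is three lines copied from the log in the question; the variable names (txt, startsDefault, startsPerLine) are made up for illustration, and the outer (...)* from the question is dropped on the assumption that it is not needed, since regexp already returns every match.

```matlab
% Three lines from the log in the question, joined with newlines.
txt = sprintf('%s\n%s\n%s', ...
    '11298 [out] [worker:83] data has been rebuilt.', ...
    '11299 [out] [worker:83] END Building data for foo', ...
    '11299 [out] [worker:83] END Building data for bar');

% Default ('dotall'): the dot also matches newlines, so the greedy .*
% can run across lines, from the first "out" to the last "END".
startsDefault = regexp(txt, 'out.*END', 'start');

% 'dotexceptnewline': the dot stops at line breaks, so every line is
% matched on its own.
startsPerLine = regexp(txt, 'out.*END', 'start', 'dotexceptnewline');

fprintf('default: %d match(es), per line: %d match(es)\n', ...
    numel(startsDefault), numel(startsPerLine));
```

On this sample the default call should report a single match that spans all three lines, while the 'dotexceptnewline' call should report one match for each of the two END lines.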

Try using the outkey 'match' instead of 'start', so that you get the matched text back rather than only the start indices.
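
As a rough sketch of how 'match' could be used to count each keyword separately, the loop below builds one expression per keyword and counts the returned matches. The names keywords, counts and expr are hypothetical, the sample text is the log from the question, and the use of \[out\] to match the literal tag is my own choice, not something from the question.

```matlab
% The five log lines from the question, joined with newlines.
txt = sprintf('%s\n%s\n%s\n%s\n%s', ...
    '11298 [out] [worker:83] data has been rebuilt.', ...
    '11299 [out] [worker:83] END Building data for foo', ...
    '11299 [out] [worker:83] END Building data for bar', ...
    '11300 [out] [worker:83] BEGIN Building data for baz', ...
    '11301 [err] [worker:83] Putin bombed Syria');

keywords = {'END', 'BEGIN', 'rebuilt'};
counts = zeros(1, numel(keywords));

for k = 1:numel(keywords)
    % \[out\] matches the literal tag; with 'dotexceptnewline' the .*
    % cannot cross a line break, so there is at most one match per line.
    expr = ['\[out\].*' keywords{k}];
    matches = regexp(txt, expr, 'match', 'dotexceptnewline');
    counts(k) = numel(matches);   % 'match' returns a cell array of strings
    fprintf('%-8s %d\n', keywords{k}, counts(k));
end
```

Inspecting the matches cell array directly is also a quick way to see what each expression really captured, for example whether a single match has swallowed several lines.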

Also check that your results aren't being truncated by mistake.
