Extracting the lines between two expressions in a file in a bash script (using regexp, sed)

Time: 2022-05-17 17:07:57

I have a log file with many lines, and I have to extract the lines from session start to session end using a bash script, for further analysis.

...
...

## TSM-INSTALL SESSION (pid) started at yyyy/mm/dd hh:mm:ss for host (variable) ##
...
...
...
...
...
...
...
## TSM-INSTALL SESSION (pid) ended at yyyy/mm/dd hh:mm:ss for host (variable) ##

...
...

I've googled and found a sed expression to extract the lines

sed '/start_pattern_here/,/end_pattern_here/!d' inputfile
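
A minimal sketch of this generic idiom, run on made-up input. A range `/start/,/end/` selects everything from a line matching `start` through the next line matching `end`, inclusive, and `!d` deletes every line outside it. The `-n ... p` form selects the same lines. The function names are hypothetical and the patterns are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: two equivalent ways to keep only the lines inside a sed range.
# /start/,/end/ is a range address covering everything from a "start" line
# through the next "end" line, both ends included.

# Delete (!d) every line that is NOT inside the range.
keep_range_delete() {
  sed '/start/,/end/!d'
}

# Turn off auto-printing (-n) and print (p) only the lines inside the range.
keep_range_print() {
  sed -n '/start/,/end/p'
}

# Both calls print: start here / keep me / end here
printf '%s\n' before 'start here' 'keep me' 'end here' after | keep_range_delete
printf '%s\n' before 'start here' 'keep me' 'end here' after | keep_range_print
```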

But I'm unable to find a correct regular expression pattern to extract the info.

I'm pretty new to regular expressions. I'm also listing all the expressions (silly ones included) that I've tried inside the script:

sed '/\.* started at \.* $server ##/,/\.* ended at \.* $server ##/!d' file

sed '/## TSM-INSTALL SESSION [0-9]\+ started at [0-9|\\|:]\+ for host $server ##/,/## TSM-INSTALL SESSION [0-9]\+ ended at [0-9|\\|:]\+ for host $server ##/!d' file

sed '/.\{30\}started{34\}$server ##$/,/.\{30\}ended{34\}$server ##$/!d' file

sed '/.## TSM-INSTALL SESSION\{6\}started at\{31\}$server ##$/,/.## TSM-INSTALL SESSION\{6\}ended at\{31\}$server ##$/!d' file

sed '/## TSM-INSTALL SESSION [0-9]+ started at .* $server/,/## TSM-INSTALL SESSION [0-9]+ ended at .* $server/!d' file

sed '/## TSM-INSTALL SESSION \.\.\.\.\. started at \.\.\.\.\.\.\.\.\.\. \.\.\.\.\.\.\.\. for host $server ##/,/## TSM-INSTALL SESSION \.\.\.\.\. ended at \.\.\.\.\.\.\.\.\.\. \.\.\.\.\.\.\.\. for host $server ##/!d' file

2 个解决方案

#1

Why not:

sed "/^## TSM-INSTALL SESSION .* started .* $server ##/,/^## TSM-INSTALL SESSION .* ended .* $server ##/!d" file

You don't need to get fancy with the regexps. All you care about is the leading TSM-INSTALL SESSION, the started or ended, and the hostname, so use .* to mean "whatever in-between".
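
Here is a sketch of this command wrapped in a hypothetical `extract_session` function and run on a made-up two-host log. It uses `-n ... p` instead of `!d`. Both select the same lines, but `!` inside double quotes triggers history expansion when pasted into an interactive bash, while inside a script it is harmless. The double quotes are the important part. Every attempt in the question is single-quoted, so sed receives the literal text `$server` instead of the host name.

```shell
#!/usr/bin/env bash
# Sketch of answer #1 (function name is hypothetical).
# -n ... p is used instead of !d; the selected lines are the same.

# extract_session HOST: reads a log on stdin and prints each session of HOST,
# from its "started" marker line through its "ended" marker line.
extract_session() {
  local server=$1
  # Double quotes let the shell substitute $server into the regex;
  # inside single quotes sed would see the literal text "$server".
  sed -n "/^## TSM-INSTALL SESSION .* started .* $server ##/,/^## TSM-INSTALL SESSION .* ended .* $server ##/p"
}

# Made-up sample log with two hosts, for illustration only.
sample_log() {
  cat <<'EOF'
noise
## TSM-INSTALL SESSION 1234 started at 2022/05/17 10:00:00 for host alpha ##
alpha line 1
## TSM-INSTALL SESSION 1234 ended at 2022/05/17 10:05:00 for host alpha ##
## TSM-INSTALL SESSION 5678 started at 2022/05/17 11:00:00 for host beta ##
beta line 1
## TSM-INSTALL SESSION 5678 ended at 2022/05/17 11:02:00 for host beta ##
more noise
EOF
}

sample_log | extract_session beta
```

If a host has several sessions, each one is printed. If a `started` line has no matching `ended` line, sed keeps printing to the end of the file.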

#2

If you stick this in a file called file.sed

/^## TSM-INSTALL SESSION ([0-9][0-9]*) started at [0-9][0-9]*\/[0-9][0-9]\/[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] for host ([^)]*) ##/,/^## TSM-INSTALL SESSION ([0-9][0-9]*) ended at [0-9][0-9]*\/[0-9][0-9]\/[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] for host ([^)]*) ##/p

and then call it like

sed -n -f file.sed inputfile 

I think it will do what you want.

The -n option stops sed from printing every line automatically, so only the lines selected by the range and printed by the p command are output.
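
Here is a sketch of this approach end to end. It uses a temporary directory and a made-up one-session log. It assumes the pid and host really appear inside literal parentheses, as written in the question's template. In a basic regular expression, `(` and `)` match themselves. The pattern must contain the literal text `for host` after the time because that is what the log lines contain. Unlike answer #1, it does not filter by host.

```shell
#!/usr/bin/env bash
# Sketch of answer #2: store the sed program in file.sed, run it with sed -n -f.
# Assumes (pid) and (host) are wrapped in literal parentheses in the log.

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Quoted 'EOF' keeps the shell from touching the backslashes in the program.
cat > "$workdir/file.sed" <<'EOF'
/^## TSM-INSTALL SESSION ([0-9][0-9]*) started at [0-9][0-9]*\/[0-9][0-9]\/[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] for host ([^)]*) ##/,/^## TSM-INSTALL SESSION ([0-9][0-9]*) ended at [0-9][0-9]*\/[0-9][0-9]\/[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] for host ([^)]*) ##/p
EOF

# Made-up log, for illustration only.
cat > "$workdir/inputfile" <<'EOF'
noise
## TSM-INSTALL SESSION (1234) started at 2022/05/17 10:00:00 for host (alpha) ##
alpha line 1
## TSM-INSTALL SESSION (1234) ended at 2022/05/17 10:05:00 for host (alpha) ##
more noise
EOF

# -n disables auto-printing; only lines hit by the p command are output.
sed -n -f "$workdir/file.sed" "$workdir/inputfile"
```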
