Select the first match between two patterns. Restart the search if a third pattern is found, using sed / awk / grep

Time: 2022-05-13 16:51:38

I am struggling with the following task (I've been searching for an answer for a while).

The search is for the text between START_PATTERN and END_PATTERN1.

Given a file structured like this:

text
text
...
START_PATTERN
line1
line2
END_PATTERN2
text
text
...
START_PATTERN
line1
line2
END_PATTERN1
text
text
...

The search should restart if END_PATTERN2 is found before END_PATTERN1. Thus the command output should be:

START_PATTERN
line1
line2
END_PATTERN1

Thank you for your time!


3 solutions

#1


2  

This line should work for your example:

 tac file|sed '/END_PATTERN1/,/START_PAT/!d'|tac

Test (I added xx to the lines of the expected block):

kent$  cat f
text
text
...
START_PATTERN
line1
line2
END_PATTERN2
text
text
...
START_PATTERN
xxline1
xxline2
END_PATTERN1
text


kent$  tac f|sed '/END_PATTERN1/,/START_PAT/!d'|tac
START_PATTERN
xxline1
xxline2
END_PATTERN1

Edit

To take only the first match, using awk alone:

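awk '{ a[NR] = $0 }                           # remember every line by its number
     /START_PAT/    { s = NR }                # a candidate block starts here
     /END_PATTERN2/ { s = 0 }                 # wrong terminator: discard the candidate
     /END_PATTERN1/ && s { found = 1; exit }  # first valid block is complete
     END { if (found) for (i = s; i <= NR; i++) print a[i] }' file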

#2


0  

I'd go about this by keeping a buffer of lines after the first pattern is found and resetting it if END_PATTERN2 is found:


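awk 'x { next }                           # a block was already captured: skip the rest
/START_PATTERN/ { n = 1; f = 1 }          # start (or restart) the buffer
f { lines[n++] = $0 }                     # save lines while inside a block
f && /END_PATTERN1/ { f = 0; x = 1 }      # block complete
/END_PATTERN2/ { n = 1; f = 0 }           # wrong terminator: reset the buffer
END { if (x) for (i = 1; i < n; ++i) print lines[i] }' file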

f is a flag that determines whether the current line is saved to the buffer lines. n is a counter used to index the buffer; it always points one past the last saved line. Once the file is processed, the n - 1 lines saved in the buffer are printed.

I've also added a variable x which, once set, causes all lines to be skipped. This means that only the first matching block will be saved.

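If the file is large, a variant of the same buffer-and-reset idea can print the block and stop as soon as END_PATTERN1 is seen, rather than skipping the remaining lines until the end of input. This is only a minimal sketch, assuming a POSIX awk; the function name first_block is made up for illustration:

```shell
# first_block FILE: print the first START_PATTERN..END_PATTERN1 block,
# discarding any block that is closed by END_PATTERN2 instead.
# (first_block is a hypothetical helper name, not part of the answer above.)
first_block() {
    awk '
        /START_PATTERN/ { n = 0; f = 1 }     # open (or reopen) a candidate block
        f { buf[++n] = $0 }                  # buffer lines while a block is open
        f && /END_PATTERN1/ {                # wanted terminator: print and stop
            for (i = 1; i <= n; i++) print buf[i]
            exit
        }
        /END_PATTERN2/ { n = 0; f = 0 }      # unwanted terminator: drop the buffer
    ' "$1"
}
```

It would be called as first_block file. Since awk exits at the first complete block, the x flag is no longer needed, and an END_PATTERN1 that appears without a preceding START_PATTERN is ignored.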

#3


0  

This might work for you (GNU sed):


sed -n '/START_PATTERN/!d;:a;N;/END_PATTERN2/d;/END_PATTERN1/!ba;p;d' file

Use the -n switch to suppress automatic printing, which makes sed behave like grep. Start collecting lines when START_PATTERN is found. Delete the collection if END_PATTERN2 is found. When END_PATTERN1 is found, print the collected lines.

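As written, the command prints every block that ends with END_PATTERN1, not just the first one. If only the first is wanted, one option (a sketch, still assuming GNU sed) is to quit right after printing by replacing the final d with q. The wrapper name first_block_sed is made up for illustration:

```shell
# first_block_sed FILE: print only the first START_PATTERN..END_PATTERN1 block.
# (first_block_sed is a hypothetical helper name; GNU sed is assumed.)
first_block_sed() {
    # p prints the collected block; q then quits, and because of -n
    # nothing further is printed on exit.
    sed -n '/START_PATTERN/!d;:a;N;/END_PATTERN2/d;/END_PATTERN1/!ba;p;q' "$1"
}
```

It would be called as first_block_sed file. Quitting early also means the rest of the file is never read.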
