I have a file that contains the following text:
<MY_TEXT="XYZ" PATH="MNO" #First occurrence of MY_TEXT
<location= "XYZ" path="ABC"
\location>
<R_DATA = MNOP
<Mylocation ="ghdf" stime=20150301 etime=20150401 >
<Mylocation ="ghdf" stime=20150401 etime=20150501 >
\R_DATA>
<Blah>
\MY_TEXT> #Second occurrence of MY_TEXT
<MY_TEXT="ABC" PATH="EFG" #Third occurrence of MY_TEXT
<location= "QQQ" path="LLL"
\location>
<R_DATA = MNOP
<Mylocation ="ghdf" stime=20150301 etime=20150401 >
<Mylocation ="ghdf" stime=20150401 etime=20150501 >
\R_DATA>
<Blah>
\MY_TEXT> #Fourth occurrence of MY_TEXT
My task is to find the line that contains <MY_TEXT="XYZ" (it may have leading spaces) and then find its closing \MY_TEXT>. So the extracted block should look something like this:
<MY_TEXT="XYZ" PATH="MNO" #First occurrence of MY_TEXT
<location= "XYZ" path="ABC"
\location>
<R_DATA = MNOP
<Mylocation ="ghdf" stime=20150301 etime=20150401 > #First occurrence of Mylocation
<Mylocation ="ghdf" stime=20150401 etime=20150501 > #Second occurrence of Mylocation
\R_DATA>
<Blah>
\MY_TEXT>
Then it should find the last occurrence of Mylocation in that block (the line marked "#Second occurrence of Mylocation" here), change the text etime=20150501 to some other value, and append a new line after it, in place in the file.
I came across the question "Sed to extract text between two strings", but the sed command from it either prints nothing when I use the -n option or prints the entire file when I remove -n. So I cannot process the text any further, because I cannot extract the block I want in the first place.
I also tried sed -n '/^ *START=A *$/,/^ *END *$/p' yourfile, but to no avail. Could you help me? My scripting is not great. Thanks in advance.
2 solutions
#1
1
This is a little tricky with sed, but I'll have a go at it.
Important note: This looks like a well-defined file format, but I don't recognize it. It might be prudent to see if there are tools that work on this format directly rather than treating it like a flat file the way sed must. It is very probable that such a solution would be shorter, easier to understand, and more robust than direct-text hackery.
That said, you can use
sed -n '/<MY_TEXT="XYZ"/ { :a /\\MY_TEXT>/! { N; ba }; s/\(.*\)\(<Mylocation\)/\1\\MY_TEXT>\n\2/; h; s/.*\\MY_TEXT>\n//; s/etime=[0-9]\+/etime=something/; s/\n/\n\n/; s/$/\\MY_TEXT>/; G; s/\(.*\)\\MY_TEXT>\n\(.*\)\\MY_TEXT>\n\(.*\)/\2\1/; p }' filename
Output:
<MY_TEXT="XYZ" PATH="MNO" #First occurrence of MY_TEXT
<location= "XYZ" path="ABC"
\location>
<R_DATA = MNOP
<Mylocation ="ghdf" stime=20150301 etime=20150401 >
<Mylocation ="ghdf" stime=20150401 etime=something >
\R_DATA>
<Blah>
\MY_TEXT>
The most confusing bit of this is the use of \MY_TEXT>\n as a marker to separate the working chunks; this works because we know that sequence doesn't appear anywhere else in the text. \MY_TEXT> first appears in the last line of the block we're working on, so there's never going to be a newline after it in the input data. (The code might be clearer with some other marker that doesn't appear in the text, but I don't know of anything more obvious that is certain not to.)
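The marker ends up before the last Mylocation tag because .* is greedy. A minimal sketch of that one idea on a toy line, where the function name mark_last, the x= fields and the @MARK@ string are made up for illustration and are not part of the answer's script:

```shell
#!/bin/sh
# mark_last: put a marker in front of the LAST "x=" on each input line.
# The greedy \(.*\) swallows as much as it can, so the literal "x=" that
# follows it can only match the final occurrence on the line.
mark_last() {
  sed 's/\(.*\)x=/\1@MARK@x=/'
}

printf 'x=1 x=2 x=3\n' | mark_last
# prints: x=1 x=2 @MARK@x=3
```

The real command does the same thing with <Mylocation in place of x= and \MY_TEXT>\n in place of @MARK@.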
The code works as follows:
#!/bin/sed -nf
# If we find the starter line:
/<MY_TEXT="XYZ"/ {
  # fetch the rest of the block into the pattern space
  :a
  /\\MY_TEXT>/! {
    N
    ba
  }
  # mark the place before the last Mylocation tag
  s/\(.*\)\(<Mylocation\)/\1\\MY_TEXT>\n\2/
  # copy that to the hold buffer
  h
  # remove the stuff before the marker
  s/.*\\MY_TEXT>\n//
  # replace the etime attribute
  s/etime=[0-9]\+/etime=something/
  # insert the new line
  s/\n/\n\n/
  # put a marker at the end
  s/$/\\MY_TEXT>/
  # fetch back the stuff from the hold buffer
  G
  # replace the end chunk with the edited version
  s/\(.*\)\\MY_TEXT>\n\(.*\)\\MY_TEXT>\n\(.*\)/\2\1/
  # print the result.
  p
}
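If the marker juggling is hard to follow, the same steps can be sketched with awk instead of sed. This is a different technique from the script above, not a translation of it: it buffers the block, remembers the index of the last Mylocation line, and edits that line when the closing tag arrives. It also prints the rest of the file unchanged instead of only the block. The function name edit_block and its three arguments (new etime value, line to add, file name) are assumptions made for this sketch.

```shell
#!/bin/sh
# edit_block NEW_ETIME EXTRA_LINE FILE
# Hypothetical helper: prints FILE with the last <Mylocation ...> line of the
# MY_TEXT="XYZ" block changed to etime=NEW_ETIME and EXTRA_LINE added after it.
edit_block() {
  awk -v new_etime="$1" -v extra="$2" '
    # Start of the wanted block: begin buffering.
    /<MY_TEXT="XYZ"/ { inblock = 1; n = 0; last = 0 }

    inblock {
      buf[++n] = $0
      # Remember the index of the most recent Mylocation line.
      if ($0 ~ /<Mylocation/) last = n
      # Closing tag: flush the buffer, editing only the remembered line.
      if ($0 ~ /\\MY_TEXT>/) {
        for (i = 1; i <= n; i++) {
          if (i == last) {
            sub(/etime=[0-9]+/, "etime=" new_etime, buf[i])
            print buf[i]
            print extra
          } else {
            print buf[i]
          }
        }
        inblock = 0
      }
      next
    }

    # Everything outside the block is passed through unchanged.
    { print }

    # If the file ends before the closing tag, print the buffer as it is.
    END { if (inblock) for (i = 1; i <= n; i++) print buf[i] }
  ' "$3"
}
```

It could be called as edit_block 20150601 '<Mylocation ="ghdf" stime=20150601 etime=20150701 >' filename, with the values here being placeholders. The output goes to standard output, so to change the file itself you would redirect to a temporary file and move it over the original. Note that awk -v interprets backslash escapes in the values, so the added line should not contain backslashes.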
#2
1
A simple solution is to use a range pattern:
awk '/<MY_TEXT="XYZ"/,/\\MY_TEXT/' file
<MY_TEXT="XYZ" PATH="MNO" #First occurrence of MY_TEXT
<location= "XYZ" path="ABC"
\location>
<R_DATA = MNOP
<Mylocation ="ghdf" stime=20150301 etime=20150401 >
<Mylocation ="ghdf" stime=20150401 etime=20150501 >
\R_DATA>
<Blah>
\MY_TEXT> #Second occurrence of MY_TEXT
Or with sed:
sed -n '/<MY_TEXT="XYZ"/,/\\MY_TEXT/p' file
<MY_TEXT="XYZ" PATH="MNO" #First occurrence of MY_TEXT
<location= "XYZ" path="ABC"
\location>
<R_DATA = MNOP
<Mylocation ="ghdf" stime=20150301 etime=20150401 >
<Mylocation ="ghdf" stime=20150401 etime=20150501 >
\R_DATA>
<Blah>
\MY_TEXT> #Second occurrence of MY_TEXT
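The same range address can also restrict an edit to the block instead of only printing it. A rough sketch, assuming the old value etime=20150501 is known in advance and occurs only once inside the XYZ block; the function name edit_in_range, the replacement value and the appended line are placeholders, not something from the question:

```shell
#!/bin/sh
# edit_in_range FILE
# Hypothetical helper: inside the MY_TEXT="XYZ" ... \MY_TEXT> range only,
# change etime=20150501 and append one line after the changed line.
# Lines outside the range are printed untouched, even if they contain
# the same etime value.
edit_in_range() {
  sed '/<MY_TEXT="XYZ"/,/\\MY_TEXT>/ {
    /etime=20150501/ {
      s/etime=20150501/etime=20150601/
      a\
<Mylocation ="ghdf" stime=20150601 etime=20150701 >
    }
  }' "$1"
}
```

Unlike the question's requirement, this matches on the old value rather than on "the last Mylocation", so it edits every line in the block that carries that value. On the sample file that is only the second Mylocation line of the XYZ block; the ABC block keeps its etime=20150501.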