For some reason, many of the XML files I use for certain purposes now have the following structure:
<A1333>006</ANDfoo>
<A45>RO0</ANDfoo>
<A5652>5486465465</ANDfoo>
<A173>TEST DUMMY</ANDfoo>
<A1805>34566000</ANDfoo>
<A3>FKK</ANDfoo>
<A2>FKK</ANDfoo>
<A2002></ANDfoo>
<A9903>CV0000</ANDfoo>
<A558>
<B1>GHJ</B1>
<B5>101010</B5>
</ANDfoo>
All end tags now have the same name. How can I replace the name in each end tag with the correct name from its opening tag, so that the XML is valid again? I tried using sed, but without success so far. Could you please give an example of using sed for such a replacement?
Thank you!
1 solution
#1
sed -e '
# Fix tags that open and close on the same line
s/<\([^>]*\)>\([^<>]*\)<[^>]*>/<\1>\2<\/\1>/g
# Save a lone opening tag in the hold space
/^<[^\/>]*>$/h
# Rebuild a lone closing tag from the saved opening tag
/^<\/[^>]*>$/{g;s/</<\//;}'
This fixes tags that start and end on the same line, as well as tags that contain one level of nested tags and whose opening and closing tags each sit on a line of their own.
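As a rough check of the behaviour described above, here is a small sketch that runs the same three sed commands on a few lines taken from the question. The function name `fix_end_tags` is only an illustrative helper, not something from the original answer.

```shell
#!/bin/sh
# fix_end_tags: hypothetical helper name; reads XML lines on stdin
# and rewrites every end tag from the matching opening tag.
fix_end_tags() {
  sed -e '
    # Opening and closing tag on the same line
    s/<\([^>]*\)>\([^<>]*\)<[^>]*>/<\1>\2<\/\1>/g
    # Lone opening tag: copy it to the hold space
    /^<[^\/>]*>$/h
    # Lone closing tag: fetch the held tag and turn it into a closing tag
    /^<\/[^>]*>$/{
      g
      s/</<\//
    }'
}

# Sample lines from the question
printf '%s\n' \
  '<A1333>006</ANDfoo>' \
  '<A2002></ANDfoo>' \
  '<A558>' \
  '<B1>GHJ</B1>' \
  '<B5>101010</B5>' \
  '</ANDfoo>' | fix_end_tags
# Output:
# <A1333>006</A1333>
# <A2002></A2002>
# <A558>
# <B1>GHJ</B1>
# <B5>101010</B5>
# </A558>
```

The hold space remembers only the most recent lone opening tag. With two lone opening tags nested inside each other, both lone closing tags would therefore be rebuilt from the inner one, which is why the approach is limited to one nesting level.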
To get this one-tag-per-line layout from XML that is all on a single line, you can use sed again:
# Break after each closing tag, then break before any opening tag
# that is not at the beginning of a line (\s, \w and \n in the replacement need GNU sed)
sed -e 's/\(<\/[^>]*>\)\s*/\1\n/g' |
  sed -e 's/>\s*\(<\w*>\)/>\n\1/g'
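To illustrate the splitting step, the sketch below feeds a single-line variant of the question's data through the two commands. It assumes GNU sed, since `\s`, `\w` and `\n` in the replacement are GNU extensions. The function name `split_tags` and the one-line input are made up for the example.

```shell
#!/bin/sh
# split_tags: hypothetical helper name; assumes GNU sed.
split_tags() {
  # 1st sed: newline after every closing tag (swallowing trailing blanks)
  # 2nd sed: newline before an opening tag that directly follows another tag
  sed -e 's/\(<\/[^>]*>\)\s*/\1\n/g' |
    sed -e 's/>\s*\(<\w*>\)/>\n\1/g'
}

# Made-up one-line input built from the question's elements
printf '%s\n' '<A3>FKK</ANDfoo><A558><B1>GHJ</B1> <B5>101010</B5></ANDfoo>' | split_tags
# Output (followed by one empty line, because the input ends with a closing tag):
# <A3>FKK</ANDfoo>
# <A558>
# <B1>GHJ</B1>
# <B5>101010</B5>
# </ANDfoo>
```

The result has the layout the tag-fixing command expects, so the two steps can be chained in one pipeline. The trailing empty line matches none of the tag patterns and passes through unchanged.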