sed (or other find and replace): changing embedded tags

Time: 2022-06-23 20:11:20

I have many instances of the following format in an .xml file:

<FFFFF>
    <BBBBB>
         "good B data"
    </BBBBB>
    <BBBBB>
         "more good B data"
    </BBBBB>
</FFFFF>


<AAAAA>
    <BBBBB>
         "some data"
    </BBBBB>
    <BBBBB>
         "more B data"
    </BBBBB>
</AAAAA>

I am trying to remove the A tags and rename the B tags that are inside them, so the final result would be as shown here. (Renaming the B tags to any other tag name would also be fine; they just cannot be B anymore.)

<FFFFF>
    <BBBBB>
         "good B data"
    </BBBBB>
    <BBBBB>
         "more good B data"
    </BBBBB>
</FFFFF>

 <AAAAA>
      "some data"
 </AAAAA>
 <AAAAA>
      "more B data"
 </AAAAA>

I have been experimenting with sed, but I cannot figure out how to do it. There is no fixed number of B tags in each A (some have none, some may have 20, etc.). The other issue is that I don't want to alter the B tags that appear elsewhere, so I can't do a simple find and replace on B tags, as that would also change the ones embedded in other elements such as <FFFFF>.

Any assistance is appreciated, thanks!

2 solutions

#1


1  

$ cat file
<FFFFF>
    <BBBBB>
         "good B data"
    </BBBBB>
    <BBBBB>
         "more good B data"
    </BBBBB>
</FFFFF>


<AAAAA>
    <BBBBB>
         "some data"
    </BBBBB>
    <BBBBB>
         "more B data"
    </BBBBB>
</AAAAA>

$ cat tst.awk
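# Tag to strip, tag to rename inside it, and the replacement name.
BEGIN{ remove="AAAAA"; changeFrom="BBBBB"; changeTo="XXXXX" }

# Opening wrapper tag: start tracking and drop the line.
$1 ~ "^<" remove ">$" {
    inRemove = 1
    next
}

inRemove {
    # Closing wrapper tag: stop tracking and drop the line.
    if ($1 ~ "^</" remove ">$") {
        inRemove = 0
        next
    }
    # Rename opening and closing inner tags.
    else if ($1 ~ "^</?" changeFrom ">$") {
        sub(changeFrom,changeTo)
    }
    # Remove one level (four spaces) of indentation.
    sub(/^    /,"")
}

{ print }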

$ awk -f tst.awk file
<FFFFF>
    <BBBBB>
         "good B data"
    </BBBBB>
    <BBBBB>
         "more good B data"
    </BBBBB>
</FFFFF>


<XXXXX>
     "some data"
</XXXXX>
<XXXXX>
     "more B data"
</XXXXX>
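
The script above hard-codes the tag names in its BEGIN block. As a minimal sketch, the same logic could be wrapped in a shell function that passes the names in with awk's -v option. The function name rename_nested and its argument order are assumptions made for illustration and are not part of the answer.

```shell
#!/bin/sh
# Hypothetical wrapper (name and argument order are illustrative):
#   rename_nested WRAPPER_TAG OLD_INNER_TAG NEW_INNER_TAG FILE
rename_nested() {
    awk -v remove="$1" -v changeFrom="$2" -v changeTo="$3" '
        # Opening wrapper tag: start tracking and drop the line.
        $1 ~ "^<" remove ">$" { inRemove = 1; next }
        # Closing wrapper tag: stop tracking and drop the line.
        inRemove && $1 ~ "^</" remove ">$" { inRemove = 0; next }
        inRemove {
            # Rename opening and closing inner tags.
            if ($1 ~ "^</?" changeFrom ">$") sub(changeFrom, changeTo)
            # Remove one level (four spaces) of indentation.
            sub(/^    /, "")
        }
        { print }
    ' "$4"
}
```

It would be called as `rename_nested AAAAA BBBBB XXXXX file`. Like the original script, it assumes each tag sits alone on its own line.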

#2


0  

sed '/^<AAAAA>/,/^<\/AAAAA>/ {
   /^<\/*AAAAA>/ s/^<\/*AAAAA>//
   /^<\/*AAAAA>/ !{
      s/^\([[:space:]]*\)<\(\/*\)BBBBB>/\1<\2AAAAA>/
      }
   }' YourFile
  1. This is written for your sample, so it may be useful to put the tag names to search for and modify in variables.
  2. The whitespace (indentation) in front of a modified tag is left unchanged.
  3. Lines that contained the old <AAAAA> and </AAAAA> tags are emptied but still present in the output.
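
Following up on the notes about variables and leftover empty lines, here is a hedged sketch of one way the sed command might be parameterized. The function name unwrap_tag and its argument order are assumptions for illustration. Unlike the command above, this variant deletes the wrapper lines with `d` instead of blanking them. It assumes the tag names contain no characters that are special to sed.

```shell
#!/bin/sh
# Hypothetical helper (name and argument order are illustrative):
#   unwrap_tag WRAPPER_TAG OLD_INNER_TAG NEW_INNER_TAG FILE
unwrap_tag() {
    outer=$1 inner=$2 new=$3 file=$4
    # Double quotes let the shell expand the tag names inside the sed script.
    sed "/^<$outer>/,/^<\/$outer>/ {
        # Delete the opening and closing wrapper lines entirely.
        /^<\/*$outer>/d
        # Rename inner tags, keeping their indentation and any leading slash.
        s/^\([[:space:]]*\)<\(\/*\)$inner>/\1<\2$new>/
    }" "$file"
}
```

For the sample in the question, `unwrap_tag AAAAA BBBBB AAAAA YourFile` would rename the inner tags to AAAAA as in the original command, with the indentation of the renamed tags left as it was.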
