Adding/removing XML tags using a bash script

I have an XML file that I want to configure using a bash script. For example, if I had this XML:

<a>

  <b>
    <bb>
        <yyy>
            Bla 
        </yyy>
    </bb>
  </b>

  <c>
    <cc>
      Something
    </cc>
  </c>

  <d>
    bla
  </d>
</a>

(confidential info removed)

I would like to write a bash script that will remove section <b> (or comment it out) but keep the rest of the XML intact. I am pretty new to the whole scripting thing. I was wondering if anyone could give me a hint as to what I should look into.

I was thinking that sed could be used, except that sed is a line editor. I think it would be easy to remove the <b> tags; however, I am unsure if sed would be able to remove all the text between the <b> and </b> tags.

I will also need to write a script to add back the deleted section.

6 solutions

#1

This would not be difficult to do in sed, as sed also works on ranges.

Try this (assuming xml is in a file named foo.xml):

sed -i '/<b>/,/<\/b>/d' foo.xml

-i will write the change into the original file (use -i.bak to keep a backup copy of the original)

This sed command will perform an action d (delete) on all of the lines specified by the range

# all of the lines between a line that matches <b>
# and the next line that matches <\/b>, inclusive
/<b>/,/<\/b>/

So, in plain English, this command will delete all of the lines between and including the line with <b> and the line with </b>.
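Since the question also asks about putting the section back later, here is a minimal sketch of one way to do that with the same range address: save the block to a side file before deleting it, then read it back in with sed's `r` command. The file names (`foo.xml`, `b_section.xml`) and the trimmed sample XML are assumptions for illustration, and the restore step assumes the block belongs directly after the `<a>` line, as in the sample.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work on a throwaway copy of a trimmed sample so nothing real is touched.
workdir=$(mktemp -d)
xml="$workdir/foo.xml"           # hypothetical file name, as in the answer
saved="$workdir/b_section.xml"   # hypothetical file that holds the removed block

cat > "$xml" <<'EOF'
<a>
  <b>
    <bb>
      <yyy>Bla</yyy>
    </bb>
  </b>
  <c>
    <cc>Something</cc>
  </c>
</a>
EOF
cp "$xml" "$workdir/original.xml"   # pristine copy, kept only for comparison

# 1. Save the range: -n suppresses default output, p prints only matching lines.
sed -n '/<b>/,/<\/b>/p' "$xml" > "$saved"

# 2. Delete the same range (via a temp file instead of relying on -i).
sed '/<b>/,/<\/b>/d' "$xml" > "$xml.tmp" && mv "$xml.tmp" "$xml"
cp "$xml" "$workdir/deleted.xml"    # snapshot of the file without <b>

# 3. Restore: r appends the contents of the saved file after the line matching <a>.
sed "/<a>/r $saved" "$xml" > "$xml.tmp" && mv "$xml.tmp" "$xml"
```

Running step 1 on its own (without redirecting) is also a cheap way to preview exactly which lines the delete would remove.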

If you'd rather comment out the lines, try one of these:

# block comment
sed -i 's/<b>/<!-- <b>/; s/<\/b>/<\/b> -->/' foo.xml

# comment out every line in the range
sed -i '/<b>/,/<\/b>/s/.*/<!-- & -->/' foo.xml
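If the block-comment form is used, undoing it is a matter of stripping the same markers again. The following is a sketch under the assumption that the only `<!-- <b>` and `</b> -->` sequences in the file are the ones added by the comment step; the file name and the trimmed sample are hypothetical. Note that XML comments cannot nest or contain `--`, so this approach breaks down if the `<b>` section already contains comments.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
xml="$workdir/foo.xml"   # hypothetical file name

cat > "$xml" <<'EOF'
<a>
  <b>
    <bb>Bla</bb>
  </b>
  <c>Something</c>
</a>
EOF
cp "$xml" "$workdir/original.xml"   # pristine copy, kept only for comparison

# Comment out: the same substitutions as the block-comment command,
# written through a temp file instead of -i.
sed 's/<b>/<!-- <b>/; s/<\/b>/<\/b> -->/' "$xml" > "$xml.tmp" && mv "$xml.tmp" "$xml"
cp "$xml" "$workdir/commented.xml"  # snapshot of the commented state

# Uncomment: remove exactly the markers that were added.
sed 's/<!-- <b>/<b>/; s/<\/b> -->/<\/b>/' "$xml" > "$xml.tmp" && mv "$xml.tmp" "$xml"
```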

#2

Using xmlstarlet:

#xmlstarlet ed -d "/a/b" file.xml > tmp.xml
xmlstarlet ed -d "//b" file.xml > tmp.xml
mv tmp.xml file.xml

#3

You can use an XSLT such as this, which is a modified identity transform. It copies all of the content by default and has an empty template for b that does nothing (effectively deleting it from the output):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<!--Identity transform copies all items by default -->
<xsl:template match="@* | node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<!--Empty template to match on b elements and prevent it from being copied to output -->
<xsl:template match="b"/>

</xsl:stylesheet>

Create a bash script that executes the transform using Java and the Xalan command-line utility like this:

java org.apache.xalan.xslt.Process -IN foo.xml -XSL foo.xsl -OUT foo.out

The result is this:

<?xml version="1.0" encoding="UTF-16"?><a><c><cc>
      Something
    </cc></c><d>
    bla
  </d></a>

EDIT: if you would prefer to have the b commented out, to make it easier to put back, then use this stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <!--Identity transform copies all items by default -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!--Match on b element, wrap in a comment and construct text representing XML structure by applying templates in "comment" mode -->
    <xsl:template match="b">
        <xsl:comment>
            <xsl:apply-templates select="self::*" mode="comment" />
        </xsl:comment>
    </xsl:template>

    <xsl:template match="*" mode="comment">
        <xsl:value-of select="'&lt;'"/>
            <xsl:value-of select="name()"/>
        <xsl:value-of select="'&gt;'"/>
            <xsl:apply-templates select="@*|node()" mode="comment" />
        <xsl:value-of select="'&lt;/'"/>
            <xsl:value-of select="name()"/>
        <xsl:value-of select="'&gt;'"/>
    </xsl:template>

    <xsl:template match="text()" mode="comment">
        <xsl:value-of select="."/>
    </xsl:template>

    <xsl:template match="@*" mode="comment">
        <xsl:value-of select="name()"/>
        <xsl:text>="</xsl:text>
        <xsl:value-of select="."/>
        <xsl:text>" </xsl:text>
    </xsl:template>

</xsl:stylesheet>

It produces this output:

<?xml version="1.0" encoding="UTF-16"?><a><!--<b><bb><yyy>
            Bla
        </yyy></bb></b>--><c><cc>
      Something
    </cc></c><d>
    bla
  </d></a>

#4

If you want the most appropriate replacement for sed for XML data, it would be an XSLT processor. Like sed, it's a complex language, but one specialized for the task of XML-to-anything transformations.

On the other hand, this does seem to be the point at which I would seriously consider switching to a real programming language, like Python.

#5

@OP, you can use awk, e.g.:

$ cat file
<a>                              

some text before   <b>
    <bb>
        <yyy>
            Bla
        </yyy>
    </bb>
  </b> some text after

  <c>
    <cc>
      Something
    </cc>
  </c>

  <d>
    bla
  </d>
</a>

$ awk 'BEGIN{RS="</b>"}/<b>/{gsub(/<b>.*/,"")}1' file
<a>

some text before
 some text after

  <c>
    <cc>
      Something
    </cc>
  </c>

  <d>
    bla
  </d>
</a>
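A multi-character RS as used above is supported by gawk, but POSIX does not guarantee it, so the one-liner may behave differently in other awk implementations. As a hedged alternative, here is a line-oriented sketch that uses a flag instead. It assumes the `<b>` and `</b>` tags sit on their own lines (as in the question's original XML), because it drops whole lines, including any other text that shares a line with either tag. The file name and trimmed sample are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
xml="$workdir/file.xml"   # hypothetical path

cat > "$xml" <<'EOF'
<a>
  <b>
    <bb>Bla</bb>
  </b>
  <c>Something</c>
</a>
EOF

# skip is set on the line containing <b> and cleared after the line
# containing </b>; a line is printed only while skip is unset.
result=$(awk '
  /<b>/   { skip = 1 }
  !skip   { print }
  /<\/b>/ { skip = 0 }
' "$xml")

printf '%s\n' "$result"
```

The order of the three rules matters: the closing-tag rule runs after the print rule, so the `</b>` line itself is still skipped.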

#6

# edit file inplace
xmlstarlet ed -L -d "//b" file.xml
