Removing duplicate values in nodes

Time: 2022-06-02 07:38:39

I have been trying to figure out how to remove elements with duplicate values from an XML document using XSLT.

For example, given this input:

<main>
   <h1>
      <node1>duplicate</node1>
      <node1>duplicate</node1>
      <node1>New data</node1>
      <node1>New data 2</node1>
      <node1>duplicate</node1>
   </h1>
</main>

Expected output:

<main>
   <h1>
      <node1>duplicate</node1>
      <node1>New data</node1>
      <node1>New data 2</node1>
   </h1>
</main>

I'm sure this isn't too complicated, but I haven't been able to follow any of the methods I have seen so far. Thanks!

Thanks to Michael below! I have a follow-up question. Suppose the example above had additional elements (which would never be duplicated), for example:

 <main>
   <h1>
      <node1>duplicate</node1>
      <node1>duplicate</node1>
      <node1>New data</node1>
      <node1>New data 2</node1>
      <node1>duplicate</node1>
      <node2> Data </node2>

   </h1>
</main>

How would I carry this data through in the XSLT code? The solution below drops the additional elements, even though, as I understand it, the identity transform should copy everything and the more specific template should only change the nodes it matches.

2 answers

#1


2  

Here's one way:

XSLT 2.0

<xsl:stylesheet version="2.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="h1">
    <xsl:copy>
        <xsl:for-each-group select="node1" group-by=".">
            <node1>
                <xsl:value-of select="."/>
            </node1>
        </xsl:for-each-group>
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>

Here's another:

XSLT 2.0

<xsl:stylesheet version="2.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="h1">
    <xsl:copy>
        <xsl:for-each select="distinct-values(node1)">
            <node1>
                <xsl:value-of select="."/>
            </node1>
        </xsl:for-each>
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>

Added:

To process the other elements under h1, add the following instruction:

<xsl:apply-templates select="* except node1"/>

For example, applied to the first stylesheet:

<xsl:template match="h1">
    <xsl:copy>
        <xsl:for-each-group select="node1" group-by=".">
            <node1>
                <xsl:value-of select="."/>
            </node1>
        </xsl:for-each-group>
        <xsl:apply-templates select="* except node1"/>
    </xsl:copy>
</xsl:template>
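
If the position of the other elements relative to the node1 elements matters, one possible variation (a sketch, not part of the answer above) is to leave h1 to the identity transform and suppress only those node1 elements whose value already occurred in an earlier sibling. It assumes that duplicates are judged among the children of the same h1 and that string values are compared exactly, without whitespace normalization:

```xml
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- identity transform: copies every node in document order -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<!-- empty template: drop a node1 whose value equals that of
     any preceding node1 sibling -->
<xsl:template match="h1/node1[. = preceding-sibling::node1]"/>

</xsl:stylesheet>
```

Each node1 scans its preceding siblings, so the cost grows quadratically with the number of node1 children. That is usually fine for short lists; a key-based lookup scales better for long ones.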

#2


3  

Do note that the currently accepted answer is incorrect!

Even the 3rd solution, which doesn't lose elements, is incorrect, because it doesn't preserve the order of the elements.

Given this XML document:

 <main>
   <h1>
      <node1>duplicate</node1>
      <node2> Data </node2>
      <node1>duplicate</node1>
      <node1>New data</node1>
      <node1>New data 2</node1>
      <node1>duplicate</node1>
   </h1>
</main>

the last (3rd) transformation in the accepted answer:

<xsl:stylesheet version="2.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="h1">
    <xsl:copy>
        <xsl:for-each-group select="node1" group-by=".">
            <node1>
                <xsl:value-of select="."/>
            </node1>
        </xsl:for-each-group>
        <xsl:apply-templates select="* except node1"/>
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>

produces a result in which <node2> comes after all the <node1> elements, clearly not what one expects from an "identity" transform that only drops duplicates:

<?xml version="1.0" encoding="UTF-8"?>
<main>
   <h1>
      <node1>duplicate</node1>
      <node1>New data</node1>
      <node1>New data 2</node1>
      <node2> Data </node2>
   </h1>
</main>

Now a correct and very short solution :)

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>
 <xsl:key name="kNode1ByVal" match="h1/node1" use="."/>

  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="h1/node1[not(. is key('kNode1ByVal',.)[1])]"/>
</xsl:stylesheet>

This produces the expected, correct result, even when applied to the XML document above. Note that the relative order of the <node1> and <node2> elements is preserved:

<main>
   <h1>
      <node1>duplicate</node1>
      <node2> Data </node2>
      <node1>New data</node1>
      <node1>New data 2</node1>
   </h1>
</main>
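
The key above is built over the whole document, so a node1 is treated as a duplicate whenever an earlier h1/node1 anywhere in the document has the same value. If a document contains several h1 elements and duplicates should be removed only within each one, a possible adaptation (a sketch; the key name kNode1ByParentAndVal and the '|' separator are illustrative choices, not from the answer) is to include the parent's identity in the key value:

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- composite key: generated id of the parent h1 plus the string value -->
 <xsl:key name="kNode1ByParentAndVal" match="h1/node1"
          use="concat(generate-id(..), '|', .)"/>

  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- drop every node1 that is not the first one with its value
       under the same parent -->
  <xsl:template match="h1/node1[not(. is
      key('kNode1ByParentAndVal', concat(generate-id(..), '|', .))[1])]"/>
</xsl:stylesheet>
```

For the single-h1 documents shown in the question, this behaves the same as the shorter stylesheet; the difference only appears when equal values occur under different h1 parents.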
