R: Inserting a node into an XML tree at a specific location

Time: 2022-09-13 09:19:36

Data

I have an xml file with a structure like this (large example to show the needed flexibility):

<rootnode sth="something" descr="ex">
  <tag sth="sth1" descr="ex" anoAttr="sth2">
    <tag sth="sth3" descr="ex2" searchA="sth4" anoAttr="sth5">
      <tag sth="sth6" descr="ex3" oAttr="sth7" searchA="sth8" anoAttr="sth9">
        <tag sth="sth10" descr="ex4" oAttr="sth11" searchA="sth12" anoAttr="sth13">
          <someContent/>
        </tag>
        <someContent/>
      </tag>
      <tag sth="sth14" descr="ex5" oAttr="sth15" searchA="sth16" anoAttr="sth17">
        <someContent/>
      </tag>
      <tag sth="sth1" descr="ex6" oAttr="sth15" searchA="sth18" anoAttr="sth17">
        <someContent/>
      </tag>
    </tag>
    <tag sth="sth10" descr="ex2" oAttr="sth19" searchA="sth20" anoAttr="sth9">
      <someContent/>
    </tag>
    <tag sth="sth10" descr="ex7" searchA="sth21" anoAttr="sth13">
      <tag sth="sth21" descr="ex8" oAttr="sth22" searchA="sth23" anoAttr="sth9">
        <tag sth="sth23" descr="ex9" oAttr="sth22" searchA="sth24" anoAttr="sth5">
          <someContent/>
        </tag>
        <someContent/>
      </tag>
    </tag>
  </tag>
  <otherNode>
    <someNode/>
  </otherNode>
</rootnode>

Specifically, the size of any of the tag nodes is unknown, the number of attributes is not equal for all tag nodes and the values of the attributes are not unique.
What I do know, however, is that the value of the searchA attribute is unique. Also, only tag nodes can contain an attribute called searchA and all of them except the top level one do.

Before

I first parse this document using the XML package with the function xmlTreeParse() and store the root node. I then create a new node using newXMLNode().

library(XML)
# Parse into internal (C-level) nodes so that later edits modify the tree in place
xmlfile = xmlTreeParse(filename, useInternalNodes = TRUE)
xmltop = xmlRoot(xmlfile)
newNode = newXMLNode(name = "newlyCreatedNode")

Goal

My goal is to insert my newly created newNode as a child of the node that has a certain value (for example "sth23") as the searchA attribute.
So in this case I want the result to look like this (notice the <newlyCreatedNode/> near the bottom):

<rootnode sth="something" descr="ex">
  <tag sth="sth1" descr="ex" anoAttr="sth2">
    <tag sth="sth3" descr="ex2" searchA="sth4" anoAttr="sth5">
      <tag sth="sth6" descr="ex3" oAttr="sth7" searchA="sth8" anoAttr="sth9">
        <tag sth="sth10" descr="ex4" oAttr="sth11" searchA="sth12" anoAttr="sth13">
          <someContent/>
        </tag>
        <someContent/>
      </tag>
      <tag sth="sth14" descr="ex5" oAttr="sth15" searchA="sth16" anoAttr="sth17">
        <someContent/>
      </tag>
      <tag sth="sth1" descr="ex6" oAttr="sth15" searchA="sth18" anoAttr="sth17">
        <someContent/>
      </tag>
    </tag>
    <tag sth="sth10" descr="ex2" oAttr="sth19" searchA="sth20" anoAttr="sth9">
      <someContent/>
    </tag>
    <tag sth="sth10" descr="ex7" searchA="sth21" anoAttr="sth13">
      <tag sth="sth21" descr="ex8" oAttr="sth22" searchA="sth23" anoAttr="sth9">
        <tag sth="sth23" descr="ex9" oAttr="sth22" searchA="sth24" anoAttr="sth5">
          <someContent/>
        </tag>
        <someContent/>
        <newlyCreatedNode/>
      </tag>
    </tag>
  </tag>
  <otherNode>
    <someNode/>
  </otherNode>
</rootnode>

Basically, in this case addChildren(xmltop[[1]][[3]][[1]], kids = list(newNode)) gets me the result that I want. Of course I do not want to specify [[1]][[3]][[1]].

What I tried

I can get a list of all relevant nodes with xmlElementsByTagName() and get all attributes with xmlAttrs(). I can even get a logical index vector which gives me the correct location.

# Collect every <tag> element at any depth
listOfNodes = xmlElementsByTagName(el = xmltop, "tag", recursive = TRUE)
attributeList = lapply(listOfNodes, FUN = function(x) xmlAttrs(x))
# TRUE where searchA matches; NA (attribute absent) is treated as no match
indexVector = sapply(attributeList, FUN = function(x) x["searchA"] == "sth23")
indexVector[is.na(indexVector)] = FALSE
listOfNodes[indexVector]

What I do not know is how to use this information to insert my node into the tree at the correct location.
listOfNodes[indexVector] gives me the correct node, but it is now a list and not a node I can use addChildren() on.
Even if I somehow managed to map the indexVector and the xmlSize() of all nodes to the correct indices that I could use on xmltop directly, I would still have the problem of a variable number of double brackets (xmltop[[1]][[3]] vs xmltop[[1]][[2]][[1]]).

I have also tried several other functions of the XML package, including xmlApply, getNodeLocation and getNodeSet, but they did not seem to help.

What I have not really tried

I do not really understand the difference between xmlTreeParse(), xmlInternalTreeParse() and xmlTreeParse(useInternalNodes = T), and I cannot wrap my head around XPath, so I did not get very far trying to use it.

Any helpful pointers would be much appreciated.

1 Solution

#1


The reason for my confusion was the help page for ?xmlElementsByTagName. It says there:

"The addition of the recursive argument makes this function behave like the getElementsByTagName in other language APIs such as Java, C#. However, one should be careful to understand that in those languages, one would get back a set of node objects. These nodes have references to their parents and children. Therefore one can navigate the tree from each node, find its relations, etc. In the current version of this package (and for the forseeable future), the node set is a “copy” of the nodes in the original tree. And these have no facilities for finding their siblings or parent."

This made me think that the function always returns a list of copies rather than references to the nodes themselves.
That may well be the case when the XML is parsed with useInternalNodes = FALSE in xmlTreeParse(), but when it is set to TRUE, the list returned by xmlElementsByTagName() seems to contain references to the actual nodes.
These can easily be manipulated with, for example, addChildren().
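
The copy-versus-reference difference can be illustrated with a small sketch. It uses a tiny stand-in document rather than the file from the question, and the names xmlString, docR, docInt and modifiedR are made up for this example. It assumes that R-level nodes are built with xmlNode() and internal nodes with newXMLNode():

```r
library(XML)

# Tiny stand-in document (hypothetical, not the file from the question)
xmlString <- '<rootnode><tag searchA="a"><someContent/></tag></rootnode>'

# R-level tree: useInternalNodes defaults to FALSE in xmlTreeParse()
docR <- xmlTreeParse(xmlString, asText = TRUE)
nodeR <- xmlRoot(docR)[[1]]
# addChildren() returns a modified copy; the tree held by docR is not changed
modifiedR <- addChildren(nodeR, xmlNode("newlyCreatedNode"))
print(xmlSize(xmlRoot(docR)[[1]]))  # child count of the node still in the tree
print(xmlSize(modifiedR))           # child count of the modified copy

# Internal (C-level) tree: nodes are references into the document
docInt <- xmlTreeParse(xmlString, asText = TRUE, useInternalNodes = TRUE)
nodeInt <- xmlRoot(docInt)[[1]]
# Here addChildren() changes the document itself
addChildren(nodeInt, newXMLNode("newlyCreatedNode"))
print(xmlSize(xmlRoot(docInt)[[1]]))  # child count as seen through the document
```

With the R-level tree, the result of addChildren() would have to be assigned back into the tree by hand, which is exactly the indexing problem described in the question. With internal nodes that step is not needed.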

In short, the very simple solution to my problem is:

# [[1]] extracts the single matching node from the one-element list
addChildren(listOfNodes[indexVector][[1]], kids = list(newNode))
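
Since the question mentions XPath, here is a hedged sketch of the same insertion using getNodeSet(). It is an alternative, not the approach taken above. The shortened document in xmlString and the names doc and target are made up for this example, and it relies on searchA being unique, as stated in the question:

```r
library(XML)

# Shortened stand-in for the document in the question (hypothetical)
xmlString <- paste0(
  '<rootnode><tag sth="sth1">',
  '<tag searchA="sth21"><tag searchA="sth23">',
  '<tag searchA="sth24"><someContent/></tag><someContent/>',
  '</tag></tag></tag><otherNode/></rootnode>'
)
doc <- xmlParse(xmlString, asText = TRUE)

# "//tag" matches <tag> elements at any depth;
# "[@searchA='sth23']" keeps only those whose searchA attribute equals "sth23"
target <- getNodeSet(doc, "//tag[@searchA='sth23']")
stopifnot(length(target) == 1)  # searchA is assumed to be unique

# The node set holds references into doc, so this edits the document in place
addChildren(target[[1]], newXMLNode("newlyCreatedNode"))
print(names(xmlChildren(target[[1]])))
```

The XPath expression replaces the manual attribute lookup and the logical index vector, and the depth of the target node no longer matters.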
