Converting XML with repeated nodes to an R data frame

Date: 2020-12-26 16:33:14

I am trying to flatten XML that contains repeated nodes of the same name (each uniquely identified by an attribute value) into a flat data frame in R. The example I have is

<?xml version="1.0"?>
<data>
<tr id="1">
    <A id="100">100</A>
    <B>abc</B>
    <C>true</C>
</tr>
<tr id="2">
    <A id="200">200</A>
    <A id="300">300</A>
    <B>wxyz</B>
    <C>FALSE</C>
</tr>
</data>

The desired result is a data.frame that looks like this:

tr     A     B     C
 1   100   abc  true
 2   200  wxyz FALSE
 2   300  wxyz FALSE

I have read the XML file with the XML package:

library(XML)
xmlfile <- "H:/My Documents/Code/R/xml/example.xml"
# useInternalNodes = TRUE is needed for the XPath functions used below
xmldoc <- xmlTreeParse(xmlfile, useInternalNodes = TRUE)

Using xpathSApply(), I can retrieve each node and attribute without problem, e.g.,

data.frame(id = xpathSApply(xmldoc, "//A", xmlGetAttr, "id"))

but I cannot combine everything into a single data.frame, because there are more "A" nodes (3) than there are nodes of any other kind (2 each), so the columns end up with different lengths.
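To make the mismatch concrete, here is a minimal sketch that counts the nodes in a cut-down inline copy of the example and shows data.frame() rejecting the unequal columns. The names small_doc, n_A, n_B and err are made up for this illustration, and the inline XML drops the C nodes for brevity.

```r
library(XML)

# Cut-down inline copy of the example (assumption: C nodes omitted for brevity)
small_doc <- xmlParse(
  '<data><tr id="1"><A>100</A><B>abc</B></tr><tr id="2"><A>200</A><A>300</A><B>wxyz</B></tr></data>',
  asText = TRUE
)

a_vals <- xpathSApply(small_doc, "//A", xmlValue)
b_vals <- xpathSApply(small_doc, "//B", xmlValue)
n_A <- length(a_vals)  # three A nodes in total
n_B <- length(b_vals)  # two B nodes in total

# Columns of length 3 and 2 cannot be recycled into one data.frame, so this errors;
# the error message is captured instead of stopping the script.
err <- tryCatch({
  data.frame(A = a_vals, B = b_vals)
  NULL
}, error = function(e) conditionMessage(e))
print(err)
```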

Any help will be greatly appreciated ...

1 solution

#1


You probably need to create a data.frame for each tr node and combine the results. Within one tr node, data.frame() recycles the single tr, B and C values to match the number of A nodes.

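# One data.frame per <tr> node, with one row per <A> child
tr <- getNodeSet(xmldoc, "//tr")
x <- lapply(tr, function(node) data.frame(tr = xpathSApply(node, "." , xmlGetAttr, "id"),
                                          A = xpathSApply(node, ".//A", xmlValue),
                                          B = xpathSApply(node, ".//B", xmlValue),
                                          C = xpathSApply(node, ".//C", xmlValue) ))

# Stack the per-<tr> data frames into a single result
do.call("rbind", x)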
  tr   A    B     C
1  1 100  abc  true
2  2 200 wxyz FALSE
3  2 300 wxyz FALSE
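
For anyone who wants to try this without the original file, here is a self-contained sketch of the same per-node approach. It parses an inline copy of the example with xmlParse(), which returns the internal document that XPath queries need. The names xml_text, tr_to_df and result are hypothetical, and the sketch assumes every tr node has at least one A child and exactly one B and one C child.

```r
library(XML)

# Inline copy of the example document, so no file on disk is needed
xml_text <- '<?xml version="1.0"?>
<data>
<tr id="1">
    <A id="100">100</A>
    <B>abc</B>
    <C>true</C>
</tr>
<tr id="2">
    <A id="200">200</A>
    <A id="300">300</A>
    <B>wxyz</B>
    <C>FALSE</C>
</tr>
</data>'

doc <- xmlParse(xml_text, asText = TRUE)

# Hypothetical helper: one data.frame per <tr>, one row per <A> child.
# The length-1 tr, B and C values are recycled by data.frame().
tr_to_df <- function(node) {
  data.frame(
    tr = xmlGetAttr(node, "id"),
    A  = xpathSApply(node, ".//A", xmlValue),
    B  = xpathSApply(node, ".//B", xmlValue),
    C  = xpathSApply(node, ".//C", xmlValue),
    stringsAsFactors = FALSE
  )
}

result <- do.call(rbind, lapply(getNodeSet(doc, "//tr"), tr_to_df))
print(result)
```

All columns come back as character here; converting A to numeric or C to logical would be a separate step if you need it.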
