I have something like this:
<ValuesPeaks>
<Peak Start="244" Stop="248" Max="245" XValue="149" YValue="100.0000"/>
<Peak Start="361" Stop="368" Max="366" XValue="173.2" YValue="96.2713"/>
<ValuesPeaks>
Except they are a lot longer, and I have about 300 sets of <ValuesPeaks>. How can I extract only the XValue and YValue attributes from all of them? I thought I could use xpathSApply('//ValuesPeaks[XValue]', xmlValue), but it's not working. I then thought I could use toString.XMLNode(), then regexpr() and substr() to get what I want, but that seems inefficient. I think I'm missing something. Please share your expertise. Thanks.
p<-list.files()[[1]]
library(XML)
x<-xmlParse(p)
getNodeSet(x,'//Data/RESULT/*/*/*/ValuesPeaks/Peak')
f<-xpathSApply(x,'//Data/RESULT/*/*/*/ValuesPeaks/Peak')
t<-toString.XMLNode(f)
2 solutions
#1
2
There are a few ways to extract those attributes. It all depends on what you want the result to look like. Here are a couple of examples.
The first uses xmlAttrs() and subsets the results.
xpathApply(doc, "//ValuesPeaks//*", function(x) xmlAttrs(x)[c("XValue", "YValue")])
# [[1]]
# XValue YValue
# "149" "100.0000"
#
# [[2]]
# XValue YValue
# "173.2" "96.2713"
The second is likely more efficient. It uses an XPath statement to get the two relevant attributes.
xpathSApply(doc, "//ValuesPeaks//@*[name()='XValue' or name()='YValue']")
# XValue YValue XValue YValue
# "149" "100.0000" "173.2" "96.2713"
You could even do
sapply(unname(xmlToList(doc)), "[", c("XValue", "YValue"))
# [,1] [,2]
# XValue "149" "173.2"
# YValue "100.0000" "96.2713"
Data:
txt <- '<ValuesPeaks>
<Peak Start="244" Stop="248" Max="245" XValue="149" YValue="100.0000"/>
<Peak Start="361" Stop="368" Max="366" XValue="173.2" YValue="96.2713"/>
</ValuesPeaks>'
library(XML)
doc <- xmlParse(txt)
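As for why the original attempt returned nothing: in XPath, //ValuesPeaks[XValue] selects ValuesPeaks elements that have a child element named XValue, whereas here XValue is an attribute of Peak (and the call also omitted the document argument). If the end goal is a numeric data frame, one possible sketch is below. It reuses the sample data above; the names x_val, y_val and peaks are only illustrative, and it assumes every Peak carries both attributes.

```r
library(XML)

# Same sample data as above, parsed from a string
txt <- '<ValuesPeaks>
<Peak Start="244" Stop="248" Max="245" XValue="149" YValue="100.0000"/>
<Peak Start="361" Stop="368" Max="366" XValue="173.2" YValue="96.2713"/>
</ValuesPeaks>'
doc <- xmlParse(txt, asText = TRUE)

# Pull one attribute per Peak node; the result is a character vector
x_val <- xpathSApply(doc, "//ValuesPeaks/Peak", xmlGetAttr, "XValue")
y_val <- xpathSApply(doc, "//ValuesPeaks/Peak", xmlGetAttr, "YValue")

# Convert to numeric and combine, one row per Peak (assumes no Peak lacks an attribute)
peaks <- data.frame(XValue = as.numeric(x_val), YValue = as.numeric(y_val))
```

With the real files, the XPath would presumably need to match the full structure from the question (for example the //Data/RESULT/... path), which I can't verify from the snippet.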
#2
2
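Your XML is malformed (the second ValuesPeaks tag needs a / to make it a closing tag), which causes xml2::read_xml to complain. read_html actually fixes it automatically, though. Note that the HTML parser lowercases tag and attribute names, which is why the code uses '//peak' and 'xvalue'. So you can do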
library(xml2)
library(tidyverse)
x <- '<ValuesPeaks>
<Peak Start="244" Stop="248" Max="245" XValue="149" YValue="100.0000"/>
<Peak Start="361" Stop="368" Max="366" XValue="173.2" YValue="96.2713"/>
<ValuesPeaks>'
df <- x %>%
read_html() %>%
xml_find_all('//peak') %>% {
tibble(xvalue = xml_attr(., 'xvalue'),
yvalue = xml_attr(., 'yvalue'))
} %>%
type_convert()
df
#> # A tibble: 2 x 2
#> xvalue yvalue
#> <dbl> <dbl>
#> 1 149.0 100.0000
#> 2 173.2 96.2713
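If the closing tag is correct in the actual files, read_xml() should parse them directly and keep the original case of the names. Here is a minimal sketch under that assumption, which also records which ValuesPeaks set each peak came from. The <Data> wrapper, the split of the two sample peaks into two sets, and the set column are hypothetical and only there for illustration.

```r
library(xml2)

# Hypothetical well-formed input: the two sample peaks split across two sets
x <- '<Data>
<ValuesPeaks>
<Peak Start="244" Stop="248" Max="245" XValue="149" YValue="100.0000"/>
</ValuesPeaks>
<ValuesPeaks>
<Peak Start="361" Stop="368" Max="366" XValue="173.2" YValue="96.2713"/>
</ValuesPeaks>
</Data>'

doc <- read_xml(x)
sets <- xml_find_all(doc, "//ValuesPeaks")

# Build one data frame per set, tagging each row with the index of its set
per_set <- lapply(seq_along(sets), function(i) {
  peaks <- xml_find_all(sets[[i]], "./Peak")
  data.frame(
    set = rep(i, length(peaks)),
    XValue = as.numeric(xml_attr(peaks, "XValue")),
    YValue = as.numeric(xml_attr(peaks, "YValue"))
  )
})

# Stack the per-set frames into one table
df <- do.call(rbind, per_set)
```

Keeping the set index is a design choice: with about 300 sets it is probably useful to know where each peak belongs, but the column can be dropped if only the values matter.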