I have seen several posts on this, but nothing has worked so far. I am parsing XML fetched from a URL using Nokogiri on Rails 3 with Ruby 1.9.2.
A snippet of the XML looks like this:
<NewsLineText>
<![CDATA[
Anna Kendrick is ''obsessed'' with 'Game of Thrones' and loves to cook, particularly creme brulee.
]]>
</NewsLineText>
I am trying to parse this to get the text inside NewsLineText:
r = node.at_xpath('.//newslinetext')
s = r.text if r
t = r.content if r
puts r
# blank? comes from Rails (ActiveSupport); nil.blank? is true
puts s.blank? ? 'NOTHING' : s
puts t.blank? ? 'NOTHING' : t
What I get back is:
<newslinetext></newslinetext>
NOTHING
NOTHING
So I know the tag name is spelled correctly, since the newslinetext element is found, but the CDATA text never shows up.
What do I need to do with Nokogiri to get this text?
2 answers
#1 (11 votes)
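You're trying to parse XML using Nokogiri's HTML parser. If node were from the XML parser, then r would be nil, since XML is case sensitive and your XPath uses newslinetext while the document uses NewsLineText. Your r is not nil, so you're using the HTML parser, which is case insensitive (note that it printed the element as <newslinetext>). As your output also shows, the CDATA content doesn't survive the HTML parse.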
Use Nokogiri's XML parser (and the exact, case-sensitive element name) and you will get something like this:
>> r = doc.at_xpath('.//NewsLineText')
=> #<Nokogiri::XML::Element:0x8066ad34 name="NewsLineText" children=[#<Nokogiri::XML::Text:0x8066aac8 "\n ">, #<Nokogiri::XML::CDATA:0x8066a9c4 "\n Anna Kendrick is ''obsessed'' with 'Game of Thrones' and loves to cook, particularly creme brulee.\n ">, #<Nokogiri::XML::Text:0x8066a8d4 "\n">]>
>> r.text
=> "\n \n Anna Kendrick is ''obsessed'' with 'Game of Thrones' and loves to cook, particularly creme brulee.\n \n"
and you'll be able to get at the CDATA through r.text or r.children.
#2 (3 votes)
Ah, I see. What @mu said is correct. But to get at the CDATA node directly, you could do something like this:
require 'nokogiri'
xml = <<EOF
<NewsLineText>
<![CDATA[
Anna Kendrick is ''obsessed'' with 'Game of Thrones' and loves to cook, particularly creme brulee.
]]>
</NewsLineText>
EOF
doc = Nokogiri::XML(xml)
# Pick the CDATA node out of NewsLineText's children
cdata = doc.search('NewsLineText').children.find { |e| e.cdata? }
puts cdata.content.strip