I have some XHTML (but really any XML will do) like this:
<h1>
Hello<span class='punctuation'>,</span>
<span class='noun'>World<span class='punctuation'>!</span>
</h1>
How do I get the full content of the <h1/> as a String in Ruby? As in:
assert_equal "Hello, World!", h1_node.some_method_that_aggregates_all_content
Do any of the XML frameworks (Nokogiri, libxml-ruby, etc.) have this sort of thing built in? If not, I feel like a Y-combinator might be the right tool for the job, but I can't quite figure out what it would look like.
2 Answers
#1
3
With Nokogiri you can just ask for the text of a node. The catch is that all of the whitespace and newlines inside that node are returned as well, so you will probably want to strip them out (there is likely a better way to do that than what I did in this example).
Here is a sample:
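require 'nokogiri'

# Assumes this method lives in a test case class that provides assert_equal.
def test_nokogiri_text
  value = Nokogiri::HTML.parse(<<-HTML_END)
    <h1>
      Hello<span class='punctuation'>,</span>
      <span class='noun'>World<span class='punctuation'>!</span>
    </h1>
  HTML_END
  h1_node = value.search("h1").first
  # Node#text keeps the source whitespace, so collapse each run and trim the ends.
  assert_equal("Hello, World!", h1_node.text.split(/\s+/).join(' ').strip)
end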
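On the "better way" to clean up the whitespace: one option, sketched here with plain Ruby strings only, is to call String#split with no arguments. It splits on runs of whitespace and ignores leading and trailing whitespace, so the extra strip is not needed. The helper name normalize_whitespace is made up for this example, and the sample string is typed in by hand as a stand-in for what h1_node.text might return.

```ruby
# Hypothetical helper (not part of Nokogiri): collapse every run of
# whitespace into a single space and drop leading/trailing whitespace.
def normalize_whitespace(text)
  # String#split with no arguments splits on whitespace runs and
  # discards leading/trailing empty fields, so no extra strip is needed.
  text.split.join(' ')
end

# Hand-written stand-in for the text of the <h1> node above.
raw = "\n  Hello,\n  World!\n"
puts normalize_whitespace(raw) # prints: Hello, World!
```

In the test above this would read assert_equal("Hello, World!", normalize_whitespace(h1_node.text)).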
#2
2
Nokogiri's Nokogiri::XML::Node#content will do it:
irb(main):020:0> node
=> <h1>
Hello<span class="punctuation">,</span>
<span class="noun">World<span class="punctuation">!</span>
</span>
</h1>
irb(main):021:0> node.content
=> "\n Hello,\n World!\n\n"
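Conceptually, content joins every piece of text beneath the element in document order, which is the kind of recursion the question was reaching for. The sketch below illustrates the idea on a hand-built tree and is not how Nokogiri implements it. Element and aggregate_text are hypothetical names for this example, and the tree is my reading of the node printed in the irb session.

```ruby
# Hypothetical minimal tree: an element holds children that are either
# plain Strings (text nodes) or other Element structs.
Element = Struct.new(:name, :children)

# Depth-first walk that joins all text in document order.
def aggregate_text(node)
  return node if node.is_a?(String)
  node.children.map { |child| aggregate_text(child) }.join
end

# Hand-built stand-in for the parsed <h1> shown in the irb session.
h1 = Element.new('h1', [
  "\n Hello",
  Element.new('span', [',']),
  "\n ",
  Element.new('span', ['World', Element.new('span', ['!']), "\n"]),
  "\n"
])

p aggregate_text(h1) # => "\n Hello,\n World!\n\n"
```

The result still carries the source whitespace, just like the content output above, so it needs the same cleanup before comparing against "Hello, World!".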