I am parsing an XML file using Nokogiri with the following snippet:
doc.xpath('//root').each do |root|
  puts "# ROOT found"
  root.xpath('//page').each do |page|
    puts "## PAGE found / #{page['id']} / #{page['name']} / #{page['width']} / #{page['height']}"
    page.children.each do |content|
      ...
    end
  end
end
How can I parse through all elements in the page element? There are three different elements: image, text and video. How can I make a case statement for each element?
2 Answers
#1
10
Honestly, your code looks pretty close to me. Add a case statement on each child's name:
doc.xpath('//root').each do |root|
  puts "# ROOT found"
  # './/page' keeps the search relative to this root node.
  root.xpath('.//page').each do |page|
    puts "## PAGE found / #{page['id']} / #{page['name']} / #{page['width']} / #{page['height']}"
    # element_children skips text nodes, whose name is also reported as "text".
    page.element_children.each do |child|
      case child.name
      when 'image'
        do_image_stuff
      when 'text'
        do_text_stuff
      when 'video'
        do_video_stuff
      end
    end
  end
end
#2
5
Both Nokogiri's CSS and XPath accessors allow multiple tags to be specified, which can be useful for this sort of problem. Rather than walking through every tag inside the document's page tag, you can ask for only the tags you care about. Given this sample document:
require 'nokogiri'

doc = Nokogiri::XML('
<xml>
  <body>
    <image>image</image>
    <text>text</text>
    <video>video</video>
    <other>other</other>
    <image>image</image>
    <text>text</text>
    <video>video</video>
    <other>other</other>
  </body>
</xml>')
This is a search using CSS:
doc.search('image, text, video').each do |node|
  case node.name
  when 'image'
    puts node.text
  when 'text'
    puts node.text
  when 'video'
    puts node.text
  else
    puts 'should never get here'
  end
end
# >> image
# >> image
# >> text
# >> text
# >> video
# >> video
Notice that here the nodes came back grouped in the order the CSS selector lists them, not in document order. Whether CSS results are grouped this way may depend on the Nokogiri version, so it is safer not to rely on it. If you need the tags in document order, you can use XPath:
doc.search('//image | //text | //video').each do |node|
  puts node.text
end
# >> image
# >> text
# >> video
# >> image
# >> text
# >> video
In either case, the program should run faster because all the searching occurs in libxml2, which hands back to Ruby only the nodes you need to process.
If you need to restrict the search to within a <page> tag, you can do a search up front to find the page node, then search underneath it:
doc.at('page').search('image, text, video').each do |node|
  ...
end
or
# Note the leading dot: a bare // would search the whole document again.
doc.at('//page').search('.//image | .//text | .//video').each do |node|
  ...
end