
Time: 2021-07-24 01:16:53

My code is supposed to "guess" the path(s) that lie before the relevant text nodes in my XML file. "Relevant" here means text nodes nested within the recurring product/person/something tag, but not text nodes used outside of it.


This code:

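    require 'nokogiri'

    @doc, items = Nokogiri.XML(@file), []

    path = []
    # traverse visits children before their parent (post-order),
    # so the deepest parent element names are pushed first
    @doc.traverse do |node|
      if node.class.to_s == "Nokogiri::XML::Element"
        is_path_element = false
        node.children.each do |child|
          is_path_element = true if child.class.to_s == "Nokogiri::XML::Element"
        end
        # record each element name that has element children, only once
        path.push(node.name) if is_path_element == true && !path.include?(node.name)
      end
    end
    final_path = "/"+path.reverse.join("/")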

works for simple XML files, for example:


    <?xml version="1.0" encoding="utf-8"?>
    <rss version="2.0">
      <channel>
        <title>Some XML file title</title>
        <description>Some XML file description</description>
        <item>
          <title>Some product title</title>
          <brand>Some product brand</brand>
        </item>
        <item>
          <title>Some product title</title>
          <brand>Some product brand</brand>
        </item>
      </channel>
    </rss>

    puts final_path # => "/rss/channel/item"

But when it gets more complicated, how should I then approach the challenge? For example with this one:


    <?xml version="1.0" encoding="utf-8"?>
    <rss version="2.0">
        <title>Some XML file title</title>
        <description>Some XML file description</description>
            <title>Some product title</title>
            <brand>Some product brand</brand>
            <title>Some product title</title>
            <brand>Some product brand</brand>

1 answer



If you are looking for a list of the deepest "parent" paths in the XML, there is more than one way to approach it.


Although I think your own code could be adjusted to produce the same output, I was convinced the same result could be achieved with XPath. My motivation is also to brush up on my XML skills (I haven't used Nokogiri before, but I will need it professionally soon). So here is how to get all parent paths that have exactly one level of child elements beneath them, using XPath:


    # xml is the parsed Nokogiri document, e.g. Nokogiri::XML(@file)
    xml.xpath('//*[child::* and not(child::*/*)]').each { |node| puts node.path }

The output of this for your second example file is:



If you take this list, gsub out the indexes and make the array unique, it looks a lot like the output of your loop:


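    paths = xml.xpath('//*[child::* and not(child::*/*)]').map { |node| node.path }
    # strip positional indexes like [1], then drop duplicates
    # (uniq rather than uniq!, which returns nil when nothing changes)
    paths = paths.map { |path| path.gsub(/\[[0-9]+\]/, '') }.uniq
    # => ["/rss/channel/item/titles", "/rss/channel/item/brands"]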

Or in one line, using `*` as the abbreviated form of `child::*`:


    paths = xml.xpath('//*[* and not(*/*)]').map { |node| node.path.gsub(/\[[0-9]+\]/,'') }.uniq
    # => ["/rss/channel/item/titles", "/rss/channel/item/brands"]