Preserving attributes when converting XML to a Ruby hash

I have a large XML document I am looking to parse. In this document, many tags have different attributes within them. For example:

<album>
 <song-name type="published">Do Re Mi</song-name>
</album>

Currently, I am using Rails' hash-parsing library by requiring 'active_support/core_ext/hash'.

When I convert it to a hash, it drops the attributes. It returns:

{"album"=>{"song-name"=>"Do Re Mi"}}

How do I maintain those attributes, in this case, the type="published" attribute?

This seems to have been asked before in "How can I use XML attributes when converting into a hash with from_xml?", which had no conclusive answer. That was in 2010, though, and I'm curious whether things have changed since then. Alternatively, do you know of another way of parsing this XML that would keep the attribute information?

4 Solutions

#1

Converting XML to a hash isn't a good solution. You're left with a hash that is more difficult to parse than the original XML. Plus, if the XML is too big, you'll be left with a hash that won't fit into memory, and can't be processed, whereas the original XML could be parsed using a SAX parser.
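
To give an idea of what SAX-style parsing looks like, here is a minimal sketch. It assumes REXML's stream parser (which ships with Ruby) rather than Nokogiri, so it runs without extra gems; the SongListener class and its songs array are hypothetical names made up for this illustration. The listener receives one callback per tag or text node, so the whole document never has to be held in memory as a tree.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: collects the text and "type" attribute of each
# <song-name> element without building a DOM for the whole document.
class SongListener
  include REXML::StreamListener

  attr_reader :songs

  def initialize
    @songs = []
    @current = nil
  end

  # Called for every opening tag; attrs maps attribute names to values.
  def tag_start(name, attrs)
    @current = { type: attrs['type'], name: +'' } if name == 'song-name'
  end

  # Called for every text node; only text inside <song-name> is kept.
  def text(value)
    @current[:name] << value if @current
  end

  # Called for every closing tag; stores the finished song.
  def tag_end(name)
    return unless name == 'song-name'

    @songs << @current
    @current = nil
  end
end

xml = <<EOT
<xml>
  <album>
    <song-name type="published">Do Re Mi</song-name>
  </album>
  <album>
    <song-name type="unpublished">Blah blah blah</song-name>
  </album>
</xml>
EOT

listener = SongListener.new
REXML::Document.parse_stream(xml, listener)

listener.songs.each { |song| puts "#{song[:name]} (#{song[:type]})" }
```

For a really large file you would pass an open File object instead of a string, so the input is read as a stream too.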

Assuming the file isn't going to overwhelm your memory when loaded, I'd recommend using Nokogiri to parse it, doing something like:

require 'nokogiri'

class Album

  attr_reader :song_name, :song_type
  def initialize(song_name, song_type)
    @song_name = song_name
    @song_type = song_type
  end
end

xml = <<EOT
<xml>
  <album>
   <song-name type="published">Do Re Mi</song-name>
  </album>
  <album>
    <song-name type="unpublished">Blah blah blah</song-name>
  </album>
</xml>
EOT

albums = []
doc = Nokogiri::XML(xml)
doc.search('album').each do |album|
  song_name = album.at('song-name')
  albums << Album.new(
      song_name.text,
      song_name['type']
    )
end

puts albums.first.song_name
puts albums.last.song_type

Which outputs:

Do Re Mi
unpublished

The code starts by defining a suitable object to be used to hold the data you want. When the XML is parsed into a DOM, the code will loop through all the <album> nodes, and extract the information, defining an instance of the class, and appending it to the albums array.

After running, you'd have an array you could walk, processing each item: storing it in a database, or manipulating it however you want. If your goal is to insert that information into a database, though, you'd be smarter to let the DBM read the XML and import it directly.
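
If you do still want a hash that keeps the attributes, a small recursive walk over the parsed document is one way to get it. The following is only a sketch under a few assumptions: it uses REXML (bundled with Ruby) so it runs as-is, element_to_hash is a hypothetical helper, and the "@" prefix for attribute keys and the "__content__" key for text are arbitrary naming choices, not something any library mandates.

```ruby
require 'rexml/document'

# Hypothetical helper: turns an element into a hash, keeping attributes
# under "@name" keys and the element's text under "__content__".
def element_to_hash(element)
  hash = {}
  element.attributes.each { |name, value| hash["@#{name}"] = value }

  element.elements.each do |child|
    value = element_to_hash(child)
    if hash.key?(child.name)
      # Repeated child names are collected into an array.
      existing = hash[child.name]
      hash[child.name] = existing.is_a?(Array) ? existing << value : [existing, value]
    else
      hash[child.name] = value
    end
  end

  # Only the first text node is considered; whitespace-only text is dropped.
  text = element.text.to_s.strip
  hash['__content__'] = text unless text.empty?
  hash
end

xml = <<EOT
<album>
  <song-name type="published">Do Re Mi</song-name>
</album>
EOT

doc = REXML::Document.new(xml)
result = { doc.root.name => element_to_hash(doc.root) }

puts result['album']['song-name']['@type']
puts result['album']['song-name']['__content__']
```

This prints "published" followed by "Do Re Mi". The same memory caveat applies as before: the hash is at least as large as the document it came from.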

#2

The problem is in Active Support's XMLConverter class. Add the following code to any of your initializer files.

module ActiveSupport
    class XMLConverter
        private
            def become_content?(value)
                value['type'] == 'file' || (value['__content__'] && (value.keys.size == 1 && value['__content__'].present?))
            end
    end
end

It will give you output like the following.

Example input XML:

xml = '<album>
   <song-name type="published">Do Re Mi</song-name>
</album>'

Hash.from_xml(xml)

The output will be:

{"album"=>{"song_name"=>{"type"=>"published", "__content__"=>"Do Re Mi"}}}

#3

I actually think it's the garbage? method. It checks the type attribute, and if that isn't a hash it returns true, which makes become_hash? return false. That is the last check in the process_hash method, so it returns nil for the type attribute and won't build the hash for it.

For those interested, what I'm talking about is in the Active Support gem, in active_support/core_ext/hash/conversions.rb

module ActiveSupport
  class XMLConverter
    private

    def garbage?(value)
      false
    end
  end
end

I just defaulted it to false and it worked for me, but it might not work for everyone.

#4

As in the question you linked above, Nokogiri is the (short) answer.

If you can provide some sample code, someone might come up with better answers.
