I have a large XML document I am looking to parse. In this document, many tags have different attributes within them. For example:
<album>
<song-name type="published">Do Re Mi</song-name>
</album>
Currently, I am using Rails' hash-parsing library by requiring 'active_support/core_ext/hash'.
When I convert it to a hash, it drops the attributes. It returns:
{"album"=>{"song-name"=>"Do Re Mi"}}
How do I maintain those attributes, in this case the type="published" attribute?
This seems to have been asked previously in "How can I use XML attributes when converting into a hash with from_xml?", which had no conclusive answer, but that was from 2010, and I'm curious whether things have changed since then. Alternatively, do you know of another way of parsing this XML that would still keep the attribute information?
4 Solutions
#1
4
Converting XML to a hash isn't a good solution. You're left with a hash that is more difficult to parse than the original XML. Plus, if the XML is too big, you'll be left with a hash that won't fit into memory, and can't be processed, whereas the original XML could be parsed using a SAX parser.
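To make the SAX point concrete, here is a minimal sketch of event-based parsing. It uses REXML's stream listener, which ships with Ruby, rather than Nokogiri; the SongListener class and its songs array are hypothetical names made up for this illustration.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: records the text and type of every <song-name>.
class SongListener
  include REXML::StreamListener

  attr_reader :songs

  def initialize
    @songs = []
    @current = nil
  end

  # Called for each opening tag; attrs is a Hash of attribute names to values.
  def tag_start(name, attrs)
    @current = { type: attrs['type'], name: String.new } if name == 'song-name'
  end

  # Called for text nodes; text outside <song-name> is ignored.
  def text(data)
    @current[:name] << data if @current
  end

  # Called for each closing tag; stores the finished song.
  def tag_end(name)
    return unless name == 'song-name'

    @songs << @current
    @current = nil
  end
end

xml = <<EOT
<xml>
  <album>
    <song-name type="published">Do Re Mi</song-name>
  </album>
  <album>
    <song-name type="unpublished">Blah blah blah</song-name>
  </album>
</xml>
EOT

listener = SongListener.new
REXML::Document.parse_stream(xml, listener)
listener.songs.each { |song| puts "#{song[:name]} (#{song[:type]})" }
```

Nothing here builds a full tree, so memory use depends on what the listener keeps, not on the size of the document. For a large file you would likely pass an open File instead of a String, and write each song out as it completes instead of collecting them all.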
Assuming the file isn't going to overwhelm your memory when loaded, I'd recommend using Nokogiri to parse it, doing something like:
require 'nokogiri'

# Simple value object holding the data extracted from each <album>.
class Album
  attr_reader :song_name, :song_type

  def initialize(song_name, song_type)
    @song_name = song_name
    @song_type = song_type
  end
end

xml = <<EOT
<xml>
  <album>
    <song-name type="published">Do Re Mi</song-name>
  </album>
  <album>
    <song-name type="unpublished">Blah blah blah</song-name>
  </album>
</xml>
EOT

albums = []
doc = Nokogiri::XML(xml)
doc.search('album').each do |album|
  song_name = album.at('song-name')
  albums << Album.new(
    song_name.text,    # element text
    song_name['type']  # value of the type attribute
  )
end

puts albums.first.song_name
puts albums.last.song_type
Which outputs:
Do Re Mi
unpublished
The code starts by defining a suitable object to hold the data you want. Once the XML is parsed into a DOM, the code loops through all the <album> nodes, extracts the information, creates an instance of the class, and appends it to the albums array.
After running, you'd have an array you could walk, processing each item, storing it into a database, or manipulating it however you want. Though, if your goal is to insert that information into a database, you'd be smarter to let the DBMS read the XML and import it directly.
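If the rest of your code still expects a hash, the same kind of DOM walk can build one that keeps the attributes. The following is only a sketch: it uses REXML from Ruby's standard distribution instead of Nokogiri, element_to_hash is a hypothetical helper, and the "@name" and "#text" key conventions are assumptions chosen for this example, not anything Rails or Nokogiri defines.

```ruby
require 'rexml/document'

# Hypothetical helper: converts an element into a Hash that keeps attributes.
# Attributes go under "@<name>" keys and the element's own text under "#text".
def element_to_hash(element)
  hash = {}
  element.attributes.each { |name, value| hash["@#{name}"] = value }

  element.elements.each do |child|
    value = element_to_hash(child)
    if hash.key?(child.name)
      # Repeated child names are collected into an Array.
      existing = hash[child.name]
      hash[child.name] = existing.is_a?(Array) ? existing << value : [existing, value]
    else
      hash[child.name] = value
    end
  end

  # Only the first text node is considered, which is enough for simple documents.
  text = element.text.to_s.strip
  hash['#text'] = text unless text.empty?
  hash
end

xml = '<album><song-name type="published">Do Re Mi</song-name></album>'
doc = REXML::Document.new(xml)
result = { doc.root.name => element_to_hash(doc.root) }
# result == {"album"=>{"song-name"=>{"@type"=>"published", "#text"=>"Do Re Mi"}}}
```

The whole document is still loaded into memory first, so the earlier caveat about very large files applies here too.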
#2
2
The problem is with Active Support's XMLConverter class. Please add the following code to any of your initializer files.
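module ActiveSupport
  class XMLConverter
    private

    # Collapse an element to its plain content only when "__content__" is its
    # sole key, so elements that carry attributes stay as hashes.
    def become_content?(value)
      value['type'] == 'file' || (value['__content__'] && (value.keys.size == 1 && value['__content__'].present?))
    end
  end
end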
It will give you output like the following.
Example input XML:
require 'active_support/core_ext/hash'

xml = '<album>
<song-name type="published">Do Re Mi</song-name>
</album>'
Hash.from_xml(xml)
The output will be:
{"album"=>{"song_name"=>{"type"=>"published", "__content__"=>"Do Re Mi"}}}
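With output shaped like this, code that reads the hash has to cope with two forms: a Hash with "__content__" for elements that carry attributes, and (assuming attribute-less elements still collapse to plain strings, as in the question's output) a String otherwise. A small sketch of helpers that hide the difference; node_text and node_attr are hypothetical names, not part of Active Support.

```ruby
# Hypothetical helpers for reading values from a hash shaped like the output
# above. A value is either a plain String or a Hash with a "__content__" key.
def node_text(value)
  value.is_a?(Hash) ? value['__content__'] : value
end

# Returns the named attribute, or nil when the value has no attributes.
def node_attr(value, name)
  value.is_a?(Hash) ? value[name] : nil
end

parsed = { 'album' => { 'song_name' => { 'type' => 'published', '__content__' => 'Do Re Mi' } } }
song = parsed['album']['song_name']

puts node_text(song)          # prints "Do Re Mi"
puts node_attr(song, 'type')  # prints "published"
```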
#3
0
I actually think it's the garbage? method. It checks the type attribute, and if that isn't a hash it returns true, which makes become_hash? return false. That is the last check in the process_hash method, so it returns nil for the type attribute and won't build the hash for it.
For those interested, what I'm talking about is in the Active Support gem, in active_support/core_ext/hash/conversions.rb:
module ActiveSupport
  class XMLConverter
    private

    def garbage?(value)
      false
    end
  end
end
I just defaulted it to false and it worked for me, but it might not work for everyone.
#4
-2
As in the question you linked above, Nokogiri is the (short) answer.
If you can provide some sample code, someone might come up with better answers.