Perhaps this is nitpicky, but I have to ask.
I'm using Nokogiri to parse XML, remove certain tags, and write over the original file with the results. Using .remove leaves blank lines in the XML. I'm currently using a regex to get rid of the blank lines. Is there some built-in Nokogiri method I should be using?
Here's what I have:
require 'nokogiri'
io_path = "/path/to/metadata.xml"
io = File.read(io_path)
document = Nokogiri::XML(io)
document.xpath('//artwork_files', '//tracks', '//previews').remove
# write to file and remove blank lines with a regular expression
File.open(io_path, 'w') do |x|
  x << document.to_s.gsub(/\n\s+\n/, "\n")
end
3 solutions
#1
7
There are no built-in methods, but we can add one:
class Nokogiri::XML::Document
  def remove_empty_lines!
    xpath("//text()").each { |text| text.content = text.content.gsub(/\n(\s*\n)+/, "\n") }
    self
  end
end
#2
1
Doing a substitution on each text node didn't work for me either. The problem is that after removing nodes, text nodes that just became adjacent don't get merged. When you loop over text nodes, each one has only a single newline, but there are now several of them in a row.
One rather messy solution I found was to reparse the document:
xml = Nokogiri::XML.parse xml.to_xml
Now adjacent text nodes will be merged and you can do regexes on them.
But this looks like it's probably a better option:
https://github.com/tobym/nokogiri-pretty
#3
1
This removed blank lines for me:
doc.xpath('//text()').find_all {|t| t.to_s.strip == ''}.map(&:remove)