Nokogiri (RubyGem): Find and replace HTML tags

Date: 2021-12-01 19:13:10

I have the following HTML:

<html>
<body>
<h1>Foo</h1>
<p>The quick brown fox.</p>
<h1>Bar</h1>
<p>Jumps over the lazy dog.</p>
</body>
</html>

...and, using the RubyGem Nokogiri (a replacement for Hpricot), I'd like to change it into the following HTML:

<html>
<body>
<p class="title">Foo</p>
<p>The quick brown fox.</p>
<p class="title">Bar</p>
<p>Jumps over the lazy dog.</p>
</body>
</html>

In other words: how can I find and replace certain HTML tags using Nokogiri? I know how to find them (using CSS selectors), but I don't know how to replace them in the parsed document.

Thanks for your help!

3 answers

#1

Try this:

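require 'nokogiri'

html_text = "<html><body><h1>Foo</h1><p>The quick brown fox.</p><h1>Bar</h1><p>Jumps over the lazy dog.</p></body></html>"

# Parse the markup into a full HTML document.
doc = Nokogiri::HTML(html_text)
# Rename each <h1> element to <p> in place and give it class="title".
doc.xpath("//h1").each { |heading| heading.name = "p"; heading.set_attribute("class", "title") }
puts doc.to_html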

#2

This seems to work:

require 'rubygems'
require 'nokogiri'

markup = Nokogiri::HTML.parse(<<-somehtml)
<html>
<body>
<h1>Foo</h1>
<p>The quick brown fox.</p>
<h1>Bar</h1>
<p>Jumps over the lazy dog.</p>
</body>
</html>
somehtml

# Rename each <h1> to <p> in place and tag it with class="title".
markup.css('h1').each do |el|
  el.name = 'p'
  el.set_attribute('class','title')
end

puts markup.to_html
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body>
# >> <p class="title">Foo</p>
# >> <p>The quick brown fox.</p>
# >> <p class="title">Bar</p>
# >> <p>Jumps over the lazy dog.</p>
# >> </body></html>

#3

#!/usr/bin/env ruby
require 'nokogiri'

doc = Nokogiri::HTML.parse <<-HERE
  <html>
    <body>
      <h1>Foo</h1>
      <p>The quick brown fox.</p>
      <h1>Bar</h1>
      <p>Jumps over the lazy dog.</p>
    </body>
  </html>
HERE

# Rename each <h1> to <p>; heading['class'] = ... is equivalent to set_attribute.
doc.search('h1').each do |heading|
  heading.name = 'p'
  heading['class'] = 'title'
end

puts doc.to_html