How do I parse HTML source code with Ruby / Nokogiri?

Date: 2022-10-29 16:06:56

I've successfully used Ruby (1.8) and Nokogiri's CSS parsing to pull front-facing (visible) data out of web pages.

However I now need to pull out some data from a series of pages where the data is in the "meta" tags in the source code of the page.

One of the lines I need is the following:

<meta name="geo.position" content="35.667459;139.706256" />

I've tried using XPath but haven't been able to get it right.

Any help as to what syntax is needed would be much appreciated.

Thanks

2 answers

#1


2  

This is a good case for a CSS attribute selector. For example:

doc.css('meta[name="geo.position"]').each do |meta_tag|
  puts meta_tag['content'] # => 35.667459;139.706256
end

The equivalent XPath expression is almost identical:

doc.xpath('//meta[@name = "geo.position"]').each do |meta_tag|
  puts meta_tag['content'] # => 35.667459;139.706256
end

#2


1  

require 'nokogiri'

doc = Nokogiri::HTML('<meta name="geo.position" content="35.667459;139.706256" />')
doc.at('//meta[@name="geo.position"]')['content'] # => "35.667459;139.706256"
