I'm guessing this is a trivial question for someone with a bit of experience with Nokogiri, but I haven't been able to find an answer in the documentation or tutorials I've found online.
I have a Nokogiri document like this:
page = Nokogiri::HTML(open("http://www.example.com"))
And the page contains the following tag:
<a title="could be anything" href="http://www.example.com/foo"></a>
How do I get the value of href if the value of title is unknown?
3 solutions
#1
2
If you want the value of the href attribute for <a> elements that have a title attribute, you can use Nokogiri's xpath as follows:
require 'nokogiri'
doc = Nokogiri::HTML(File.open('sample.html'))
# Collect the href of every <a> element that has a title attribute
hrefs = doc.xpath('//a[@title]').map { |e| e['href'] }
puts hrefs
If you want to select from a URL online, you can use:
require 'nokogiri'
require 'open-uri'
# URI.open is required on Ruby 3.0+; a bare open no longer accepts URLs
doc = Nokogiri::HTML(URI.open('http://*.com/'))
hrefs = doc.xpath('//a[@title]').map { |e| e['href'] }
puts hrefs
#2
1
I finally figured it out. I believe the following will select the href from the first link element that has a title attribute: page.css('a[title]')[0]['href'].
I had thought page.css('a[title]') was selecting the value of the title attribute, but in fact it selects the entire element. You can then index into that element to read any of its attributes.
#3
0
require 'nokogiri'
doc = Nokogiri::HTML::DocumentFragment.parse <<-SCRIPT
<a title="xx" href="http://www.example1.com/foo1"></a>
<a title="aa" href="http://www.example2.com/foo2"></a>
<a id=5 href="http://www.foo.com/foo3"></a>
<a title="zz" href="http://www.example3.com/foo4"></a>
<a id=5 href="http://www.test.com/foo5"></a>
SCRIPT
p doc.search("a").map { |nd| nd['href'] if nd.key?('title')}.compact
#=> ["http://www.example1.com/foo1", "http://www.example2.com/foo2", "http://www.example3.com/foo4"]