Parsing HTML with Nokogiri in Ruby

Time: 2021-12-12 01:23:12

With this HTML code:

<div class="one">
  .....
</div>
<div class="one">
  .....
</div>
<div class="one">
  .....
</div>
<div class="one">
  .....
</div>

How can I use Nokogiri to select the second or third div whose class is one?

2 solutions

#1


5  

page.css('div.one')[1] # For the second
page.css('div.one')[2] # For the third

#2


7  

You can use Ruby to pare down a large result set to specific items:

page.css('div.one')[1,2]  # Two items starting at index 1 (2nd item)
page.css('div.one')[1..2] # Items with indices between 1 and 2, inclusive

Because Ruby indexing starts at zero, take care with which items you ask for: index 1 is the second match and index 2 is the third.

Alternatively, you can use CSS selectors to find the nth item:

# Second and third items from the set, jQuery-style
page.css('div.one:eq(2),div.one:eq(3)')

# Second and third children, CSS3-style
page.css('div.one:nth-child(2),div.one:nth-child(3)')

Or you can use XPath to get back specific matches:

# Second and third div.one children of each parent
page.xpath("//div[@class='one'][position()=2 or position()=3]")

# Second and third items in the result set
page.xpath("(//div[@class='one'])[position()=2 or position()=3]")

With both the CSS and XPath alternatives, note that:

  1. Numbering starts at 1, not 0.

  2. You can use at_css and at_xpath to get back the first matching element instead of a NodeSet.

    # A NodeSet with a single element in it:
    page.css('div.one:eq(2)')
    
    # The second div element
    page.at_css('div.one:eq(2)')
    

Finally, note that if you are selecting a single element by index with XPath, you can use a shorter format:

# First div.one found that is the second div.one child of its parent
page.at_xpath('//div[@class="one"][2]')

# Second div.one in the entire document
page.at_xpath('(//div[@class="one"])[2]')
