How do I find direct descendants with HTML::TreeBuilder?

Time: 2022-10-29 19:56:30

Suppose I have an HTML tree like this:

div
`- ul
   `- li          (*)
   `- li          (*)
   `- li          (*)
   `- li          (*)
      `- ul
         `- li
         `- li
         `- li

How do I select the <li> elements that are marked with (*)? They are direct descendants of the first <ul> element.

Here is how I find the first <ul> element:

    my $ul = $div->look_down(_tag => 'ul');
    

Now I have the $ul, but when I do something like:

    my @li_elements = $ul->look_down(_tag => 'li');
    

it also finds <li> elements that are nested deeper in the HTML tree.

How do I find just the <li> elements that are direct descendants of the first <ul> element? There is an unknown number of them, so I can't simply take the first four as in the example.

3 solutions

#1 (score: 8)

You can get all the children of an HTML::Element object with the content_list method (text children come back as plain strings rather than objects), so the child nodes of the first <ul> element in the document would be:

    use HTML::TreeBuilder;
    
    my $tree = HTML::TreeBuilder->new_from_file('my.html');
    
    my @items = $tree->look_down(_tag => 'ul')->content_list;
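
To make the behaviour of content_list concrete, here is a minimal, self-contained sketch. The inline markup and the variable names are assumptions chosen to mirror the tree in the question; they are not taken from the asker's real file.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical markup mirroring the tree in the question.
my $html = <<'HTML';
<div>
  <ul>
    <li>A</li>
    <li>B</li>
    <li>C</li>
    <li>D
      <ul>
        <li>D1</li>
        <li>D2</li>
        <li>D3</li>
      </ul>
    </li>
  </ul>
</div>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# In scalar context look_down returns the first match in document order,
# which here is the outer <ul>.
my $ul = $tree->look_down(_tag => 'ul');

# content_list returns only the immediate children. Entries can be
# HTML::Element objects or plain text strings, so keep the references.
my @children = grep { ref } $ul->content_list;
my @tags     = map  { $_->tag } @children;

print scalar(@children), " direct children: @tags\n";
```

Because content_list never descends, the <li> elements inside the nested <ul> are not part of the result.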
    

But it is far more expressive to use HTML::TreeBuilder::XPath, which lets you find all <li> children of <ul> children of <div> elements anywhere in the document, like this:

    use HTML::TreeBuilder::XPath;
    
    my $tree = HTML::TreeBuilder::XPath->new_from_file('my.html');
    
    my @items = $tree->findnodes('//div/ul/li')->get_nodelist;
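
The difference between the child step (/) and the descendant step (//) is what makes the XPath version select only direct children. The following sketch illustrates this on a small inline document; the markup is an assumption made up for the example.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical markup: three top-level items, the last one holding a nested list.
my $html = '<div><ul><li>A</li><li>B</li>'
         . '<li>C<ul><li>C1</li><li>C2</li></ul></li></ul></div>';

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# '//li' matches <li> elements at any depth.
my @all = $tree->findnodes('//li');

# '//div/ul/li' matches only <li> elements whose parent is a <ul>
# that is itself a child of a <div>.
my @direct = $tree->findnodes('//div/ul/li');

printf "%d <li> at any depth, %d directly under div/ul\n",
    scalar @all, scalar @direct;
```

In list context findnodes returns the matching nodes directly, so this sketch does not need get_nodelist.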
    

#2 (score: 5)

If you want to use the look_down method, you can add an extra criterion so that only the children are returned:

    my @li_elements = $ul->look_down(_tag => 'li', sub {$_[0]->parent() == $ul});
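
As a rough check of the parent criterion, the sketch below runs look_down with and without it on an inline copy of the question's tree. The markup is an assumption written for this example.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical markup: four top-level items, the last with three nested items.
my $html = <<'HTML';
<div>
  <ul>
    <li>one</li>
    <li>two</li>
    <li>three</li>
    <li>four
      <ul>
        <li>four-a</li>
        <li>four-b</li>
        <li>four-c</li>
      </ul>
    </li>
  </ul>
</div>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);
my $ul   = $tree->look_down(_tag => 'ul');   # outer <ul>

# Without the extra criterion, look_down descends through the whole subtree.
my @all = $ul->look_down(_tag => 'li');

# The code-ref criterion keeps an element only when its parent is $ul itself.
# Comparing the two references with == tests for object identity.
my @direct = $ul->look_down(_tag => 'li', sub { $_[0]->parent == $ul });

printf "all: %d, direct: %d\n", scalar @all, scalar @direct;
```

Note that look_down still visits every descendant and only filters the results, so on very large trees content_list may do less work.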
    

#3 (score: 0)

For completeness, here is one more option:

    # content_list can also return plain text strings, so check ref() before calling tag
    my @li = grep { ref($_) && $_->tag eq 'li' } $ul->content_list;
    

(where $ul is your top-level <ul> element)
