Suppose I have an HTML tree like this:
div
`- ul
   `- li (*)
   `- li (*)
   `- li (*)
   `- li (*)
   `- ul
      `- li
      `- li
      `- li
How do I select the <li> elements that are marked with (*)? They are direct descendants of the first <ul> element.
Here is how I find the first <ul> element:
my $ul = $div->look_down(_tag => 'ul');
Now I have $ul, but when I do something like:
my @li_elements = $ul->look_down(_tag => 'li');
It also finds <li> elements that are buried deeper in the HTML tree.
How do I find just the <li> elements that are direct descendants of the first <ul> element? There is an unknown number of them, so I can't just select the first four as in the example.
3 solutions
#1
You can get all the children of an HTML::Element object using the content_list method, so all the child nodes of the first <ul> element in the document would be:
use HTML::TreeBuilder;
my $tree = HTML::TreeBuilder->new_from_file('my.html');
my @items = $tree->look_down(_tag => 'ul')->content_list;
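A minimal sketch to see the difference between look_down and content_list without needing my.html. It assumes only HTML::Element (from the same HTML-Tree distribution as HTML::TreeBuilder) and builds a tree shaped like the one in the question with new_from_lol; the item texts are made up. Note that content_list also returns the nested <ul>, because it is a direct child too.

```perl
use strict;
use warnings;
use HTML::Element;

# In-memory stand-in for the question's tree (hypothetical texts):
# four <li> directly under the first <ul>, then a nested <ul> with three more.
my $div = HTML::Element->new_from_lol(
    ['div',
        ['ul',
            ['li', 'one'], ['li', 'two'], ['li', 'three'], ['li', 'four'],
            ['ul', ['li', 'a'], ['li', 'b'], ['li', 'c']],
        ],
    ]
);

# Scalar context: the first <ul> in document order (the outer one).
my $ul = $div->look_down(_tag => 'ul');

# look_down walks the whole subtree, so the nested <li> are included.
my @all_li = $ul->look_down(_tag => 'li');

# content_list returns only the immediate children of $ul.
my @children = $ul->content_list;
my @tags     = map { ref $_ ? $_->tag : '(text)' } @children;

print scalar(@all_li), " from look_down; children: @tags\n";
# prints "7 from look_down; children: li li li li ul"

$div->delete;    # free the tree (elements hold parent/child references)
```

If only the <li> items are wanted, the children still need filtering by tag.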
But it is far more expressive to use HTML::TreeBuilder::XPath, which lets you find all <li> children of <ul> children of <div> elements anywhere in the document, like this:
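use HTML::TreeBuilder::XPath;
# Build the tree through the ::XPath subclass so it has findnodes()
my $tree = HTML::TreeBuilder::XPath->new_from_file('my.html');
# Every <li> that is a child of a <ul> that is itself a child of a <div>
my @items = $tree->findnodes('//div/ul/li')->get_nodelist;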
#2
If you want to use the look_down method, you can add an extra criterion to get only the direct children:
my @li_elements = $ul->look_down(_tag => 'li', sub {$_[0]->parent() == $ul});
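A small self-contained check of the extra criterion, assuming the same in-memory stand-in for the question's tree (built with HTML::Element's new_from_lol; the texts are hypothetical). look_down applies its criteria in order, so the code ref only runs for elements that already passed the _tag test.

```perl
use strict;
use warnings;
use HTML::Element;

# Four direct <li> under the first <ul>, plus a nested <ul> of three more.
my $div = HTML::Element->new_from_lol(
    ['div',
        ['ul',
            ['li', 'one'], ['li', 'two'], ['li', 'three'], ['li', 'four'],
            ['ul', ['li', 'a'], ['li', 'b'], ['li', 'c']],
        ],
    ]
);
my $ul = $div->look_down(_tag => 'ul');

# The code ref receives the candidate element as $_[0]; == compares the
# two object references, so only <li> whose parent is exactly $ul match.
my @li_elements = $ul->look_down(_tag => 'li', sub { $_[0]->parent == $ul });

my @texts = map { $_->as_text } @li_elements;
print "@texts\n";    # one two three four

$div->delete;
```

This still traverses the whole subtree and just rejects the deeper items; for large trees, filtering content_list avoids that walk.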
#3
To make this page complete, I'll add one more option:
my @li = grep { ref $_ && $_->tag() eq 'li' } $ul->content_list;  # ref skips text nodes
(where $ul is your top-level <ul> element)
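Why the ref check matters: content_list returns text segments as plain Perl strings, not objects, so calling ->tag on one dies. A minimal sketch, assuming a hand-built <ul> (hypothetical content) assembled with HTML::Element's push_content, which mixes element children with a bare text segment:

```perl
use strict;
use warnings;
use HTML::Element;

# Children: <li>, a text segment, <li>, and a nested <ul>.
my $ul = HTML::Element->new('ul');
$ul->push_content(
    ['li', 'one'],
    'stray text',
    ['li', 'two'],
    ['ul', ['li', 'nested']],
);

# ref() is false for the text segment, so ->tag is never called on it;
# the tag test then drops the nested <ul>.
my @li = grep { ref $_ && $_->tag eq 'li' } $ul->content_list;

print scalar(@li), " direct <li> children\n";    # 2 direct <li> children

$ul->delete;
```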