I want to get all the <a> tags that are children of <li>:
<div>
<li class="test">
<a>link1</a>
<ul>
<li>
<a>link2</a>
</li>
</ul>
</li>
</div>
I know how to find an element with a particular class, like this:
soup.find("li", { "class" : "test" })
But I don't know how to find all the <a> tags that are children of <li class="test"> but not any others. For example, I want to select only:
<a> link1 </a>
6 Solutions
#1
49
Try this:
li = soup.find('li', {'class': 'test'})
# findChildren() is a legacy alias of find_all(); it returns every descendant tag
children = li.findChildren()
for child in children:
    print(child)
#2
77
There's a very small section in the docs that shows how to find/find_all direct children:
http://www.crummy.com/software/BeautifulSoup/bs4/doc/#the-recursive-argument
In your case:
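# First direct <a> child of <li class="test">:
soup.find("li", {"class": "test"}).find("a", recursive=False)
# All direct <a> children of <li class="test">:
soup.find("li", {"class": "test"}).find_all("a", recursive=False)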
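To see what recursive=False changes, here is a small self-contained sketch. It assumes the sample markup from the question and Python's built-in html.parser; the variable names are illustrative only. The search is run on the <li class="test"> tag itself, because recursive=False limits the search to direct children of the tag it is called on.

```python
from bs4 import BeautifulSoup

# Sample markup from the question
html = """
<div>
<li class="test">
<a>link1</a>
<ul>
<li>
<a>link2</a>
</li>
</ul>
</li>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
li = soup.find("li", {"class": "test"})

# Default (recursive=True): every <a> descendant, including the nested one
all_links = [a.get_text(strip=True) for a in li.find_all("a")]

# recursive=False: only <a> tags that are direct children of li
direct_links = [a.get_text(strip=True) for a in li.find_all("a", recursive=False)]

print(all_links)     # ['link1', 'link2']
print(direct_links)  # ['link1']
```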
#3
11
Try this:
li = soup.find("li", {"class": "test"})
children = li.find_all("a")  # returns a list of all <a> descendants of li, nested ones included
Other reminders:
The find method returns only the first matching element. The find_all method returns all matching descendant elements, stored in a list.
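As a rough illustration of the difference between find and find_all, the sketch below uses a compact version of the question's markup and the built-in html.parser (both are assumptions made for the example). Note that find returns None when nothing matches, so chained calls on its result can fail.

```python
from bs4 import BeautifulSoup

# Compact version of the markup from the question
html = '<li class="test"><a>link1</a><ul><li><a>link2</a></li></ul></li>'
soup = BeautifulSoup(html, "html.parser")
li = soup.find("li", {"class": "test"})

first = li.find("a")       # a single Tag, or None when nothing matches
every = li.find_all("a")   # list-like result with all matching descendants
missing = li.find("span")  # there is no <span> here, so this is None

print(first.get_text())               # link1
print([a.get_text() for a in every])  # ['link1', 'link2']
print(missing)                        # None
```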
#4
8
Perhaps you want to do:
soup.find("li", { "class" : "test" }).find('a')
#5
4
Yet another method: create a filter function that returns True for all desired tags:
def my_filter(tag):
    # .get() avoids a KeyError when the parent <li> has no class attribute
    return (tag.name == 'a' and
            tag.parent.name == 'li' and
            'test' in tag.parent.get('class', []))
Then just call find_all with the filter function as the argument:
for a in soup(my_filter):  # or soup.find_all(my_filter)
    print(a)
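A runnable sketch of the filter-function approach follows. It assumes markup with a third link added, parsed with html.parser, and the function name is_direct_test_link is made up for this example. It reads the parent's class with .get() so that a nested <li> without a class attribute does not raise a KeyError.

```python
from bs4 import BeautifulSoup

# Sample markup with an extra <a>link3</a> as a second direct child
html = """
<div>
<li class="test">
<a>link1</a>
<ul>
<li>
<a>link2</a>
</li>
</ul>
<a>link3</a>
</li>
</div>
"""

def is_direct_test_link(tag):
    """Hypothetical filter: True for <a> tags whose parent is <li class="test">."""
    parent = tag.parent
    return (tag.name == "a"
            and parent is not None
            and parent.name == "li"
            and "test" in parent.get("class", []))

soup = BeautifulSoup(html, "html.parser")
matches = [a.get_text(strip=True) for a in soup.find_all(is_direct_test_link)]
print(matches)  # ['link1', 'link3']
```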
#6
1
"How to find all <a> which are children of <li class=test> but not any others?"
Given the HTML below (I added another <a> to show the difference between select and select_one):
<div>
<li class="test">
<a>link1</a>
<ul>
<li>
<a>link2</a>
</li>
</ul>
<a>link3</a>
</li>
</div>
The solution is to use the child combinator (>), which is placed between two CSS selectors:
>>> soup.select('li.test > a')
[<a>link1</a>, <a>link3</a>]
If you want only the first matching child:
>>> soup.select_one('li.test > a')
<a>link1</a>
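For completeness, here is a self-contained sketch of the selector approach. It assumes the HTML above, the built-in html.parser, and a Beautiful Soup version recent enough to support CSS selectors through select and select_one. It also contrasts the child combinator with the descendant combinator (a space), which matches the nested link too.

```python
from bs4 import BeautifulSoup

html = """
<div>
<li class="test">
<a>link1</a>
<ul>
<li>
<a>link2</a>
</li>
</ul>
<a>link3</a>
</li>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Child combinator: only <a> tags that are direct children of li.test
children = [a.get_text() for a in soup.select("li.test > a")]

# Descendant combinator (a space): also matches the nested <a>link2</a>
descendants = [a.get_text() for a in soup.select("li.test a")]

# select_one returns only the first match (or None if nothing matches)
first = soup.select_one("li.test > a")

print(children)          # ['link1', 'link3']
print(descendants)       # ['link1', 'link2', 'link3']
print(first.get_text())  # link1
```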