Beautiful Soup Usage (Part 5): Using select

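select does the same job as find and find_all: it picks out specific tags. Its matching rules come from CSS, which is why it is called a CSS selector. If you have used jQuery before, you will notice that select's selector syntax looks a lot like jQuery's.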

Searching by tag name

To filter by tag name, write the bare tag name with no prefix, as follows:

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="/lacie" class="sister" id="link2">Lacie</a> and
<a href="/tillie" class="sister">Tillie</a>;
and they lived at the bottom of a well.</p>
</body>
</html>
"""

soup = BeautifulSoup(html, "lxml")
# A bare tag name selects every <p> tag in the document
print(soup.select('p'))

The output is:

[<p class="title" name="dromouse"><b>The Dormouse's story</b></p>, <p class="story">Once upon a time there were three little sisters; and their names were\n<a class="sister" href="/lacie" id="link2">Lacie</a> and\n<a class="sister" href="/tillie">Tillie</a>;\nand they lived at the bottom of a well.</p>]

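The output shows that select returns a list. What are the elements of that list?

print(type(soup.select('p')[0]))

The result is:

<class 'bs4.element.Tag'>

So each element is a Tag object, just as with find_all: select('p') returns every tag whose name is p.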
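A minimal sketch comparing select('p') with find_all('p'), assuming a recent bs4 (4.7 or later, where select() is backed by the soupsieve package). It uses Python's built-in html.parser instead of lxml so it runs without installing lxml; the HTML is a trimmed copy of the sample above, with id="link2" on the Lacie link.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Trimmed sample document (assumed id="link2" on the Lacie link).
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="/lacie" class="sister" id="link2">Lacie</a> and
<a href="/tillie" class="sister">Tillie</a>;
and they lived at the bottom of a well.</p>
</body>
</html>
"""

# html.parser ships with Python, so no extra parser is needed.
soup = BeautifulSoup(html, "html.parser")

by_css = soup.select("p")      # CSS selector: every <p> tag
by_find = soup.find_all("p")   # the equivalent find_all call

print(len(by_css))                                # 2
print(all(isinstance(t, Tag) for t in by_css))    # True: items are Tag objects
print(by_css == by_find)                          # True: same tags, same order
```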

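Searching by class name and id

To filter by class, put a dot before the class name; to filter by id, put # before the id:

print(soup.select('.title'))
print(soup.select('#link2'))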

The output is:

[<p class="title" name="dromouse"><b>The Dormouse's story</b></p>]
[<a class="sister" href="/lacie" id="link2">Lacie</a>]
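A short sketch of class and id selectors, assuming the same trimmed sample (Lacie carries id="link2") and the built-in html.parser. It also shows that a class selector can match several tags, while an id is expected to be unique.

```python
from bs4 import BeautifulSoup

html = """
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">
<a href="/lacie" class="sister" id="link2">Lacie</a>
<a href="/tillie" class="sister">Tillie</a>
</p>
"""
soup = BeautifulSoup(html, "html.parser")

titles = soup.select(".title")    # class="title": leading dot
link2 = soup.select("#link2")     # id="link2": leading #
sisters = soup.select(".sister")  # one class selector, two matching tags

print([t.name for t in titles])         # ['p']
print([a.get_text() for a in link2])    # ['Lacie']
print([a.get_text() for a in sisters])  # ['Lacie', 'Tillie']
```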

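Searching by attribute

Can you filter on an attribute that is neither an id nor a class? Yes: write the attribute condition inside square brackets:

print(soup.select('[href="/lacie"]'))

This selects the tags whose href is /lacie.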
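A minimal sketch of square-bracket attribute selectors, using the built-in html.parser. The third <a> tag without an href is a hypothetical addition, there only to show that [href] tests for the attribute's presence while [href="..."] tests its exact value; [name="dromouse"] matches an attribute that is neither id nor class.

```python
from bs4 import BeautifulSoup

html = """
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">
<a href="/lacie" class="sister" id="link2">Lacie</a>
<a href="/tillie" class="sister">Tillie</a>
<a class="sister">No link</a>
</p>
"""
soup = BeautifulSoup(html, "html.parser")

exact = soup.select('[href="/lacie"]')     # attribute equals a given value
has_href = soup.select("a[href]")          # <a> tags that have any href at all
named = soup.select('[name="dromouse"]')   # any attribute works, e.g. name

print([a.get_text() for a in exact])       # ['Lacie']
print([a.get_text() for a in has_href])    # ['Lacie', 'Tillie']
print([t.get_text() for t in named])       # ["The Dormouse's story"]
```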

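Combined searches

There are two kinds of combined search: applying several conditions to a single tag, and searching down the tree one level at a time.

The first kind looks like this:

print(soup.select('a#link2'))

This selects tags whose name is a and whose id is link2.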

The output is:

[<a class="sister" href="/lacie" id="link2">Lacie</a>]
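A small sketch of stacking conditions on one tag, assuming the same trimmed sample and the built-in html.parser. Conditions written back to back with no space must all hold for the same tag, so changing the tag name to p yields no match.

```python
from bs4 import BeautifulSoup

html = """
<p class="story">
<a href="/lacie" class="sister" id="link2">Lacie</a>
<a href="/tillie" class="sister">Tillie</a>
</p>
"""
soup = BeautifulSoup(html, "html.parser")

a_link2 = soup.select("a#link2")                     # tag name AND id
p_link2 = soup.select("p#link2")                     # no <p> has id link2
tillie = soup.select('a.sister[href="/tillie"]')     # tag + class + attribute

print(len(a_link2))                        # 1
print(len(p_link2))                        # 0
print([a.get_text() for a in tillie])      # ['Tillie']
```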

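The second kind looks like this:

Starting from body, find every p inside body, then among those p tags find the a tags whose id is link2. This layer-by-layer search down the tree is very common when analyzing HTML structure. Levels are separated by spaces, and a space matches descendants at any depth, not only direct children.

print(soup.select('body p a#link2'))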

The result is:

[<a class="sister" href="/lacie" id="link2">Lacie</a>]
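A minimal sketch of tree searches, using the built-in html.parser on a trimmed sample. To make the meaning of the space concrete, it contrasts it with the standard CSS child combinator ">", which only matches direct children; the <a> tags here sit inside <p>, not directly inside <body>.

```python
from bs4 import BeautifulSoup

html = """
<html><body>
<p class="story">
<a href="/lacie" class="sister" id="link2">Lacie</a>
<a href="/tillie" class="sister">Tillie</a>
</p>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

# A space means "descendant at any depth".
body_a = soup.select("body a")              # both <a> tags
nested = soup.select("body p a#link2")      # only Lacie
# ">" means "direct child only".
body_child_a = soup.select("body > a")      # none: <a> is a child of <p>
p_child_a = soup.select("p > a")            # both <a> tags

print(len(body_a), len(nested), len(body_child_a), len(p_child_a))  # 2 1 0 2
```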
