4.Python爬虫小案例

1.网络爬虫定义：按照一定的规则，自动的抓取网站信息的程序或者脚本。

2.流程：request打开url得到html文档==浏览器打开源码分析元素节点==通过BeautifulSoup得到想要的数据==存储到指定路径

3.代码如下：

from urllib import request

from bs4 import BeautifulSoup

url = "https://www.jianshu.com/"

headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.75 Safari/537.36'}

page = request.Request(url,headers=headers)

pageinfo = request.urlopen(page).read().decode('UTF-8')  #打开url,获取数据返回response对象

soup = BeautifulSoup(pageinfo,'html.parser')  #将获取到的内容转换为BeautifulSoup格式，并将html.parser作为解析器

titles = soup.find_all('a','title')  #查找所有a标签中class='title'的语句

with open(r"E:\python.txt",'w') as file:   #open()是读写文件的函数，with会自动close（）已打开的文件

    for title in titles:

        file.write(title.string)

秒客网

4.Python爬虫小案例

相关文章