Python web crawling: extracting news headlines and links

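Python has become a popular programming language in recent years. I recently learned Python and looked into web crawling; the simple examples below show how to fetch information from a web page.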

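Step 1: Set up a Python development environment. You can download Python 3.5 or Anaconda3. If you install plain Python rather than Anaconda, you will also need the jupyter, requests and beautifulsoup4 packages (for example via pip).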

Step 2: Type jupyter notebook on the command line to open Jupyter.

Step 3: Write the following code:

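import requests
from bs4 import BeautifulSoup

# Replace this placeholder with the URL string of the page you want to fetch
url = 'PUT_THE_PAGE_URL_HERE'
res = requests.get(url)
res.encoding = 'utf-8'  # decode the response body as UTF-8
soup = BeautifulSoup(res.text, 'html.parser')
# Each headline block is expected to carry the CSS class "news-item"
for news in soup.select('.news-item'):
    if len(news.select('h2')) > 0:  # skip blocks that have no <h2> headline
        h2 = news.select('h2')[0].text
        a = news.select('a')[0]['href']
        print(h2, a)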
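
To try the selector logic without hitting a live site, the same loop can be run against an inline HTML string. This is only a minimal sketch: the markup in SAMPLE_HTML, the example.com links and the helper name extract_headlines are made up for illustration, and a real news page will be structured differently. It also skips blocks that lack an <a> tag, which the loop above would fail on with an IndexError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for illustration; a real page will differ.
SAMPLE_HTML = """
<div class="news-item"><h2><a href="http://example.com/a.html">First headline</a></h2></div>
<div class="news-item"></div>
<div class="news-item"><h2><a href="http://example.com/b.html">Second headline</a></h2></div>
"""


def extract_headlines(html):
    """Return (title, link) pairs for each .news-item that has an <h2> and an <a>."""
    soup = BeautifulSoup(html, 'html.parser')
    results = []
    for item in soup.select('.news-item'):
        h2 = item.select_one('h2')    # None when the block has no <h2>
        link = item.select_one('a')   # None when the block has no <a>
        if h2 is None or link is None:
            continue                  # skip incomplete blocks instead of raising
        results.append((h2.text.strip(), link.get('href')))
    return results


for title, href in extract_headlines(SAMPLE_HTML):
    print(title, href)
```

Returning a list rather than printing inside the loop makes it easier to save or filter the results later.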

Extracting the title and body text:

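import requests
from bs4 import BeautifulSoup

# A Sina News article is used as the example
res = requests.get('http://news.sina.com.cn/c/nd/2017-09-04/doc-ifykqmrv9167659.shtml')
res.encoding = 'utf-8'  # decode the page as UTF-8 so Chinese text displays correctly
# print(res.text)
soup = BeautifulSoup(res.text, 'html.parser')
vname = soup.select('#artibodyTitle')  # id of the element that holds the news title
print(vname[0].text)
Text = soup.select('#artibody')  # id of the element that holds the article body
p = Text[0].select('p')  # collect the <p> tags inside the body
for i in p:  # print the text of each paragraph
    print(i.text)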
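
The title and body extraction can also be wrapped in a small function and checked offline. The following is a sketch under stated assumptions: SAMPLE_ARTICLE is invented markup that merely reuses the #artibodyTitle and #artibody ids from the code above, and extract_article is a hypothetical helper name. Unlike indexing with [0], it returns None or an empty list when an element is missing, which may be useful if a page's layout changes.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that reuses the two ids from the example above.
SAMPLE_ARTICLE = """
<h1 id="artibodyTitle">Sample headline</h1>
<div id="artibody">
  <p>First paragraph.</p>
  <p>   </p>
  <p>Second paragraph.</p>
</div>
"""


def extract_article(html):
    """Return (title, paragraphs); title is None if the title element is absent."""
    soup = BeautifulSoup(html, 'html.parser')
    title_tag = soup.select_one('#artibodyTitle')
    body_tag = soup.select_one('#artibody')
    title = title_tag.text.strip() if title_tag is not None else None
    paragraphs = []
    if body_tag is not None:
        for p in body_tag.select('p'):
            text = p.text.strip()
            if text:                  # drop paragraphs that are only whitespace
                paragraphs.append(text)
    return title, paragraphs


title, paragraphs = extract_article(SAMPLE_ARTICLE)
print(title)
for text in paragraphs:
    print(text)
```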
