To fetch a single table, use the following code:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import csv
import sys
from urllib.error import HTTPError
from urllib.request import urlopen

from bs4 import BeautifulSoup

try:
    html = urlopen("http://en.wikipedia.org/wiki/Comparison_of_text_editors")
except HTTPError as e:
    # Without a response there is nothing to parse, so stop here.
    print("not found")
    sys.exit(1)

bsObj = BeautifulSoup(html, "html.parser")
# find() returns None when nothing matches; findAll(...)[0] would raise IndexError instead.
table = bsObj.find("table", {"class": "wikitable"})
if table is None:
    print("no table")
    sys.exit(1)

rows = table.findAll("tr")
# newline="" lets the csv module control line endings itself.
csvFile = open("editors.csv", "wt", newline="", encoding="utf-8")
writer = csv.writer(csvFile)
try:
    for row in rows:
        csvRow = []
        # Header cells (th) and data cells (td) both go into the row.
        for cell in row.findAll(["td", "th"]):
            csvRow.append(cell.get_text())
        writer.writerow(csvRow)
finally:
    csvFile.close()
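Text returned by `get_text()` often contains commas, quotes, or line breaks. The following is a minimal sketch of how `csv.writer` copes with such cells, writing to an in-memory buffer rather than a file so it can be run without network access. The `rows_to_csv` helper and the sample rows are hypothetical and are not part of the original script; stripping whitespace is an added assumption about what you may want.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize a list of rows (lists of strings) to CSV text.

    Hypothetical helper for illustration; the original script writes
    straight to a file instead.
    """
    # newline="" leaves line endings to the csv module, as recommended
    # for any file object handed to csv.writer.
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    for row in rows:
        # Strip surrounding whitespace, which get_text() tends to keep.
        writer.writerow([cell.strip() for cell in row])
    return buffer.getvalue()


# Made-up sample data shaped like the cells get_text() might return.
sample_rows = [
    ["Name", "License\n"],
    ["Editor A", "GPL, MIT"],
    ["Editor B", 'says "free"'],
]
csv_text = rows_to_csv(sample_rows)
print(csv_text)
```

Cells containing a comma or a double quote are quoted automatically, so no manual escaping is needed before calling `writerow()`.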
To fetch all tables, use the following code:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import csv
import sys
from urllib.error import HTTPError
from urllib.request import urlopen

from bs4 import BeautifulSoup

try:
    html = urlopen("http://en.wikipedia.org/wiki/Comparison_of_text_editors")
except HTTPError as e:
    # Without a response there is nothing to parse, so stop here.
    print("not found")
    sys.exit(1)

bsObj = BeautifulSoup(html, "html.parser")
tables = bsObj.findAll("table", {"class": "wikitable"})
# findAll() returns an empty list, never None, when nothing matches.
if not tables:
    print("no table")
    sys.exit(1)

# Number the output files table1.csv, table2.csv, ...
for i, table in enumerate(tables, start=1):
    fileName = "table%s.csv" % i
    rows = table.findAll("tr")
    csvFile = open(fileName, "wt", newline="", encoding="utf-8")
    writer = csv.writer(csvFile)
    try:
        for row in rows:
            csvRow = []
            for cell in row.findAll(["td", "th"]):
                csvRow.append(cell.get_text())
            writer.writerow(csvRow)
    finally:
        csvFile.close()
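The per-table file naming can be tried out without fetching a page. The sketch below assumes each table has already been reduced to a list of rows; `write_tables` and the sample data are hypothetical names introduced only for illustration. It writes into a temporary directory and reads the first file back to check that the content round-trips.

```python
import csv
import tempfile
from pathlib import Path


def write_tables(tables, out_dir):
    """Write each table (a list of rows) to table1.csv, table2.csv, ...

    Hypothetical helper: `tables` stands in for the rows of each
    scraped table, already converted to lists of strings.
    """
    paths = []
    # enumerate(..., start=1) takes the place of a hand-maintained counter.
    for index, rows in enumerate(tables, start=1):
        path = Path(out_dir) / f"table{index}.csv"
        # The with block closes the file even if writing fails.
        with open(path, "w", newline="", encoding="utf-8") as csv_file:
            csv.writer(csv_file).writerows(rows)
        paths.append(path)
    return paths


# Made-up data standing in for two scraped tables.
fake_tables = [
    [["Name", "License"], ["Editor A", "GPL"]],
    [["Name", "OS"], ["Editor B", "Linux"]],
]

with tempfile.TemporaryDirectory() as tmp_dir:
    written = write_tables(fake_tables, tmp_dir)
    file_names = [p.name for p in written]
    # Read the first file back to confirm it round-trips.
    with open(written[0], newline="", encoding="utf-8") as csv_file:
        first_table = list(csv.reader(csv_file))

print(file_names)
print(first_table)
```

Using a `with` block here is a design choice that replaces the explicit `try`/`finally` and `close()` pair; both approaches close the file reliably.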
That is everything on how to fetch table data from a page with Python and store it in a CSV file. I hope it serves as a useful reference.
Original link: https://blog.csdn.net/u011085172/article/details/73810708