Python: Scraping the WooYun vendor list and parsing it with BeautifulSoup

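I saw a Python script on the SSS forum that scrapes the WooYun vendor list. I wanted some practice, so I rewrote it myself, following the original.

Original post: http://bbs.sssie.com/thread-965-1-1.html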

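#coding:utf-8
import urllib2
from bs4 import BeautifulSoup

url = 'http://wooyun.org/corps/page/'
# NOTE: several numeric values were missing from the post as published.
total_page = 1   # placeholder only: set this to the real number of list pages
td_width = ''    # placeholder only: the width attribute value of the target <td> cells
count = 1        # running row number for the CSV (starting value assumed)

csv_file = open('wooyunCS1.csv', 'w')
for num in range(1, total_page + 1):
    # Page URLs differ only in the trailing page number
    real_url = url + str(num)
    response = urllib2.urlopen(real_url)
    html = response.read()
    soup = BeautifulSoup(html, 'html.parser', from_encoding='utf-8')
    # Look the cells up once per page instead of on every iteration
    cells = soup('td', width=td_width)
    for i in range(0, len(cells)):
        # Names and links alternate: even index = name, next (odd) index = link
        if i % 2 == 0:
            name = cells[i].get_text().strip()
            link = cells[i + 1].get_text().strip()
            print name, ':', link
            # Comma-separated fields, one record per line
            csv_file.write(str(count) + ',' + name.encode('utf-8') + ',' + link.encode('utf-8') + '\n')
            count += 1
csv_file.close()
print "OVER"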

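# Summary:
# CSV format: joining the fields with + ',' + stores each value in its own column.
# When the items you need alternate in the results, pick them by position: even indexes for one kind, odd indexes for the other.
# In this example, building the URL with str(num) is simpler than using re.sub().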
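The three points in the summary can be sketched together as follows. This is only a minimal illustration, written for Python 3 with the standard library rather than the Python 2 script above. The helper names (page_urls, pair_alternating, rows_to_csv) and the sample cell texts are hypothetical and do not come from the original post. It also swaps the manual + ',' + joining for the csv module, which quotes a field when the field itself contains a comma.

```python
import csv
import io


def page_urls(base_url, total_pages):
    """Build one URL per page by appending the page number with str()."""
    return [base_url + str(num) for num in range(1, total_pages + 1)]


def pair_alternating(items):
    """Pair each even-indexed item (name) with the odd-indexed item after it (link)."""
    return list(zip(items[0::2], items[1::2]))


def rows_to_csv(pairs, start=1):
    """Return numbered (name, link) rows as CSV text, one record per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator='\n')
    for count, (name, link) in enumerate(pairs, start):
        writer.writerow([count, name, link])
    return buf.getvalue()


# Hypothetical cell texts standing in for the scraped <td> contents
cells = ['Vendor A', 'http://a.example', 'Vendor B, Inc.', 'http://b.example']

urls = page_urls('http://wooyun.org/corps/page/', 3)
csv_text = rows_to_csv(pair_alternating(cells))

print(urls)
print(csv_text)
```

Slicing with [0::2] and [1::2] does the same job as the i % 2 == 0 test in the script, but it cannot run past the end of the list when the number of cells is odd, because zip stops at the shorter slice.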
