Python Taobao crawler

Date: 2018-08-11 07:22:25
[File properties]:

File name: python 淘宝爬虫 (Python Taobao crawler)

File size: 4 KB

File format: PY

Last updated: 2018-08-11 07:22:25


import time
import leveldb
from urllib.parse import quote_plus
import re
import json
import itertools
import sys
import requests
from queue import Queue
from threading import Thread

URL_BASE = 'http://s.m.taobao.com/search?q={}&n=200&m=api4h5&style=list&page={}'


def url_get(url):
    # print('GET ' + url)
    header = dict()
    header['Accept'] = 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
    header['Accept-Encoding'] = 'gzip,deflate,sdch'
    header['Accept-Language'] = 'en-US,en;q=0.8'
    header['Connection'] = 'keep-alive'
    header['DNT'] = '1'
    #header['User-Agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36'
    header['User-Agent'] = 'Mozilla/12.0 (compatible; MSIE 8.0; Windows NT)'
    return requests.get(url, timeout=5, headers=header).text
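The excerpt stops after url_get, so the code that fills in URL_BASE is not shown. As a minimal sketch of how its two {} placeholders (search keyword and page number) might be filled, the following uses quote_plus, which the excerpt imports. The helper names build_search_url and first_pages are hypothetical, and starting page numbers at 1 is an assumption, not something the source confirms.

```python
from urllib.parse import quote_plus

# Same template as in the excerpt above.
URL_BASE = 'http://s.m.taobao.com/search?q={}&n=200&m=api4h5&style=list&page={}'


def build_search_url(keyword, page):
    """Hypothetical helper: fill URL_BASE with a URL-encoded keyword and a page number."""
    return URL_BASE.format(quote_plus(keyword), page)


def first_pages(keyword, count):
    """Hypothetical helper: URLs for pages 1..count (1-based numbering is an assumption)."""
    return [build_search_url(keyword, page) for page in range(1, count + 1)]


# Build the URLs only; no network request is made here.
for url in first_pages('red shoes', 2):
    print(url)
```

Each resulting URL could then be passed to url_get from the excerpt. quote_plus is needed because keywords may contain spaces or non-ASCII characters that are not valid in a query string.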

