Using Python to resolve Youku, Sohu, and Tudou video download URLs at each quality level

The Sohu download-URL logic borrows part of its code from iambus (git: iambus).

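Youku: There is plenty written about this online already, so I will keep it short. I worked the URLs out from the M3U8 playlist.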

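import re
import urllib2

# defi: one of 'mp4', 'flv', 'hd1', 'hd2' (MP4, standard, high, and super definition respectively)
# id: the part after "id_" in the video page URL, i.e. id_(********)
url = 'http://v.youku.com/player/getRealM3U8/vid/' + id + '/type/' + defi + '/video.m3u8'
chunk = urllib2.urlopen(url)
m3u8_lines = chunk.readlines()
links = []
# Try each file extension in turn and stop at the first one that yields links.
for ext in ['flv', 'mp4']:
    for line in m3u8_lines:
        # Keep each URL up to and including the extension, skipping duplicates.
        m = re.match(r'http.*%s' % ext, line)
        if m and m.group() not in links:
            links.append(m.group())
    if links:
        break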
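
The link-extraction step can be tried without touching the network. The following is a minimal sketch, assuming the playlist lines have already been downloaded. `build_m3u8_url`, `extract_links`, and the sample playlist are hypothetical names and made-up data for illustration, not actual Youku output. The pattern is also tightened slightly to require a literal dot before the extension.

```python
import re


def build_m3u8_url(video_id, defi):
    """Build the getRealM3U8 URL from the post for a given id and quality."""
    return ('http://v.youku.com/player/getRealM3U8/vid/%s/type/%s/video.m3u8'
            % (video_id, defi))


def extract_links(m3u8_lines, extensions=('flv', 'mp4')):
    """Return unique URLs cut off after the first extension that matches any line."""
    for ext in extensions:
        links = []
        for line in m3u8_lines:
            # Greedy match: keep everything up to the last ".<ext>" on the line.
            m = re.match(r'http.*\.%s' % re.escape(ext), line)
            if m and m.group() not in links:
                links.append(m.group())
        if links:
            return links
    return []


# Made-up playlist: the first two entries point into the same .flv file.
sample = [
    '#EXTM3U',
    'http://example.com/a/0001.flv?ts_start=0&ts_end=10',
    'http://example.com/a/0001.flv?ts_start=10&ts_end=20',
    'http://example.com/a/0002.flv?ts_start=0&ts_end=10',
]
links = extract_links(sample)
```

Returning from the function as soon as one extension produces results plays the same role as the `break` in the original loop.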

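Tudou: View the page source and look for the video's icode and vcode. If a vcode is present, the video is actually hosted on Youku: follow the Youku code above and set id = vcode. If there is no vcode, it is a native Tudou video.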
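
The icode/vcode decision can be sketched as a small pure function. This is only an illustration: `find_code`, `resolve_source`, and both page fragments are hypothetical, and the fragments are made up to fit the `,name: 'value'` shape that the regular expressions in this post expect.

```python
import re


def find_code(html, name):
    """Return the value of ",<name>: '...'" in the page source, or None."""
    found = re.findall(r",%s: '(.+?)'" % re.escape(name), html)
    return found[0] if found else None


def resolve_source(html):
    """Return ('youku', vcode) if a vcode is present, else ('tudou', icode)."""
    vcode = find_code(html, 'vcode')
    if vcode:
        return ('youku', vcode)
    return ('tudou', find_code(html, 'icode'))


# Made-up page fragments, not real Tudou markup.
tudou_page = "var item = {iid: 1 ,icode: 'abc123' ,kw: 'demo'};"
youku_page = "var item = {iid: 2 ,icode: 'def456' ,vcode: 'XMTIz' ,kw: 'demo'};"
```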

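Open http://www.tudou.com/outplay/goto/getItemSegs.action?code=(icode) to see the video's download information.

The keys 2, 3, and 5 identify the quality levels: standard, high, and super definition respectively.

k is the id used in the video address.

Open http://v2.tudou.com/f?id=(k) to see the download link.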
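
The quality lookup and the per-segment URLs described above can be sketched offline. `QUALITY_KEYS`, `pick_segments`, `segment_urls`, and the sample response are hypothetical. Only the quality keys, the `k` field, and the URL pattern come from this post, and the real response likely carries more fields.

```python
import json

# The post's defi value mapped to the response key:
# '1' super (5), '2' high (3), '3' standard (2).
QUALITY_KEYS = {'1': '5', '2': '3', '3': '2'}


def pick_segments(data, defi):
    """Return the segment list for the requested quality, else any available one."""
    key = QUALITY_KEYS.get(defi)
    if key in data:
        return data[key]
    return next(iter(data.values()))


def segment_urls(segments):
    """Build the v2.tudou.com lookup URL for each segment's k value."""
    return ['http://v2.tudou.com/f?id=%s' % seg['k'] for seg in segments]


# Made-up response with standard (2) and high (3) definition only.
sample = json.loads('{"2": [{"k": 111}, {"k": 112}], "3": [{"k": 211}]}')
```

A dictionary lookup keeps the quality mapping in one place, which is easier to extend than a chain of `if`/`elif` branches.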

Here is the code:

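import json
import re
import urllib2
from xml.etree import ElementTree

# defi: '1', '2', or '3' (super, high, and standard definition respectively)
# url: the Tudou video page URL
html = urllib2.urlopen(url).read()
icode = re.findall(r",icode: '(.+?)'", html)[0]
# vcode = re.findall(r",vcode: '(.+?)'", html)[0]  # if a vcode exists, download from Youku instead
title = re.findall(r",kw: '(.+?)'", html)[0]
data = json.loads(urllib2.urlopen('http://www.tudou.com/outplay/goto/getItemSegs.action?code=%s' % icode).read())
links = []
# Pick the requested quality if present; otherwise fall back to whatever is available.
if defi == '1' and '5' in data:
    data = data['5']
elif defi == '2' and '3' in data:
    data = data['3']
elif defi == '3' and '2' in data:
    data = data['2']
else:
    data = data.items()[0][1]
for i in data:
    # Each segment's k value resolves to an XML document holding the real link.
    chunk = urllib2.urlopen('http://v2.tudou.com/f?id=%s' % i['k']).read()
    root = ElementTree.fromstring(chunk)
    link = next(root.iter('f')).text
    # Drop everything from "&bc=" onward.
    link = re.findall(r"(.+?)&bc=", link)[0]
    links.append(link)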
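
The XML step at the end of the loop can be tested on its own. Below is a minimal sketch, assuming the response is a small XML document with the link inside an `f` element. `link_from_xml` and the sample document are hypothetical, and the real response may look different. Unlike the `re.findall(...)[0]` call above, this version returns the whole link when `&bc=` is absent rather than raising an error.

```python
from xml.etree import ElementTree


def link_from_xml(xml_text):
    """Return the first <f> element's text, cut off before '&bc=' if present."""
    root = ElementTree.fromstring(xml_text)
    node = next(root.iter('f'), None)
    if node is None or not node.text:
        return None
    return node.text.split('&bc=', 1)[0]


# Made-up response; "&amp;" is how "&" must be written inside XML text.
sample = '<v><f>http://example.com/video.f4v?key=abc&amp;bc=1&amp;x=2</f></v>'
```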

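Sohu: View the page source and find the video's vid, then open http://hot.vrs.sohu.com/vrs_flash.action?vid=(vid) to see the video's download information. Annoyingly, Sohu gives each quality level its own vid: 'oriVid', 'superVid', 'highVid', and 'norVid' are the vids for original, super, high, and standard definition respectively.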
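
Choosing which vid to request, including the fallback to lower qualities, can be sketched as a pure function. `VID_KEYS`, `pick_vid`, and the sample dictionary are hypothetical. The sketch mirrors the rule used in this post's Sohu code: a value of 0, or a value equal to the vid already in hand, is skipped in favor of the next lower quality.

```python
# Quality keys in decreasing order, paired with the post's defi values.
VID_KEYS = [('1', 'oriVid'), ('2', 'superVid'), ('3', 'highVid'), ('4', 'norVid')]


def pick_vid(info, current_vid, defi):
    """Return the vid to request for defi, falling back to lower qualities.

    Values of 0 or equal to current_vid are skipped; if nothing else is
    usable, current_vid is returned so the data already fetched is kept.
    """
    started = False
    for level, key in VID_KEYS:
        if level == defi:
            started = True
        if started and info.get(key, 0) not in (0, current_vid):
            return info[key]
    return current_vid


# Made-up values: no original-quality version exists for this video.
sample = {'oriVid': 0, 'superVid': 300, 'highVid': 200, 'norVid': 100}
```

With the sample above and a page vid of 100, a request for original quality falls back to `superVid`.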

Here is the code:

import json
import re
import urllib2

#----------------------------------------------------------------------
def info(url, defi):
    '''Fetch the download information.'''
    # defi: '1', '2', '3', or '4' selects the quality to download:
    # 1: original, 2: super, 3: high, 4: standard definition
    html = urllib2.urlopen(url).read()
    id = int(re.findall(r'vid="(\d+)"', html)[0])
    data = json.loads(urllib2.urlopen('http://hot.vrs.sohu.com/vrs_flash.action?vid=%s' % id).read())
    # Each level falls through to the next lower one when its vid is 0
    # or equal to the page's own vid.
    if defi == '1':
        if data['data']['oriVid'] not in (0, id):
            data = json.loads(urllib2.urlopen('http://hot.vrs.sohu.com/vrs_flash.action?vid=%s' % data['data']['oriVid']).read())
        else:
            defi = '2'
    if defi == '2':
        if data['data']['superVid'] not in (0, id):
            data = json.loads(urllib2.urlopen('http://hot.vrs.sohu.com/vrs_flash.action?vid=%s' % data['data']['superVid']).read())
        else:
            defi = '3'
    if defi == '3':
        if data['data']['highVid'] not in (0, id):
            data = json.loads(urllib2.urlopen('http://hot.vrs.sohu.com/vrs_flash.action?vid=%s' % data['data']['highVid']).read())
        else:
            defi = '4'
    if defi == '4' and data['data']['norVid'] not in (0, id):
        data = json.loads(urllib2.urlopen('http://hot.vrs.sohu.com/vrs_flash.action?vid=%s' % data['data']['norVid']).read())
    links = []
    host = data['allot']
    prot = data['prot']
    data = data['data']
    title = data['tvName']
    # clipsURL and su are parallel lists with one entry per clip.
    for file, new in zip(data['clipsURL'], data['su']):
        links.append(real_url(host, prot, file, new))
    return (title, links, 'mp4')
#----------------------------------------------------------------------
def real_url(host, prot, file, new):
    '''Resolve the real-time URL of one clip.'''
    url = 'http://%s/?prot=%s&file=%s&new=%s' % (host, prot, file, new)
    # The response is '|'-separated: the first field (minus its last
    # character) is the base URL and the fourth field is the key.
    s = urllib2.urlopen(url).read().split('|')
    return '%s%s?key=%s' % (s[0][:-1], new, s[3])
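
The string handling inside `real_url` can be checked without a network call. The following is a sketch under assumptions: `allot_url`, `assemble_clip_url`, and the sample response are hypothetical, and only the positions of the first and fourth fields are taken from the code above. The meaning of the other fields is not known here.

```python
def allot_url(host, prot, file, new):
    """Build the allot request URL exactly as real_url does."""
    return 'http://%s/?prot=%s&file=%s&new=%s' % (host, prot, file, new)


def assemble_clip_url(response_text, new):
    """Combine a '|'-separated allot response with the clip's su value."""
    fields = response_text.split('|')
    if len(fields) < 4:
        raise ValueError('expected at least 4 fields, got %d' % len(fields))
    # Field 0 minus its trailing character is the base; field 3 is the key.
    return '%s%s?key=%s' % (fields[0][:-1], new, fields[3])


# Made-up response text, not actual Sohu output.
sample_response = 'http://data.example.com/|x|y|KEY123|z'
clip = assemble_clip_url(sample_response, '/clips/1.mp4')
```

Checking the field count first turns a malformed response into a clear error instead of an `IndexError`.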

