Python: Finding the Download URLs for Youku, Sohu, and Tudou Videos at Each Quality Level

Date: 2021-07-20 05:44:59

The Sohu download URL lookup borrows part of its code from iambus (git).

Youku: Not much to say here, since there is plenty written about it online. I worked it out by analyzing the M3U8 playlist.

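import re
import urllib2  # Python 2; on Python 3 use urllib.request instead

# defi: one of 'mp4', 'flv', 'hd1', 'hd2' (MP4, standard, high, super definition)
# id: the part after "id_" in the video page URL, id_(********)
url = 'http://v.youku.com/player/getRealM3U8/vid/' + id + '/type/' + defi + '/video.m3u8'
m3u8_lines = urllib2.urlopen(url).readlines()
links = []
# Try each file extension in turn and stop at the first one that yields links.
for ext in ['flv', 'mp4']:
    for line in m3u8_lines:
        m = re.match(r'http.*%s' % ext, line)
        # Skip lines that do not match and URLs that were already collected.
        if m and m.group() not in links:
            links.append(m.group())
    if links:
        break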
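To try the matching logic without touching the network, here is a minimal sketch that runs the same filter over an in-memory playlist. The function name `extract_segment_links` and the sample playlist lines are made up for illustration; a real Youku playlist may look different.

```python
import re

def extract_segment_links(m3u8_lines, extensions=('flv', 'mp4')):
    """Return the unique segment URLs for the first extension that matches."""
    for ext in extensions:
        links = []
        for line in m3u8_lines:
            # Match from "http" up to the last occurrence of the extension,
            # which drops any query string that follows it.
            m = re.match(r'http.*%s' % ext, line)
            if m and m.group() not in links:
                links.append(m.group())
        if links:
            return links
    return []

# Hypothetical playlist content, not captured from a real response.
sample = [
    '#EXTM3U',
    '#EXTINF:10,',
    'http://example.com/seg/0001.flv?ts_start=0&ts_end=10',
    '#EXTINF:10,',
    'http://example.com/seg/0001.flv?ts_start=10&ts_end=20',
    'http://example.com/seg/0002.flv?ts_start=0&ts_end=10',
]
links = extract_segment_links(sample)
print(links)
```

In this sample the same file shows up twice with different time ranges, which is why the duplicate check matters: only one URL per file is kept.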

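Tudou: View the page source and look for the video's icode and vcode. If there is a vcode, the video is actually hosted on Youku, so follow the Youku code and set id = vcode. If there is no vcode, the video is hosted on Tudou itself.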
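The check described above might be sketched as follows. The function name `find_video_code` and the sample page fragments are hypothetical; the regular expressions assume the page source contains entries of the form `,icode: '...'` and `,vcode: '...'`.

```python
import re

def find_video_code(html):
    """Return ('youku', vcode) if a vcode is present, else ('tudou', icode)."""
    vcode = re.findall(r",vcode: '(.+?)'", html)
    if vcode:
        # A vcode means the video lives on Youku; use it as the Youku id.
        return ('youku', vcode[0])
    icode = re.findall(r",icode: '(.+?)'", html)
    if icode:
        return ('tudou', icode[0])
    return (None, None)

# Hypothetical page fragments, for illustration only.
tudou_page = "{\n,icode: 'abc123'\n,kw: 'demo'\n}"
youku_page = "{\n,icode: 'abc123'\n,vcode: 'XNDU2'\n,kw: 'demo'\n}"

print(find_video_code(tudou_page))
print(find_video_code(youku_page))
```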

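Open http://www.tudou.com/outplay/goto/getItemSegs.action?code=(icode) to see the video's download information.

The keys 2, 3, and 5 stand for the quality levels (standard, high, and super definition, respectively).

k is the id used to look up the video address.

Open http://v2.tudou.com/f?id=(k) to see the download link.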

The code:

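import json
import re
import urllib2  # Python 2; on Python 3 use urllib.request instead
from xml.etree import ElementTree

# defi: '1', '2', or '3' (super, high, standard definition)
# url: address of the Tudou video page
html = urllib2.urlopen(url).read()
icode = re.findall(r",icode: '(.+?)'", html)[0]
# vcode = re.findall(r",vcode: '(.+?)'", html)[0]  # if there is a vcode, download from Youku instead
title = re.findall(r",kw: '(.+?)'", html)[0]
data = json.loads(urllib2.urlopen(
    'http://www.tudou.com/outplay/goto/getItemSegs.action?code=%s' % icode).read())
links = []
if defi == '1' and '5' in data:
    data = data['5']
elif defi == '2' and '3' in data:
    data = data['3']
elif defi == '3' and '2' in data:
    data = data['2']
else:
    # The requested quality is not available; fall back to the first one listed.
    data = data.items()[0][1]
for i in data:
    chunk = urllib2.urlopen('http://v2.tudou.com/f?id=%s' % i['k']).read()
    root = ElementTree.fromstring(chunk)
    # The download link is the text of the first <f> element.
    link = next(root.iter('f')).text
    # Keep only the part of the link before "&bc=".
    link = re.findall(r"(.+?)&bc=", link)[0]
    links.append(link)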
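The two offline steps, picking a quality and pulling the link out of the XML, can be tried on sample data. This is only a sketch: the helper names `pick_segments` and `link_from_xml`, the sample JSON mapping, and the sample XML are assumptions made for illustration and are not captured from real Tudou responses.

```python
import re
from xml.etree import ElementTree

# Maps the defi choice to the key used in the getItemSegs response.
QUALITY_KEYS = {'1': '5', '2': '3', '3': '2'}

def pick_segments(data, defi):
    """Return the segment list for the requested quality, else the first one listed."""
    key = QUALITY_KEYS.get(defi)
    if key in data:
        return data[key]
    return next(iter(data.values()))

def link_from_xml(chunk):
    """Take the text of the first <f> element and cut it off before '&bc='."""
    root = ElementTree.fromstring(chunk)
    link = next(root.iter('f')).text
    return re.findall(r'(.+?)&bc=', link)[0]

# Hypothetical sample data, for illustration only.
data = {'2': [{'k': 111}], '3': [{'k': 222}]}
xml = '<v><f>http://example.com/video.f4v?key=abc&amp;bc=1&amp;x=2</f></v>'

print(pick_segments(data, '2'))   # high definition is available
print(pick_segments(data, '1'))   # super definition is missing, so fall back
print(link_from_xml(xml))
```

The fallback relies on dictionaries keeping insertion order (Python 3.7+), so "first one listed" means the first key in the parsed response.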

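Sohu: View the page source and find the video's vid, then open http://hot.vrs.sohu.com/vrs_flash.action?vid=(vid) to see the video's download information. Annoyingly, Sohu uses a different vid for each quality level: 'oriVid', 'superVid', 'highVid', and 'norVid' hold the vids for original, super, high, and standard definition respectively.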

The code:

import json
import re
import urllib2  # Python 2; on Python 3 use urllib.request instead

VRS_URL = 'http://hot.vrs.sohu.com/vrs_flash.action?vid=%s'

#----------------------------------------------------------------------
def info(url, defi):
    '''Fetch the download information.'''
    # defi: the quality to download, '1' original, '2' super, '3' high, '4' standard
    html = urllib2.urlopen(url).read()
    vid = int(re.findall(r'vid="(\d+)"', html)[0])
    data = json.loads(urllib2.urlopen(VRS_URL % vid).read())
    # Starting at the requested quality, step down until one has a vid that is
    # non-zero and differs from the page's own vid, then fetch its information.
    keys = ['oriVid', 'superVid', 'highVid', 'norVid']
    for key in keys[int(defi) - 1:]:
        quality_vid = data['data'][key]
        if quality_vid not in (0, vid):
            data = json.loads(urllib2.urlopen(VRS_URL % quality_vid).read())
            break
    links = []
    host = data['allot']
    prot = data['prot']
    data = data['data']
    title = data['tvName']
    for clip, new in zip(data['clipsURL'], data['su']):
        links.append(real_url(host, prot, clip, new))
    return (title, links, 'mp4')
#----------------------------------------------------------------------
def real_url(host, prot, clip, new):
    '''Resolve the live URL of one clip.'''
    url = 'http://%s/?prot=%s&file=%s&new=%s' % (host, prot, clip, new)
    # The response is a '|'-separated string: field 0 is the URL prefix
    # (its last character is dropped) and field 3 is the key.
    s = urllib2.urlopen(url).read().split('|')
    return '%s%s?key=%s' % (s[0][:-1], new, s[3])
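The quality fallback and the final URL assembly involve no network access of their own, so they can be checked offline. The sketch below uses hypothetical helper names (`pick_vid`, `build_real_url`) and made-up sample values. The '|'-separated response layout is assumed from the code above, not from any Sohu documentation.

```python
def pick_vid(data, page_vid, defi):
    """Step down from the requested quality to the first usable vid."""
    keys = ['oriVid', 'superVid', 'highVid', 'norVid']
    for key in keys[int(defi) - 1:]:
        vid = data[key]
        # 0 means the quality is missing; page_vid is already loaded.
        if vid not in (0, page_vid):
            return vid
    return page_vid

def build_real_url(response, new):
    """Join the URL prefix (field 0), the clip path, and the key (field 3)."""
    s = response.split('|')
    return '%s%s?key=%s' % (s[0][:-1], new, s[3])

# Hypothetical sample values, for illustration only.
data = {'oriVid': 0, 'superVid': 0, 'highVid': 1002, 'norVid': 1001}
response = 'http://example.com/|a|b|KEY123|c'

print(pick_vid(data, 1001, '1'))   # original and super are missing
print(pick_vid(data, 1001, '4'))   # standard is the page's own vid
print(build_real_url(response, '/clip/1.mp4'))
```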