(7) Web Crawlers: Scraping Video and Audio Files

Date: 2023-02-01 19:43:19

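  So far I had only scraped text from web pages and never video or audio files, so I tried scraping Bilibili and NetEase Cloud Music and wrote down the whole process for future reference.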

1. Scraping Bilibili videos

  1.1 Page analysis

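  Machine learning with Python is popular at the moment, so let's scrape some machine learning videos. Open the Bilibili site and search for "python机器" ("python machine"). Inspecting the elements of the results page shows that every video series has a unique ID; for example, av28879057 is the ID of one of the videos.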

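  With the unique ID of each video known, open a few videos and look at their addresses. The video URLs come in two forms:

          1. https://www.bilibili.com/video/av28879057  (a single-episode video; the URL is built from just the ID observed above)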

          2. https://www.bilibili.com/video/av30292394/?p=3  (a video series; the trailing parameter p=3 selects the third episode under that ID)
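
The two URL forms above can be captured in a pair of small helpers. This is only a sketch: the names build_video_url and parse_video_url are mine, not anything Bilibili provides, and the pattern assumes the av-style IDs seen in this post.

```python
import re
from typing import Optional, Tuple

BASE = "https://www.bilibili.com/video/"


def build_video_url(av_id: str, part: Optional[int] = None) -> str:
    """Return the page URL for a video ID such as 'av28879057'."""
    url = BASE + av_id
    if part is not None:
        # A series episode is selected with the p query parameter.
        url += "/?p={}".format(part)
    return url


def parse_video_url(url: str) -> Tuple[str, Optional[int]]:
    """Split a page URL back into (video ID, episode number or None)."""
    match = re.search(r"/video/(av\d+)/?(?:\?p=(\d+))?", url)
    if match is None:
        raise ValueError("not a recognised video URL: " + url)
    part = int(match.group(2)) if match.group(2) else None
    return match.group(1), part
```

For example, build_video_url("av30292394", 3) gives the second form shown above, and parse_video_url accepts it with or without the slash before "?p=".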

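  The URL structure of each video page is now clear; the next step is to find the download address of the video itself. Refresh the page, start playback, and look at the network requests sorted by size. One request transfers a large file of type x-flv, which should be the video download. That request takes 7 parameters, and comparing it with other videos shows that two of them change dynamically: ssig and trid. Neither appears in any of the other JSON responses, so as a last resort I searched the page source for a function that might generate them, and found that the page source contains the video download address directly, in a JSON dictionary assigned to window.__playinfo__={}. A regular expression match is all it takes, which makes things much simpler.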
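
As a quick illustration of that regular expression match, here is a minimal sketch run against a made-up page fragment. The helper name extract_playinfo and the sample HTML are assumptions for the demo; a real page is far larger and its layout may have changed since this was written.

```python
import json
import re

# re.S lets "." match newlines, so the JSON may span several lines.
PLAYINFO_RE = re.compile(r"<script>window\.__playinfo__=(\{.*?\})</script>", re.S)


def extract_playinfo(html: str) -> dict:
    """Pull the window.__playinfo__ JSON object out of the page source."""
    match = PLAYINFO_RE.search(html)
    if match is None:
        raise ValueError("window.__playinfo__ not found in page source")
    return json.loads(match.group(1))


# Hypothetical page fragment, trimmed to the fields used here.
sample_html = (
    "<html><head><script>window.__playinfo__="
    '{"code": 0, "data": {"durl": [{"order": 1, "url": "http://example.invalid/1.flv"}]}}'
    "</script></head></html>"
)
info = extract_playinfo(sample_html)
print(info["data"]["durl"][0]["url"])
```

The non-greedy group stops at the first closing brace that is immediately followed by </script>, which is why nested braces inside the JSON do not cut the match short.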

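      Matching this dictionary and inspecting it gives the result below. The whole video is split into several smaller segments numbered in sequence: order is the sequence number and url is the download address of that segment. So all that is needed is to download each segment and then concatenate them.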

{
    "code": 0,
    "message": "0",
    "ttl": 1,
    "data": {
        "from": "local",
        "result": "suee",
        "message": "",
        "quality": 32,
        "format": "flv480",
        "timelength": 7121936,
        "accept_format": "flv720,flv480,flv360",
        "accept_description": ["高清 720P", "清晰 480P", "流畅 360P"],
        "accept_quality": [64, 32, 16],
        "video_codecid": 7,
        "seek_param": "start",
        "seek_type": "offset",
        "durl": [{
            "order": 1,
            "length": 363246,
            "size": 24653145,
            "ahead": "EZA=",
            "vhead": "AWQAH//hAB5nZAAfrNlAvD3m//DQEM/xAAADAAEAAAMAPA8YMZYBAAVo6+zyPA==",
            "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-1-32.flv?expires=1554535500&platform=pc&ssig=tz7ktrLd7bdj8qukIG9cjQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
            "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-1-32.flv?expires=1554535500&platform=pc&ssig=tz7ktrLd7bdj8qukIG9cjQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-1-32.flv?e=ig8euxZM2rNcNbRj7zUVhoM17buBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=732e5ee7aad2a9a08406b92aa0bb2ca3&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
        }, {
            "order": 2,
            "length": 330944,
            "size": 23865726,
            "ahead": "",
            "vhead": "",
            "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-2-32.flv?expires=1554535500&platform=pc&ssig=LemBQ8rVic-aAAN9iXwWGg&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
            "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-2-32.flv?expires=1554535500&platform=pc&ssig=LemBQ8rVic-aAAN9iXwWGg&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-2-32.flv?e=ig8euxZM2rNcNbR3hwdVhoM1nwdVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=bb0c67342e48e1a8b438dcc9606f9e91&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
        }, {
            "order": 3,
            "length": 352981,
            "size": 25848758,
            "ahead": "",
            "vhead": "",
            "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-3-32.flv?expires=1554535500&platform=pc&ssig=vSDeETHYfUOLYf8caLiW5Q&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
            "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-3-32.flv?expires=1554535500&platform=pc&ssig=vSDeETHYfUOLYf8caLiW5Q&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-3-32.flv?e=ig8euxZM2rNcNbR3hbUVhoM1nwNBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=30faa351c57a559f7b69654809418da9&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
        }, {
            "order": 4,
            "length": 394413,
            "size": 26565740,
            "ahead": "",
            "vhead": "",
            "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-4-32.flv?expires=1554535500&platform=pc&ssig=uaupgm_tbgSyVbou66oO-A&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
            "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-4-32.flv?expires=1554535500&platform=pc&ssig=uaupgm_tbgSyVbou66oO-A&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-4-32.flv?e=ig8euxZM2rNcNbRj7zUVhoM17buBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=2bb21503e670b1a82769ed6524ea7c25&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
        }, {
            "order": 5,
            "length": 388312,
            "size": 26901267,
            "ahead": "",
            "vhead": "",
            "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-5-32.flv?expires=1554535500&platform=pc&ssig=DM7BjFfnFGzoux7NA7Ix5g&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
            "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-5-32.flv?expires=1554535500&platform=pc&ssig=DM7BjFfnFGzoux7NA7Ix5g&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-5-32.flv?e=ig8euxZM2rNcNbRahbUVhoM17zNBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=68a9f6b8213285eb7fba15736e2c683b&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
        }, {
            "order": 6,
            "length": 239979,
            "size": 15473865,
            "ahead": "",
            "vhead": "",
            "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-6-32.flv?expires=1554535500&platform=pc&ssig=KGQ7DIH2XeAfW0QU4C7X7w&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
            "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-6-32.flv?expires=1554535500&platform=pc&ssig=KGQ7DIH2XeAfW0QU4C7X7w&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-6-32.flv?e=ig8euxZM2rNcNbRjhwdVhoM17bdVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=4e27dfa3076edd399b0e6ee547f1dd51&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
        }, {
            "order": 7,
            "length": 426645,
            "size": 29245686,
            "ahead": "",
            "vhead": "",
            "url": "http://cn-hbwh-cmcc-v-04.acgvideo.com/upgcxcode/45/83/52808345/52808345-7-32.flv?expires=1554535500&platform=pc&ssig=X_NsbB2FEjaE4W2yGI2YMQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
            "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-7-32.flv?expires=1554535500&platform=pc&ssig=X_NsbB2FEjaE4W2yGI2YMQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-7-32.flv?e=ig8euxZM2rNcNbRahwdVhoM17zdVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=c3ef5ea3bdd2ab1ac310970d85341c80&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
        }, {
            "order": 8,
            "length": 423211,
            "size": 30372670,
            "ahead": "",
            "vhead": "",
            "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-8-32.flv?expires=1554535500&platform=pc&ssig=rU90cc9rkqn--2je747LAQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
            "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-8-32.flv?expires=1554535500&platform=pc&ssig=rU90cc9rkqn--2je747LAQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-8-32.flv?e=ig8euxZM2rNcNbRa7zUVhoM17zuBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=98d5301937834486e0bd9c2996cd73f4&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
        }, {
            "order": 9,
            "length": 291178,
            "size": 19475045,
            "ahead": "",
            "vhead": "",
            "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-9-32.flv?expires=1554535500&platform=pc&ssig=sMfGnyjVuKCsOzIp9EAanQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
            "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-9-32.flv?expires=1554535500&platform=pc&ssig=sMfGnyjVuKCsOzIp9EAanQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-9-32.flv?e=ig8euxZM2rNcNbRj7WdVhoM17bUVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=08439132f1831b423be6577c7bd5ef89&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
        }, {
            "order": 10,
            "length": 370880,
            "size": 25219151,
            "ahead": "",
            "vhead": "",
            "url": "http://cn-hbwh-cmcc-v-04.acgvideo.com/upgcxcode/45/83/52808345/52808345-10-32.flv?expires=1554535500&platform=pc&ssig=kKqhofi4ayRRMoquCxz-pw&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
            "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-10-32.flv?expires=1554535500&platform=pc&ssig=kKqhofi4ayRRMoquCxz-pw&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-10-32.flv?e=ig8euxZM2rNcNbRj7zUVhoM17buBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=3d5d140e0dd02a83245ae86da23eb8b9&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
        }, {
            "order": 11,
            "length": 381612,
            "size": 26624914,
            "ahead": "",
            "vhead": "",
            "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-11-32.flv?expires=1554535500&platform=pc&ssig=HFFhsFFGyXOV8Q3QmF8sJQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
            "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-11-32.flv?expires=1554535500&platform=pc&ssig=HFFhsFFGyXOV8Q3QmF8sJQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-11-32.flv?e=ig8euxZM2rNcNbRahbUVhoM17zNBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=63f2b71981080c752eed5166a9a85332&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
        }, {
            "order": 12,
            "length": 361344,
            "size": 25254786,
            "ahead": "",
            "vhead": "",
            "url": "http://cn-hbwh-cmcc-v-04.acgvideo.com/upgcxcode/45/83/52808345/52808345-12-32.flv?expires=1554535500&platform=pc&ssig=UuAqqNbr1xC5gMlu5FUYdQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
            "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-12-32.flv?expires=1554535500&platform=pc&ssig=UuAqqNbr1xC5gMlu5FUYdQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-12-32.flv?e=ig8euxZM2rNcNbRahbUVhoM17zNBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=1705e8d3f1075a717c6a91ae018396fe&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
        }, {
            "order": 13,
            "length": 334912,
            "size": 24639608,
            "ahead": "",
            "vhead": "",
            "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-13-32.flv?expires=1554535500&platform=pc&ssig=MQbcDgFo8iqQ2Uf4yO-L0A&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
            "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-13-32.flv?expires=1554535500&platform=pc&ssig=MQbcDgFo8iqQ2Uf4yO-L0A&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-13-32.flv?e=ig8euxZM2rNcNbR3hbUVhoM1nwNBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=a5f1be479528b8a92a462acab849af46&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
        }, {
            "order": 14,
            "length": 365845,
            "size": 24930389,
            "ahead": "",
            "vhead": "",
            "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-14-32.flv?expires=1554535500&platform=pc&ssig=bpVSp4oDvkaLf1HTlWl5xA&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
            "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-14-32.flv?expires=1554535500&platform=pc&ssig=bpVSp4oDvkaLf1HTlWl5xA&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-14-32.flv?e=ig8euxZM2rNcNbRahwdVhoM17zdVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=31a76487e1d32acd5b573f45a4169997&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
        }, {
            "order": 15,
            "length": 338347,
            "size": 23943047,
            "ahead": "",
            "vhead": "",
            "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-15-32.flv?expires=1554535500&platform=pc&ssig=ieioDVAxcZLksQ55egulgg&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
            "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-15-32.flv?expires=1554535500&platform=pc&ssig=ieioDVAxcZLksQ55egulgg&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-15-32.flv?e=ig8euxZM2rNcNbRa7WdVhoM17zUVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=36cdd93257a88fdc90a0c85f2b9babe3&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
        }, {
            "order": 16,
            "length": 475181,
            "size": 34293360,
            "ahead": "",
            "vhead": "",
            "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-16-32.flv?expires=1554535500&platform=pc&ssig=Ps_lae8ZoX800sJZh-eRRA&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
            "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-16-32.flv?expires=1554535500&platform=pc&ssig=Ps_lae8ZoX800sJZh-eRRA&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-16-32.flv?e=ig8euxZM2rNcNbR3hwdVhoM1nwdVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=fe34135d841548c79f78f687282c6bc3&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
        }, {
            "order": 17,
            "length": 204846,
            "size": 13746922,
            "ahead": "",
            "vhead": "",
            "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-17-32.flv?expires=1554535500&platform=pc&ssig=mzbEJYcCFWAO0ioYePxG_Q&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
            "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-17-32.flv?expires=1554535500&platform=pc&ssig=mzbEJYcCFWAO0ioYePxG_Q&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-17-32.flv?e=ig8euxZM2rNcNbRj7zUVhoM17buBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=915e7dc2c91a4072e91bd43988379c8b&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
        }, {
            "order": 18,
            "length": 469078,
            "size": 32875195,
            "ahead": "",
            "vhead": "",
            "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-18-32.flv?expires=1554535500&platform=pc&ssig=gdm21_hyrHYWZfsmgPkMDA&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
            "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-18-32.flv?expires=1554535500&platform=pc&ssig=gdm21_hyrHYWZfsmgPkMDA&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-18-32.flv?e=ig8euxZM2rNcNbRa7WdVhoM17zUVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=e644f16f487b7bd5625326a550716479&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
        }, {
            "order": 19,
            "length": 328213,
            "size": 21350561,
            "ahead": "",
            "vhead": "",
            "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-19-32.flv?expires=1554535500&platform=pc&ssig=3LoFiUwUGXFRJHBpigewOw&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
            "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-19-32.flv?expires=1554535500&platform=pc&ssig=3LoFiUwUGXFRJHBpigewOw&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-19-32.flv?e=ig8euxZM2rNcNbRjhbUVhoM17bNBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=ec0d506311d0efdbfbf576d297c3ebba&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
        }, {
            "order": 20,
            "length": 280769,
            "size": 19777669,
            "ahead": "",
            "vhead": "",
            "url": "http://cn-hbwh-cmcc-v-04.acgvideo.com/upgcxcode/45/83/52808345/52808345-20-32.flv?expires=1554535500&platform=pc&ssig=r8NbvnHMQ58qfdYJHoD4kw&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
            "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-20-32.flv?expires=1554535500&platform=pc&ssig=r8NbvnHMQ58qfdYJHoD4kw&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-20-32.flv?e=ig8euxZM2rNcNbRa7WdVhoM17zUVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=1991fbc0d72dfe2aaac05943f26d54e4&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
        }]
    },
    "session": "e5c0e030d13633062a9889d1390010d9",
    "videoFrame": {}
}
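
Given a dictionary shaped like the one above, the segment list can be put in playback order and totalled with a few lines. This is a sketch: summarize_segments is a name I made up, the sample uses the size and length values of the first two segments with placeholder URLs, and treating length as milliseconds is my assumption (in the data above the 20 length values do add up to timelength, 7121936).

```python
def summarize_segments(playinfo: dict):
    """Return (urls in playback order, total size in bytes, total length)."""
    # Sort by the "order" field rather than trusting the list order.
    segments = sorted(playinfo["data"]["durl"], key=lambda seg: seg["order"])
    urls = [seg["url"] for seg in segments]
    total_size = sum(seg["size"] for seg in segments)
    total_length = sum(seg["length"] for seg in segments)
    return urls, total_size, total_length


# Two segments from the response above, deliberately out of order;
# the URLs are shortened placeholders, not real addresses.
sample = {
    "data": {
        "durl": [
            {"order": 2, "length": 330944, "size": 23865726,
             "url": "http://example.invalid/52808345-2-32.flv"},
            {"order": 1, "length": 363246, "size": 24653145,
             "url": "http://example.invalid/52808345-1-32.flv"},
        ]
    }
}
urls, total_size, total_length = summarize_segments(sample)
print(urls[0], total_size, total_length)
```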

  1.2 Video download

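   Based on the analysis above, the scraping steps are:

      1. Build the video page URL from the video ID.

      2. Request the video page URL and run a regular expression over the returned page to get every segment's download address and sequence number.

      3. Download each segment from its address and save it locally (add Referer and Origin to the request headers, otherwise the server returns HTTP 458).

   The code is as follows: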

# coding: utf-8
import json
import os
import re
import time

import requests


# Takes the URL of the video page
def down_video(video_url, path="temp_videos"):
    """
    video_url: URL of the video page to download
    path: directory where the downloaded segments are saved
    """
    # video_url = "https://www.bilibili.com/video/av30292394?p=3"
    # video_url = "https://www.bilibili.com/video/av28879057"

    # The User-Agent below is truncated ("…") in the original post;
    # substitute a complete User-Agent string before running.
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; r…) Gecko/20100101 Firefox/66.0",
    }
    response = requests.get(video_url, headers=headers)

    # Match the playback info embedded in the page source.
    # re.S lets "." match newlines, so the text is matched as a whole
    # instead of line by line.
    match_text = re.search(r'<script>window\.__playinfo__=(\{.*?\})</script>', response.text, re.S)
    if match_text is None:
        raise ValueError("window.__playinfo__ not found in the page source")

    json_data = json.loads(match_text.group(1))
    urls = json_data["data"]["durl"]  # the video has several segments; one entry per segment
    content_size = sum(item["size"] for item in urls)  # total size of the video
    print("Total video size: %0.2f MB" % (content_size / (1024 * 1024)))

    os.makedirs(path, exist_ok=True)

    headers.update({
        "Origin": "https://www.bilibili.com",
        "Referer": video_url,  # the Referer header is required
    })
    size = 0
    start = time.time()
    for i, item in enumerate(urls):
        url = item["url"]
        try:
            result = requests.get(url, headers=headers, stream=True, verify=False)
            print(result.status_code)
            video_path = os.path.join(path, "{}.mp4".format(i))
            with open(video_path, "wb") as f:
                for chunk in result.iter_content(1024):
                    f.write(chunk)
                    f.flush()  # flush the write buffer
                    size += len(chunk)
            # print("Downloaded: %0.2f MB" % (size / (1024 * 1024)))
        except Exception as e:
            print("Failed to download url: %s" % url)
            print(e)
    stop = time.time()
    print("Download finished in %0.2f s" % (stop - start))
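
The download loop above only uses each segment's url field, while the JSON shown earlier also carries a backup_url list. A hedged sketch of two helpers that would make it easy to try the mirrors when the primary address fails; the names download_headers and candidate_urls are hypothetical, and whether the backup servers accept the same headers is an assumption I have not verified.

```python
def download_headers(video_url: str, user_agent: str) -> dict:
    """Headers for the segment requests: the page URL doubles as the Referer."""
    return {
        "User-Agent": user_agent,
        "Origin": "https://www.bilibili.com",
        "Referer": video_url,
    }


def candidate_urls(segment: dict) -> list:
    """Primary URL first, then any backup URLs, with duplicates removed."""
    ordered = []
    for url in [segment["url"]] + list(segment.get("backup_url") or []):
        if url not in ordered:
            ordered.append(url)
    return ordered
```

A caller could then loop over candidate_urls(item) and stop at the first request that succeeds, instead of giving up on the segment after one failure.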

  1.3 Concatenating the segments

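    The downloaded segments can be played directly, but playing them one by one is tedious, so they can be concatenated with ffmpeg.

    First download ffmpeg (https://ffmpeg.zeranoe.com/builds/), unpack it into a folder of your choice, and add the ffmpeg.exe in the bin directory to the PATH environment variable. Run ffmpeg -version on the command line; if it prints version information, the installation succeeded.

    The ffmpeg command for concatenating videos is: ffmpeg -f concat -safe 0 -i path.txt -c copy output.mp4

    Here path.txt lists the paths of the videos to concatenate, in the following format (the first line refers to v_1.mp4 under the video directory):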

file 'video/v_1.mp4'
file 'video/v_2.mp4'
file 'video/v_3.mp4'

    output.mp4 is where the concatenated video is written; it can also be given as video/output.mp4 to save it in the video folder.
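
One detail worth illustrating is the order of the lines in path.txt, since ffmpeg joins the files in the order listed and a plain alphabetical sort would put v_10.mp4 before v_2.mp4. Below is a small sketch that builds the list text in numeric order and the command as an argument list. The helper names are my own, and writing forward slashes in the list file is an assumption that should hold on both Windows and Linux.

```python
import os
import re


def natural_key(name: str):
    """Sort key that compares digit runs as numbers, so 2 sorts before 10."""
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", name)]


def build_concat_list(folder: str, names: list) -> str:
    """Return the text of path.txt for the given file names in `folder`."""
    lines = []
    for name in sorted(names, key=natural_key):
        path = os.path.join(folder, name).replace("\\", "/")
        lines.append("file '{}'".format(path))
    return "\n".join(lines) + "\n"


def ffmpeg_concat_command(list_file: str, output: str) -> list:
    """The concat command from the text, as a list suitable for subprocess."""
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_file, "-c", "copy", output]


print(build_concat_list("video", ["v_10.mp4", "v_2.mp4", "v_1.mp4"]), end="")
```

Passing the command as a list avoids quoting problems when the output name contains spaces.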

    The final concatenation code is as follows:

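import os
import shutil
import subprocess


# Concatenate the downloaded segments into one complete video
def concatenate(path, title, output="videos"):
    """
    path: directory containing the segments to concatenate
    title: name of the concatenated video
    output: directory where the concatenated video is saved
    """
    segments = []
    for root, dirs, files in os.walk(path):
        for file in files:
            if os.path.splitext(file)[1] in [".flv", ".mkv", ".mp4"]:
                segments.append(os.path.join(root, file))
    # Shorter names first, then alphabetical, so 9.mp4 comes before 10.mp4
    segments.sort(key=lambda p: (len(os.path.basename(p)), os.path.basename(p)))
    if not segments:
        print("No video segments found in %s" % path)
        return

    with open("path.txt", "w") as f:
        for v_path in segments:
            f.write("file '{}'\n".format(v_path))

    os.makedirs(output, exist_ok=True)
    try:
        print("Starting to merge the segments")
        path_name = os.path.join(output, title + ".mp4")
        # If D:\ffmpeg-win32-static\bin is on PATH, plain "ffmpeg" works here too
        ffmpeg = r"D:\ffmpeg-win32-static\bin\ffmpeg"
        # Passing a list avoids quoting problems when the title contains spaces
        ffmpeg_command = [ffmpeg, "-f", "concat", "-safe", "0",
                          "-i", "path.txt", "-c", "copy", path_name]
        # print(ffmpeg_command)
        if subprocess.call(ffmpeg_command) == 0:
            # rmdir and del are shell built-ins on Windows, so clean up from Python instead
            shutil.rmtree(path)    # delete the segment directory
            os.remove("path.txt")  # delete the list file
    except Exception as e:
        print(e)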

  The complete code is as follows:

# coding: utf-8
import json
import os
import re
import shutil
import subprocess
import time

import requests


# Takes the URL of the video page
def down_video(video_url, path="temp_videos"):
    """
    video_url: URL of the video page to download
    path: directory where the downloaded segments are saved
    """
    # The User-Agent below is truncated ("…") in the original post;
    # substitute a complete User-Agent string before running.
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; r…) Gecko/20100101 Firefox/66.0",
    }
    response = requests.get(video_url, headers=headers)

    # Match the playback info embedded in the page source.
    # re.S lets "." match newlines, so the text is matched as a whole.
    match_text = re.search(r'<script>window\.__playinfo__=(\{.*?\})</script>', response.text, re.S)
    if match_text is None:
        raise ValueError("window.__playinfo__ not found in the page source")

    json_data = json.loads(match_text.group(1))
    urls = json_data["data"]["durl"]  # the video has several segments; one entry per segment
    content_size = sum(item["size"] for item in urls)  # total size of the video
    print("Total video size: %0.2f MB" % (content_size / (1024 * 1024)))

    os.makedirs(path, exist_ok=True)

    headers.update({
        "Origin": "https://www.bilibili.com",
        "Referer": video_url,  # the Referer header is required
    })
    size = 0
    start = time.time()
    for i, item in enumerate(urls):
        url = item["url"]
        try:
            result = requests.get(url, headers=headers, stream=True, verify=False)
            print(result.status_code)
            video_path = os.path.join(path, "{}.mp4".format(i))
            with open(video_path, "wb") as f:
                for chunk in result.iter_content(1024):
                    f.write(chunk)
                    f.flush()  # flush the write buffer
                    size += len(chunk)
            # print("Downloaded: %0.2f MB" % (size / (1024 * 1024)))
        except Exception as e:
            print("Failed to download url: %s" % url)
            print(e)
    stop = time.time()
    print("Download finished in %0.2f s" % (stop - start))


# Concatenate the downloaded segments into one complete video
def concatenate(path, title, output="videos"):
    """
    path: directory containing the segments to concatenate
    title: name of the concatenated video
    output: directory where the concatenated video is saved
    """
    segments = []
    for root, dirs, files in os.walk(path):
        for file in files:
            if os.path.splitext(file)[1] in [".flv", ".mkv", ".mp4"]:
                segments.append(os.path.join(root, file))
    # Shorter names first, then alphabetical, so 9.mp4 comes before 10.mp4
    segments.sort(key=lambda p: (len(os.path.basename(p)), os.path.basename(p)))
    if not segments:
        print("No video segments found in %s" % path)
        return

    with open("path.txt", "w") as f:
        for v_path in segments:
            f.write("file '{}'\n".format(v_path))

    os.makedirs(output, exist_ok=True)
    try:
        print("Starting to merge the segments")
        path_name = os.path.join(output, title + ".mp4")
        # If D:\ffmpeg-win32-static\bin is on PATH, plain "ffmpeg" works here too
        ffmpeg = r"D:\ffmpeg-win32-static\bin\ffmpeg"
        ffmpeg_command = [ffmpeg, "-f", "concat", "-safe", "0",
                          "-i", "path.txt", "-c", "copy", path_name]
        if subprocess.call(ffmpeg_command) == 0:
            shutil.rmtree(path)    # delete the segment directory
            os.remove("path.txt")  # delete the list file
    except Exception as e:
        print(e)


if __name__ == "__main__":
    # down_video("https://www.bilibili.com/video/av28879057")
    # concatenate("temp_videos", title="python")
    down_video("https://www.bilibili.com/video/av30292394?p=3")
    concatenate("temp_videos", title="python机器学习与量化分析")
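
The order of the lines in path.txt decides the playback order of the merged file. os.walk does not promise any particular order, and a plain alphabetical sort would put 10.mp4 before 2.mp4. Assuming the segments are saved as 1.mp4, 2.mp4, ... (the "{}.mp4".format(i) naming used in down_video), a minimal Python 3 sketch for building the list in numeric order could look like this. The helper name build_concat_lines is hypothetical and not part of the script above.

```python
import os


def build_concat_lines(filenames, directory="temp_videos"):
    """Return ffmpeg concat-list lines for numerically named segments.

    Hypothetical helper: assumes every segment is named <number>.<ext>,
    e.g. 1.mp4, 2.mp4, ..., 10.mp4. Other files are ignored.
    """
    video_exts = {".flv", ".mkv", ".mp4"}
    segments = [
        name for name in filenames
        if os.path.splitext(name)[1] in video_exts
        and os.path.splitext(name)[0].isdigit()
    ]
    # Sort by the integer value of the stem, so 2.mp4 comes before 10.mp4.
    segments.sort(key=lambda name: int(os.path.splitext(name)[0]))
    return ["file '{}'".format(os.path.join(directory, name)) for name in segments]


lines = build_concat_lines(["10.mp4", "2.mp4", "1.mp4", "notes.txt"])
print("\n".join(lines))
```

Writing these lines to path.txt instead of the raw os.walk output keeps the rest of concatenate() unchanged.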

 References:

https://amberwest.github.io/2018/09/11/%E7%94%A8python%E4%B8%8B%E8%BD%BD%E5%93%94%E5%93%A9%E5%93%94%E5%93%A9%E8%A7%86%E9%A2%91/

https://github.com/Henryhaohao/Bilibili_video_download

 

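2. Scraping NetEase Cloud Music

  2.1 Page analysis

    On the web version of NetEase Cloud Music, every song also has its own ID. The song page URL has the form https://music.163.com/song?id=1353372483 (when the page is requested, NetEase automatically inserts a "#", so the address becomes https://music.163.com/#/song?id=1352541009).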

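Because of the inserted "#", the id ends up in the URL fragment rather than in the query string, so a naive query-string parse finds nothing. A small Python 3 sketch that handles both forms is shown here. The function name extract_song_id is made up for illustration.

```python
from urllib.parse import urlparse, parse_qs


def extract_song_id(url):
    """Return the value of the id parameter from a song URL, or None."""
    parts = urlparse(url)
    query = parts.query
    if not query and "?" in parts.fragment:
        # For https://music.163.com/#/song?id=..., everything after "#"
        # is the fragment, so the query string has to be taken from there.
        query = parts.fragment.split("?", 1)[1]
    ids = parse_qs(query).get("id")
    return ids[0] if ids else None


print(extract_song_id("https://music.163.com/song?id=1353372483"))
print(extract_song_id("https://music.163.com/#/song?id=1352541009"))
```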

 

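    Next, refresh the page and inspect the network requests, again sorted by size. One fairly large mp3 transfer stands out: its URL is the download URL of the song, and requesting it directly downloads the audio. What remains is finding out how to obtain this download URL for any given song.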


 

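   Looking through the responses of the other XHR requests turns up one that contains the song's details, including the URL we need. The request is a POST that submits form data with two main parameters, 'params' and 'encSecKey'. Both values are encrypted, so the encryption method has to be worked out.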


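    To summarize, downloading a song takes three steps:

      1. Send a GET request to https://music.163.com/song?id=1353372483 to obtain basic information such as the song title and lyrics.

      2. Send a POST request with the two parameters 'params' and 'encSecKey' to https://music.163.com/weapi/song/enhance/player/url/v1?csrf_token=. The returned JSON contains the song's download URL, its size and other details.

      3. Request the download URL (for example http://m10.music.126.net/20190407154531/74f897c9d014dede19a0905644433907/ymusic/035c/5458/530f/46ebf59083c2f04cc090de3b1e0beaf0.mp3) and write the response to a local file, which completes the download.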

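    What remains is working out how to construct the two encrypted parameters, 'params' and 'encSecKey'. Open the Sources panel in the browser's developer tools and search each js file for encSecKey (or press Ctrl+Shift+F for a global search). One of the js files contains the relevant code, which sets exactly the two parameters we need.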


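    In that code, the call var bYl2x = window.asrsea() does the actual work. Searching for this function leads to the statement window.asrsea = d, so asrsea is simply function d, and d in turn calls function a once, function b twice and function c once.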

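The call pattern of d (a once, b twice, c once) can be sketched independently of any crypto library. In the Python 3 sketch below the three functions are passed in as callables, and the argument order follows the Python encrypt_data function in section 2.2. All names here (build_form, rand, aes, rsa, nonce) are illustrative and do not come from the NetEase source.

```python
def build_form(payload, pub_exponent, modulus, nonce, rand, aes, rsa):
    """Mirror the call pattern of d: a once, b twice, c once.

    rand, aes and rsa stand for the JS functions a, b and c.
    """
    secret = rand(16)                  # a: random 16-character key
    first_pass = aes(payload, nonce)   # b: encrypt with the fixed key
    params = aes(first_pass, secret)   # b: encrypt again with the random key
    enc_sec_key = rsa(secret, pub_exponent, modulus)  # c: encrypt the random key
    return {"params": params, "encSecKey": enc_sec_key}


# Stub functions that only record the call order, for demonstration.
calls = []
form = build_form(
    "payload", "e", "n", "nonce",
    rand=lambda n: calls.append("a") or "k" * n,
    aes=lambda text, key: calls.append("b") or "aes(%s,%s)" % (text, key),
    rsa=lambda text, e, n: calls.append("c") or "rsa(%s)" % text,
)
print(calls)
print(form["params"])
```

The random key is what ties the two outputs together: 'params' is encrypted with it, and 'encSecKey' carries it in RSA-encrypted form.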

       Function a generates a random string. Here the call a(16) produces a random string of 16 characters. The JS code and a corresponding Python implementation:

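// function a (JavaScript): build a random string of the given length
function a(a) {
    var d, e, b = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789", c = "";
    for (d = 0; a > d; d += 1)
        e = Math.random() * b.length,
        e = Math.floor(e),
        c += b.charAt(e);
    return c
}

# Python counterpart: generate a random string
import binascii
import os

def random_str(size):
    # binascii.hexlify() takes a byte string and returns its hex digits (two per byte),
    # so only the first `size` characters are kept; the result uses hex digits only,
    # a narrower alphabet than the JS version
    return binascii.hexlify(os.urandom(size))[:size]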
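
The Python version above draws only from hex digits. For comparison, a closer port of the JS function, picking from the same 62-character alphabet, might look like the following Python 3 sketch. The names ALPHABET and random_string are illustrative.

```python
import random
import string

# Same characters, in the same order, as the string b in the JS function.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits


def random_string(length, rng=random):
    """Pick `length` characters uniformly from the 62-character alphabet."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


print(random_string(16))
```

Passing rng=random.SystemRandom() switches to the operating system's random source without changing the function.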

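   Function b encrypts the data with AES, a symmetric cipher. The JS code and a corresponding Python implementation are shown below.

           The Python version needs the Crypto module. Installing it with pip install crypto causes problems; the following command works instead (installed successfully on Windows 7 with Python 2.7; on Python 3, the maintained pycryptodome package provides the same Crypto namespace):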

                           python -m pip install pycrypto

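// function b (JavaScript): AES encryption in CBC mode
function b(a, b) {
    var c = CryptoJS.enc.Utf8.parse(b)
      , d = CryptoJS.enc.Utf8.parse("0102030405060708")
      , e = CryptoJS.enc.Utf8.parse(a)
      , f = CryptoJS.AES.encrypt(e, c, {
        iv: d,
        mode: CryptoJS.mode.CBC
    });
    return f.toString()
}

# Python implementation of function b
from Crypto.Cipher import AES
import base64

def to_bytes(s):
    # accept text or bytes so the code runs on both Python 2 and Python 3
    return s if isinstance(s, bytes) else s.encode("utf-8")

def get_params(text,key):  # AES symmetric encryption (CBC mode)
    iv = b'0102030405060708'
    data = to_bytes(text)
    pad = 16 - len(data)%16                      # number of padding bytes needed
    data = data + bytes(bytearray([pad])) * pad  # PKCS#7 padding, the CryptoJS default
    encryptor = AES.new(to_bytes(key), AES.MODE_CBC, iv)
    result = encryptor.encrypt(data)
    result_str = base64.b64encode(result).decode('utf-8')
    return result_str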
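
The line pad = 16 - len(text)%16 in get_params is PKCS#7 padding, which is also what CryptoJS applies by default. The padding step can be looked at in isolation with nothing but the standard library. The following Python 3 sketch uses illustrative helper names (pkcs7_pad, pkcs7_unpad) and leaves out the validation a production unpad would need.

```python
BLOCK_SIZE = 16  # AES block size in bytes


def pkcs7_pad(data, block_size=BLOCK_SIZE):
    """Append N bytes of value N so len(result) is a multiple of block_size."""
    pad = block_size - len(data) % block_size
    return data + bytes([pad]) * pad


def pkcs7_unpad(data):
    """Strip the padding added by pkcs7_pad (no validation, sketch only)."""
    return data[:-data[-1]]


padded = pkcs7_pad(b"hello")
print(len(padded), padded[-1])
```

Input that is already a multiple of 16 bytes still gets a full extra block of padding, which is what makes the padding unambiguous to remove.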

   Function c encrypts the data with RSA, an asymmetric cipher. The JS code and a corresponding Python implementation:

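// function c (JavaScript): RSA encryption
function c(a, b, c) {
    var d, e;
    return setMaxDigits(131),
    d = new RSAKeyPair(b,"",c),
    e = encryptedString(d, a)
}

# Python implementation of function c
import binascii

def get_encSecKey(text,pubkey,modulus):  # RSA asymmetric encryption
    text = text[::-1]  # reverse the text before converting it to an integer
    # hex-encode the text, read it as one big integer, then compute text^pubkey mod modulus
    rs = pow(int(binascii.hexlify(text),16),int(pubkey,16),int(modulus,16))
    return format(rs,'x').zfill(256)  # hex string, left-padded with zeros to 256 digits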
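
What get_encSecKey computes is plain modular exponentiation with no padding scheme. To see that the arithmetic really is RSA, the same steps can be run against a toy key and then reversed with the matching private exponent. The sketch below is Python 3; the function name rsa_encrypt_hex and the tiny key (n = 61 * 53 = 3233, e = 17, d = 2753) are for illustration only and have nothing to do with the NetEase key.

```python
import binascii


def rsa_encrypt_hex(text, exponent_hex, modulus_hex, width):
    """Reversed text -> integer -> m**e mod n -> zero-padded hex string."""
    message = int(binascii.hexlify(text[::-1]), 16)
    cipher = pow(message, int(exponent_hex, 16), int(modulus_hex, 16))
    return format(cipher, "x").zfill(width)


# Toy key for illustration only: n = 3233 (0xca1), e = 17 (0x11), d = 2753.
N_HEX, E_HEX, D = "ca1", "11", 2753

cipher_hex = rsa_encrypt_hex(b"A", E_HEX, N_HEX, width=4)
# Decrypting with the private exponent recovers the message integer.
recovered = pow(int(cipher_hex, 16), D, int(N_HEX, 16))
print(cipher_hex, recovered)
```

With the real parameters only the public exponent and modulus are known, so the script can encrypt the random key but never needs to decrypt it.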

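  Next, the four arguments passed to window.asrsea() need to be examined, which requires a breakpoint. Click a line number to set the breakpoint, then start playing a song. Once execution pauses at the breakpoint, use the two stepping buttons in the debugger panel (the first steps over a function call, the second executes a single statement). Selecting one of the four arguments with the mouse, as if to copy it, then shows its value.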


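   Selecting the second argument, for example, shows the value "010001", so the second argument is a constant. Checking the others shows that the second, third and fourth arguments are all constants, while the first is JSON data that depends on the song ID. Examples of the four arguments are listed below.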


Examples of the four parameters:

first_param = {"ids":"[1353194608]","level":"standard","encodeType":"aac","csrf_token":""}
second_param = "010001"
third_param = "00e0b509f6259df8642dbc35662901477df22677ec152b5ff68ace615bb7b725152b3ab17a876aea8a5aa76d2e417629ec4ee341f56135fccf695280104e0312ecbda92557c93870114af6c9d05c4f7f0c3685b7a46bee255932575cce10b424d813cfe4875d3e82047b97ddef52741d546b8e289dc6935b3ece0462db0a22b8e7"
fourth_param = "0CoJUm6Qyw8W8jud"

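    The whole process therefore needs only the song ID and the three constants above to construct the final encrypted data. All that is left is to write the code.

  2.2 Downloading the song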

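  Following the analysis above, the code works in these steps:

    1. Using the song ID, request https://music.163.com/song?id=1353372483 and match the page content with a regular expression to obtain the song title.

    2. Compute the encrypted parameters 'params' and 'encSecKey', then send a POST request to https://music.163.com/weapi/song/enhance/player/url/v1?csrf_token= to obtain the song's url and size.

    3. Request the song's download URL and write the result to a local file.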
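
Between steps 2 and 3 it is worth checking that the response actually carries a URL before trying to download it. The Python 3 sketch below assumes the response has the data[0].url and data[0].size shape that the complete script reads, and it further assumes that url may come back empty for some songs, which is a guess on my part and not something the analysis above established. The helper name pick_download_info and the sample bodies are made up.

```python
import json


def pick_download_info(json_data):
    """Return (url, size) for the first entry, or None if no URL is present."""
    entries = json_data.get("data") or []
    if not entries or not entries[0].get("url"):
        return None
    return entries[0]["url"], entries[0].get("size", 0)


# Made-up response bodies, shaped like the fields the script reads.
sample = json.loads('{"data": [{"url": "http://example.com/a.mp3", "size": 1048576}]}')
print(pick_download_info(sample))
print(pick_download_info({"data": [{"url": None, "size": 0}]}))
```

Returning None lets the caller print a clear message and skip the song without failing inside the download step.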

  The complete code:

#coding:utf-8
import os
import binascii
from Crypto.Cipher import AES
import base64
import json
import requests
import re

first_param = {"ids":"[1353194608]","level":"standard","encodeType":"aac","csrf_token":""}
second_param = "010001"
third_param = "00e0b509f6259df8642dbc35662901477df22677ec152b5ff68ace615bb7b725152b3ab17a876aea8a5aa76d2e417629ec4ee341f56135fccf695280104e0312ecbda92557c93870114af6c9d05c4f7f0c3685b7a46bee255932575cce10b424d813cfe4875d3e82047b97ddef52741d546b8e289dc6935b3ece0462db0a22b8e7"
fourth_param = "0CoJUm6Qyw8W8jud"
headers={
            "Referer":"https://music.163.com/",
            "User-Agent":"Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Mobile Safari/537.36"
            }
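def to_bytes(s):
    # accept text or bytes so the script runs on both Python 2 and Python 3
    return s if isinstance(s, bytes) else s.encode("utf-8")

def random_str(size):
    # binascii.hexlify() takes a byte string and returns its hex digits (two per byte)
    return binascii.hexlify(os.urandom(size))[:size]

def get_params(text,key):  # AES symmetric encryption (CBC mode)
    iv = b'0102030405060708'
    data = to_bytes(text)
    pad = 16 - len(data)%16                      # number of padding bytes needed
    data = data + bytes(bytearray([pad])) * pad  # PKCS#7 padding, the CryptoJS default
    encryptor = AES.new(to_bytes(key), AES.MODE_CBC, iv)
    result = encryptor.encrypt(data)
    result_str = base64.b64encode(result).decode('utf-8')
    return result_str

def get_encSecKey(text,pubkey,modulus):  # RSA asymmetric encryption
    text = text[::-1]  # reverse the text before converting it to an integer
    rs = pow(int(binascii.hexlify(text),16),int(pubkey,16),int(modulus,16))
    return format(rs,'x').zfill(256)  # hex string, left-padded with zeros to 256 digits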
    
def encrypt_data(first_param,second_param,third_param,fourth_param):
    data={}
    i = random_str(16)
    temp = get_params(json.dumps(first_param),fourth_param)
    params = get_params(temp,i)
    encSecKey = get_encSecKey(i,second_param,third_param)
    data['params']=params.encode("utf-8")
    data['encSecKey']=encSecKey
    return data
    
# Get the song title
def get_song_title(id):
    url = "https://music.163.com/song?id=%s"%(id)
    response = requests.get(url,headers=headers)
    title = re.search(r'<title>(.*?)\s-',response.text).group(1)    # match the song title
    #print(title)
    return title
    
# Get the song's download URL, size and other details
def get_song_info(id):
    first_param['ids'] = "[%s]"%id
    data = encrypt_data(first_param,second_param,third_param,fourth_param)
    url="https://music.163.com/weapi/song/enhance/player/url/v1?csrf_token="
    response = requests.post(url,headers=headers,data=data)
    #print(response.status_code)
    json_data = json.loads(response.text)
    return json_data
    
# Download the song
def down_song(id,down_url,song_title,size):
    filename = song_title+str(id)+".mp3"
    print("Song size: %0.2f MB"%(size/(1024.0*1024)))
    try:
        # stream=True avoids loading the whole file into memory before writing
        result = requests.get(down_url,headers=headers,stream=True)
        with open(filename,"wb") as f:
            for chunk in result.iter_content(1024):
                f.write(chunk)
                f.flush()
        print("Download finished")
    except Exception as e:
        print("Download failed, id: %s"%id)
        print(e)
    

if __name__=="__main__":
    
    id=input("Enter the song id, e.g. 1353194608: ")
    song_title = get_song_title(id)
    song_info=get_song_info(id)
    down_url = song_info["data"][0]["url"]
    size = song_info["data"][0]["size"]
    #print(down_url,size)
    down_song(id,down_url,song_title,size)
    
    
    
    
    
    
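
One practical detail in down_song: the file name is built directly from the page title, and a title containing characters such as "/" or "?" cannot be used as a file name on Windows. A small Python 3 sketch of a sanitizing step is shown below. It keeps the title + id + ".mp3" naming of the script; the helper name safe_filename and the sample title are made up.

```python
import re

# Characters that Windows does not allow in file names.
_FORBIDDEN = re.compile(r'[\\/:*?"<>|]')


def safe_filename(title, song_id, ext=".mp3"):
    """Build a file name from a song title, replacing forbidden characters."""
    cleaned = _FORBIDDEN.sub("_", title).strip()
    # Fall back to a generic stem if nothing usable is left of the title.
    return "{}{}{}".format(cleaned or "song", song_id, ext)


print(safe_filename("a/b:c?", 1))
```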

 References:

  https://blog.csdn.net/qq_38282706/article/details/80251666

  https://github.com/Jack-Cherish/python-spider/blob/master/Netease/Netease.py