Little Crawler 5: Key Review && Mobile Data Scraping 1

Time: 2024-02-21 10:49:37

1.

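(1) What is selenium?
    - A module for browser automation
(2) Why use selenium in a crawler, and how does it relate to crawling?
    - It makes it easy to get dynamically loaded data
    - It can simulate login
(3) Common selenium methods and what they do
    - get(url)                            # open the url in the browser
    - find-family functions for locating tags    # memorize the commonly used ones
    - send_keys('key')                    # type a value into the located input
    - click()                             # click
    - execute_script('jsCode')            # run JavaScript code
    - page_source                         # get the page source data
    - switch_to.frame('iframeID')         # tags inside an iframe require switching first
    - quit()                              # close the browser
    - save_screenshot(filename)           # save a screenshot of the page
    - a = ActionChains(bro)               # instantiate an action chain object
    - a.click_and_hold(tag)               # click and hold the given tag
    - a.move_by_offset(x,y).perform()     # move by the given offset and run the chain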
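
As a rough illustration of how several of the calls listed above fit together, here is a minimal sketch. The helper name `grab_rendered_page` and the scroll script are assumptions made for illustration, not part of selenium. The helper only relies on `get`, `execute_script`, `save_screenshot`, `page_source` and `quit`, so any object exposing those members can be passed in.

```python
def grab_rendered_page(bro, url, screenshot_path=None):
    """Open `url` in the browser object `bro` and return the rendered HTML.

    `bro` is expected to behave like a selenium WebDriver instance
    (hypothetical helper, not part of selenium itself).
    """
    try:
        bro.get(url)  # issue the request in the browser
        # Scroll to the bottom so lazily loaded content gets a chance to load.
        bro.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        if screenshot_path is not None:
            bro.save_screenshot(screenshot_path)
        return bro.page_source  # HTML after JavaScript has run
    finally:
        bro.quit()  # always close the browser, even on errors
```

With selenium installed and a matching browser driver available, this could be called as `grab_rendered_page(webdriver.Chrome(), 'https://www.baidu.com')` after `from selenium import webdriver`.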

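(4) What the loop does:
    Multiple task objects can be registered with the loop.
    The loop then runs in a continuous cycle and executes those task objects asynchronously.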

(5) How multi-task asynchronous coroutines achieve asynchrony
    - coroutine
    - task object
    - loop
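
The three pieces named above (coroutine, task object, loop) can be sketched with the standard library alone. This is a minimal sketch under assumptions: `fake_request` is a made-up coroutine that uses `asyncio.sleep` in place of real network I/O, and the 0.3 second delay is arbitrary.

```python
import asyncio
import time

async def fake_request(name, delay):
    # Coroutine: calling fake_request(...) only creates a coroutine object.
    await asyncio.sleep(delay)  # stands in for non-blocking network I/O
    return f"{name} done"

def run_two_tasks():
    loop = asyncio.new_event_loop()
    try:
        # Task objects wrap the coroutines and register them with the loop.
        tasks = [
            loop.create_task(fake_request("a", 0.3)),
            loop.create_task(fake_request("b", 0.3)),
        ]
        start = time.perf_counter()
        # The loop keeps cycling and resumes whichever task is ready.
        loop.run_until_complete(asyncio.wait(tasks))
        elapsed = time.perf_counter() - start
        return [t.result() for t in tasks], elapsed
    finally:
        loop.close()

results, elapsed = run_two_tasks()
print(results, round(elapsed, 2))
```

Because both tasks wait at the same time, the total elapsed time should be close to one delay rather than the sum of the two.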

 

2. Review: single-threaded multi-task asynchronous coroutines

# Author: studybrother sun
import asyncio
import aiohttp

# Inside this coroutine function, no non-async (blocking) module code may appear
async def request(url):
    async with aiohttp.ClientSession() as s:
        async with s.get(url=url) as response:
            page_text = await response.text()  # page source of the search home page
            return page_text

def callback(task):  # callback, runs when the task finishes
    print(task.result())

def callback1(task):
    print(task.result())

# Event loop object (created explicitly; get_event_loop() without a
# running loop is deprecated in recent Python versions):
loop = asyncio.new_event_loop()
c = request('https://www.baidu.com')
c1 = request('https://www.sogou.com')

# Wrap each coroutine in a task object bound to the loop
task = asyncio.ensure_future(c, loop=loop)
task.add_done_callback(callback)

task1 = asyncio.ensure_future(c1, loop=loop)
task1.add_done_callback(callback1)

tasks = [task, task1]
loop.run_until_complete(asyncio.wait(tasks))

Running it produces the following output:

<html>
<head>
    <script>
        location.replace(location.href.replace("https://","http://"));
    </script>
</head>
<body>
    <noscript><meta http-equiv="refresh" content="0;url=http://www.baidu.com/"></noscript>
</body>
</html>
<!DOCTYPE html>
<html lang="cn">
<head>
    <script>window._speedMark = new Date();
     window.lead_ip = '221.218.208.77';window.now = 1559307496193;</script>    <meta charset="utf-8">
<link rel="dns-prefetch" href="//img01.sogoucdn.com"><link rel="dns-prefetch" href="//img02.sogoucdn.com"><link rel="dns-prefetch" href="//img03.sogoucdn.com"><link rel="dns-prefetch" href="//img04.sogoucdn.com"><link rel="dns-prefetch" href="//dlweb.sogoucdn.com">
<title>搜狗搜索引擎 - 上网从搜狗开始</title>
<link rel="shortcut icon" href="/images/logo/new/favicon.ico?v=4" type="image/x-icon">
<meta http-equiv="X-UA-Compatible" content="IE=Edge">
<link rel="search" type="application/opensearchdescription+xml" href="/content-search.xml" title="搜狗搜索">
<meta name="keywords" content="搜狗搜索,网页搜索,微信搜索,视频搜索,图片搜索,音乐搜索,新闻搜索,软件搜索,问答搜索,百科搜索,购物搜索">
<meta name="description" content="搜狗搜索是全球第三代互动式搜索引擎,支持微信公众号和文章搜索、知乎搜索、英文搜索及翻译等,通过自主研发的人工智能算法为用户提供专业、精准、便捷的搜索服务。">    <link rel="stylesheet" type="text/css" href="/web/index/css/base.v.1.4.12.css">
<style>.wrapper .suggestion{border: 1px solid #e8e8e8; width:622px;-moz-box-shadow: 0px 1px 8px rgba(0,0,0,0.1);-webkit-box-shadow: 0px 1px 8px rgba(0,0,0,0.1);box-shadow: 0px 1px 8px rgba(0,0,0,0.1);border-top-left-radius: 0px;border-top-right-radius: 0px;border-bottom-right-radius: 2px;border-bottom-left-radius: 2px; top:43px;}  .wrapper .suglist{width: 206px;}  .wrapper .suglist .keyword {color: #7a77c8;}  .big-scn .suggestion {width: 654px;}  .big-scn .suglist{width:236px;}  .wrapper .suglist{ padding:4px 0}</style></head>
<body >
        <div class="bg-gj-w" id="settings-mask" style="display: none;"></div>
<div class="gjss" id="settings-advanced" style="display: none;top:-240px;">
    <div class="hf-box" id="settings-save-layer">
        <div class="hf-def">已保存设置</div>
    </div>
    <div class="gjss-tab">
        <a uigs-id="tab_set" href="javascript:void(0);" class="js-settings-tab tab-a cur">搜索设置</a>
        <a uigs-id="tab_adv" href="javascript:void(0);" class="js-settings-tab tab-a">高级搜索</a>
        <a href="javascript:void(0);" class="close-btn" id="settings-close"></a>
    </div>
    <div class="gjss-main">
        <div class="gjss-sz js-settings-content">
            <p class="gjss-err js-settings-mask" style="display: none;">搜索设置暂不可用,请启用浏览器的Cookie功能,然后刷新本页。</p>
            <div class="bg-wkq js-settings-mask" id="settings-tips" style="display: none;"></div>

            <dl class="js-as-select">
                <dt>搜索结果显示条数</dt>
                <dd>
                    <a href="javascript:void(0);" class="xz" id="settings-number" data-value="10">每页显示10条</a>
                    <ul id="settings-number-list">
                        <li><a uigs-id="set_10" href="javascript:void(0);" data-value="10">每页显示10条</a></li>
                        <li><a uigs-id="set-20" href="javascript:void(0);" data-value="20">每页显示20条</a></li>
                        <li><a uigs-id="set-50" href="javascript:void(0);" data-value="50">每页显示50条</a></li>
                        <li><a uigs-id="set-100" href="javascript:void(0);" data-value="100">每页显示100条</a></li>
                    </ul>
                </dd>
                <input type="hidden" name="pageNum" id="settings-show-number" value="10">
            </dl>
            <p class="enter" style="padding-top: 20px;">
                <a href="javascript:void(0);" id="settings-save" uigs-id="set-save" class="a1">保存</a>
                <a href="javascript:void(0);" id="settings-reset" uigs-id="set-reset" class="a2">恢复默认</a>
            </p>
        </div>
        <div class="gjss-sz js-settings-content" style="display: none;">
            <form action="/web" target="_blank" id="advanced-search-form">
                <input type="hidden" name="query" value="">
                <input name="fieldtitle" type="hidden" value=""/>
                <input name="fieldcontent" type="hidden" value=""/>
                <input name="fieldstripurl" type="hidden" value=""/>
                <input name="bstype" type="hidden" value=""/>
                <input name="ie" type="hidden" value="utf8"/>
                <dl>
                    <dt>搜索关键词</dt>
                    <dd class="js-as-radio">
                                                <div class="input-box js-input-box" id="advanced-query-box">
                            <input name="q" type="text" must="1" size="42" maxlength="100" autocomplete="off" placeholder="例如:搜狗真棒(多个关键词可用空格区分)">
                            <span class="err-word">* 请输入搜索关键词</span>
                        </div>
                        <a uigs-id="adv_split-query" href="javascript:void(0);" data-value="checkbox" class="dk-btn cur">拆分关键词</a>
                        <a uigs-id="adv_no-split-query" href="javascript:void(0);" data-value="" class="dk-btn">不拆分关键词</a>
                        <input type="hidden" name="include" value="checkbox">
                    </dd>
                </dl>
                <dl>
                    <dt>在指定站内搜索</dt>
                    <dd>
                        <div class="input-box js-input-box"><input name="sitequery" type="text" size="40" autocomplete="off" placeholder="例如:www.sogou.com"></div>
                    </dd>
                </dl>
                <dl class="js-as-select" style="padding-top:16px">
                    <dt>搜索词位于</dt>
                    <dd>
                        <a href="javascript:void(0);" class="xz">网页中任何地方</a>
                        <ul>
                            <li><a href="javascript:void(0);" data-value="0">网页中任何地方</a></li>
                            <li><a href="javascript:void(0);" data-value="1">仅在标题中</a></li>
                            <li><a href="javascript:void(0);" data-value="2">仅在正文中</a></li>
                            <li><a href="javascript:void(0);" data-value="3">仅在网址中</a></li>
                        </ul>
                    </dd>
                    <input type="hidden" name="located" value="0">
                </dl>
                <dl class="js-as-select" style="padding-top:16px">
                    <dt>需要搜索的文件格式</dt>
                    <dd >
                        <a href="javascript:void(0);" class="xz">全部网页</a>
                        <ul>
                            <li><a href="javascript:void(0);" data-value="">全部网页</a></li>
                            <li><a href="javascript:void(0);" data-value="doc">Microsoft Word (.doc)</a></li>
                            <li><a href="javascript:void(0);" data-value="xls">Microsoft Excel (.xls)</a></li>
                            <li><a href="javascript:void(0);" data-value="ppt">Microsoft Powerpoint (.ppt)</a></li>
                            <li><a href="javascript:void(0);" data-value="pdf">Adobe Acrobat PDF (.pdf)</a></li>
                            <li><a href="javascript:void(0);" data-value="rtf">RTF (.rtf)</a></li>
                            <li><a href="javascript:void(0);" data-value="all">全部文档</a></li>
                        </ul>
                    </dd>
                    <input type="hidden" name="filetype" value="">
                </dl>
                <dl>
                    <dt>搜索结果排序方式</dt>
                    <dd class="js-as-radio">
                        <a uigs-id="adv_relevance-ranking" href="javascript:void(0);" data-value="off" class="dk-btn cur">按相关性排序</a>
                        <a uigs-id="adv_time-sort" href="javascript:void(0);" data-value="on" class="dk-btn">按时间排序</a>
                        <input type="hidden" name="tro" value="off">
                    </dd>
                </dl>
                <p class="enter"><input id="adv-search-btn" uigs-id="adv_search-btn" type="submit" class="a1" value="开始搜索"></p>
            </form>
        </div>
    </div>
</div>
    <div class="wrapper" id="wrap">
        <div class="header">
            <div class="top-nav">
    <ul>
        <li><a onclick="st(this,'40030300','news')" href="http://news.sogou.com" uigs-id="nav_news" id="news">新闻</a></li>
        <li class="cur"><span>网页</span></li>
        <li><a onclick="st(this,'73141200','weixin')" href="http://weixin.sogou.com/" uigs-id="nav_weixin" id="weixinch">微信</a></li>
        <li><a onclick="st(this,'40051200','zhihu')" href="http://zhihu.sogou.com/" uigs-id="nav_zhihu" id="zhihu">知乎</a></li>
        <li><a onclick="st(this,'40030500','pic')" href="http://pic.sogou.com" uigs-id="nav_pic" id="pic">图片</a></li>
        <li><a onclick="st(this,'40030600','video')" href="https://v.sogou.com/" uigs-id="nav_v" id="video">视频</a></li>
        <li><a href="http://mingyi.sogou.com?fr=common_index_nav" uigs-id="nav_mingyi" id="mingyi" onclick="st(this,'','myingyi')">明医</a></li>
        <li><a href="http://english.sogou.com?fr=pcweb_index_nav" uigs-id="nav_overseas" id="overseas" onclick="st(this,'','overseas')" >英文</a></li>
        <li><a onclick="st(this,'web2ww','wenwen')" href="https://wenwen.sogou.com/?ch=websearch" uigs-id="nav_wenwen" id="index_more_wenwen">问问</a></li>
        <li><a href="http://scholar.sogou.com?fr=common_index_nav" uigs-id="nav_scholar" id="scholar" onclick="st(this,'','scholar')">学术</a></li>
        <li class="show-more">
            <a href="javascript:void(0);" id="more-product">更多<i class="m-arr"></i></a>
            <div class="pos-more" id="products-box" style="top: 40px;">
                <span class="ico-san"></span>

                <a onclick="st(this,'40031000')" href="http://map.sogou.com" uigs-id="nav_map" id="map">地图</a>
                <a onclick="st(this,'40031500')" href="http://gouwu.sogou.com/" uigs-id="nav_gouwu" id="index_more_gouwu">购物</a>
                <a onclick="st(this,'40051203')" href="http://baike.sogou.com/Home.v" uigs-id="nav_baike" id="index_more_baike">百科</a>
                <a onclick="st(this)" href="http://zhishi.sogou.com" uigs-id="nav_zhishi" id="index_more_zhishi">知识</a>
                <a onclick="st(this,'40051205')" href="http://as.sogou.com/" uigs-id="nav_app" id="index_more_appli">应用</a>
                <a onclick="st(this,'40051205','fanyi')" href="http://fanyi.sogou.com?fr=common_index_nav_pc" uigs-id="nav_fanyi" id="index_more_fanyi">翻译</a>
                <a href="http://index.sogou.com" uigs-id="nav_index" id="index_more_index">指数</a>
                                    <a href="http://dangjian.sogou.com" uigs-id="nav_dangjian" id="dangjian" onclick="st(this,'','dangjian')">党建</a>
                                <span class="all"><a onclick="st(this,'40051206')" href="http://www.sogou.com/docs/more.htm?v=1" uigs-id="nav_all" target="_blank">全部</a></span>
            </div>
        </li>
    </ul>
</div>            <div class="user-box">
    <div class="local-weather" id="local-weather">
        <div class="wea-box" id="cur-weather" style="display: none;"></div>
        <div class="pos-more" id="detail-weather" style="top:40px;"></div>
    </div>
    <span class="line" id="user-box-line" style="display: none;"></span>
    <div class="user-enter">
        <a href="javascript:void(0);" id="show-card" style="display: none" uigs-id="settings_show-card">显示卡片</a>
                    <a href="javascript:void(0);" uigs-id="settings_change-skin" id="changeSkinBtn" >换肤</a>
                <span class="s-dw">
            <a href="javascript:void(0);" id="settings">设置</a>
            <div class="pos-more" id="settings-box" style="top:40px;">
                <span class="ico-san"></span>
                <a href="javascript:void(0);" id="search-settings" uigs-id="settings_config">搜索设置</a>
                <a href="javascript:void(0);" id="advanced-search" uigs-id="settings_advanced">高级搜索</a>
                <a href="http://help.sogou.com/?w=01091500&v=1" uigs-id="settings_help">帮助</a>
            </div>
        </span>
                    <a href="javascript:void(0);" class="enter" id="loginBtn">登录</a>            </div>
</div>
        </div>
        <div class="content" id="content">
            <div class="pos-header" id="top-float-bar">
    <div class="part-one"></div>
    <div class="part-two" id="card-tab-layer">
        <div class="c-top" id="top-card-tab"></div>
    </div>
</div>
<div class="logo2" id="logo-s"><span></span></div>            <div class="logo" id="logo-l"><span></span></div>            <div class="search-box" id="search-box">
    <form action="/web" name="sf" id="sf">
        <span class="sec-input-box">
            <input type="text" class="sec-input active" name="query" id="query" maxlength="100" len="80" autocomplete="off" />
        </span>
        <span class="enter-input"><input type="submit" value="" id="stb"></span>
        <input type="hidden" name="_asf" value="www.sogou.com" />
        <input type="hidden" name="_ast" />
        <input type="hidden" name="w" value="01019900" />
        <input type="hidden" name="p" value="40040100" />
        <input type="hidden" name="ie" value="utf8" />
                <input type="hidden" name="from" value="index-nologin" />
                <input type="hidden" name="s_from" value="index" />
        <div class="keywords-tips" id="keywordsTips" style="display:none">
            <i></i><p>搜狗的查询限制在"<strong>40个汉字</strong>"以内。</p>
        </div>
    </form>
</div>
        </div>
            <div class="card-box" id="card-box" style="display: none;">
    <div class="card-box2" id="card-box2">
        <div class="c-top" id="card-tab-box">
            <a href="javascript:void(0);" id="card-settings" uigs-id="settings_settings-btn" class="shezhi"></a>
            <div class="pos-more" id="card-options">
                <span class="ico-san"></span>
                <a href="javascript:void(0);" uigs-id="settings_close-card" id="close-card">关闭卡片</a>
            </div>
        </div>
        <div class="c-main" id="card-content"></div>
    </div>
</div>
<div class="loog-more" id="scroll-more" style="display: none;">
    <a href="javascript:void(0);" uigs-id="scroll-more">滚动查看更多<br><span class="ico_san"></span></a>
</div>            <div class="ft" id="footer" style="display: none;">
    <a href="http://fuwu.sogou.com/" target="_blank" uigs-id="footer_tuiguang">企业推广</a><span class="line"></span><a href="http://corp.sogou.com/" target="_blank" uigs-id="footer_about">关于搜狗</a><span class="line"></span><a href="http://ir.sogou.com/" target="_blank" uigs-id="footer_aboutEnglish">About Sogou</a><span class="line"></span><a href="http://www.sogou.com/docs/terms.htm?v=1" target="_blank" uigs-id="footer_disclaimer">免责声明</a><span class="line"></span><a href="http://fankui.help.sogou.com/index.php/web/web/index/type/4" target="_blank"  uigs-id="footer_feedback">意见反馈及投诉</a><span class="line"></span><a href="http://corp.sogou.com/private.html" target="_blank" uigs-id="footer_private">隐私政策</a><br>
    &copy;&nbsp;2004-2019&nbsp;Sogou.com&nbsp;/&nbsp;<span class="g">京网文 (2016) 6432-852号</span>&nbsp;/&nbsp;<a href="http://www.miibeian.gov.cn" target="_blank" class="g">京ICP证050897号</a><br>
    <span class="g">(京)-经营性-2016-0019</span>&nbsp;/&nbsp;<a href="http://www.miibeian.gov.cn/" target="_blank" class="g">京ICP备11001839号-1</a>&nbsp;/&nbsp;<a href="http://www.beian.gov.cn/portal/registerSystemInfo?recordcode=11000002000025" class="ba" target="_blank">京公网安备11000002000025号</a>
</div>
<div class="ft-v1" id="QRcode-footer" style="padding-bottom:53px; ">
    <div class="erwm-box">
        <span class="ewm"></span>
        <div class="erwx">
            <p>搜狗搜索APP</p>
            <p class="p2">搜你所想</p>
        </div>
    </div>
    <div class="ft-info">
        <a uigs-id="mid_pinyin" href="http://pinyin.sogou.com/" target="_blank"><i class="i1"></i>搜狗输入法</a><span class="line"></span><a uigs-id="mid_liulanqi" href="http://ie.sogou.com/" target="_blank"><i class="i2"></i>浏览器</a><span class="line"></span><a uigs-id="mid_daohang" href="http://123.sogou.com/" target="_blank"><i class="i3"></i>网址导航</a><br> <a href="http://corp.sogou.com/" target="_blank" class="g">关于搜狗</a>&nbsp;-&nbsp;<a href="http://ir.sogou.com/" target="_blank" class="g">About Sogou</a>&nbsp;-&nbsp;<a href="http://fuwu.sogou.com/" target="_blank" class="g">企业推广</a>&nbsp;-&nbsp;<a href="http://www.sogou.com/docs/terms.htm?v=1" target="_blank" class="g">免责声明</a>&nbsp;-&nbsp;<a href="http://fankui.help.sogou.com/index.php/web/web/index/type/4" target="_blank" class="g">意见反馈及投诉</a>&nbsp;-&nbsp;<a href="http://corp.sogou.com/private.html" target="_blank" class="g" uigs-id="footer_private">隐私政策</a><br>
        &copy;&nbsp;2004-2019&nbsp;Sogou.com&nbsp;/&nbsp;<span class="g">京网文 (2016) 6432-852号</span>&nbsp;/&nbsp;<span class="g">(京)-经营性-2016-0019</span><br>
        <a href="http://www.miibeian.gov.cn" target="_blank" class="g">京ICP证050897号</a>&nbsp;/&nbsp;<a href="http://www.miibeian.gov.cn/" target="_blank" class="g">京ICP备11001839号-1</a>&nbsp;/&nbsp;<a href="http://www.beian.gov.cn/portal/registerSystemInfo?recordcode=11000002000025" class="ba" target="_blank">京公网安备11000002000025号</a>
    </div>
</div>            <div class="kuozhan" id="QRcode-box" style="display: none;">
    <a href="javascript:void(0);" id="miniQRcode"></a>
    <span id="QRcode"></span>
</div>
<a href="javascript:void(0);" class="back-top" id="back-top"></a>    </div>
        <script>
    var SugPara, uigs_para,
        msBrowserName = navigator.userAgent.toLowerCase(),
        msIsSe = false,
        msIsMSearch = false,
        hasDoodle = false,
        queryinput = document.getElementById('query');

    uigs_para={
        "uigs_productid": "webapp",
        "type": "webindex_new",
        "stype": "nologin",
        "scrnwi": screen.width,
        "scrnhi": screen.height,
        "uigs_pbtag": "A",
        "uigs_cookie": "SUID,sct",
                "protocol": location.protocol.toLowerCase() == "https:" ? "https" : "http"
    };

    SugPara = {"enableSug":true,"sugType":"web","domain":"w.sugg.sogou.com","productId":"web","sugFormName":"sf","inputid":"query","submitId":"stb","suggestRid":"01015002","normalRid":"01019900","useParent":0 ,"sugglocation":"index","showVr":true,"showHotwords":true,"suggAbtestObject":{"suggestHistoryStrategy1":"","suggestHistoryStrategy2":"0|1|2|3|4|5|6|7|8","suggHistoryAbtest":""}};

        
    function mk_con() {
        try {
            window.external.metasearch('make_connection', 'www.google.com.hk');
        } catch (e) {}
    }

    if (/se 2\.x/i.test(msBrowserName)) {
        msIsSe = true;
    }

    if (/metasr/i.test(msBrowserName)) {
        msIsMSearch = true;
    }

    if (queryinput) {
        if (msIsSe && msIsMSearch) {
            if (queryinput.addEventListener) {
                queryinput.addEventListener('keypress', mk_con, false);
                queryinput.addEventListener('keydown', mk_con, false)
            } else if (queryinput.attachEvent) {
                queryinput.attachEvent('onkeypress', mk_con);
                queryinput.attachEvent('onkeydown', mk_con);
            } else {
                queryinput.onkeypress = mk_con;
                queryinput.onkeydown = mk_con;
            }
        }
    }
    function getDomain(){
        var domainName = document.domain;
        if(domainName.indexOf("sogou.com")==(domainName.length-9)){
            return ".sogou.com";
        }else if(domainName.indexOf("soso.com")==(domainName.length-8)){
            return ".soso.com";
        }else if(domainName.indexOf("sogo.com") != -1){
            return ".sogo.com"
        }
    }
    window.m_s_index = function() {
        var w = document.sf.query,
                c = Math.round((new Date().getTime() + Math.random()) * 1000);

        w.focus();

        if(new RegExp("kw=([^&]+)").test(location.search)) {
            if(w.value.length == 0) {
                w.value = decodeURIComponent(RegExp.$1);
            }
        }

        if (document.cookie.indexOf("SUV=") < 0) {
            document.cookie = "SUV=" + c + ";path=/;expires=Sun, 29 July 2026 00:00:00 UTC;domain="+getDomain();
        }

                            (new Image).src = '//pb6.sogou.com/v6';
        
    };

    function st(self, p, product, anchor) {
        var searchBox = document.sf.query,
            query = encodeURIComponent(searchBox.value),

            productUrl = {
                "news": 'http://news.sogou.com/news?ie=utf8&query=',
                "web": 'web?ie=utf8&query=',
                "weixin": 'http://weixin.sogou.com/weixin?type=2&ie=utf8&query=',
                "zhihu": 'http://zhihu.sogou.com/zhihu?ie=utf8&query=',
                "pic": 'http://pic.sogou.com/pics?ie=utf8&query=',
                "video": 'https://v.sogou.com/v?ie=utf8&query=',
                "myingyi": 'https://www.sogou.com/web?m2web=mingyi.sogou.com&ie=utf8&query=',
                "overseas": 'http://english.sogou.com?b_o_e=1&ie=utf8&fr=pcweb_index_nav&query=',
                "scholar": 'http://scholar.sogou.com?ie=utf8&fr=common_index_nav&query=',
                "fanyi": 'http://fanyi.sogou.com/?fr=common_index_nav_pc&ie=utf8&keyword=',
                "wenwen":'http://wenwen.sogou.com/s/?ch=websearch&w=',
                "dangjian":'http://dangjian.sogou.com/dangjian?query='
            },
            newHref = productUrl[product] || self.href;

        function getConnectSymbol(url) {
            return url.indexOf("?") > -1 ? '&' : '?';
        }

        if(searchBox && searchBox.value !== ''){

            if(productUrl[product]) {
                newHref = productUrl[product] + query;
            } else if(newHref.indexOf("kw=") > 0) {
                newHref = newHref.replace(new RegExp("kw=[^&$]*"), "kw=" + query)
            } else {
                newHref += getConnectSymbol(newHref) + 'kw=' + query;
            }
        }

        if(p){
            newHref += getConnectSymbol(newHref) + "p=" + p;
        }

        if (anchor && anchor.length > 0){
            newHref += "#" + anchor;
        }

        if (searchBox && searchBox.value == '' && (product == 'wenwen' || product == 'dangjian')){//问问首页链接单独处理
            newHref = self.href;
        }

        self.href = newHref;
    }

    window.cid = function(o, p) {
        var w = document.sf.query,
            q = encodeURIComponent(w.value);

        if (!q) {
            o.href += "?cid=" + p
        } else {
            if (p === "web2ww") {
                o.href += "s/?cid=web2ww&w=" + q
            } else if (p === "web2bk") {
                o.href += "Search.e?sp=S" + q + "&cid=web2bk"
            }
        }
    };

    window.m_s_index();
</script>
<script src="//dlweb.sogoucdn.com/common/lib/jquery/jquery-1.11.0.min.js"></script>
<script charset="gbk" type="text/javascript" src="/js/sugg_new.v.104.js"></script>
<script src="/js/pb_v.1.9.6.min.js"></script>
<script src="/js/lib/jquery.mousewheel.min.js"></script>
<script src="/js/lib/juicer-min.js"></script>
<script src="/js/common/widget/login_new.min.v.0.5.js"></script>
<script src="//account.sogou.com/static/api/passport-async.js"></script>
<script src="/web/index/js/base.v.1.1.14.js"></script>
<script src="/web/js/voice.min.v.0.0.6.js"></script>
<script src="/web/js/taspeed.min.v.0.0.1.js"></script>
</body>
</html>
<!--zly-->
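
The two-task script in this section generalizes to a list of URLs. The sketch below is one possible variant, not the author's code: it uses `asyncio.gather`, which returns results in input order, so no callbacks are needed. To stay runnable without network access it uses a made-up `fake_fetch` coroutine; in a real crawler the aiohttp-based `request(url)` from the script above would be passed as `fetch` instead.

```python
import asyncio

async def fake_fetch(url):
    # Placeholder for a real non-blocking request (assumption for this sketch).
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"

async def crawl(urls, fetch=fake_fetch):
    # gather schedules all coroutines concurrently and returns their
    # results in the same order as the input urls.
    pages = await asyncio.gather(*(fetch(u) for u in urls))
    return dict(zip(urls, pages))

pages = asyncio.run(crawl(["https://www.baidu.com", "https://www.sogou.com"]))
for url, text in pages.items():
    print(url, len(text))
```

`asyncio.run` creates and closes the event loop itself, which avoids managing the loop object by hand.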

 

3. Mobile data scraping && environment setup

Lab: refer to the blog post below

https://www.cnblogs.com/bobo-zhang/p/10068994.html

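- Mobile data scraping:
    - Packet capture tools (by definition, these are proxy servers)
        Windows: fiddler, mitmproxy (both are proxy servers)
        Mac: Charles (known as 青花瓷)
- Installing the certificate on the phone:
    - Have the computer open a Wi-Fi hotspot, then connect the phone to it (the phone and the computer are then on the same network segment)
    - In the phone's browser, visit ip:8888 and tap the link to download the certificate
    - Turn on the phone's proxy: set the proxy IP to the IP of the machine running fiddler and the proxy port to fiddler's port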
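
Before configuring the phone, it can help to confirm that the proxy port is actually reachable from another machine on the same network. The following is a minimal sketch using only the standard library; the function name `proxy_is_reachable` is hypothetical, and the IP address in the commented example is a placeholder rather than a value from these notes.

```python
import socket

def proxy_is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False

# Example call: 8888 is the port mentioned above; the IP is a placeholder.
# print(proxy_is_reachable("192.168.1.10", 8888))
```

If this returns False from a second device, the proxy is probably not accepting remote connections yet, or a firewall is blocking the port.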

 

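(1) Send the certificate to the "phone"

(2) In Fiddler, click Tools => Options =>

Next, "allow" other devices to connect: => "OK" => OK

In a browser, visit: http://localhost:8888/

 

This gives the following result

The "certificate" can be downloaded from the last line of the page shown above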