Crawler in practice: cracking the pseudo-elements and obfuscated JS on Autohome's configuration page

Time: 2024-01-21 09:35:38

 

 

This post describes how to crack the pseudo-elements and the obfuscated JS on Autohome's (汽车之家) configuration page.

 

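**Note: if you repost this article, please credit the source.**

Article link: https://www.cnblogs.com/grom/p/9242156.html

(This post has been edited several times; see the original article for the latest updates.)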

 

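Of all the sites I have crawled, Autohome left the deepest impression, and it was also one of the most troublesome. The page replaces keywords with pseudo-elements on a large scale; the JS that resolves those pseudo-elements is dynamically obfuscated and differs on every refresh; and the page disables the context menu, so text cannot be selected or copied.

(Cracking it took me a week, and I was afraid Autohome would change things as soon as I shared it, so I am only sharing this now, after it has been running for half a year.)

Site URL: http://car.autohome.com.cn/config/spec/25898.html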

 

 

That is roughly what the page looks like. If you simply scrape the page elements, this is what you get:

 

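Analysis:

1. The whole page, including the configuration data, is written out directly: the details of each configuration item are generated in the page's JS together with the page, not fetched through an API.

2. The configuration data is in the page itself.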

 

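(PS, a handy trick: save the page locally and check that the text still displays. Then delete large chunks of JS and refresh. If the text still displays, keep deleting until you find the JS that loads the data.)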

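In hindsight, the first variable, keyLink, holds the hyperlinks for the configuration names in the left column.

The second variable, config, is the upper half of the configuration page that we want (down to the wheel and brake section).

The third variable, option, covers active/passive safety equipment and everything below it.

The fourth and fifth variables, color and innerColor, are the exterior and interior colors.

 

The rest are not much use. The next one is probably something like a sport package that only luxury cars have; I did not look closely.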
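To make step 2 concrete, here is a minimal sketch of pulling one of those variables out of the raw HTML. It assumes each variable is assigned a JSON-compatible object literal in a single `var name = {...};` statement. The function name `extractPageVar` and the sample HTML are made up for illustration, and the placeholder span markup is an assumption based on the `hs_kw<index>_configMd` class names that appear in the listing later in this post.

```javascript
// Hypothetical helper: extract `var <name> = {...};` from page HTML and parse it.
// Assumes the literal is valid JSON and that "};" first appears at its end;
// a page that breaks either assumption needs a more careful parser.
function extractPageVar(html, name) {
  const pattern = new RegExp('var\\s+' + name + '\\s*=\\s*(\\{[\\s\\S]*?\\});');
  const match = pattern.exec(html);
  return match ? JSON.parse(match[1]) : null;
}

// Made-up sample that mimics the layout described above (not real page data).
const sampleHtml = [
  '<script>',
  'var keyLink = {"result": {"items": []}};',
  'var config = {"result": {"name": "<span class=\'hs_kw0_configMd\'></span>"}};',
  '</script>',
].join('\n');

const config = extractPageVar(sampleHtml, 'config');
console.log(config.result.name);
```

The value still contains an empty placeholder span rather than readable text, which is why the remaining steps are needed.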

3. The decryption JS is here.

This JS is obfuscated, so you cannot locate anything by variable name.
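Because the names change on every refresh, one workable approach is to evaluate the helper code instead of matching names. The sketch below copies a few helper shapes from the full listing later in this post (with plain quotes) and evaluates an expression built from them. `evaluateExpression` is a hypothetical helper, and using `new Function` is my assumption about how one might do this, not necessarily what the production crawler does.

```javascript
// Helper definitions copied from the listing in this post, with plain quotes.
// In a real run this string would be the script text taken from the page.
const helperSource = `
function AH_() {
  function _A() { return 'UV'; }
  if (_A() == 'UV,') { return 'AH_'; } else { return _A(); }
}
function Gm_() {
  function _G() { return 'Gm_'; }
  if (_G() == 'Gm__') { return _G(); } else { return '器国'; }
}
var vO_ = function (vO__) {
  var _v = function (vO__) { 'return vO_'; return vO__; };
  return _v(vO__);
};
var Qz_ = '前力功';
`;

// Hypothetical helper: run the definitions and return the value of one expression.
// Caution: this executes the page's code, so only run it in an isolated process.
function evaluateExpression(helpers, expression) {
  return new Function(helpers + '\nreturn (' + expression + ');')();
}

const fragment = evaluateExpression(helperSource, "AH_() + Qz_ + vO_('吸商') + Gm_()");
console.log(fragment); // UV前力功吸商器国
```

Each helper collapses to a constant string: the `if` branches that would return a decoy name can never be taken, and the bare string statements such as `'return vO_'` do nothing.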

 

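4. The cracking flow: get the configuration JSON string, then find the parsing JS and evaluate its variables to obtain the dictionary (one long string of characters) and the index set (one long list of numbers). Use the indices to pick the corresponding characters out of the dictionary, which gives the real data dictionary, then replace the matching pseudo-elements with it.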
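The flow in step 4 can be sketched as follows. Judging by the listing later in this post, each entry of the index list is a comma-separated run of character positions, and entry number i belongs to the class `hs_kw<i>_configMd`. Splitting the entries on `;` and the exact placeholder markup are assumptions on my part, and `buildWords` and `replacePlaceholders` are hypothetical names. The dictionary and index string below are toy data.

```javascript
// Turn the dictionary string and index string into the list of real words.
// Entry i is "pos,pos,..."; each pos selects one character from dict.
function buildWords(dict, posList) {
  return posList.split(';').map(function (entry) {
    if (entry === '') {
      return ''; // keep the slot so later indices do not shift
    }
    return entry.split(',').map(function (pos) {
      return dict.charAt(parseInt(pos, 10));
    }).join('');
  });
}

// Replace each empty placeholder span with the word for its index.
// The span markup is an assumed format; adjust the pattern to the real page.
function replacePlaceholders(text, words) {
  const pattern = /<span class='hs_kw(\d+)_configMd'><\/span>/g;
  return text.replace(pattern, function (whole, index) {
    const word = words[Number(index)];
    return word === undefined ? whole : word;
  });
}

// Toy data: positions 0..5 are 座 椅 加 热 电 动.
const words = buildWords('座椅加热电动', '0,1;2,3;4,5');
const raw = "前排<span class='hs_kw0_configMd'></span><span class='hs_kw1_configMd'></span>";
console.log(replacePlaceholders(raw, words)); // 前排座椅加热
```

Unknown indices are left untouched rather than replaced with an empty string, so a mismatch between dictionary and page stays visible in the output.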

5. Parse the obfuscated JS. After formatting it, you get JS like this:

 

Here is a complete copy of the JS for anyone who wants to dig into it:

function(nv_) {
    var pk_ = function () {
        \'return pk_\';
        return \'S\';
    };

    function AH_() {
        function _A() {
            return \'UV\';
        };
        if (_A() == \'UV,\') {
            return \'AH_\';
        } else {
            return _A();
        }
    }

    function cU_() {
        \'return cU_\';
        return \'万价\';
    }
    var xN_ = \'元全准\';

    function $GetCustomStyle$() {
        var $customstyle$ = \'\';
        try {
            if (HS_GetCustomStyle) {
                $customstyle$ = HS_GetCustomStyle();
            } else {
                if (navigator.userAgent.indexOf(\'Windows NT 5\') != -1) {
                    $customstyle$ = \'margin-bottom:-4.8px;\';
                } else {
                    $customstyle$ = \'margin-bottom:-5px;\';
                }
            }
        } catch (e) { }
        return $customstyle$;
    }
    var Qz_ = \'前力功\';
    var rC_ = function () {
        \'rC_\';
        var _r = function () {
            return \'动助华\';
        };
        return _r();
    };
    var cO_ = function () {
        \'cO_\';
        var _c = function () {
            return \'\';
        };
        return _c();
    };

    function ts_() {
        \'return ts_\';
        return \'号合\';
    }
    var vO_ = function (vO__) {
        var _v = function (vO__) {
            \'return vO_\';
            return vO__;
        };
        return _v(vO__);
    };
    var zS_ = \'\';

    function Gm_() {
        function _G() {
            return \'Gm_\';
        };
        if (_G() == \'Gm__\') {
            return _G();
        } else {
            return \'器国\';
        }
    }

    function Fo_() {
        function _F() {
            return \'\';
        };
        if (_F() == \'\') {
            return \'\';
        } else {
            return _F();
        }
    }
    var wo_ = function (wo__) {
        var _w = function (wo__) {
            \'return wo_\';
            return wo__;
        };
        return _w(wo__);
    };
    var zk_ = function (zk__) {
        var _z = function (zk__) {
            \'return zk_\';
            return zk__;
        };
        return _z(zk__);
    };

    function WT_() {
        function _W() {
            return \'子实容\';
        };
        if (_W() == \'子实容\') {
            return \'子实容\';
        } else {
            return _W();
        }
    }
    var Ma_ = \'\';
    var vk_ = function () {
        \'vk_\';
        var _v = function () {
            return \'寸导小\';
        };
        return _v();
    };
    var zl_ = \'度式弗\';
    var ZS_ = function () {
        \'ZS_\';
        var _Z = function () {
            return \'\';
        };
        return _Z();
    };

    function Wh_() {
        \'return Wh_\';
        return \'\';
    }

    function fG_() {
        function _f() {
            return \'\';
        };
        if (_f() == \'\') {
            return \'\';
        } else {
            return _f();
        }
    }

    function $GetClassName$($index$) {
        return \'.hs_kw\' + $index$ + \'_configMd\';
    }

    function $RuleCalss1$() {
        return \'::before {content:\'
    }

    function kE_() {
        function _k() {
            return \'\';
        };
        if (_k() == \'\') {
            return \'\';
        } else {
            return _k();
        }
    }

    function wp_() {
        \'return wp_\';
        return \'\';
    }
    var yW_ = \'\';

    function bc_() {
        \'return bc_\';
        return \'\';
    }

    function tk_() {
        function _t() {
            return \'tk__\';
        };
        if (_t() == \'tk__\') {
            return \'\';
        } else {
            return _t();
        }
    }
    var Yp_ = function () {
        \'return Yp_\';
        return \'\';
    };

    function pR_() {
        function _p() {
            return \'pR__\';
        };
        if (_p() == \'pR__\') {
            return \'\';
        } else {
            return _p();
        }
    }

    function BS_() {
        function _B() {
            return \'\';
        };
        if (_B() == \'\') {
            return \'\';
        } else {
            return _B();
        }
    }
    var Bi_ = \'\';
    var fQ_ = \'\';

    function $GetWindow$() {
        return this[\'\' + YE_() + (function (MR__) {
            \'return MR_\';
            return MR__;
        })(\'in\') + zh_()];
    }
    var Bh_ = function () {
        \'Bh_\';
        var _B = function () {
            return \'\';
        };
        return _B();
    };
    var JW_ = function () {
        \'return JW_\';
        return \'\';
    };

    function wd_() {
        function _w() {
            return \'wd__\';
        };
        if (_w() == \'wd__\') {
            return \'\';
        } else {
            return _w();
        }
    }

    function UX_() {
        function _U() {
            return \'UX__\';
        };
        if (_U() == \'UX__\') {
            return \'\';
        } else {
            return _U();
        }
    }

    function QU_() {
        function _Q() {
            return \'气油\';
        };
        if (_Q() == \'气油,\') {
            return \'QU_\';
        } else {
            return _Q();
        }
    }
    var Ed_ = function () {
        \'return Ed_\';
        return \'\';
    };

    function cZ_() {
        \'return cZ_\';
        return \'海液\';
    }
    var UZ_ = function (UZ__) {
        var _U = function (UZ__) {
            \'return UZ_\';
            return UZ__;
        };
        return _U(UZ__);
    };
    var vI_ = function () {
        \'return vI_\';
        return \'\';
    };
    var EI_ = function () {
        \'EI_\';
        var _E = function () {
            return \'版独率\';
        };
        return _E();
    };

    function DT_() {
        function _D() {
            return \'\';
        };
        if (_D() == \'\') {
            return \'\';
        } else {
            return _D();
        }
    }
    var JI_ = function (JI__) {
        var _J = function (JI__) {
            \'return JI_\';
            return JI__;
        };
        return _J(JI__);
    };

    function $Split$($item$, $index$) {
        if ($item$) {
            return $item$[\'\' + jn_() + Dg_() + iu_()]($index$);
        } else {
            return \'\';
        }
    }

    function YY_() {
        \'return YY_\';
        return \'\';
    }

    function hb_() {
        function _h() {
            return \'称程立\';
        };
        if (_h() == \'称程立\') {
            return \'称程立\';
        } else {
            return _h();
        }
    }
    var DC_ = function () {
        \'return DC_\';
        return \'\';
    };
    var ec_ = function () {
        \'return ec_\';
        return \'\';
    };
    var $ruleDict$ = \'\';
    var $rulePosList$ = \'\';
    var Wr_ = function () {
        \'Wr_\';
        var _W = function () {
            return \'\';
        };
        return _W();
    };

    function zq_() {
        function _z() {
            return \'zq_\';
        };
        if (_z() == \'zq__\') {
            return _z();
        } else {
            return \'胎自\';
        }
    }
    var YS_ = function (YS__) {
        \'return YS_\';
        return YS__;
    };
    var Hj_ = \'距车转\';

    function Du_() {
        function _D() {
            return \'\';
        };
        if (_D() == \'\') {
            return \'\';
        } else {
            return _D();
        }
    }
    var cQ_ = function () {
        \'return cQ_\';
        return \'轴载进\';
    };

    function WM_() {
        \'return WM_\';
        return \'\';
    }

    function yQ_() {
        \'return yQ_\';
        return \'\';
    }
    var uC_ = function () {
        \'return uC_\';
        return \'配量铝\';
    };
    var lz_ = function (lz__) {
        var _l = function (lz__) {
            \'return lz_\';
            return lz__;
        };
        return _l(lz__);
    };
    var Te_ = \'间隙风\';
    var Ph_ = function () {
        \'Ph_\';
        var _P = function () {
            return \'\';
        };
        return _P();
    };

    function UO_() {
        function _U() {
            return \'驱驻\';
        };
        if (_U() == \'驱驻,\') {
            return \'UO_\';
        } else {
            return _U();
        }
    }

    function Iw_() {
        \'return Iw_\';
        return \'高麦\';
    }
    var KE_ = \'7;107;3\';

    function HA_() {
        function _H() {
            return \';9\';
        };
        if (_H() == \';9,\') {
            return \'HA_\';
        } else {
            return _H();
        }
    }

    function PI_() {
        function _P() {
            return \'PI_\';
        };
        if (_P() == \'PI__\') {
            return _P();
        } else {
            return \'5;70\';
        }
    }

    function yr_() {
        \'return yr_\';
        return \'82,29\';
    }
    var mK_ = function () {
        \'return mK_\';
        return \'1\';
    };
    var Ff_ = \'16,117;\';

    function $Innerhtml$($item$, $index$) {
        var $tempArray$ = $GetElementsByCss$($GetClassName$($item$));
        for (x in $tempArray$) {
            $tempArray$[x].innerHTML = $index$;
            try {
                $tempArray$[x].currentStyle = \'\';
            } catch (e) { }
        }
    }

    function vs_() {
        function _v() {
            return \'vs_\';
        };
        if (_v() == \'vs__\') {
            return _v();
        } else {
            return \'5,31\';
        }
    }
    var Ds_ = \';102,11\';

    function DV_() {
        function _D() {
            return \'0;42,\';
        };
        if (_D() == \'0;42,\') {
            return \'0;42,\';
        } else {
            return _D();
        }
    }

    function lU_() {
        function _l() {
            return \'49;57,3\';
        };
        if (_l() == \'49;57,3\') {
            return \'49;57,3\';
        } else {
            return _l();
        }
    }
    var yc_ = function (yc__) {
        \'return yc_\';
        return yc__;
    };

    function lf_() {
        function _l() {
            return \'66,\';
        };
        if (_l() == \'66,\') {
            return \'66,\';
        } else {
            return _l();
        }
    }
    var IN_ = function () {
        \'return IN_\';
        return \'115\';
    };

    function Fb_() {
        function _F() {
            return \'Fb__\';
        };
        if (_F() == \'Fb__\') {
            return \',54;1\';
        } else {
            return _F();
        }
    }

    function $InsertRule$($index$, $item$) {
        $sheet$[\'\' + Mn_() + BP_ + Ni_() + FS_() + qg_() + KK_() + (function (cT__) {
            \'return cT_\';
            return cT__;
        })(\'e\')]($GetClassName$($index$) + $RuleCalss1$() + \'"\' + $item$ + \'" }\', 0);
        var $tempArray$ = $GetElementsByCss$($GetClassName$($index$));
        for (x in $tempArray$) {
            try {
                $tempArray$[x].currentStyle = \'\';
            } catch (e) { }
        }
    }
    var GE_ = function () {
        \'GE_\';
        var _G = function () {
            return \'01,11\';
        };
        return _G();
    };

    function Xq_() {
        function _X() {
            return \'5\';
        };
        if (_X() == \'5\') {
            return \'5\';
        } else {
            return _X();
        }
    }
    var UE_ = function () {
        \'return UE_\';
        return \',54;7\';
    };
    var Xv_ = function () {
        \'return Xv_\';
        return \'4\';
    };
    var wv_ = \';40\';

    function Kb_() {
        function _K() {
            return \',3\';
        };
        if (_K() == \',3,\') {
            return \'Kb_\';
        } else {
            return _K();
        }
    }
    var Ej_ = \'0,0,1\';

    function Xm_() {
        function _X() {
            return \'Xm_\';
        };
        if (_X() == \'Xm__\') {
            return _X();
        } else {
            return \';1\';
        }
    }

    function NT_() {
        \'return NT_\';
        return \'21,101\';
    }

    function rN_() {
        \'return rN_\';
        return \';\';
    }
    var Fc_ = function () {
        \'Fc_\';
        var _F = function () {
            return \'7,60;\';
        };
        return _F();
    };

    function $ChartAt$($item$) {
        return $ruleDict$[\'\' + (function () {
            \'return Sm_\';
            return \'c\'
        })() + aT_() + wF_()](parseInt($item$));
    }

    function vC_() {
        \'return vC_\';
        return \'98;53\';
    }
    var iB_ = function () {
        \'iB_\';
        var _i = function () {
            return \',\';
        };
        return _i();
    };

    function sn_() {
        \'return sn_\';
        return \'11\';
    }

    function ZU_() {
        function _Z() {
            return \'ZU_\';
        };
        if (_Z() == \'ZU__\') {
            return _Z();
        } else {
            return \'2;51\';
        }
    }

    function lM_() {
        \'return lM_\';
        return \',105,\';
    }

    function CF_() {
        function _C() {
            return \'44;67,9\';
        };
        if (_C() == \'44;67,9\') {
            return \'44;67,9\';
        } else {
            return _C();
        }
    }

    function Ri_() {
        \'return Ri_\';
        return \'2;6,67\';
    }

    function Ye_() {
        function _Y() {
            return \'Ye_\';
        };
        if (_Y() == \'Ye__\') {
            return _Y();
        } else {
            return \';111\';
        }
    }

    function HB_() {
        \'return HB_\';
        return \',66;1\';
    }

    function EW_() {
        \'return EW_\';
        return \'3,10\';
    }
    var cW_ = function () {
        \'return cW_\';
        return \'3\';
    };

    function $GetDefaultView$() {
        return nv_[\'\' + Tb_() + Vo_() + \'au\' + FI_() + ak_() + (function () {
            \'return Ya_\';
            return \'Vie\'
        })() + (function () {
            \'return Ki_\';
            return \'w\'
        })()];
    }

    function Yf_() {
        \'return Yf_\';
        return \',100;37\';
    }
    var oh_ = function (oh__) {
        var _o = function (oh__) {
            \'return oh_\';
            return oh__;
        };
        return _o(oh__);
    };
    var Jn_ = \'3\';

    function tl_() {
        function _t() {
            return \';48,\';
        };
        if (_t() == \';48,,\') {
            return \'tl_\';
        } else {
            return _t();
        }
    }
    var xY_ = function () {
        \'return xY_\';
        return \'15;88,2\';
    };
    var AD_ = function () {
        \'AD_\';
        var _A = function () {
            return \'1;4\';
        };
        return _A();
    };
    var iX_ = function (iX__) {
        var _i = function (iX__) {
            \'return iX_\';
            return iX__;
        };
        return _i(iX__);
    };
    var Cy_ = function () {
        \'Cy_\';
        var _C = function () {
            return \';90,79;\';
        };
        return _C();
    };

    function CV_() {
        \'return CV_\';
        return \'1,10;94\';
    }

    function Xx_() {
        function _X() {
            return \'Xx__\';
        };
        if (_X() == \'Xx__\') {
            return \',\';
        } else {
            return _X();
        }
    }
    var QW_ = function () {
        \'QW_\';
        var _Q = function () {
            return \'7\';
        };
        return _Q();
    };

    function Vh_() {
        function _V() {
            return \'Vh__\';
        };
        if (_V() == \'Vh__\') {
            return \'2\';
        } else {
            return _V();
        }
    }

    function Bw_() {
        \'return Bw_\';
        return \';13,1\';
    }
    var Vs_ = \'2,1\';
    var Sq_ = \'6\';

    function ed_() {
        function _e() {
            return \',27;1\';
        };
        if (_e() == \',27;1\') {
            return \',27;1\';
        } else {
            return _e();
        }
    }

    function Tn_() {
        function _T() {
            return \'Tn_\';
        };
        if (_T() == \'Tn__\') {
            return _T();
        } else {
            return \'23,45,\';
        }
    }

    function pr_() {
        function _p() {
            return \'pr__\';
        };
        if (_p() == \'pr__\') {
            return \'8\';
        } else {
            return _p();
        }
    }
    var aZ_ = function () {
        \'return aZ_\';
        return \';31,9\';
    };
    var CL_ = \'116\';

    function fk_() {
        function _f() {
            return \'fk__\';
        };
        if (_f() == \'fk__\') {
            return \';78\';
        } else {
            return _f();
        }
    }
    var pz_ = function (pz__) {
        \'return pz_\';
        return pz__;
    };

    function bC_() {
        function _b() {
            return \'bC__\';
        };
        if (_b() == \'bC__\') {
            return \'5\';
        } else {
            return _b();
        }
    }

    function $ResetSystemFun$() {
        if ($GetWindow$()[\'\' + iH_() + Ct_() + Ap_() + XV_() + GP_() + BJ_() + fB_() + iz_()] != undefined) {
            if (window.hs_fuckyou == undefined) {
                window.hs_fuckyou = $GetWindow$()[\'\' + iH_() + Ct_() + Ap_() + XV_() + GP_() + BJ_() + fB_() + iz_()];
            }
        }
        if ($GetDefaultView$()) {
            if ($GetDefaultView$()[\'\' + iH_() + Ct_() + Ap_() + XV_() + GP_() + BJ_() + fB_() + iz_()] != undefined) {
                if (window.hs_fuckyou_dd == undefined) {
                    window.hs_fuckyou_dd = $GetDefaultView$()[\'\' + iH_() + Ct_() + Ap_() + XV_() + GP_() + BJ_() + fB_() + iz_()];
                }
            }
        }
    }
    var YD_ = function () {
        \'return YD_\';
        return \'8,64;15\';
    };
    var Dl_ = \',76;5\';

    function $InsertRuleRun$() {
        for ($index$ = 0; $index$ < $rulePosList$.length; $index$++) {
            var $tempArray$ = $Split$($rulePosList$[$index$], \',\');
            var $temp$ = \'\';
            for ($itemIndex$ = 0; $itemIndex$ < $tempArray$.length; $itemIndex$++) {
                $temp$ += $ChartAt$($tempArray$[$itemIndex$]) + \'\';
            }
            $InsertRule$($index$, $temp$);
        }
    }
    var dl_ = function (dl__) {
        var _d = function (dl__) {
            \'return dl_\';
            return dl__;
        };
        return _d(dl__);
    };

    function jK_() {
        function _j() {
            return \'jK__\';
        };
        if (_j() == \'jK__\') {
            return \'3,91;32\';
        } else {
            return _j();
        }
    }

    function fI_() {
        function _f() {
            return \',71;\';
        };
        if (_f() == \',71;,\') {
            return \'fI_\';
        } else {
            return _f();
        }
    }

    function Wm_() {
        function _W() {
            return \'24,\';
        };
        if (_W() == \'24,\') {
            return \'24,\';
        } else {
            return _W();
        }
    }
    var CP_ = function () {
        \'return CP_\';
        return \'6\';
    };
    var Ga_ = function (Ga__) {
        var _G = function (Ga__) {
            \'return Ga_\';
            return Ga__;
        };
        return _G(Ga__);
    };

    function pT_() {
        \'return pT_\';
        return \';12\';
    }

    function Ae_() {
        function _A() {
            return \'2,43;\';
        };
        if (_A() == \'2,43;\') {
            return \'2,43;\';
        } else {
            return _A();
        }
    }
    var Ry_ = function () {
        \'Ry_\';
        var _R = function () {
            return \'1\';
        };
        return _R();
    };
    var rM_ = \'23,103,\';

    function XI_() {
        function _X() {
            return \'XI_\';
        };
        if (_X() == \'XI__\') {
            return _X();
        } else {
            return \'93;9\';
        }
    }
    var gk_ = \'7,6\';

    function oQ_() {
        function _o() {
            return \'2;4;1\';
        };
        if (_o() == \'2;4;1\') {
            return \'2;4;1\';
        } else {
            return _o();
        }
    }

    function kp_() {
        \'return kp_\';
        return \'04\';
    }

    function NC_() {
        function _N() {
            return \'100;28\';
        };
        if (_N() == \'100;28,\') {
            return \'NC_\';
        } else {
            return _N();
        }
    }

    function NP_() {
        function _N() {
            return \'NP_\';
        };
        if (_N() == \'NP__\') {
            return _N();
        } else {
            return \';52;\';
        }
    }
    var sT_ = \'50,14,6\';

    function ux_() {
        function _u() {
            return \'ux__\';
        };
        if (_u() == \'ux__\') {
            return \'3;50,81\';
        } else {
            return _u();
        }
    }

    function hT_() {
        function _h() {
            return \'hT__\';
        };
        if (_h() == \'hT__\') {
            return \';\';
        } else {
            return _h();
        }
    }

    function tL_() {
        \'return tL_\';
        return \'90,5;\';
    }
    var sX_ = \'114,4\';

    function qx_() {
        \'return qx_\';
        return \'14;78,\';
    }
    var kS_ = function () {
        \'return kS_\';
        return \'26;96,8\';
    };
    var OC_ = function (OC__) {
        \'return OC_\';
        return OC__;
    };
    var eT_ = function (eT__) {
        var _e = function (eT__) {
            \'return eT_\';
            return eT__;
        };
        return _e(eT__);
    };

    function yV_() {
        \'return yV_\';
        return \'8;90,\';
    }

    function $GetLocationURL$() {
        return $GetWindow$()[\'\' + Kp_() + Ka_() + Lw_][\'\' + rI_() + hw_() + MU_(\'f\')];
    }

    function Ra_() {
        function _R() {
            return \'Ra__\';
        };
        if (_R() == \'Ra__\') {
            return \'46;25\';
        } else {
            return _R();
        }
    }

    function Hh_() {
        \'return Hh_\';
        return \';18\';
    }

    function $SystemFunction1$($item$) {
        $ResetSystemFun$();
        if ($GetWindow$()[\'\' + iH_() + Ct_() + Ap_() + XV_() + GP_() + BJ_() + fB_() + iz_()] != undefined) {
            $GetWindow$()[\'\' + iH_() + Ct_() + Ap_() + XV_() + GP_() + BJ_() + fB_() + iz_()] = function (element, pseudoElt) {
                if (pseudoElt != undefined && typeof (pseudoElt) == \'string\' && pseudoElt.toLowerCase().indexOf(\':before\') > -1) {
                    var obj = {};
                    obj.getPropertyValue = function (x) {
                        return x;
                    };
                    return obj;
                } else {
                    return window.hs_fuckyou(element, pseudoElt);
                }
            };
        }
        return $item$;
    }

    function Wc_() {
        function _W() {
            return \';\';
        };
        if (_W() == \';\') {
            return \';\';
        } else {
            return _W();
        }
    }
    var $imgPosList$ = \'\';
    var Rd_ = function () {
        \'Rd_\';
        var _R = function () {
            return \'75,86;7\';
        };
        return _R();
    };
    var uZ_ = function () {
        \'uZ_\';
        var _u = function () {
            return \'3\';
        };
        return _u();
    };

    function nn_() {
        function _n() {
            return \',67;9\';
        };
        if (_n() == \',67;9\') {
            return \',67;9\';
        } else {
            return _n();
        }
    }

    function Kj_() {
        function _K() {
            return \'Kj__\';
        };
        if (_K() == \'Kj__\') {
            return \',41,3\';
        } else {
            return _K();
        }
    }
    var Zk_ = \'8;36,\';

    function JK_() {
        function _J() {
            return \'83;35,6\';
        };
        if (_J() == \'83;35,6\') {
            return \'83;35,6\';
        } else {
            return _J();
        }
    }
    var Zn_ = function (Zn__) {
        var _Z = function (Zn__) {
            \'return Zn_\';
            return Zn__;
        };
        return _Z(Zn__);
    };

    function hV_() {
        function _h() {
            return \'hV_\';
        };
        if (_h() == \'hV__\') {
            return _h();
        } else {
            return \',93;\';
        }
    }
    var JL_ = \'58,\';

    function $SuperInsertRule$() {
        if ($sheet$ !== undefined && $sheet$[\'\' + Mn_() + BP_ + Ni_() + FS_() + qg_() + KK_() + (function (cT__) {
            \'return cT_\';
            return cT__;
        })(\'e\')]) {
            return true;
        } else {
            return false;
        }
    }
    var UA_ = function () {
        \'UA_\';
        var _U = function () {
            return \'59;106,\';
        };
        return _U();
    };
    var bQ_ = \'6\';
    var zR_ = function () {
        \'zR_\';
        var _z = function () {
            return \'6\';
        };
        return _z();
    };
    var JD_ = function (JD__) {
        var _J = function (JD__) {
            \'return JD_\';
            return JD__;
        };
        return _J(JD__);
    };

    function gs_() {
        function _g() {
            return \'gs_\';
        };
        if (_g() == \'gs__\') {
            return _g();
        } else {
            return \'7;66,9\';
        }
    }

    function pf_() {
        function _p() {
            return \'0;\';
        };
        if (_p() == \'0;,\') {
            return \'pf_\';
        } else {
            return _p();
        }
    }
    var Hz_ = function (Hz__) {
        \'return Hz_\';
        return Hz__;
    };

    function Ix_() {
        \'return Ix_\';
        return \'20;\';
    }
    var fV_ = function () {
        \'return fV_\';
        return \'6\';
    };

    function xQ_() {
        function _x() {
            return \'xQ_\';
        };
        if (_x() == \'xQ__\') {
            return _x();
        } else {
            return \'9,119;\';
        }
    }

    function CE_() {
        function _C() {
            return \'CE__\';
        };
        if (_C() == \'CE__\') {
            return \'2\';
        } else {
            return _C();
        }
    }
    var fN_ = \'3,12,16\';

    function DG_() {
        function _D() {
            return \',27\';
        };
        if (_D() == \',27\') {
            return \',27\';
        } else {
            return _D();
        }
    }

    function JZ_() {
        \'return JZ_\';
        return \';19,\';
    }

    function uk_() {
        function _u() {
            return \'89,65;1\';
        };
        if (_u() == \'89,65;1\') {
            return \'89,65;1\';
        } else {
            return _u();
        }
    }
    var jW_ = function () {
        \'return jW_\';
        return \'09,11\';
    };
    var Hu_ = function () {
        \'Hu_\';
        var _H = function () {
            return \'8;23,10\';
        };
        return _H();
    };

    function Jw_() {
        function _J() {
            return \'Jw_\';
        };
        if (_J() == \'Jw__\') {
            return _J();
        } else {
            return \'3,\';
        }
    }
    var nP_ = \'1\';
    var ZL_ = \'00;20;3\';
    var Dw_ = function () {
        \'return Dw_\';
        return \'9\';
    };

    function iH_() {
        \'return iH_\';
        return \'get\';
    }

    function Ct_() {
        function _C() {
            return \'Co\';
        };
        if (_C() == \'Co,\') {
            return \'Ct_\';
        } else {
            return _C();
        }
    }

    function Ap_() {
        function _A() {
            return \'Ap__\';
        };
        if (_A() == \'Ap__\') {
            return \'m\';
        } else {
            return _A();
        }
    }
    var XV_ = function () {
        \'return XV_\';
        return \'put\';
    };

    function GP_() {
        \'return GP_\';
        return \'edS\';
    }
    var BJ_ = function () {
        \'BJ_\';
        var _B = function () {
            return \'t\';
        };
        return _B();
    };
    var fB_ = function () {
        \'return fB_\';
        return \'y\';
    };

    function iz_() {
        function _i() {
            return \'le\';
        };
        if (_i() == \'le,\') {
            return \'iz_\';
        } else {
            return _i();
        }
    }

    function Mn_() {
        function _M() {
            return \'i\';
        };
        if (_M() == \'i\') {
            return \'i\';
        } else {
            return _M();
        }
    }
    var BP_ = \'nse\';
    var Ni_ = function () {
        \'Ni_\';
        var _N = function () {
            return \'r\';
        };
        return _N();
    };

    function FS_() {
        \'return FS_\';
        return \'t\';
    }
    var qg_ = function () {
        \'qg_\';
        var _q = function () {
            return \'R\';
        };
        return _q();
    };

    function KK_() {
        \'return KK_\';
        return \'ul\';
    }

    function YE_() {
        \'return YE_\';
        return \'w\';
    }

    function zh_() {
        function _z() {
            return \'zh__\';
        };
        if (_z() == \'zh__\') {
            return \'dow\';
        } else {
            return _z();
        }
    }
    var Tb_ = function () {
        \'Tb_\';
        var _T = function () {
            return \'d\';
        };
        return _T();
    };

    function Vo_() {
        function _V() {
            return \'Vo_\';
        };
        if (_V() == \'Vo__\') {
            return _V();
        } else {
            return \'ef\';
        }
    }
    var FI_ = function () {
        \'FI_\';
        var _F = function () {
            return \'l\';
        };
        return _F();
    };

    function ak_() {
        function _a() {
            return \'t\';
        };
        if (_a() == \'t\') {
            return \'t\';
        } else {
            return _a();
        }
    }

    function $SystemFunction2$($item$) {
        $ResetSystemFun$();
        if ($GetDefaultView$()) {
            if ($GetDefaultView$()[\'\' + iH_() + Ct_() + Ap_() + XV_() + GP_() + BJ_() + fB_() + iz_()] != undefined) {
                $GetDefaultView$()[\'\' + iH_() + Ct_() + Ap_() + XV_() + GP_() + BJ_() + fB_() + iz_()] = function (element, pseudoElt) {
                    if (pseudoElt != undefined && typeof (pseudoElt) == \'string\' && pseudoElt.toLowerCase().indexOf(\':before\') > -1) {
                        var obj = {};
                        obj.getPropertyValue = function (x) {
                            return x;
                        };
                        return obj;
                    } else {
                        return window.hs_fuckyou_dd(element, pseudoElt);
                    }
                };
            }
        }
        return $item$;
    }

    function $FillDicData$() {
        $ruleDict$ = $GetWindow$()[\'\' + ht_() + Sc_() + (function () {
            \'return vW_\';
            return \'e\'
        })() + (function () {
            \'return FC_\';
            return \'URI\'
        })() + UU_ + gA_ + Qg_() + Ec_ + ZP_()](\'\' + pk_() + AH_() + cU_() + (function () {
            \'return KF_\';
            return \'体供保\'
        })() + xN_ + \'列制\' + Qz_ + rC_() + cO_() + ts_() + \'名后\' + vO_(\'吸商\') + zS_ + Gm_() + Fo_() + wo_(\'型备\') + zk_(\'多大\') + WT_() + Ma_ + vk_() + \'尺年\' + zl_ + ZS_() + Wh_() + fG_() + kE_() + wp_() + yW_ + bc_() + tk_() + Yp_() + pR_() + (function () {
            \'return KX_\';
            return \'\'
        })() + BS_() + (function () {
            \'return Ty_\';
            return \'\'
        })() + Bi_ + fQ_ + Bh_() + JW_() + wd_() + UX_() + (function () {
            \'return PM_\';
            return \'\'
        })() + QU_() + Ed_() + cZ_() + UZ_(\'点然\') + vI_() + EI_() + (function (eL__) {
            \'return eL_\';
            return eL__;
        })(\'环电\') + DT_() + JI_(\'盘矩\') + (function () {
            \'return ez_\';
            return \'\'
        })() + YY_() + hb_() + DC_() + ec_() + Wr_() + (function () {
            \'return xW_\';
            return \'\'
        })() + (function (gW__) {
            \'return gW_\';
            return gW__;
        })(\'置耗\') + zq_() + YS_(\'舒行\') + (function () {
            \'return BN_\';
            return \'规豪质\'
        })() + Hj_ + Du_() + cQ_() + WM_() + (function (yl__) {
            \'return yl_\';
            return yl__;
            })(\'逊通\') + yQ_() + uC_() + lz_(\'长门\') + Te_ + Ph_() + UO_() + Iw_() + $SystemFunction1$(\'\'));

        $rulePosList$ = $Split$(($SystemFunction1$(\'\') + \'\' + (function () {
            \'return Xs_\';
            return \'77,\'
        })() + KE_ + HA_() + PI_() + (function (vJ__) {
            \'return vJ_\';
            return vJ__;
        })(\',19;\') + yr_() + (function () {
            \'return Uj_\';
            return \',\'
        })() + mK_() + Ff_ + (function () {
            \'return lX_\';
            return \'67,87;5\'
        })() + vs_() + Ds_ + DV_() + lU_() + yc_(\'3;\') + lf_() + IN_() + Fb_() + GE_() + Xq_() + UE_() + Xv_() + wv_ + Kb_() + Ej_ + \',2\' + Xm_() + NT_() + rN_() + (function (qt__) {
            \'return qt_\';
            return qt__;
        })(\'23,4\') + Fc_() + vC_() + iB_() + sn_() + ZU_() + lM_() + CF_() + Ri_() + Ye_() + HB_() + EW_() + cW_() + Yf_() + oh_(\',4\') + Jn_ + tl_() + xY_() + AD_() + iX_(\'7,60\') + Cy_() + (function () {
            \'return zK_\';
            return \'6\'
        })() + CV_() + Xx_() + QW_() + Vh_() + Bw_() + Vs_ + Sq_ + ed_() + Tn_() + \'10\' + pr_() + aZ_() + (function () {
            \'return VT_\';
            return \'3;56,\'
        })() + CL_ + fk_() + pz_(\',34;\') + bC_() + YD_() + Dl_ + dl_(\'3,11;1\') + jK_() + fI_() + Wm_() + CP_() + Ga_(\'6;13\') + \',47,60\' + pT_() + Ae_() + Ry_() + (function () {
            \'return VR_\';
            return \'1\'
        })() + (function (YX__) {
            \'return YX_\';
            return YX__;
        })(\'3;\') + rM_ + XI_() + gk_ + oQ_() + kp_() + (function () {
            \'return eq_\';
            return \',\'
        })() + NC_() + NP_() + sT_ + ux_() + hT_() + tL_() + sX_ + (function () {
            \'return FK_\';
            return \'3;17,\'
        })() + qx_() + kS_() + OC_(\'5;80\') + eT_(\',44;\') + yV_() + Ra_() + Hh_() + Wc_() + Rd_() + uZ_() + nn_() + Kj_() + Zk_ + JK_() + Zn_(\'8;\') + (function (GM__) {
            \'return GM_\';
            return GM__;
        })(\'13,103\') + hV_() + JL_ + UA_() + bQ_ + zR_() + JD_(\';22,84\') + (function (Wf__) {
            \'return Wf_\';
            return Wf__;
        })(\';99,\') + gs_() + pf_() + (function () {
            \'return Ia_\';
            return \'99,112;\'
        })() + Hz_(\'13,1\') + Ix_() + fV_() + xQ_() + CE_() + fN_ + DG_() + JZ_() + uk_() + jW_() + Hu_() + Jw_() + nP_ + ZL_ + Dw_()), $SystemFunction2$(\';\'));
        $imgPosList$ = $Split$((\'##imgPosList_jsFuns##\' + $SystemFunction2$(\';\')), $SystemFunction1$(\';\'));
        $RenderToHTML$();
        return \';\';
    }

    function $GetElementsByCss$($item$) {
        return document.querySelectorAll($item$);
    }

    function Rm_() {
        function _R() {
            return \'g\';
        };
        if (_R() == \'g\') {
            return \'g\';
        } else {
            return _R();
        }
    }
    var sf_ = function () {
        \'sf_\';
        var _s = function () {
            return \'e\';
        };
        return _s();
    };
    var kJ_ = function () {
        \'kJ_\';
        var _k = function () {
            return \'P\';
        };
        return _k();
    };
    var VZ_ = function (VZ__) {
        \'return VZ_\';
        return VZ__;
    };

    function Bf_() {
        function _B() {
            return \'Bf__\';
        };
        if (_B() == \'Bf__\') {
            return \'p\';
        } else {
            return _B();
        }
    }
    var UF_ = function () {
        \'UF_\';
        var _U = function () {
            return \'e\';
        };
        return _U();
    };
    var pB_ = function () {
        \'return pB_\';
        return \'r\';
    };

    function ry_() {
        function _r() {
            return \'ry_\';
        };
        if (_r() == \'ry__\') {
            return _r();
        } else {
            return \'Va\';
        }
    }

    function XP_() {
        function _X() {
            return \'XP__\';
        };
        if (_X() == \'XP__\') {
            return \'l\';
        } else {
            return _X();
        }
    }
    var Yy_ = function () {
        \'return Yy_\';
        return \'u\';
    };
    var ue_ = function () {
        \'ue_\';
        var _u = function () {
            return \'e\';
        };
        return _u();
    };
    var Kp_ = function () {
        \'Kp_\';
        var _K = function () {
            return \'loc\';
        };
        return _K();
    };

    function Ka_() {
        function _K() {
            return \'Ka__\';
        };
        if (_K() == \'Ka__\') {
            return \'ati\';
        } else {
            return _K();
        }
    }
    var Lw_ = \'on\';
    var rI_ = function () {
        \'return rI_\';
        return \'h\';
    };

    function hw_() {
        function _h() {
            return \'hw_\';
        };
        if (_h() == \'hw__\') {
            return _h();
        } else {
            return \'re\';
        }
    }
    var MU_ = function (MU__) {
        \'return MU_\';
        return MU__;
    };

    function jn_() {
        \'return jn_\';
        return \'s\';
    }

    function Dg_() {
        function _D() {
            return \'Dg__\';
        };
        if (_D() == \'Dg__\') {
            return \'pli\';
        } else {
            return _D();
        }
    }
    var iu_ = function () {
        \'iu_\';
        var _i = function () {
            return \'t\';
        };
        return _i();
    };
    var $style$ = nv_.createElement(\'style\');
    if (nv_.head) {
        nv_.head.appendChild($style$);
    } else {
        nv_.getElementsByTagName(\'head\')[0].appendChild($style$);
    }
    var $sheet$ = $style$.sheet;

    function ht_() {
        function _h() {
            return \'ht_\';
        };
        if (_h() == \'ht__\') {
            return _h();
        } else {
            return \'de\';
        }
    }

    function Sc_() {
        \'return Sc_\';
        return \'cod\';
    }

    function $RenderToHTML$() {
        $InsertRuleRun$();
    }
    var UU_ = \'C\';
    var gA_ = \'o\';

    function Qg_() {
        function _Q() {
            return \'mpo\';
        };
        if (_Q() == \'mpo\') {
            return \'mpo\';
        } else {
            return _Q();
        }
    }
    var Ec_ = \'nen\';
    var ZP_ = function () {
        \'ZP_\';
        var _Z = function () {
            return \'t\';
        };
        return _Z();
    };

    function aT_() {
        function _a() {
            return \'aT__\';
        };
        if (_a() == \'aT__\') {
            return \'har\';
        } else {
            return _a();
        }
    }

    function wF_() {
        function _w() {
            return \'At\';
        };
        if (_w() == \'At,\') {
            return \'wF_\';
        } else {
            return _w();
        }
    }
    var yd_ = $FillDicData$(\'aJ_\');

    function Xn_() {
        function _X() {
            return \'_;_\';
        };
        if (_X() == \'_;_\') {
            return \'_;_\';
        } else {
            return _X();
        }
    }

    function iJ_() {
        \'return iJ_\';
        return \';\';
    }

    function bN_() {
        \'return bN_\';
        return \'7\';
    }
    var vY_ = \';\';

    function PG_() {
        \'return PG_\';
        return \'_0\';
    }
    var FG_ = function () {
        \'return FG_\';
        return \'3\';
    };

    function uV_() {
        function _u() {
            return \'6\';
        };
        if (_u() == \'6\') {
            return \'6\';
        } else {
            return _u();
        }
    }
    var lI_ = function () {
        \'return lI_\';
        return \'3;7\';
    };
})(document);
The complete JS that replaces the pseudo-elements

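 6. Because the JS is obfuscated, the code you get may differ from the code shown in this article, but the structure is the same; look carefully and you will find the matching pieces.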

 

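Start parsing:

As the listing above shows, the script contains many functions and variables, each of which returns a fragment of text or a symbol. These scattered fragments are assembled into one complete data dictionary.

They fall roughly into the following kinds:

  Direct variable assignment, e.g.

 var mH_ = '例'

  A variable assigned a function; the value is the string after the last return, e.g.

 var lI_ = function() {
     'return lI_';
     return '3;7';
 };

  A function declaration; the value is obtained by calling it and is the string returned by whichever branch actually runs (below, _h() returns 'hw_', which does not equal 'hw__', so the result is 're'), e.g.

  function hw_() {
      function _h() {
          return 'hw_';
      };
      if (_h() == 'hw__') {
          return _h();
      } else {
          return 're';
      }
  }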
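To check what each kind of fragment resolves to, the three samples above can be evaluated directly. This is only a minimal sketch, assuming a standard JS engine such as Node.js; `resolveFragments` is a hypothetical helper name of my own, not something from the site's script.

```javascript
// Hypothetical helper (not part of the site's script): evaluates a snippet of
// declarations and returns the string that each listed name resolves to.
function resolveFragments(source, names) {
  const fields = names.map(
    (n) => JSON.stringify(n) + ": (typeof " + n + " === 'function' ? " + n + "() : " + n + ")"
  );
  // The snippet and a trailing "return {...}" become the body of one function.
  const body = source + "\nreturn {" + fields.join(",") + "};";
  return new Function(body)();
}

// The three sample fragments quoted above
const sample = [
  "var mH_ = '例';",
  "var lI_ = function () { 'return lI_'; return '3;7'; };",
  "function hw_() {",
  "  function _h() { return 'hw_'; };",
  "  if (_h() == 'hw__') { return _h(); } else { return 're'; }",
  "}",
].join("\n");

const values = resolveFragments(sample, ["mH_", "lI_", "hw_"]);
console.log(values); // { mH_: '例', lI_: '3;7', hw_: 're' }
```

Plain variables are read as they are and functions are called once, which is all the obfuscation requires for these three forms.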

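(I did consider running the JS directly from .NET, but it turned out that their JS contains errors, and neither MSScriptControl.ScriptControl nor JScript recognizes the (……)(document) form, so I had to grit my teeth and analyze it by hand. If you know of an engine that can run this kind of JS, please leave a comment with a recommendation; many thanks.)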

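As everyone knows, a function only runs when it is called. So where is the entry point? It is cleverly hidden here ↓

 var HH_ = $FillDicData$('iU_');

 (The names change with every refresh; in the listing above the same call reads var yd_ = $FillDicData$('aJ_');.)

 Execution then jumps into the $FillDicData$ function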

 

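Its first statement calls most of the variables and functions above and joins their results into the dictionary set ($ruleDict$ in the listing)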
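Even the method name that this statement looks up on $GetWindow$() is built from fragments. As a small worked example, the fragments below are copied from the listing in this article (your copy of the page will use different names) and joined in the same order as in $FillDicData$; `methodName` is just a variable name I picked for the sketch.

```javascript
// Fragments copied from the listing above
function ht_() {
  function _h() { return 'ht_'; }
  if (_h() == 'ht__') { return _h(); } else { return 'de'; }
}
function Sc_() { 'return Sc_'; return 'cod'; }
var UU_ = 'C';
var gA_ = 'o';
function Qg_() {
  function _Q() { return 'mpo'; }
  if (_Q() == 'mpo') { return 'mpo'; } else { return _Q(); }
}
var Ec_ = 'nen';
var ZP_ = function () { 'ZP_'; var _Z = function () { return 't'; }; return _Z(); };

// Same concatenation order as in $FillDicData$
const methodName = '' + ht_() + Sc_() +
  (function () { 'return vW_'; return 'e'; })() +
  (function () { 'return FC_'; return 'URI'; })() +
  UU_ + gA_ + Qg_() + Ec_ + ZP_();

console.log(methodName); // decodeURIComponent
```

So in this listing the dictionary text is simply passed through `decodeURIComponent` before it is stored.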

 

The statement right below it builds the index set ($rulePosList$ in the listing) in the same way

 

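This method picks characters out of the dictionary according to the index set. Note that this method is not obfuscated! You can find it directly by searching for its name.

 

"77,7" is "环保" (environmental protection): the characters at positions 77 and 7 of the dictionary string. This is how the pseudo-elements on the page get replaced.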

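The lookup just described can be sketched in a few lines. This is a minimal sketch under my own assumptions: `decodeEntries` is a hypothetical name, and the dictionary and index strings are toy data rather than values taken from the real page.

```javascript
// Split the index string on ';' into entries and each entry on ',' into
// positions, then pick the character at each position of the dictionary text.
function decodeEntries(dictText, posText) {
  return posText.split(';').map((entry) =>
    entry
      .split(',')
      .filter((s) => s !== '')
      .map((n) => dictText[Number(n)])
      .join('')
  );
}

// Toy data (not from the real page): positions 0..4 are 保 全 动 环 电
const dictText = '保全动环电';
const words = decodeEntries(dictText, '3,0;4,2;1');
console.log(words); // [ '环保', '电动', '全' ]
```

Each resulting word is one replacement: the n-th word goes into the n-th pseudo-element placeholder.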
 

With the analysis this far, the rest is not hard, so I will not go through it in detail. If anything is unclear, leave me a comment.

 

Fetching the data and filling in the dictionary, mimicking the JS analyzed above

        // Namespaces used: System, System.Collections, System.IO, System.Net,
        // System.Text, System.Text.RegularExpressions
        #region Get Autohome vehicle info
        /// <summary>
        /// Gets vehicle configuration data from Autohome.
        /// </summary>
        /// <param name="Parameter">Parameter (Autohome spec ID, or a URL)</param>
        /// <param name="Url">Whether Parameter is a URL</param>
        /// <param name="JsonKeyLink"></param>
        /// <param name="JsonConfig"></param>
        /// <param name="JsonOption"></param>
        /// <param name="JsonColor"></param>
        /// <param name="JsonInnerColor"></param>
        /// <param name="JsonBag"></param>
        /// <param name="ErrorMessage"></param>
        /// <returns></returns>
        public bool GetAutoHomeCarInfo(string Parameter, bool Url, ref string JsonKeyLink, ref string JsonConfig, ref string JsonOption, ref string JsonColor, ref string JsonInnerColor, ref string JsonBag, ref string ErrorMessage)
        {
            // URL mode is switched off here, so the Url branch of the ternary below is never taken
            if (Url) return false;
            #region
            try
            {
                // Parameter is the spec (car model) ID
                string strUrl = Url ? Parameter : "http://car.autohome.com.cn/config/spec/" + Parameter + ".html";
                HttpWebRequest webrequest = (HttpWebRequest)WebRequest.Create(strUrl);
                webrequest.AllowAutoRedirect = true;
                webrequest.Timeout = 30000;
                CookieContainer c = new CookieContainer();
                webrequest.CookieContainer = c;
                string strAllHTML;
                // Dispose the response and the reader once the page has been read
                using (HttpWebResponse response = (HttpWebResponse)webrequest.GetResponse())
                using (StreamReader read = new StreamReader(response.GetResponseStream(), Encoding.GetEncoding("utf-8")))
                {
                    strAllHTML = read.ReadToEnd();
                }

                #region Get the data dictionaries
                string[] KeyLink = null;
                string[] Configpl = null;
                string[] Optionpl = null;
                GetAutoHomeDictionary(strAllHTML, ref KeyLink, ref Configpl, ref Optionpl);
                #endregion

                // Find the script block that holds the configuration JSON
                MatchCollection carInfoMatches = Regex.Matches(strAllHTML, "<script type=\"text/javascript\">((?:.|\\n)*?)</script>");
                string strCarInfo = string.Empty;
                for (int i = 0; i < carInfoMatches.Count; i++)
                {
                    if (carInfoMatches[i].Result("$1").Trim().IndexOf("var option =") > 0) strCarInfo = carInfoMatches[i].Result("$1").Trim();
                }
                if (strCarInfo != string.Empty)
                {
                    // Map the start offset of each "var xxx =" block to its name
                    Hashtable htCarInfo = new Hashtable();
                    if (strCarInfo.IndexOf("var keyLink =") > -1) htCarInfo.Add(strCarInfo.IndexOf("var keyLink ="), "JsonKeyLink");
                    if (strCarInfo.IndexOf("var config =") > -1) htCarInfo.Add(strCarInfo.IndexOf("var config ="), "JsonConfig");
                    if (strCarInfo.IndexOf("var option =") > -1) htCarInfo.Add(strCarInfo.IndexOf("var option ="), "JsonOption");
                    if (strCarInfo.IndexOf("var color =") > -1) htCarInfo.Add(strCarInfo.IndexOf("var color ="), "JsonColor");
                    if (strCarInfo.IndexOf("var innerColor =") > -1) htCarInfo.Add(strCarInfo.IndexOf("var innerColor ="), "JsonInnerColor");
                    if (strCarInfo.IndexOf("var bag =") > -1) htCarInfo.Add(strCarInfo.IndexOf("var bag ="), "JsonBag");
                    ArrayList arrayList = new ArrayList(htCarInfo.Keys);
                    arrayList.Sort();
                    for (int i = 0; i < arrayList.Count; i++)
                    {
                        // Blocks that have no dictionary, and the parsing JS, must be filtered out
                        string JsonTemp = string.Empty;
                        if (i == arrayList.Count - 1)
                        {
                            // The last block is skipped. To parse it as well, remove the
                            // continue and restore the two statements below.
                            continue;
                            //JsonTemp = strCarInfo.Substring(int.Parse(arrayList[i].ToString()), strCarInfo.Length - int.Parse(arrayList[i].ToString()));
                            //JsonTemp = JsonTemp.Substring(0, JsonTemp.IndexOf("]}};")) + "]}};";
                        }
                        else
                        {
                            // A block runs from its own offset to the offset of the next block
                            JsonTemp = strCarInfo.Substring(int.Parse(arrayList[i].ToString()), int.Parse(arrayList[i + 1].ToString()) - int.Parse(arrayList[i].ToString()));
                        }
                        //if (JsonTemp.IndexOf("_baikeVJ") > 0)
                        if (Regex.IsMatch(JsonTemp, @"<span class='hs_kw.*?_baike\w{0,2}'></span>"))
                        {
                            // "_baike" plus its two-character suffix
                            string tmp = JsonTemp.Substring(JsonTemp.IndexOf("_baike") , 8);
                            for (int j = 0; j < KeyLink.Length; j++)
                            {
                                JsonTemp = JsonTemp.Replace("<span class='hs_kw" + j + tmp + "'></span>", KeyLink[j]);
                            }
                        }
                        if (Regex.IsMatch(JsonTemp, @"<span class='hs_kw.*?_config\w{0,2}'></span>"))
                        {
                            // "_config" plus its two-character suffix
                            string tmp = JsonTemp.Substring(JsonTemp.IndexOf("_config"), 9);
                            for (int j = 0; j < Configpl.Length; j++)
                            {
                                JsonTemp = JsonTemp.Replace("<span class='hs_kw" + j + tmp + "'></span>", Configpl[j]);
                            }
                        }
                        if (Regex.IsMatch(JsonTemp, @"<span class='hs_kw.*?_option\w{0,2}'></span>"))
                        {
                            // "_option" plus its two-character suffix
                            string tmp = JsonTemp.Substring(JsonTemp.IndexOf("_option"), 9);
                            for (int j = 0; j < Optionpl.Length; j++)
                            {
                                JsonTemp = JsonTemp.Replace("<span class='hs_kw" + j + tmp + "'></span>", Optionpl[j]);
                            }
                        }
                        switch (htCarInfo[arrayList[i]].ToString())
                        {
                            // Only the left-hand configuration column (keyLink) and the upper and lower
                            // configuration halves (config, option) are parsed here; adapt the rest as needed
                            case "JsonKeyLink":
                                JsonTemp = JsonTemp.Replace("var keyLink =", string.Empty).Replace(";", string.Empty).Trim();
                                JsonKeyLink = JsonTemp;
                                break;
                            case "JsonConfig":
                                JsonTemp = JsonTemp.Replace("var config =", string.Empty).Replace(";", string.Empty).Trim();
                                JsonConfig = JsonTemp;
                                break;
                            case "JsonOption":
                                JsonTemp = JsonTemp.Replace("var option =", string.Empty).Replace(";", string.Empty).Trim();
                                JsonOption = JsonTemp;
                                break;
                        }
                    }
                }
                return true;
            }
            catch (Exception Ex)
            {
                ErrorMessage = Ex.Message;
                return false;
            }
            #endregion
        }
        #endregion
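The placeholder replacement performed by the three loops above can also be written as a single regex replace. The sketch below is in JavaScript, and both the helper name `fillPlaceholders` and the sample input are made up for illustration. The class name format (hs_kw + index + _config + a two-character suffix) is inferred from the C# code above, so treat it as an assumption.

```javascript
// Hypothetical helper: replaces every <span class='hs_kw{index}_{kind}{suffix}'></span>
// placeholder with words[index]; placeholders with an unknown index are left untouched.
function fillPlaceholders(jsonText, kind, words) {
  const pattern = new RegExp("<span class='hs_kw(\\d+)_" + kind + "\\w{0,2}'></span>", "g");
  return jsonText.replace(pattern, (whole, index) => {
    const word = words[Number(index)];
    return word === undefined ? whole : word;
  });
}

// Toy input, not captured from the real page
const raw =
  "{\"name\":\"<span class='hs_kw1_configAb'></span>标准\"," +
  "\"value\":\"国<span class='hs_kw0_configAb'></span>\"}";
const filled = fillPlaceholders(raw, "config", ["五", "环保"]);
console.log(filled); // {"name":"环保标准","value":"国五"}
```

Capturing the index in the pattern means one pass over the text is enough, instead of one Replace call per dictionary entry.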

 

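Cracking the data dictionary is really just reproducing the JS parsing process analyzed above. It relies on a lot of regular expressions, each handling one of the differently formatted kinds of fragment.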

        // Namespaces used: System, System.Collections.Generic, System.Text,
        // System.Text.RegularExpressions
        /// <summary>
        /// Builds the data dictionaries.
        /// </summary>
        /// <param name="strAllHTML"></param>
        /// <param name="keyLink"></param>
        /// <param name="configpl"></param>
        /// <param name="optionpl"></param>
        public void GetAutoHomeDictionary(string strAllHTML, ref string[] keyLink, ref string[] configpl, ref string[] optionpl)
        {
            MatchCollection carInfoMatches = Regex.Matches(strAllHTML, "<script>((?:.|\\n)*?)</script>");
            List<string> matcheslist = new List<string>();
            foreach (var item in carInfoMatches)
            {
                // Keep only the large obfuscated scripts
                if (item.ToString().IndexOf("try{document.") < 0 && item.ToString().Length > 500)
                {
                    matcheslist.Add(item.ToString());
                }
            }
            for (int i = 0; i < matcheslist.Count; i++)
            {
                #region Build the character set
                // name -> text fragment
                Dictionary<string, string> dc = new Dictionary<string, string>();
                MatchCollection matchlist = Regex.Matches(matcheslist[i].Replace("})(document);</script>", " function"), @"function\s(\S){0,2}_\(\)\s*\{.*?\}.*?(?=function)");// extract function declarations
                for (int j = 0; j < matchlist.Count; j++)
                {
                    string str1 = string.Empty, str2 = string.Empty;
                    getStr(matchlist[j].Value, ref str1, ref str2);
                    dc.Add(str1, str2);
                }

                MatchCollection matchlist2 = Regex.Matches(matcheslist[i], @"var\s?\S\S_=\s?'\S*'");// extract directly assigned variables
                for (int j = 0; j < matchlist2.Count; j++)
                {
                    string str1 = string.Empty, str2 = string.Empty;
                    getStr2(matchlist2[j].Value, ref str1, ref str2);
                    dc.Add(str1, str2);
                }

                MatchCollection matchlist3 = Regex.Matches(matcheslist[i], @"var\s?\S\S_=\s?function\s?\(\)\s?\{.*?return.*?return.*?\}");// extract variables assigned a function
                for (int j = 0; j < matchlist3.Count; j++)
                {
                    string str1 = string.Empty, str2 = string.Empty;
                    getStr3(matchlist3[j].Value, ref str1, ref str2);
                    dc.Add(str1, str2);
                }

                // Take the dictionary expression out of $FillDicData$ and split it on '+'
                StringBuilder sb = new StringBuilder();
                string str = Regex.Match(matcheslist[i], @"function\s*\$FillDicData\$\s*\(\)\s*?{.*?\$RenderToHTML").Value;
                string tmp2 = str.Substring(str.IndexOf("$GetWindow$()"), str.IndexOf("$rulePosList$") - str.IndexOf("$GetWindow$()"));
                string tmp3 = tmp2.Substring(tmp2.IndexOf(']') + 1);
                string[] tmp4 = tmp3.Split('+');
                for (int j = 1; j < tmp4.Length - 1; j++)
                {
                    //if (Regex.IsMatch(tmp4[j], @"[\u4e00-\u9fbb]{1,5}"))
                    //{
                    //    sb.Append(Regex.Match(tmp4[j], @"[\u4e00-\u9fbb]{1,5}").ToString());
                    //}
                    if (Regex.IsMatch(tmp4[j], @"\(function\s{0,3}\(\)\{.*?return.*?return.*?\}\)"))
                    {
                        // Inline function without arguments: take the quoted text after the real return
                        var strtmp = Regex.Match(tmp4[j], @"\(function\s{0,3}\(\)\{.*?return.*?return.*?\}\)").Value;
                        var strtmp2 = Regex.Match(strtmp, "return.*?(.*?).*?return.*(.*?)").Value.Split(new string[] { "return" }, StringSplitOptions.RemoveEmptyEntries);
                        foreach (var item in strtmp2)
                        {
                            if (item.Split('\'').Length == 3) sb.Append(item.Split('\'')[1].Replace("'", "").Trim());
                        }
                    }
                    else if (Regex.IsMatch(tmp4[j], @"\('([A-Z]|[a-z]|[0-9]|[,]|[']|[;]|[\u4e00-\u9fbb]){1,10}'\)"))
                    {
                        // Call with a literal argument: the argument itself is the value
                        sb.Append(Regex.Match(tmp4[j], @"\('([A-Z]|[a-z]|[0-9]|[,]|[']|[;]|[\u4e00-\u9fbb]){1,10}(?='\))").ToString().Substring(2));
                    }
                    else if (Regex.IsMatch(tmp4[j], @"\(\)"))
                    {
                        // Call without arguments: look the name up in the fragment table
                        sb.Append(dc[tmp4[j].Replace("()", "")]);
                    }
                    else if (Regex.IsMatch(tmp4[j], @"'([A-Z]|[a-z]|[0-9]|[,]|[']|[;]|[\u4e00-\u9fbb]){1,10}'(?!\))"))
                    {
                        // Plain string literal
                        sb.Append(Regex.Match(tmp4[j], @"'([A-Z]|[a-z]|[0-9]|[,]|[']|[;]|[\u4e00-\u9fbb]){1,10}'").ToString().Replace("'",""));
                    }
                    else if (Regex.IsMatch(tmp4[j], @"\S{3}"))
                    {
                        // Bare variable name
                        sb.Append(dc[tmp4[j]]);
                    }
                    else
                    {
                        // Unrecognized token: keep a marker so that the positions stay aligned
                        sb.Append("X");
                    }
                }
                #endregion

                #region Get the indexes
                string tmp11 = str.Substring(str.IndexOf("$rulePosList$"));
                string tmp12 = tmp11.Substring(0, tmp11.IndexOf("$SystemFunction2$"));
                StringBuilder sb2 = new StringBuilder();
                string[] tmp13 = tmp12.Split('+');
                tmp13[tmp13.Length - 1] = tmp13[tmp13.Length - 1].Replace("),", "");
                for (int j = 1; j < tmp13.Length; j++)
                {
                    if (Regex.IsMatch(tmp13[j], @"\('([A-Z]|[a-z]|[0-9]|[,]|[']|[;]|[\u4e00-\u9fbb]){1,10}'\)"))
                    {
                        sb2.Append(Regex.Match(tmp13[j], @"\('([A-Z]|[a-z]|[0-9]|[,]|[']|[;]|[\u4e00-\u9fbb]){1,10}(?='\))").ToString().Substring(2));
                    }
                    else if (Regex.IsMatch(tmp13[j], @"return\s{0,2}'([0-9]|[,]|[;]){1,10}'"))
                    {
                        var tmp = Regex.Match(tmp13[j], @"return\s{0,2}'([0-9]|[,]|[;]){1,10}'").Value.ToLower().Replace("return", "").Replace("'", "").Trim();
                        sb2.Append(tmp);
                    }
                    else if (Regex.IsMatch(tmp13[j], @"\(\)"))
                    {
                        tmp13[j] = tmp13[j].Substring(0, tmp13[j].IndexOf("()") + 2);
                        sb2.Append(dc[tmp13[j].Replace("()", "")]);
                    }
                    else if (Regex.IsMatch(tmp13[j], @"\S{3}") && tmp13[j].IndexOf("'") < 0)
                    {
                        sb2.Append(dc[tmp13[j]]);
                    }
                    else if (tmp13[j].Split(new string[] { "'" }, StringSplitOptions.None).Length > 2)
                    {
                        sb2.Append(tmp13[j].Replace("'", "").Trim());
                    }
                    else if (tmp13[j].Trim() == "''")
                    {
                        continue;
                    }
                    else
                    {
                        sb2.Append("X");
                    }
                }

                #endregion

                #region Build the dictionary
                List<string> list = new List<string>();
                // Entries are separated by ';', positions within an entry by ','
                foreach (var item in sb2.ToString().Split(';'))
                {
                    var numlist = item.Split(new string[] { "," }, StringSplitOptions.RemoveEmptyEntries);
                    StringBuilder sbresult = new StringBuilder();
                    foreach (var num in numlist)
                    {
                        // The original called Cvt.ToInt32, a helper not shown in this article;
                        // Convert.ToInt32 is assumed to behave the same for these inputs
                        var tmpstr = sb.ToString()[Convert.ToInt32(num)];
                        sbresult.Append(tmpstr);
                    }
                    list.Add(sbresult.ToString());
                }

                #endregion

                // The scripts appear in the order keyLink, config, option
                if (i == 0) keyLink = list.ToArray();
                else if (i == 1) configpl = list.ToArray();
                else if (i == 2) optionpl = list.ToArray();
            }
        }
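The chain of regex branches above is easier to follow as a small token resolver. The sketch below is JavaScript, and it is not a port of the C# code: `resolveToken` is a hypothetical name and the regular expressions are my own simplifications. Like the C# version, it assumes that functions called with an argument simply return that argument, as the ones in the listing do.

```javascript
// Resolves one '+'-separated token of the dictionary expression to its text.
// dict maps fragment names to text and is assumed to have been built beforehand.
function resolveToken(token, dict) {
  const t = token.trim();
  let m;
  // (function () { 'label'; return 'text' })()  -> the text after the real return
  m = t.match(/^\(function\s*\(\)\s*\{[\s\S]*return\s*'([^']*)'\s*;?\s*\}\)\(\)$/);
  if (m) return m[1];
  // name('text') or (function (a) { ...; return a; })('text')  -> the argument
  m = t.match(/\('([^']*)'\)$/);
  if (m) return m[1];
  // name()  -> look the name up
  m = t.match(/^([\w$]+)\(\)$/);
  if (m) return dict[m[1]];
  // 'text'  -> plain literal
  m = t.match(/^'([^']*)'$/);
  if (m) return m[1];
  // bare variable name
  return dict[t];
}

// Toy table and a shortened expression in the style of the listing
const dict = { pk_: 'S', xN_: '元全准' };
const expr =
  "'' + pk_() + (function () { 'return KF_'; return '体供保' })() + xN_ + '列制' + " +
  "vO_('吸商') + (function (eL__) { 'return eL_'; return eL__; })('环电')";
const text = expr.split('+').map((t) => resolveToken(t, dict)).join('');
console.log(text); // S体供保元全准列制吸商环电
```

Splitting on '+' only works because none of the fragments contains a plus sign, the same assumption the C# code makes.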

 

 

 

        // Namespaces used: System, System.Linq, System.Text.RegularExpressions
        /// <summary>
        /// Formats the string: extracts the name and the value of one fragment.
        /// </summary>
        /// <param name="str"></param>
        /// <param name="resultKey"></param>
        /// <param name="resultValue"></param>
        public void getStr(string str, ref string resultKey, ref string resultValue)
        {
            // Cut off anything from the next "var" onward
            if (str.IndexOf("var") > 0)
            {
                str = str.Substring(0, str.IndexOf("var"));
            }
            resultKey = str.Split(new string[] { "()" }, StringSplitOptions.RemoveEmptyEntries).FirstOrDefault().Replace("function", "").Trim();
            // Run the declaration followed by a call to it; the result is the value
            resultValue = JSHelper.ExecJs(str + " " + resultKey + "();").ToString();
        }
        public void getStr2(string str, ref string resultKey, ref string resultValue)
        {
            // var XX_='text'  ->  key XX_, value text
            string[] str2 = str.Replace("var", "").Replace("'", "").Trim().Split('=');
            resultKey = str2[0];
            resultValue = str2[1];
        }
        public void getStr3(string str, ref string resultKey, ref string resultValue)
        {
            //var AC_=function(){'AC_';var _A=function(){return '格';}; return _A();}
            string[] str2 = str.Replace("var", "").Trim().Split('=');
            resultKey = str2[0];
            if (str.Split(new string[] { "function" }, StringSplitOptions.None).Length > 2)
            {
                // Nested form: the value is the quoted text after the inner return
                string str3 = Regex.Match(str, @"var\s?\S\S_=\s?function\s?\(\S{0,5}\)\s?\{.*?return.*?\}").Value;// extract the assigned function
                string str4 = str3.Substring(str3.IndexOf("return") + 6);
                string[] str5 = str4.Split(new string[] { "'" }, StringSplitOptions.None);
                resultValue = str5[1];
            }
            else
            {
                // Flat form: the value is the quoted text after the last return
                string str3 = str2[str2.Length - 1].Substring(str2[str2.Length - 1].LastIndexOf("return"));
                string[] str4 = str3.Split('\'');
                resultValue = str4[1];
            }
        }
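For comparison, the two simplest fragment kinds can be read statically, without executing anything. This is a minimal JavaScript sketch: `parseVarFragments` is a hypothetical name, and the patterns are mine, written for the formatted listing rather than the minified page. The nested form (a function that defines and calls an inner function) is deliberately not handled here.

```javascript
// Collects name -> text for "var X = 'text'" and for
// "var X = function () { 'label'; return 'text'; }".
function parseVarFragments(src) {
  const out = {};
  // Direct assignment
  for (const m of src.matchAll(/var\s+(\w+)\s*=\s*'([^']*)'/g)) {
    out[m[1]] = m[2];
  }
  // Function with a decoy string statement followed by the real return
  const fn = /var\s+(\w+)\s*=\s*function\s*\(\)\s*\{\s*'[^']*';\s*return\s*'([^']*)';?\s*\}/g;
  for (const m of src.matchAll(fn)) {
    out[m[1]] = m[2];
  }
  return out;
}

// Fragments copied from the listing in this article
const source = [
  "var Lw_ = 'on';",
  "var rI_ = function () {",
  "    'return rI_';",
  "    return 'h';",
  "};",
  "var Tb_ = function () { 'Tb_'; var _T = function () { return 'd'; }; return _T(); };",
].join("\n");

const fragments = parseVarFragments(source);
console.log(fragments); // { Lw_: 'on', rI_: 'h' }
```

Tb_ is missing from the result because it uses the nested form, which needs either a third pattern or actual execution.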

 

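Part of the parsing in this article hands fragments straight to a JS engine for execution. This crack was written fairly early and uses JScript; these days I recommend MSScriptControl.ScriptControl instead, which is a COM component.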

using Microsoft.JScript;
using Microsoft.JScript.Vsa;
using System;
using System.CodeDom.Compiler;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Text;
using System.Threading.Tasks;

namespace library
{
    public static class JSHelper
    {
        static VsaEngine Engine = VsaEngine.CreateEngine();
        public static object ExecJs(string str)
        {
            return EvalJScript(str);
        }
        public static object EvalJScript(string JScript)
        {
            object Result = null;
            try
            {
                Result = Microsoft.JScript.Eval.JScriptEvaluate(JScript, Engine);
            }
            catch (Exception ex)
            {
                // Note: on failure the error message is returned in place of a result
                return ex.Message;
            }
            return Result;

        }
    }
}

 

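A slightly more complex crawler like this one really trains your analytical skill and patience, which I consider a very important ability for a developer. For a community like .NET, where the entry barrier is low and skill levels are distributed like a pyramid, we really do need to dig into the technology.

If anything is unclear, or if you have a better suggestion, feel free to leave a comment.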