Support Vector Regression

Date: 2024-03-31 11:00:19

We now turn to the regression problem for support vector machines.

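Given a sample $(\bm{x},y)$, a conventional regression model computes the loss directly from the difference between the model output $f(\bm{x})$ and the true output $y$; the loss is zero if and only if $f(\bm{x})$ and $y$ are exactly equal. In contrast, support vector regression (SVR) tolerates a deviation of at most $\epsilon$ between $f(\bm{x})$ and $y$: a loss is incurred only when $|f(\bm{x}) - y| > \epsilon$.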

  • The SVR problem can then be formalized as
    $$\min \limits_{\bm{w},b} \frac{1}{2}\|\bm{w}\|^2 + C\sum_{i=1}^{m}\ell_{\epsilon}(f(\bm{x}_i)-y_i)$$

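where $f(\bm{x}) = \bm{w}^T\bm{x} + b$ is the regression function, $C$ is the regularization constant, and $\ell_{\epsilon}$ is the $\epsilon$-insensitive loss function:
$$\ell_{\epsilon}(z)= \begin{cases} 0, & \text{if } |z| \le \epsilon\\ |z| - \epsilon, & \text{otherwise} \end{cases}$$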
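The loss and the objective above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names `eps_insensitive_loss` and `svr_objective` and the three-point toy data set are hypothetical and do not come from the original text.

```python
def eps_insensitive_loss(z, eps):
    """Return 0 when |z| <= eps, otherwise |z| - eps."""
    return max(0.0, abs(z) - eps)


def svr_objective(w, b, xs, ys, C, eps):
    """Regularized SVR objective for the linear model f(x) = w.x + b."""
    reg = 0.5 * sum(wk * wk for wk in w)
    loss = 0.0
    for x, y in zip(xs, ys):
        f_x = sum(wk * xk for wk, xk in zip(w, x)) + b
        loss += eps_insensitive_loss(f_x - y, eps)
    return reg + C * loss


# Hypothetical toy data: three points on the line y = x.
xs = [[0.0], [1.0], [2.0]]
ys = [0.0, 1.0, 2.0]

# Every residual is within eps = 0.5, so only the regularizer contributes.
value = svr_objective([0.5], 0.5, xs, ys, C=1.0, eps=0.5)
print(value)  # 0.125
```

Residuals inside the tube cost nothing, so the flatter model (smaller $\|\bm{w}\|$) is preferred as long as every prediction stays within $\epsilon$ of its target.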

  • Introducing slack variables $\xi_i$ and $\widehat{\xi}_i$, the problem can be rewritten as:
    $$\begin{aligned} \min \limits_{\bm{w},b,\xi_i,\widehat{\xi}_i} \quad & \frac{1}{2}\|\bm{w}\|^2 + C\sum_{i=1}^{m}(\xi_i+\widehat{\xi}_i)\\ \text{s.t.} \quad & f(\bm{x}_i) - y_i \le \epsilon + \xi_i,\\ & y_i - f(\bm{x}_i) \le \epsilon + \widehat{\xi}_i,\\ & \xi_i \ge 0,\ \widehat{\xi}_i \ge 0,\ i=1,2,\ldots,m. \end{aligned}$$
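To see why the slack formulation matches the $\epsilon$-insensitive loss, the following sketch computes, for a fixed prediction, the smallest slacks that satisfy the constraints. The helper name `minimal_slacks` is a hypothetical choice for illustration.

```python
def minimal_slacks(f_x, y, eps):
    """Smallest (xi, xi_hat) satisfying the two inequality constraints.

    xi     covers the case where the prediction exceeds y by more than eps.
    xi_hat covers the case where the prediction falls short of y by more than eps.
    """
    xi = max(0.0, f_x - y - eps)
    xi_hat = max(0.0, y - f_x - eps)
    return xi, xi_hat


# Prediction too high, too low, and inside the tube (eps = 0.5).
print(minimal_slacks(3.0, 1.0, 0.5))  # (1.5, 0.0)
print(minimal_slacks(1.0, 3.0, 0.5))  # (0.0, 1.5)
print(minimal_slacks(1.2, 1.0, 0.5))  # (0.0, 0.0)
```

For $\epsilon > 0$ at most one of the two slacks is positive, and their sum equals $\ell_{\epsilon}(f(\bm{x}_i) - y_i)$, which is why minimizing over the slacks recovers the original objective.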

[Figure: the $\epsilon$-insensitive loss function]

  • Introducing Lagrange multipliers $\mu_i \ge 0$, $\widehat{\mu}_i \ge 0$, $\alpha_i \ge 0$, $\widehat{\alpha}_i \ge 0$, the Lagrangian is:
    $$\begin{aligned} L(\bm{w},b,\bm{\alpha},\widehat{\bm{\alpha}},\bm{\xi},\widehat{\bm{\xi}},\bm{\mu},\widehat{\bm{\mu}}) = {} & \frac{1}{2}\|\bm{w}\|^2 + C\sum_{i=1}^{m}(\xi_i+\widehat{\xi}_i) - \sum_{i=1}^{m}\mu_i\xi_i - \sum_{i=1}^{m}\widehat{\mu}_i\widehat{\xi}_i \\ & + \sum_{i=1}^{m}\alpha_i (f(\bm{x}_i)-y_i-\epsilon-\xi_i) + \sum_{i=1}^{m}\widehat{\alpha}_i (y_i - f(\bm{x}_i)-\epsilon-\widehat{\xi}_i) \end{aligned}$$
  • Setting the partial derivatives of $L$ with respect to $\bm{w},b,\xi_i,\widehat{\xi}_i$ to zero gives
    $$\begin{aligned} \bm{w} &= \sum_{i=1}^{m}(\widehat{\alpha}_i - \alpha_i)\bm{x}_i,\\ 0 &= \sum_{i=1}^{m}(\widehat{\alpha}_i - \alpha_i),\\ C &= \alpha_i + \mu_i,\\ C &= \widehat{\alpha}_i + \widehat{\mu}_i \end{aligned}$$
  • Substituting these back into the Lagrangian yields the dual problem:
    $$\begin{aligned} \max \limits_{\bm{\alpha}, \widehat{\bm{\alpha}}} \quad & \sum_{i=1}^{m}\left[y_i(\widehat{\alpha}_i - \alpha_i)-\epsilon(\widehat{\alpha}_i + \alpha_i)\right] - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}(\widehat{\alpha}_i - \alpha_i)(\widehat{\alpha}_j - \alpha_j)\bm{x}_i^T\bm{x}_j \\ \text{s.t.} \quad & \sum_{i=1}^{m}(\widehat{\alpha}_i - \alpha_i) = 0,\\ & 0 \le \alpha_i,\widehat{\alpha}_i \le C. \end{aligned}$$
  • The derivation above requires the KKT conditions to hold, namely
    $$\begin{cases} \alpha_i (f(\bm{x}_i)-y_i-\epsilon-\xi_i) =0\\ \widehat{\alpha}_i (y_i - f(\bm{x}_i)-\epsilon-\widehat{\xi}_i)=0\\ \alpha_i\widehat{\alpha}_i=0\\ \xi_i\widehat{\xi}_i = 0\\ (C-\alpha_i)\xi_i = 0\\ (C-\widehat{\alpha}_i)\widehat{\xi}_i = 0 \end{cases}$$

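From these conditions, $\alpha_i$ can be nonzero if and only if $f(\bm{x}_i)-y_i-\epsilon-\xi_i=0$, and $\widehat{\alpha}_i$ can be nonzero if and only if $y_i - f(\bm{x}_i)-\epsilon-\widehat{\xi}_i=0$. In other words, $\alpha_i$ and $\widehat{\alpha}_i$ can be nonzero only when the sample $(\bm{x}_i,y_i)$ does not fall inside the $\epsilon$-tube. Moreover, the constraints $f(\bm{x}_i)-y_i-\epsilon-\xi_i =0$ and $y_i - f(\bm{x}_i)-\epsilon-\widehat{\xi}_i=0$ cannot hold simultaneously: adding them would give $-2\epsilon-\xi_i-\widehat{\xi}_i=0$, which is impossible for $\epsilon > 0$ and nonnegative slacks. Hence at least one of $\alpha_i,\widehat{\alpha}_i$ is zero.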
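As a sanity check on the dual problem and its constraints, the sketch below evaluates the dual objective on a small hand-worked example. Everything here is an assumption made for illustration: the function names, the three-point data set on the line $y = x$, and the multipliers, which were derived by hand for $\epsilon = 0.5$ and $C = 1$ (giving $w = 0.5$, $b = 0.5$) rather than produced by a solver.

```python
def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def dual_objective(alpha, alpha_hat, xs, ys, eps):
    """Value of the SVR dual objective for a linear kernel."""
    beta = [ah - a for a, ah in zip(alpha, alpha_hat)]  # alpha_hat_i - alpha_i
    linear = sum(y * bk for y, bk in zip(ys, beta))
    linear -= eps * sum(a + ah for a, ah in zip(alpha, alpha_hat))
    quad = 0.0
    for bi, xi in zip(beta, xs):
        for bj, xj in zip(beta, xs):
            quad += bi * bj * dot(xi, xj)
    return linear - 0.5 * quad


def is_dual_feasible(alpha, alpha_hat, C, tol=1e-12):
    """Check the equality constraint and the box constraints 0 <= alpha <= C."""
    balanced = abs(sum(ah - a for a, ah in zip(alpha, alpha_hat))) <= tol
    boxed = all(0.0 <= v <= C for v in list(alpha) + list(alpha_hat))
    return balanced and boxed


# Hypothetical toy problem with hand-derived multipliers.
xs = [[0.0], [1.0], [2.0]]
ys = [0.0, 1.0, 2.0]
alpha = [0.25, 0.0, 0.0]
alpha_hat = [0.0, 0.0, 0.25]

print(is_dual_feasible(alpha, alpha_hat, C=1.0))            # True
print(all(a * ah == 0.0 for a, ah in zip(alpha, alpha_hat)))  # True
print(dual_objective(alpha, alpha_hat, xs, ys, eps=0.5))      # 0.125
```

In this example the dual value 0.125 equals the primal value $\frac{1}{2}w^2$ with zero slack, so the feasible pair is optimal for the toy problem. The middle sample lies strictly inside the tube and has both multipliers equal to zero.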

  • The SVR solution has the form:
    $$f(\bm{x}) = \sum_{i=1}^{m}(\widehat{\alpha}_i - \alpha_i)\bm{x}_i^T\bm{x} + b$$

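Samples that fall strictly inside the $\epsilon$-tube all satisfy $\widehat{\alpha}_i = 0$ and $\alpha_i = 0$. The samples with $(\widehat{\alpha}_i - \alpha_i) \ne 0$ are the support vectors of the SVR; they necessarily lie on the boundary of the $\epsilon$-tube or outside it.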
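The prediction rule and the definition of support vectors can be sketched as follows. The function names and the toy values are hypothetical; the multipliers are hand-derived for three points on $y = x$ with $\epsilon = 0.5$, $C = 1$, not the output of a solver.

```python
def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def predict_linear(x, xs, alpha, alpha_hat, b):
    """f(x) = sum_i (alpha_hat_i - alpha_i) * x_i^T x + b."""
    return sum((ah - a) * dot(xi, x)
               for xi, a, ah in zip(xs, alpha, alpha_hat)) + b


def support_vector_indices(alpha, alpha_hat, tol=1e-12):
    """Indices i with (alpha_hat_i - alpha_i) != 0."""
    return [i for i, (a, ah) in enumerate(zip(alpha, alpha_hat))
            if abs(ah - a) > tol]


# Hypothetical toy problem with hand-derived multipliers.
xs = [[0.0], [1.0], [2.0]]
alpha = [0.25, 0.0, 0.0]
alpha_hat = [0.0, 0.0, 0.25]
b = 0.5

print(support_vector_indices(alpha, alpha_hat))          # [0, 2]
print(predict_linear([1.0], xs, alpha, alpha_hat, b))    # 1.0
```

Only the support vectors contribute to the sum, so the middle sample could be removed from the training set without changing the predictions of this toy model.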

  • By the KKT conditions, every sample $(\bm{x}_i,y_i)$ satisfies $(C-\alpha_i)\xi_i = 0$ and $\alpha_i (f(\bm{x}_i)-y_i-\epsilon-\xi_i) =0$. Hence, once $\alpha_i$ has been obtained, if $0 < \alpha_i < C$ then necessarily $\xi_i = 0$, and therefore:
    $$b = y_i + \epsilon - \sum_{j=1}^{m}(\widehat{\alpha}_j - \alpha_j)\bm{x}_j^T\bm{x}_i$$

In practice, $b$ can be computed from several samples satisfying $0 < \alpha_i < C$ and the results averaged.
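The averaging procedure for $b$ might look like the sketch below. The function name `estimate_bias` and the toy data are hypothetical. As an addition not stated in the text, the sketch also uses samples with $0 < \widehat{\alpha}_i < C$: by the same argument applied to the second KKT condition, these give $b = y_i - \epsilon - \sum_{j}(\widehat{\alpha}_j - \alpha_j)\bm{x}_j^T\bm{x}_i$.

```python
def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def estimate_bias(xs, ys, alpha, alpha_hat, C, eps, tol=1e-12):
    """Average b over samples whose multiplier lies strictly between 0 and C."""

    def w_dot(x):
        # sum_j (alpha_hat_j - alpha_j) * x_j^T x
        return sum((ah - a) * dot(xj, x)
                   for xj, a, ah in zip(xs, alpha, alpha_hat))

    candidates = []
    for x_i, y_i, a, ah in zip(xs, ys, alpha, alpha_hat):
        if tol < a < C - tol:
            # 0 < alpha_i < C implies xi_i = 0 and f(x_i) - y_i = eps.
            candidates.append(y_i + eps - w_dot(x_i))
        if tol < ah < C - tol:
            # Assumption beyond the text: the symmetric case for alpha_hat_i.
            candidates.append(y_i - eps - w_dot(x_i))
    if not candidates:
        raise ValueError("no sample with a multiplier strictly inside (0, C)")
    return sum(candidates) / len(candidates)


# Hypothetical toy problem with hand-derived multipliers (eps = 0.5, C = 1).
xs = [[0.0], [1.0], [2.0]]
ys = [0.0, 1.0, 2.0]
alpha = [0.25, 0.0, 0.0]
alpha_hat = [0.0, 0.0, 0.25]

print(estimate_bias(xs, ys, alpha, alpha_hat, C=1.0, eps=0.5))  # 0.5
```

Averaging over several eligible samples is a common way to reduce the effect of numerical error in the multipliers returned by a solver.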

  • With a feature mapping $\phi$, the weight vector becomes:
    $$\bm{w} = \sum_{i=1}^{m}(\widehat{\alpha}_i - \alpha_i)\phi(\bm{x}_i)$$
  • The SVR can then be written as:
    $$f(\bm{x}) = \sum_{i=1}^{m}(\widehat{\alpha}_i - \alpha_i)k(\bm{x},\bm{x}_i) + b$$

where $k(\bm{x},\bm{x}_i) = \phi(\bm{x})^T\phi(\bm{x}_i)$ is the kernel function.
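The kernel form can be sketched by passing the kernel as a function argument. The names `linear_kernel`, `rbf_kernel`, and `predict_kernel`, the Gaussian (RBF) kernel itself, and its `gamma` parameter are assumptions chosen for illustration; the text does not name a specific kernel. The multipliers are the hand-derived values for a linear toy problem, so the RBF call only demonstrates the mechanics and is not a trained RBF model.

```python
import math


def linear_kernel(x, z):
    """k(x, z) = x^T z, i.e. phi is the identity map."""
    return sum(p * q for p, q in zip(x, z))


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian kernel exp(-gamma * ||x - z||^2); gamma is a hypothetical default."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(x, z)))


def predict_kernel(x, xs, alpha, alpha_hat, b, kernel):
    """f(x) = sum_i (alpha_hat_i - alpha_i) * k(x, x_i) + b."""
    return sum((ah - a) * kernel(x, xi)
               for xi, a, ah in zip(xs, alpha, alpha_hat)) + b


# Hypothetical multipliers, hand-derived for a linear toy problem.
xs = [[0.0], [1.0], [2.0]]
alpha = [0.25, 0.0, 0.0]
alpha_hat = [0.0, 0.0, 0.25]
b = 0.5

# With the linear kernel this reduces to the linear SVR prediction.
print(predict_kernel([1.0], xs, alpha, alpha_hat, b, linear_kernel))  # 1.0

# Swapping the kernel changes the model without touching the prediction code.
print(predict_kernel([2.0], xs, alpha, alpha_hat, b, rbf_kernel))
```

Because the prediction depends on the data only through $k(\bm{x},\bm{x}_i)$, the mapping $\phi$ never has to be computed explicitly.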