详解支持向量机-硬间隔SVM-模型求解-引出对偶问题&引出KKT条件【白板推导系列笔记】

时间:2022-10-17 10:58:38

$$

\left{\begin{aligned}&\mathop{\text{min }}\limits_{\omega,b} \frac{1}{2}\omega^{T}\omega\&s.t.y_{i}(\omega^{T}x_{i}+b)\geq 1\Leftrightarrow 1-y_{i}(\omega^{T}x_{i}+b)\leq 0,\underbrace{i=1,2,\cdots,N}_{N个约束}\end{aligned}\right.

$$

构建拉格朗日函数

$$

L(\omega,b,\lambda)=\frac{1}{2}\omega^{T}\omega+\sum\limits_{i=1}^{N}\lambda_{i}[1-y_{i}(\omega^{T}x_{i}+b)]

$$

注意这里$L$括号里面的$\lambda$为$N \times 1$,等号右边的$\lambda_{i}$为$1 \times 1$

 

拉格朗日乘子法具体后面文章会解释

 

例如本题,如果$1-y_{i}(\omega^{T}x_{i}+b)>0$,

$$

\mathop{\text{max }}\limits_{\lambda}L(\lambda,\omega,b)=\frac{1}{2}\omega^{T}\omega+ \infty=\infty

$$

如果$1-y_{i}(\omega^{T}x_{i}+b)\leq 0$

$$

\mathop{\text{max }}\limits_{\lambda}L(\lambda,\omega,b)=\frac{1}{2}\omega^{T}\omega+0=\frac{1}{2}\omega^{T}\omega

$$

因此有

$$

\mathop{\text{min }}\limits_{\omega,b}\mathop{\text{max }}\limits_{\lambda}L(\lambda,\omega,b)=\mathop{\text{min }}\limits_{\omega,b}(\infty, \frac{1}{2}\omega^{T}\omega)=\mathop{\text{min }}\limits_{\omega,b} \frac{1}{2}\omega^{T}\omega

$$

因此该问题的无约束形式为

$$

\mathop{\text{min }}\limits_{\omega,b}\mathop{\text{max }}\limits_{\lambda}L(\omega,b,\lambda),s.t.\lambda_{i}\geq 0

$$

 

这里的有无约束指的是对$\omega$的约束(这里的$\omega$相当于模板中的$x$)。本来$1\Leftrightarrow 1-y_{i}(\omega^{T}x_{i}+b)\leq 0$是对$\omega$的约束,通过拉格朗日函数将约束条件转化为$\lambda_{i}\geq 0$,是对$\lambda_{i}$的约束,不再是对$\omega$的约束,因此称为无约束形式

 

由于不等式约束是仿射函数,对偶问题和原问题等价,因此该问题的对偶形式为

$$

\mathop{\text{max }}\limits_{\lambda}\mathop{\text{min }}\limits_{\omega,b}L(\omega,b,\lambda)s.t.\lambda_{i}\geq 0

$$

先看$\mathop{\text{min }}\limits_{\omega,b}L(\omega,b,\lambda)$,对于$b$

$$

\begin{aligned}

\frac{\partial L}{\partial b}&=\frac{\partial }{\partial b}\left[\sum\limits_{i=1}^{N}\lambda_{i}-\sum\limits_{i=1}^{N}\lambda_{i}y_{i}(\omega^{T}x_{i}+b)\right]\

&=\frac{\partial }{\partial b}\left(-\sum\limits_{i=1}^{N}\lambda_{i}y_{i}b\right)\

&=-\sum\limits_{i=1}^{N}\lambda_{i}y_{i}=0

\end{aligned}

$$

将其代入$L(\omega,b,\lambda)$

$$

\begin{aligned}

L(\omega,b,\lambda)&=\frac{1}{2}\omega^{T}\omega+\sum\limits_{i=1}^{N}\lambda_{i}-\sum\limits_{i=1}^{N}\lambda_{i}y_{i}(\omega^{T}x_{i}+b)\

&=\frac{1}{2}\omega^{T}\omega+\sum\limits_{i=1}^{N}\lambda_{i}-\sum\limits_{i=1}^{N}\lambda_{i}y_{i}\omega^{T}x_{i}+\sum\limits_{i=1}^{N}\lambda_{i}y_{i}b\

&=\frac{1}{2}\omega^{T}\omega+\sum\limits_{i=1}^{N}\lambda_{i}-\sum\limits_{i=1}^{N}\lambda_{i}y_{i}\omega^{T}x_{i}

\end{aligned}

$$

对于$\omega$

$$

\begin{aligned}

\frac{\partial L}{\partial \omega}&=\frac{1}{2} \cdot 2\omega- \sum\limits_{i=1}^{N}\lambda_{i}y_{i}x_{i}=0\

\omega&=\sum\limits_{i=1}^{N}\lambda_{i}y_{i}x_{i}

\end{aligned}

$$

将其代入$L(\omega,b,\lambda)$

$$

\begin{aligned}

L(\omega,b,\lambda)&=\frac{1}{2}\underbrace{\left(\sum\limits_{i=1}^{N}\lambda_{i}y_{i}x_{i}\right)^{T}\left(\sum\limits_{j=1}^{N}\lambda_{j}y_{j}x_{j}\right)}{\in \mathbb{R}}-\underbrace{\sum\limits{i=1}^{N}\lambda_{i}y_{i}\left(\sum\limits_{j=1}^{N}\lambda_{j}y_{j}x_{j}\right)^{T}x_{i}}{\in \mathbb{R}}+\sum\limits{i=1}^{N}\lambda_{i}\

&=- \frac{1}{2}\sum\limits_{i=1}^{N}\sum\limits_{j=1}^{N}\lambda_{i}\lambda_{j}y_{i}y_{j}x_{i}^{T}x_{j}+\sum\limits_{i=1}^{N}\lambda_{i}

\end{aligned}

$$

因此原问题转化为

$$

\mathop{\text{max }}\limits_{\lambda}- \frac{1}{2}\sum\limits_{i=1}^{N}\sum\limits_{j=1}^{N}\lambda_{i}\lambda_{j}y_{i}y_{j}x_{i}x_{j}+\sum\limits_{i=1}^{N}\lambda_{i},s.t.\lambda_{i}\geq 0,\sum\limits_{i=1}^{N}\lambda_{i}y_{i}=0

$$

定义该优化问题的KKT条件(由于原、对偶问题具有强对偶关系$\Leftrightarrow$满足KKT条件)

$$

\left{\begin{aligned}&\frac{\partial L}{\partial \omega}=0,\frac{\partial L}{\partial b}=0\&\lambda_{i}[1-y_{i}(\omega^{T}x_{i}+b)]=0\&\lambda_{i}\geq 0\&1-y_{i}(\omega^{T}x_{i}+b)=0\end{aligned}\right.

$$

其中$\lambda_{i}[1-y_{i}(\omega^{T}x_{i}+b)]=0$叫做互补松弛条件,因为对于支持向量$y_{i}(\omega^{T}x_{i}+b)= 1$,对于其他数据点$\lambda_{i}=0$(根据拉格朗日函数的定义),即二者中一定至少有一个为$0$

在之前的$\begin{aligned} \frac{\partial L}{\partial \omega}\end{aligned}$中我们可以得到

$$

\omega ^{*}=\sum\limits_{i=1}^{N}\lambda_{i}y_{i}x_{i}

$$

 

其中$\lambda_{i}$通过求解$\begin{aligned} \mathop{\text{max }}\limits_{\lambda}- \frac{1}{2}\sum\limits_{i=1}^{N}\sum\limits_{j=1}^{N}\lambda_{i}\lambda_{j}y_{i}y_{j}x_{i}x_{j}+\sum\limits_{i=1}^{N}\lambda_{i},s.t.\lambda_{i}\geq 0,\sum\limits_{i=1}^{N}\lambda_{i}y_{i}=0\end{aligned}$可以得到

有看到怎么求,但是还是看不懂,反正就是能得到$\lambda_{i}$就对了

 

对于$b ^{*}$,我们假设

$$

\exists (x_{k},y_{k}),s.t.1-y_{k}(\omega^{T}x_{k}+b)=0

$$

显然$(x_{k},y_{k})$,就是所谓的支持向量(在最开始$\begin{aligned} \omega^{T}=\frac{\hat{\omega}^{T}}{a},b=\frac{\hat{b}}{a}\end{aligned}$那里设的)。因此

$$

\begin{aligned}

y_{k}(\omega^{T}x_{k}+b)&=1\

y_{k}^{2}(\omega^{T}x_{k}+b)&=y_{k}\

&y_{k}\in \left{+1,-1\right},y_{k}^{2}=1\

b ^{*}&=y_{k}-\omega^{T}x_{k}=y_{k}-\sum\limits_{i=1}^{N}\lambda_{i}y_{i}x_{i}^{T}x_{k}

\end{aligned}

$$

 

这里的$(x_{k},y_{k})$就是$\lambda_{i}$对应的向量,至于$\lambda_{i}$怎么求,还是那句话我不会,非常抱歉