【深度学习】图形模型基础(7):机器学习优化中的方差减少方法(2)-附录A

时间:2024-07-12 07:02:25

引理

这里,我们提供并证明一些辅助性引理。

引理 1:
假设对于 i = 1 , … , n i = 1, \ldots, n i=1,,n f i ( x ) f_i(x) fi(x) 是凸函数且具有 L max L_{\text{max}} Lmax-光滑性。假设 i i i 在集合 { 1 , … , n } \{1, \ldots, n\} {1,,n} 上均匀采样。那么对于任意 x ∈ R d x \in \mathbb{R}^d xRd,我们有:
E k [ ∥ ∇ f i ( x ) − ∇ f i ( x ref ) ∥ 2 ] ≤ 2 L max ( f ( x ) − f ( x ref ) ) . \mathbb{E}_k[\|\nabla f_i(x) - \nabla f_i(x_{\text{ref}})\|^2] \leq 2L_{\text{max}}(f(x) - f(x_{\text{ref}})). Ek[∥∇fi(x)fi(xref)2]2Lmax(f(x)f(xref)).

证明:
由于 f i f_i fi 是凸函数,我们有:
f i ( z ) ≥ f i ( x ref ) + ⟨ ∇ f i ( x ref ) , z − x ref ⟩ , ∀ z ∈ R d . f_i(z) \geq f_i(x_{\text{ref}}) + \langle \nabla f_i(x_{\text{ref}}), z - x_{\text{ref}} \rangle, \quad \forall z \in \mathbb{R}^d. fi(z)fi(xref)+fi(xref),zxref,zRd.
由于 f i f_i fi L max L_{\text{max}} Lmax-光滑的,我们有:
f i ( z ) ≤ f i ( x ) + ⟨ ∇ f i ( x ) , z − x ⟩ + L max 2 ∥ z − x ∥ 2 , ∀ z , x ∈ R d . f_i(z) \leq f_i(x) + \langle \nabla f_i(x), z - x \rangle + \frac{L_{\text{max}}}{2}\|z - x\|^2, \quad \forall z, x \in \mathbb{R}^d. fi(z)fi(x)+fi(x),zx+2Lmaxzx2,z,xRd.
因此,对于任意 x , z ∈ R d x, z \in \mathbb{R}^d x,zRd,我们有:
f i ( x ref ) − f i ( x ) ≤ ⟨ ∇ f i ( x ref ) , x ref − z ⟩ + ⟨ ∇ f i ( x ) , z − x ⟩ + L max 2 ∥ z − x ∥ 2 . f_i(x_{\text{ref}}) - f_i(x) \leq \langle \nabla f_i(x_{\text{ref}}), x_{\text{ref}} - z \rangle + \langle \nabla f_i(x), z - x \rangle + \frac{L_{\text{max}}}{2}\|z - x\|^2. fi(xref)fi(x)fi(xref),xrefz+fi(x),zx+2Lmaxzx2.
为了获得右侧的最紧上界,我们可以在 z z z 中最小化右侧,得到:
z = x − 1 L max ( ∇ f i ( x ) − ∇ f i ( x ref ) ) . z = x - \frac{1}{L_{\text{max}}}(\nabla f_i(x) - \nabla f_i(x_{\text{ref}})). z=xLmax1(fi(x)fi(xref)).
代入上述方程,我们得到:
f i ( x ref ) − f i ( x ) = ⟨ ∇ f i ( x ref ) , x ref − x ⟩ − 1 2 L max ∥ ∇ f i ( x ) − ∇ f i ( x ref ) ∥ 2 . f_i(x_{\text{ref}}) - f_i(x) = \langle \nabla f_i(x_{\text{ref}}), x_{\text{ref}} - x \rangle - \frac{1}{2L_{\text{max}}}\|\nabla f_i(x) - \nabla f_i(x_{\text{ref}})\|^2. fi(xref)fi(x)=fi(xref),xrefx2Lmax1∥∇fi(x)fi(xref)2.
对上述表达式取期望,并使用 E i [ f i ( x ) ] = f ( x ) \mathbb{E}_i[f_i(x)] = f(x) Ei[fi(x)]=f(x) E [ ∇ f i ( x ref ) ] = 0 \mathbb{E}[\nabla f_i(x_{\text{ref}})] = 0 E[fi(xref)]=0,即可得到结果。

引理 2:
假设 X ∈ R d X \in \mathbb{R}^d XRd 是一个随机向量,具有有限的方差。那么:
E [ ∥ X − E [ X ] ∥ 2 ] ≤ E [ ∥ X ∥ 2 ] . \mathbb{E}[\|X - \mathbb{E}[X]\|^2] \leq \mathbb{E}[\|X\|^2]. E[XE[X]2]E[X2].

证明:
E [ ∥ X − E [ X ] ∥ 2 ] = E [ ∥ X ∥ 2 ] − 2 E [ ∥ X ∥ ] 2 + E [ ∥ X ∥ ] 2 = E [ ∥ X ∥ 2 ] − E [ ∥ X ∥ ] 2 ≤ E [ ∥ X ∥ 2 ] . \mathbb{E}[\|X - \mathbb{E}[X]\|^2] = \mathbb{E}[\|X\|^2] - 2\mathbb{E}[\|X\|]^2 + \mathbb{E}[\|X\|]^2 = \mathbb{E}[\|X\|^2] - \mathbb{E}[\|X\|]^2 \leq \mathbb{E}[\|X\|^2]. E[XE[X]2]=E[X2]2E[X]2+E[X]2=E[X2]E[X]2E[X2].