引理
这里,我们提供并证明一些辅助性引理。
引理 1:
假设对于
i
=
1
,
…
,
n
i = 1, \ldots, n
i=1,…,n,
f
i
(
x
)
f_i(x)
fi(x) 是凸函数且具有
L
max
L_{\text{max}}
Lmax-光滑性。假设
i
i
i 在集合
{
1
,
…
,
n
}
\{1, \ldots, n\}
{1,…,n} 上均匀采样。那么对于任意
x
∈
R
d
x \in \mathbb{R}^d
x∈Rd,我们有:
E
k
[
∥
∇
f
i
(
x
)
−
∇
f
i
(
x
ref
)
∥
2
]
≤
2
L
max
(
f
(
x
)
−
f
(
x
ref
)
)
.
\mathbb{E}_k[\|\nabla f_i(x) - \nabla f_i(x_{\text{ref}})\|^2] \leq 2L_{\text{max}}(f(x) - f(x_{\text{ref}})).
Ek[∥∇fi(x)−∇fi(xref)∥2]≤2Lmax(f(x)−f(xref)).
证明:
由于
f
i
f_i
fi 是凸函数,我们有:
f
i
(
z
)
≥
f
i
(
x
ref
)
+
⟨
∇
f
i
(
x
ref
)
,
z
−
x
ref
⟩
,
∀
z
∈
R
d
.
f_i(z) \geq f_i(x_{\text{ref}}) + \langle \nabla f_i(x_{\text{ref}}), z - x_{\text{ref}} \rangle, \quad \forall z \in \mathbb{R}^d.
fi(z)≥fi(xref)+⟨∇fi(xref),z−xref⟩,∀z∈Rd.
由于
f
i
f_i
fi 是
L
max
L_{\text{max}}
Lmax-光滑的,我们有:
f
i
(
z
)
≤
f
i
(
x
)
+
⟨
∇
f
i
(
x
)
,
z
−
x
⟩
+
L
max
2
∥
z
−
x
∥
2
,
∀
z
,
x
∈
R
d
.
f_i(z) \leq f_i(x) + \langle \nabla f_i(x), z - x \rangle + \frac{L_{\text{max}}}{2}\|z - x\|^2, \quad \forall z, x \in \mathbb{R}^d.
fi(z)≤fi(x)+⟨∇fi(x),z−x⟩+2Lmax∥z−x∥2,∀z,x∈Rd.
因此,对于任意
x
,
z
∈
R
d
x, z \in \mathbb{R}^d
x,z∈Rd,我们有:
f
i
(
x
ref
)
−
f
i
(
x
)
≤
⟨
∇
f
i
(
x
ref
)
,
x
ref
−
z
⟩
+
⟨
∇
f
i
(
x
)
,
z
−
x
⟩
+
L
max
2
∥
z
−
x
∥
2
.
f_i(x_{\text{ref}}) - f_i(x) \leq \langle \nabla f_i(x_{\text{ref}}), x_{\text{ref}} - z \rangle + \langle \nabla f_i(x), z - x \rangle + \frac{L_{\text{max}}}{2}\|z - x\|^2.
fi(xref)−fi(x)≤⟨∇fi(xref),xref−z⟩+⟨∇fi(x),z−x⟩+2Lmax∥z−x∥2.
为了获得右侧的最紧上界,我们可以在
z
z
z 中最小化右侧,得到:
z
=
x
−
1
L
max
(
∇
f
i
(
x
)
−
∇
f
i
(
x
ref
)
)
.
z = x - \frac{1}{L_{\text{max}}}(\nabla f_i(x) - \nabla f_i(x_{\text{ref}})).
z=x−Lmax1(∇fi(x)−∇fi(xref)).
代入上述方程,我们得到:
f
i
(
x
ref
)
−
f
i
(
x
)
=
⟨
∇
f
i
(
x
ref
)
,
x
ref
−
x
⟩
−
1
2
L
max
∥
∇
f
i
(
x
)
−
∇
f
i
(
x
ref
)
∥
2
.
f_i(x_{\text{ref}}) - f_i(x) = \langle \nabla f_i(x_{\text{ref}}), x_{\text{ref}} - x \rangle - \frac{1}{2L_{\text{max}}}\|\nabla f_i(x) - \nabla f_i(x_{\text{ref}})\|^2.
fi(xref)−fi(x)=⟨∇fi(xref),xref−x⟩−2Lmax1∥∇fi(x)−∇fi(xref)∥2.
对上述表达式取期望,并使用
E
i
[
f
i
(
x
)
]
=
f
(
x
)
\mathbb{E}_i[f_i(x)] = f(x)
Ei[fi(x)]=f(x) 和
E
[
∇
f
i
(
x
ref
)
]
=
0
\mathbb{E}[\nabla f_i(x_{\text{ref}})] = 0
E[∇fi(xref)]=0,即可得到结果。
引理 2:
假设
X
∈
R
d
X \in \mathbb{R}^d
X∈Rd 是一个随机向量,具有有限的方差。那么:
E
[
∥
X
−
E
[
X
]
∥
2
]
≤
E
[
∥
X
∥
2
]
.
\mathbb{E}[\|X - \mathbb{E}[X]\|^2] \leq \mathbb{E}[\|X\|^2].
E[∥X−E[X]∥2]≤E[∥X∥2].
证明:
E
[
∥
X
−
E
[
X
]
∥
2
]
=
E
[
∥
X
∥
2
]
−
2
E
[
∥
X
∥
]
2
+
E
[
∥
X
∥
]
2
=
E
[
∥
X
∥
2
]
−
E
[
∥
X
∥
]
2
≤
E
[
∥
X
∥
2
]
.
\mathbb{E}[\|X - \mathbb{E}[X]\|^2] = \mathbb{E}[\|X\|^2] - 2\mathbb{E}[\|X\|]^2 + \mathbb{E}[\|X\|]^2 = \mathbb{E}[\|X\|^2] - \mathbb{E}[\|X\|]^2 \leq \mathbb{E}[\|X\|^2].
E[∥X−E[X]∥2]=E[∥X∥2]−2E[∥X∥]2+E[∥X∥]2=E[∥X∥2]−E[∥X∥]2≤E[∥X∥2].