Goal: make $\frac{1}{m}\sum_{i = 1}^{m}(y^{(i)} - \hat{y}^{(i)})^2$ as small as possible. This expression is exactly the mean squared error (MSE):
$$J(\theta) = MSE(y,\hat{y})$$
Sometimes the following form is used instead:
$$J(\theta)=\frac{1}{2m}\sum_{i = 1}^{m}(y^{(i)} - \hat{y}^{(i)})^2$$
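The two conventions differ only by a constant factor of 2, so they share the same minimizer. A minimal NumPy sketch of both follows; the function name `mse_cost`, the `halved` flag, and the sample values are illustrative assumptions, not part of the original text.

```python
import numpy as np

def mse_cost(y, y_hat, halved=False):
    """Return (1/m) * sum((y - y_hat)^2), or the 1/(2m) variant when halved=True."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    m = y.shape[0]                      # number of samples
    squared_error_sum = np.sum((y - y_hat) ** 2)
    if halved:
        return squared_error_sum / (2 * m)
    return squared_error_sum / m

# Hypothetical targets and predictions, chosen only for illustration.
y = np.array([3.0, 5.0, 7.0])
y_hat = np.array([2.0, 5.0, 9.0])

print(mse_cost(y, y_hat))               # 1/m convention (plain MSE)
print(mse_cost(y, y_hat, halved=True))  # 1/(2m) convention
```

The halved variant is always exactly half of the plain MSE, which is why the choice affects only the constant in front of the gradient.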
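With $\hat{y}^{(i)} = X_{b}^{(i)}\theta$, taking the partial derivatives of $J(\theta)$ in its MSE form (the $\frac{1}{m}$ version) gives the gradient of the loss function; with the $\frac{1}{2m}$ version the leading factor would be $\frac{1}{m}$ instead of $\frac{2}{m}$: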
$$\nabla J(\theta) =\begin{pmatrix} \partial J/\partial\theta_{0} \\ \partial J/\partial\theta_{1} \\ \partial J/\partial\theta_{2} \\ \vdots \\ \partial J/\partial\theta_{n} \end{pmatrix} =\frac{2}{m} \cdot \begin{pmatrix} \sum_{i = 1}^{m}(X_{b}^{(i)}\theta - y^{(i)}) \\ \sum_{i = 1}^{m}(X_{b}^{(i)}\theta - y^{(i)}) \cdot X_{1}^{(i)} \\ \sum_{i = 1}^{m}(X_{b}^{(i)}\theta - y^{(i)}) \cdot X_{2}^{(i)} \\ \vdots \\ \sum_{i = 1}^{m}(X_{b}^{(i)}\theta - y^{(i)}) \cdot X_{n}^{(i)} \end{pmatrix}$$
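The component-wise sums in this gradient can be collected into a single matrix expression, $\nabla J(\theta) = \frac{2}{m} X_b^{T}(X_b\theta - y)$, assuming $X_b$ is the $m \times (n+1)$ design matrix whose first column is all ones (so that $X_0^{(i)} = 1$). The following is a minimal NumPy sketch under that assumption. The function names `cost` and `gradient` and the toy data are illustrative and do not come from the original text. It compares the analytic gradient with a central finite-difference estimate as a sanity check.

```python
import numpy as np

def cost(theta, X_b, y):
    """MSE cost J(theta) = (1/m) * sum((y - X_b @ theta)^2)."""
    residual = y - X_b @ theta
    return residual @ residual / len(y)

def gradient(theta, X_b, y):
    """Gradient of the MSE cost: (2/m) * X_b^T (X_b theta - y)."""
    return 2.0 / len(y) * X_b.T @ (X_b @ theta - y)

# Hypothetical toy data: m = 4 samples, n = 2 features.
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.5],
              [4.0, 3.0]])
X_b = np.hstack([np.ones((len(X), 1)), X])  # prepend the column of ones
y = np.array([3.0, 2.5, 4.0, 6.5])
theta = np.array([0.1, -0.2, 0.3])

# Central finite differences, one coordinate direction at a time.
eps = 1e-6
numeric = np.array([
    (cost(theta + eps * e, X_b, y) - cost(theta - eps * e, X_b, y)) / (2 * eps)
    for e in np.eye(len(theta))
])
analytic = gradient(theta, X_b, y)

print(analytic)
print(np.allclose(analytic, numeric, atol=1e-6))
```

Row $j$ of `X_b.T @ (X_b @ theta - y)` is exactly the sum $\sum_{i=1}^{m}(X_b^{(i)}\theta - y^{(i)}) \cdot X_j^{(i)}$, so the vectorized form computes the same quantities as the column vector above without an explicit loop over samples.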