[Machine Learning] Training a Linear Regression Model with Gradient Descent

Date: 2025-03-05 13:54:58
Goal: make $\frac{1}{m}\sum_{i=1}^{m}(y^{(i)} - \hat{y}^{(i)})^2$ as small as possible, where $m$ is the number of training samples. This expression is exactly the mean squared error (MSE), so the loss function is:

$$J(\theta) = \mathrm{MSE}(y, \hat{y})$$

Sometimes the following form is used instead; the extra factor of $\frac{1}{2}$ cancels the 2 produced by differentiation and does not change where the minimum lies:

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(y^{(i)} - \hat{y}^{(i)}\right)^2$$
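As a minimal sketch of the two loss definitions above, the following NumPy snippet evaluates $J(\theta)$ for a given parameter vector. It assumes predictions are computed as $\hat{y} = X_b\theta$, where `X_b` is the feature matrix with a leading column of ones. The function name `mse_cost`, the `halved` flag, and the tiny dataset are hypothetical and chosen only for illustration.

```python
import numpy as np

def mse_cost(X_b, y, theta, halved=False):
    """Return J(theta) for linear regression.

    X_b    : (m, n+1) matrix whose first column is all ones (assumed layout)
    y      : (m,) target values
    theta  : (n+1,) parameter vector
    halved : if True, use the 1/(2m) variant instead of plain MSE
    """
    m = len(y)
    residual = y - X_b @ theta            # y^(i) - y_hat^(i) for every sample
    scale = 2 * m if halved else m
    return float(residual @ residual) / scale

# Hypothetical toy data generated by y = 1 + 2x with no noise
X_b = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

print(mse_cost(X_b, y, np.array([1.0, 2.0])))      # loss at the true parameters
print(mse_cost(X_b, y, np.zeros(2)))               # loss at theta = 0
print(mse_cost(X_b, y, np.zeros(2), halved=True))  # same point, 1/(2m) variant
```

Both variants are minimized by the same $\theta$; they differ only by a constant factor of 2 in the loss value.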
Taking the partial derivative of $J(\theta)$ with respect to each parameter gives the gradient of the loss function. With predictions $\hat{y}^{(i)} = X_b^{(i)}\theta$ and the MSE form of $J(\theta)$:

$$\nabla J(\theta) = \begin{pmatrix} \partial J/\partial\theta_{0} \\ \partial J/\partial\theta_{1} \\ \partial J/\partial\theta_{2} \\ \vdots \\ \partial J/\partial\theta_{n} \end{pmatrix} = \frac{2}{m} \cdot \begin{pmatrix} \sum_{i=1}^{m}(X_{b}^{(i)}\theta - y^{(i)}) \\ \sum_{i=1}^{m}(X_{b}^{(i)}\theta - y^{(i)}) \cdot X_{1}^{(i)} \\ \sum_{i=1}^{m}(X_{b}^{(i)}\theta - y^{(i)}) \cdot X_{2}^{(i)} \\ \vdots \\ \sum_{i=1}^{m}(X_{b}^{(i)}\theta - y^{(i)}) \cdot X_{n}^{(i)} \end{pmatrix}$$

If the $\frac{1}{2m}$ form of the loss is used, the leading factor becomes $\frac{1}{m}$ instead of $\frac{2}{m}$.
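The component-wise gradient above can be written compactly as $\frac{2}{m} X_b^{T}(X_b\theta - y)$, since row $j$ of $X_b^{T}$ holds feature $j$ across all samples (with $X_0^{(i)} = 1$). The sketch below computes it that way and compares it against a finite-difference approximation as a sanity check. The function names and the toy data are hypothetical, and `X_b` is assumed to carry a leading column of ones.

```python
import numpy as np

def mse_gradient(X_b, y, theta):
    """Gradient of J(theta) = (1/m) * sum((y - X_b @ theta)**2)."""
    m = len(y)
    return (2.0 / m) * (X_b.T @ (X_b @ theta - y))

def numerical_gradient(X_b, y, theta, eps=1e-6):
    """Central finite-difference approximation of the same gradient."""
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        step = np.zeros_like(theta)
        step[j] = eps
        j_plus = np.mean((y - X_b @ (theta + step)) ** 2)
        j_minus = np.mean((y - X_b @ (theta - step)) ** 2)
        grad[j] = (j_plus - j_minus) / (2 * eps)
    return grad

# Hypothetical toy data generated by y = 1 + 2x with no noise
X_b = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
theta = np.zeros(2)

analytic = mse_gradient(X_b, y, theta)
numeric = numerical_gradient(X_b, y, theta)
print(analytic)
print(np.allclose(analytic, numeric, atol=1e-5))
```

At the parameters that fit the data exactly, here $\theta = (1, 2)$, every residual is zero and so the gradient vanishes, which is the condition gradient descent moves toward.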