Deriving the Backpropagation Algorithm for Neural Networks

Date: 2024-11-24 14:47:29

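To derive the backpropagation algorithm, and to see how each layer's parameter gradients are computed and which values they depend on, we use a simple neural network: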

  • An input layer with 2 nodes
  • One hidden layer with 2 nodes and ReLU activation
  • One output node with linear activation (that is, no activation function)

Assume the weight matrices and biases are as follows:

  • Input-to-hidden weight matrix $W_1$, of size $2 \times 2$
  • Hidden-layer bias vector $b_1$, of size $2 \times 1$
  • Hidden-to-output weight matrix $W_2$, of size $2 \times 1$
  • Output-layer bias $b_2$, a scalar

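The input is $x = [x_1, x_2]$ (treated as a $2 \times 1$ column vector in the products below), the target output is $y$, and the loss function is the mean squared error (MSE).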
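As a minimal sketch of this setup, the parameters can be written down as plain Python lists. The numeric values are hypothetical (the text above does not give any), and storing the $2 \times 1$ vectors as flat lists is a choice made here for brevity.

```python
# Hypothetical example values; the text above does not specify any numbers.
W1 = [[0.5, -0.5],
      [0.25, 0.5]]   # 2 x 2: row i holds the weights into hidden node i
b1 = [0.0, 0.25]     # 2 x 1 hidden-layer bias, stored as a flat list
W2 = [1.0, 2.0]      # 2 x 1 hidden-to-output weights, stored as a flat list
b2 = 0.5             # scalar output bias
x = [1.0, 2.0]       # input [x_1, x_2]
y = 3.0              # target output


def count_parameters(W1, b1, W2, b2):
    """Count the trainable scalars in the 2-2-1 network."""
    return sum(len(row) for row in W1) + len(b1) + len(W2) + 1


n_params = count_parameters(W1, b1, W2, b2)
print(n_params)  # 4 + 2 + 2 + 1 = 9
```

Backpropagation has to produce one gradient entry for each of these 9 scalars.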

Forward propagation:

  1. Compute the input to the hidden layer:
    $z_1 = W_1 \cdot x + b_1$
  2. Compute the hidden-layer activation:
    $a_1 = \text{ReLU}(z_1)$
  3. Compute the input to the output layer:
    $z_2 = W_2^T \cdot a_1 + b_2$
  4. Output value:
    $\hat{y} = z_2$
  5. Compute the loss:
    $L = \frac{1}{2} (\hat{y} - y)^2$
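The five forward steps above can be sketched in plain Python so that every scalar operation is visible. The parameter values are hypothetical, chosen so that one hidden unit ends up with a negative pre-activation and is zeroed by ReLU; the function name `forward` is likewise an assumption for illustration.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, vi) for vi in v]


def forward(W1, b1, W2, b2, x, y):
    """Run the forward steps 1-5 and return every intermediate value."""
    # Step 1: z1 = W1 . x + b1 (one dot product per hidden node)
    z1 = [sum(w * xj for w, xj in zip(row, x)) + b for row, b in zip(W1, b1)]
    # Step 2: a1 = ReLU(z1)
    a1 = relu(z1)
    # Step 3: z2 = W2^T . a1 + b2 (a scalar)
    z2 = sum(w * a for w, a in zip(W2, a1)) + b2
    # Step 4: linear output, so y_hat = z2
    y_hat = z2
    # Step 5: L = 1/2 (y_hat - y)^2
    loss = 0.5 * (y_hat - y) ** 2
    return z1, a1, z2, y_hat, loss


# Hypothetical example values (not from the text above).
W1 = [[0.5, -0.5], [0.25, 0.5]]
b1 = [0.0, 0.25]
W2 = [1.0, 2.0]
b2 = 0.5
x = [1.0, 2.0]
y = 3.0

z1, a1, z2, y_hat, loss = forward(W1, b1, W2, b2, x, y)
print(z1, a1, z2, loss)  # [-0.5, 1.5] [0.0, 1.5] 3.5 0.125
```

Keeping `z1` and `a1` around matters: the backward pass reuses both of them.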

Backpropagation:

  1. Compute the gradient at the output layer:

    • Gradient of the loss with respect to the output-layer input (since $\hat{y} = z_2$):
      $\frac{\partial L}{\partial z_2} = \hat{y} - y$
  2. Compute the gradients of the hidden-to-output parameters:

    • Gradient of the loss with respect to the weights $W_2$ (since $\partial z_2 / \partial W_2 = a_1$):
      $\frac{\partial L}{\partial W_2} = \frac{\partial L}{\partial z_2} \cdot a_1$
    • Gradient of the loss with respect to the bias $b_2$:
      $\frac{\partial L}{\partial b_2} = \frac{\partial L}{\partial z_2}$
  3. Compute the gradient at the hidden layer:

    • Gradient of the loss with respect to the hidden-layer activation:
      $\frac{\partial L}{\partial a_1} = W_2 \cdot \frac{\partial L}{\partial z_2}$
    • Gradient of the loss with respect to the hidden-layer input (through the ReLU derivative, applied element-wise):
      $\frac{\partial L}{\partial z_1} = \frac{\partial L}{\partial a_1} \odot \text{ReLU}'(z_1)$
      • The ReLU derivative $\text{ReLU}'(z_1)$ is 1 for each component where $z_1 > 0$, and 0 otherwise
  4. Compute the gradients of the input-to-hidden parameters:

    • Gradient of the loss with respect to the weights $W_1$ (an outer product, giving a $2 \times 2$ matrix):
      $\frac{\partial L}{\partial W_1} = \frac{\partial L}{\partial z_1} \cdot x^T$
    • Gradient of the loss with respect to the bias $b_1$:
      $\frac{\partial L}{\partial b_1} = \frac{\partial L}{\partial z_1}$
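The four backward steps can be sketched the same way, together with a central finite-difference check on one entry of $W_1$. This is an illustrative sketch under stated assumptions: the parameter values, the function names `forward`, `backward`, and `numeric_dW1`, and the step size `eps` are all hypothetical and not part of the derivation above.

```python
def forward(params, x, y):
    """Forward pass; returns z1, a1, z2 and the loss L = 1/2 (z2 - y)^2."""
    W1, b1, W2, b2 = params
    z1 = [sum(w * xj for w, xj in zip(row, x)) + b for row, b in zip(W1, b1)]
    a1 = [max(0.0, z) for z in z1]
    z2 = sum(w * a for w, a in zip(W2, a1)) + b2
    loss = 0.5 * (z2 - y) ** 2
    return z1, a1, z2, loss


def backward(params, x, y):
    """Backward pass following steps 1-4 of the derivation."""
    W1, b1, W2, b2 = params
    z1, a1, z2, _ = forward(params, x, y)
    # Step 1: dL/dz2 = y_hat - y, with y_hat = z2
    dz2 = z2 - y
    # Step 2: dL/dW2 = dz2 * a1 and dL/db2 = dz2
    dW2 = [dz2 * a for a in a1]
    db2 = dz2
    # Step 3: dL/da1 = W2 * dz2, then gate element-wise by ReLU'(z1)
    da1 = [w * dz2 for w in W2]
    dz1 = [da * (1.0 if z > 0 else 0.0) for da, z in zip(da1, z1)]
    # Step 4: dL/dW1 = outer(dz1, x) and dL/db1 = dz1
    dW1 = [[dz * xj for xj in x] for dz in dz1]
    db1 = list(dz1)
    return dW1, db1, dW2, db2


def numeric_dW1(params, x, y, i, j, eps=1e-6):
    """Central-difference estimate of dL/dW1[i][j]."""
    W1, b1, W2, b2 = params

    def loss_with(delta):
        shifted = [row[:] for row in W1]  # copy so the original stays intact
        shifted[i][j] += delta
        return forward((shifted, b1, W2, b2), x, y)[3]

    return (loss_with(eps) - loss_with(-eps)) / (2 * eps)


# Hypothetical example values (not from the text above).
params = ([[0.5, -0.5], [0.25, 0.5]], [0.0, 0.25], [1.0, 2.0], 0.5)
x = [1.0, 2.0]
y = 3.0

dW1, db1, dW2, db2 = backward(params, x, y)
print(dW1, db1, dW2, db2)  # [[0.0, 0.0], [1.0, 2.0]] [0.0, 1.0] [0.0, 0.75] 0.5

# The analytic and numeric values for W1[1][0] should agree closely.
numeric = numeric_dW1(params, x, y, 1, 0)
print(abs(numeric - dW1[1][0]) < 1e-6)
```

With these values the first hidden unit has $z_1 < 0$, so its ReLU derivative is 0 and the whole first row of $\partial L / \partial W_1$ vanishes. This shows which values each gradient depends on: the $W_1$ gradient depends on the output error, on $W_2$, on the ReLU gate, and on the input $x$.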