How the weight matrix is initialized depends on the type of activation function. For a hidden layer:

- tanh activation: uniform distribution over the symmetric interval
  $\left[-\sqrt{\frac{6}{fan_{in}+fan_{out}}},\ \sqrt{\frac{6}{fan_{in}+fan_{out}}}\right]$
- sigmoid activation: uniform distribution over the symmetric interval
  $\left[-4\sqrt{\frac{6}{fan_{in}+fan_{out}}},\ 4\sqrt{\frac{6}{fan_{in}+fan_{out}}}\right]$

where $fan_{in}$ is the number of units feeding into the layer and $fan_{out}$ is the number of units it feeds into. This initialization scheme ensures that, at the start of training, information propagates easily in both directions: forward (the activations) and backward (the gradients).
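To make the bounds concrete, here is a small sketch that evaluates them numerically (the helper name xavier_bound and the layer sizes 784 and 500 are illustrative assumptions, not from the original):

import numpy as np

def xavier_bound(fan_in, fan_out, sigmoid=False):
    # half-width of the symmetric uniform interval above
    bound = np.sqrt(6. / (fan_in + fan_out))
    # sigmoid units scale the interval by 4
    return 4. * bound if sigmoid else bound

print(xavier_bound(784, 500))                 # tanh:    ~0.0684
print(xavier_bound(784, 500, sigmoid=True))   # sigmoid: ~0.2734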
1. Ordinary fully-connected layer
import numpy as np
import theano
import theano.tensor as T

class HiddenLayer(object):
    def __init__(self, rng, inpt,
                 n_in, n_out, W=None, b=None, activation=T.tanh):
        self.inpt = inpt
        if W is None:
            # Xavier initialization: uniform over
            # [-sqrt(6/(fan_in+fan_out)), sqrt(6/(fan_in+fan_out))]
            W_values = np.asarray(
                rng.uniform(
                    low=-np.sqrt(6. / (n_in + n_out)),
                    high=np.sqrt(6. / (n_in + n_out)),
                    size=(n_in, n_out)
                ),
                dtype=theano.config.floatX
            )
            # sigmoid units use an interval four times wider
            if activation == T.nnet.sigmoid:
                W_values *= 4
            W = theano.shared(value=W_values, name='W', borrow=True)
        if b is None:
            b_values = np.zeros((n_out,), dtype=theano.config.floatX)
            b = theano.shared(value=b_values, name='b', borrow=True)
        self.W = W
        self.b = b
        # layer output: affine transform, then the (optional) activation
        lin_output = T.dot(self.inpt, self.W) + self.b
        self.output = lin_output if activation is None else activation(lin_output)
        self.params = [self.W, self.b]
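A minimal usage sketch for the class above, reusing its imports (the layer sizes and random seed are arbitrary assumptions):

rng = np.random.RandomState(1234)
x = T.matrix('x')   # symbolic minibatch of inputs
layer = HiddenLayer(rng, x, n_in=784, n_out=500, activation=T.tanh)
forward = theano.function([x], layer.output)
out = forward(np.random.randn(20, 784).astype(theano.config.floatX))
print(out.shape)    # (20, 500)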
2. Convolutional layer
Convolutional Neural Networks (LeNet)
filter_shape is a tuple of length 4:
- number of filters,
- num input feature maps,
- filter height,
- filter width
fan_in = np.prod(filter_shape[1:])
fan_out = filter_shape[0] * np.prod(filter_shape[2:])
# taking the pooling layer into account, each filter's output is
# downsampled, so the effective fan_out shrinks by the pooling factor:
# fan_out = filter_shape[0] * np.prod(filter_shape[2:]) // np.prod(poolsize)
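Plugging fan_in and fan_out into the same uniform scheme yields the filter-bank initialization. A minimal sketch in the spirit of the Theano LeNet tutorial (the concrete filter_shape and seed below are illustrative assumptions):

import numpy as np
import theano

rng = np.random.RandomState(1234)
filter_shape = (20, 1, 5, 5)   # 20 filters, 1 input feature map, 5x5 filters

fan_in = np.prod(filter_shape[1:])                      # 1 * 5 * 5 = 25
fan_out = filter_shape[0] * np.prod(filter_shape[2:])   # 20 * 5 * 5 = 500

W_bound = np.sqrt(6. / (fan_in + fan_out))
W = theano.shared(
    np.asarray(
        rng.uniform(low=-W_bound, high=W_bound, size=filter_shape),
        dtype=theano.config.floatX
    ),
    name='W', borrow=True
)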
References
Xavier Glorot, Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. AISTATS 2010.