How the weight matrix is initialized depends on the type of activation function. For a hidden layer:

- tanh activation: uniform distribution over the symmetric interval
  $\left[-\sqrt{\frac{6}{fan_{in}+fan_{out}}},\ \sqrt{\frac{6}{fan_{in}+fan_{out}}}\right]$
- sigmoid activation: uniform distribution over the symmetric interval
  $\left[-4\sqrt{\frac{6}{fan_{in}+fan_{out}}},\ 4\sqrt{\frac{6}{fan_{in}+fan_{out}}}\right]$

where $fan_{in}$ is the number of units feeding into the layer and $fan_{out}$ is the number of units it feeds into. This initialization scheme ensures that, at the start of training, information propagates easily in both directions: forward (the activations) and backward (the gradients).
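To make the bounds concrete, here is a small sketch that evaluates them numerically (the helper name xavier_bound and the layer sizes 784 and 500 are illustrative assumptions, not from the original):

import numpy as np

def xavier_bound(fan_in, fan_out, sigmoid=False):
    # half-width of the symmetric uniform interval above
    bound = np.sqrt(6. / (fan_in + fan_out))
    # sigmoid units scale the interval by 4
    return 4. * bound if sigmoid else bound

print(xavier_bound(784, 500))                 # tanh:    ~0.0684
print(xavier_bound(784, 500, sigmoid=True))   # sigmoid: ~0.2734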
1. Ordinary fully-connected layer
import numpy as np
import theano
import theano.tensor as T

class HiddenLayer(object):
    def __init__(self, rng, inpt,
                 n_in, n_out, W=None, b=None, activation=T.tanh):
        self.inpt = inpt
        if W is None:
            # Xavier initialization: uniform over
            # [-sqrt(6/(fan_in+fan_out)), sqrt(6/(fan_in+fan_out))]
            W_values = np.asarray(
                rng.uniform(
                    low=-np.sqrt(6. / (n_in + n_out)),
                    high=np.sqrt(6. / (n_in + n_out)),
                    size=(n_in, n_out)
                ),
                dtype=theano.config.floatX
            )
            # sigmoid units use an interval four times wider
            if activation == T.nnet.sigmoid:
                W_values *= 4
            W = theano.shared(value=W_values, name='W', borrow=True)
        if b is None:
            b_values = np.zeros((n_out,), dtype=theano.config.floatX)
            b = theano.shared(value=b_values, name='b', borrow=True)
        self.W = W
        self.b = b
        # layer output: affine transform, then the (optional) activation
        lin_output = T.dot(self.inpt, self.W) + self.b
        self.output = lin_output if activation is None else activation(lin_output)
        self.params = [self.W, self.b]
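A minimal usage sketch for the class above, reusing its imports (the layer sizes and random seed are arbitrary assumptions):

rng = np.random.RandomState(1234)
x = T.matrix('x')   # symbolic minibatch of inputs
layer = HiddenLayer(rng, x, n_in=784, n_out=500, activation=T.tanh)
forward = theano.function([x], layer.output)
out = forward(np.random.randn(20, 784).astype(theano.config.floatX))
print(out.shape)    # (20, 500)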
2. Convolutional layer
Convolutional Neural Networks (LeNet)
filter_shape is a tuple of length 4:
- number of filters,
- num input feature maps,
- filter height,
- filter width
fan_in = np.prod(filter_shape[1:])
fan_out = filter_shape[0] * np.prod(filter_shape[2:])
# taking the pooling layer into account, each filter's output is
# downsampled, so the effective fan_out shrinks by the pooling factor:
# fan_out = filter_shape[0] * np.prod(filter_shape[2:]) // np.prod(poolsize)
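Plugging fan_in and fan_out into the same uniform scheme yields the filter-bank initialization. A minimal sketch in the spirit of the Theano LeNet tutorial (the concrete filter_shape and seed below are illustrative assumptions):

import numpy as np
import theano

rng = np.random.RandomState(1234)
filter_shape = (20, 1, 5, 5)   # 20 filters, 1 input feature map, 5x5 filters

fan_in = np.prod(filter_shape[1:])                      # 1 * 5 * 5 = 25
fan_out = filter_shape[0] * np.prod(filter_shape[2:])   # 20 * 5 * 5 = 500

W_bound = np.sqrt(6. / (fan_in + fan_out))
W = theano.shared(
    np.asarray(
        rng.uniform(low=-W_bound, high=W_bound, size=filter_shape),
        dtype=theano.config.floatX
    ),
    name='W', borrow=True
)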
References
Xavier Glorot, Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. AISTATS 2010.