This article comes from a CSDN blog: http://blog.csdn.net/niuwei22007/article/details/49370063

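The previous post covered convolutional layers, which are used to build common models such as convolutional neural networks. This post covers recurrent layers, the basic building blocks of recurrent neural networks (RNNs). For background on RNNs, see the article "A Deep Dive into Recurrent Neural Networks". If that article feels too abstract, "The Unreasonable Effectiveness of Recurrent Neural Networks" is strongly recommended, because it shows what RNNs can do on practical examples. The sections below go through the available recurrent layers.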

1. SimpleRNN

keras.layers.recurrent.SimpleRNN(output_dim,
        init='glorot_uniform', inner_init='orthogonal', activation='sigmoid', weights=None,
        truncate_gradient=-1, return_sequences=False, input_dim=None, input_length=None)

A fully connected RNN in which the output is fed back to the input.

Input shape: 3D tensor of shape (nb_samples, timesteps, input_dim).

Output shape: if return_sequences=True, a 3D tensor of shape (nb_samples, timesteps, output_dim); otherwise a 2D tensor of shape (nb_samples, output_dim).

Masking: This layer supports masking for input data with a variable number of timesteps. To introduce masks to your data, use an Embedding layer with the mask_zero parameter set to True.
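
To make the masking idea concrete, here is a minimal sketch in plain Python rather than Keras code. It assumes integer token sequences in which index 0 is reserved for padding, which is the convention that mask_zero=True relies on. The helper names pad_with_zeros and build_mask are hypothetical, and padding on the right is a choice made only for this illustration.

```python
def pad_with_zeros(sequences):
    """Right-pad integer sequences with 0 so they all share one length."""
    max_len = max(len(seq) for seq in sequences)
    return [list(seq) + [0] * (max_len - len(seq)) for seq in sequences]


def build_mask(padded):
    """Mark real timesteps with 1 and padded timesteps (index 0) with 0."""
    return [[1 if token != 0 else 0 for token in seq] for seq in padded]


# Three samples with different numbers of timesteps (hypothetical token ids).
batch = [[4, 9, 2], [7], [3, 5]]
padded = pad_with_zeros(batch)
mask = build_mask(padded)
print(padded)  # [[4, 9, 2], [7, 0, 0], [3, 5, 0]]
print(mask)    # [[1, 1, 1], [1, 0, 0], [1, 1, 0]]
```

A masked recurrent layer would skip the timesteps marked 0, so the padding does not influence the final state.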

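Parameters:

output_dim: Dimensionality of the internal projections and of the final output.
init: Weight initialization function, given either as the name of a built-in Keras initialization (see the Keras initializations documentation) or as a Theano function of your own. It only takes effect when no weights argument is passed.
inner_init: Initialization function for the weights of the inner cells.
activation: Activation function, given either as the name of a built-in Keras activation (see the Keras activations documentation) or as a Theano function of your own. If omitted, the default shown in the signature above ('sigmoid') is used.
weights: List of numpy arrays used to set the initial weights. The list should have 3 elements, with shapes [(input_dim, output_dim), (output_dim, output_dim), (output_dim,)].
truncate_gradient: Number of steps after which the gradient is truncated in BPTT (backpropagation through time, i.e. backpropagation extended along the time dimension).
return_sequences: Boolean. If False, only the last output of the output sequence is returned; if True, the whole sequence is returned.
input_dim: Dimensionality of the input. When this layer is the first layer of a model, either this argument or input_shape must be provided.
input_length: Length of the input sequences. This argument is required if you are going to connect Flatten then Dense layers upstream (without it, the shape of the dense outputs cannot be computed).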
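
The weight shapes and the effect of return_sequences can be illustrated with a small forward pass written in plain Python. This is a sketch under assumptions, not the Keras implementation: it processes a single sample, starts from a zero hidden state, and uses arbitrary constant weights. The function names simple_rnn and matvec are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(vec, mat):
    """Row vector of length n times an (n x m) matrix -> list of length m."""
    return [sum(vec[i] * mat[i][j] for i in range(len(vec)))
            for j in range(len(mat[0]))]


def simple_rnn(sequence, W, U, b, return_sequences=False):
    """Run one sample of shape (timesteps, input_dim) through a simple RNN.

    W: (input_dim, output_dim), U: (output_dim, output_dim), b: (output_dim,)
    """
    h = [0.0] * len(b)  # initial hidden state, assumed to be zeros
    outputs = []
    for x_t in sequence:
        xw = matvec(x_t, W)   # contribution of the current input
        hu = matvec(h, U)     # contribution of the previous output
        h = [sigmoid(xw[j] + hu[j] + b[j]) for j in range(len(b))]
        outputs.append(h)
    return outputs if return_sequences else outputs[-1]


# timesteps=3, input_dim=2, output_dim=4; the weight values are arbitrary.
input_dim, output_dim = 2, 4
W = [[0.1] * output_dim for _ in range(input_dim)]
U = [[0.05] * output_dim for _ in range(output_dim)]
b = [0.0] * output_dim
sample = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

full = simple_rnn(sample, W, U, b, return_sequences=True)
last = simple_rnn(sample, W, U, b)
print(len(full), len(full[0]))  # 3 4 -> (timesteps, output_dim)
print(len(last))                # 4   -> (output_dim,)
```

With a batch of nb_samples such sequences, the two cases correspond to the 3D and 2D output shapes described above.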

2. SimpleDeepRNN

keras.layers.recurrent.SimpleDeepRNN(output_dim, depth=3,
        init='glorot_uniform', inner_init='orthogonal',
        activation='sigmoid', inner_activation='hard_sigmoid',
        weights=None, truncate_gradient=-1, return_sequences=False,
        input_dim=None, input_length=None)

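A fully connected RNN in which the outputs of several previous steps (the number is set by the depth parameter) are fed back to the input. In pseudocode:

output = activation(W.x_t + b + inner_activation(U_1.h_tm1) + inner_activation(U_2.h_tm2) + ...)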
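
The formula above can be sketched in plain Python as follows. This is an illustration under assumptions rather than the Keras source: the history of previous outputs starts as zeros, hard_sigmoid is assumed to be the usual piecewise-linear form clip(0.2 * v + 0.5, 0, 1), and the names simple_deep_rnn and matvec are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def hard_sigmoid(v):
    """Piecewise-linear approximation of the sigmoid (assumed form)."""
    return max(0.0, min(1.0, 0.2 * v + 0.5))


def matvec(vec, mat):
    """Row vector of length n times an (n x m) matrix -> list of length m."""
    return [sum(vec[i] * mat[i][j] for i in range(len(vec)))
            for j in range(len(mat[0]))]


def simple_deep_rnn(sequence, W, Us, b):
    """Return the last output; Us holds one recurrent matrix per depth step."""
    depth = len(Us)
    # history[0] is h_tm1, history[1] is h_tm2, and so on (zeros at the start).
    history = [[0.0] * len(b) for _ in range(depth)]
    for x_t in sequence:
        total = [xw + bias for xw, bias in zip(matvec(x_t, W), b)]
        for U_k, h_prev in zip(Us, history):
            fed_back = matvec(h_prev, U_k)
            total = [t + hard_sigmoid(f) for t, f in zip(total, fed_back)]
        h_t = [sigmoid(t) for t in total]
        history = [h_t] + history[:-1]  # shift the window of past outputs
    return history[0]


input_dim, output_dim, depth = 2, 3, 2
W = [[0.1] * output_dim for _ in range(input_dim)]
Us = [[[0.05] * output_dim for _ in range(output_dim)] for _ in range(depth)]
b = [0.0] * output_dim

out = simple_deep_rnn([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], W, Us, b)
print(len(out))  # 3
```

The sketch uses one input matrix W, one recurrent matrix per depth step, and one bias vector.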

Input shape: 3D tensor of shape (nb_samples, timesteps, input_dim).

Output shape: if return_sequences=True, a 3D tensor of shape (nb_samples, timesteps, output_dim); otherwise a 2D tensor of shape (nb_samples, output_dim).

Masking: This layer supports masking for input data with a variable number of timesteps. To introduce masks to your data, use an Embedding layer with the mask_zero parameter set to True.

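Parameters:

output_dim: Dimensionality of the internal projections and of the final output.
depth: int >= 1. The number of recurrent iterations, i.e. how many previous outputs are fed back (see the formula above). With depth=1 the layer is equivalent to SimpleRNN.
init: Weight initialization function, given either as the name of a built-in Keras initialization (see the Keras initializations documentation) or as a Theano function of your own. It only takes effect when no weights argument is passed.
inner_init: Initialization function for the weights of the inner cells.
activation: Activation function, given either as the name of a built-in Keras activation (see the Keras activations documentation) or as a Theano function of your own. If omitted, the default shown in the signature above ('sigmoid') is used.
inner_activation: Activation function for the inner cells.
weights: List of numpy arrays used to set the initial weights. The list should have depth+2 elements.
truncate_gradient: Number of steps after which the gradient is truncated in BPTT (backpropagation through time, i.e. backpropagation extended along the time dimension).
return_sequences: Boolean. If False, only the last output of the output sequence is returned; if True, the whole sequence is returned.
input_dim: Dimensionality of the input. When this layer is the first layer of a model, either this argument or input_shape must be provided.
input_length: Length of the input sequences. This argument is required if you are going to connect Flatten then Dense layers upstream (without it, the shape of the dense outputs cannot be computed).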

3. GRU

keras.layers.recurrent.GRU(output_dim,
        init='glorot_uniform', inner_init='orthogonal',
        activation='sigmoid', inner_activation='hard_sigmoid',
        weights=None, truncate_gradient=-1, return_sequences=False,
        input_dim=None, input_length=None)

Gated Recurrent Unit (GRU), proposed in 2014. It is one of the main units used to build RNN models.

Input shape: 3D tensor of shape (nb_samples, timesteps, input_dim).

Output shape: if return_sequences=True, a 3D tensor of shape (nb_samples, timesteps, output_dim); otherwise a 2D tensor of shape (nb_samples, output_dim).

Masking: This layer supports masking for input data with a variable number of timesteps. To introduce masks to your data, use an Embedding layer with the mask_zero parameter set to True.

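Parameters:

output_dim: Dimensionality of the internal projections and of the final output.
init: Weight initialization function, given either as the name of a built-in Keras initialization (see the Keras initializations documentation) or as a Theano function of your own. It only takes effect when no weights argument is passed.
inner_init: Initialization function for the weights of the inner cells.
activation: Activation function, given either as the name of a built-in Keras activation (see the Keras activations documentation) or as a Theano function of your own. If omitted, the default shown in the signature above ('sigmoid') is used.
inner_activation: Activation function for the inner cells.
weights: List of numpy arrays used to set the initial weights. The list should have 9 elements.
truncate_gradient: Number of steps after which the gradient is truncated in BPTT (backpropagation through time, i.e. backpropagation extended along the time dimension).
return_sequences: Boolean. If False, only the last output of the output sequence is returned; if True, the whole sequence is returned.
input_dim: Dimensionality of the input. When this layer is the first layer of a model, either this argument or input_shape must be provided.
input_length: Length of the input sequences. This argument is required if you are going to connect Flatten then Dense layers upstream (without it, the shape of the dense outputs cannot be computed).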
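
One way to see where the 9 weight arrays come from is a single GRU step written in plain Python. This is a sketch under stated assumptions, not the Keras source: it assumes the list is ordered as (W, U, b) for the update gate, the reset gate, and the candidate state, and that the new state is z * h_prev + (1 - z) * candidate. It uses the defaults from the signature above (sigmoid as activation, hard_sigmoid as inner_activation). The names gru_step, affine and constant_matrix are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def hard_sigmoid(v):
    """Piecewise-linear approximation of the sigmoid (assumed form)."""
    return max(0.0, min(1.0, 0.2 * v + 0.5))


def affine(x, W, h, U, b):
    """Compute x.W + h.U + b for row vectors x and h."""
    return [sum(x[i] * W[i][j] for i in range(len(x)))
            + sum(h[k] * U[k][j] for k in range(len(h)))
            + b[j]
            for j in range(len(b))]


def gru_step(x_t, h_prev, weights):
    # Assumed ordering of the 9 arrays: update gate, reset gate, candidate.
    W_z, U_z, b_z, W_r, U_r, b_r, W_h, U_h, b_h = weights
    z = [hard_sigmoid(v) for v in affine(x_t, W_z, h_prev, U_z, b_z)]
    r = [hard_sigmoid(v) for v in affine(x_t, W_r, h_prev, U_r, b_r)]
    gated = [r_j * h_j for r_j, h_j in zip(r, h_prev)]
    candidate = [sigmoid(v) for v in affine(x_t, W_h, gated, U_h, b_h)]
    return [z_j * h_j + (1.0 - z_j) * c_j
            for z_j, h_j, c_j in zip(z, h_prev, candidate)]


def constant_matrix(rows, cols, value):
    return [[value] * cols for _ in range(rows)]


input_dim, output_dim = 2, 3
weights = []
for _ in range(3):  # update gate, reset gate, candidate state
    weights += [constant_matrix(input_dim, output_dim, 0.1),
                constant_matrix(output_dim, output_dim, 0.1),
                [0.0] * output_dim]

h = [0.0] * output_dim
for x_t in [[1.0, 0.0], [0.0, 1.0]]:
    h = gru_step(x_t, h, weights)
print(len(weights), len(h))  # 9 3
```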

4. LSTM

keras.layers.recurrent.LSTM(output_dim,
        init='glorot_uniform', inner_init='orthogonal', forget_bias_init='one',
        activation='tanh', inner_activation='hard_sigmoid',
        weights=None, truncate_gradient=-1, return_sequences=False,
        input_dim=None, input_length=None)

Long Short-Term Memory (LSTM) unit, proposed by Hochreiter in 1997. It is one of the main units used to build RNNs.

Input shape: 3D tensor of shape (nb_samples, timesteps, input_dim).

Output shape: if return_sequences=True, a 3D tensor of shape (nb_samples, timesteps, output_dim); otherwise a 2D tensor of shape (nb_samples, output_dim).

Masking: This layer supports masking for input data with a variable number of timesteps. To introduce masks to your data, use an Embedding layer with the mask_zero parameter set to True.

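Parameters:

output_dim: Dimensionality of the internal projections and of the final output.
init: Weight initialization function, given either as the name of a built-in Keras initialization (see the Keras initializations documentation) or as a Theano function of your own. It only takes effect when no weights argument is passed.
inner_init: Initialization function for the weights of the inner cells.
forget_bias_init: Initialization function for the bias of the forget gate. Jozefowicz et al. recommend initializing it with ones.
activation: Activation function, given either as the name of a built-in Keras activation (see the Keras activations documentation) or as a Theano function of your own. If omitted, the default shown in the signature above ('tanh') is used.
inner_activation: Activation function for the inner cells.
weights: List of numpy arrays used to set the initial weights. The list should have 12 elements.
truncate_gradient: Number of steps after which the gradient is truncated in BPTT (backpropagation through time, i.e. backpropagation extended along the time dimension).
return_sequences: Boolean. If False, only the last output of the output sequence is returned; if True, the whole sequence is returned.
input_dim: Dimensionality of the input. When this layer is the first layer of a model, either this argument or input_shape must be provided.
input_length: Length of the input sequences. This argument is required if you are going to connect Flatten then Dense layers upstream (without it, the shape of the dense outputs cannot be computed).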
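
The 12 weight arrays and the role of forget_bias_init can be illustrated with a single LSTM step in plain Python. This is a sketch under stated assumptions, not the Keras source: it assumes one (W, U, b) triple for each of the input gate, forget gate, cell candidate, and output gate, in that order, and uses the standard LSTM update equations with tanh as activation and hard_sigmoid as inner_activation. The names lstm_step, affine and constant_matrix are hypothetical.

```python
import math


def hard_sigmoid(v):
    """Piecewise-linear approximation of the sigmoid (assumed form)."""
    return max(0.0, min(1.0, 0.2 * v + 0.5))


def affine(x, W, h, U, b):
    """Compute x.W + h.U + b for row vectors x and h."""
    return [sum(x[i] * W[i][j] for i in range(len(x)))
            + sum(h[k] * U[k][j] for k in range(len(h)))
            + b[j]
            for j in range(len(b))]


def lstm_step(x_t, h_prev, c_prev, weights):
    # Assumed ordering: input gate, forget gate, cell candidate, output gate.
    (W_i, U_i, b_i, W_f, U_f, b_f,
     W_c, U_c, b_c, W_o, U_o, b_o) = weights
    i = [hard_sigmoid(v) for v in affine(x_t, W_i, h_prev, U_i, b_i)]
    f = [hard_sigmoid(v) for v in affine(x_t, W_f, h_prev, U_f, b_f)]
    o = [hard_sigmoid(v) for v in affine(x_t, W_o, h_prev, U_o, b_o)]
    g = [math.tanh(v) for v in affine(x_t, W_c, h_prev, U_c, b_c)]
    # The forget gate scales the old cell state; the input gate scales the new candidate.
    c = [f_j * c_j + i_j * g_j for f_j, c_j, i_j, g_j in zip(f, c_prev, i, g)]
    h = [o_j * math.tanh(c_j) for o_j, c_j in zip(o, c)]
    return h, c


def constant_matrix(rows, cols, value):
    return [[value] * cols for _ in range(rows)]


input_dim, output_dim = 2, 3
weights = []
for gate in ("input", "forget", "cell", "output"):
    # Forget-gate bias starts at one, mirroring forget_bias_init='one'.
    bias = 1.0 if gate == "forget" else 0.0
    weights += [constant_matrix(input_dim, output_dim, 0.1),
                constant_matrix(output_dim, output_dim, 0.1),
                [bias] * output_dim]

h = [0.0] * output_dim
c = [0.0] * output_dim
for x_t in [[1.0, 0.0], [0.0, 1.0]]:
    h, c = lstm_step(x_t, h, c, weights)
print(len(weights), len(h), len(c))  # 12 3 3
```

Starting the forget-gate bias at one keeps the gate mostly open early in training, so the cell state is carried forward by default.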

5. JZS1, JZS2, JZS3

keras.layers.recurrent.JZS1(output_dim,
        init='glorot_uniform', inner_init='orthogonal',
        activation='tanh', inner_activation='sigmoid',
        weights=None, truncate_gradient=-1, return_sequences=False,
        input_dim=None, input_length=None)

The top 3 RNN units that evolved from an evaluation of thousands of models. They serve as alternatives to GRU and LSTM. They correspond to the MUT1, MUT2, and MUT3 architectures proposed in "An Empirical Exploration of Recurrent Network Architectures", Jozefowicz et al. 2015.

Input shape: 3D tensor of shape (nb_samples, timesteps, input_dim).

Output shape: if return_sequences=True, a 3D tensor of shape (nb_samples, timesteps, output_dim); otherwise a 2D tensor of shape (nb_samples, output_dim).

Masking: This layer supports masking for input data with a variable number of timesteps. To introduce masks to your data, use an Embedding layer with the mask_zero parameter set to True.

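Parameters:

output_dim: Dimensionality of the internal projections and of the final output.
init: Weight initialization function, given either as the name of a built-in Keras initialization (see the Keras initializations documentation) or as a Theano function of your own. It only takes effect when no weights argument is passed.
inner_init: Initialization function for the weights of the inner cells.
activation: Activation function, given either as the name of a built-in Keras activation (see the Keras activations documentation) or as a Theano function of your own. If omitted, the default shown in the signature above ('tanh') is used.
inner_activation: Activation function for the inner cells.
weights: List of numpy arrays used to set the initial weights. The list should have 9 elements.
truncate_gradient: Number of steps after which the gradient is truncated in BPTT (backpropagation through time, i.e. backpropagation extended along the time dimension).
return_sequences: Boolean. If False, only the last output of the output sequence is returned; if True, the whole sequence is returned.
input_dim: Dimensionality of the input. When this layer is the first layer of a model, either this argument or input_shape must be provided.
input_length: Length of the input sequences. This argument is required if you are going to connect Flatten then Dense layers upstream (without it, the shape of the dense outputs cannot be computed).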

References for this section:

An Empirical Exploration of Recurrent Network Architectures

References:

Official tutorial

秒客网

Study Notes on Keras, a Theano-Based Deep Learning Framework, Part 14: Recurrent Layers
