I'm new to Keras, and I find it hard to understand the shape of the input data for the LSTM layer. The Keras documentation says that the input data should be a 3D tensor with shape (nb_samples, timesteps, input_dim). I'm having trouble understanding this format. Does the timesteps variable represent the number of timesteps the network remembers?
In my data, a few time steps affect the output of the network, but I do not know how many in advance, i.e. I can't say that the previous 10 samples affect the output. For example, the input can be words that form sentences. There is an important correlation between the words in each sentence. I don't know the length of a sentence in advance, and this length also varies from one sentence to another. I do know when a sentence ends (i.e. I have a period that indicates the ending). Two different sentences have no effect on one another - there is no need to remember the previous sentence.
I'm using the LSTM network for learning a policy in reinforcement learning, so I don't have a fixed data set. The agent's policy will change the length of the sentence.
How should I shape my data? How should it be fed into the Keras LSTM layer?
1 Answer
#1
Time steps is the total length of your sequence.

If you're working with words, it's the number of words in each sentence. If you're working with chars, it's the number of chars in each sequence.
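As a concrete illustration (the sizes here are hypothetical), a batch of 32 sentences, each 10 words long, with every word encoded as a 50-dimensional vector, is one 3D array:

import numpy as np

# hypothetical sizes: 32 sentences per batch, 10 words per sentence,
# 50 features per word
nb_samples, timesteps, input_dim = 32, 10, 50

# the 3D tensor Keras expects: (nb_samples, timesteps, input_dim)
batch = np.zeros((nb_samples, timesteps, input_dim))
print(batch.shape)  # (32, 10, 50)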
In a variable sentence length case, you should set that dimension to None:
from keras.layers import Input, LSTM

#for functional API models:
inputTensor = Input((None, input_dim))  #the nb_samples dimension doesn't participate in this definition

#for sequential models:
LSTM(units, input_shape=(None, input_dim))  #the nb_samples dimension doesn't participate in this definition
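To make the functional version concrete, a minimal end-to-end sketch (the layer sizes are hypothetical):

from keras.models import Model
from keras.layers import Input, LSTM, Dense

input_dim = 2  # hypothetical number of features per time step

inputTensor = Input((None, input_dim))  # None = variable number of timesteps
x = LSTM(16)(inputTensor)
output = Dense(1)(x)
model = Model(inputTensor, output)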
There are two possible ways of working with variable lengths in Keras:
- Fixed length with padding
- Variable length, separated into batches of the same length
In the fixed length case, you create a dummy word/character that is meaningless and pad your sentences to a maximum length, so all sentences have the same length. Then you add a Masking() layer that will ignore that dummy word/char, as sketched below.
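A minimal sketch of the padding approach, assuming each word is already a feature vector (the sentence lengths and sizes here are hypothetical):

import numpy as np
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense

# hypothetical toy data: three sentences of different lengths,
# each word encoded as a 2-dimensional vector
sentences = [np.random.rand(5, 2), np.random.rand(3, 2), np.random.rand(7, 2)]

# pad with zeros up to the longest sentence -> shape (3, 7, 2)
padded = pad_sequences(sentences, maxlen=7, dtype='float32',
                       padding='post', value=0.0)

model = Sequential()
model.add(Masking(mask_value=0.0, input_shape=(7, 2)))  # skip the zero padding
model.add(LSTM(16))
model.add(Dense(1))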
The Embedding layer already has a mask_zero parameter, so if you're working with embeddings, you can make id 0 the dummy char/word.
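For example (the vocabulary size and dimensions are hypothetical):

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential()
# id 0 is reserved for padding and is masked out by mask_zero=True
model.add(Embedding(input_dim=10000, output_dim=64, mask_zero=True))
model.add(LSTM(32))
model.add(Dense(1))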
In the variable length case, you just separate your input data into smaller batches, like here: Keras misinterprets training data shape
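A rough sketch of the batched approach, using toy data so each batch has a uniform sequence length (all sizes are hypothetical):

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

input_dim = 2  # hypothetical number of features per time step

model = Sequential()
model.add(LSTM(16, input_shape=(None, input_dim)))  # None = variable timesteps
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

# hypothetical toy data: two groups of sentences with different lengths
short_x = np.random.rand(4, 3, input_dim)  # 4 sentences, 3 timesteps each
long_x = np.random.rand(4, 8, input_dim)   # 4 sentences, 8 timesteps each
short_y = np.random.rand(4, 1)
long_y = np.random.rand(4, 1)

# train one batch at a time; within a batch, every sequence has the same length
for x_batch, y_batch in [(short_x, short_y), (long_x, long_y)]:
    model.train_on_batch(x_batch, y_batch)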