I am using Keras (ver. 2.0.6 with TensorFlow backend) for a simple neural network:


from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(32, return_sequences=True, input_shape=(100, 5)))
model.add(LSTM(32, return_sequences=True))

It is only a test for me. I am "training" the model with the following dummy data:


import numpy as np

x_train = np.array([
    [[0,0,0,0,1], [0,0,0,1,0], [0,0,1,0,0]],
    [[1,0,0,0,0], [0,1,0,0,0], [0,0,1,0,0]],
    [[0,1,0,0,0], [0,0,1,0,0], [0,0,0,1,0]],
    [[0,0,1,0,0], [1,0,0,0,0], [1,0,0,0,0]],
    [[0,0,0,1,0], [0,0,0,0,1], [0,1,0,0,0]],
    [[0,0,0,0,1], [0,0,0,0,1], [0,0,0,0,1]]
])

y_train = np.array([
    [[0,0,0,0,1], [0,0,0,1,0], [0,0,1,0,0]],
    [[1,0,0,0,0], [0,1,0,0,0], [0,0,1,0,0]],
    [[0,1,0,0,0], [0,0,1,0,0], [0,0,0,1,0]],
    [[1,0,0,0,0], [1,0,0,0,0], [1,0,0,0,0]],
    [[1,0,0,0,0], [0,0,0,0,1], [0,1,0,0,0]],
    [[1,0,0,0,0], [0,0,0,0,1], [0,0,0,0,1]]
])

Then I do:


model.fit(x_train, y_train, batch_size=2, epochs=50, shuffle=False)


The result is:


[[[ 0.11855114  0.13603994  0.21069065  0.28492314  0.24979511]
  [ 0.03013871  0.04114409  0.16499813  0.41659597  0.34712321]
  [ 0.00194826  0.00351031  0.06993906  0.52274817  0.40185428]]

 [[ 0.17915446  0.19629011  0.21316603  0.22450975  0.18687972]
  [ 0.17935558  0.1994358   0.22070852  0.2309722   0.16952793]
  [ 0.18571526  0.20774922  0.22724937  0.23079531  0.14849086]]

 [[ 0.11163659  0.13263632  0.20109797  0.28029731  0.27433187]
  [ 0.02216373  0.03424517  0.13683401  0.38068131  0.42607573]
  [ 0.00105937  0.0023865   0.0521594   0.43946937  0.50492537]]

 [[ 0.13276921  0.15531689  0.21852671  0.25823513  0.23515201]
  [ 0.05750636  0.08210614  0.22636817  0.3303588   0.30366054]
  [ 0.01128351  0.02332032  0.210263    0.3951444   0.35998878]]

 [[ 0.15303896  0.18197381  0.21823004  0.23647803  0.21027911]
  [ 0.10842207  0.15755147  0.23791778  0.26479205  0.23131666]
  [ 0.06472684  0.12843341  0.26680911  0.28923658  0.25079405]]

 [[ 0.19560908  0.20663913  0.21954383  0.21920268  0.15900527]
  [ 0.22829761  0.22907974  0.22933882  0.20822221  0.10506159]
  [ 0.27179539  0.25587022  0.22594844  0.18308094  0.063305  ]]]

OK, it works, but it is just a test; I really do not care about accuracy etc. I would like to understand how I can work with outputs of a different size.


For example, when passing a sequence (a numpy array) like:


[[0,0,0,0,1], [0,0,0,1,0], [0,0,1,0,0]]

I would like to get an output of 4 vectors as the prediction:


[[..first..], [..second..], [..third..], [..fourth..]]

Is that possible somehow? The size could vary: I would train the model with different labels that can have different lengths N.




This answer is for non-varying dimensions. For varying dimensions, the padding idea in Giuseppe's answer seems the way to go, possibly with the help of the "Masking" layer described in the Keras documentation.
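For reference, Keras's Masking layer skips a timestep when every feature at that step equals `mask_value` (0.0 by default). A minimal NumPy sketch of that rule, assuming zero-vector padding; `timestep_mask` is a hypothetical helper, not a Keras function:

```python
import numpy as np

def timestep_mask(batch, mask_value=0.0):
    """Hypothetical helper: (batch, timesteps) mask, True where a step is kept.

    Follows the rule of keras.layers.Masking: a timestep is masked out only
    when all of its features equal mask_value.
    """
    return np.any(batch != mask_value, axis=-1)

# One sequence: 3 real one-hot steps, then 2 all-zero padding steps.
padded = np.array([[[0, 0, 0, 0, 1],
                    [0, 0, 0, 1, 0],
                    [0, 0, 1, 0, 0],
                    [0, 0, 0, 0, 0],
                    [0, 0, 0, 0, 0]]], dtype=float)

print(timestep_mask(padded).tolist())  # [[True, True, True, False, False]]
```

In Keras itself, the corresponding step would be a `Masking(mask_value=0.)` layer placed before the LSTM layers, so that mask-aware layers such as LSTM skip the padded steps.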


The output shape in Keras is totally dependent on the number of "units/neurons/cells" you put in the last layer, and of course, on the type of layer.


The data in your question does not match your code (that combination cannot work), but suppose your code is right and forget the data for a while.
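One concrete way to see the mismatch is to compare shapes: the question's `x_train` has 6 examples of 3 timesteps each, while `input_shape=(100, 5)` asks for 100 timesteps per example. A quick NumPy check, with zeros standing in for the real values since only the shape matters:

```python
import numpy as np

# Per-example shape declared by input_shape=(100, 5) in the question's model.
declared = (100, 5)

# The question's x_train: 6 examples, each 3 one-hot steps of size 5.
# Zeros stand in for the actual values; only the shape matters here.
x_train = np.zeros((6, 3, 5))

print(x_train.shape)                  # (6, 3, 5)
print(x_train.shape[1:] == declared)  # False: 3 timesteps instead of 100
```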


An input shape of (100, 5) in an LSTM layer means a tensor of shape (None, 100, 5), where:


  • None is the batch size. The first dimension of your data is reserved for the number of examples you have (X and Y must have the same number of examples).
  • Each example is a sequence with 100 time steps
  • Each time step is a 5-dimension vector.
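To make the three axes concrete, here is a NumPy sketch of a batch this layer would accept; the batch size of 8 is an arbitrary example value:

```python
import numpy as np

batch_size = 8   # arbitrary: Keras shows this axis as None
timesteps = 100  # from input_shape=(100, 5)
features = 5     # each timestep is a 5-dimension vector

x = np.random.rand(batch_size, timesteps, features)

print(x.shape)        # (8, 100, 5)
print(x[0].shape)     # (100, 5): one example, a sequence of 100 steps
print(x[0, 0].shape)  # (5,): one timestep, a 5-dimension vector
```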

The 32 cells in this same LSTM layer mean that the vectors change from 5-dimension to 32-dimension vectors. With return_sequences=True, all 100 timesteps appear in the result. So the output shape of the first layer is (None, 100, 32):


  • Same number of examples (this will never change along the model)
  • Still 100 timesteps per example (because return_sequences=True)
  • Each time step is a 32-dimension vector (because of the 32 cells)

Now the second LSTM layer does exactly the same thing: it keeps the 100 timesteps and, since it also has 32 cells, keeps the 32-dimension vectors, so its output is also (None, 100, 32).

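The shape bookkeeping for these two layers fits in a small pure-Python helper. `lstm_output_shape` is hypothetical (Keras computes this internally); it only encodes the rule described above: the number of cells replaces the feature axis, and `return_sequences` decides whether the time axis survives.

```python
def lstm_output_shape(input_shape, units, return_sequences):
    """Hypothetical helper: output shape of an LSTM given (batch, timesteps, features)."""
    batch, timesteps, _features = input_shape
    if return_sequences:
        return (batch, timesteps, units)  # one output vector per timestep
    return (batch, units)                 # only the last timestep's output

shape = (None, 100, 5)
shape = lstm_output_shape(shape, 32, return_sequences=True)  # first LSTM
print(shape)  # (None, 100, 32)
shape = lstm_output_shape(shape, 32, return_sequences=True)  # second LSTM
print(shape)  # (None, 100, 32)

# Without return_sequences the time axis disappears entirely.
print(lstm_output_shape((None, 100, 5), 32, return_sequences=False))  # (None, 32)
```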

Finally, the time distributed Dense layer will also keep the 100 timesteps (because of TimeDistributed) and change your vectors back to 5-dimension vectors (because of its 5 units), resulting in (None, 100, 5).
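A TimeDistributed Dense layer applies one and the same weight matrix to every timestep independently. A NumPy sketch of that computation, with random weights standing in for trained ones and the activation left out:

```python
import numpy as np

rng = np.random.default_rng(0)

batch, timesteps, in_dim, units = 2, 100, 32, 5
h = rng.standard_normal((batch, timesteps, in_dim))  # like the 2nd LSTM's output

# One shared kernel and bias, as in Dense(5); random stand-ins for trained values.
W = rng.standard_normal((in_dim, units))
b = np.zeros(units)

# matmul acts on the last axis, so every timestep gets the same W and b.
out = h @ W + b
print(out.shape)  # (2, 100, 5)

# Same result as looping over the timesteps one by one.
looped = np.stack([h[:, t, :] @ W + b for t in range(timesteps)], axis=1)
print(np.allclose(out, looped))  # True
```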


As you can see, you cannot change the number of timesteps directly with recurrent layers; you need other layers to change this dimension. How you do this is completely up to you, and there are countless ways of doing it.


But in all of them, you need to get rid of the timesteps and rebuild the data with another shape.



One suggestion (just one possibility among many) is to reshape your result and apply another Dense layer to achieve the expected final shape.


Suppose you want a result of shape (None, 4, 5). (Never forget: the first dimension of your data is the number of examples. It can be any number, but you must take it into account when you organize your data.) We can achieve this by reshaping the data to a shape with 4 in the second dimension:


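#after the TimeDistributed Dense layer, whose output is (None, 100, 5):
from keras.layers import Reshape, Dense, TimeDistributed

model.add(Reshape((4, 125)))  #the batch size doesn't appear here,
   #just make sure you have 500 elements, which is 100*5 = 4*125

model.add(TimeDistributed(Dense(5)))  #output: (None, 4, 5)
#this layer could also be model.add(LSTM(5, return_sequences=True)), for instance

#continue to the "Activation" layer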

This will give you 4 timesteps (because the shape after Reshape is (None, 4, 125)), each step being a 5-dimension vector (because of Dense(5)).

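The arithmetic behind that Reshape can be checked with NumPy. This sketch uses random values in place of real layer outputs and a plain matrix product in place of the trained Dense(5) weights:

```python
import numpy as np

rng = np.random.default_rng(0)
batch = 2  # arbitrary example batch size

dense_out = rng.random((batch, 100, 5))  # (None, 100, 5) after the first Dense(5)
print(dense_out[0].size)                 # 500 = 100*5 = 4*125

# Reshape((4, 125)) regroups the 500 numbers of each example; the batch axis is untouched.
reshaped = dense_out.reshape(batch, 4, 125)
print(reshaped.shape)  # (2, 4, 125)

# The second Dense(5) maps each of the 4 new steps from 125 values to 5.
W = rng.random((125, 5))
final = reshaped @ W
print(final.shape)  # (2, 4, 5)
```

Note that Reshape only regroups values in row-major order: each of the 4 new "timesteps" holds 25 consecutive original steps flattened together, so it is the Dense layer after it that has to learn a meaningful mapping.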

Use the model.summary() command to see the shapes outputted by each layer.




I don't know Keras, but from both a practical and a theoretical point of view this is absolutely possible.


The idea is that you have an input sequence and an output sequence. Commonly, the beginning and the end of each sequence are delimited by special symbols (e.g. the character sequence "cat" is translated into "^cat#" with a start symbol "^" and an end symbol "#"). Then the sequence is padded with another special symbol up to a maximum sequence length (e.g. "^cat#$$$$$$" with a padding symbol "$").


If the padding symbol corresponds to a zero vector and those steps are masked out, it will have no impact on your training.
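A minimal sketch of this encoding, assuming a hypothetical toy vocabulary and the symbols "^", "#" and "$" from the example. The start and end symbols get their own one-hot positions, while the padding symbol maps to an all-zero vector; `encode` is an illustrative helper, not a library function:

```python
import numpy as np

START, END, PAD = "^", "#", "$"
vocab = [START, END, "a", "c", "t"]            # hypothetical toy vocabulary
index = {ch: i for i, ch in enumerate(vocab)}  # symbol -> one-hot position

def encode(word, max_len):
    """Add start/end symbols, pad with PAD up to max_len, then one-hot encode.

    PAD has no one-hot position, so padded steps stay all-zero vectors.
    """
    text = START + word + END
    if len(text) > max_len:
        raise ValueError("sequence longer than max_len")
    text += PAD * (max_len - len(text))
    onehot = np.zeros((max_len, len(vocab)))
    for t, ch in enumerate(text):
        if ch != PAD:
            onehot[t, index[ch]] = 1.0
    return text, onehot

text, onehot = encode("cat", max_len=11)
print(text)              # ^cat#$$$$$$
print(onehot.shape)      # (11, 5)
print(onehot[5:].any())  # False: every padding step is a zero vector
```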


Your output sequence could now assume any length up to the maximum one, because the real length is the one from the start to the end symbol positions.


In other words, you will always have the same input and output sequence length (i.e. the maximum one), but the real length is the span between the start and the end symbols.


(Obviously, in the output sequence, anything after the end symbol should not be considered in the loss function)
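Reading the real length off the end symbol and excluding everything after it from the loss can be sketched in NumPy. The per-step cross-entropy, the integer-id targets, the vocabulary order and the helper names are illustrative assumptions, not a particular library's loss function:

```python
import numpy as np

END_ID = 1  # hypothetical index of the end symbol "#" (vocab order: ^ # a c t)

def real_length(target_ids, end_id=END_ID):
    """Number of steps up to and including the first end symbol."""
    return list(target_ids).index(end_id) + 1

def masked_cross_entropy(probs, target_ids, end_id=END_ID):
    """Mean cross-entropy of one sequence, ignoring steps after the end symbol.

    probs: (timesteps, vocab_size) predicted distributions.
    target_ids: (timesteps,) integer targets; entries after the end symbol
    are never read, so padding can hold any value.
    """
    n = real_length(target_ids, end_id)
    steps = np.arange(n)
    return float(-np.log(probs[steps, target_ids[:n]]).mean())

# Target "^cat#" as ids, then two padding steps marked -1 (never read).
target = np.array([0, 3, 2, 4, 1, -1, -1])
probs = np.full((7, 5), 0.2)  # a uniform prediction at every step

print(real_length(target))                            # 5
print(round(masked_cross_entropy(probs, target), 4))  # 1.6094, i.e. log(5)

# Whatever the model predicts on the padding steps does not change the loss.
probs_changed = probs.copy()
probs_changed[5:] = 0.0
print(masked_cross_entropy(probs_changed, target) == masked_cross_entropy(probs, target))  # True
```

In Keras, this kind of per-timestep weighting is typically done by compiling with `sample_weight_mode='temporal'` and passing a 0/1 `sample_weight` array of shape (examples, timesteps) to `fit`.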




There seem to be two ways to do the sequence-to-sequence mapping you're describing. The first uses Keras directly, as in this example (code below):


from keras.layers import Input, LSTM, RepeatVector
from keras.models import Model

# example values; set them to match your data
timesteps, input_dim, latent_dim = 3, 5, 32

inputs = Input(shape=(timesteps, input_dim))
encoded = LSTM(latent_dim)(inputs)

decoded = RepeatVector(timesteps)(encoded)
decoded = LSTM(input_dim, return_sequences=True)(decoded)

sequence_autoencoder = Model(inputs, decoded)
encoder = Model(inputs, encoded)

Here RepeatVector repeats the encoded vector `timesteps` times to match the number of timesteps of the output. This still means you need a fixed number of timesteps in your output; however, you could pad output sequences that have fewer timesteps than your maximum number of timesteps.
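RepeatVector(n) takes the encoder's single vector per example, shape (batch, latent_dim), and copies it n times along a new time axis, giving (batch, n, latent_dim). A NumPy sketch of that operation, with arbitrarily chosen example sizes; here n is set to 4 to mirror the question's 4-step output, since n is the number of output steps and need not equal the number of input steps:

```python
import numpy as np

batch, latent_dim, n_out = 2, 32, 4  # example sizes; n_out = desired output steps

encoded = np.random.rand(batch, latent_dim)  # like the output of LSTM(latent_dim)

# Equivalent of RepeatVector(n_out): add a time axis and repeat along it.
repeated = np.repeat(encoded[:, np.newaxis, :], n_out, axis=1)
print(repeated.shape)                                   # (2, 4, 32)
print(np.array_equal(repeated[:, 0], repeated[:, 3]))   # True: every step is identical
```

A decoder LSTM with `return_sequences=True` on top of this then emits n_out vectors, so the output length is set by the RepeatVector argument rather than by the input length.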


Or you can use the seq2seq module, which is built on top of Keras.




