How can I train an LSTM model with labels of different N dimensions?

Date: 2022-07-09 21:41:28

I am using keras (ver. 2.0.6 with TensorFlow backend) for a simple neural network:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation, TimeDistributed

model = Sequential()
model.add(LSTM(32, return_sequences=True, input_shape=(100, 5)))
model.add(LSTM(32, return_sequences=True))
model.add(TimeDistributed(Dense(5)))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

It is only a test for me; I am "training" the model with the following dummy data.

x_train = np.array([
    [[0,0,0,0,1], [0,0,0,1,0], [0,0,1,0,0]],
    [[1,0,0,0,0], [0,1,0,0,0], [0,0,1,0,0]],
    [[0,1,0,0,0], [0,0,1,0,0], [0,0,0,1,0]],
    [[0,0,1,0,0], [1,0,0,0,0], [1,0,0,0,0]],
    [[0,0,0,1,0], [0,0,0,0,1], [0,1,0,0,0]],
    [[0,0,0,0,1], [0,0,0,0,1], [0,0,0,0,1]]
])

y_train = np.array([
    [[0,0,0,0,1], [0,0,0,1,0], [0,0,1,0,0]],
    [[1,0,0,0,0], [0,1,0,0,0], [0,0,1,0,0]],
    [[0,1,0,0,0], [0,0,1,0,0], [0,0,0,1,0]],
    [[1,0,0,0,0], [1,0,0,0,0], [1,0,0,0,0]],
    [[1,0,0,0,0], [0,0,0,0,1], [0,1,0,0,0]],
    [[1,0,0,0,0], [0,0,0,0,1], [0,0,0,0,1]]
])

Then I do:

model.fit(x_train, y_train, batch_size=2, epochs=50, shuffle=False)

print(model.predict(x_train))

The result is:

[[[ 0.11855114  0.13603994  0.21069065  0.28492314  0.24979511]
  [ 0.03013871  0.04114409  0.16499813  0.41659597  0.34712321]
  [ 0.00194826  0.00351031  0.06993906  0.52274817  0.40185428]]

 [[ 0.17915446  0.19629011  0.21316603  0.22450975  0.18687972]
  [ 0.17935558  0.1994358   0.22070852  0.2309722   0.16952793]
  [ 0.18571526  0.20774922  0.22724937  0.23079531  0.14849086]]

 [[ 0.11163659  0.13263632  0.20109797  0.28029731  0.27433187]
  [ 0.02216373  0.03424517  0.13683401  0.38068131  0.42607573]
  [ 0.00105937  0.0023865   0.0521594   0.43946937  0.50492537]]

 [[ 0.13276921  0.15531689  0.21852671  0.25823513  0.23515201]
  [ 0.05750636  0.08210614  0.22636817  0.3303588   0.30366054]
  [ 0.01128351  0.02332032  0.210263    0.3951444   0.35998878]]

 [[ 0.15303896  0.18197381  0.21823004  0.23647803  0.21027911]
  [ 0.10842207  0.15755147  0.23791778  0.26479205  0.23131666]
  [ 0.06472684  0.12843341  0.26680911  0.28923658  0.25079405]]

 [[ 0.19560908  0.20663913  0.21954383  0.21920268  0.15900527]
  [ 0.22829761  0.22907974  0.22933882  0.20822221  0.10506159]
  [ 0.27179539  0.25587022  0.22594844  0.18308094  0.063305  ]]]

OK, it works, but it is just a test; I really do not care about accuracy, etc. I would like to understand how I can work with outputs of a different size.

For example: passing a sequence (numpy.array) like:

[[0,0,0,0,1], [0,0,0,1,0], [0,0,1,0,0]]

I would like to get an output with 4 steps as the prediction:

[[..first..], [..second..], [..third..], [..fourth..]]

Is that possible somehow? The size could vary; I would train the model with different labels that can have different numbers of dimensions.

Thanks

3 Answers

#1

This answer is for non-varying dimensions; for varying dimensions, the padding idea in Giuseppe's answer seems the way to go, perhaps with the help of the Masking layer described in the Keras documentation.


The output shape in Keras is totally dependent on the number of "units/neurons/cells" you put in the last layer, and of course, on the type of layer.

I can see that the data in your question does not match your code (the shapes are incompatible), but let's suppose your code is right and set the data aside for a while.

An input_shape of (100, 5) in an LSTM layer means a tensor of shape (None, 100, 5), where:

  • None is the batch size. The first dimension of your data is reserved for the number of examples you have (X and Y must have the same number of examples).
  • Each example is a sequence with 100 time steps.
  • Each time step is a 5-dimension vector.

And the 32 cells in this same LSTM layer mean that the resulting vectors will change from 5-dimension to 32-dimension vectors. With return_sequences=True, all 100 timesteps will appear in the result. So the output shape of the first layer is (None, 100, 32):

  • Same number of examples (this will never change along the model).
  • Still 100 timesteps per example (because return_sequences=True).
  • Each time step is a 32-dimension vector (because of the 32 cells).

Now the second LSTM layer does exactly the same thing: it keeps the 100 timesteps, and since it also has 32 cells, it keeps the 32-dimension vectors, so the output is also (None, 100, 32).

Finally, the time-distributed Dense layer will also keep the 100 timesteps (because of TimeDistributed), and change your vectors to 5-dimension vectors again (because of the 5 units), resulting in (None, 100, 5).
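
If you want to verify this walk-through, a quick sketch (my addition, not part of the original answer; the layer names come from Keras's automatic naming and may differ) is to loop over the model's layers and print each output shape:

for layer in model.layers:
    print(layer.name, layer.output_shape)

# prints something like:
# lstm_1              (None, 100, 32)
# lstm_2              (None, 100, 32)
# time_distributed_1  (None, 100, 5)
# activation_1        (None, 100, 5)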


As you can see, you cannot change the number of timesteps directly with recurrent layers; you need other layers to change these dimensions. How you do this is completely up to you; there are infinite ways of doing it.

But in all of them, you need to break free of the 100 timesteps and rebuild the data with another shape.


Suggestion

A suggestion from me (which is just one possibility) is to reshape your result and apply another Dense layer to achieve the final shape expected.

Suppose you want a result like (None, 4, 5) (never forget: the first dimension of your data is the number of examples; it can be any number, but you must take it into account when you organize your data). We can achieve this by reshaping the data to a shape with 4 in the second dimension:

# after the TimeDistributed(Dense(5)) layer:

model.add(Reshape((4, 125)))  # the batch size doesn't appear here;
    # just make sure you have 500 elements, since 100*5 = 4*125

model.add(TimeDistributed(Dense(5)))
# this layer could also be model.add(LSTM(5, return_sequences=True)), for instance

# continue to the "Activation" layer

This will give you 4 timesteps (because the shape after Reshape is (None, 4, 125)), each step being a 5-dimension vector (because of Dense(5)).

Use the model.summary() command to see the shapes output by each layer.
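
Putting the suggestion together, a minimal end-to-end sketch (my own assembly of the layers discussed above, not code from the original post) could look like this for the question's (100, 5) input:

from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation, TimeDistributed, Reshape

model = Sequential()
model.add(LSTM(32, return_sequences=True, input_shape=(100, 5)))
model.add(LSTM(32, return_sequences=True))
model.add(TimeDistributed(Dense(5)))   # (None, 100, 5): 500 elements per example
model.add(Reshape((4, 125)))           # (None, 4, 125), since 100*5 = 4*125
model.add(TimeDistributed(Dense(5)))   # (None, 4, 5): 4 timesteps of 5-dim vectors
model.add(Activation('softmax'))
model.summary()                        # check the shapes layer by layer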

#2

I don't know Keras but from a practical and theoretical point of view this is absolutely possible.

The idea is that you have an input sequence and an output sequence. Commonly, the beginning and the end of each sequence are delimited by special symbols (e.g. the character sequence "cat" is turned into "^cat#", with a start symbol "^" and an end symbol "#"). Then the sequence is padded with another special symbol up to a maximum sequence length (e.g. "^cat#$$$$$$" with a padding symbol "$").

If the padding symbol corresponds to a zero vector, it will have no impact on your training.

Your output sequence can now assume any length up to the maximum one, because its real length is the distance from the start symbol to the end symbol.

In other words, you will always have the same input and output sequence length (i.e. the maximum one), but the real length is the part between the start and end symbols.

(Obviously, in the output sequence, anything after the end symbol should not be considered in the loss function)
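
In Keras, this padding idea pairs naturally with a Masking layer so the padded timesteps are skipped entirely; here is a rough sketch (my illustration of the answer's idea with made-up sizes, not code from the original answer):

import numpy as np
from keras.models import Sequential
from keras.layers import Masking, LSTM

max_len, n_symbols = 10, 5  # illustrative maximum length and alphabet size

# pad a 3-step one-hot sequence with zero vectors (the "$" padding) up to max_len
seq = np.array([[0,0,0,0,1], [0,0,0,1,0], [0,0,1,0,0]])
padded = np.zeros((1, max_len, n_symbols))
padded[0, :len(seq)] = seq

model = Sequential()
model.add(Masking(mask_value=0.0, input_shape=(max_len, n_symbols)))  # all-zero steps are masked
model.add(LSTM(32, return_sequences=True))  # masked steps do not update the state

Whether layers stacked on top (e.g. TimeDistributed(Dense(...))) propagate the mask depends on your Keras version, so check the masking section of the Keras documentation before relying on it.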

#3

There seem to be two ways to do the sequence-to-sequence approach you're describing. The first uses Keras directly, following this example (code below):

from keras.layers import Input, LSTM, RepeatVector
from keras.models import Model

timesteps, input_dim, latent_dim = 3, 5, 32  # illustrative values

inputs = Input(shape=(timesteps, input_dim))
encoded = LSTM(latent_dim)(inputs)          # encode the whole sequence into one vector

decoded = RepeatVector(timesteps)(encoded)  # repeat it once per output timestep
decoded = LSTM(input_dim, return_sequences=True)(decoded)

sequence_autoencoder = Model(inputs, decoded)
encoder = Model(inputs, encoded)

Here the RepeatVector layer repeats the encoded vector n times to match the output sequence's number of timesteps. This still means you need a fixed number of timesteps in your output vector; however, there may be a way to pad vectors that have fewer timesteps than your maximum number of timesteps.
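
For instance (an assumed usage, not from the original answer), with timesteps=3 and input_dim=5 the autoencoder above could be trained directly on the question's one-hot x_train, whose shape is (6, 3, 5):

sequence_autoencoder.compile(optimizer='rmsprop', loss='mse')
sequence_autoencoder.fit(x_train, x_train, batch_size=2, epochs=50)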

Or you can use the seq2seq module, which is built on top of Keras.
