How to use hidden layer activations to construct a loss function, and how to provide y_true during fitting in Keras?

Date: 2021-11-29 13:57:03

Assume I have a model like this. M1 and M2 are two layers linking the left and right sides of the model. The example model: red lines indicate backprop directions.

During training, I hope M1 can learn a mapping from L2_left activation to L2_right activation. Similarly, M2 can learn a mapping from L3_right activation to L3_left activation. The model also needs to learn the relationship between two inputs and the output. Therefore, I should have three loss functions for M1, M2, and L3_left respectively.

I can probably use:

model.compile(optimizer='rmsprop',
              loss={'M1': 'mean_squared_error',
                    'M2': 'mean_squared_error',
                    'L3_left': 'mean_squared_error'})

But during training, we need to provide y_true, for example:

model.fit([input_1,input_2], y_true)

In this case, the y_true values are hidden layer activations and do not come from a dataset. Is it possible to build this model and train it using its hidden layer activations?

2 Answers

#1

If you have only one output, you must have only one loss function.

If you want three loss functions, you must have three outputs, and, of course, three Y vectors for training.

If you want loss functions in the middle of the model, you must take outputs from those layers.

Creating the graph of your model: (if the model is already defined, see the end of this answer)

#Here, all "SomeLayer(blabla)" could be replaced by a "SomeModel" if necessary
    #Example of using a layer or a model:
        #M1 = SomeLayer(blablabla)(L12) 
        #M1 = SomeModel(L12)

from keras.models import Model
from keras.layers import *

inLef = Input((shape1))   
inRig = Input((shape2))

L1Lef = SomeLayer(blabla)(inLef)
L2Lef = SomeLayer(blabla)(L1Lef)
M1 = SomeLayer(blablaa)(L2Lef) #this is an output

L1Rig = SomeLayer(balbla)(inRig)

conc2Rig = Concatenate(axis=?)([L1Rig,M1]) #Or Add, or Multiply, however you're joining the models    
L2Rig = SomeLayer(nlanlab)(conc2Rig)
L3Rig = SomeLayer(najaljd)(L2Rig)

M2 = SomeLayer(babkaa)(L3Rig) #this is an output

conc3Lef = Concatenate(axis=?)([L2Lef,M2])
L3Lef = SomeLayer(blabla)(conc3Lef) #this is an output
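
For reference, here is a minimal runnable version of the same graph. Everything concrete in it is an assumption: Dense layers, arbitrary sizes, and feature-axis concatenation were picked just to make the sketch executable:

#A concrete sketch of the schematic graph above.
#All layer types (Dense) and all sizes are arbitrary assumptions.
from keras.models import Model
from keras.layers import Input, Dense, Concatenate

inLef = Input(shape=(32,))   #stands in for shape1
inRig = Input(shape=(16,))   #stands in for shape2

L1Lef = Dense(64, activation='relu')(inLef)
L2Lef = Dense(64, activation='relu')(L1Lef)
M1 = Dense(64, activation='relu')(L2Lef)   #this is an output

L1Rig = Dense(64, activation='relu')(inRig)
conc2Rig = Concatenate(axis=-1)([L1Rig, M1])   #joining on the feature axis
L2Rig = Dense(64, activation='relu')(conc2Rig)
L3Rig = Dense(64, activation='relu')(L2Rig)
M2 = Dense(64, activation='relu')(L3Rig)   #this is an output

conc3Lef = Concatenate(axis=-1)([L2Lef, M2])
L3Lef = Dense(1, activation='sigmoid')(conc3Lef)   #this is an output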

Creating your model with three outputs:

Now that you've got your graph ready and you know what the outputs are, you create the model:

model = Model([inLef,inRig], [M1,M2,L3Lef])
model.compile(loss='mse', optimizer='rmsprop')

If you want different losses for each output, then you create a list:

#example of custom loss function, if necessary
from keras import backend as K

def lossM1(yTrue, yPred):
    return K.sum(K.abs(yTrue - yPred))

#compiling with three different loss functions
model.compile(loss = [lossM1, 'mse','binary_crossentropy'], optimizer =??)

But you've got to have three different y training arrays too, for training with:

model.fit([input_1,input_2], [yTrainM1,yTrainM2,y_true], ....)
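
If you prefer the dictionary style from the question, give the output layers names and use those names as keys (a sketch; the name arguments are additions to the code above):

#naming the outputs lets you key losses and targets by name
M1 = SomeLayer(blablaa, name='M1')(L2Lef)
M2 = SomeLayer(babkaa, name='M2')(L3Rig)
L3Lef = SomeLayer(blabla, name='L3_left')(conc3Lef)

model.compile(optimizer='rmsprop',
              loss={'M1': 'mean_squared_error',
                    'M2': 'mean_squared_error',
                    'L3_left': 'mean_squared_error'})
model.fit([input_1, input_2],
          {'M1': yTrainM1, 'M2': yTrainM2, 'L3_left': y_true})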

If your model is already defined and you don't create its graph like I did:

Then you have to find, in yourModel.layers[i], which layers are M1 and M2, and create a new model like this:

M1 = yourModel.layers[indexForM1].output
M2 = yourModel.layers[indexForM2].output
newModel = Model([inLef,inRig], [M1,M2,yourModel.output])
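
If the layers have names, get_layer is usually easier than hunting for indices (assuming the names 'M1' and 'M2' were set when the layers were created):

M1 = yourModel.get_layer('M1').output
M2 = yourModel.get_layer('M2').output
newModel = Model([inLef,inRig], [M1,M2,yourModel.output])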

If you want the two outputs to be equal:

In this case, just subtract the two outputs in a lambda layer, and make that lambda layer be an output of your model, with expected values = 0.

Using the exact same vars as before, we'll just create two additional layers to subtract outputs:

diffM1L1Rig = Lambda(lambda x: x[0] - x[1])([L1Rig,M1])
diffM2L2Lef = Lambda(lambda x: x[0] - x[1])([L2Lef,M2])

Now your model should be:

newModel = Model([inLef,inRig],[diffM1L1Rig,diffM2L2Lef,L3Lef])

And training will expect those two differences to be zero:

yM1 = np.zeros((shapeOfM1Output))
yM2 = np.zeros((shapeOfM2Output))
newModel.fit([input_1,input_2], [yM1,yM2,y_true], ...)
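
Note that the zero targets need the batch dimension too. A concrete sketch, assuming 1000 samples and 64-wide diff outputs (both numbers are placeholders):

import numpy as np

nSamples = 1000   #hypothetical dataset size
yM1 = np.zeros((nSamples, 64))   #one zero row per sample, matching diffM1L1Rig's shape
yM2 = np.zeros((nSamples, 64))   #matching diffM2L2Lef's shape
newModel.compile(loss='mse', optimizer='rmsprop')
newModel.fit([input_1, input_2], [yM1, yM2, y_true], epochs=10)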

#2

Trying to answer the last part: how to make gradients affect only one side of the model.

...well... at first that sounds unfeasible to me. But if it's similar to "train only a part of the model", then it's totally OK: define models that only go up to a certain point and make part of the layers untrainable.

By doing that, nothing will affect those layers. If that's what you want, then you can do it:

#using the previous vars to define other models

modelM1 = Model([inLef,inRig],diffM1L1Rig)

This model above ends in diffM1L1Rig. Before compiling, you must make L2Rig untrainable:

modelM1.layers[??].trainable = False
#to find which layer is the right one, you may define them using the "name" parameter, or check the shapes, types etc. in modelM1.summary()

modelM1.compile(.....)
modelM1.fit([input_1, input_2], yM1)
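
A hedged sketch of that freezing step, assuming the right-side layers were created with names containing 'Rig':

#freeze every layer whose (assumed) name marks it as right-side
for layer in modelM1.layers:
    if 'Rig' in layer.name:
        layer.trainable = False

modelM1.compile(loss='mse', optimizer='rmsprop')
modelM1.fit([input_1, input_2], yM1)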

This suggestion makes you train only a single part of the model. You can repeat the procedure for M2, locking the layers you need before compiling.

You can also define a full model taking all layers, and lock only the ones you want. But you won't be able (I think) to make half the gradients pass through one side and half through the other.

So I suggest you keep three models, the fullModel, the modelM1, and the modelM2, and you cycle them in training. One epoch each, maybe....

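A minimal sketch of that alternating schedule (the cycle count is a placeholder, and remember that changes to trainable flags only take effect after re-compiling):

for cycle in range(10):
    fullModel.fit([input_1, input_2], [yM1, yM2, y_true], epochs=1)
    modelM1.fit([input_1, input_2], yM1, epochs=1)
    modelM2.fit([input_1, input_2], yM2, epochs=1)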

That should be tested....

