I want to transfer hidden states from multiple LSTM networks into new network. Does it make sense to have the hidden states scaled or standardized before I input them into the new network?
1 Solution
#1
It is quite possible to ruin the network this way. Take a look at the LSTM equations below:
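In the standard formulation, with [h_{t-1}, x_t] denoting the concatenation of the previous hidden state and the current input:

i_t = σ(W_i · [h_{t-1}, x_t] + b_i)        (input gate)
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)        (forget gate)
C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)     (candidate cell state)
C_t = f_t * C_{t-1} + i_t * C̃_t            (long-term state)
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)        (output gate)
h_t = o_t * tanh(C_t)                       (short-term / hidden state)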
Depending on the input sequence x, scaling W_i or W_C can make the corresponding biases dominant, which will basically form a completely new network. The same input sequence x will result in different long- and short-term states, and there's no reason to think they are better. Scaling both weights and biases is also odd, because it changes the scale of the whole linear layer.
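To see the bias-dominance effect numerically, here is a minimal, hypothetical sketch (assuming PyTorch; the tensors are random stand-ins, not weights from a trained network) of how scaling one row of W_i pushes the input gate toward whatever the bias alone would produce:

import torch

torch.manual_seed(0)
hx = torch.randn(8)        # stand-in for the concatenated [h_{t-1}, x_t]
w_row = torch.randn(8)     # one row of W_i, for illustration only
b = torch.tensor(1.5)      # the corresponding bias b_i

for scale in (1.0, 0.5, 0.1):
    gate = torch.sigmoid(scale * w_row @ hx + b)
    print(f"scale={scale}: input gate = {gate.item():.3f}")

# As the weights are scaled down, the pre-activation collapses toward b,
# so the gate value is dictated by the bias rather than the input sequence.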
If you are interested in weight regularization, it's better to incorporate it into the original network, rather than patch the trained model.
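As a minimal sketch of that (assuming PyTorch; SeqModel and its sizes are hypothetical stand-ins for the original network), L2 regularization can be applied through the optimizer's weight_decay while the original network trains, instead of rescaling W_i or W_C afterwards:

import torch
import torch.nn as nn

class SeqModel(nn.Module):
    """Hypothetical single-layer LSTM model standing in for the original network."""
    def __init__(self, input_size=16, hidden_size=32, num_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        _, (h_n, _) = self.lstm(x)     # h_n: final short-term (hidden) state
        return self.head(h_n[-1])

model = SeqModel()

# L2 weight regularization is applied while the original network trains,
# so all gates keep a consistent scale; nothing is patched after training.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)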