I want to transfer hidden states from multiple LSTM networks into new network. Does it make sense to have the hidden states scaled or standardized before I input them into the new network?
1 Solution
#1
It is quite possible to ruin the network this way. Take a look at the LSTM equations below:
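In the standard formulation, with [h_{t-1}, x_t] denoting the concatenation of the previous hidden state and the current input:

i_t = σ(W_i · [h_{t-1}, x_t] + b_i)        (input gate)
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)        (forget gate)
C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)     (candidate cell state)
C_t = f_t * C_{t-1} + i_t * C̃_t            (long-term state)
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)        (output gate)
h_t = o_t * tanh(C_t)                       (short-term / hidden state)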
Depending on the input sequence x, scaling W_i or W_C can make the corresponding biases dominant, which will basically form a completely new network. The same input sequence x will result in different long- and short-term states, and there's no reason to think they are better. Scaling both weights and biases is also odd, because it changes the scale of the whole linear layer.
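To see the bias-dominance effect numerically, here is a minimal, hypothetical sketch (assuming PyTorch; the tensors are random stand-ins, not weights from a trained network) of how scaling one row of W_i pushes the input gate toward whatever the bias alone would produce:

import torch

torch.manual_seed(0)
hx = torch.randn(8)        # stand-in for the concatenated [h_{t-1}, x_t]
w_row = torch.randn(8)     # one row of W_i, for illustration only
b = torch.tensor(1.5)      # the corresponding bias b_i

for scale in (1.0, 0.5, 0.1):
    gate = torch.sigmoid(scale * w_row @ hx + b)
    print(f"scale={scale}: input gate = {gate.item():.3f}")

# As the weights are scaled down, the pre-activation collapses toward b,
# so the gate value is dictated by the bias rather than the input sequence.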
If you are interested in weight regularization, it's better to incorporate it into the original network, rather than patch the trained model.
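As a minimal sketch of that (assuming PyTorch; SeqModel and its sizes are hypothetical stand-ins for the original network), L2 regularization can be applied through the optimizer's weight_decay while the original network trains, instead of rescaling W_i or W_C afterwards:

import torch
import torch.nn as nn

class SeqModel(nn.Module):
    """Hypothetical single-layer LSTM model standing in for the original network."""
    def __init__(self, input_size=16, hidden_size=32, num_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        _, (h_n, _) = self.lstm(x)     # h_n: final short-term (hidden) state
        return self.head(h_n[-1])

model = SeqModel()

# L2 weight regularization is applied while the original network trains,
# so all gates keep a consistent scale; nothing is patched after training.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)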