I have an issue in my code: I would like to share weights in my lstm_decoder (so essentially just use one LSTM). I know there are a few resources online about this, but I still don't understand why the following does not share weights:
initial_input = tf.unstack(tf.zeros(shape=(1,1,hidden_size2)))
for index in range(window_size):
    with tf.variable_scope('lstm_cell_decoder', reuse = index > 0):
        rnn_decoder_cell = tf.nn.rnn_cell.LSTMCell(hidden_size, state_is_tuple = True)
        output_decoder, state_decoder = tf.nn.static_rnn(rnn_decoder_cell, initial_input, initial_state=last_encoder_state, dtype=tf.float32)
        # Compute the score for source output vector
        scores = tf.matmul(concat_lstm_outputs, tf.reshape(output_decoder[-1],(hidden_size,1)))
        attention_coef = tf.nn.softmax(scores)
        context_vector = tf.reduce_sum(tf.multiply(concat_lstm_outputs, tf.reshape(attention_coef, (window_size, 1))),0)
        context_vector = tf.reshape(context_vector, (1,hidden_size))
        # compute the tilde hidden state \tilde{h}_t=tanh(W[c_t, h_t]+b_t)
        concat_context = tf.concat([context_vector, output_decoder[-1]], axis = 1)
        W_tilde = tf.Variable(tf.random_normal(shape = [hidden_size*2, hidden_size2], stddev = 0.1), name = "weights_tilde", trainable = True)
        b_tilde = tf.Variable(tf.zeros([1, hidden_size2]), name="bias_tilde", trainable = True)
        hidden_tilde = tf.nn.tanh(tf.matmul(concat_context, W_tilde)+b_tilde) # hidden_tilde is [1*64]
        # update for next time step
        initial_input = tf.unstack(tf.reshape(hidden_tilde, (1,1,hidden_size2)))
        last_encoder_state = state_decoder
        print(initial_input, last_encoder_state)
        # predict the target
        W_target = tf.Variable(tf.random_normal(shape = [hidden_size2, 1], stddev = 0.1), name = "weights_target", trainable = True)
        print(W_target)
        logit = tf.matmul(hidden_tilde, W_target)
        logits = tf.concat([logits, logit], axis = 0)
logits = logits[1:]
I would like to use the same LSTM cell and the same W_target for each loop iteration. However, with window_size = 2, print(initial_input, last_encoder_state) and print(W_target) inside the loop give the following output:
[<tf.Tensor 'lstm_cell_decoder/unstack:0' shape=(1, 64) dtype=float32>]
LSTMStateTuple(c=<tf.Tensor
'lstm_cell_decoder/rnn/rnn/lstm_cell/lstm_cell/add_1:0' shape=(1, 64)
dtype=float32>, h=<tf.Tensor
'lstm_cell_decoder/rnn/rnn/lstm_cell/lstm_cell/mul_2:0' shape=(1, 64)
dtype=float32>)
<tf.Variable 'lstm_cell_decoder/weights_target:0' shape=(64, 1)
dtype=float32_ref>
[<tf.Tensor 'lstm_cell_decoder_1/unstack:0' shape=(1, 64) dtype=float32>]
LSTMStateTuple(c=<tf.Tensor
'lstm_cell_decoder_1/rnn/rnn/lstm_cell/lstm_cell/add_1:0' shape=(1, 64)
dtype=float32>, h=<tf.Tensor
'lstm_cell_decoder_1/rnn/rnn/lstm_cell/lstm_cell/mul_2:0' shape=(1, 64)
dtype=float32>)
<tf.Variable 'lstm_cell_decoder_1/weights_target:0' shape=(64, 1)
dtype=float32_ref>
Update: After Maxim's comments, I tried the following syntax:
for index in range(window_size):
    with tf.variable_scope('lstm_cell_decoder', reuse = index > 0):
        rnn_decoder_cell = tf.nn.rnn_cell.LSTMCell(hidden_size,reuse=index > 0)
        output_decoder, state_decoder = tf.nn.static_rnn(rnn_decoder_cell, ...)
        W_target = tf.get_variable(...)
It now shares the variable W_target properly, but there is still an issue with sharing the LSTM cell/weights:
[<tf.Tensor 'lstm_cell_decoder/rnn/rnn/lstm_cell/lstm_cell/mul_2:0' shape=(1,
64) dtype=float32>]
LSTMStateTuple(c=<tf.Tensor
'lstm_cell_decoder/rnn/rnn/lstm_cell/lstm_cell/add_1:0' shape=(1, 64)
dtype=float32>, h=<tf.Tensor
'lstm_cell_decoder/rnn/rnn/lstm_cell/lstm_cell/mul_2:0' shape=(1, 64)
dtype=float32>)
<tf.Variable 'lstm_cell_decoder/weights_target:0' shape=(64, 1)
dtype=float32_ref>
[<tf.Tensor 'lstm_cell_decoder_1/rnn/rnn/lstm_cell/lstm_cell/mul_2:0'
shape=(1, 64) dtype=float32>]
LSTMStateTuple(c=<tf.Tensor
'lstm_cell_decoder_1/rnn/rnn/lstm_cell/lstm_cell/add_1:0' shape=(1, 64)
dtype=float32>, h=<tf.Tensor
'lstm_cell_decoder_1/rnn/rnn/lstm_cell/lstm_cell/mul_2:0' shape=(1, 64)
dtype=float32>)
<tf.Variable 'lstm_cell_decoder/weights_target:0' shape=(64, 1)
dtype=float32_ref>
1 Answer
First off, creating a variable with tf.Variable won't make it reusable. That's one of the key differences between tf.Variable and tf.get_variable. See this example:
with tf.variable_scope('foo', reuse=tf.AUTO_REUSE):
    for i in range(3):
        x = tf.Variable(0.0, name='x')
        y = tf.get_variable(name='y', shape=())
If you inspect the created variables, you'll see:
<tf.Variable 'foo/x:0' shape=() dtype=float32_ref>
<tf.Variable 'foo/y:0' shape=() dtype=float32_ref>
<tf.Variable 'foo/x_1:0' shape=() dtype=float32_ref>
<tf.Variable 'foo/x_2:0' shape=() dtype=float32_ref>
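To make the naming behaviour above concrete, here is a minimal pure-Python sketch of the bookkeeping. It is a toy model, not TensorFlow code: ToyVariableStore and its two methods are hypothetical names invented for this illustration, and the ":0" output suffix that TensorFlow appends is left out. It assumes only what the listing above shows, namely that tf.Variable always creates a new variable under a uniquified name, while tf.get_variable under AUTO_REUSE returns the variable already registered under that name.

```python
class ToyVariableStore:
    """Toy stand-in for a TF1 variable scope. NOT TensorFlow code."""

    def __init__(self, scope):
        self.scope = scope
        self.variables = {}  # full name -> value, kept in creation order

    def variable(self, name, value=0.0):
        """Mimics tf.Variable: always creates, uniquifying the name if taken."""
        full = f"{self.scope}/{name}"
        candidate, suffix = full, 0
        while candidate in self.variables:
            suffix += 1
            candidate = f"{full}_{suffix}"
        self.variables[candidate] = value
        return candidate

    def get_variable(self, name, value=0.0):
        """Mimics tf.get_variable with AUTO_REUSE: create once, then reuse."""
        full = f"{self.scope}/{name}"
        self.variables.setdefault(full, value)
        return full


store = ToyVariableStore("foo")
for i in range(3):
    x = store.variable("x")      # new entry on every iteration
    y = store.get_variable("y")  # same entry on every iteration

created = list(store.variables)
print(created)
```

In this model, three iterations leave three distinct x entries but only one y entry, which mirrors the four variables listed above.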
Next, RNN cells provide their own mechanism for reuse. E.g., for tf.nn.rnn_cell.LSTMCell it's the reuse constructor argument:
reuse = tf.AUTO_REUSE # Try also True and False
cell1 = tf.nn.rnn_cell.LSTMCell(3, reuse=reuse)
cell2 = tf.nn.rnn_cell.LSTMCell(3, reuse=reuse)
outputs1, states1 = tf.nn.dynamic_rnn(cell1, X, dtype=tf.float32)
outputs2, states2 = tf.nn.dynamic_rnn(cell2, X, dtype=tf.float32)
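Applied to the loop in the question, the goal of "just use one LSTM" comes down to having a single set of cell parameters that every iteration goes through. The sketch below shows that idea in plain Python with a scalar recurrent cell. ToyRecurrentCell is a hypothetical stand-in, not an LSTM and not a TensorFlow API, and the starting values are arbitrary. The assumption, to be checked against your TensorFlow version, is that the analogous change in the real code is to construct the LSTMCell once before the for loop instead of inside it.

```python
import math
import random


class ToyRecurrentCell:
    """Scalar recurrent cell; a hypothetical stand-in for an LSTM cell."""

    instances_created = 0

    def __init__(self, seed=0):
        rng = random.Random(seed)
        # Parameters are created exactly once, when the cell is constructed.
        self.w_x = rng.uniform(-0.1, 0.1)
        self.w_h = rng.uniform(-0.1, 0.1)
        self.b = 0.0
        ToyRecurrentCell.instances_created += 1

    def __call__(self, x, h):
        # One recurrent step: h_new = tanh(w_x * x + w_h * h + b)
        return math.tanh(self.w_x * x + self.w_h * h + self.b)


window_size = 2
cell = ToyRecurrentCell()  # built once, outside the loop

x, h = 0.0, 0.5  # arbitrary initial input and state
for index in range(window_size):
    h = cell(x, h)  # the same parameters are used at every step
    x = h           # feed the output back in as the next input

print(ToyRecurrentCell.instances_created, h)
```

When checking sharing in the real graph, it is more reliable to list the variables (for example with tf.trainable_variables()) than to read tensor names. The question's own output shows op names picking up a new lstm_cell_decoder_1 prefix on the second iteration even for the step where W_target is reported as shared under lstm_cell_decoder/weights_target.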