TensorFlow dynamic_rnn propagates NaNs with batch size greater than 1

Hoping someone can help me understand an issue I have been having using LSTMs with dynamic_rnn in TensorFlow. As per this MWE, when I have a batch size of 1 with sequences that are incomplete (I pad the short tensors with NaNs rather than zeros to highlight the problem), everything operates as normal: the NaNs in the short sequences are ignored as expected.

import tensorflow as tf
import numpy as np

batch_1 = np.random.randn(1, 10, 8)
batch_2 = np.random.randn(1, 10, 8)

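# Intended: make the single sample length 6 by padding time steps 6..9 with NaNs.
# Note: batch_1[6:] slices the batch axis (size 1), so it selects nothing;
# batch_1[0, 6:] would index the time steps instead.
batch_1[6:] = np.nan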

seq_lengths_batch_1 = [6]
seq_lengths_batch_2 = [10]

tf.reset_default_graph()

input_vals = tf.placeholder(shape=[1, 10, 8], dtype=tf.float32)
lengths = tf.placeholder(shape=[1], dtype=tf.int32)

cell = tf.nn.rnn_cell.LSTMCell(num_units=5)
outputs, states = tf.nn.dynamic_rnn(cell=cell, dtype=tf.float32, sequence_length=lengths, inputs=input_vals)
last_relevant_value = states.h
fake_loss = tf.reduce_mean(last_relevant_value)
optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(fake_loss)

sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
_, fl, lrv = sess.run([optimizer, fake_loss, last_relevant_value], feed_dict={input_vals: batch_1, lengths: seq_lengths_batch_1})
print(fl, lrv)
_, fl, lrv = sess.run([optimizer, fake_loss, last_relevant_value], feed_dict={input_vals: batch_2, lengths: seq_lengths_batch_2})
print(fl, lrv)

sess.close()

which outputs properly populated values like these:

0.00659429 [[ 0.11608966  0.08498846 -0.02892204 -0.01945034 -0.1197343 ]]
-0.080244 [[-0.03018401 -0.18946587 -0.19128899 -0.10388547  0.11360413]]
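A note on the single-sample snippet: because `batch_1` has shape `(1, 10, 8)`, the assignment `batch_1[6:] = np.nan` slices the first (batch) axis, which only has index 0, so it selects nothing and the fed batch contains no NaNs at all. That may be part of why the batch-size-1 run looks clean. The NumPy check below contrasts that slice with `batch_1[0, 6:]`, which targets time steps 6 to 9 of the single sample:

```python
import numpy as np

batch_1 = np.random.randn(1, 10, 8)   # (batch, time, features), as in the question

# Slicing the batch axis: index 6 onwards does not exist, so this is an empty view.
print(batch_1[6:].shape)              # (0, 10, 8)
batch_1[6:] = np.nan                  # assigns to nothing
print(np.isnan(batch_1).any())        # False

# Indexing the time axis of sample 0 pads steps 6..9 as intended.
batch_1[0, 6:] = np.nan
print(np.isnan(batch_1).sum())        # 32 (4 time steps x 8 features)
```

It would be worth rerunning the batch-size-1 experiment with `batch_1[0, 6:] = np.nan` before concluding that batch size is what matters.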

However, when I increase my batch size to 3, for example, the first batch executes correctly, but then somehow the second batch causes NaNs to start propagating:

import tensorflow as tf
import numpy as np

batch_1 = np.random.randn(3, 10, 8)
batch_2 = np.random.randn(3, 10, 8)

batch_1[1, 6:] = np.nan 
batch_2[0, 8:] = np.nan 

seq_lengths_batch_1 = [10, 6, 10]
seq_lengths_batch_2 = [8, 10, 10]

tf.reset_default_graph()

input_vals = tf.placeholder(shape=[3, 10, 8], dtype=tf.float32)
lengths = tf.placeholder(shape=[3], dtype=tf.int32)

cell = tf.nn.rnn_cell.LSTMCell(num_units=5)
outputs, states = tf.nn.dynamic_rnn(cell=cell, dtype=tf.float32, sequence_length=lengths, inputs=input_vals)
last_relevant_value = states.h
fake_loss = tf.reduce_mean(last_relevant_value)
optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(fake_loss)

sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
_, fl, lrv = sess.run([optimizer, fake_loss, last_relevant_value], feed_dict={input_vals: batch_1, lengths: seq_lengths_batch_1})
print(fl, lrv)
_, fl, lrv = sess.run([optimizer, fake_loss, last_relevant_value], feed_dict={input_vals: batch_2, lengths: seq_lengths_batch_2})
print(fl, lrv)

sess.close()

giving

0.0533635 [[ 0.33622459 -0.0284576   0.11914439  0.14402215 -0.20783389]
 [ 0.20805927  0.17591488 -0.24977767 -0.03432769  0.2944448 ]
 [-0.04508523  0.11878576  0.07287208  0.14114542 -0.24467923]]
nan [[ nan  nan  nan  nan  nan]
 [ nan  nan  nan  nan  nan]
 [ nan  nan  nan  nan  nan]]

I find this behavior quite strange, as I expected all values after the sequence lengths to be ignored, as happens with a batch size of 1, but this doesn't work with a batch size of 2 or more.

Obviously, NaNs do not get propagated if I use 0 as my padding value, but that doesn't give me any confidence that dynamic_rnn is functioning as I expect it to.

I should also mention that if I remove the optimisation step, the issue doesn't occur, so now I'm thoroughly confused. After a day of trying many different permutations, I can't see why batch size would make any difference here.

1 answer

#1

I did not trace it down to the exact operation, but here is what I believe is happening.

Why aren't values beyond sequence_length ignored? They are ignored in the sense that they are multiplied by 0 (masked out) in some operations. Mathematically the result is always zero, so they should have no effect. Unfortunately, nan * 0 = nan, so if your examples contain NaN values, they propagate. You might wonder why TensorFlow does not ignore them completely instead of just masking them. The reason is performance on modern hardware: it is much easier to operate on one large, regular shape padded with zeros than on several small shapes (which you would get by decomposing an irregular shape).
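A minimal NumPy illustration of the arithmetic described above (not of TensorFlow's internals, which the answer does not pin down): multiplying by a 0/1 mask leaves NaNs in place, while selecting with `np.where` replaces them in the result.

```python
import numpy as np

# IEEE 754: arithmetic involving NaN gives NaN, including multiplication by zero.
print(np.nan * 0.0)                             # nan

values = np.array([0.5, -1.2, np.nan, np.nan])  # last two time steps are padding
mask = np.array([1.0, 1.0, 0.0, 0.0])           # 1 = valid step, 0 = padding

# Masking by multiplication: the padded NaNs survive (nan * 0 = nan).
by_multiply = values * mask

# Masking by selection: padded entries are replaced, so the NaNs are gone.
by_select = np.where(mask > 0, values, 0.0)

print(by_multiply)
print(by_select)
```

Selection only keeps NaNs out of the selected value itself; any other computation that still multiplies a NaN input by a zero (for example, a masked gradient) produces NaN again.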

Why does it only happen on the second batch? In the first batch, the loss and the last hidden state are computed using the original variable values, so they are fine. But because you also run the optimizer update in the same sess.run(), the variables get updated and become NaN during the first call. In the second call, the NaNs spread from the variables to the loss and hidden state. This is also why removing the optimisation step makes the problem disappear: without the update, the variables never become NaN.
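Since the exact op is not identified, here is only a hypothetical NumPy toy model of the mechanism: a single linear step `x @ W` whose forward output selects away an all-NaN "padding" sample, followed by one plain gradient-descent update standing in for Adam. The first loss is finite, but the masked gradient still multiplies the NaN inputs by zero, so `W` becomes NaN and the next, clean batch yields a NaN loss.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, W, valid):
    """Toy single step out = x @ W; rows of invalid (padding) samples are selected away."""
    pre = x @ W                                   # a NaN input row gives a NaN row here
    return np.where(valid[:, None], pre, 0.0)     # ...but it is dropped from the output

def grad_W(x, W, valid):
    """Gradient of mean(forward(x, W, valid)) with respect to W."""
    g_out = np.full((x.shape[0], W.shape[1]), 1.0 / (x.shape[0] * W.shape[1]))
    g_pre = np.where(valid[:, None], g_out, 0.0)  # masked: zero for invalid rows
    return x.T @ g_pre                            # NaN input * 0 gradient = NaN

W = rng.standard_normal((4, 3))

batch_1 = rng.standard_normal((3, 4))
batch_1[1] = np.nan                               # sample 1 is entirely "padding"
valid_1 = np.array([True, False, True])

loss_1 = forward(batch_1, W, valid_1).mean()
W = W - 0.1 * grad_W(batch_1, W, valid_1)         # plain gradient step standing in for Adam

batch_2 = rng.standard_normal((3, 4))             # clean data, no padding
loss_2 = forward(batch_2, W, np.array([True, True, True])).mean()

print("loss 1:", loss_1)                          # finite
print("W contains NaN:", np.isnan(W).any())
print("loss 2:", loss_2)                          # nan
```

Skipping the `W = W - ...` update leaves `W` and the second loss finite, mirroring the observation that removing the optimisation step avoids the problem.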

How can I be confident that the values beyond sequence_length are really masked out? I modified your example so that it still reproduces the issue but is also deterministic.

import tensorflow as tf
import numpy as np

batch_1 = np.ones((3, 10, 2))

batch_1[1, 7:] = np.nan

seq_lengths_batch_1 = [10, 7, 10]

tf.reset_default_graph()

input_vals = tf.placeholder(shape=[3, 10, 2], dtype=tf.float32)
lengths = tf.placeholder(shape=[3], dtype=tf.int32)

cell = tf.nn.rnn_cell.LSTMCell(num_units=3, initializer=tf.constant_initializer(1.0))
init_state = tf.nn.rnn_cell.LSTMStateTuple(*[tf.ones([3, c]) for c in cell.state_size])
outputs, states = tf.nn.dynamic_rnn(cell=cell, dtype=tf.float32, sequence_length=lengths, inputs=input_vals,
        initial_state=init_state)
last_relevant_value = states.h
fake_loss = tf.reduce_mean(last_relevant_value)
optimizer = tf.train.AdamOptimizer(learning_rate=0.1).minimize(fake_loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1):
        _, fl, lrv = sess.run([optimizer, fake_loss, last_relevant_value],
                feed_dict={input_vals: batch_1, lengths: seq_lengths_batch_1})
        print("VARIABLES:", sess.run(tf.trainable_variables()))
        print("LOSS and LAST HIDDEN:", fl, lrv)

If you replace the np.nan in batch_1[1, 7:] = np.nan with any number (e.g. try -1M, 1M, or 0), you will see that the values you get are the same. You can also run the loop for more iterations. As a further sanity check, if you set seq_lengths_batch_1 to something "wrong", e.g. [10, 8, 10], you can see that the value you use in batch_1[1, 7:] = np.nan now affects the output.
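The same sanity check can be mimicked outside TensorFlow. The sketch below is a hypothetical toy tanh RNN in NumPy (not dynamic_rnn's actual implementation) that copies the previous state through with `np.where` once a sample's length is reached. With lengths `[10, 7, 10]`, the final states match whether sample 1 is padded with 0, 1e6, -1e6 or NaN; with the "wrong" lengths `[10, 8, 10]`, the padded step at t = 7 is consumed and the pad value changes the result.

```python
import numpy as np

def final_state(inputs, lengths, Wx, Wh):
    """Toy tanh RNN over (batch, time, features); returns the last valid state per sample.

    Once t reaches a sample's length, its previous state is copied through unchanged,
    a hypothetical stand-in for how sequence_length is honoured.
    """
    batch, steps, _ = inputs.shape
    h = np.zeros((batch, Wh.shape[0]))
    for t in range(steps):
        candidate = np.tanh(inputs[:, t] @ Wx + h @ Wh)
        still_running = (t < lengths)[:, None]
        h = np.where(still_running, candidate, h)
    return h

def make_batch(pad):
    x = np.ones((3, 10, 2))        # deterministic inputs, as in the answer
    x[1, 7:] = pad                 # pad sample 1 from time step 7 onwards
    return x

rng = np.random.default_rng(1)
Wx = 0.5 * rng.standard_normal((2, 4))
Wh = 0.5 * rng.standard_normal((4, 4))

lengths = np.array([10, 7, 10])
reference = final_state(make_batch(0.0), lengths, Wx, Wh)
same = [bool(np.allclose(final_state(make_batch(p), lengths, Wx, Wh), reference))
        for p in (1e6, -1e6, np.nan)]
print(same)      # expected: [True, True, True]

wrong_lengths = np.array([10, 8, 10])
differs = not np.allclose(final_state(make_batch(0.0), wrong_lengths, Wx, Wh),
                          final_state(make_batch(1e6), wrong_lengths, Wx, Wh))
print(differs)   # expected: True
```

Note that this only checks the forward pass; as discussed above, a NaN pad can still reach the weights through the gradients.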

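Given that padded values should not influence the result when sequence_length is correct, the practical fix is to pad with a finite value such as 0, which the question already found avoids the NaNs. Below is a hypothetical helper, `pad_batch` (not part of TensorFlow or NumPy), that stacks variable-length `(time, features)` arrays into a zero-padded `(batch, time, features)` array and returns the lengths to feed as `sequence_length`:

```python
import numpy as np

def pad_batch(seqs, max_len=None, pad_value=0.0):
    """Stack (time, features) arrays into a (batch, max_len, features) float32 array.

    Returns the padded batch and an int32 array of lengths to feed as sequence_length.
    """
    if not np.isfinite(pad_value):
        raise ValueError("pad_value must be finite; NaN/inf padding can leak into gradients")
    lengths = np.array([len(s) for s in seqs], dtype=np.int32)
    if max_len is None:
        max_len = int(lengths.max())
    lengths = np.minimum(lengths, max_len)       # longer sequences are truncated
    n_features = seqs[0].shape[1]
    batch = np.full((len(seqs), max_len, n_features), pad_value, dtype=np.float32)
    for i, (s, n) in enumerate(zip(seqs, lengths)):
        batch[i, :n] = s[:n]
    return batch, lengths

# Usage: recreate the layout of the question's batch_1 (lengths 10, 6, 10) with zero padding.
rng = np.random.default_rng(0)
seqs = [rng.standard_normal((n, 8)) for n in (10, 6, 10)]
batch, lengths = pad_batch(seqs)
print(batch.shape, lengths)   # shape (3, 10, 8); lengths 10, 6, 10
```

If a batch is already NaN-padded, `np.nan_to_num(batch, nan=0.0)` (NumPy 1.17 or later) replaces the NaNs before feeding.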
