Multi-feature sequence padding and masking in an RNN with Keras

Time: 2022-11-25 14:13:39

I have constructed an LSTM architecture using Keras, but I am not certain whether duplicating time steps is a good approach to deal with variable sequence lengths.

I have a multidimensional data set with multi-feature sequences of varying length. It is multivariate time-series data with multiple examples to train the LSTM on, and Y is either 0 or 1. Currently, I am duplicating the last time step of each sequence to ensure timesteps = 3.

I would appreciate it if someone could answer the following questions or concerns:

  1. Is creating additional time steps whose feature values are all zeros more suitable?
  2. What is the right way to frame this problem, pad the sequences, and mask them for evaluation?
  3. I am duplicating the last time step in the Y variable as well for prediction; the value 1 in Y only appears at the last time step, if at all.

import numpy as np

# The input sequences are
trainX = np.array([
        # Datapoint 1
        [
            # Input features at timestep 1
            [1, 2, 3],
            # Input features at timestep 2
            [5, 2, 3] #<------ duplicate this to ensure compliance
        ],
        # Datapoint 2
        [
            # Features at timestep 1
            [1, 8, 9],
            # Features at timestep 2
            [9, 8, 9],
            # Features at timestep 3
            [7, 6, 1]
        ]
    ])

# The desired model output is as follows:
trainY = np.array([
        # Datapoint 1
        [
            # Target class at timestep 1
            [0],
            # Target class at timestep 2
            [1] #<---------- duplicate this to ensure compliance
        ],
        # Datapoint 2
        [
            # Target class at timestep 1
            [0],
            # Target class at timestep 2
            [0],
            # Target class at timestep 3
            [0]
        ]
    ])

from keras.models import Sequential
from keras.layers import LSTM, Dropout, Flatten, Dense
from keras.constraints import maxnorm

timesteps = 3
model = Sequential()
model.add(LSTM(3, kernel_initializer='uniform', return_sequences=True,
               batch_input_shape=(None, timesteps, trainX.shape[2]),
               kernel_constraint=maxnorm(3), name='LSTM'))
model.add(Dropout(0.2))
model.add(LSTM(3, return_sequences=True, kernel_constraint=maxnorm(3), name='LSTM-2'))
model.add(Flatten(name='Flatten'))
model.add(Dense(timesteps, activation='sigmoid', name='Dense'))
model.compile(loss="mse", optimizer="sgd", metrics=["mse"])
model.fit(trainX, trainY, epochs=2000, batch_size=2)
predY = model.predict(testX)
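
To make question 1 concrete: the zero-padding alternative would extend the shorter sample with an all-zero timestep instead of repeating the last real one, roughly like this (illustrative only):

# Datapoint 1 padded with zeros to timesteps = 3, instead of duplicating [5, 2, 3]
padded_datapoint_1 = [
    [1, 2, 3],
    [5, 2, 3],
    [0, 0, 0]   # padding timestep, to be masked out later
]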

1 solution

#1


In my opinion, there are two solutions to your problem (duplicating time steps is not one of them):

  1. Use the pad_sequences utility in combination with a Masking layer. This is the common approach: thanks to the padding, every sample has the same number of timesteps. The good thing about this method is that it is very easy to implement, and the Masking layer will give you a small performance boost. The downside: if you train on a GPU, CuDNNLSTM is the layer to go for, since it is highly optimized for the GPU and therefore a lot faster, but it does not work with a Masking layer, and if your dataset has a wide range of timesteps you lose performance. (A minimal sketch of this option follows the list.)

  2. Set the timesteps dimension of your input shape to None and write a Keras generator that groups your batches by sequence length (I think you will also have to use the functional API). Now you can use CuDNNLSTM, and every sample is computed with only its relevant timesteps (instead of padded ones), which is much more efficient. (A generator sketch appears after the closing note below.)

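A minimal sketch of option 1, assuming the variable-length data from the question and standalone Keras 2.x; the layer sizes and the binary_crossentropy loss are illustrative choices:

import numpy as np
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Masking, LSTM, TimeDistributed, Dense

# Variable-length samples: 2 and 3 timesteps, 3 features each
sequences = [
    [[1, 2, 3], [5, 2, 3]],
    [[1, 8, 9], [9, 8, 9], [7, 6, 1]]
]
targets = [
    [[0], [1]],
    [[0], [0], [0]]
]

# Zero-pad every sample to the longest sequence (post-padding)
X = pad_sequences(sequences, maxlen=3, padding='post', dtype='float32')
Y = pad_sequences(targets, maxlen=3, padding='post', dtype='float32')

model = Sequential()
# mask_value must match the padding value; timesteps whose features are all
# zero are skipped by the LSTM and excluded from the loss
model.add(Masking(mask_value=0.0, input_shape=(3, 3)))
model.add(LSTM(8, return_sequences=True))
# one sigmoid output per timestep
model.add(TimeDistributed(Dense(1, activation='sigmoid')))
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(X, Y, epochs=10, batch_size=2)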

If you are new to Keras and performance is not that important, go with option 1. If you have a production environment where you often have to train the network and the cost matters, try option 2.

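A rough sketch of option 2, reusing the sequences and targets lists from the sketch above; the bucketing generator and layer sizes are illustrative, and CuDNNLSTM requires a GPU build of TensorFlow:

import numpy as np
from collections import defaultdict
from keras.models import Model
from keras.layers import Input, CuDNNLSTM, TimeDistributed, Dense

def length_bucketed_batches(sequences, targets, batch_size):
    # Group samples by their number of timesteps so every batch is
    # rectangular and needs neither padding nor a Masking layer
    buckets = defaultdict(list)
    for x, y in zip(sequences, targets):
        buckets[len(x)].append((x, y))
    while True:  # Keras expects the generator to loop indefinitely
        for items in buckets.values():
            for i in range(0, len(items), batch_size):
                chunk = items[i:i + batch_size]
                yield (np.array([x for x, _ in chunk], dtype='float32'),
                       np.array([y for _, y in chunk], dtype='float32'))

# The timesteps dimension is None, so each batch may have a different length
inputs = Input(shape=(None, 3))
hidden = CuDNNLSTM(8, return_sequences=True)(inputs)
outputs = TimeDistributed(Dense(1, activation='sigmoid'))(hidden)
model = Model(inputs, outputs)
model.compile(loss='binary_crossentropy', optimizer='adam')

generator = length_bucketed_batches(sequences, targets, batch_size=2)
model.fit_generator(generator, steps_per_epoch=2, epochs=10)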
