I'm trying to train an LSTM for a binary classification problem. When I plot the loss
curve after training, there are strange peaks in it. Here are some examples:
Here is the basic code
from numpy import newaxis
from keras.models import Sequential
from keras.layers import recurrent, Dropout, Dense, Activation
from keras.callbacks import EarlyStopping, ModelCheckpoint
import matplotlib.pyplot as plt

# Two stacked LSTM layers with dropout, then a single sigmoid unit for binary classification
model = Sequential()
model.add(recurrent.LSTM(128, input_shape=(columnCount, 1), return_sequences=True))
model.add(Dropout(0.5))
model.add(recurrent.LSTM(128, return_sequences=False))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Add a trailing feature dimension so the input has shape (samples, timesteps, 1)
new_train = X_train[..., newaxis]
history = model.fit(new_train, y_train, nb_epoch=500, batch_size=100,
                    callbacks=[EarlyStopping(monitor='val_loss', min_delta=0.0001, patience=2, verbose=0, mode='auto'),
                               ModelCheckpoint(filepath="model.h5", verbose=0, save_best_only=True)],
                    validation_split=0.1)
# list all data in history
print(history.history.keys())
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
I don't understand why these peaks occur. Any ideas?
2 Answers
#1
4
There are many possible reasons why something like this occurs:
- Your parameter trajectory changed its basin of attraction - this means your system left one stable trajectory and switched to another. This was probably caused by randomization such as batch sampling or dropout.
- LSTM instability - LSTMs are believed to be quite unstable to train, and it has often been reported that they take a long time to stabilize.
Based on recent research (e.g. from here), I would recommend decreasing the batch size and training for more epochs. I would also check whether the topology of the network is too complex (or too simple) for the number of patterns it needs to learn. I would also try switching to either GRU or SimpleRNN.
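As a rough illustration of that suggestion, here is a minimal sketch that swaps the LSTM layers for GRU and uses a smaller batch size. The layer sizes and variable names (columnCount, new_train, y_train) are taken from the question; the batch size and epoch count below are just placeholder values:

from keras.models import Sequential
from keras.layers import recurrent, Dropout, Dense, Activation

# Same architecture as in the question, but with GRU cells instead of LSTM
model = Sequential()
model.add(recurrent.GRU(128, input_shape=(columnCount, 1), return_sequences=True))
model.add(Dropout(0.5))
model.add(recurrent.GRU(128, return_sequences=False))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Smaller batches, more epochs, as suggested above (placeholder values)
history = model.fit(new_train, y_train, nb_epoch=1000, batch_size=32,
                    validation_split=0.1)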
#2
0
This question is old, but I've seen this happen before when restarting training from a checkpoint. If the spikes correspond to breaks in training, you may be inadvertently resetting some of the weights.
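If that is what is happening, a minimal sketch of a safer way to resume, assuming the "model.h5" file written by the ModelCheckpoint callback in the question is available, is to reload the saved model rather than rebuilding and recompiling it with fresh weights:

from keras.models import load_model

# Restore the checkpointed model (architecture, weights and optimizer state)
# instead of rebuilding it with freshly initialized weights
model = load_model("model.h5")

# Continue training from the restored state (epoch count here is a placeholder)
history = model.fit(new_train, y_train, nb_epoch=100, batch_size=100,
                    validation_split=0.1)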