How to configure a very simple LSTM with Keras/Theano for regression

Time: 2022-08-08 13:58:14

I am struggling to configure a Keras LSTM for a simple regression task. There is some very basic explanation at the official page: Keras RNN documentation

But to understand it fully, example configurations with example data would be extremely helpful.

I have barely found examples for regression with Keras LSTM. Most examples are about classification (text or images). I've studied the LSTM examples that come with the Keras distribution and one example I found through a Google search: http://danielhnyk.cz/ It offers some insight, though the author admits the approach is quite memory-inefficient, since data samples have to be stored very redundantly.

Although an improvement was introduced by a commenter (Taha), the data storage is still redundant, and I doubt this is the way the Keras developers intended it.

I've downloaded some simple example sequential data, which happens to be stock data from Yahoo Finance. It is freely available from Yahoo Finance Data:

Date,       Open,      High,      Low,       Close,     Volume,   Adj Close
2016-05-18, 94.160004, 95.209999, 93.889999, 94.559998, 41923100, 94.559998
2016-05-17, 94.550003, 94.699997, 93.010002, 93.489998, 46507400, 93.489998
2016-05-16, 92.389999, 94.389999, 91.650002, 93.879997, 61140600, 93.879997
2016-05-13, 90.00,     91.669998, 90.00,     90.519997, 44188200, 90.519997

The table consists of more than 8,900 such lines of Apple stock data. There are 7 columns, i.e. data points, for each day. The value to predict would be "AdjClose", which is the value at the end of the day.

So the goal would be to predict the AdjClose for the next day, based on the sequence of the previous few days. (This is probably next to impossible, but it is always good to see how a tool behaves under challenging conditions.)

I think this should be a very standard prediction/regression case for an LSTM and easily transferable to other problem domains.

So, how should the data be formatted (X_train, y_train) for minimum redundancy, and how do I initialize the Sequential model with only one LSTM layer and a couple of hidden neurons?

Kind Regards, Theo

PS: I started coding this:

...
X_train
Out[6]: 
array([[  2.87500000e+01,   2.88750000e+01,   2.87500000e+01,
      2.87500000e+01,   1.17258400e+08,   4.31358010e-01],
   [  2.73750019e+01,   2.73750019e+01,   2.72500000e+01,
      2.72500000e+01,   4.39712000e+07,   4.08852011e-01],
   [  2.53750000e+01,   2.53750000e+01,   2.52500000e+01,
      2.52500000e+01,   2.64320000e+07,   3.78845006e-01],
   ..., 
   [  9.23899994e+01,   9.43899994e+01,   9.16500015e+01,
      9.38799973e+01,   6.11406000e+07,   9.38799973e+01],
   [  9.45500031e+01,   9.46999969e+01,   9.30100021e+01,
      9.34899979e+01,   4.65074000e+07,   9.34899979e+01],
   [  9.41600037e+01,   9.52099991e+01,   9.38899994e+01,
      9.45599976e+01,   4.19231000e+07,   9.45599976e+01]], dtype=float32)

y_train
Out[7]: 
array([  0.40885201,   0.37884501,   0.38822201, ...,  93.87999725,
   93.48999786,  94.55999756], dtype=float32)

So far, the data is ready, and no redundancy has been introduced. Now the question is how to describe a Keras LSTM model and training process for this data.

EDIT 3:

Here is the updated code with the 3D data structure required for recurrent networks (see the answer by Lorrit). It does not work, though.

EDIT 4: removed the extra comma after Activation('sigmoid') and shaped Y_train the correct way. Still the same error.

import numpy as np

from keras.models import Sequential
from keras.layers import Dense,  Activation, LSTM

nb_timesteps    =  4
nb_features     =  5
batch_size      = 32

# load file
X_train = np.genfromtxt('table.csv', 
                        delimiter=',',  
                        names=None, 
                        unpack=False,
                        dtype=None)

# delete the first row with the names
X_train = np.delete(X_train, (0), axis=0)

# invert the order of the rows, so that the oldest
# entry is in the first row and the newest entry
# comes last
X_train = np.flipud(X_train)

# the last column is our Y
Y_train = X_train[:,6].astype(np.float32)

Y_train = np.delete(Y_train, range(0,6))
Y_train = np.array(Y_train)
Y_train.shape = (len(Y_train), 1)

# we don't use the timestamps. convert the rest to Float32
X_train = X_train[:, 1:6].astype(np.float32)

# shape X_train
X_train.shape = (1,len(X_train), nb_features)


# Now comes Lorrit's code for shaping the 3D-input-data
# http://stackoverflow.com/questions/36992855/keras-how-should-i-prepare-input-data-for-rnn
flag = 0

for sample in range(X_train.shape[0]):
    tmp = np.array([X_train[sample,i:i+nb_timesteps,:] for i in range(X_train.shape[1] - nb_timesteps + 1)])

    if flag==0:
        new_input = tmp
        flag = 1

    else:
        new_input = np.concatenate((new_input,tmp))

X_train = np.delete(new_input, len(new_input) - 1, axis = 0)
X_train = np.delete(X_train, 0, axis = 0)
X_train = np.delete(X_train, 0, axis = 0)
# X successfully shaped

# free some memory
tmp = None
new_input = None


# split data for training, validation and test
# 50:25:25
X_train, X_test = np.split(X_train, 2, axis=0)
X_valid, X_test = np.split(X_test, 2, axis=0)

Y_train, Y_test = np.split(Y_train, 2, axis=0)
Y_valid, Y_test = np.split(Y_test, 2, axis=0)


print('Build model...')

model = Sequential([
    Dense(8, input_dim=nb_features),
    Activation('softmax'),
    LSTM(4, dropout_W=0.2, dropout_U=0.2),
    Dense(1),
    Activation('sigmoid')
])

model.compile(loss='mse',
              optimizer='RMSprop',
              metrics=['accuracy'])

print('Train...')
print(X_train.shape)
print(Y_train.shape)
model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=15,
          validation_data=(X_test, Y_test))
score, acc = model.evaluate(X_test, Y_test,
                            batch_size=batch_size)

print('Test score:', score)
print('Test accuracy:', acc)

There still seems to be an issue with the data. Keras says:

Using Theano backend.
Using gpu device 0: GeForce GTX 960 (CNMeM is disabled, cuDNN not available)
Build model...

Traceback (most recent call last):

  File "<ipython-input-1-3a6e9e045167>", line 1, in <module>
    runfile('C:/Users/admin/Documents/pycode/lstm/lstm5.py', wdir='C:/Users/admin/Documents/pycode/lstm')

  File "C:\Users\admin\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 699, in runfile
    execfile(filename, namespace)

  File "C:\Users\admin\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 74, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)

  File "C:/Users/admin/Documents/pycode/lstm/lstm5.py", line 79, in <module>
    Activation('sigmoid')

  File "d:\git\keras\keras\models.py", line 93, in __init__
    self.add(layer)

  File "d:\git\keras\keras\models.py", line 146, in add
    output_tensor = layer(self.outputs[0])

  File "d:\git\keras\keras\engine\topology.py", line 441, in __call__
    self.assert_input_compatibility(x)

  File "d:\git\keras\keras\engine\topology.py", line 382, in assert_input_compatibility
    str(K.ndim(x)))

Exception: Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=2

3 Answers

#1 (2 votes)

In your model definition you placed a Dense layer before the LSTM layer. As the first layer, that Dense layer declares 2D input of shape (samples, features), so its output is 2D as well, while the LSTM expects 3D input of shape (samples, timesteps, features). That is exactly what the error message says. You need to wrap the Dense layer in a TimeDistributed layer so that it is applied to every timestep.

Try changing

model = Sequential([
    Dense(8, input_dim=nb_features),
    Activation('softmax'),
    LSTM(4, dropout_W=0.2, dropout_U=0.2),
    Dense(1),
    Activation('sigmoid')
])

to

model = Sequential([
    TimeDistributed(Dense(8, input_dim=nb_features, activation='softmax')),
    LSTM(4, dropout_W=0.2, dropout_U=0.2),
    Dense(1),
    Activation('sigmoid')
])
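
Note that the suggested snippet still lacks an input specification: as the first layer of a Sequential model, the TimeDistributed wrapper has to be given the input shape explicitly (input_dim on the wrapped Dense is not enough). A minimal sketch, assuming the nb_timesteps and nb_features values from the question and the Keras 1.x keyword names used above:

from keras.models import Sequential
from keras.layers import Dense, LSTM, TimeDistributed

nb_timesteps = 4   # window length, as in the question
nb_features  = 5   # Open, High, Low, Close, Volume

model = Sequential([
    # apply the same Dense transform to each of the nb_timesteps steps;
    # as the first layer, the wrapper needs an explicit input_shape
    TimeDistributed(Dense(8, activation='softmax'),
                    input_shape=(nb_timesteps, nb_features)),
    LSTM(4, dropout_W=0.2, dropout_U=0.2),
    Dense(1)  # linear output: the regression target is not bounded to (0, 1)
])
model.compile(loss='mse', optimizer='rmsprop')

The final sigmoid from the original snippet is dropped in this sketch because it would squash every prediction into (0, 1), while AdjClose is an unscaled price; alternatively, keep the sigmoid and scale the targets into that range first.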

#2 (1 vote)

You are still missing one preprocessing step before feeding the data to the LSTM. You have to decide how many previous data samples (previous days) you want to include in the prediction of the current day's AdjClose. See my answer here on how to do that. Your data should then be 3-dimensional, with shape (nb_samples, nb_included_previous_days, features).

Then you can feed the 3D data to a standard LSTM layer with one output. You can compare this output to y_train and try to minimize the error. Remember to pick a loss function that is appropriate for regression, e.g. mean squared error; a sketch of both steps follows below.


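For concreteness, here is a minimal sketch combining the windowing with a single-LSTM regression model. The window length, layer size, and input names (features, targets) are illustrative assumptions rather than anything prescribed by the question; the fit keyword nb_epoch matches the Keras 1.x API used in the question:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, LSTM

look_back = 4  # hypothetical choice: how many previous days feed each prediction

def make_windows(features, targets, look_back):
    # features: (nb_days, nb_features) array (Open, High, Low, Close, Volume)
    # targets:  (nb_days,) array of AdjClose, oldest day first
    # Returns X of shape (nb_samples, look_back, nb_features) and, as y,
    # the AdjClose of the day immediately following each window.
    X = np.array([features[i:i + look_back]
                  for i in range(len(features) - look_back)])
    y = targets[look_back:]
    return X, y

X_seq, y_seq = make_windows(features, targets, look_back)

model = Sequential()
# one LSTM layer reads the (look_back, nb_features) windows ...
model.add(LSTM(8, input_shape=(look_back, X_seq.shape[2])))
# ... and a single linear unit produces the regression output
model.add(Dense(1))
model.compile(loss='mse', optimizer='rmsprop')  # MSE suits regression
model.fit(X_seq, y_seq, batch_size=32, nb_epoch=15)

Each window is a copy in this sketch, so some storage redundancy remains; if memory is a concern, numpy.lib.stride_tricks can expose the same (nb_samples, look_back, nb_features) windows as a view without copying.
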
#3 (0 votes)

Not sure if this is still relevant, but there is a great example of how to use LSTM networks for predicting time series on Dr. Jason Brownlee's blog here.

I prepared an example on three noisy, phase-shifted sinusoids with different amplitudes. It is not market data, but the premise is comparable: I assume you expect one stock to say something about another.

import numpy
import matplotlib.pyplot as plt
import pandas
import math
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Reshape
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
# generate sine wave
def make_sine_with_noise(_start, _stop, _step, _phase_shift, gain):
    x = numpy.arange(_start, _stop, step = _step)
    noise = numpy.random.uniform(-0.1, 0.1, size = len(x))
    y = gain*0.5*numpy.sin(x+_phase_shift)
    y = numpy.add(noise, y)
    return x, y
# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1, look_ahead=1):
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - look_ahead - 1):
        a = dataset[i:(i + look_back), :]
        dataX.append(a)
        b = dataset[(i + look_back):(i + look_back + look_ahead), :]
        dataY.append(b)
    return numpy.array(dataX), numpy.array(dataY)
# fix random seed for reproducibility
numpy.random.seed(7)
# generate sine wave
x1, y1 = make_sine_with_noise(0, 200, 1/24, 0, 1)
x2, y2 = make_sine_with_noise(0, 200, 1/24, math.pi/4, 3)
x3, y3 = make_sine_with_noise(0, 200, 1/24, math.pi/2, 20)
# plt.plot(x1, y1)
# plt.plot(x2, y2)
# plt.plot(x3, y3)
# plt.show()
#transform to pandas dataframe
dataframe = pandas.DataFrame({'y1': y1, 'y2': y2, 'y3': y3})
dataset = dataframe.values
dataset = dataset.astype('float32')
# normalize the dataset
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)
#split into train and test sets
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size,:], dataset[train_size:len(dataset),:]
# reshape into X=t and Y=t+1
look_back = 10
look_ahead = 5
trainX, trainY = create_dataset(train, look_back, look_ahead)
testX, testY = create_dataset(test, look_back, look_ahead)
print(trainX.shape)
print(trainY.shape)
# input is already [samples, time steps, features]; these reshapes are no-ops
trainX = numpy.reshape(trainX, (trainX.shape[0], trainX.shape[1], trainX.shape[2]))
testX = numpy.reshape(testX, (testX.shape[0], testX.shape[1], testX.shape[2]))
# create and fit the LSTM network
model = Sequential()
model.add(LSTM(look_ahead, input_shape=(trainX.shape[1], trainX.shape[2]), return_sequences=True))
model.add(LSTM(look_ahead, input_shape=(look_ahead, trainX.shape[2])))
model.add(Dense(trainY.shape[1]*trainY.shape[2]))
model.add(Reshape((trainY.shape[1], trainY.shape[2])))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=1, batch_size=1, verbose=1)
# make prediction
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)

#save model
model.save('my_sin_prediction_model.h5')

trainPredictPlottable = trainPredict[::look_ahead]
trainPredictPlottable = [item for sublist in trainPredictPlottable for item in sublist]
trainPredictPlottable = scaler.inverse_transform(numpy.array(trainPredictPlottable))
# create a single testPredict array concatenating every 'look_ahead'-th prediction array
testPredictPlottable = testPredict[::look_ahead]
testPredictPlottable = [item for sublist in testPredictPlottable for item in sublist]
testPredictPlottable = scaler.inverse_transform(numpy.array(testPredictPlottable))
# testPredictPlottable = testPredictPlottable[:-look_ahead]
# shift train predictions for plotting
trainPredictPlot = numpy.empty_like(dataset)
trainPredictPlot[:, :] = numpy.nan
trainPredictPlot[look_back:len(trainPredictPlottable)+look_back, :] = trainPredictPlottable
# shift test predictions for plotting
testPredictPlot = numpy.empty_like(dataset)
testPredictPlot[:, :] = numpy.nan
testPredictPlot[len(dataset)-len(testPredictPlottable):len(dataset), :] = testPredictPlottable
# plot baseline and predictions
dataset = scaler.inverse_transform(dataset)
plt.plot(dataset, color='k')
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()
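
A note on the architecture above: the first LSTM returns the full sequence (return_sequences=True), so the second LSTM again receives 3D input, and the Dense plus Reshape head turns the final state into a (look_ahead, features) block per sample. With epochs=1 and batch_size=1 the model gets only a single, slow pass over the data; increasing epochs should noticeably improve the fit.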
