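I am trying to forecast a time series and am looking for some guidance on how to use this LSTM to predict 2021-2025.
I would like to know whether there is a simple line I am missing that would extend the forecast beyond the dataset. I really only need to forecast y, but I realize I may need to forecast the additional features in order to predict future y, and that is where I am stumped.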
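#imports used below (assumed; the original post omitted them)
import numpy as np
import matplotlib.pyplot as plt
from math import sqrt
from numpy import concatenate
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense
#df is assumed to be a pandas DataFrame loaded earlier, with y as the last column
#keep train/test slices of the original frame so their index can be used for plotting later
df_train = df.iloc[0:32]
df_test = df.iloc[32:]
df = df.values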
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(df)
#divide the data into train and test data
train_size = int(len(dataset) * 0.80)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size,:], dataset[train_size:len(dataset),:]
#index the data into dependent and independent variables
train_X, train_y = train[:, :-1], train[:, -1]
test_X, test_y = test[:, :-1], test[:, -1]
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
#convert data into suitable dimension for using it as input in LSTM network
train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
model = Sequential()
model.add(LSTM(120, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dropout(0.2))
model.add(Dense(1))
model.compile(loss='mae', optimizer='adam')
history = model.fit(train_X, train_y, epochs=120, batch_size=64, validation_data=(test_X, test_y), verbose=2, shuffle=False)
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='test')
plt.legend()
plt.show()
#prediction on training and testing data
train_predict = model.predict(train_X)
test_predict = model.predict(test_X)
#converting from three dimension to two dimension
train_X = train_X.reshape((train_X.shape[0], train_X.shape[2]))
test_X = test_X.reshape((test_X.shape[0], test_X.shape[2]))
#the scaler was fit with y as the last column, so the predictions must go back into the last column
inv_train_predict = concatenate((train_X, train_predict), axis=1)
inv_test_predict = concatenate((test_X, test_predict), axis=1)
#transforming to original scale
inv_train_predict = scaler.inverse_transform(inv_train_predict)
inv_test_predict = scaler.inverse_transform(inv_test_predict)
#predicted values on training data
inv_train_predict = inv_train_predict[:,-1]
inv_train_predict
#predicted values on testing data
inv_test_predict = inv_test_predict[:,-1]
inv_test_predict
#scaling back the original train labels
train_y = train_y.reshape((len(train_y), 1))
inv_train_y = concatenate((train_X, train_y), axis=1)
inv_train_y = scaler.inverse_transform(inv_train_y)
inv_train_y = inv_train_y[:,-1]
#scaling back the original test labels
test_y = test_y.reshape((len(test_y), 1))
inv_test_y = concatenate((test_X, test_y), axis=1)
inv_test_y = scaler.inverse_transform(inv_test_y)
inv_test_y = inv_test_y[:,-1]
#calculating rmse on train data
rmse_train = sqrt(mean_squared_error(inv_train_y, inv_train_predict))
print('Train RMSE: %.3f' % rmse_train)
#calculating rmse on test data
rmse_test = sqrt(mean_squared_error(inv_test_y, inv_test_predict))
print('Test RMSE: %.3f' % rmse_test)
#plotting the graph of test actual vs predicted
inv_test_y = inv_test_y.reshape(-1,1)
inv_test_y.shape
t = np.arange(len(inv_test_y))
plt.plot(t,inv_test_y,label="actual")
plt.plot(t,inv_test_predict,'r',label="predicted")
plt.show()
#plotting the graph to show multi step prediction
plt.figure(figsize=(25, 10))
plt.plot(df_train.index, inv_train_predict,label="predicted (train)")
plt.plot(df_test.index, inv_test_predict, color='r',label="predicted (test)")
plt.legend(loc='best', fontsize='xx-large')
plt.xticks(fontsize=18)
plt.yticks(fontsize=16)
plt.show()
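Right now it plots the predicted values of y, which appear to be in line with what I am looking for, but I cannot figure out how to extend it beyond my dataset.
Any suggestions?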
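Posted on 2020-09-30 04:43:14
One way to achieve this is to generate a time-lagged version of Y corresponding to some time interval, where n is the number of time steps by which Y is lagged. In that case the prediction/regression is made "n time steps ahead".
A simple example. Let's consider the following dataset.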
X Y
_____ _____
1 1
2 4
3 9
4 16
5 25
6 36
7 49
8 64
9 81
10 100
After generating the time-lagged version with n = 3, we obtain the following:
X Y
_____ _____
1 16
2 25
3 36
4 49
5 64
6 81
7 100
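The lagged table above can be reproduced with pandas. The following is only a minimal sketch, not the answerer's own code: the helper name `make_lagged` and the column names `X` and `Y` are assumptions chosen for illustration. It pairs each row's X with the Y observed n steps later, and also returns the last n rows of X, which have no target yet.

```python
import pandas as pd


def make_lagged(df, target="Y", n=3):
    """Pair each row's features with the target observed n steps later.

    Hypothetical helper for illustration. Returns (train, future_inputs):
    train holds the rows that still have a target after shifting, and
    future_inputs holds the last n feature rows, whose targets lie beyond
    the end of the dataset.
    """
    if n < 1:
        raise ValueError("n must be a positive number of time steps")
    lagged = df.copy()
    # shift(-n) moves Y up by n rows, so row t now carries Y from t + n
    lagged[target] = df[target].shift(-n)
    # the last n rows have no shifted target, so they cannot be trained on
    train = lagged.iloc[:-n]
    future_inputs = df.drop(columns=[target]).iloc[-n:]
    return train, future_inputs


# the toy dataset from the example above: Y = X squared
df = pd.DataFrame({"X": range(1, 11), "Y": [x * x for x in range(1, 11)]})
train, future_inputs = make_lagged(df, target="Y", n=3)
print(train)          # X = 1..7 paired with Y = 16..100
print(future_inputs)  # X = 8, 9, 10
```

Under this setup, a model fitted on `train` learns to map the features at time t to Y at time t + n. Calling its predict method on `future_inputs` would then give n values of Y past the end of the dataset, without having to forecast the features themselves.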
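This approach is not limited to a single time lag; as many lags as needed can be generated. One way to decide which lags to generate is to run a correlation analysis between X and different time lags of Y, and then, from the pool of correlation values, select the lagged series that fall within a desired threshold. Finally, this approach comes at a cost: max(n1, n2, ..., nN-1, nN) values are lost from the dataset.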
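The correlation-based lag selection might look like the sketch below. The function names `lag_correlations` and `select_lags`, the use of Pearson correlation, and the 0.9 threshold are all assumptions for illustration; the answer does not specify a correlation measure or a threshold.

```python
import pandas as pd


def lag_correlations(df, feature="X", target="Y", max_lag=5):
    """Pearson correlation between feature at t and target at t + n, for n = 1..max_lag.

    Hypothetical helper. Series.corr ignores the NaN pairs that shift(-n)
    leaves at the end of the series.
    """
    return {
        n: df[feature].corr(df[target].shift(-n))
        for n in range(1, max_lag + 1)
    }


def select_lags(correlations, threshold=0.9):
    """Keep the lags whose absolute correlation reaches the (assumed) threshold."""
    return [n for n, r in correlations.items() if abs(r) >= threshold]


# the toy dataset from the example above: Y = X squared
df = pd.DataFrame({"X": range(1, 11), "Y": [x * x for x in range(1, 11)]})
correlations = lag_correlations(df, max_lag=5)
selected = select_lags(correlations, threshold=0.9)

# building every selected lag at once costs as many rows as the largest lag
rows_lost = max(selected) if selected else 0
print(correlations)
print(selected, rows_lost)
```

On a dataset as short as the one in the question (about 40 rows), the rows lost to the largest lag are a noticeable share of the training data, so it may be worth keeping the maximum lag small.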
https://stackoverflow.com/questions/64125477