Prediction intervals for LSTM time-series forecasts

14

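Is there a way to calculate a prediction interval (a probability distribution) around a time-series forecast from an LSTM (or other recurrent) neural network?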

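Suppose, for example, that I forecast 10 samples ahead (t+1 to t+10) from the 10 most recently observed samples (t-9 to t). I would expect the forecast at t+1 to be more accurate than the forecast at t+10. Typically, one would draw error bars around the forecast to show the interval. With an ARIMA model (under the assumption of normally distributed errors), I can calculate a prediction interval (e.g. 95%) around each forecast value. Can I calculate the same thing (or something related to a prediction interval) from an LSTM model?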
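
To make the target concrete, here is a minimal sketch of the kind of interval described above: a forecast plus or minus z times a per-horizon error scale, under the same normal-error assumption. The function name `gaussian_intervals` and the synthetic arrays are illustrative assumptions, not part of the code further down, and estimating the scale from held-out residuals is only one rough way to obtain it.

```python
import numpy as np

def gaussian_intervals(y_val, yhat_val, yhat_new, z=1.96):
    """Per-horizon intervals yhat +/- z * sigma_h, assuming zero-mean normal errors.

    y_val, yhat_val: (n_samples, horizon) held-out truths and forecasts.
    yhat_new: (m, horizon) new forecasts; z=1.96 targets roughly 95% coverage.
    """
    residuals = y_val - yhat_val
    sigma = residuals.std(axis=0, ddof=1)   # one error scale per horizon step
    return yhat_new - z * sigma, yhat_new + z * sigma

# Synthetic stand-in for model output: the error scale grows with the horizon.
rng = np.random.default_rng(0)
horizon = 10
true_sigma = np.linspace(0.2, 1.0, horizon)
y_val = rng.normal(size=(2000, horizon))
yhat_val = y_val + rng.normal(scale=true_sigma, size=(2000, horizon))

lower, upper = gaussian_intervals(y_val, yhat_val, yhat_val[:5])
widths = (upper - lower)[0]
print(np.round(widths, 2))   # the interval widens from t+1 to t+10
```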

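I have been working with LSTMs in Keras/Python, following many of the examples from machinelearningmastery.com, on which my example code (see below) is based. I am considering reframing the problem as classification into discrete bins, since that yields a confidence for each class, but that seems like a poor solution.

There are a couple of similar topics (such as the ones below), but nothing seems to directly address prediction intervals from LSTM (or indeed other) neural networks: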

/stats/25055/how-to-calculate-the-confidence-interval-for-time-series-prediction

Time series prediction using ARIMA vs LSTM

from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from math import sin
from matplotlib import pyplot
import numpy as np

# Build an LSTM network and train
def fit_lstm(X, y, batch_size, nb_epoch, neurons):
    X = X.reshape(X.shape[0], 1, X.shape[1]) # add in another dimension to the X data
    y = y.reshape(y.shape[0], y.shape[1])      # but don't add it to the y, as Dense has to be 1d?
    model = Sequential()
    model.add(LSTM(neurons, batch_input_shape=(batch_size, X.shape[1], X.shape[2]), stateful=True))
    model.add(Dense(y.shape[1]))
    model.compile(loss='mean_squared_error', optimizer='adam')
    for i in range(nb_epoch):
        model.fit(X, y, epochs=1, batch_size=batch_size, verbose=1, shuffle=False)
        model.reset_states()
    return model

# Configuration
n = 5000    # total size of dataset
SLIDING_WINDOW_LENGTH = 30
SLIDING_WINDOW_STEP_SIZE = 1
batch_size = 10
test_size = 0.1 # fraction of dataset to hold back for testing
nb_epochs = 100 # for training
neurons = 8 # LSTM layer complexity

# create dataset
#raw_values = [sin(i/2) for i in range(n)]  # simple sine wave
raw_values = [sin(i/2)+sin(i/6)+sin(i/36)+np.random.uniform(-1,1) for i in range(n)]  # sum of three sines plus uniform noise
#raw_values = [(i%4) for i in range(n)] # saw tooth

all_data = np.array(raw_values).reshape(-1,1) # make into array, add another dimension for scikit-learn compatibility

# data is segmented using a sliding window mechanism
all_data_windowed = [np.transpose(all_data[idx:idx+SLIDING_WINDOW_LENGTH]) for idx in np.arange(0,len(all_data)-SLIDING_WINDOW_LENGTH, SLIDING_WINDOW_STEP_SIZE)]
all_data_windowed = np.concatenate(all_data_windowed, axis=0).astype(np.float32)

# split data into train and test-sets
# round datasets down to a multiple of the batch size
test_length = int(round((len(all_data_windowed) * test_size) / batch_size) * batch_size)
train, test = all_data_windowed[:-test_length,:], all_data_windowed[-test_length:,:]
train_length = int(np.floor(train.shape[0] / batch_size)*batch_size) 
train = train[:train_length,...]

half_size = int(SLIDING_WINDOW_LENGTH/2) # split the examples half-half, to forecast the second half
X_train, y_train = train[:,:half_size], train[:,half_size:]
X_test, y_test = test[:,:half_size], test[:,half_size:]

# fit the model
lstm_model = fit_lstm(X_train, y_train, batch_size=batch_size, nb_epoch=nb_epochs, neurons=neurons)

# forecast the entire training dataset to build up state for forecasting
X_train_reshaped = X_train.reshape(X_train.shape[0], 1, X_train.shape[1])
lstm_model.predict(X_train_reshaped, batch_size=batch_size)

# predict from test dataset
X_test_reshaped = X_test.reshape(X_test.shape[0], 1, X_test.shape[1])
yhat = lstm_model.predict(X_test_reshaped, batch_size=batch_size)

#%% Plot prediction vs actual

x_axis_input = range(half_size)
x_axis_output = [x_axis_input[-1]] + list(half_size+np.array(range(half_size)))

fig = pyplot.figure()
ax = fig.add_subplot(111)
line1, = ax.plot(x_axis_input,np.zeros_like(x_axis_input), 'r-')
line2, = ax.plot(x_axis_output,np.zeros_like(x_axis_output), 'o-')
line3, = ax.plot(x_axis_output,np.zeros_like(x_axis_output), 'g-')
ax.set_xlim(np.min(x_axis_input),np.max(x_axis_output))
ax.set_ylim(-4,4)
pyplot.legend(('Input','Actual','Predicted'),loc='upper left')
pyplot.show(block=False) # non-blocking, so the update loop below can run when executed as a script

# update plot in a loop
for idx in range(y_test.shape[0]):

    sample_input = X_test[idx]
    sample_truth = [sample_input[-1]] + list(y_test[idx]) # join lists
    sample_predicted = [sample_input[-1]] + list(yhat[idx])

    line1.set_ydata(sample_input)
    line2.set_ydata(sample_truth)
    line3.set_ydata(sample_predicted)
    fig.canvas.draw()
    fig.canvas.flush_events()

    pyplot.pause(.25)

— 4Oh4

10

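Directly, this is not possible. However, if you model the problem differently, you can get intervals out. Instead of doing an ordinary regression, you can estimate a continuous probability distribution. By doing this for every step you can plot the distribution. Two ways to do this are Kernel Mixture Networks (https://janvdvegt.github.io/2017/06/07/Kernel-Mixture-Networks.html, disclosure: my blog) and Mixture Density Networks (http://www.cedar.buffalo.edu/~srihari/CSE574/Chap5/Chap5.7-MixDensityNetworks.pdf). The first uses kernels as a basis and estimates a mixture over those kernels; the second estimates a mixture of distributions, including the parameters of each component. You train the model with the log-likelihood (that is, by minimizing the negative log-likelihood).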
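
As a rough illustration of the log-likelihood objective for a mixture density output, the sketch below evaluates the mean negative log-likelihood of targets under a one-dimensional Gaussian mixture. It is written in plain NumPy rather than Keras, and the function name `mixture_nll` and the synthetic bimodal data are assumptions made for the example; in a real network the weights, means and standard deviations would be produced by the output layer.

```python
import numpy as np

def mixture_nll(y, weights, means, sigmas):
    """Mean negative log-likelihood of y under a 1-D Gaussian mixture.

    y: (n,) targets; weights, means, sigmas: (n, k) per-sample mixture
    parameters, as a network's output layer might produce them.
    """
    y = y[:, None]
    log_comp = (np.log(weights)
                - 0.5 * np.log(2.0 * np.pi * sigmas ** 2)
                - 0.5 * ((y - means) / sigmas) ** 2)
    # log-sum-exp over the k components for numerical stability
    m = log_comp.max(axis=1, keepdims=True)
    log_lik = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))
    return -log_lik.mean()

# Bimodal targets: a single Gaussian cannot describe them well.
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(-2.0, 0.3, 500), rng.normal(2.0, 0.3, 500)])
n = y.size

nll_mixture = mixture_nll(y,
                          np.full((n, 2), 0.5),
                          np.tile([-2.0, 2.0], (n, 1)),
                          np.full((n, 2), 0.3))
nll_single = mixture_nll(y,
                         np.ones((n, 1)),
                         np.zeros((n, 1)),
                         np.full((n, 1), 2.0))
print(nll_mixture, nll_single)   # the two-component mixture scores lower
```

Once the mixture parameters are available for a time step, an interval can be read off from the mixture's quantiles or from samples drawn from it.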

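Another way to model uncertainty is to use dropout during training and then keep it switched on at inference as well. You run the forward pass multiple times, and each pass gives you a sample from an (approximate) posterior. You do not get a distribution, only samples, but it is the easiest approach to implement and it works very well.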
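
The sketch below shows the mechanics of dropout at inference on a toy network written in NumPy. The weights `W1` and `W2` are random stand-ins for a trained model, and the names `forward_with_dropout` and `mc_dropout_interval` are hypothetical. With a Keras model, one way to keep dropout active at inference is to call the model with `training=True`. The samples are summarised both by mean and standard deviation and by empirical quantiles, since the quantiles do not assume a symmetric distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained network: fixed random weights.
W1 = rng.normal(size=(15, 32))
W2 = rng.normal(size=(32, 10)) / np.sqrt(32)

def forward_with_dropout(x, rate=0.2):
    """One stochastic forward pass with dropout left on (inverted dropout)."""
    h = np.tanh(x @ W1)
    mask = rng.random(h.shape) >= rate      # keep each unit with prob 1 - rate
    h = h * mask / (1.0 - rate)
    return h @ W2

def mc_dropout_interval(x, n_passes=500, alpha=0.05):
    """Stack n_passes stochastic predictions and summarise them per output."""
    samples = np.stack([forward_with_dropout(x) for _ in range(n_passes)])
    lower, upper = np.quantile(samples, [alpha / 2, 1 - alpha / 2], axis=0)
    return samples.mean(axis=0), samples.std(axis=0), lower, upper

x = rng.normal(size=(1, 15))                # one input window of 15 values
mean, std, lower, upper = mc_dropout_interval(x)
print(np.round(lower[0], 2))
print(np.round(upper[0], 2))
```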

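In your case, you have to think about how you generate t+2 up to t+10. Depending on your current setup, you might have to sample from the distribution for the previous time step and feed that sample in as input for the next one. That does not work very well with either the first or the second approach. If you have 10 outputs per time step (t+1 through t+10), all of these approaches are cleaner, though a bit less intuitive.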
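
A minimal sketch of the sample-and-feed-back idea follows. The function `one_step_distribution` is a hypothetical placeholder (a simple AR(1)-style rule with a fixed standard deviation) standing in for a model that predicts a distribution for the next value only; a real setup would query the network there. Many sampled paths are rolled forward, and per-horizon quantiles give the interval.

```python
import numpy as np

def one_step_distribution(window):
    """Hypothetical one-step model: mean and std of the next value.
    A stand-in AR(1)-style rule; a real setup would query the network here."""
    return 0.9 * window[-1], 0.3

def sample_paths(window, horizon=10, n_paths=1000, seed=0):
    """Roll the one-step model forward, feeding sampled values back in."""
    rng = np.random.default_rng(seed)
    paths = np.empty((n_paths, horizon))
    for p in range(n_paths):
        w = list(window)
        for h in range(horizon):
            mu, sigma = one_step_distribution(w)
            nxt = rng.normal(mu, sigma)     # sample rather than feed back the mean
            paths[p, h] = nxt
            w = w[1:] + [nxt]               # slide the window forward
    return paths

paths = sample_paths([0.5, 0.8, 1.0])
lower, upper = np.quantile(paths, [0.025, 0.975], axis=0)
widths = upper - lower
print(np.round(widths, 2))   # uncertainty accumulates along the horizon
```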

— Jan van der Vegt

2

Using mixture networks is interesting; I will try to implement that. There is some solid research on using dropout here: arxiv.org/abs/1709.01907 and arxiv.org/abs/1506.02142

— 4Oh4

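A note on dropout: you can actually compute the variance of the Monte Carlo dropout predictions and use that as the quantification of uncertainty.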

— Charles Chow

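@CharlesChow That is correct, but it is a poor way to construct an interval in this context. Because the distribution can be very skewed, it is better to sort the values and use quantiles.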

— Jan van der Vegt

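Agreed, @JanvanderVegt, but you can still estimate statistics from MC dropout without assuming an output distribution. I mean that you can also use percentiles or bootstrapping to construct the CI from MC dropout.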

— Charles Chow

2

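As a buzzword, conformal prediction might be interesting for you, because it works under many conditions. In particular, it does not need normally distributed errors, and it can be used with almost any machine learning model.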

Two nice introductions are given by Scott Locklin and Henrik Linusson.
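
For a flavour of how this works, here is a minimal sketch of split conformal prediction with absolute residuals as the score. The function name `split_conformal_halfwidth` and the synthetic data are assumptions made for the example. The coverage guarantee relies on calibration and new points being exchangeable, which time series generally violate, so treat this as an illustration of the mechanics rather than a recipe for forecasting.

```python
import numpy as np

def split_conformal_halfwidth(y_cal, yhat_cal, alpha=0.05):
    """Half-width q such that [yhat - q, yhat + q] covers about 1 - alpha
    of new points, assuming calibration and new data are exchangeable."""
    scores = np.sort(np.abs(y_cal - yhat_cal))
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))   # rank of the conformal quantile
    if k > n:
        return np.inf                         # too few calibration points
    return scores[k - 1]

# Any point forecaster works; here a deliberately crude one with skewed errors.
rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(0.0, 10.0, n)
    y = np.sin(x) + rng.exponential(0.5, n)   # non-normal, skewed noise
    yhat = np.sin(x)                          # stand-in for model predictions
    return y, yhat

y_cal, yhat_cal = make_data(2000)             # held-out calibration set
y_new, yhat_new = make_data(20000)            # fresh data to check coverage

q = split_conformal_halfwidth(y_cal, yhat_cal, alpha=0.05)
coverage = np.mean(np.abs(y_new - yhat_new) <= q)
print(q, coverage)
```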

— Boris W

1

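I am going to diverge a bit and argue that computing a confidence interval is, in practice, usually not a valuable thing to do. The reason is that there is always a whole bunch of assumptions you need to make. Even for the simplest linear regression, you need

A linear relationship.
Multivariate normality.
No or little multicollinearity.
No autocorrelation.
Homoscedasticity.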

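A much more pragmatic approach is to do a Monte Carlo simulation. If you already know, or are willing to make an assumption about, the distribution of your input variables, take a whole bunch of samples, feed them to your LSTM, and you can then calculate your "confidence interval" empirically.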
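
A minimal sketch of that simulation follows. The function `model_predict` is a hypothetical deterministic stand-in for the trained LSTM's predict call, and the Gaussian noise with standard deviation 0.1 on each input is an assumption chosen purely for illustration.

```python
import numpy as np

def model_predict(x):
    """Hypothetical stand-in for the trained model's predict call:
    any deterministic function of the input window works for this sketch."""
    return np.tanh(x).sum(axis=1)

rng = np.random.default_rng(0)
observed = np.linspace(-1.0, 1.0, 15)         # one observed input window

# Assumption: each input carries independent Gaussian noise with sd 0.1.
inputs = observed + rng.normal(scale=0.1, size=(5000, observed.size))
outputs = model_predict(inputs)               # one prediction per sampled input

lower, upper = np.percentile(outputs, [2.5, 97.5])
point = model_predict(observed[None, :])[0]
print(lower, point, upper)
```

Note that this interval reflects only the assumed uncertainty in the inputs, not uncertainty in the model itself.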

— Louis T

1

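Yes, you can. The only thing you need to change is the loss function: implement the loss function used in quantile regression and plug it into your model. You also want to look at how you evaluate those intervals. For that, I would use the ICP, MIL and RMIL metrics.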
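
As a sketch of what that could look like, the block below implements the quantile (pinball) loss in NumPy together with two of the evaluation metrics, reading ICP as interval coverage percentage and MIL as mean interval length. Those readings and all function names are assumptions; RMIL is left out because the answer does not define it. In Keras, the same loss would need to be written with tensor operations and passed to `compile`.

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Quantile (pinball) loss for quantile level q in (0, 1)."""
    diff = y_true - y_pred
    return np.mean(np.maximum(q * diff, (q - 1.0) * diff))

def icp(y_true, lower, upper):
    """Interval coverage percentage: share of targets inside the interval."""
    return np.mean((y_true >= lower) & (y_true <= upper))

def mil(lower, upper):
    """Mean interval length."""
    return np.mean(upper - lower)

rng = np.random.default_rng(0)
y = rng.normal(size=10000)

# The constant that minimises the q=0.9 pinball loss is close to the
# 90th percentile of the data (about 1.28 for a standard normal).
grid = np.linspace(-3.0, 3.0, 121)
losses = [pinball_loss(y, np.full_like(y, c), 0.9) for c in grid]
best = grid[int(np.argmin(losses))]

# Evaluate a fixed 90% interval built from the 5% and 95% normal quantiles.
lower = np.full_like(y, -1.645)
upper = np.full_like(y, 1.645)
coverage = icp(y, lower, upper)
length = mil(lower, upper)
print(best, coverage, length)
```

Training one output per quantile level (for example 0.05 and 0.95) gives the lower and upper bounds of the interval directly.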

— 英吾