如何在Keras中实现“一对多”和“多对多”序列预测？

13

我很难解释一对一（例如，单个图像的分类）和多对多（例如，图像序列的分类）序列标签的Keras编码差异。我经常看到两种不同的代码：

类型1是没有应用TimeDistributed的地方，如下所示：

model=Sequential()

model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1], border_mode="valid", input_shape=[1, 56,14]))
model.add(Activation("relu"))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=pool_size))

model.add(Reshape((56*14,)))
model.add(Dropout(0.25))
model.add(LSTM(5))
model.add(Dense(50))
model.add(Dense(nb_classes))
model.add(Activation("softmax"))

类型2是应用TimeDistributed的地方，如下所示：

model = Sequential()

model.add(InputLayer(input_shape=(5, 224, 224, 3)))
model.add(TimeDistributed(Convolution2D(64, (3, 3))))
model.add(TimeDistributed(MaxPooling2D((2,2), strides=(2,2))))
model.add(LSTM(10))
model.add(Dense(3))

我的问题是：

我的假设是正确的，类型1是一对多类型，类型2是多对多类型吗？还是TimeDistributed在这方面没有关联？
在一对多或多对多的情况下，最后一个密集层应为1个节点“长”（依次仅发出一个值），
而先前的循环层负责确定有多少个
1长发射的价值？或者最后一个密集层应该由N个节点组成，其中N=max sequence length？如果是这样，
当我们可以
使用N个并行“原始”估计量产生具有多个输出的相似输入时，在这里使用RNN 有什么意义？
如何定义RNN中的时间步数？它是某种程度上
与输出序列长度相关，还是只是
需要调整的超参数？
上面我的Type 1示例的Inn案例，
当模型仅发出一个（可能的
nb_classes）类别预测时，应用LSTM有什么意义？如果省略了LSTM层怎么办？

— 亨德里克
source

您能否提供两个模型的摘要？

— 法迪·巴库拉

2

使用任何循环层的目的是使输出不仅是与其他项目无关的单个项目的结果，而且还应是一系列项目的结果，这样，该层对序列中一项上的操作的输出就是结果该项目以及序列中之前的任何项目的结果。时间步的数量定义了这样一个序列有多长时间。也就是说，应按顺序处理多少个项目，并且影响彼此的结果输出。

LSTM层以这样的方式运行：它接受形式为number_of_timesteps，dimension_of_each_item的输入。如果参数return_sequences设置为False（默认情况下为False），则该层会将所有时间步长的输入“组合”为单个输出。如果考虑一个序列（例如10个项目），则return_sequences设置为False的LSTM层将从该序列中产生单个输出项目，并且该单个项目的属性将是序列。对于多对一设计，这就是您想要的。

将return_sequences设置为True的LSTM层将为输入序列中的每个项目（时间步）产生一个输出。这样做的方式是，在任何时间步长，输出将不仅取决于当前正在操作的项目，而且还取决于序列中的先前项目。对于多对多设计，这就是您想要的。

由于LSTM层将一系列项目作为输入，因此模型中LSTM层之前的任何层都需要产生一个序列作为输出。对于您的Type 1模型，前几层不对序列进行操作，而一次只能对一个项目进行操作。因此，这不会产生针对LSTM进行操作的一系列项目。

使用TimeDistributed可以使层对序列中的每个项目进行操作，而项目之间不会相互影响。因此，TimeDistributed层对项目序列进行操作，但是没有递归。

对于您的2型模型，第一层将产生5个时间步长的序列，并且序列中每个项目的操作都将彼此独立，因为封装在TimeDistributed中的层是非周期性的。由于LSTM层使用默认设置return_sequences = False，因此LSTM层将为5个项目的每个此类序列生成单个输出。

模型中输出节点的最终数量完全取决于用例。单个节点适合于二进制分类之类的东西，也适合于产生某种分数。

— Mevoki
source

1

我认为您也许可以使用我以前的作品。在此代码中，我创建了（随机波长和相位的）正弦波，并将LSTM训练为来自这些正弦波的一系列点，并输出150个点的序列来完成每个正弦波。

这是模型：

    features_num=5 
    latent_dim=40

    ##
    encoder_inputs = Input(shape=(None, features_num))
    encoded = LSTM(latent_dim, return_state=False ,return_sequences=True)(encoder_inputs)
    encoded = LSTM(latent_dim, return_state=False ,return_sequences=True)(encoded)
    encoded = LSTM(latent_dim, return_state=False ,return_sequences=True)(encoded)
    encoded = LSTM(latent_dim, return_state=True)(encoded)

    encoder = Model (input=encoder_inputs, output=encoded)
    ##

    encoder_outputs, state_h, state_c = encoder(encoder_inputs)
    encoder_states = [state_h, state_c]

    decoder_inputs=Input(shape=(1, features_num))
    decoder_lstm_1 = LSTM(latent_dim, return_sequences=True, return_state=True)
    decoder_lstm_2 = LSTM(latent_dim, return_sequences=True, return_state=True)
    decoder_lstm_3 = LSTM(latent_dim, return_sequences=True, return_state=True)
    decoder_lstm_4 = LSTM(latent_dim, return_sequences=True, return_state=True)

    decoder_dense = Dense(features_num)

    all_outputs = []
    inputs = decoder_inputs


    states_1=encoder_states
   # Place holder values:
    states_2=states_1; states_3=states_1; states_4=states_1

    for _ in range(1):
        # Run the decoder on the first timestep
        outputs_1, state_h_1, state_c_1 = decoder_lstm_1(inputs, initial_state=states_1)
        outputs_2, state_h_2, state_c_2 = decoder_lstm_2(outputs_1)
        outputs_3, state_h_3, state_c_3 = decoder_lstm_3(outputs_2)
        outputs_4, state_h_4, state_c_4 = decoder_lstm_4(outputs_3)

        # Store the current prediction (we will concatenate all predictions later)
        outputs = decoder_dense(outputs_4)
        all_outputs.append(outputs)
        # Reinject the outputs as inputs for the next loop iteration
        # as well as update the states
        inputs = outputs
        states_1 = [state_h_1, state_c_1]
        states_2 = [state_h_2, state_c_2]
        states_3 = [state_h_3, state_c_3]
        states_4 = [state_h_4, state_c_4]


    for _ in range(149):
        # Run the decoder on each timestep
        outputs_1, state_h_1, state_c_1 = decoder_lstm_1(inputs, initial_state=states_1)
        outputs_2, state_h_2, state_c_2 = decoder_lstm_2(outputs_1, initial_state=states_2)
        outputs_3, state_h_3, state_c_3 = decoder_lstm_3(outputs_2, initial_state=states_3)
        outputs_4, state_h_4, state_c_4 = decoder_lstm_4(outputs_3, initial_state=states_4)

        # Store the current prediction (we will concatenate all predictions later)
        outputs = decoder_dense(outputs_4)
        all_outputs.append(outputs)
        # Reinject the outputs as inputs for the next loop iteration
        # as well as update the states
        inputs = outputs
        states_1 = [state_h_1, state_c_1]
        states_2 = [state_h_2, state_c_2]
        states_3 = [state_h_3, state_c_3]
        states_4 = [state_h_4, state_c_4]


    # Concatenate all predictions
    decoder_outputs = Lambda(lambda x: K.concatenate(x, axis=1))(all_outputs)   

    model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

    #model = load_model('pre_model.h5')


    print(model.summary())

这是整个脚本：

from keras.models import Model
from keras.layers import Input, LSTM, Dense, TimeDistributed,Lambda, Dropout, Activation ,RepeatVector
from keras.callbacks import ModelCheckpoint 
import numpy as np

from keras.layers import Lambda
from keras import backend as K

from keras.models import load_model

import os


features_num=5 
latent_dim=40

##
encoder_inputs = Input(shape=(None, features_num))
encoded = LSTM(latent_dim, return_state=False ,return_sequences=True)(encoder_inputs)
encoded = LSTM(latent_dim, return_state=False ,return_sequences=True)(encoded)
encoded = LSTM(latent_dim, return_state=False ,return_sequences=True)(encoded)
encoded = LSTM(latent_dim, return_state=True)(encoded)

encoder = Model (input=encoder_inputs, output=encoded)
##

encoder_outputs, state_h, state_c = encoder(encoder_inputs)
encoder_states = [state_h, state_c]

decoder_inputs=Input(shape=(1, features_num))
decoder_lstm_1 = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_lstm_2 = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_lstm_3 = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_lstm_4 = LSTM(latent_dim, return_sequences=True, return_state=True)

decoder_dense = Dense(features_num)

all_outputs = []
inputs = decoder_inputs

# Place holder values:
states_1=encoder_states
states_2=states_1; states_3=states_1; states_4=states_1

for _ in range(1):
    # Run the decoder on one timestep
    outputs_1, state_h_1, state_c_1 = decoder_lstm_1(inputs, initial_state=states_1)
    outputs_2, state_h_2, state_c_2 = decoder_lstm_2(outputs_1)
    outputs_3, state_h_3, state_c_3 = decoder_lstm_3(outputs_2)
    outputs_4, state_h_4, state_c_4 = decoder_lstm_4(outputs_3)

    # Store the current prediction (we will concatenate all predictions later)
    outputs = decoder_dense(outputs_4)
    all_outputs.append(outputs)
    # Reinject the outputs as inputs for the next loop iteration
    # as well as update the states
    inputs = outputs
    states_1 = [state_h_1, state_c_1]
    states_2 = [state_h_2, state_c_2]
    states_3 = [state_h_3, state_c_3]
    states_4 = [state_h_4, state_c_4]


for _ in range(149):
    # Run the decoder on one timestep
    outputs_1, state_h_1, state_c_1 = decoder_lstm_1(inputs, initial_state=states_1)
    outputs_2, state_h_2, state_c_2 = decoder_lstm_2(outputs_1, initial_state=states_2)
    outputs_3, state_h_3, state_c_3 = decoder_lstm_3(outputs_2, initial_state=states_3)
    outputs_4, state_h_4, state_c_4 = decoder_lstm_4(outputs_3, initial_state=states_4)

    # Store the current prediction (we will concatenate all predictions later)
    outputs = decoder_dense(outputs_4)
    all_outputs.append(outputs)
    # Reinject the outputs as inputs for the next loop iteration
    # as well as update the states
    inputs = outputs
    states_1 = [state_h_1, state_c_1]
    states_2 = [state_h_2, state_c_2]
    states_3 = [state_h_3, state_c_3]
    states_4 = [state_h_4, state_c_4]


# Concatenate all predictions
decoder_outputs = Lambda(lambda x: K.concatenate(x, axis=1))(all_outputs)   

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

#model = load_model('pre_model.h5')


print(model.summary())


model.compile(loss='mean_squared_error', optimizer='adam')


def create_wavelength(min_wavelength, max_wavelength, fluxes_in_wavelength, category )  :         
#category :: 0 - train ; 2 - validate ; 4- test. 1;3;5 - dead space
    c=(category+np.random.random())/6         
    k = fluxes_in_wavelength
#
    base= (np.trunc(k*np.random.random()*(max_wavelength-min_wavelength))       +k*min_wavelength)  /k
    answer=base+c/k
    return (answer)       

def make_line(length,category):
    shift= np.random.random()
    wavelength = create_wavelength(30,10,1,category)
    a=np.arange(length)
    answer=np.sin(a/wavelength+shift)
    return answer

def make_data(seq_num,seq_len,dim,category):
    data=np.array([]).reshape(0,seq_len,dim)
    for i in range (seq_num):
        mini_data=np.array([]).reshape(0,seq_len)
        for j in range (dim):
            line = make_line(seq_len,category)
            line=line.reshape(1,seq_len)            
            mini_data=np.append(mini_data,line,axis=0)
        mini_data=np.swapaxes(mini_data,1,0)
        mini_data=mini_data.reshape(1,seq_len,dim)      
        data=np.append(data,mini_data,axis=0)
    return (data)


def train_generator():
    while True:
        sequence_length = np.random.randint(150, 300)+150       
        data=make_data(1000,sequence_length,features_num,0) # category=0 in train


    #   decoder_target_data is the same as decoder_input_data but offset by one timestep

        encoder_input_data = data[:,:-150,:] # all but last 150 

        decoder_input_data = data[:,-151,:] # the one before the last 150.
        decoder_input_data=decoder_input_data.reshape((decoder_input_data.shape[0],1,decoder_input_data.shape[1]))


        decoder_target_data = (data[:, -150:, :]) # last 150        
        yield [encoder_input_data, decoder_input_data], decoder_target_data
def val_generator():
    while True:

        sequence_length = np.random.randint(150, 300)+150       
        data=make_data(1000,sequence_length,features_num,2) # category=2 in val

        encoder_input_data = data[:,:-150,:] # all but last 150 

        decoder_input_data = data[:,-151,:] # the one before the last 150.
        decoder_input_data=decoder_input_data.reshape((decoder_input_data.shape[0],1,decoder_input_data.shape[1]))

        decoder_target_data = (data[:, -150:, :]) # last 150        
        yield [encoder_input_data, decoder_input_data], decoder_target_data

filepath_for_w= 'flux_p2p_s2s_model.h5' 
checkpointer=ModelCheckpoint(filepath_for_w, monitor='val_loss', verbose=0, save_best_only=True, mode='auto', period=1)     
model.fit_generator(train_generator(),callbacks=[checkpointer], steps_per_epoch=30, epochs=2000, verbose=1,validation_data=val_generator(),validation_steps=30)
model.save(filepath_for_w)




def predict_wave(input_wave,input_for_decoder):  # input wave= x[n,:,:], ie points except the last 150; each wave has feature_num features. run this function for all such instances (=n)   
    #print (input_wave.shape)
    #print (input_for_decoder.shape)
    pred= model.predict([input_wave,input_for_decoder])

    return pred

def predict_many_waves_from_input(x):   
    x, x2=x # x == encoder_input_data ; x==2 decoder_input_data

    instance_num= x.shape[0]


    multi_predict_collection=np.zeros((x.shape[0],150,x.shape[2]))

    for n in range(instance_num):
        input_wave=x[n,:,:].reshape(1,x.shape[1],x.shape[2])
        input_for_decoder=x2[n,:,:].reshape(1,x2.shape[1],x2.shape[2])
        wave_prediction=predict_wave(input_wave,input_for_decoder)
        multi_predict_collection[n,:,:]=wave_prediction
    return (multi_predict_collection)

def test_maker():
    if True:        
        sequence_length = np.random.randint(150, 300)+150       
        data=make_data(470,sequence_length,features_num,4) # category=4 in test

        encoder_input_data = data[:,:-150,:] # all but last 150 

        decoder_input_data = data[:,-151,:] # the one before the last 150.
        decoder_input_data=decoder_input_data.reshape((decoder_input_data.shape[0],1,decoder_input_data.shape[1]))

        decoder_target_data = (data[:, -150:, :]) # last 150        
        return [encoder_input_data, decoder_input_data],    decoder_target_data

x,y= test_maker()   



a=predict_many_waves_from_input (x) # is that right..?
x=x[0] # keep the wave (generated data except last 150 time points) 
print (x.shape)
print (y.shape)
print (a.shape)

np.save ('a.npy',a)
np.save ('y.npy',y)
np.save ('x.npy',x)



print (np.mean(np.absolute(y[:,:,0]-a[:,:,0])))
print (np.mean(np.absolute(y[:,:,1]-a[:,:,1])))
print (np.mean(np.absolute(y[:,:,2]-a[:,:,2])))
print (np.mean(np.absolute(y[:,:,3]-a[:,:,3])))
print (np.mean(np.absolute(y[:,:,4]-a[:,:,4])))

— 拉斐特
source