Can't get this autoencoder network to function properly (with convolutional and maxpool layers)



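Autoencoder networks seem to be much trickier than ordinary classifier MLP networks. After several attempts using Lasagne, all I get in the reconstructed output is something that resembles, at best, a blurry average of all the images in the MNIST database, with no distinction as to what the input digit actually is.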

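The network structure I chose is the following cascade of layers:

  1. Input layer (28x28)
  2. 2D convolutional layer, filter size 7x7
  3. Max pooling layer, size 3x3, stride 2x2
  4. Dense (fully connected) flattening layer, 10 units (this is the bottleneck)
  5. Dense (fully connected) layer, 121 units
  6. Reshape layer to 11x11
  7. 2D convolutional layer, filter size 3x3
  8. 2D upscaling layer, factor 2
  9. 2D convolutional layer, filter size 3x3
  10. 2D upscaling layer, factor 2
  11. 2D convolutional layer, filter size 5x5
  12. Feature max pooling (from 31x28x28 to 28x28)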
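
The spatial sizes implied by this stack can be checked by hand. The sketch below is plain Python, not Lasagne; the helper names `conv_valid`, `pool`, and `upscale` are invented for illustration. It assumes unpadded convolutions with stride 1 and pooling that discards partial windows at the border, which is how the layers are configured in the code further down as far as I can tell.

```python
def conv_valid(size, kernel):
    # Unpadded, stride-1 convolution shrinks each side by kernel - 1.
    return size - kernel + 1

def pool(size, window, stride):
    # Pooling that drops partial windows at the border.
    return (size - window) // stride + 1

def upscale(size, factor):
    # Upscaling repeats each pixel `factor` times along each axis.
    return size * factor

size = 28
trace = [("input", size)]
size = conv_valid(size, 7)
trace.append(("conv 7x7", size))
size = pool(size, 3, 2)
trace.append(("maxpool 3x3, stride 2", size))
# Dense 10 -> dense 121 -> reshape: the spatial size is reset to 11.
size = 11
trace.append(("reshape to 11x11", size))
size = conv_valid(size, 3)
trace.append(("conv 3x3", size))
size = upscale(size, 2)
trace.append(("upscale x2", size))
size = conv_valid(size, 3)
trace.append(("conv 3x3", size))
size = upscale(size, 2)
trace.append(("upscale x2", size))
size = conv_valid(size, 5)
trace.append(("conv 5x5", size))

for name, side in trace:
    print("%-22s %dx%d" % (name, side, side))
```

Under these assumptions the pooled feature maps are 10x10, so 31 x 10 x 10 = 3100 values feed the 10-unit bottleneck, and the final 5x5 convolution brings 32x32 back down to 28x28.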

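All the 2D convolutional layers have untied biases, sigmoid activations and 31 filters.

All fully connected layers have sigmoid activations.

The loss function used is squared error and the update function is adagrad. Training uses chunks of 100 samples, repeated for 1000 epochs.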
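
One detail worth making explicit: in the code in this post, each "epoch" is a single update on one randomly drawn chunk of 100 images, not a full pass over the training set. A quick back-of-the-envelope check, assuming the standard 60,000-image MNIST training split (the code samples indices 0 to 59999):

```python
chunk_len = 100       # samples per update, as stated above
num_updates = 1000    # called "epochs" in the code
train_size = 60000    # assumed size of the MNIST training set

samples_seen = chunk_len * num_updates
passes = samples_seen / float(train_size)

print("samples seen: %d" % samples_seen)
print("full passes over the data: %.2f" % passes)
```

So the network sees fewer than two passes' worth of data, which may be relevant when judging how far training has progressed.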

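Here is an illustration of the problem. The upper row shows some samples given as input to the network, and the lower row shows the reconstruction:

[Image: autoencoder input and output]

For completeness, here is the code I used: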

import theano.tensor as T
import theano
import sys
sys.path.insert(0,'./Lasagne') # local checkout of Lasagne
import lasagne
from theano import pp
from theano import function
import gzip
import numpy as np
from sklearn.preprocessing import OneHotEncoder
import matplotlib.pyplot as plt
def load_mnist():

    def load_mnist_images(filename):
        with gzip.open(filename, 'rb') as f:
            data = np.frombuffer(f.read(), np.uint8, offset=16)
        # The inputs are vectors now, we reshape them to monochrome 2D images,
        # following the shape convention: (examples, channels, rows, columns)
        data = data.reshape(-1, 1, 28, 28)
        # The inputs come as bytes, we convert them to float32 in range [0,1].
        # (Actually to range [0, 255/256], for compatibility to the version
        # provided at http://deeplearning.net/data/mnist/mnist.pkl.gz.)
        return data / np.float32(256)

    def load_mnist_labels(filename):
        # Read the labels in Yann LeCun's binary format.
        with gzip.open(filename, 'rb') as f:
            data = np.frombuffer(f.read(), np.uint8, offset=8)
        # The labels are vectors of integers now, that's exactly what we want.
        return data

    X_train = load_mnist_images('train-images-idx3-ubyte.gz')
    y_train = load_mnist_labels('train-labels-idx1-ubyte.gz')
    X_test = load_mnist_images('t10k-images-idx3-ubyte.gz')
    y_test = load_mnist_labels('t10k-labels-idx1-ubyte.gz')
    return X_train, y_train, X_test, y_test

def plot_filters(conv_layer):
    W = conv_layer.get_params()[0]
    W_fn = theano.function([],W)
    params = W_fn()
    ks = np.squeeze(params)
    kstack = np.vstack(ks)
    plt.imshow(kstack,interpolation='none')
    plt.show()

def main():

    #theano.config.exception_verbosity="high"
    #theano.config.optimizer='None'

    X_train, y_train, X_test, y_test = load_mnist()
    ohe = OneHotEncoder()

    y_train = ohe.fit_transform(np.expand_dims(y_train,1)).toarray()
    chunk_len = 100
    visamount = 10
    num_epochs = 1000
    num_filters=31
    dropout_p=.0
    print "X_train.shape",X_train.shape,"y_train.shape",y_train.shape
    input_var = T.tensor4('X')
    output_var = T.tensor4('X')
    conv_nonlinearity = lasagne.nonlinearities.sigmoid
    net = lasagne.layers.InputLayer((chunk_len,1,28,28), input_var)
    conv1 = net = lasagne.layers.Conv2DLayer(net,num_filters,(7,7),nonlinearity=conv_nonlinearity,untie_biases=True)
    net = lasagne.layers.MaxPool2DLayer(net,(3,3),stride=(2,2))
    net = lasagne.layers.DropoutLayer(net,p=dropout_p)
    #conv2_layer = lasagne.layers.Conv2DLayer(dropout_layer,num_filters,(3,3),nonlinearity=conv_nonlinearity)
    #pool2_layer = lasagne.layers.MaxPool2DLayer(conv2_layer,(3,3),stride=(2,2))
    net = lasagne.layers.DenseLayer(net,10,nonlinearity=lasagne.nonlinearities.sigmoid)

    #augment_layer1 = lasagne.layers.DenseLayer(reduction_layer,33,nonlinearity=lasagne.nonlinearities.sigmoid)
    net = lasagne.layers.DenseLayer(net,121,nonlinearity=lasagne.nonlinearities.sigmoid)

    net = lasagne.layers.ReshapeLayer(net,(chunk_len,1,11,11))

    net = lasagne.layers.Conv2DLayer(net,num_filters,(3,3),nonlinearity=conv_nonlinearity,untie_biases=True)
    net = lasagne.layers.Upscale2DLayer(net,2)

    net = lasagne.layers.Conv2DLayer(net,num_filters,(3,3),nonlinearity=conv_nonlinearity,untie_biases=True)
    #pool_after0 = lasagne.layers.MaxPool2DLayer(conv_after1,(3,3),stride=(2,2))
    net = lasagne.layers.Upscale2DLayer(net,2)

    net = lasagne.layers.DropoutLayer(net,p=dropout_p)

    #conv_after2 = lasagne.layers.Conv2DLayer(upscale_layer1,num_filters,(3,3),nonlinearity=conv_nonlinearity,untie_biases=True)
    #pool_after1 = lasagne.layers.MaxPool2DLayer(conv_after2,(3,3),stride=(1,1))
    #upscale_layer2 = lasagne.layers.Upscale2DLayer(pool_after1,4)

    net = lasagne.layers.Conv2DLayer(net,num_filters,(5,5),nonlinearity=conv_nonlinearity,untie_biases=True)
    net = lasagne.layers.FeaturePoolLayer(net,num_filters,pool_function=theano.tensor.max)
    print "output_shape:",lasagne.layers.get_output_shape(net)
    params = lasagne.layers.get_all_params(net, trainable=True)
    prediction = lasagne.layers.get_output(net)
    loss = lasagne.objectives.squared_error(prediction, output_var)
    #loss = lasagne.objectives.binary_crossentropy(prediction, output_var)
    aggregated_loss = lasagne.objectives.aggregate(loss)
    updates = lasagne.updates.adagrad(aggregated_loss,params)
    train_fn = theano.function([input_var, output_var], loss, updates=updates)

    test_prediction = lasagne.layers.get_output(net, deterministic=True)
    predict_fn = theano.function([input_var], test_prediction)

    print "starting training..."
    for epoch in range(num_epochs):
        selected = list(set(np.random.random_integers(0,59999,chunk_len*4)))[:chunk_len]
        X_train_sub = X_train[selected,:]
        _loss = train_fn(X_train_sub, X_train_sub)
        # Report the mean per-pixel squared error of this chunk.
        print("Epoch %d: Loss %g" % (epoch + 1, np.mean(_loss)))
        """
        chunk = X_train[0:chunk_len,:,:,:]
        result = predict_fn(chunk)
        vis1 = np.hstack([chunk[j,0,:,:] for j in range(visamount)])
        vis2 = np.hstack([result[j,0,:,:] for j in range(visamount)])
        plt.imshow(np.vstack([vis1,vis2]))
        plt.show()
        """
    print "done."

    chunk = X_train[0:chunk_len,:,:,:]
    result = predict_fn(chunk)
    print "chunk.shape",chunk.shape
    print "result.shape",result.shape
    plot_filters(conv1)
    for i in range(chunk_len/visamount):
        vis1 = np.hstack([chunk[i*visamount+j,0,:,:] for j in range(visamount)])
        vis2 = np.hstack([result[i*visamount+j,0,:,:] for j in range(visamount)])
        plt.imshow(np.vstack([vis1,vis2]))
        plt.show()
    import ipdb; ipdb.set_trace()

if __name__ == "__main__":
    main()

Any ideas on how to improve this network to get a reasonably functioning autoencoder?

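Problem solved!

It took a quite different implementation: leaky rectifiers instead of sigmoid functions in the convolutional layers, only 2 (!!) nodes in the bottleneck layer, and convolutions with 1x1 kernels at the end.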
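
One plausible reason the change of nonlinearity matters (an assumption; the post itself does not give one) is gradient size. The sigmoid's slope never exceeds 0.25 and shrinks quickly away from zero, while a leaky rectifier with leakiness 0.1, the value passed to `LeakyRectify(.1)` in the code, has slope 1 for positive inputs and 0.1 for negative ones. A minimal sketch in plain Python, with helper names invented for illustration:

```python
import math

def sigmoid_slope(x):
    # Derivative of the logistic sigmoid: s(x) * (1 - s(x)).
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def leaky_rectify_slope(x, leakiness=0.1):
    # Derivative of max(leakiness * x, x) away from x = 0.
    return 1.0 if x > 0 else leakiness

for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print("x=%5.1f  sigmoid slope=%.4f  leaky slope=%.1f"
          % (x, sigmoid_slope(x), leaky_rectify_slope(x)))
```

Since backpropagation multiplies these slopes layer by layer, a stack of sigmoid convolutions can pass back much smaller gradients than the same stack with leaky rectifiers.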

Here are some reconstruction results:

[Image: reconstruction results]

Code:

import theano.tensor as T
import theano
import sys
sys.path.insert(0,'./Lasagne') # local checkout of Lasagne
import lasagne
from theano import pp
from theano import function
import theano.tensor.nnet
import gzip
import numpy as np
from sklearn.preprocessing import OneHotEncoder
import matplotlib.pyplot as plt
def load_mnist():

    def load_mnist_images(filename):
        with gzip.open(filename, 'rb') as f:
            data = np.frombuffer(f.read(), np.uint8, offset=16)
        # The inputs are vectors now, we reshape them to monochrome 2D images,
        # following the shape convention: (examples, channels, rows, columns)
        data = data.reshape(-1, 1, 28, 28)
        # The inputs come as bytes, we convert them to float32 in range [0,1].
        # (Actually to range [0, 255/256], for compatibility to the version
        # provided at http://deeplearning.net/data/mnist/mnist.pkl.gz.)
        return data / np.float32(256)

    def load_mnist_labels(filename):
        # Read the labels in Yann LeCun's binary format.
        with gzip.open(filename, 'rb') as f:
            data = np.frombuffer(f.read(), np.uint8, offset=8)
        # The labels are vectors of integers now, that's exactly what we want.
        return data

    X_train = load_mnist_images('train-images-idx3-ubyte.gz')
    y_train = load_mnist_labels('train-labels-idx1-ubyte.gz')
    X_test = load_mnist_images('t10k-images-idx3-ubyte.gz')
    y_test = load_mnist_labels('t10k-labels-idx1-ubyte.gz')
    return X_train, y_train, X_test, y_test

def main():

    X_train, y_train, X_test, y_test = load_mnist()
    ohe = OneHotEncoder()

    y_train = ohe.fit_transform(np.expand_dims(y_train,1)).toarray()
    chunk_len = 100
    num_epochs = 10000
    num_filters=7
    input_var = T.tensor4('X')
    output_var = T.tensor4('X')
    #conv_nonlinearity = lasagne.nonlinearities.sigmoid
    #conv_nonlinearity = lasagne.nonlinearities.rectify
    conv_nonlinearity = lasagne.nonlinearities.LeakyRectify(.1)
    softplus = theano.tensor.nnet.softplus
    #conv_nonlinearity = theano.tensor.nnet.softplus
    net = lasagne.layers.InputLayer((chunk_len,1,28,28), input_var)
    conv1 = net = lasagne.layers.Conv2DLayer(net,num_filters,(7,7),nonlinearity=conv_nonlinearity,untie_biases=True)
    net = lasagne.layers.MaxPool2DLayer(net,(3,3),stride=(2,2))
    net = lasagne.layers.DenseLayer(net,2,nonlinearity=lasagne.nonlinearities.sigmoid)
    net = lasagne.layers.DenseLayer(net,49,nonlinearity=lasagne.nonlinearities.sigmoid)
    net = lasagne.layers.ReshapeLayer(net,(chunk_len,1,7,7))
    net = lasagne.layers.Conv2DLayer(net,num_filters,(3,3),nonlinearity=conv_nonlinearity,untie_biases=True)
    net = lasagne.layers.MaxPool2DLayer(net,(3,3),stride=(1,1))
    net = lasagne.layers.Upscale2DLayer(net,4)
    net = lasagne.layers.Conv2DLayer(net,num_filters,(3,3),nonlinearity=conv_nonlinearity,untie_biases=True)
    net = lasagne.layers.MaxPool2DLayer(net,(3,3),stride=(1,1))
    net = lasagne.layers.Upscale2DLayer(net,4)
    net = lasagne.layers.Conv2DLayer(net,num_filters,(5,5),nonlinearity=conv_nonlinearity,untie_biases=True)
    net = lasagne.layers.Conv2DLayer(net,num_filters,(1,1),nonlinearity=conv_nonlinearity,untie_biases=True)
    net = lasagne.layers.FeaturePoolLayer(net,num_filters,pool_function=theano.tensor.max)
    net = lasagne.layers.Conv2DLayer(net,1,(1,1),nonlinearity=conv_nonlinearity,untie_biases=True)
    print "output shape:",net.output_shape
    params = lasagne.layers.get_all_params(net, trainable=True)
    prediction = lasagne.layers.get_output(net)
    loss = lasagne.objectives.squared_error(prediction, output_var)
    #loss = lasagne.objectives.binary_hinge_loss(prediction, output_var)
    aggregated_loss = lasagne.objectives.aggregate(loss)
    #updates = lasagne.updates.adagrad(aggregated_loss,params)
    updates = lasagne.updates.nesterov_momentum(aggregated_loss,params,0.5)#.005
    train_fn = theano.function([input_var, output_var], loss, updates=updates)

    test_prediction = lasagne.layers.get_output(net, deterministic=True)
    predict_fn = theano.function([input_var], test_prediction)

    print "starting training..."
    for epoch in range(num_epochs):
        selected = list(set(np.random.random_integers(0,59999,chunk_len*4)))[:chunk_len]
        X_train_sub = X_train[selected,:]
        _loss = train_fn(X_train_sub, X_train_sub)
        # Report the mean per-pixel squared error of this chunk.
        print("Epoch %d: Loss %g" % (epoch + 1, np.mean(_loss)))
    print "done."

    chunk = X_train[0:chunk_len,:,:,:]
    result = predict_fn(chunk)
    print "chunk.shape",chunk.shape
    print "result.shape",result.shape
    visamount = 10
    for i in range(10):
        vis1 = np.hstack([chunk[i*visamount+j,0,:,:] for j in range(visamount)])
        vis2 = np.hstack([result[i*visamount+j,0,:,:] for j in range(visamount)])
        plt.imshow(np.vstack([vis1,vis2]))
        plt.show()

    import ipdb; ipdb.set_trace()
if __name__ == "__main__":
    main()

Answers:



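You may get more insight by visualizing the weights rather than just the reconstructions. I had a similar problem when my biases were misconfigured. Everything below is based on my experience writing my own learning library. You can see the code on GitHub at http://github.com/josephcatrambone/aij.

Here is a screenshot from my program when there is no bias. It is after only ten epochs, since I am in a hurry to finish this write-up:

[Image: weights only, no bias.]

The weight update is done by the following operation: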

weights.add_i(positiveProduct.subtract(negativeProduct).elementMultiply(learningRate / (float) batchSize));
//visibleBias.add_i(batch.subtract(negativeVisibleProbabilities).meanRow().elementMultiply(learningRate));
//hiddenBias.add_i(positiveHiddenProbabilities.subtract(negativeHiddenProbabilities).meanRow().elementMultiply(learningRate));

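If I uncomment the visible bias code, I get the following result:

[Image: correct visible bias.]

If I get the sign of the visible bias update wrong (subtracting instead of adding):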

visibleBias.subtract_i(batch.subtract(negativeVisibleProbabilities).meanRow().elementMultiply(learningRate));

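I get this picture:

[Image: reversed bias sign.]

This snowballs and eventually ends up as what you have above. Check the sign of your error function.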
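
Why a flipped sign snowballs can be seen in a toy setting. The sketch below is not the RBM code from this answer; it is a one-parameter stand-in, assuming a single bias `b` that should move toward a fixed target. The function name `train_bias` and all constants are made up for illustration.

```python
def train_bias(sign, steps=20, learning_rate=0.1, target=0.5):
    # `sign` is +1 for the correct update and -1 for the reversed one.
    b = 0.0
    for _ in range(steps):
        error = target - b  # plays the role of (data - reconstruction)
        b += sign * learning_rate * error
    return abs(target - b)

print("remaining error, correct sign:  %.4f" % train_bias(+1))
print("remaining error, reversed sign: %.4f" % train_bias(-1))
```

With the correct sign the error shrinks by a factor of 0.9 per step; with the sign reversed it grows by a factor of 1.1 per step, so the bias drifts steadily away from the data.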

Licensed under cc by-sa 3.0 with attribution required.