In most TensorFlow code I have seen, the Adam optimizer is used with a constant learning rate of 1e-4 (i.e. 0.0001). The code usually looks something like this:
...build the model...
# Add the optimizer
train_op = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
# Add the ops to initialize variables.  These will include 
# the optimizer slots added by AdamOptimizer().
init_op = tf.initialize_all_variables()
# launch the graph in a session
sess = tf.Session()
# Actually initialize the variables
sess.run(init_op)
# now train your model
for ...:
  sess.run(train_op)

I am wondering whether it is useful to apply exponential decay when using the Adam optimizer, i.e. to use the following code instead:
...build the model...
# Add the optimizer
step = tf.Variable(0, trainable=False)
rate = tf.train.exponential_decay(0.15, step, 1, 0.9999)
train_op = tf.train.AdamOptimizer(rate).minimize(cross_entropy, global_step=step)
# Add the ops to initialize variables.  These will include 
# the optimizer slots added by AdamOptimizer().
init_op = tf.initialize_all_variables()
# launch the graph in a session
sess = tf.Session()
# Actually initialize the variables
sess.run(init_op)
# now train your model
for ...:
  sess.run(train_op)

Usually people apply some kind of learning-rate decay, but for Adam it does not seem to be common. Is there a theoretical reason for this? Can it be useful to combine the Adam optimizer with decay?