
3.1 Choosing the Learning Rate

锦小年
Published 2019-05-28 18:19:47
Collected in the column: 锦小年的博客

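Copyright notice: this is an original article by the author and may not be reproduced without the author's permission. The Python version is Python 3, and all examples have been verified in practice. https://cloud.tencent.com/developer/article/1437411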

Table of contents

- [1. What is the learning rate](https://cloud.tencent.com/developer/audit/support-plan/4869456#1__1)
- [2. Exponential learning rate decay](https://cloud.tencent.com/developer/audit/support-plan/4869456#2__11)
- [3. Worked example](https://cloud.tencent.com/developer/audit/support-plan/4869456#3__41)
- [4. Summary](https://cloud.tencent.com/developer/audit/support-plan/4869456#4__283)

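1. What is the learning rate

The first step in tuning a parameter is to know what it is and how changing it affects the model.

(1) To understand what the learning rate is, you first need to understand how a neural network updates its parameters: gradient descent plus backpropagation. Reference: https://www.cnblogs.com/softzrp/p/6718909.html.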

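In one sentence: the output error is propagated backward to the network's parameters so that the network fits the sample outputs. This is essentially an optimization process that moves step by step toward the optimal solution. How much of the error each parameter update uses has to be controlled by a parameter, and that parameter is the learning rate, also called the step size. This is easiest to see from the update formula of the BP (backpropagation) algorithm.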
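The formula itself is not reproduced in this text. As an assumption about what it showed, the standard gradient descent update used together with backpropagation can be written as below, where $w$ is a network parameter, $E$ is the output error (loss), and $\eta$ is the learning rate. The symbol names are chosen here for illustration and are not taken from the original post.

```latex
w \leftarrow w - \eta \, \frac{\partial E}{\partial w}
```

Backpropagation supplies the gradient $\partial E / \partial w$; the learning rate $\eta$ only scales how much of that gradient is applied in each update.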

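(2) How the learning rate affects the model

As the update formula shows, the larger the learning rate, the more strongly the output error influences the parameters and the faster the parameters change. At the same time, the model is more strongly affected by outliers in the data and can easily diverge.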
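The effect can be illustrated with a toy problem. The sketch below is not from the original post: it assumes a one-parameter loss f(w) = w**2 and a hypothetical helper `gradient_descent`, and runs the same number of updates with three fixed learning rates.

```python
def gradient_descent(lr, w0=1.0, steps=20):
    """Minimise the toy loss f(w) = w**2 with a fixed learning rate."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w      # derivative of w**2
        w = w - lr * grad   # update scaled by the learning rate
    return w


small = gradient_descent(lr=0.01)  # each step multiplies w by 0.98: slow progress
good = gradient_descent(lr=0.4)    # each step multiplies w by 0.2: fast convergence
large = gradient_descent(lr=1.1)   # each step multiplies w by -1.2: divergence

print(abs(small), abs(good), abs(large))
```

After 20 steps the small learning rate is still far from the minimum at 0, the moderate one is essentially there, and the large one has moved further away than where it started.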

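2. Exponential learning rate decay

Section 1 showed how changes in the learning rate affect the model. It follows that the ideal learning rate is not a fixed value but one that decays as training proceeds: relatively large early in training, then steadily smaller until the model converges. Commonly used decay schemes are:

Of these three methods, exponential decay is the most widely used, and practice has shown it to be the most effective as well.

In TensorFlow its mathematical expression is: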

decayed_lr = lr0 * decay_rate^(global_steps / decay_steps)

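Parameters:

decayed_lr: the decayed learning rate, i.e. the actual learning rate used at the current training step

lr0: the initial learning rate

decay_rate: the decay rate, the factor the learning rate is multiplied by at each decay

global_steps: the current training step

decay_steps: the decay interval, i.e. how many steps pass between decays.

The corresponding TensorFlow API: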

global_step = tf.Variable(0)
lr = tf.train.exponential_decay(
     lr0,
     global_step,
     decay_steps=lr_step,
     decay_rate=lr_decay,
     staircase=True)

staircase=True means that global_steps/decay_steps is truncated to an integer, so the learning rate is updated once every decay_steps steps instead of changing continuously.
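To make the formula and the staircase behaviour concrete, here is a minimal pure-Python sketch. It mirrors the expression given above, not TensorFlow's internal implementation, and the function name `exponential_decay` is simply reused here for readability.

```python
def exponential_decay(lr0, global_step, decay_steps, decay_rate, staircase=False):
    """decayed_lr = lr0 * decay_rate ** (global_step / decay_steps)."""
    if staircase:
        exponent = global_step // decay_steps  # integer division: stepwise decay
    else:
        exponent = global_step / decay_steps   # true division: continuous decay
    return lr0 * decay_rate ** exponent


# Same hyperparameters as the example in section 3.
for step in (0, 499, 500, 1000, 16000):
    print(step, exponential_decay(1e-4, step, 500, 0.99, staircase=True))
```

With staircase=True the value stays at lr0 through step 499 and drops by a factor of 0.99 at step 500; with staircase=False it would already be slightly below lr0 at step 1.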

3. Worked example

# -*- coding: utf-8 -*-
# @Time    : 18-10-5 3:38 PM
# @Author  : gxrao
# @Site    : 
# @File    : cnn_mnist_2.py
# @Software: PyCharm



import os
# os.environ["CUDA_VISIBLE_DEVICES"]="-1"
import tensorflow as tf
import matplotlib.pylab as plt
from functools import reduce
import time

# prepare the data
import tensorflow.examples.tutorials.mnist.input_data as input_data
mnist = input_data.read_data_sets('./data/MNIST/', one_hot=True)

# create an interactive session and install it as the default session
sess = tf.InteractiveSession()

# parameters setting
batch_size = 50
max_steps = 16000
lr0 = 0.0001
regularizer_rate = 0.0001
lr_decay = 0.99
lr_step = 500
sample_size = 40000

# init weights and bias
def init_variable(w_shape, b_shape, regularizer=None):
    weights = tf.get_variable('weights',w_shape,initializer=tf.truncated_normal_initializer(stddev=0.1))
    if regularizer is not None:
        tf.add_to_collection('losses', regularizer(weights))
    biases = tf.get_variable('biases', b_shape, initializer=tf.constant_initializer(0.0))
    return weights, biases

# create conv2d
def conv2d(x,w,b,keep_prob):
    # conv
    conv_res = tf.nn.conv2d(x,w,strides=[1,1,1,1],padding='SAME') + b
    # activation function
    activation_res = tf.nn.relu(conv_res)
    # pooling
    pool_res = tf.nn.max_pool(activation_res, ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME')

    # drop out
    drop_res = tf.nn.dropout(pool_res,keep_prob)
    return drop_res

def inference(x,reuse=False,regularizer=None,dropout=1.0):
    # layer_1:
    x_img = tf.reshape(x,shape=[-1,28,28,1])
    with tf.variable_scope('cnn_layer1',reuse=reuse):
        weights, biases = init_variable([5,5,1,32],[32])
        cnn1_res = conv2d(x_img,weights,biases,1.0)

    # layer_2
    with tf.variable_scope('cnn_layer2',reuse=reuse):
        weights, biases = init_variable([3,3,32,64],[64])
        cnn2_res = conv2d(cnn1_res,weights, biases,1.0)

    # layer_3
    with tf.variable_scope('cnn_layer3',reuse=reuse):
        weights, biases = init_variable([3, 3, 64, 128], [128])
        cnn3_res = conv2d(cnn2_res, weights, biases, 1.0)
        cnn3_shape = cnn3_res.shape.as_list()[1:]
        h3_s = reduce(lambda x, y: x * y, cnn3_shape)
        cnn3_reshape = tf.reshape(cnn3_res,[-1,h3_s])

    # layer_4
    with tf.variable_scope('fcn1',reuse=reuse):
        weights, biases = init_variable([h3_s,5000],[5000],regularizer)
        fcn1_res = tf.nn.relu(tf.matmul(cnn3_reshape,weights)+biases)
        fcn1_dropout = tf.nn.dropout(fcn1_res,dropout)

    # layer_5
    with tf.variable_scope('fcn2',reuse=reuse):
        weights, biases = init_variable([5000,500],[500], regularizer)
        fcn2_res = tf.nn.relu(tf.matmul(fcn1_dropout,weights)+biases)
        fcn2_dropout = tf.nn.dropout(fcn2_res,1.0)

    # output layer
    with tf.variable_scope('out_put_layer',reuse=reuse):
        weights, biases = init_variable([500,10],[10])
        y = tf.nn.softmax(tf.matmul(fcn2_dropout,weights)+biases)
    return y

train_acc = []
validation_acc = []
train_loss = []

# create train model
def train_model():
    start = time.time()
    # add the input placeholders
    x = tf.placeholder(tf.float32, shape=[None, 784], name='x')
    y_ = tf.placeholder(tf.float32, shape=[None, 10])
    keep_prob = tf.placeholder(tf.float32)
    global_step = tf.Variable(0)
    lr = tf.train.exponential_decay(
        lr0,
        global_step,
        decay_steps=lr_step,
        decay_rate=lr_decay,
        staircase=True)

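    # build an L2 regularizer; note that it is not passed to inference() below,
    # so this script trains without the L2 penalty
    regularizer = tf.contrib.layers.l2_regularizer(regularizer_rate)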
    # define loss and select optimizer
    y = inference(x, reuse=False,regularizer=None, dropout= keep_prob)
    cross_entropy = -tf.reduce_sum(y_ * tf.log(y))
    loss = cross_entropy

    train_step = tf.train.AdamOptimizer(learning_rate=lr).minimize(
        loss,global_step=global_step)

    predict = tf.equal(tf.argmax(y_, 1), tf.argmax(y, 1))
    acc = tf.reduce_mean(tf.cast(predict, 'float'))
    # train the model
    # init all variables
    init = tf.global_variables_initializer()
    init.run()
    for i in range(max_steps):
        x_batch, y_batch = mnist.train.next_batch(batch_size)
        _, _loss = sess.run(
            [train_step, loss], feed_dict={
                x: x_batch,
                y_: y_batch,
                keep_prob: 0.8
            })
        train_loss.append(_loss)
        if (i + 1) % 200 == 0:
            print('training steps: %d' % (i + 1))
            validation_x, validation_y = mnist.validation.next_batch(500)
            _validation_acc = acc.eval(feed_dict={
                x: validation_x,
                y_: validation_y,
                keep_prob: 1.0
            })
            validation_acc.append(_validation_acc)
            print('validation accuracy is %f' % _validation_acc)

            x_batch, y_batch = mnist.train.next_batch(500)
            _train_acc = acc.eval(feed_dict={x: x_batch, y_: y_batch,keep_prob: 1.0})
            train_acc.append(_train_acc)
            print('train accuracy is %f' % _train_acc)
    stop = time.time()
    total= (stop - start)
    print('train time: %d'%int(total))


def test_model():
    x = tf.placeholder(tf.float32, shape=[None, 784], name='x')
    y_ = tf.placeholder(tf.float32, shape=[None, 10])
    keep_prob = tf.placeholder(tf.float32)
    y = inference(x, reuse=tf.AUTO_REUSE, dropout=keep_prob)
    predict = tf.equal(tf.argmax(y_, 1), tf.argmax(y, 1))
    acc = tf.reduce_mean(tf.cast(predict, 'float'))
    # evaluating on the full test set raised an error, so only the first 2000 images are used for testing
    acc_test = acc.eval(feed_dict={
        x: mnist.test.images[:2000, :],
        y_: mnist.test.labels[:2000, :],
        keep_prob:1.0
    })
    print('test accuracy is:%f' % acc_test)


def plot_acc():
    # plot the train acc and validation acc on a fresh figure
    plt.figure()
    x_axis = [i for i in range(len(train_acc))]
    plt.plot(x_axis, train_acc, 'r-', label='train accuracy')
    plt.plot(x_axis, validation_acc, 'b:', label='validation accuracy')
    plt.legend()
    plt.xlabel('train steps')
    plt.ylabel('accuracy')
    plt.savefig('accuracy.jpg')


def plot_loss():
    # start a fresh figure so the loss curve is not drawn over the accuracy plot
    plt.figure()
    x_axis = [i for i in range(len(train_loss))]
    plt.plot(x_axis, train_loss, 'r-')
    plt.xlabel('train steps')
    plt.ylabel('train loss')
    plt.savefig('train_loss.jpg')


def save_model():
    # save the model
    module_save_dir = './model/cnn_mnist_2/'
    if not os.path.exists(module_save_dir):
        os.makedirs(module_save_dir)
    saver = tf.train.Saver()
    saver.save(sess, module_save_dir + 'model.ckpt')
    sess.close()


if __name__ == "__main__":
    train_model()
    test_model()
    plot_acc()
    plot_loss()
    save_model()

This is a simple CNN classification program based on MNIST.
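One detail of the network worth checking by hand is the flattened size `h3_s` that the script computes at run time from `cnn3_res.shape`. The sketch below is an illustration, not part of the original program: it assumes TensorFlow's SAME padding rule (output size = ceil(input size / stride)) and uses a hypothetical helper `same_pool_out` to follow the 28x28 input through the three conv + pool layers.

```python
import math


def same_pool_out(size, stride=2):
    """Spatial size after pooling with the given stride and SAME padding."""
    return math.ceil(size / stride)


size = 28                  # MNIST images are 28x28
channels = [32, 64, 128]   # output channels of cnn_layer1..3

for c in channels:
    # the stride-1 SAME convolution keeps the spatial size; the 2x2 pooling halves it
    size = same_pool_out(size)
    print('channels=%d, spatial size=%dx%d' % (c, size, size))

h3_s = size * size * channels[-1]   # length of the flattened feature vector
fcn1_weights = h3_s * 5000          # weight count of the first fully connected layer
print(h3_s, fcn1_weights)
```

The spatial size goes 28, 14, 7, 4, so the flattened vector has 4 * 4 * 128 = 2048 entries, and the first fully connected layer alone holds about 10 million weights.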

# parameters setting
batch_size = 50
max_steps = 16000
lr0 = 0.0001
regularizer_rate = 0.0001
lr_decay = 0.99
lr_step = 500

These lines define the relevant hyperparameters.

global_step = tf.Variable(0)
lr = tf.train.exponential_decay(
      lr0,
      global_step,
      decay_steps=lr_step,
      decay_rate=lr_decay,
      staircase=True)

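When using an exponentially decaying learning rate, be sure to define global_step = tf.Variable(0). It is the step counter the decay is computed from; without it the learning rate is never updated.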

train_step = tf.train.AdamOptimizer(learning_rate=lr).minimize(
        loss,global_step=global_step)

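The training op must be given both the learning rate and global_step. Passing global_step to minimize() is what increments the counter after every update, so neither of the two may be left out.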
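Why the counter matters can be shown without TensorFlow. The following is a pure-Python simulation under stated assumptions: `decayed_lr` restates the staircase formula from section 2, and `final_lr` is a hypothetical loop in which the `increment_step` flag stands in for whether global_step is passed to minimize().

```python
def decayed_lr(lr0, global_step, decay_steps, decay_rate):
    """Staircase exponential decay, as in the formula from section 2."""
    return lr0 * decay_rate ** (global_step // decay_steps)


def final_lr(increment_step, max_steps=16000):
    """Return the learning rate used by the last training step."""
    global_step = 0
    lr = 1e-4
    for _ in range(max_steps):
        lr = decayed_lr(1e-4, global_step, 500, 0.99)  # rate used for this update
        if increment_step:
            global_step += 1  # stands in for minimize(loss, global_step=global_step)
    return lr


with_counter = final_lr(increment_step=True)
without_counter = final_lr(increment_step=False)
print(with_counter, without_counter)
```

If the counter is never incremented, the schedule is evaluated at step 0 forever and the learning rate stays at lr0 for the whole run.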

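4. Summary

An exponentially decaying learning rate is a fairly practical technique when tuning deep learning models. At the start of training, a learning rate of 0.01 to 0.001 is appropriate; by the end of training, the learning rate should have decayed by a factor of 100 or more. Setting the relevant parameters according to this rule of thumb helps model accuracy considerably.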
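The rule of thumb can be turned into a quick calculation. The sketch below is an illustration only: `decay_rate_for` is a hypothetical helper, and it assumes staircase decay as in section 2. It computes how much the settings from section 3 reduce the learning rate over the whole run, and which decay_rate would give a 100x reduction with the same max_steps and decay_steps.

```python
def decay_rate_for(total_reduction, max_steps, decay_steps):
    """decay_rate that shrinks the learning rate by `total_reduction` over training."""
    num_decays = max_steps // decay_steps
    return (1.0 / total_reduction) ** (1.0 / num_decays)


num_decays = 16000 // 500            # number of decays in the section 3 example
example_factor = 0.99 ** num_decays  # overall reduction with decay_rate = 0.99
needed = decay_rate_for(100, 16000, 500)

print('decays: %d' % num_decays)
print('overall factor with 0.99: %.3f' % example_factor)
print('decay_rate for a 100x reduction: %.3f' % needed)
```

Under these assumptions, decay_rate = 0.99 only brings the learning rate down to roughly 0.725 of its initial value after 32 decays, while a 100x reduction over the same schedule would need a decay_rate of roughly 0.866.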

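This article is part of the Tencent Cloud self-media sharing plan and is shared from the author's personal site/blog.
Originally published: October 17, 2018. In case of infringement, contact cloudcommunity@tencent.com for removal.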
