Beginner's Guide | Tuning Skills: Learning Rate Decay

Author: YoungTimes
Originally published 2020-05-30 on the WeChat public account 半杯茶的小酒杯; republished 2022-04-28.

Hyperparameter tuning is an essential skill for any competent algorithm engineer. This post shows how to implement different learning rate decay strategies in Keras, so that the learning rate can be adjusted dynamically while a neural network trains.

How different learning rates affect convergence (image source: cs231n)

1. Why Dynamically Adjust the Learning Rate

Gradient descent with a small learning rate (top) and a large learning rate (bottom). Source: Andrew Ng's machine learning course on Coursera

As the figure shows, a small learning rate makes gradient descent very slow, while a large learning rate causes it to overshoot the minimum and may even prevent training from converging at all.

Dynamically adjusting the learning rate during training is therefore a common strategy.

There are also scenarios, as shown below, where the iterations enter a saddle point of the error surface and can get stuck there, struggling to reach a local or global minimum.

A saddle point on the error surface. Image source: [1]

To address this, we can vary the per-iteration learning rate with a periodic function f. This lets the learning rate cycle between reasonable bounds, helping the optimization escape saddle points.

A triangular periodic function as the learning rate. Image source: [1]

A cosine periodic function as the learning rate. Image source: [1]

By changing the learning rate periodically, the optimizer can jump over "mountains" in the error surface and converge faster to a global or local optimum.

Fixed learning rate vs. cyclical learning rate. Image source: [1]
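As a concrete illustration, here is a minimal sketch of the triangular policy in Python. The base_lr, max_lr, and step_size values are arbitrary assumptions for the example, not taken from the figures above:

import numpy as np

def triangular_clr(step, base_lr=1e-4, max_lr=1e-2, step_size=2000):
    # index of the current cycle; each full cycle spans 2 * step_size steps
    cycle = np.floor(1 + step / (2.0 * step_size))
    # normalized distance from the peak of the current cycle, in [0, 1]
    x = np.abs(step / step_size - 2.0 * cycle + 1)
    # ramp linearly between base_lr (at the ends) and max_lr (at the peak)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)

The cosine variant simply replaces the linear ramp with a cosine curve between the same bounds.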

2. Learning Rate Implementations in Keras

2.1 Keras Standard Decay Schedule

Keras provides a learning rate schedule through the decay parameter of its optimizers (SGD, Adam, etc.), as shown below.

# initialize our optimizer and model, then compile it
opt = SGD(lr=1e-2, momentum=0.9, decay=1e-2 / epochs)
model = ResNet.build(32, 32, 3, 10, (9, 9, 9),
                     (64, 64, 128, 256), reg=0.0005)
model.compile(loss="categorical_crossentropy", optimizer=opt,
              metrics=["accuracy"])
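Under the hood this is simple time-based decay: the legacy Keras optimizers shrink the learning rate slightly on every batch update. A minimal sketch of the rule, where iterations counts batch updates rather than epochs:

# time-based decay as applied by Keras's legacy optimizers:
#   lr_t = lr_0 * 1 / (1 + decay * iterations)
def standard_decay_lr(lr0, decay, iterations):
    return lr0 * (1.0 / (1.0 + decay * iterations))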

The learning rate at the start of each epoch then looks like this:

| Epoch | Learning Rate |
|-------|---------------|
| 1     | 0.01000       |
| 2     | 0.00836       |
| 3     | 0.00719       |
| ...   | ...           |
| 37    | 0.00121       |
| 38    | 0.00119       |
| 39    | 0.00116       |

2.2 Built-in Learning Rate Schedulers in Keras

In addition to the standard decay, Keras also provides the following learning rate scheduler implementations:

  • ExponentialDecay
  • PiecewiseConstantDecay
  • PolynomialDecay
  • InverseTimeDecay

Let's see how to adjust the learning rate with keras.optimizers.schedules.

1) Decay the learning rate by a factor of 0.96 every 100,000 steps.

initial_learning_rate = 0.1
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate,
    decay_steps=100000,
    decay_rate=0.96,
    staircase=True)

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=lr_schedule),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(data, labels, epochs=5)
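Since staircase=True, the resulting schedule is piecewise constant: lr(step) = 0.1 * 0.96 ** (step // 100000). A quick sanity check, assuming TensorFlow 2 eager execution:

print(float(lr_schedule(0)))       # 0.1
print(float(lr_schedule(99999)))   # still 0.1
print(float(lr_schedule(100000)))  # ~ 0.096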

2) Use a learning rate of 1.0 for the first 100,000 steps, 0.5 for steps 100,001 through 110,000, and 0.1 for all steps after that.

step = tf.Variable(0, trainable=False)
boundaries = [100000, 110000]
values = [1.0, 0.5, 0.1]
learning_rate_fn = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries, values)

# Later, whenever we perform an optimization step, we pass in the step.
learning_rate = learning_rate_fn(step)
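The boundaries are inclusive on the left segment, so a quick check of the three regions looks like this (again assuming eager execution):

print(float(learning_rate_fn(100000)))  # 1.0, step <= first boundary
print(float(learning_rate_fn(100001)))  # 0.5
print(float(learning_rate_fn(110001)))  # 0.1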

3) Decay the learning rate from 0.1 to 0.01 over 10,000 steps.

...
starter_learning_rate = 0.1
end_learning_rate = 0.01
decay_steps = 10000
learning_rate_fn = tf.keras.optimizers.schedules.PolynomialDecay(
    starter_learning_rate,
    decay_steps,
    end_learning_rate,
    power=0.5)

model.compile(optimizer=tf.keras.optimizers.SGD(
                  learning_rate=learning_rate_fn),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(data, labels, epochs=5)
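With power=0.5 the schedule follows a square-root curve, lr(step) = (0.1 - 0.01) * (1 - step / 10000) ** 0.5 + 0.01, with step clipped at decay_steps. A quick check:

print(float(learning_rate_fn(0)))      # 0.1
print(float(learning_rate_fn(7500)))   # ~ 0.055
print(float(learning_rate_fn(10000)))  # 0.01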

4) Decay the learning rate as 1/t (inverse time decay).

initial_learning_rate = 0.1
decay_steps = 1.0
decay_rate = 0.5
learning_rate_fn = tf.keras.optimizers.schedules.InverseTimeDecay(
    initial_learning_rate, decay_steps, decay_rate)

model.compile(optimizer=tf.keras.optimizers.SGD(
                  learning_rate=learning_rate_fn),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(data, labels, epochs=5)
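With these parameters the schedule evaluates to lr(step) = 0.1 / (1 + 0.5 * step), i.e. a hyperbolic 1/t curve:

print(float(learning_rate_fn(0)))  # 0.1
print(float(learning_rate_fn(2)))  # 0.05
print(float(learning_rate_fn(8)))  # 0.02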

2.3 Custom Keras Learning Rate Schedules

Next, let's see how to define a custom learning rate schedule for a neural network in Keras.

To keep the code clean and follow object-oriented programming best practices, we first define a learning rate base class:

# import the necessary packages
import matplotlib.pyplot as plt
import numpy as np

class LearningRateDecay:
    def plot(self, epochs, title="Learning Rate Schedule"):
        # compute the set of learning rates for each corresponding epoch
        lrs = [self(i) for i in epochs]
        # plot the learning rate schedule
        plt.style.use("ggplot")
        plt.figure()
        plt.plot(epochs, lrs)
        plt.title(title)
        plt.xlabel("Epoch #")
        plt.ylabel("Learning Rate")

The base class implements a plot function that draws how the learning rate changes over the epochs.

2.3.1 Step-based Learning Rate Schedules

Keras step-based learning rate decay. The schedule in red uses a decay factor of 0.5 and the one in blue a factor of 0.25.

Step-based decay lowers the learning rate by a fixed factor every fixed number of epochs during training, so it can be viewed as a piecewise-constant function. As the figure above shows, the learning rate holds a constant value for several consecutive epochs, drops to a smaller value, holds again for the next stretch of epochs, and so on until training ends.

The Python implementation is as follows:

class StepDecay(LearningRateDecay):
    def __init__(self, initAlpha=0.01, factor=0.25, dropEvery=10):
        # store the base initial learning rate, drop factor, and
        # number of epochs between drops
        self.initAlpha = initAlpha
        self.factor = factor
        self.dropEvery = dropEvery

    def __call__(self, epoch):
        # compute the learning rate for the current epoch
        exp = np.floor((1 + epoch) / self.dropEvery)
        alpha = self.initAlpha * (self.factor ** exp)
        # return the learning rate
        return float(alpha)
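Because StepDecay inherits from the base class, we can immediately visualize a schedule. A small usage sketch (the parameter values here are just the class defaults):

# plot a step schedule over 100 epochs
schedule = StepDecay(initAlpha=0.01, factor=0.25, dropEvery=10)
schedule.plot(range(100))
plt.show()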

2.3.2 Linear and Polynomial Learning Rate Schedules

Linear and polynomial decay reduce the learning rate on every epoch of training, steadily driving it toward 0; linear decay is simply the polynomial schedule with power=1.

The Python implementation is as follows:

class PolynomialDecay(LearningRateDecay):
    def __init__(self, maxEpochs=100, initAlpha=0.01, power=1.0):
        # store the maximum number of epochs, base learning rate,
        # and power of the polynomial
        self.maxEpochs = maxEpochs
        self.initAlpha = initAlpha
        self.power = power

    def __call__(self, epoch):
        # compute the new learning rate based on polynomial decay
        decay = (1 - (epoch / float(self.maxEpochs))) ** self.power
        alpha = self.initAlpha * decay
        # return the new learning rate
        return float(alpha)
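A quick check confirms that power=1.0 reduces to a straight linear ramp from initAlpha down to 0:

linear = PolynomialDecay(maxEpochs=100, initAlpha=0.01, power=1.0)
print(linear(0))    # 0.01
print(linear(50))   # 0.005
print(linear(100))  # 0.0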

2.3.3 Setting the Learning Rate via a Model.fit Callback

During training, we can then select among the different learning rate strategies through argument configuration:

# the LearningRateScheduler callback applies our schedule during training
from tensorflow.keras.callbacks import LearningRateScheduler

# store the number of epochs to train for in a convenience variable,
# then initialize the list of callbacks and learning rate scheduler
# to be used
epochs = args["epochs"]
callbacks = []
schedule = None

# check to see if step-based learning rate decay should be used
if args["schedule"] == "step":
    print("[INFO] using 'step-based' learning rate decay...")
    schedule = StepDecay(initAlpha=1e-1, factor=0.25, dropEvery=15)
# check to see if linear learning rate decay should be used
elif args["schedule"] == "linear":
    print("[INFO] using 'linear' learning rate decay...")
    schedule = PolynomialDecay(maxEpochs=epochs, initAlpha=1e-1, power=1)
# check to see if a polynomial learning rate decay should be used
elif args["schedule"] == "poly":
    print("[INFO] using 'polynomial' learning rate decay...")
    schedule = PolynomialDecay(maxEpochs=epochs, initAlpha=1e-1, power=5)

# if the learning rate schedule is not empty, add it to the list of
# callbacks
if schedule is not None:
    callbacks = [LearningRateScheduler(schedule)]

With the callbacks in hand, passing them to the model's fit function is all it takes to adjust the learning rate dynamically during training.

# initialize our optimizer and model, then compile it; when a scheduler
# callback drives the learning rate, the optimizer's own decay stays at 0
decay = 0.0
opt = SGD(lr=1e-1, momentum=0.9, decay=decay)
model = ResNet.build(32, 32, 3, 10, (9, 9, 9),
                     (64, 64, 128, 256), reg=0.0005)
model.compile(loss="categorical_crossentropy", optimizer=opt,
              metrics=["accuracy"])

# train the network
H = model.fit(trainX, trainY, validation_data=(testX, testY),
              batch_size=128, epochs=epochs, callbacks=callbacks,
              verbose=1)
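Before launching a long run, it can help to see what each configured schedule will actually do. A small sketch using the plot helper from our base class:

# compare the three schedules over the full training run
StepDecay(initAlpha=1e-1, factor=0.25, dropEvery=15).plot(
    range(epochs), title="Step Decay")
PolynomialDecay(maxEpochs=epochs, initAlpha=1e-1, power=1).plot(
    range(epochs), title="Linear Decay")
PolynomialDecay(maxEpochs=epochs, initAlpha=1e-1, power=5).plot(
    range(epochs), title="Polynomial Decay (power=5)")
plt.show()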
