
How can I view Adam's adaptive learning rate in PyTorch?

Stack Overflow user
Asked on 2020-05-13 19:14:08
2 answers · 1.5K views · 1 follower · 4 votes

There are many different optimizers with adaptive learning rate methods. Is it possible to see the adapted values that Adam derives from the initial learning rate?

Here is a similar question about Adadelta, where the answer was to look up the ["acc_delta"] key in the optimizer state, but Adam does not have that key.
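For reference, a minimal sketch of what Adam actually stores per parameter (with default settings the state keys in the current PyTorch implementation are step, exp_avg and exp_avg_sq; there is no ready-made "adapted learning rate" entry):

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(1, 1)
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# Run a single step so the optimizer state gets populated.
loss = model(torch.randn(8, 1)).pow(2).mean()
loss.backward()
optimizer.step()

# Adam keeps 'step', 'exp_avg' (first moment) and 'exp_avg_sq' (second moment)
# per parameter; the adapted learning rate itself is never stored.
for p in optimizer.param_groups[0]['params']:
    print(list(optimizer.state[p].keys()))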


2 Answers

Stack Overflow user

Posted on 2020-06-20 22:05:14

AFAIK there is no very simple way to do this. However, you can recompute the current learning rate for a given parameter by following the Adam implementation in PyTorch: https://pytorch.org/docs/stable/_modules/torch/optim/adam.html

I came up with this minimal working example:

import torch
import torch.nn as nn
import torch.optim as optim

def get_current_lr(optimizer, group_idx, parameter_idx):
    # Adam has a different learning rate for each parameter, so we need to pick
    # the group and the parameter first.
    group = optimizer.param_groups[group_idx]
    p = group['params'][parameter_idx]

    beta1, _ = group['betas']
    state = optimizer.state[p]

    bias_correction1 = 1 - beta1 ** state['step']
    current_lr = group['lr'] / bias_correction1
    return current_lr

x = torch.randn(100, 1)  # just create a random tensor as input
model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)
niter = 20
for _ in range(0, niter):
    out = model(x)

    optimizer.zero_grad()
    loss = criterion(out, x) #Here we learn the identity mapping
    loss.backward()
    optimizer.step()
    group_idx, param_idx = 0, 0
    current_lr = get_current_lr(optimizer, group_idx, param_idx)
    print('Current learning rate (g:%d, p:%d): %.4f | Loss: %.4f'%(group_idx, param_idx, current_lr, loss.item()))

It should output something like this:

Current learning rate (g:0, p:0): 0.0100 | Loss: 0.5181
Current learning rate (g:0, p:0): 0.0053 | Loss: 0.5161
Current learning rate (g:0, p:0): 0.0037 | Loss: 0.5141
Current learning rate (g:0, p:0): 0.0029 | Loss: 0.5121
Current learning rate (g:0, p:0): 0.0024 | Loss: 0.5102
Current learning rate (g:0, p:0): 0.0021 | Loss: 0.5082
Current learning rate (g:0, p:0): 0.0019 | Loss: 0.5062
Current learning rate (g:0, p:0): 0.0018 | Loss: 0.5042
Current learning rate (g:0, p:0): 0.0016 | Loss: 0.5023
Current learning rate (g:0, p:0): 0.0015 | Loss: 0.5003
Current learning rate (g:0, p:0): 0.0015 | Loss: 0.4984
Current learning rate (g:0, p:0): 0.0014 | Loss: 0.4964
Current learning rate (g:0, p:0): 0.0013 | Loss: 0.4945
Current learning rate (g:0, p:0): 0.0013 | Loss: 0.4925
Current learning rate (g:0, p:0): 0.0013 | Loss: 0.4906
Current learning rate (g:0, p:0): 0.0012 | Loss: 0.4887
Current learning rate (g:0, p:0): 0.0012 | Loss: 0.4868
Current learning rate (g:0, p:0): 0.0012 | Loss: 0.4848
Current learning rate (g:0, p:0): 0.0012 | Loss: 0.4829
Current learning rate (g:0, p:0): 0.0011 | Loss: 0.4810

Note that monitoring the learning rate of every single parameter is probably neither feasible nor helpful for larger models.
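If you still want a rough overview for a larger model, one option is to aggregate the values instead of printing each one. A minimal sketch, assuming the get_current_lr helper above and that optimizer.step() has already been called at least once (the log_lr_summary name is just for illustration):

def log_lr_summary(optimizer):
    # Print the range of bias-corrected learning rates per parameter group
    # instead of one line per parameter.
    for g_idx, group in enumerate(optimizer.param_groups):
        lrs = [get_current_lr(optimizer, g_idx, p_idx)
               for p_idx in range(len(group['params']))]
        print('group %d: min lr %.6f, max lr %.6f' % (g_idx, min(lrs), max(lrs)))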

2 votes

Stack Overflow user

Posted on 2021-11-30 14:27:22

To build on the answer by Elias E., I would take into account not only the bias-correction part of the effective learning rate, but also the per-parameter normalization that depends on the second moment, so the updated version of the original function would be:

def get_current_lr(optimizer, group_idx, parameter_idx):
    # Adam has a different effective learning rate for each parameter element,
    # so we need to pick the group and the parameter first.
    group = optimizer.param_groups[group_idx]
    p = group['params'][parameter_idx]

    beta1, beta2 = group['betas']
    state = optimizer.state[p]

    bias_correction1 = 1 - beta1 ** state['step']
    bias_correction2 = 1 - beta2 ** state['step']

    # Same denominator as PyTorch's Adam update: sqrt(v_hat) + eps
    denom = (state['exp_avg_sq'] / bias_correction2).sqrt() + group['eps']
    current_lr = group['lr'] / bias_correction1 / denom
    return current_lr

That is, we additionally divide by the square root of the bias-corrected exponential moving average of the squared gradients.
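One practical difference from the first version: this function now returns a tensor with one effective learning rate per element of the chosen parameter, not a single scalar, so for logging you would usually reduce it first. A minimal sketch, assuming the updated get_current_lr above:

# Summarize the per-element learning rates of the first parameter in group 0.
current_lr = get_current_lr(optimizer, 0, 0)
print('effective lr -- mean: %.6f, min: %.6f, max: %.6f' % (
    current_lr.mean().item(), current_lr.min().item(), current_lr.max().item()))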

0 votes
The original page content is provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/61773139
