Q: How to change the learning rate of an RLlib training agent dynamically

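Stack Overflow user

Asked 2019-09-01 20:20:06

2 answers, 1.1K views, 0 followers, 2 votes

I am using the Ray RLlib library to train a multi-agent trainer on a five-in-a-row (Gomoku) game. This is a zero-sum environment, so I run into degenerate agent behaviour: the first agent always wins, and does so in 5 moves. My idea is to alternate the agents' learning rates. First train the first agent while the second one stays random, with a learning rate of zero. Once the first agent has learned to win more than 90% of the games, switch roles, then repeat. However, I cannot change the learning rate after it has been set in the constructor. Is this possible?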

def gen_policy(GENV, lr=0.001):
    config = {
        "model": {
            "custom_model": 'GomokuModel',
            "custom_options": {"use_symmetry": True, "reg_loss": 0},
        },
        "custom_action_dist": Categorical,
        "lr": lr
    }
    return (None, GENV.observation_space, GENV.action_space, config)

def map_fn(agent_id):
    if agent_id=='agent_0':
        return "policy_0"
    else:
        return "policy_1"

trainer = ray.rllib.agents.a3c.A3CTrainer(env="GomokuEnv", config={
    "multiagent": {
        "policies": {
            "policy_0": gen_policy(GENV, lr=0.001),
            "policy_1": gen_policy(GENV, lr=0),
        },
        "policy_mapping_fn": map_fn,
    },
    "callbacks": {"on_episode_end": clb_episode_end},
})

while True:
    rest = trainer.train()
    # here I want to change the learning rate of my policies based on environment statistics

I tried adding the following lines inside the while True loop:

new_config = trainer.get_config()
new_config["multiagent"]["policies"]["policy_0"]=gm.gen_policy(GENV, lr = 0.00321)
new_config["multiagent"]["policies"]["policy_1"]=gm.gen_policy(GENV, lr = 0.00175)

trainer["raw_user_config"]=new_config
trainer.config = new_config

It did not help.

ray

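2 Answers

Stack Overflow user

Answered 2020-08-29 18:27:21

I stumbled upon the same question and did some research into the RLlib implementation.

Judging from the test scripts, lr_schedule is given as a list of intervals like this: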

lr_schedule: [
            [0, 0.0005],
            [20000000, 0.000000000001],
        ]

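After that, I checked the implementation details.

In ray/rllib/policy/torch_policy.py, the LearningRateSchedule class is the entry point.

When lr_schedule is defined, a PiecewiseSchedule is used.

PiecewiseSchedule in ray/rllib/utils/schedules/piecewise_schedule.py documents its endpoints argument as follows: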

endpoints (List[Tuple[int,float]]): A list of tuples
                `(t, value)` such that the output
                is an interpolation (given by the `interpolation` callable)
                between two values.
                E.g.
                t=400 and endpoints=[(0, 20.0),(500, 30.0)]
                output=20.0 + 0.8 * (30.0 - 20.0) = 28.0
                NOTE: All the values for time must be sorted in an increasing
                order.

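This means the learning rate schedule is a list of support points, each made of two values:

a timestep t (int) and the learning rate at that timestep (float).

For every timestep between two support points, the value is interpolated.

The interpolation can be specified through the interpolation parameter of PiecewiseSchedule (default: _linear_interpolation):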

interpolation (callable): A function that takes the left-value,
                the right-value and an alpha interpolation parameter
                (0.0=only left value, 1.0=only right value), which is the
                fraction of distance from left endpoint to right endpoint.
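To make the interpolation concrete, here is a minimal pure-Python sketch of what the docstrings above describe. It is not RLlib's implementation: the names piecewise_value and linear_interpolation are made up for illustration, and clamping to the first or last value outside the endpoint range is an assumption of this sketch.

```python
from bisect import bisect_right


def linear_interpolation(left, right, alpha):
    # alpha = 0.0 gives only the left value, alpha = 1.0 only the right value.
    return left + alpha * (right - left)


def piecewise_value(t, endpoints, interpolation=linear_interpolation):
    """Return the scheduled value at timestep t.

    endpoints is a list of (timestep, value) pairs sorted by timestep.
    Outside the covered range the nearest endpoint value is returned
    (an assumption of this sketch, not necessarily what RLlib does).
    """
    times = [point[0] for point in endpoints]
    if t <= times[0]:
        return endpoints[0][1]
    if t >= times[-1]:
        return endpoints[-1][1]
    # Index of the last endpoint whose timestep is <= t.
    i = bisect_right(times, t) - 1
    (t0, v0), (t1, v1) = endpoints[i], endpoints[i + 1]
    # Fraction of the distance from the left endpoint to the right one.
    alpha = (t - t0) / (t1 - t0)
    return interpolation(v0, v1, alpha)


# Docstring example: t=400 between (0, 20.0) and (500, 30.0).
print(piecewise_value(400, [(0, 20.0), (500, 30.0)]))  # 28.0
```

Applied to a learning rate schedule, the same function gives the learning rate the schedule would ask for at any timestep between two support points.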

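TL;DR

So lr_schedule describes the support points of a linear interpolation (with the default interpolation).

Furthermore, according to this GitHub issue, the best option for changing parameters during training seems to be to reinitialise the trainer: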

state = trainer.save()
trainer.stop()
#re_initialise trainer
trainer.restore(state)
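The four lines above can be wrapped in a small helper. This is only a sketch under assumptions: reinitialise_trainer and its parameters are hypothetical names, the trainer class is passed in rather than imported, and it relies only on the save(), stop() and restore() calls used in the snippet above. Whether the new learning rate survives the restore may depend on what the checkpoint contains in your RLlib version, so it is worth checking after the first call.

```python
def reinitialise_trainer(trainer, trainer_cls, env, new_config):
    """Checkpoint the running trainer, stop it, and rebuild it with new_config.

    trainer      -- the trainer currently in use
    trainer_cls  -- class used to build the replacement (e.g. the same class
                    the original trainer was created from)
    env          -- environment name or class passed to the constructor
    new_config   -- full config dict containing the changed learning rates
    """
    checkpoint_path = trainer.save()  # write a checkpoint and keep its path
    trainer.stop()                    # shut down the old trainer
    new_trainer = trainer_cls(env=env, config=new_config)
    new_trainer.restore(checkpoint_path)  # load the saved state into the new trainer
    return new_trainer
```

In the training loop from the question, this would be called only when the win-rate condition is met, since rebuilding a trainer after every train() call would be expensive.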

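0 votes

Stack Overflow user

Answered 2021-05-05 18:11:32

I found the simple example here a bit confusing, so I would like to add an explicit answer. To save other users from having to dig through the code, I opened an issue and want to add my answer here as well: https://github.com/ray-project/ray/issues/15647

This is a tested example of a learning rate that decreases linearly up to a certain point.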

lr_start = 2.5e-4
lr_end = 2.5e-5
lr_time = 50 * 1000000
config = {
    "lr": lr_start,
    "lr_schedule": [
        [0, lr_start],
        [lr_time, lr_end],
    ],
}
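The same [timestep, lr] format could also approximate the alternating training described in the question. The sketch below is an illustration only: alternating_lr_schedule is a hypothetical helper, the switch happens at fixed timesteps rather than on a 90% win rate, and it assumes that a per-policy config override accepts "lr_schedule", which should be verified against the RLlib version in use.

```python
def alternating_lr_schedule(phase_len, n_phases, lr_on, starts_on=True):
    """Build [timestep, lr] endpoints that toggle between lr_on and 0.0.

    Each phase lasts phase_len timesteps. Two endpoints per phase keep the
    value flat inside the phase, so linear interpolation only acts on the
    single timestep between consecutive phases.
    """
    if phase_len < 2:
        raise ValueError("phase_len must be at least 2 to keep timesteps increasing")
    endpoints = []
    active = starts_on
    for phase in range(n_phases):
        start = phase * phase_len
        end = (phase + 1) * phase_len - 1
        value = lr_on if active else 0.0
        endpoints.append([start, value])
        endpoints.append([end, value])
        active = not active
    return endpoints


# Hypothetical per-policy overrides: policy_0 learns first, policy_1 second.
policy_0_overrides = {"lr_schedule": alternating_lr_schedule(1000, 4, 1e-3, starts_on=True)}
policy_1_overrides = {"lr_schedule": alternating_lr_schedule(1000, 4, 1e-3, starts_on=False)}
```

Each dict would be merged into the config returned by gen_policy for the corresponding policy, so that one policy has a zero learning rate whenever the other one is learning.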

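0 votes

The original content of this page is provided by Stack Overflow. Translation support is provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link: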

https://stackoverflow.com/questions/57745954
