Teaching an agent how to walk with the SAC algorithm

Trained with https://github.com/rail-berkeley/softlearning for about ten hours on 24 cores.

[Video: the trained agent walking]

Pretty cool, right? Today we will discuss some fundamental ideas in reinforcement learning; one day you too can make an agent like this walk.

First of all, let's go back to the cooking example. When you cook, you take many actions: adding water, adding eggs, and so on. Every action you take is based on two things: the state and the policy. The state is what your kitchen looks like and what your dish looks like; the policy tells you what to do under those circumstances. Of course, your actions earn you something, maybe a sweet cookie or a burnt lump of rubbish: that is the so-called reward. Let's make it more formal. The picture below illustrates the dynamics of state, policy, action, and reward.
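To make that loop concrete, here is a minimal sketch of the state-action-reward cycle using the Gym API. The Walker2d environment and the random stand-in policy are assumptions for illustration only; the demo above uses a trained SAC policy from softlearning, and Walker2d requires a MuJoCo installation.

```python
# A minimal sketch of the agent-environment loop (illustrative, not the
# actual setup from the video): the agent observes a state, the policy
# picks an action, and the environment returns a reward and a new state.
import gymnasium as gym

env = gym.make("Walker2d-v4")  # assumed environment; needs MuJoCo
state, _ = env.reset(seed=0)

total_reward = 0.0
done = False
while not done:
    # The policy maps the current state to an action. Here a random
    # policy stands in for the learned SAC policy.
    action = env.action_space.sample()
    state, reward, terminated, truncated, _ = env.step(action)
    total_reward += reward
    done = terminated or truncated

print(f"Episode return: {total_reward:.2f}")
```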

Just as you want to make as many cookies as you can, the goal of the agent is to maximize its cumulative reward, called the return. Reinforcement learning methods are ways the agent can learn behaviors to achieve this goal [1].
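To make "return" concrete, here is a minimal sketch of the discounted return, G_t = r_t + gamma * G_{t+1}; the reward list and the discount factor gamma = 0.99 are illustrative assumptions, not values from the demo above.

```python
# Discounted return: sum of gamma^t * r_t over the episode.
def discounted_return(rewards, gamma=0.99):
    g = 0.0
    # Accumulate from the last step backwards: G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g

print(discounted_return([1.0, 1.0, 1.0]))  # 1 + 0.99 + 0.99**2 = 2.9701
```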

Next time we will dive into these key concepts with precise definitions. Of course, math is inevitable, but we will make it easy to understand; just like cooking, you get used to it with enough practice.

  • Policies
  • Trajectories
  • Reward and Return
  • Value functions

One more thing: the SQN algorithm in yesterday's article was an old version. The following is the latest version; please read this one instead.

[1] https://spinningup.openai.com/en/latest/spinningup/rl_intro.html

Originally published on the WeChat public account CreateAMind (createamind).

Original publication date: 2019-01-14
