Our team combined the SAC algorithm with the DQN method, which gives a useful way to solve control problems in discrete action spaces. Here is a little demo, built with a much simpler version of SQN.
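The core idea behind this SAC-plus-DQN combination is to replace DQN's hard `max` over next-state Q-values with a "soft" maximum, which keeps some exploration. As a rough sketch (the function names, `alpha`, and `gamma` values here are illustrative, not the demo's actual code):

```python
import numpy as np

def soft_value(q_values, alpha=0.2):
    """Soft state value: alpha * logsumexp(Q(s, .) / alpha).

    As alpha -> 0 this approaches max(Q), recovering the plain DQN
    target; a larger alpha keeps the policy closer to uniform.
    """
    return alpha * np.log(np.sum(np.exp(q_values / alpha)))

def soft_bellman_target(reward, next_q_values, done, gamma=0.99, alpha=0.2):
    """One-step soft Bellman backup used as the Q-network's training target."""
    return reward + gamma * (1.0 - done) * soft_value(next_q_values, alpha)

# Example: two discrete actions at the next state.
target = soft_bellman_target(reward=1.0, next_q_values=np.array([3.0, 0.0]), done=0.0)
```

Note that `soft_value` is always at least as large as the best Q-value; the gap shrinks as `alpha` goes to zero.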
To understand the SQN algorithm, we first need to introduce some key concepts in reinforcement learning, starting with the policy gradient. We don't want to go deep into the math and make you dizzy; for now, you can think of the policy gradient as a method that lets our landing ship come up with a better policy over time. Just like us, it learns from experience, and we change our approach to life as time goes on. Recall the first time you cooked: you made a mess of your kitchen, but afterwards you noticed that you shouldn't have added the eggs so early. That is a so-called negative reward. The same goes for our small landing boat. When it crashes because of some actions it took, it gets a bad reward, so it changes its policy to avoid taking those actions in similar situations (the so-called state).

But how? How can we change the policy? The answer is simple: we represent the policy with some function. This function takes the state as input and outputs the action you should take, and we can tweak the parameters of this function to make the probability of that bad action lower. How do we tweak them, and by how much? In a word, we change the parameters in the direction of the gradient, scaled by the reward. That is easy to understand: the worse an action is, the more we should prevent it from appearing in the future. As for the gradient, we will discuss it later; for now, you can imagine it as an oracle that tells you in which direction to change your parameters.
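The "tweak the parameters in the direction of the gradient, scaled by the reward" idea above can be sketched in a few lines. This is a minimal REINFORCE-style update for a softmax policy with made-up toy dimensions (2 actions, 3 state features); everything here is illustrative, not the landing demo's actual code:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def policy(state, theta):
    """Probability of each action given the state (linear logits)."""
    return softmax(theta @ state)

def policy_gradient_step(theta, state, action, reward, lr=0.1):
    """One update: theta += lr * reward * grad_theta log pi(action | state).

    For a softmax policy the gradient of log pi(a|s) is
    (one_hot(a) - pi(.|s)) outer state, so a negative reward pushes the
    chosen action's probability down, a positive one pushes it up.
    """
    probs = policy(state, theta)
    one_hot = np.zeros_like(probs)
    one_hot[action] = 1.0
    grad_log_pi = np.outer(one_hot - probs, state)
    return theta + lr * reward * grad_log_pi

theta = np.zeros((2, 3))               # start with a uniform policy
state = np.array([1.0, 0.5, -0.5])

p_before = policy(state, theta)[0]
theta = policy_gradient_step(theta, state, action=0, reward=-1.0)  # a "crash"
p_after = policy(state, theta)[0]
```

After the negative reward, the probability of action 0 in this state goes down, which is exactly the behavior change described above.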
This is only the beginning of an interesting journey. In the future we will publish a series of articles introducing reinforcement learning from scratch.