SQN with Lunar landing challenge

CreateAMind
Published 2019-04-28 14:55:45
This article is included in the column: CreateAMind

Our team combined the SAC algorithm with the DQN method to get a useful way of solving control problems in discrete action spaces. Here is a short demo, created with a much simpler version of SQN.

[Video content]

To understand the SQN algorithm, we first need to introduce some key concepts in reinforcement learning, so let's start with the policy gradient. We don't want to go deep into the math and make you dizzy. You can think of the policy gradient as a method that makes our lander's policy better over time. It is just like us: we learn from experience, and we change our approach to life as time goes on. Recall the first time you cooked. You made a mess of your kitchen, but when you were done, you noticed that you shouldn't have put the eggs in too early. This is a so-called negative reward. The same goes for our small lander. When it crashes because of some actions it took, it gets a bad reward, so it changes its policy to avoid taking those actions in similar situations (so-called states).

But how can we change the policy? The answer is simple: we represent the policy with a function. This function takes the state as input and outputs the action you should take, or more precisely a probability for each action. We can tweak the parameters of this function to make the probability of the bad action lower. How do we tweak them, and by how much? In a word, we change the parameters along the gradient, with a strength proportional to the reward. This is easy to understand: the worse an action is, the more we should prevent it from appearing in the future. We will discuss the gradient in a future article. For now, you can think of it as an oracle that tells you in which direction to change your parameters.
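To make the idea concrete, here is a minimal sketch of a single policy gradient step. It is not the SQN implementation and not the code behind the demo. It assumes a softmax policy over four discrete actions for one fixed state, and the names `softmax` and `policy_gradient_step`, the learning rate, and the reward value are all hypothetical choices made for illustration.

```python
import math


def softmax(logits):
    """Turn raw action preferences (the policy parameters) into probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_step(logits, action, reward, lr=0.1):
    """One REINFORCE-style update for a single state (illustrative only).

    For a softmax policy, the gradient of log pi(action) with respect to
    logit k is (1 if k == action else 0) - pi(k). We move along that
    gradient, scaled by the reward.
    """
    probs = softmax(logits)
    return [
        logit + lr * reward * ((1.0 if k == action else 0.0) - p)
        for k, (logit, p) in enumerate(zip(logits, probs))
    ]


# Assumed setup: four actions, all equally likely at the start.
logits = [0.0, 0.0, 0.0, 0.0]
before = softmax(logits)

# Suppose action 2 led to a crash, so it receives a made-up reward of -1.0.
logits = policy_gradient_step(logits, action=2, reward=-1.0)
after = softmax(logits)

print("before:", before)
print("after: ", after)
```

After the update, the probability of the punished action drops and the other actions become slightly more likely. A positive reward would push the probability of the chosen action up instead, and a larger reward in either direction produces a larger change.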

This is only the beginning of an interesting journey. In the future, we will publish a series of articles introducing reinforcement learning from scratch.

Originally published: 2019-01-13. In case of infringement, please contact cloudcommunity@tencent.com for removal.

This article is shared from the CreateAMind WeChat official account.
