Two Papers on Mutual-Information-Based Exploration in Reinforcement Learning

CreateAMind

Published 2018-12-17 17:20:42

Included in the column: CreateAMind

EMI: Exploration with Mutual Information Maximizing State and Action Embeddings

Hyoungseok Kim, Jaekyeom Kim, Yeonwoo Jeong, Sergey Levine, Hyun Oh Song

(Submitted on 2 Oct 2018 (v1), last revised 4 Oct 2018 (this version, v2))

Policy optimization struggles when the reward signal is very sparse: it essentially becomes random search until the agent accidentally stumbles upon a rewarding state or the goal state. Recent works use intrinsic motivation to guide exploration via generative models, predictive forward models, or more ad hoc measures of surprise. We propose EMI, an exploration method that constructs embedding representations of states and actions. It does not rely on generative decoding of the full observation; instead it extracts predictive signals that guide exploration based on forward prediction in the representation space. Our experiments show state-of-the-art performance on challenging locomotion tasks with continuous control and on image-based exploration tasks with discrete actions on Atari.

https://arxiv.org/abs/1810.05533
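The EMI abstract describes exploration guided by forward prediction in an embedding space. A minimal sketch of that idea follows, under assumptions the abstract does not state: the forward model is taken to be additive (next-state embedding ≈ state embedding + action embedding), the intrinsic reward is the squared prediction error, and the embedding vectors are hand-picked toy values rather than learned ones. The function name `intrinsic_reward` is hypothetical.

```python
def intrinsic_reward(phi_s, psi_a, phi_next):
    """Squared forward-prediction error in embedding space for one transition.

    phi_s    : embedding of the current state (list of floats)
    psi_a    : embedding of the action taken (list of floats)
    phi_next : embedding of the observed next state (list of floats)
    """
    # Assumed additive forward model: predicted next embedding = phi_s + psi_a.
    predicted_next = [s + a for s, a in zip(phi_s, psi_a)]
    return sum((n - p) ** 2 for n, p in zip(phi_next, predicted_next))


# Toy transitions: (state embedding, action embedding, next-state embedding).
transitions = [
    ([0.0, 0.0], [1.0, 0.0], [1.0, 0.0]),  # perfectly predicted
    ([1.0, 1.0], [0.0, 1.0], [1.0, 3.0]),  # off by 1 in one dimension
    ([2.0, 0.0], [0.0, 0.0], [2.0, 2.0]),  # off by 2 in one dimension
]

rewards = [intrinsic_reward(*t) for t in transitions]
print(rewards)  # [0.0, 1.0, 4.0]
```

Transitions that the forward model predicts poorly receive a larger bonus, so an agent optimizing this signal is pushed toward parts of the environment it cannot yet predict.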

Empowerment-driven Exploration using Mutual Information Estimation

Navneet Madhu Kumar

(Submitted on 11 Oct 2018)

Exploration is a difficult challenge in reinforcement learning and is of prime importance in sparse-reward environments. However, many state-of-the-art deep reinforcement learning algorithms that rely on epsilon-greedy exploration fail in these environments. In such cases, empowerment can serve as an intrinsic reward signal that lets the agent maximize the influence it has over the near future. We formulate empowerment as the channel capacity between states and actions; it is calculated by estimating the mutual information between the actions and the following states. The mutual information is estimated using the Mutual Information Neural Estimator (MINE) and a forward dynamics model. We demonstrate that an empowerment-driven agent significantly improves on the score of a baseline DQN agent on the game Montezuma's Revenge.
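MINE estimates mutual information by maximizing the Donsker-Varadhan lower bound, I(A; S') ≥ E_joint[T(a, s')] − log E_marginals[exp(T(a, s'))], over a critic T. The sketch below only evaluates that bound for a fixed critic; it does not train anything and leaves out the forward dynamics model. The toy environment (two equally likely actions, next state equal to the action) and the hand-written `toy_critic` standing in for the trained network are assumptions made for illustration.

```python
import math


def dv_lower_bound(critic, joint_pairs, marginal_pairs):
    """Donsker-Varadhan bound: mean of T on joint samples minus
    log of the mean of exp(T) on samples from the product of marginals."""
    joint_term = sum(critic(a, s) for a, s in joint_pairs) / len(joint_pairs)
    marginal_term = sum(
        math.exp(critic(a, s)) for a, s in marginal_pairs
    ) / len(marginal_pairs)
    return joint_term - math.log(marginal_term)


def toy_critic(action, next_state):
    # Hypothetical stand-in for a trained critic: approximates the log density
    # ratio log p(a, s') / (p(a) p(s')) of the toy environment.
    return math.log(2.0) if action == next_state else -30.0


# Joint samples: the next state always equals the action.
joint_pairs = [(0, 0), (1, 1)]
# Product of marginals: every action paired with every next state.
marginal_pairs = [(a, s) for a in (0, 1) for s in (0, 1)]

mi_estimate = dv_lower_bound(toy_critic, joint_pairs, marginal_pairs)
print(mi_estimate, math.log(2.0))
```

Because the next state is fully determined by a uniformly chosen binary action, the true mutual information is log 2 nats, and the bound evaluated with this near-optimal critic comes out essentially equal to it. Note that empowerment is the maximum of this quantity over action distributions; the sketch evaluates it for one fixed distribution only.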

code：

Originally published: 2018-11-20

Shared from the CreateAMind WeChat official account.
