Empowered RL in Plain Terms

"All Else Being Equal Be Empowered":

Inspired by examples from the animal kingdom, the social sciences and games, the authors propose empowerment, a rather universal utility function defined as the information-theoretic capacity of an agent’s actuation channel.

Organisms can be seen as maintaining “essential variables” such as body temperature, sugar levels and pH levels. Homeostasis provides organisms with a local gradient that tells them which actions to take or which states to seek. The mechanism itself is universal and quite simple; the choice of variables and the methods of regulation, however, are not. They are evolved and are specific to different phyla.

The unifying theme of these and many other examples is the striving towards situations where in the long term one could do many different things if one wanted to, where one has more control or influence over the world. Predators with better sensors and actuators can hunt better. Having high status in a group of chimpanzees allows one more mating choices. Having a lot of money enables one to engage in more activities. One can choose from an array of options. However, if one doesn’t know what to do, a good rule of thumb is to choose actions leading to higher status, more power, money and control. We will now apply this idea to “embodied” agents.

Empowerment can be seen as the agent’s potential to change the world, that is, how much the agent could do in principle. This is in general different from the actual change the agent inflicts.

Briefly, empowerment is defined as the capacity of the actuation channel of the agent.

The Communication Problem:

There is a sender and a receiver. The sender transmits a signal, denoted by a random variable X, to the receiver, who receives a potentially different signal, denoted by a random variable Y. The communication channel between the sender and the receiver defines how transmitted signals correspond to received signals. In the case of discrete signals the channel can be described by a conditional probability distribution p(y|x).
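
As a minimal sketch of this setup, a discrete channel can be stored as a matrix whose row x holds the distribution p(y|x). The binary symmetric channel with a 0.1 flip probability and the helper names `output_distribution` and `transmit` are assumptions chosen for illustration; they do not come from the paper.

```python
import numpy as np

# Hypothetical example channel: each transmitted bit is flipped with
# probability 0.1. Row x holds the conditional distribution p(y | x).
channel = np.array([
    [0.9, 0.1],
    [0.1, 0.9],
])

def output_distribution(p_x, channel):
    """Return p(y) = sum_x p(x) * p(y | x)."""
    p_x = np.asarray(p_x, dtype=float)
    return p_x @ channel

def transmit(x, channel, rng):
    """Sample one received symbol y for the transmitted symbol x."""
    return int(rng.choice(channel.shape[1], p=channel[x]))

# Every row of a valid channel matrix is a probability distribution.
assert np.allclose(channel.sum(axis=1), 1.0)
```

Calling `output_distribution([0.5, 0.5], channel)` gives the distribution of the received signal when the sender picks both symbols equally often.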

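Given the channel p(y|x), two quantities can be defined: the mutual information between X and Y for a particular input distribution p(x), and the channel capacity.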
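
In standard information-theoretic notation the two quantities can be written as follows. This is a sketch based on the textbook definitions; the symbol C for capacity and the base-2 logarithm (so that values are in bits) are assumptions made here.

```latex
I(X;Y) = \sum_{x}\sum_{y} p(x)\,p(y \mid x)\,
         \log_2 \frac{p(y \mid x)}{\sum_{x'} p(x')\,p(y \mid x')}
\qquad
C\bigl(p(y \mid x)\bigr) = \max_{p(x)} I(X;Y)
```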

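Mutual information measures how much information the received signal Y contains about the transmitted signal X for a given input distribution p(x). Channel capacity is the maximum amount of information the received signal can contain about the transmitted signal, that is, the mutual information maximized over all input distributions p(x). Thus, mutual information is a function of p(x) and p(y|x), whereas channel capacity is a function of the channel p(y|x) only. Another important difference is that mutual information is symmetric in X and Y and is thus acausal, whereas channel capacity requires complete control over X and is thus asymmetric and causal.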
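
The difference between the two quantities can be made concrete with a short numerical sketch. Below, `mutual_information` needs both p(x) and the channel, while `channel_capacity` takes only the channel and searches over p(x) with the Blahut-Arimoto iteration. Blahut-Arimoto is a standard algorithm chosen here for illustration, and the function names and iteration settings are assumptions rather than anything prescribed by the text above.

```python
import numpy as np

def mutual_information(p_x, channel):
    """I(X;Y) in bits for input distribution p_x and channel[x, y] = p(y|x)."""
    p_x = np.asarray(p_x, dtype=float)
    joint = p_x[:, None] * channel           # p(x, y)
    p_y = joint.sum(axis=0)                  # p(y)
    indep = p_x[:, None] * p_y[None, :]      # p(x) * p(y)
    mask = joint > 0                         # skip zero-probability pairs
    return float(np.sum(joint[mask] * np.log2(joint[mask] / indep[mask])))

def channel_capacity(channel, iters=500, tol=1e-12):
    """Blahut-Arimoto iteration; returns (capacity in bits, maximizing p(x))."""
    channel = np.asarray(channel, dtype=float)
    n_x = channel.shape[0]
    p_x = np.full(n_x, 1.0 / n_x)            # start from the uniform input
    for _ in range(iters):
        p_y = p_x @ channel
        safe_p_y = np.where(p_y > 0, p_y, 1.0)
        ratio = np.where(channel > 0, channel / safe_p_y, 1.0)
        # d[x] = KL divergence between p(y|x) and p(y), in nats
        d = np.sum(channel * np.log(ratio), axis=1)
        new_p = p_x * np.exp(d)              # reweight inputs by informativeness
        new_p /= new_p.sum()
        converged = np.max(np.abs(new_p - p_x)) < tol
        p_x = new_p
        if converged:
            break
    return mutual_information(p_x, channel), p_x
```

For a channel in which two inputs have the same effect, such as `[[1, 0], [1, 0], [0, 1]]`, the capacity is 1 bit even though there are three inputs: only distinguishable outcomes count.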

Assuming that the agent is allowed to perform any actions for n time steps, what is the maximum amount of information it can “inject” into the momentary reading of its sensor after these n time steps? The more information can be made to appear in the sensor, the more control or influence the agent has over its sensor.

We need to measure the maximum amount of information the agent could “inject” or transmit into its sensor by performing a sequence of actions of length n.
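
To illustrate, here is a minimal sketch for a deterministic world, where the computation simplifies: every action sequence leads to exactly one sensor reading, so the capacity of the actuation channel is the base-2 logarithm of the number of distinct readings reachable in n steps. The 5x5 grid, the five actions and the names `ACTIONS`, `step` and `empowerment` are hypothetical choices for this example; a stochastic world would need a general capacity computation instead.

```python
import itertools
import math

# Hypothetical deterministic grid world: the sensor reads the agent's (row, col).
ACTIONS = {
    "north": (-1, 0),
    "south": (1, 0),
    "west": (0, -1),
    "east": (0, 1),
    "stay": (0, 0),
}

def step(state, action, size):
    """Move one cell; a move into the boundary leaves the agent in place."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    if 0 <= r < size and 0 <= c < size:
        return (r, c)
    return state

def empowerment(state, n, size=5):
    """n-step empowerment in bits for the deterministic grid world.

    Enumerates all action sequences of length n and counts the distinct
    final sensor readings; the capacity is log2 of that count.
    """
    reachable = set()
    for seq in itertools.product(ACTIONS, repeat=n):
        s = state
        for a in seq:
            s = step(s, a, size)
        reachable.add(s)
    return math.log2(len(reachable))
```

In this sketch the center of the grid has higher empowerment than a corner, because walls make several action sequences collapse onto the same final cell.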

In the paper titled "Empowerment-driven Exploration using Mutual Information Estimation", empowerment is used as an intrinsic reward for reinforcement learning agents.
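
The idea of an intrinsic reward can be sketched with a toy example. This is not the method of the paper mentioned above, which, as its title indicates, estimates mutual information rather than enumerating outcomes. The sketch assumes a deterministic one-dimensional corridor with moves -1, 0 and +1, and the names `corridor_empowerment`, `total_reward` and the weight `beta` are hypothetical.

```python
import math

def corridor_empowerment(pos, n, length):
    """n-step empowerment (bits) in a deterministic corridor of `length` cells.

    With moves {-1, 0, +1} that are blocked at the walls, the cells reachable
    in n steps are exactly those within distance n of the current position.
    """
    lo = max(0, pos - n)
    hi = min(length - 1, pos + n)
    return math.log2(hi - lo + 1)

def total_reward(pos, move, extrinsic, beta=0.1, n=2, length=7):
    """Extrinsic reward plus beta times the empowerment of the next state."""
    next_pos = min(length - 1, max(0, pos + move))
    return extrinsic + beta * corridor_empowerment(next_pos, n, length)
```

With zero extrinsic reward, an agent at the left wall scores higher for stepping toward the middle of the corridor than for staying put, so the bonus nudges it toward states with more options.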

This article is shared from the WeChat official account CreateAMind (createamind).

Originally published: 2018-11-27
