State Abstraction as Compression in Apprenticeship Learning

Code:

State Abstraction as Compression in Apprenticeship Learning: https://github.com/david-abel/rl_info_theory (ref: InfoBot)

State Abstractions for Lifelong Reinforcement Learning: https://david-abel.github.io/papers/lifelong_sa_icml_18.pdf

State Abstraction as Compression in Apprenticeship Learning

David Abel¹, Dilip Arumugam², Kavosh Asadi¹, Yuu Jinnai¹, Michael L. Littman¹, and Lawson L.S. Wong³

¹ Department of Computer Science, Brown University
² Department of Computer Science, Stanford University
³ College of Computer and Information Science, Northeastern University

Abstract

State abstraction can give rise to models of environments that are both compressed and useful, thereby enabling efficient sequential decision making. In this work, we offer the first formalism and analysis of the trade-off between compression and performance made in the context of state abstraction for Apprenticeship Learning. We build on Rate-Distortion theory, the classic Blahut-Arimoto algorithm, and the Information Bottleneck method to develop an algorithm for computing state abstractions that approximate the optimal trade-off between compression and performance. We illustrate the power of this algorithmic structure to offer insights into effective abstraction, compression, and reinforcement learning through a mixture of analysis, visuals, and experimentation.

1 Introduction

Reinforcement Learning (RL) poses a challenging problem. Agents must learn about their environment through high-dimensional and often noisy observations while receiving sparse and delayed evaluative feedback. The ability to understand one’s surroundings well enough to support effective decision making under these conditions is a remarkable feat, and arguably a hallmark of intelligent behavior. To this end, a long-standing goal of RL is to endow decision-making agents with the ability to acquire and exploit abstract models for use in decision making, drawing inspiration from human cognition (Tenenbaum et al. 2011).

One path toward realizing this goal is to make use of state abstraction, which describes methods for compressing the environment’s state space to distill complex problems into simpler forms (Dietterich 2000b; Andre and Russell 2002; Li, Walsh, and Littman 2006). Critically, the degree of compression induced by an abstraction trades off directly with its capacity to represent good behavior. If the abstraction throws away too much information, the resulting abstract model will fail to preserve essential characteristics of the original task (Abel, Hershkowitz, and Littman 2016). Thus, care must be taken to identify state abstractions that balance between an appropriate degree of compression and adequate representational power.

Information Theory offers foundational results about the limits of compression (Shannon 1948). The core of the theory clarifies how to communicate in the presence of noise, culminating in seminal results about the nature of communication and compression that helped establish the science and engineering practices of computation. Of particular relevance to our agenda is Rate-Distortion theory, which studies the trade-off between a code’s ability to compress (rate) and represent the original signal (distortion) (Shannon 1948; Berger 1971). Cognitive neuroscience has suggested that perception and generalization are tied to efficient compression (Attneave 1954; Sims 2016; 2018), termed the “efficient coding hypothesis” by Barlow (1961).
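To make the rate versus distortion trade-off concrete, the following is a minimal sketch of the classic Blahut-Arimoto iteration for a discrete source. It is an illustration under stated assumptions, not code from the paper or its repository: the function name `blahut_arimoto`, its arguments, and the use of a single trade-off weight `beta` are choices made here for the example.

```python
import numpy as np

def blahut_arimoto(p_x, distortion, beta, n_iters=200, tol=1e-10):
    """Approximate one point on a rate-distortion curve (illustrative sketch).

    p_x:        source distribution over symbols, shape (n,)
    distortion: matrix d(x, x_hat), shape (n, m)
    beta:       trade-off weight; larger values favor low distortion
    Returns (rate_in_bits, expected_distortion, q_xhat_given_x).
    """
    n, m = distortion.shape
    q_xhat = np.full(m, 1.0 / m)  # start from a uniform reconstruction marginal
    for _ in range(n_iters):
        # Encoder update: q(x_hat | x) is proportional to q(x_hat) * exp(-beta * d)
        cond = q_xhat[None, :] * np.exp(-beta * distortion)
        cond /= cond.sum(axis=1, keepdims=True)
        # Marginal update: q(x_hat) = sum_x p(x) q(x_hat | x)
        new_q = p_x @ cond
        converged = np.max(np.abs(new_q - q_xhat)) < tol
        q_xhat = new_q
        if converged:
            break
    # Rate is the mutual information I(X; X_hat) of the joint p(x) q(x_hat | x)
    joint = p_x[:, None] * cond
    marginal = np.broadcast_to(q_xhat, cond.shape)
    mask = joint > 0
    rate = float(np.sum(joint[mask] * np.log2(cond[mask] / marginal[mask])))
    expected_distortion = float(np.sum(joint * distortion))
    return rate, expected_distortion, cond

# Example: a fair binary source with Hamming distortion
p = np.array([0.5, 0.5])
d = 1.0 - np.eye(2)
rate, dist, _ = blahut_arimoto(p, d, beta=2.0)
print(f"rate = {rate:.4f} bits, distortion = {dist:.4f}")
```

Sweeping `beta` from small to large traces out the curve: small values give heavy compression with high distortion, large values the reverse.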

The goal of this work is to understand the role of information-theoretic compression in state abstraction for sequential decision making. We draw a parallel between state abstraction as used in reinforcement learning and compression as understood in information theory. We build on the seminal work of Shannon (1948), Blahut (1972), Arimoto (1972) and Tishby, Pereira, and Bialek (1999), and draw inspiration from related work on understanding the relationship between abstraction and compression (Botvinick et al. 2015; Solway et al. 2014). While the perspective we introduce is intended to be general, we focus our study in two ways. First, we investigate only state abstraction, deferring discussion of temporal (Sutton, Precup, and Singh 1999), action (Hauskrecht et al. 1998), and hierarchical abstraction (Dayan and Hinton 1993; Dietterich 2000a) to future work. Second, we address the learning problem when a demonstrator is available, as in Apprenticeship Learning (Atkeson and Schaal 1997; Abbeel and Ng 2004; Argall et al. 2009), which simplifies aspects of our model.

Concretely, we introduce a new objective function that explicitly balances state-compression and performance. Our main result proves this objective is upper bounded by a variant of the Information Bottleneck objective adapted to sequential decision making. We introduce Deterministic Information Bottleneck for State abstraction (DIBS), an algorithm that outputs a lossy state abstraction optimizing the trade-off between compressing the state space and preserving the capacity for performance in that compressed state space. We conduct experiments to showcase the relationship between compression and performance captured by the algorithm in a traditional grid world and present an extension to high-dimensional observations via experiments with the Atari game Breakout.
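As a rough illustration of what balancing compression against performance can look like, the sketch below scores a candidate deterministic abstraction. It is not the DIBS algorithm and not necessarily the paper's exact objective. It assumes a deterministic-Information-Bottleneck-style form: the entropy of the abstract state distribution as the compression term, plus `beta` times the expected KL divergence between the demonstrator policy and the policy representable in the abstract space. The names `dib_style_score`, `rho`, `pi_demo`, and `phi` are hypothetical.

```python
import numpy as np

def dib_style_score(rho, pi_demo, phi, beta):
    """Score a deterministic abstraction; lower is better (illustrative sketch).

    rho:     demonstrator state distribution, shape (n_states,)
    pi_demo: demonstrator policy pi_E(a | s), shape (n_states, n_actions)
    phi:     integer array mapping each ground state to an abstract state
    beta:    weight on the distortion (performance) term
    Returns (score, entropy_bits, distortion_bits).
    """
    pi_demo = np.asarray(pi_demo, dtype=float)
    n_abstract = int(phi.max()) + 1
    # Probability mass of each abstract state under rho
    rho_phi = np.bincount(phi, weights=rho, minlength=n_abstract)
    # Assumed abstract policy: rho-weighted average of pi_E within each cluster
    pi_phi = np.zeros((n_abstract, pi_demo.shape[1]))
    np.add.at(pi_phi, phi, rho[:, None] * pi_demo)
    occupied = rho_phi > 0
    pi_phi[occupied] /= rho_phi[occupied, None]
    # Compression term: entropy of the abstract state distribution, in bits
    mass = rho_phi[occupied]
    entropy = float(-np.sum(mass * np.log2(mass)))
    # Distortion term: expected KL(pi_E(.|s) || pi_phi(.|phi(s))), in bits
    q = pi_phi[phi]
    mask = (pi_demo > 0) & (rho[:, None] > 0)
    kl_terms = np.zeros_like(pi_demo)
    kl_terms[mask] = pi_demo[mask] * np.log2(pi_demo[mask] / q[mask])
    distortion = float(np.sum(rho * kl_terms.sum(axis=1)))
    return entropy + beta * distortion, entropy, distortion

# Toy example: four states; the demonstrator acts the same way in states 0-1 and in 2-3
rho = np.full(4, 0.25)
pi_demo = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
for name, phi in [("identity", np.array([0, 1, 2, 3])),
                  ("pairs", np.array([0, 0, 1, 1])),
                  ("collapse", np.array([0, 0, 0, 0]))]:
    print(name, dib_style_score(rho, pi_demo, phi, beta=3.0))
```

In this toy setting, grouping states that share demonstrator behavior compresses the state space without distortion, while collapsing all states into one removes all entropy but forces the abstract policy to mix both actions.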

This article was shared from the WeChat official account CreateAMind (createamind).

Originally published: 2019-01-19
