
State Abstraction as Compression in Apprenticeship Learning

用户1908973 · Published 2019-04-28 14:39:43 · Column: CreateAMind

Code: State Abstraction as Compression in Apprenticeship Learning, https://github.com/david-abel/rl_info_theory (ref. infobot)

Related: State Abstractions for Lifelong Reinforcement Learning, https://david-abel.github.io/papers/lifelong_sa_icml_18.pdf

State Abstraction as Compression in Apprenticeship Learning

David Abel¹, Dilip Arumugam², Kavosh Asadi¹, Yuu Jinnai¹, Michael L. Littman¹, and Lawson L.S. Wong³

1: Department of Computer Science, Brown University; 2: Department of Computer Science, Stanford University; 3: College of Computer and Information Science, Northeastern University

Abstract

State abstraction can give rise to models of environments that are both compressed and useful, thereby enabling efficient sequential decision making. In this work, we offer the first formalism and analysis of the trade-off between compression and performance made in the context of state abstraction for Apprenticeship Learning. We build on Rate-Distortion theory, the classic Blahut-Arimoto algorithm, and the Information Bottleneck method to develop an algorithm for computing state abstractions that approximate the optimal trade-off between compression and performance. We illustrate the power of this algorithmic structure to offer insights into effective abstraction, compression, and reinforcement learning through a mixture of analysis, visuals, and experimentation.

1 Introduction

Reinforcement Learning (RL) poses a challenging problem. Agents must learn about their environment through high-dimensional and often noisy observations while receiving sparse and delayed evaluative feedback. The ability to understand one's surroundings well enough to support effective decision making under these conditions is a remarkable feat, and arguably a hallmark of intelligent behavior. To this end, a long-standing goal of RL is to endow decision-making agents with the ability to acquire and exploit abstract models for use in decision making, drawing inspiration from human cognition (Tenenbaum et al. 2011).

One path toward realizing this goal is to make use of state abstraction, which describes methods for compressing the environment's state space to distill complex problems into simpler forms (Dietterich 2000b; Andre and Russell 2002; Li, Walsh, and Littman 2006). Critically, the degree of compression induced by an abstraction trades off directly with its capacity to represent good behavior. If the abstraction throws away too much information, the resulting abstract model will fail to preserve essential characteristics of the original task (Abel, Hershkowitz, and Littman 2016). Thus, care must be taken to identify state abstractions that balance an appropriate degree of compression with adequate representational power.
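
As an editor's illustration (not taken from the paper), the minimal sketch below defines a hard state abstraction phi for a hypothetical five-cell corridor with a goal at one end. The corridor layout, goal index, and abstract-state names are all assumptions made only to make the compression/representation trade-off concrete.

# A minimal sketch of a hard state abstraction phi: S -> S_phi.
# The corridor, goal position, and abstract-state names are hypothetical.
GOAL = 4  # index of the goal cell in a 5-cell corridor

def phi(s: int) -> str:
    # Collapse every non-goal cell into a single abstract state.
    return "at_goal" if s == GOAL else "left_of_goal"

# Every ground state mapped to "left_of_goal" shares the same optimal action
# (move right), so this abstraction compresses the state space without losing
# the ability to represent optimal behavior; a coarser map that also merged
# the goal cell would throw away information the task needs.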

Information Theory offers foundational results about the limits of compression (Shannon 1948). The core of the theory clarifies how to communicate in the presence of noise, culminating in seminal results about the nature of commu- nication and compression that helped establish the science and engineering practices of computation. Of particular rel- evance to our agenda is Rate-Distortion theory, which stud- ies the trade-off between a code’s ability to compress (rate) and represent the original signal (distortion) (Shannon 1948; Berger 1971). Cognitive neuroscience has suggested that perception and generalization are tied to efficient compres- sion (Attneave 1954; Sims 2016; 2018), termed the “efficient coding hypothesis” by Barlow (1961).
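
For reference, the classical rate-distortion function can be written as follows; this is standard information-theoretic background rather than a result of the paper:

R(D) = \min_{p(\hat{x} \mid x)\,:\, \mathbb{E}[d(X, \hat{X})] \le D} I(X; \hat{X})

Here X is the source, \hat{X} its compressed reconstruction, d a distortion measure, and the mutual information I(X; \hat{X}) is the rate. Demanding lower distortion D forces a higher rate, which is exactly the trade-off the paper carries over to state abstraction.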

The goal of this work is to understand the role of information-theoretic compression in state abstraction for sequential decision making. We draw a parallel between state abstraction as used in reinforcement learning and compression as understood in information theory. We build on the seminal work of Shannon (1948), Blahut (1972), Arimoto (1972), and Tishby, Pereira, and Bialek (1999), and draw inspiration from related work on understanding the relationship between abstraction and compression (Botvinick et al. 2015; Solway et al. 2014). While the perspective we introduce is intended to be general, we focus our study in two ways. First, we investigate only state abstraction, deferring discussion of temporal (Sutton, Precup, and Singh 1999), action (Hauskrecht et al. 1998), and hierarchical abstraction (Dayan and Hinton 1993; Dietterich 2000a) to future work. Second, we address the learning problem when a demonstrator is available, as in Apprenticeship Learning (Atkeson and Schaal 1997; Abbeel and Ng 2004; Argall et al. 2009), which simplifies aspects of our model.
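
The Information Bottleneck objective referenced above has, in its standard form, the following shape (again background material, with T the compressed representation of a source X and Y the relevance variable):

\min_{p(t \mid x)} \; I(X; T) - \beta\, I(T; Y)

where \beta \ge 0 controls how much predictive power about Y is demanded of T. One natural reading, consistent with the abstract, is that in this paper's setting X plays the role of the ground state, T the abstract state, and Y the demonstrator's behavior to be preserved; the precise correspondence is developed in the paper itself.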

Concretely, we introduce a new objective function that explicitly balances state compression and performance. Our main result proves this objective is upper bounded by a variant of the Information Bottleneck objective adapted to sequential decision making. We introduce Deterministic Information Bottleneck for State abstraction (DIBS), an algorithm that outputs a lossy state abstraction optimizing the trade-off between compressing the state space and preserving the capacity for performance in that compressed state space. We conduct experiments to showcase the relationship between compression and performance captured by the algorithm in a traditional grid world and present an extension to high-dimensional observations via experiments with the Atari game Breakout.
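
The paper's exact DIBS derivation is not reproduced here. The sketch below only shows the generic Blahut-Arimoto / Information Bottleneck alternating update that such an algorithm builds on, specialized, as an assumption rather than the paper's construction, to compressing a demonstrator's state distribution while preserving its policy. All function and variable names are hypothetical.

import numpy as np

def ib_state_abstraction(rho, pi_E, n_abstract, beta, n_iters=200, seed=0):
    """Blahut-Arimoto-style iteration for a lossy state abstraction.

    rho        : (n_states,) demonstrator state-visitation distribution p(s)
    pi_E       : (n_states, n_actions) demonstrator policy pi_E(a | s)
    n_abstract : number of abstract states
    beta       : trade-off between compression (rate) and matching the
                 demonstrator's actions (distortion)
    Returns phi: (n_states, n_abstract) soft encoder p(s_phi | s).
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = pi_E.shape

    # Start from a random soft encoder p(s_phi | s).
    phi = rng.dirichlet(np.ones(n_abstract), size=n_states)

    for _ in range(n_iters):
        # Abstract-state marginal: p(s_phi) = sum_s p(s) p(s_phi | s).
        p_abs = rho @ phi                                  # (n_abstract,)

        # Abstract policy: p(a | s_phi) proportional to
        # sum_s p(s) p(s_phi | s) pi_E(a | s).
        pi_abs = (phi * rho[:, None]).T @ pi_E             # (n_abstract, n_actions)
        pi_abs /= p_abs[:, None] + 1e-12

        # Distortion d(s, s_phi) = KL(pi_E(. | s) || pi_abs(. | s_phi)).
        log_ratio = np.log(pi_E[:, None, :] + 1e-12) - np.log(pi_abs[None, :, :] + 1e-12)
        d = np.sum(pi_E[:, None, :] * log_ratio, axis=2)   # (n_states, n_abstract)

        # IB / Blahut-Arimoto encoder update: p(s_phi | s) ~ p(s_phi) exp(-beta d).
        phi = p_abs[None, :] * np.exp(-beta * d)
        phi /= phi.sum(axis=1, keepdims=True) + 1e-12

    return phi

Sweeping beta trades compression against fidelity to the demonstrator's behavior: small beta collapses many states together, large beta approaches a near-lossless abstraction, mirroring the compression-performance curves the paper's experiments trace out.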

Originally published 2019-01-19. Shared from the CreateAMind WeChat official account as part of the Tencent Cloud self-media sharing program; for infringement concerns, contact cloudcommunity@tencent.com.
