Three reinforcement learning papers from Yoshua Bengio on learning disentangled features

Disentangling the independently controllable factors of variation by interacting with the world

https://arxiv.org/abs/1802.09484

It has been postulated that a good representation is one that disentangles the underlying explanatory factors of variation. However, it remains an open question what kind of training framework could potentially achieve that. Whereas most previous work focuses on the static setting (e.g., with images), we postulate that some of the causal factors could be discovered if the learner is allowed to interact with its environment. The agent can experiment with different actions and observe their effects. More specifically, we hypothesize that some of these factors correspond to aspects of the environment which are independently controllable, i.e., that there exists a policy and a learnable feature for each such aspect of the environment, such that this policy can yield changes in that feature with minimal changes to other features that explain the statistical variations in the observed data. We propose a specific objective function to find such factors, and verify experimentally that it can indeed disentangle independently controllable aspects of the environment without any extrinsic reward signal.

The paper proposes a mechanism for representation learning that has close ties to intrinsic motivation mechanisms and causality (note: an internal drive, perhaps similar to curiosity, plus causal relationships). This mechanism explicitly links the agent's control over its environment to the representation of the environment that the agent learns.

This mechanism can push a model to learn to disentangle its input in a meaningful way and to represent factors that take multiple actions to change. The paper shows that these representations make it possible to perform model-based predictions in the learned latent space, rather than in a low-level input space (e.g., pixels).

We hypothesize that interactions can be the key to learning how to disentangle the various causal factors of the stream of observations that an agent is faced with, and that such learning can be done in an unsupervised way.

https://arxiv.org/abs/1708.01289

Independently Controllable Factors

Abstract

It has been postulated that a good representation is one that disentangles the underlying explanatory factors of variation. However, it remains an open question what kind of training framework could potentially achieve that. Whereas most previous work focuses on the static setting (e.g., with images), we postulate that some of the causal factors could be discovered if the learner is allowed to interact with its environment. The agent can experiment with different actions and observe their effects. More specifically, we hypothesize that some of these factors correspond to aspects of the environment which are independently controllable, i.e., that there exists a policy and a learnable feature for each such aspect of the environment, such that this policy can yield changes in that feature with minimal changes to other features that explain the statistical variations in the observed data. We propose a specific objective function to find such factors and verify experimentally that it can indeed disentangle independently controllable aspects of the environment without any extrinsic reward signal.

Specifically, we hypothesize that some of the factors explaining variations in the data correspond to aspects of the world that can be controlled by the agent. For example, an object that could be pushed around or picked up independently of others is an independently controllable aspect of the environment. Our approach therefore aims to jointly discover a set of features (functions of the environment state) and policies (which change the state) such that each policy controls the associated feature while leaving the other features unchanged as much as possible. In §2 and §3 we explain this mechanism and show experimental results for the simplest instantiation of this new principle. In §5 we discuss how this principle could be applied more generally, and what are the research challenges that emerge.
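The idea that a policy should change its own feature while leaving the others mostly unchanged can be made concrete with a small numerical sketch. The code below is an illustration under stated assumptions, not the paper's implementation: the `selectivity` function (the share of the total absolute feature change that falls on feature `k`), the toy 2D world, and the names `features` and `step` are all hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple

State = Tuple[int, int]


def selectivity(f_before: Sequence[float], f_after: Sequence[float],
                k: int, eps: float = 1e-8) -> float:
    """Share of the total absolute feature change that falls on feature k.

    Close to 1 when only feature k changed; smaller when other features
    changed as well. The exact form is an assumption made for this sketch.
    """
    changes = [abs(after - before) for before, after in zip(f_before, f_after)]
    return changes[k] / (sum(changes) + eps)


def features(state: State) -> List[float]:
    # Hypothetical hand-written features: the two coordinates of the state.
    # In the papers the features are learned; here they are fixed for clarity.
    x, y = state
    return [float(x), float(y)]


def step(state: State, action: str) -> State:
    # Hypothetical deterministic toy dynamics.
    moves = {"right": (1, 0), "up": (0, 1), "diagonal": (1, 1)}
    dx, dy = moves[action]
    return (state[0] + dx, state[1] + dy)


state = (0, 0)
# "right" changes only the x feature, so it is highly selective for k=0.
sel_right = selectivity(features(state), features(step(state, "right")), k=0)
# "diagonal" changes both features equally, so its selectivity for k=0 is about half.
sel_diag = selectivity(features(state), features(step(state, "diagonal")), k=0)
print(round(sel_right, 3), round(sel_diag, 3))
```

In this toy setting, an action that moves only along x scores close to 1 for the x feature, while a diagonal move scores about 0.5. A training objective of the kind described above would reward feature and policy pairs with high scores of this sort.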

https://arxiv.org/abs/1804.06955

Disentangling Controllable and Uncontrollable Factors of Variation by Interacting with the World

Abstract

We introduce a method for disentangling independently controllable and uncontrollable factors of variation by interacting with the world. Disentanglement leads to good representations and it is important when applying deep neural networks (DNNs) in fields where explanations are necessary. This article focuses on a reinforcement learning (RL) approach for disentangling factors of variation; however, previous methods lack a mechanism for representing uncontrollable obstacles. To tackle this problem, we train two DNNs simultaneously: one that represents the controllable object and another that represents the uncontrollable obstacles. During training, we used the parameters from a previous RL-based model as our initial parameters to improve stability. We also conduct simple toy simulations to show that our model can indeed disentangle controllable and uncontrollable factors of variation and that it is effective for a task involving the acquisition of extrinsic rewards.
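The distinction between controllable and uncontrollable factors can be illustrated without any neural network. The sketch below is not the two-DNN method from the abstract; it is a hypothetical toy world in which one state component (the agent) responds to the chosen action and the other (an obstacle) moves at random. All names (`ACTIONS`, `step`, `mean_change_per_action`, `controllability`) and the scoring rule are assumptions made for this example.

```python
import random
from collections import defaultdict

# Hypothetical 1D world: state = (agent_position, obstacle_position).
ACTIONS = {"left": -1, "stay": 0, "right": 1}


def step(state, action, rng):
    agent, obstacle = state
    agent += ACTIONS[action]            # the agent responds to the action
    obstacle += rng.choice([-1, 1])     # the obstacle ignores the action
    return (agent, obstacle)


def mean_change_per_action(n_steps=3000, seed=0):
    """Interact with random actions and average each factor's change per action."""
    rng = random.Random(seed)
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    state = (0, 0)
    for _ in range(n_steps):
        action = rng.choice(list(ACTIONS))
        nxt = step(state, action, rng)
        for i in range(2):
            sums[action][i] += nxt[i] - state[i]
        counts[action] += 1
        state = nxt
    return {a: [s / counts[a] for s in sums[a]] for a in ACTIONS}


def controllability(means):
    """Per factor: spread of the mean change across actions.

    A large spread means the factor's change depends on the action, which
    this sketch treats as controllable; a spread near zero suggests not.
    """
    scores = []
    for i in range(2):
        values = [means[a][i] for a in means]
        scores.append(max(values) - min(values))
    return scores


scores = controllability(mean_change_per_action())
print(scores)  # first entry: agent, second entry: obstacle
```

Here the agent's score is large because its average displacement differs across actions, while the obstacle's score stays near zero because its motion does not depend on the action. The method in the abstract learns such a separation from observations with two networks rather than reading it off a known state vector.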

Ref: "智能的几点随想" (Some Thoughts on Intelligence); its views are largely consistent with Yoshua Bengio's.

Originally published on the WeChat official account CreateAMind (createamind)

Original publication date: 2018-04-26
