
DeepMind's Approach to Artificial General Intelligence

Early Visual Concept Learning with Unsupervised Deep Learning


Automated discovery of early visual concepts from raw image data is a major open challenge in AI research.


Addressing this problem, we propose an unsupervised approach for learning disentangled representations of the underlying factors of variation. (note: factor disentanglement)

We draw inspiration from neuroscience, and show how this can be achieved in an unsupervised generative model by applying the same learning pressures as have been suggested to act in the ventral visual stream in the brain. (note: inspired by visual neuroscience)

By enforcing redundancy reduction, encouraging statistical independence, and exposure to data with transform continuities analogous to those to which human infants are exposed, we obtain a variational autoencoder (VAE) framework capable of learning disentangled factors. (note: an infant-like visual environment)

Our approach makes few assumptions and works well across a wide variety of datasets. Furthermore, our solution has useful emergent properties, such as zero-shot inference and an intuitive understanding of “objectness”. (note: emergent high-level capabilities)

1 Introduction (note: the introduction's argument for achieving human-like intelligence is excellent)

State-of-the-art AI approaches still struggle with some scenarios where humans excel [21], such as knowledge transfer, where faster learning is achieved by reusing learnt representations for numerous tasks (Fig. 1A); or zero-shot inference, where reasoning about new data is enabled by recombining previously learnt factors (Fig. 1B).


[21] suggest incorporating certain “start-up” abilities into deep models, such as intuitive understanding of physics, to help bootstrap learning in these scenarios.

Elaborating on this idea, we believe that learning basic visual concepts, such as the “objectness” of things in the world, and the ability to reason about objects in terms of the generative factors that specify their properties, is an important step towards building machines that learn and think like people.


We believe that this can be achieved by learning a disentangled posterior distribution of the generative factors of the observed sensory input by leveraging the wealth of unsupervised data [4, 21]. (note: factor disentanglement)

We wish to learn a representation where single latent units are sensitive to changes in single generative factors, while being relatively invariant to changes in other factors [4]. (note: one latent unit per factor)

With a disentangled representation, knowledge about one factor could generalise to many configurations of other factors, thus capturing the “multiple explanatory factors” and “shared factors across tasks” priors suggested by [4].

Unsupervised disentangled factor learning from raw image data is a major open challenge in AI. Most previous attempts require a priori knowledge of the number and/or nature of the data generative factors [16, 25, 35, 34, 13, 20, 8, 33, 17]. This is infeasible in the real world, where a newborn learner may have no a priori knowledge and little to no supervision for discovering the generative factors. So far, purely unsupervised approaches to disentangled factor learning have not scaled well [11, 30, 9, 10].


We propose a deep unsupervised generative approach for disentangled factor learning inspired by neuroscience [2, 3, 24, 15]. We apply similar learning constraints to the model as have been suggested to act in the ventral visual stream in the brain [28]: redundancy reduction, an emphasis on learning statistically independent factors, and exposure to data with transform continuities analogous to those human infants are exposed to [2, 3]. We show that the application of such pressures to a deep unsupervised generative model can be realised in the variational autoencoder (VAE) framework [19, 26].
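These learning pressures map naturally onto the two terms of a VAE objective: a reconstruction cost (redundancy reduction) and a KL divergence toward an isotropic unit-Gaussian prior (a pressure toward statistically independent latents). Below is a minimal numpy sketch of that loss. The linear encoder/decoder and all weight names are illustrative toys, not the paper's convolutional architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def kl_to_unit_gaussian(mu, logvar):
    # KL(N(mu, sigma^2) || N(0, I)): penalises redundant, correlated latents
    # and pushes q(z|x) toward a factorised (independent) code.
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=1)

def vae_loss(x, W_mu, W_logvar, W_dec, beta=1.0):
    mu, logvar = x @ W_mu, x @ W_logvar            # linear Gaussian encoder
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)  # reparameterise
    x_rec = z @ W_dec                              # linear decoder p(x|z)
    rec = np.sum((x - x_rec) ** 2, axis=1)         # reconstruction term
    return float(np.mean(rec + beta * kl_to_unit_gaussian(mu, logvar)))

x = rng.standard_normal((8, 4))                    # toy batch: 4-dim observations
W_mu = 0.1 * rng.standard_normal((4, 2))           # 2 latent units
W_logvar = 0.1 * rng.standard_normal((4, 2))
W_dec = 0.1 * rng.standard_normal((2, 4))
loss = vae_loss(x, W_mu, W_logvar, W_dec, beta=4.0)
print(loss)
```

Weighting the KL term more heavily than the reconstruction term (here `beta=4.0`, an illustrative choice) strengthens the independence pressure relative to reconstruction fidelity.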


Our main contributions are the following: 1) we show the importance of neuroscience inspired constraints (data continuity, redundancy reduction and statistical independence) for learning disentangled representations of continuous visual generative factors; 2) we devise a protocol to quantitatively compare the degree of disentanglement learnt by different models; and 3) we demonstrate how learning disentangled representations enables zero-shot inference and the emergence of basic visual concepts, such as “objectness”.
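Contribution 2 refers to a protocol for quantifying disentanglement. The paper's actual protocol is not reproduced here; the sketch below is only a simple illustrative proxy, assuming access to ground-truth generative factors: resample one factor at a time and record the variance of each latent unit. In a disentangled code, each factor moves at most one latent. The `encoder` and `factors` objects are hypothetical toys:

```python
import numpy as np

def sensitivity_matrix(encoder, factors, n_samples=200, seed=0):
    """Row i: variance of each latent unit when only factor i is resampled.
    A disentangled code yields rows each dominated by a single column."""
    rng = np.random.default_rng(seed)
    k = len(factors)
    base = np.array([f(rng) for f in factors])
    S = np.zeros((k, k))
    for i, f in enumerate(factors):
        codes = []
        for _ in range(n_samples):
            v = base.copy()
            v[i] = f(rng)                      # resample only factor i
            codes.append(encoder(v))
        S[i] = np.var(np.array(codes), axis=0)
    return S

# toy "perfectly disentangled" encoder: a permutation/scaling of the factors
encoder = lambda v: np.array([[0.0, 2.0], [1.0, 0.0]]) @ v
factors = [lambda rng: rng.uniform(-1, 1)] * 2
S = sensitivity_matrix(encoder, factors)
print(S.round(3))
```

For this toy encoder each off-diagonal entry of `S` is positive while the diagonal is zero: factor 0 moves only latent 1, and factor 1 moves only latent 0, which is the axis-aligned sensitivity the paper seeks.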


This article is shared from the WeChat public account CreateAMind (createamind); author: zdx3578.




