Automated discovery of early visual concepts from raw image data is a major open challenge in AI research.
Addressing this problem, we propose an unsupervised approach for learning disentangled representations of the underlying factors of variation. (Note: attribute disentanglement)
We draw inspiration from neuroscience, and show how this can be achieved in an unsupervised generative model by applying the same learning pressures as have been suggested to act in the ventral visual stream in the brain. (Note: inspired by the neuroscience of vision)
By enforcing redundancy reduction, encouraging statistical independence, and providing exposure to data with transform continuities analogous to those to which human infants are exposed, we obtain a variational autoencoder (VAE) framework capable of learning disentangled factors. (Note: an infant-like visual environment)
Our approach makes few assumptions and works well across a wide variety of datasets. Furthermore, our solution has useful emergent properties, such as zero-shot inference and an intuitive understanding of “objectness”. (Note: higher-level capabilities)
1 Introduction
(Note: the introduction's discussion of how to achieve human-like intelligence is excellent)
State-of-the-art AI approaches still struggle with some scenarios where humans excel, such as knowledge transfer, where faster learning is achieved by reusing learnt representations for numerous tasks (Fig. 1A); or zero-shot inference, where reasoning about new data is enabled by recombining previously learnt factors (Fig. 1B).
 suggest incorporating certain “start-up” abilities into deep models, such as intuitive understanding of physics, to help bootstrap learning in these scenarios.
Elaborating on this idea, we believe that learning basic visual concepts, such as the “objectness” of things in the world, and the ability to reason about objects in terms of the generative factors that specify their properties, is an important step towards building machines that learn and think like people.
We believe that this can be achieved by learning a disentangled posterior distribution of the generative factors of the observed sensory input by leveraging the wealth of unsupervised data [4, 21]. (Note: attribute disentanglement)
We wish to learn a representation where single latent units are sensitive to changes in single generative factors, while being relatively invariant to changes in other factors. (Note: one latent unit per attribute)
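The property described above can be illustrated with a toy example. The sketch below is only an illustration under stated assumptions: the three generative factors, the linear "encoders" `W_dis` and `W_ent`, and the helper names `sensitivity` and `dominant_fraction` are all hypothetical and do not come from the paper, and the score it computes is not the quantitative disentanglement protocol proposed later. It measures, by finite differences, how strongly each latent unit responds to each factor.

```python
import numpy as np

def sensitivity(encode, factors, eps=1e-3):
    """Finite-difference sensitivity matrix S, where S[i, j] approximates
    |d latent_i / d factor_j| at the given factor values."""
    base = encode(factors)
    cols = []
    for j in range(len(factors)):
        bumped = factors.copy()
        bumped[j] += eps                      # perturb one factor at a time
        cols.append(np.abs(encode(bumped) - base) / eps)
    return np.stack(cols, axis=1)

def dominant_fraction(S):
    """Per latent unit: share of its total sensitivity that is due to
    its single most influential factor (1.0 means one factor only)."""
    return S.max(axis=1) / S.sum(axis=1)

# Hypothetical generative factors, e.g. (x position, y position, scale).
factors = np.array([0.2, -0.5, 1.3])

# Toy "disentangled" code: each latent is a rescaled copy of one factor.
W_dis = np.diag([2.0, 0.5, 1.5])
# Toy "entangled" code: every latent mixes all three factors.
W_ent = np.array([[1.0, 0.8, 0.6],
                  [0.7, 1.0, 0.9],
                  [0.5, 0.6, 1.0]])

S_dis = sensitivity(lambda f: W_dis @ f, factors)
S_ent = sensitivity(lambda f: W_ent @ f, factors)

print("disentangled:", dominant_fraction(S_dis))
print("entangled:   ", dominant_fraction(S_ent))
```

In the disentangled toy code each latent unit's sensitivity is concentrated on one factor, while in the entangled one it is spread across all of them.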
With a disentangled representation, knowledge about one factor could generalise to many configurations of other factors, thus capturing the “multiple explanatory factors” and “shared factors across tasks” priors suggested by .
Unsupervised disentangled factor learning from raw image data is a major open challenge in AI. Most previous attempts require a priori knowledge of the number and/or nature of the data generative factors [16, 25, 35, 34, 13, 20, 8, 33, 17]. This is infeasible in the real world, where the newborn learner may have no a priori knowledge and little to no supervision for discovering the generative factors. So far, purely unsupervised approaches to disentangled factor learning have not scaled well [11, 30, 9, 10].
We propose a deep unsupervised generative approach for disentangled factor learning inspired by neuroscience [2, 3, 24, 15]. We apply similar learning constraints to the model as have been suggested to act in the ventral visual stream in the brain: redundancy reduction, an emphasis on learning statistically independent factors, and exposure to data with transform continuities analogous to those human infants are exposed to [2, 3]. We show that the application of such pressures to a deep unsupervised generative model can be realised in the variational autoencoder (VAE) framework [19, 26].
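As a rough sketch of where such pressures could enter a VAE, the code below writes out a per-example VAE loss in NumPy. It rests on several assumptions not stated in the text above: a diagonal Gaussian posterior, an isotropic unit Gaussian prior, a squared-error reconstruction term, and a hypothetical `kl_weight` knob that scales the KL term. Because the prior factorises across latent dimensions, the KL term penalises the posterior for departing from independent, unit-variance latents; this is one plausible reading of the independence and redundancy-reduction constraints, not a statement of the authors' exact objective.

```python
import numpy as np

def kl_to_unit_gaussian(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ),
    summed over the latent dimensions (last axis)."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

def vae_loss(x, x_recon, mu, log_var, kl_weight=1.0):
    """Per-example loss: squared-error reconstruction term plus a weighted
    KL term. kl_weight is a hypothetical knob; kl_weight=1.0 gives the
    usual negative ELBO (up to constants) under a Gaussian decoder."""
    recon = 0.5 * np.sum((x - x_recon) ** 2, axis=-1)
    return recon + kl_weight * kl_to_unit_gaussian(mu, log_var)

# Tiny usage example with made-up numbers (not real data).
x = np.array([[0.0, 1.0, 0.5]])
x_recon = np.array([[0.0, 1.0, 0.5]])      # perfect reconstruction
mu = np.array([[1.0, 0.0]])                # posterior mean of 2 latents
log_var = np.zeros((1, 2))                 # unit posterior variance

print(vae_loss(x, x_recon, mu, log_var))
print(vae_loss(x, x_recon, mu, log_var, kl_weight=4.0))
```

Raising `kl_weight` in this sketch increases the penalty for posteriors that stray from the factorised prior, at the cost of reconstruction accuracy.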
Our main contributions are the following: 1) we show the importance of neuroscience-inspired constraints (data continuity, redundancy reduction and statistical independence) for learning disentangled representations of continuous visual generative factors; 2) we devise a protocol to quantitatively compare the degree of disentanglement learnt by different models; and 3) we demonstrate how learning disentangled representations enables zero-shot inference and the emergence of basic visual concepts, such as “objectness”.
Originally published on the WeChat official account CreateAMind (createamind).