Deep reinforcement learning (DRL) brings the power of deep neural networks to bear on the generic task of trial-and-error learning,
and its effectiveness has been convincingly demonstrated on tasks such as Atari video games and the game of Go. However, contemporary DRL systems inherit a number of shortcomings from the current generation of deep learning techniques. For example, they require very large datasets to work effectively, entailing that they are slow to learn even when such datasets are available.
Moreover, they lack the ability to reason on an abstract level, which makes it difficult to implement high-level cognitive functions such as transfer learning, analogical reasoning, and hypothesis-based reasoning. Finally, their operation is largely opaque to humans, rendering them unsuitable for domains in which verifiability is important.
In this paper, we propose an end-to-end reinforcement learning architecture comprising a neural back end and a symbolic front end with the potential to overcome each of these shortcomings. As proof-of-concept, we present a preliminary implementation of the architecture and apply it to several variants of a simple video game. We show that the resulting system – though just a prototype – learns effectively, and, by acquiring a set of symbolic rules that are easily comprehensible to humans, dramatically outperforms a conventional, fully neural DRL system on a stochastic variant of the game.
Here we take a different approach. We propose a novel reinforcement learning architecture that addresses all of these issues at once in a principled way by combining neural network learning with aspects of classical symbolic AI, gaining the advantages of both methodologies without their respective disadvantages. Central to classical AI is the use of language-like propositional representations to encode knowledge.
Thanks to their compositional structure, such representations are amenable to endless extension and recombination, an essential feature for the acquisition and deployment of high-level abstract concepts, which are key to general intelligence (McCarthy, 1987). Moreover, knowledge expressed in propositional form can be exploited by multiple high-level reasoning processes and has general-purpose application across multiple tasks and domains. Features such as these, derived from the benefits of human language, motivated several decades of research in symbolic AI.
But as an approach to general intelligence, classical symbolic AI has been disappointing. A major obstacle here is the symbol grounding problem (Harnad, 1990; Shanahan, 2005). The symbolic elements of a representation in classical AI – the constants, functions, and predicates – are typically hand-crafted, rather than grounded in data from the real world. Philosophically speaking, this means their semantics are parasitic on meanings in the heads of their designers rather than deriving from a direct connection with the world.
The problem with traditional hand-crafted features, on the philosophers' analysis, is that the features are not generated from raw data: they are not grounded in real-world information, but are abstractions produced by human processing, and so they lose the rich character of the original data. This leaves no foundation for later extension or for adaptation to new environments. Deep learning is the way to address this problem, and it reflects DeepMind's approach to general AI; integrating deep learning with symbolic computation is therefore a promising direction to try.
Pragmatically, hand-crafted representations cannot capture the rich statistics of real-world perceptual data, cannot support ongoing adaptation to an unknown environment, and are an obvious barrier to full autonomy. By contrast, none of these problems afflict machine learning. Deep neural networks in particular have proven to be remarkably effective for supervised learning from large datasets using backpropagation (LeCun et al., 2015; Schmidhuber, 2015). Deep learning is therefore already a viable solution to the symbol grounding problem in the supervised case, and for the unsupervised case, which is essential for a full solution, rapid progress is being made (Chen et al., 2016; Goodfellow et al., 2014; Greff et al., 2016; Higgins et al., 2016; Kingma and Welling, 2013). The hybrid neural-symbolic reinforcement learning architecture we propose relies on a deep learning solution to the symbol grounding problem.
1) Conceptual abstraction: learning high-level abstract concepts.
Here beta-VAE could be used (see Google's beta-VAE, an unsupervised learning framework comparable to InfoGAN, released with figures and code) to learn semantically grounded features, substantially improving the system's understanding of the external world.
2) Compositional structure: the learned concepts can be combined and compared to generate new concepts and attributes.
3) Common sense priors: prior and commonsense knowledge, giving the system a basic understanding of the world before task-specific training begins.
4) Causal reasoning: the ability to reason about cause and effect.
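To make point 1 concrete, the beta-VAE objective mentioned above can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes a diagonal-Gaussian encoder posterior, and the function name and toy inputs are illustrative.

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """beta-VAE objective: reconstruction error plus a beta-weighted
    KL divergence between the approximate posterior q(z|x) = N(mu, sigma^2)
    and the unit-Gaussian prior N(0, I). Setting beta > 1 pressures the
    latent code toward disentangled, factorised representations."""
    # Squared reconstruction error, summed over features, averaged over batch
    recon = np.mean(np.sum((x - x_recon) ** 2, axis=1))
    # Closed-form KL for a diagonal Gaussian against N(0, I)
    kl = np.mean(0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=1))
    return recon + beta * kl

# Toy check: a posterior exactly at the prior with perfect reconstruction
mu = np.zeros((2, 3)); log_var = np.zeros((2, 3))
x = np.ones((2, 5)); x_recon = np.ones((2, 5))
print(beta_vae_loss(x, x_recon, mu, log_var))  # 0.0
```

With beta = 1 this reduces to the standard VAE evidence lower bound; the disentanglement pressure comes entirely from weighting the KL term more heavily.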
Originally published on the WeChat public account CreateAMind (createamind).