The basic idea is to use standard deep RL techniques to train a recurrent neural network, in such a way that the recurrent network comes to implement its own, free-standing RL procedure. As we shall illustrate, under the right circumstances, the secondary learned RL procedure can display an adaptiveness and sample efficiency that the original RL procedure lacks.
A hard-to-understand but important DeepMind paper: meta-learning.
It is also learning a distribution: learning what tasks have in common, their shared structure, and the distribution they are drawn from.
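The idea above can be made concrete with a small sketch. This is not the paper's implementation (which used A3C-trained LSTMs); it is a minimal numpy REINFORCE example under assumed settings: a distribution of two-armed Bernoulli bandit tasks, and a vanilla RNN whose input at each trial is the previous action and reward. The outer loop adjusts the weights across tasks; at evaluation the weights are frozen, so any within-episode adaptation is the "free-standing RL procedure" implemented by the recurrent dynamics alone.

```python
import numpy as np

rng = np.random.default_rng(0)
H, A = 16, 2          # hidden units, number of bandit arms
T = 20                # trials per episode (one sampled task per episode)

# RNN parameters; input is [one-hot previous action, previous reward]
Wx = rng.normal(0, 0.1, (H, A + 1))
Wh = rng.normal(0, 0.1, (H, H))
b  = np.zeros(H)
Wo = rng.normal(0, 0.1, (A, H))
bo = np.zeros(A)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def run_episode(p, learn=True, lr=0.05, baseline=[0.0]):
    """Roll out one bandit task; optionally apply a REINFORCE update.
    `baseline` is a mutable default holding a running-mean return."""
    global Wx, Wh, b, Wo, bo
    h = np.zeros(H)
    x = np.zeros(A + 1)                  # no previous action/reward at t=0
    cache, rewards = [], []
    for _ in range(T):
        h_prev = h
        h = np.tanh(Wx @ x + Wh @ h_prev + b)
        pi = softmax(Wo @ h + bo)
        a = rng.choice(A, p=pi)
        r = float(rng.random() < p[a])   # Bernoulli arm payoff
        cache.append((x, h_prev, h, pi, a))
        rewards.append(r)
        x = np.zeros(A + 1)
        x[a], x[A] = 1.0, r              # feed (a_{t-1}, r_{t-1}) back in
    G = sum(rewards)
    if learn:                            # outer-loop update (plain REINFORCE + baseline)
        adv = G - baseline[0]
        baseline[0] += 0.01 * (G - baseline[0])
        dWx, dWh, db = np.zeros_like(Wx), np.zeros_like(Wh), np.zeros_like(b)
        dWo, dbo = np.zeros_like(Wo), np.zeros_like(bo)
        dh_next = np.zeros(H)
        for x_t, h_prev, h_t, pi, a in reversed(cache):   # backprop through time
            dlog = pi.copy()
            dlog[a] -= 1.0               # grad of -log pi(a) w.r.t. logits
            dlog *= adv
            dWo += np.outer(dlog, h_t); dbo += dlog
            dhr = (Wo.T @ dlog + dh_next) * (1 - h_t ** 2)
            dWx += np.outer(dhr, x_t); dWh += np.outer(dhr, h_prev); db += dhr
            dh_next = Wh.T @ dhr
        for P, g in ((Wx, dWx), (Wh, dWh), (b, db), (Wo, dWo), (bo, dbo)):
            P -= lr * g / T
    return G / T

def sample_task():
    good = rng.integers(A)               # one arm pays with prob 0.9, the other 0.1
    return np.where(np.arange(A) == good, 0.9, 0.1)

# Meta-training: every episode draws a fresh task from the distribution.
for _ in range(4000):
    run_episode(sample_task())

# Evaluation on new tasks with frozen weights: adaptation must happen
# inside the episode, via the hidden state, not via weight updates.
scores = [run_episode(sample_task(), learn=False) for _ in range(300)]
print(round(float(np.mean(scores)), 2))
```

Since the good arm is random per episode, a policy that ignores its reward inputs can average at best 0.5; beating that at evaluation time is evidence the recurrent state is doing the within-episode learning.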
Recommended by zdx3578.