Keywords: model-based, model-free, entropy, mutual information, abstraction, skill-action, goal-conditioned, information bottleneck, complex dynamics, behavior dynamics, upgraded version of PlaNet, diverse tasks, generalization, skill discovery, prediction, unsupervised, DADS, behaviors, zero-shot planning, latent space, exploration, curiosity, hierarchical, probabilistic graphical model, control
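Working directly at the state-action level without abstraction is too hard; learning a specific behavior; predictable representation.
Reinforcement learning: maximize the entropy of the states and minimize the conditional entropy of the state given the skill, so that each skill controls the world as reliably as possible. Skills are predictable; learn small dynamics models, divide and conquer, and avoid learning a global model.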
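A small numerical check of this intuition, as a sketch only. The toy transition tables and the helper names `entropy` and `mutual_information` are made up for illustration, and conditioning on the current state is dropped for brevity, so this computes I(S'; Z) = H(S') - H(S' | Z) rather than the conditional version.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats of a probability vector (zeros are ignored)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def mutual_information(p_next_given_z, p_z):
    """I(S'; Z) = H(S') - H(S' | Z) for a table whose rows are p(s' | z)."""
    p_next_given_z = np.asarray(p_next_given_z, dtype=float)
    p_z = np.asarray(p_z, dtype=float)
    marginal = p_z @ p_next_given_z  # p(s') = sum_z p(z) p(s' | z)
    conditional = sum(pz * entropy(row) for pz, row in zip(p_z, p_next_given_z))
    return entropy(marginal) - conditional

p_z = np.array([0.5, 0.5])  # uniform prior over two toy skills

# Each skill reliably reaches its own state: diverse states, predictable skills.
distinct = np.array([[1.0, 0.0], [0.0, 1.0]])
# Both skills scatter over the same states: diverse states, unpredictable skills.
noisy = np.array([[0.5, 0.5], [0.5, 0.5]])

mi_distinct = mutual_information(distinct, p_z)
mi_noisy = mutual_information(noisy, p_z)
```

Both tables have the same state entropy; only the first has low conditional entropy given the skill, so only the first has nonzero mutual information.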
First learn skills without supervision, then optimize the task with a model-based method, using a skill-conditioned policy and a skill-conditioned transition function.
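The second stage can be sketched as planning over skills instead of raw actions. This is a minimal illustration under stated assumptions: `skill_dynamics` is a hypothetical hand-written stand-in for a learned skill-conditioned transition function, and plain random shooting is swapped in as the simplest possible planner. It is not claimed to be the planner used in the paper.

```python
import numpy as np

def skill_dynamics(state, skill):
    """Hypothetical stand-in for a learned q(s' | s, z): skill z shifts the state."""
    return state + 0.5 * skill

def plan_skills(state, goal, horizon=4, candidates=256, seed=0):
    """Random shooting in skill space: sample skill sequences, roll them out
    in the skill-dynamics model, and keep the one that ends closest to the goal."""
    rng = np.random.default_rng(seed)
    plans = rng.uniform(-1.0, 1.0, size=(candidates, horizon, state.shape[0]))
    best_plan, best_cost = None, np.inf
    for plan in plans:
        s = state.copy()
        for z in plan:
            s = skill_dynamics(s, z)  # model rollout, no environment interaction
        cost = float(np.linalg.norm(s - goal))
        if cost < best_cost:
            best_plan, best_cost = plan, cost
    return best_plan, best_cost

start = np.zeros(2)
goal = np.array([1.0, -1.0])
plan, cost = plan_skills(start, goal)
```

The planner only ever queries the skill-level model, which is the point of learning a skill-conditioned transition function: the task reward enters at planning time, not during skill learning.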
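This connects mutual-information-based exploration with model-based reinforcement learning.
Intuition: maximize the mutual information between the state sequence and the skill, and minimize the mutual information between (state, skill) and the action. For why, see the appendix screenshot below.
Minimize the mutual information between z and x; maximize the mutual information between z and r.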
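Read as an information bottleneck, this note can be written as a single objective. The block below is a sketch under assumptions: x is taken to be the input being compressed, r the variable whose information should be kept, z the representation, and β > 0 a trade-off weight that does not appear in the notes.

```latex
\min_{p(z \mid x)} \; I(Z; X) \;-\; \beta \, I(Z; R), \qquad \beta > 0
```

A larger β favors keeping information about r; a smaller β favors compressing x more aggressively.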
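In the figure: (9) p is the skill-based policy over actions; (10) q is the abstraction of the world model, defined over skills rather than actions.
The goal is to bring p in (9) and q in (10) closer together by optimizing π in (9).
Reference 63 (2001), as cited:
Continuing with the paper:
Equation 7 is a function of two variables, and each variable is optimized separately; training π model-free requires an immediate reward r (9).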
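The immediate reward for the model-free policy update can be sketched as follows. The reward form here, log q(s'|s,z) minus the log of the average of q(s'|s,z_i) over L skills sampled from the prior, is my reading of the DADS intrinsic reward and should be checked against the paper. The Gaussian `log_q`, its `sigma`, and the uniform prior are hypothetical stand-ins for the learned skill dynamics.

```python
import numpy as np

def log_q(next_state, state, skill, sigma=0.1):
    """Hypothetical skill dynamics q(s' | s, z): isotropic Gaussian centred at s + z."""
    diff = next_state - (state + skill)
    d = diff.shape[-1]
    return -0.5 * np.sum(diff**2, axis=-1) / sigma**2 - d * np.log(sigma * np.sqrt(2 * np.pi))

def intrinsic_reward(state, skill, next_state, prior_skills):
    """log q(s'|s,z) - log((1/L) * sum_i q(s'|s,z_i)), with z_i drawn from the skill prior."""
    log_num = log_q(next_state, state, skill)
    log_all = log_q(next_state, state, prior_skills)    # one value per prior sample
    m = log_all.max()
    log_den = m + np.log(np.mean(np.exp(log_all - m)))  # numerically stable log-mean-exp
    return float(log_num - log_den)

rng = np.random.default_rng(0)
prior = rng.uniform(-1.0, 1.0, size=(64, 2))  # L = 64 skills from an assumed uniform prior
s = np.zeros(2)
z = np.array([0.8, -0.6])

r_match = intrinsic_reward(s, z, s + z, prior)     # transition the skill predicts well
r_mismatch = intrinsic_reward(s, z, s - z, prior)  # transition other skills explain better
```

In the alternating scheme described above, one step would fit q on collected transitions and the other would update π with this reward while q is held fixed.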
For the derivation of Equations 6 and 7, see:
For Chapter 4 and the remaining parts, see the original paper.
Recommended related papers
THERML: THE THERMODYNAMICS OF MACHINE LEARNING
ABSTRACT
In this work we offer an information-theoretic framework for representation learning that connects with a wide class of existing objectives in machine learning. We develop a formal correspondence between this work and thermodynamics and discuss its implications.
Information Bottleneck and its Applications in Deep Learning
KL-regularization papers: 2 DeepMind papers; 1 BERT mutual-information paper; 1 Uber POMDP imitation paper.
Related definitions:
-----------------------------------------------------------------------------------------------------
6.1 Compressing a large number of skills
Predictability is extremely important: it is what gives control over the environment.
--------------------------------------------