Source: AI研习社, https://mp.weixin.qq.com/s/Fi904VUeUAQURSZrUQCarw
by wwxFromTju
Project page:
https://github.com/wwxFromTju/awesome-reinforcement-learning-zh
2018-11-10:
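1. Added OpenAI's Spinning Up
2. Added the course by Hung-yi Lee (李宏毅) of National Taiwan University
3. Added the Multi-Agent Reinforcement Learning Tutorial given at SJTU by Prof. Jun Wang (UCL) and Prof. Weinan Zhang (SJTU)
4. Updated the UCB and CMU DRL courses to fall 2018
5. Updated Sutton's book to the final version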
Contents
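Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction. Updated to the final version of the second edition (click "online draft"): link. Because the official copy is hosted on Google Docs, I downloaded a copy and put it on GitHub; grab it if you need it: link
Note: the print edition can now be ordered. A classmate and I each ordered a copy from Amazon overseas, and they have not arrived yet. Within China, JD and Amazon China are options, though somewhat more expensive.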
Csaba Szepesvari, Algorithms for Reinforcement Learning link
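As a "book" this one is a rather mixed bag: online docs, accompanying code, and accompanying exercises. I strongly recommend reading it alongside the UCL course; I skimmed through it and found it quite good.
However, it mentions neither the UCL and UCB courses below nor Sutton's book above, so reading them together may work better.
Online docs: link
Introduction to reinforcement learning basics: link
Advice on deep reinforcement learning: link
Code: link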
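Course homepage: link
This one is fairly old. There is a newer version on Google Drive, which I will tidy up when I find the time.
Note: this is the course David Silver taught at UCL in 2015. He now seems to be at the peak of his career at DeepMind, so a new course will probably have to wait until the day he decides to return to academia and train students. Highly recommended for beginners who want to build up the basic RL concepts. Course homepage: link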
Slides:
Lecture 1: Introduction to Reinforcement Learning link
Lecture 2: Markov Decision Processes link
Lecture 3: Planning by Dynamic Programming link
Lecture 4: Model-Free Prediction link
Lecture 5: Model-Free Control link
Lecture 6: Value Function Approximation link
Lecture 7: Policy Gradient Methods link
Lecture 8: Integrating Learning and Planning link
Lecture 9: Exploration and Exploitation link
Lecture 10: Case Study: RL in Classic Games link
Note: this is the spring 2018 offering. Course homepage: link
Slides:
Introduction to Reinforcement Learning link
How to act given know how the world works. Tabular setting. Markov processes. Policy search. Policy iteration. Value iteration link
Learning to evaluate a policy when don't know how the world works. link
Model-free learning to make good decisions. Q-learning. SARSA. link
Scaling up: value function approximation. Deep Q Learning. link
Deep reinforcement learning continued. link
Imitation Learning. link
Policy search. link
Policy search. link
Midterm review. link
Fast reinforcement learning (Exploration/Exploitation) Part I. link
Fast reinforcement learning (Exploration/Exploitation) Part II. link
Batch Reinforcement Learning. link
Monte Carlo Tree Search. link
Human in the loop RL with a focus on transfer learning. link
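
To give a flavor of the "tabular setting" and "value iteration" topics named in the list above, here is a minimal sketch. It is not taken from the course materials: the two-state MDP, its rewards, and the names `P`, `GAMMA`, `q_value`, and `value_iteration` are all made up for illustration.

```python
# Hypothetical 2-state, 2-action MDP; every number here is an assumption.
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
GAMMA = 0.9  # discount factor (assumed)


def q_value(V, s, a):
    """Expected one-step return of taking action a in state s, then following V."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])


def value_iteration(tol=1e-8):
    """Apply the Bellman optimality backup until the values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        new_V = {s: max(q_value(V, s, a) for a in P[s]) for s in P}
        delta = max(abs(new_V[s] - V[s]) for s in P)
        V = new_V
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}
    return V, policy


V, policy = value_iteration()
print(V, policy)
```

In this toy MDP the greedy policy picks action 1 in both states, since staying in state 1 pays the largest reward at every step.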
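Note: while interning on the Alibaba advertising team, I was fortunate to work on a paper with Prof. Wang and Prof. Zhang. Along the way I saw how lively and strong Prof. Wang's thinking is. Prof. Zhang also strikes me as a rising star in computer science in China, well worth following.
Course homepage: link
Fundamentals of Reinforcement Learning link
Fundamentals of Game Theory link
Learning in Repeated Games link
Multi-Agent Reinforcement Learning link link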
Course homepage: [link](http://speech.ee.ntu.edu.tw/~tlkagk/courses/)
Videos are available on Bilibili: link
Course homepage: link
Update: fall 2018
Slides:
Introduction and Course Overview link
Supervised Learning and Imitation link
TensorFlow and Neural Nets Review Session (notebook) link
Reinforcement Learning Introduction link
Policy Gradients Introduction link
Actor-Critic Introduction link
Value Functions and Q-Learning link
Advanced Q-Learning Algorithms link
Advanced Policy Gradients link
Optimal Control and Planning link
Model-Based Reinforcement Learning link
Advanced Model Learning and Images link
Learning Policies by Imitating Other Policies link
Probability and Variational Inference Primer link
Connection between Inference and Control link
Inverse Reinforcement Learning link
Exploration link, link
Transfer Learning and Multi-Task Learning link
Meta-Learning link
Parallelism and RL System Design link
Advanced Imitation Learning and Open Problems link
Update: fall 2018
Fall 2018 course homepage: link
2017 course homepage: link
Slides:
Introduction link
Markov decision processes (MDPs), POMDPs link
Solving known MDPs: Dynamic Programming link
Policy iteration, Value iteration, Asynchronous DP link
Monte Carlo Learning, Temporal difference learning, Q learning link
Temporal difference learning (Tom), Planning and learning: Dyna, Monte Carlo tree search link
Deep NN Architectures for RL link
Recitation on Monte Carlo Tree Search link
VF approximation, MC, TD with VF approximation, Control with VF approximation link
Deep Q Learning: Double Q learning, replay memory link
Policy Gradients link link
Advanced Policy Gradients link
Evolution Methods, Natural Gradients link
Natural Policy Gradients, TRPO, PPO, ACKTR link
Pathwise Derivatives, DDPG, multigoal RL, HER link
Exploration vs. Exploitation link link
Exploration and RL in Animals link link
Model-based Reinforcement Learning link
Imitation Learning link
Maximum Entropy Inverse RL, Adversarial imitation learning link
Recitation: Trajectory optimization - iterative LQR link
Learning to learn, one shot learning link