https://github.com/brylevkirill/notes/blob/master/Reinforcement%20Learning.md
Reinforcement Learning is learning to maximize the expected sum of future rewards for a sequence of actions made by an agent in an environment whose stochastic state is unknown to the agent and depends on its actions.
Reinforcement Learning in the general case is learning to act through trial and error with no provided models, labels, demonstrations or supervision signals other than delayed rewards for the agent's actions.
Reinforcement Learning poses significant challenges beyond pattern recognition, including exploration, credit assignment, stability and safety.
definition by Sergey Levine video
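The definition above (an agent acting by trial and error, seeing only delayed rewards) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and does not come from the notes: the `ChainEnv` corridor, its `slip` noise, the simplified `reset`/`step` interface, the hyperparameters, and the choice of tabular Q-learning as one of many possible algorithms.

```python
import random


class ChainEnv:
    """Hypothetical corridor: start at the left end, reward 1 only at the right end."""

    def __init__(self, n_states=5, slip=0.1, seed=1):
        self.n_states = n_states
        self.slip = slip                  # probability that a move is reversed
        self.rng = random.Random(seed)
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action 0 = left, 1 = right; the transition is stochastic
        if self.rng.random() < self.slip:
            action = 1 - action
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0     # delayed reward: nothing until the goal
        return self.state, reward, done


def q_learning(env, episodes=500, alpha=0.1, gamma=0.95,
               epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(env.n_states)]
    for _ in range(episodes):
        s = env.reset()
        for _ in range(max_steps):
            # exploration: random action with probability epsilon or on ties
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s2, r, done = env.step(a)
            # credit assignment: bootstrap from the value of the next state
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q


q = q_learning(ChainEnv())
# greedy action in each non-terminal state (0 = left, 1 = right)
greedy_policy = [0 if left > right else 1 for left, right in q[:-1]]
print(greedy_policy)
```

The agent is never told which action is correct. It only observes states and a reward at the end of the corridor, and the learned values propagate that reward back to earlier states.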
"Reinforcement Learning is as hard as any problem in computer science, since any task with a computable description can be formulated in it."
"Reinforcement Learning is a general-purpose framework for decision-making:
"Deep Learning is a general-purpose framework for representation learning:
"We seek a single agent which can solve any human-level task:
(David Silver)
"Reinforcement Learning in Industry" by Nicolas Le Roux video
personalized web services at Microsoft (Custom Decision Service paper summary)
"Personalized Web Services" chapter of book by Richard Sutton and Andrew Barto
datacenter cooling at Google (paper)
"Deep Reinforcement Learning: An Overview" by Yuxi Li paper (slides)
other applications
"Why Tool AIs Want to Be Agent AIs" by Gwern Branwen:
"The logical extension of these neural networks all the way down papers is that an actor like Google / Baidu / Facebook / MS could effectively turn neural networks into a black box: a user/developer uploads through an API a dataset of input/output pairs of a specified type and a monetary loss function, and a top-level neural network running on a large GPU cluster starts autonomously optimizing over architectures & hyperparameters for the neural network design which balances GPU cost and the monetary loss, interleaved with further optimization over the thousands of previous submitted tasks, sharing its learning across all of the datasets / loss functions / architectures / hyperparameters, and the original user simply submits future data through the API for processing by the best neural network so far."
"A 'Brief' History of Game AI Up To AlphaGo" by Andrey Kurenkov
"AI for Classic Games" by David Silver video
"From TD(λ) to AlphaGo: Games, Neural Nets, Reinforcement Learning and Rollouts" by Gerry Tesauro video
paper
summary
"Mastering the Game of Go with Deep Neural Networks and Tree Search" by Silver et al. paper
summary
"Combining Online and Offline Knowledge in UCT" by Gelly and Silver paper (talk video)
AlphaGo Zero overview by David Silver video
AlphaGo Zero overview by Demis Hassabis video
AlphaGo overview by Demis Hassabis video
AlphaGo overview by David Silver video
AlphaGo overview by Aja Huang video
"Google AlphaGo is a historical tour of AI ideas: 70s (Alpha-Beta), 80s/90s (RL & self-play), 00's (Monte-Carlo), 10's (deep neural networks)."
history of ideas by Richard Sutton, Csaba Szepesvari, Michael Bowling, Ryan Hayward, Martin Muller video
"AlphaGo, In Context" by Andrej Karpathy
AlphaGo documentary video
AlphaGo vs Lee Sedol match:
game 1: overview video + overview video + overview text + overview text
game 2: overview video + overview video + overview text + overview text
game 3: overview video + overview video + overview text + overview text
game 4: overview video + overview video + overview text + overview text
game 5: overview video + overview video + overview text + overview text
AlphaGo Master vs Ke Jie match:
game 1: overview video + overview text
game 2: overview video + overview text
game 3: overview video + overview text
paper
"Safe and Nested Subgame Solving for Imperfect-Information Games" by Noam Brown and Tuomas Sandholm paper (talk video)
"Depth-Limited Solving for Imperfect-Information Games" by Brown, Sandholm, Amos paper
Libratus overview by Tuomas Sandholm video
Libratus overview by Tuomas Sandholm video
Libratus overview by Noam Brown video
Libratus overview by Noam Brown video
"Safe and Nested Subgame Solving for Imperfect-Information Games" by Noam Brown video
"Reduced Space and Faster Convergence in Imperfect-Information Games via Pruning" by Noam Brown video
"The State of Techniques for Solving Large Imperfect-Information Games" by Tuomas Sandholm video
"The State of Techniques for Solving Large Imperfect-Information Games, Including Poker" by Tuomas Sandholm video
discussion with Noam Brown and Tuomas Sandholm
discussion with Noam Brown audio
discussion with Tuomas Sandholm audio
Libratus vs top professional players match discussion video
paper
"DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker" by Moravcik et al. paper
summary
http://deepstack.ai
http://twitter.com/DeepStackAI
DeepStack overview by Michael Bowling video
DeepStack overview by Michael Bowling video
DeepStack overview by Michael Bowling video
DeepStack overview by Michael Bowling video
discussion with Michael Bowling
discussion with Michael Johanson and Dustin Morrill
discussion with Michael Bowling and Dustin Morrill
DeepStack vs professional players games video
paper
summary
"Giraffe: Using Deep Reinforcement Learning to Play Chess" by Lai paper
summary
"Bootstrapping from Game Tree Search" by Veness et al. paper
summary
"KnightCap: A Chess Program that Learns by Combining TD(lambda) with Game-tree Search" by Baxter et al. paper
AlphaZero overview by David Silver video
AlphaZero overview by Demis Hassabis video
AlphaZero vs Stockfish match:
match highlights video
game 3: overview video
game 5: overview video
game 8: overview video
game 9: overview video
game 10: overview video
video
OpenAI 1v1 bot vs SumaiL game video
OpenAI 1v1 bot vs Arteezy game video
OpenAI 1v1 bot vs Pajkatt game video
paper
FTW agent overview
FTW agents team vs human team video
paper
summary
demo of IntelAct agent video
demo of agents from ViZDoom competition video
paper
summary
demo video
paper
"Analysis of Watson's Strategies for Playing Jeopardy!" by Tesauro et al. paper
"How Watson Learns Superhuman Jeopardy! Strategies" by Gerry Tesauro video
IBM Watson project summary
IBM Watson vs Ken Jennings vs Brad Rutter match video
video
overview by David Silver video
overview by Sergey Levine video
overview by Sergey Levine video
overview by Sergey Levine video
overview by Sergey Levine video
overview by Pieter Abbeel video
overview by Pieter Abbeel video
overview by Pieter Abbeel video
overview by Pieter Abbeel video
"Is (Deep) Reinforcement Learning Barking Up The Wrong Tree?" by Chris Atkeson video
interesting recent papers - imitation learning
introduction by Kevin Frans:
introduction by Massimiliano Patacchiola:
introduction by Benjamin Recht:
introduction by Shakir Mohamed:
overview by David Silver video
overview by Fedor Ratnikov video (in Russian)
Reinforcement Learning Summer School 2017 video
course by David Silver video
course by Michael Littman video
course from Yandex video (in Russian)
tutorial by Richard Sutton video (write-up)
tutorial by Emma Brunskill video
"Theory of Reinforcement Learning" by Csaba Szepesvari video
"Reinforcement Learning: An Introduction" book by Richard Sutton and Andrew Barto (second edition) (code)
"Reinforcement Learning: An Introduction" book by Richard Sutton and Andrew Barto (first edition)
"Algorithms for Reinforcement Learning" book by Csaba Szepesvari
course notes by Ben Van Roy
course slides by Richard Sutton
exercises and solutions by Shangtong Zhang
exercises and solutions by Denny Britz
exercises and solutions from Yandex
implementations of algorithms from Shangtong Zhang
implementations of algorithms from Dulat Yerzat
implementations of algorithms from Dulat Yerzat
implementations of algorithms from Intel Nervana
implementations of algorithms from RLCode team
implementations of algorithms from OpenAI
"A Brief Survey of Deep Reinforcement Learning" by Arulkumaran et al. paper
"Deep Reinforcement Learning: An Overview" by Yuxi Li paper (slides)
course by Sergey Levine, John Schulman and Chelsea Finn (videos)
course by Ruslan Salakhutdinov and Katerina Fragkiadaki (videos)
Deep RL Bootcamp at Berkeley video
"The Nuts and Bolts of Deep RL Research" by John Schulman video (slides, write-up)
"Deep Reinforcement Learning" workshop at NIPS 2016
"Abstraction in RL" workshop at ICML 2016
"Deep Reinforcement Learning: Frontiers and Challenges" workshop at IJCAI 2016
"Deep Reinforcement Learning" workshop at NIPS 2015
"Novel Trends and Applications in RL" workshop at NIPS 2014
deep learning
characteristics:
challenges:
problems:
components of algorithms (overview by Sergey Levine video):
classifications of methods (overview by Sutton and Barto):
differences video (by David Silver):
differences video (by John Schulman):
differences video (by Csaba Szepesvari)
"Expressivity, Trainability, and Generalization in Machine Learning" by Eric Jang
"Deep Reinforcement Learning Doesn't Work Yet" by Alex Irpan
"Reinforcement Learning Never Worked, and 'Deep' Only Helped a Bit" by Himanshu Sahni
model-based methods:
value-based methods:
policy-based methods:
overview by Michael Littman video
overview by Benjamin Recht
overview by Sergey Levine video
"Utilities" by Pieter Abbeel video
"Rethinking State Action and Reward in Reinforcement Learning" by Satinder Singh video
"Supervised Learning of Behaviors: Deep Learning, Dynamical Systems, and Behavior Cloning" by Sergey Levine video
"Learning Policies by Imitating Optimal Control" by Sergey Levine video
"Advanced Topics in Imitation Learning and Safety" by Chelsea Finn video
"An Invitation to Imitation" by Andrew Bagnell
"Imitation Learning" chapter by Hal Daume
"Global Overview of Imitation Learning" by Attia and Dayan paper
"Imitation Learning: A Survey of Learning Methods" by Hussein et al. paper
interesting papers
introduction (part 2, 20:40) by Pieter Abbeel video
overview by Chelsea Finn video
overview by Chelsea Finn video
tutorial by Johannes Heidecke
"Apprenticeship Learning and Reinforcement Learning with Application to Robotic Control" by Pieter Abbeel paper
"Maximum Entropy Inverse Reinforcement Learning" by Ziebart et al. paper
interesting papers
exploration and intrinsic motivation
"Learning in Brains and Machines: Synergistic and Modular Action" by Shakir Mohamed
Options framework:
overview by Doina Precup video
"Temporal Abstraction in Reinforcement Learning" by Doina Precup video
"Advances in Option Construction: The Option-Critic Architecture" by Pierre-Luc Bacon video
"Progress on Deep Reinforcement Learning with Temporal Abstraction" by Doina Precup video
Feudal framework:
overview by David Silver video
Hierarchical RL workshop video
Abstraction in RL workshop video
interesting papers