
A Large, Complete Collection of Reinforcement Learning Material

Author: CreateAMind (published 2018-07-20)

https://github.com/brylevkirill/notes/blob/master/Reinforcement%20Learning.md

Reinforcement Learning is learning to maximize the expected sum of future rewards over a sequence of actions made by an agent in an environment whose stochastic state is unknown to the agent and depends on its actions.
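Formally, one standard textbook formulation of this objective is the expected discounted return (stated here for reference):

```latex
\max_{\pi} \; \mathbb{E}_{\pi}\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \right],
\qquad 0 \le \gamma < 1
```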

  • introduction
  • applications
  • overview
  • deep reinforcement learning
  • problems
  • exploration and intrinsic motivation
  • bandits
  • contextual bandits
  • model-based methods
  • value-based methods
  • policy-based methods
  • interesting papers
    • applications
    • exploration and intrinsic motivation
    • hierarchical reinforcement learning
    • model-based methods
    • value-based methods
    • policy-based methods
    • behavioral cloning
    • inverse reinforcement learning

introduction

In the general case, Reinforcement Learning is learning to act through trial and error, with no models, labels, demonstrations, or supervision signals provided other than delayed rewards for the agent's actions.

Reinforcement Learning poses significant challenges beyond pattern recognition, including exploration, credit assignment, stability, and safety.

definition by Sergey Levine video

"Reinforcement Learning is as hard as any problem in computer science, since any task with a computable description can be formulated in it."

"Reinforcement Learning is a general-purpose framework for decision-making:

  • Is for an agent with the capacity to act
  • Each action influences the agent's future state
  • Success is measured by a scalar reward signal
  • Goal: select actions to maximize future reward"

"Deep Learning is a general-purpose framework for representation learning:

  • Given an objective
  • Learn representation that is required to achieve objective
  • Directly from raw inputs
  • Using minimal domain knowledge"

"We seek a single agent which can solve any human-level task:

  • Reinforcement Learning defines the objective
  • Deep Learning gives the mechanism
  • Reinforcement Learning + Deep Learning = general intelligence"

(David Silver)


applications

industry

"Reinforcement Learning in Industry" by Nicolas Le Roux video


personalized web services at Microsoft (Custom Decision Service paper, summary); "Personalized Web Services" chapter of book by Richard Sutton and Andrew Barto

datacenter cooling at Google (paper)

"Deep Reinforcement Learning: An Overview" by Yuxi Li paper (slides)

other applications


"Why Tool AIs Want to Be Agent AIs" by Gwern Branwen:

"The logical extension of these neural networks all the way down papers is that an actor like Google / Baidu / Facebook / MS could effectively turn neural networks into a black box: a user/developer uploads through an API a dataset of input/output pairs of a specified type and a monetary loss function, and a top-level neural network running on a large GPU cluster starts autonomously optimizing over architectures & hyperparameters for the neural network design which balances GPU cost and the monetary loss, interleaved with further optimization over the thousands of previous submitted tasks, sharing its learning across all of the datasets / loss functions / architectures / hyperparameters, and the original user simply submits future data through the API for processing by the best neural network so far."


games

"A 'Brief' History of Game AI Up To AlphaGo" by Andrey Kurenkov

"AI for Classic Games" by David Silver video "From TD(λ) to AlphaGo: Games, Neural Nets, Reinforcement Learning and Rollouts" by Gerry Tesauro video


  • Go
    • "Mastering the Game of Go" chapter of book by Richard Sutton and Andrew Barto
    • "Mastering the Game of Go without Human Knowledge" by Silver et al. paper summary
    • "Mastering the Game of Go with Deep Neural Networks and Tree Search" by Silver et al. paper summary
    • "Combining Online and Offline Knowledge in UCT" by Gelly and Silver paper (talk video)
    • AlphaGo Zero overviews by David Silver video and Demis Hassabis video
    • AlphaGo overviews by Demis Hassabis video, David Silver video and Aja Huang video
    • "Google AlphaGo is a historical tour of AI ideas: 70s (Alpha-Beta), 80s/90s (RL & self-play), 00's (Monte-Carlo), 10's (deep neural networks)." history of ideas by Richard Sutton, Csaba Szepesvari, Michael Bowling, Ryan Hayward, Martin Muller video
    • "AlphaGo, In Context" by Andrej Karpathy
    • AlphaGo documentary video
    • AlphaGo vs Lee Sedol match: overview videos and texts for games 1-5
    • AlphaGo Master vs Ke Jie match: overview videos and texts for games 1-3

  • Poker (Libratus)
    • "Science" magazine paper
    • "Safe and Nested Subgame Solving for Imperfect-Information Games" by Noam Brown and Tuomas Sandholm paper (talk video)
    • "Depth-Limited Solving for Imperfect-Information Games" by Brown, Sandholm, Amos paper
    • Libratus overviews by Tuomas Sandholm video and by Noam Brown video
    • "Safe and Nested Subgame Solving for Imperfect-Information Games" by Noam Brown video
    • "Reduced Space and Faster Convergence in Imperfect-Information Games via Pruning" by Noam Brown video
    • "The State of Techniques for Solving Large Imperfect-Information Games" by Tuomas Sandholm video
    • "The State of Techniques for Solving Large Imperfect-Information Games, Including Poker" by Tuomas Sandholm video
    • discussions with Noam Brown and Tuomas Sandholm (text and audio)
    • Libratus vs top professional players match discussion video

  • Poker (DeepStack)
    • "Science" magazine paper
    • "DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker" by Moravcik et al. paper summary
    • http://deepstack.ai
    • http://twitter.com/DeepStackAI
    • DeepStack overviews by Michael Bowling video
    • discussions with Michael Bowling, Michael Johanson and Dustin Morrill
    • DeepStack vs professional players games video

  • Chess
    • "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm" by Silver et al. paper summary
    • "Giraffe: Using Deep Reinforcement Learning to Play Chess" by Lai paper summary
    • "Bootstrapping from Game Tree Search" by Veness et al. paper summary
    • "KnightCap: A Chess Program that Learns by Combining TD(lambda) with Game-tree Search" by Baxter et al. paper
    • AlphaZero overviews by David Silver video and Demis Hassabis video
    • AlphaZero vs Stockfish match: match highlights video; overviews of games 3, 5, 8, 9 and 10 video

  • Dota 2
    • OpenAI Five bots overview
    • specification of reward function
    • OpenAI 1v1 bot overviews
    • OpenAI 1v1 bot vs Dendi, SumaiL, Arteezy and Pajkatt game videos

  • Quake III Arena CTF
    • "Human-level Performance in First-person Multiplayer Games with Population-based Deep Reinforcement Learning" by Jaderberg et al. paper
    • FTW agent overview
    • FTW agents team vs human team video

  • Doom
    • "Learning to Act by Predicting the Future" by Dosovitskiy and Koltun paper summary
    • demo of IntelAct agent video
    • demo of agents from ViZDoom competition video

  • Atari
    • "Human-level Video Game Play" chapter of book by Richard Sutton and Andrew Barto
    • "Playing Atari with Deep Reinforcement Learning" by Mnih et al. paper summary
    • demo video

  • Jeopardy!
    • "Watson's Daily-Double Wagering" chapter of book by Richard Sutton and Andrew Barto
    • "Simulation, Learning and Optimization Techniques in Watson's Game Strategies" by Tesauro et al. paper
    • "Analysis of Watson's Strategies for Playing Jeopardy!" by Tesauro et al. paper
    • "How Watson Learns Superhuman Jeopardy! Strategies" by Gerry Tesauro video
    • IBM Watson project summary
    • IBM Watson vs Ken Jennings vs Brad Rutter match video

  • TD-Gammon
    • "TD-Gammon" chapter of book by Richard Sutton and Andrew Barto
    • overviews by Gerry Tesauro video and David Silver video

robotics

overviews by Sergey Levine video (four talks)

overviews by Pieter Abbeel video (four talks)

"Is (Deep) Reinforcement Learning Barking Up The Wrong Tree?" by Chris Atkeson video

interesting recent papers - imitation learning


overview

introduction by Kevin Frans:

  • basics
  • Markov processes
  • planning
  • model-free methods
  • policy gradient methods
  • model-based methods

introduction by Massimiliano Patacchiola:

  • Dynamic Programming
  • Monte Carlo
  • Temporal Difference
  • Actor-Critic
  • Genetic Algorithms

introduction by Benjamin Recht:

  • "Make It Happen"
  • "Total Control"

introduction by Shakir Mohamed:

  • "Learning in Brains and Machines: Temporal Differences"
  • "Synergistic and Modular Action"

overview by David Silver video; overview by Fedor Ratnikov video (in Russian)


Reinforcement Learning Summer School 2017 video


course by David Silver video; course by Michael Littman video; course from Yandex video (in Russian)


tutorial by Richard Sutton video (write-up); tutorial by Emma Brunskill video

"Theory of Reinforcement Learning" by Csaba Szepesvari video


"Reinforcement Learning: An Introduction" book by Richard Sutton and Andrew Barto (second edition) (code) "Reinforcement Learning: An Introduction" book by Richard Sutton and Andrew Barto (first edition) "Algorithms for Reinforcement Learning" book by Csaba Szepesvari


course notes by Ben Van Roy; course slides by Richard Sutton

exercises and solutions by Shangtong Zhang; exercises and solutions by Denny Britz; exercises and solutions from Yandex

implementations of algorithms from Shangtong Zhang; from Dulat Yerzat (two collections); from Intel Nervana; from the RLCode team; from OpenAI


deep reinforcement learning

"A Brief Survey of Deep Reinforcement Learning" by Arulkumaran et al. paper "Deep Reinforcement Learning: An Overview" by Yuxi Li paper (slides)


course by Sergey Levine, John Schulman and Chelsea Finn (videos); course by Ruslan Salakhutdinov and Katerina Fragkiadaki (videos)

Deep RL Bootcamp at Berkeley video

"The Nuts and Bolts of Deep RL Research" by John Schulman video(slides,write-up)


"Deep Reinforcement Learning" workshop at NIPS 2016 "Abstraction in RL" workshop at ICML 2016 "Deep Reinforcement Learning: Frontiers and Challenges" workshop at IJCAI 2016 "Deep Reinforcement Learning" workshop at NIPS 2015 "Novel Trends and Applications in RL" workshop at NIPS 2014


deep learning


problems

characteristics:

  • can learn any function
  • inherently handles uncertainty
    • uncertainty in actions
    • uncertainty in observations
  • directly maximizes criteria we care about
  • copes with delayed feedback
    • temporal credit assignment problem

challenges:

  • stability (non-stationary and online data)
  • credit assignment (delayed rewards and consequences)
  • exploration vs exploitation (need for trial and error)
  • using learned model of environment

problems:

  • adaptive methods for large number of conditions
  • exploration problem in large MDPs
  • learning and acting under partial information
  • hierarchical learning over multiple time scales
  • sample efficiency
  • algorithms for large or continuous action spaces
  • transfer learning
  • lifelong learning
  • efficient sample-based planning
  • multiagent or distributed learning
  • learning from demonstrations

components of algorithms (overview by Sergey Levine video; see the sketch after this list):

  • generate samples / run the policy
  • fit a model / estimate the return
  • improve the policy
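These three steps can be sketched as a generic training loop. The following minimal Python skeleton is illustrative only; `env`, `estimate_returns` and `improve` are assumed placeholder interfaces, not any specific algorithm from the sources above:

```python
# Generic RL loop: (1) generate samples, (2) estimate returns / fit a model,
# (3) improve the policy. All interfaces here are illustrative assumptions.

def rollout(env, policy):
    """Run the policy in the environment for one episode."""
    states, actions, rewards = [], [], []
    state, done = env.reset(), False
    while not done:
        action = policy(state)
        next_state, reward, done = env.step(action)
        states.append(state); actions.append(action); rewards.append(reward)
        state = next_state
    return states, actions, rewards

def train(env, policy, estimate_returns, improve, num_iterations=100):
    for _ in range(num_iterations):
        trajectories = [rollout(env, policy) for _ in range(10)]   # step 1
        returns = estimate_returns(trajectories)                   # step 2
        policy = improve(policy, trajectories, returns)            # step 3
    return policy
```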

classifications of methods (overview by Sutton and Barto):

  • prediction vs control
  • MDPs vs bandits
  • model-based vs value-based vs policy-based
  • on-policy vs off-policy
  • bootstrapping vs Monte Carlo (contrasted in the update rules below)
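To make the last distinction concrete, compare the standard value updates (following Sutton and Barto): TD(0) bootstraps from its own current estimate, while Monte Carlo waits for the complete observed return G_t:

```latex
\text{TD}(0):\quad V(s_t) \leftarrow V(s_t) + \alpha\,\bigl[r_{t+1} + \gamma V(s_{t+1}) - V(s_t)\bigr]
\qquad
\text{MC}:\quad V(s_t) \leftarrow V(s_t) + \alpha\,\bigl[G_t - V(s_t)\bigr]
```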

reinforcement learning vs supervised learning

differences video (by David Silver):

  • there is no supervisor, only a reward signal
  • feedback is delayed, not instantaneous
  • time really matters (sequential, not i.i.d. data)
  • agent's actions affect subsequent data it receives

differences video (by John Schulman):

  • no full access to an analytic representation of the loss function being optimized; its value has to be queried by interacting with the environment
  • interaction with a stateful environment (unknown, nonlinear, stochastic, arbitrarily complex): the next input depends on previous actions

differences video (by Csaba Szepesvari)


"Expressivity, Trainability, and Generalization in Machine Learning" by Eric Jang "Deep Reinforcement Learning Doesn't Work Yet" by Alex Irpan "Reinforcement Learning Never Worked, and 'Deep' Only Helped a Bit" by Himanshu Sahni


model-based vs value-based vs policy-based methods

model-based methods:

  • build a prediction model for the next state and reward after an action
  • space complexity is asymptotically less than the space required to store the MDP itself
  • define an objective function measuring goodness of the model (e.g. number of bits to reconstruct the next state)
  • plan using the model (e.g. by lookahead; see the sketch after this list)
  • allows reasoning about task-independent aspects of the environment
  • allows for transfer learning across domains and faster learning
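A minimal sketch of planning by lookahead with a learned model, using random shooting over action sequences; `model.predict` and `action_space.sample` are assumed interfaces, purely for illustration:

```python
import numpy as np

def plan_by_lookahead(model, state, action_space, horizon=10, num_candidates=100):
    """Pick the first action of the best random action sequence under the model.

    Assumed interfaces (illustrative only):
      model.predict(state, action) -> (next_state, reward)
      action_space.sample() -> action
    """
    best_return, best_first_action = -np.inf, None
    for _ in range(num_candidates):
        actions = [action_space.sample() for _ in range(horizon)]
        s, total = state, 0.0
        for a in actions:
            s, r = model.predict(s, a)   # roll the candidate sequence through the model
            total += r
        if total > best_return:
            best_return, best_first_action = total, actions[0]
    return best_first_action
```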

value-based methods:

  • estimate the optimal action-value function Q*(s,a): the expected total reward from taking action a in state s and acting optimally thereafter
  • this is the maximum value achievable under any policy (a tabular Q-learning sketch follows this list)
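A minimal tabular Q-learning sketch of this idea (a standard algorithm; the `env` interface with a discrete `env.actions` list is an assumption for illustration):

```python
import random
from collections import defaultdict

def q_learning(env, episodes=1000, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Learn Q*(s, a) from interaction. Assumed interfaces (illustrative only):
    env.reset() -> state; env.step(a) -> (next_state, reward, done); env.actions: list.
    """
    Q = defaultdict(float)  # Q[(state, action)], defaults to 0.0
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # bootstrapped target: r + gamma * max_a' Q(s', a'), zeroed at episode end
            target = reward + gamma * (not done) * max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
    return Q
```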

policy-based methods:

  • search directly for the optimal policy (the behaviour function selecting actions given states) achieving maximum expected reward
  • often simpler to represent and learn a good policy than a good state-value or action-value function (e.g. for a robot grasping an object)
  • a state value function doesn't prescribe actions (a dynamics model becomes necessary)
  • an action value function requires solving a maximization problem over actions (a challenge for continuous / high-dimensional action spaces)
  • focus on discriminating between several actions instead of estimating values for every state-action pair
  • the true objective of expected cost is optimized (rather than a surrogate like Bellman error); see the gradient estimator after this list
  • suboptimal values do not necessarily give suboptimal actions in every state (but optimal values do give optimal actions)
  • easier generalization to continuous action spaces
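The canonical REINFORCE policy-gradient estimator makes this direct optimization of the objective explicit, with G_t the return from time t:

```latex
\nabla_{\theta} J(\theta)
= \mathbb{E}_{\tau \sim \pi_{\theta}}\left[ \sum_{t} \nabla_{\theta} \log \pi_{\theta}(a_t \mid s_t)\, G_t \right]
```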

overview by Michael Littman video; overview by Benjamin Recht


forms of supervision
  • scalar rewards
  • demonstrated behavior (imitation, inferring reward)
  • self-supervision, prediction (model-based control)
  • auxiliary objectives
    • additional sensing modalities
    • learning related tasks
    • task-relevant properties of environment
    • exploration and intrinsic motivation

overview by Sergey Levine video

"Utilities" by Pieter Abbeel video "Rethinking State Action and Reward in Reinforcement Learning" by Satinder Singh video


imitation learning / behavioral cloning
  • learn an agent's behavior in an environment with an unknown cost function by imitating another agent's behavior (in its simplest form, plain supervised learning on demonstrations; see the sketch below)
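A minimal behavioral-cloning sketch, assuming demonstrations are available as arrays and using a generic scikit-learn classifier purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Behavioral cloning as plain supervised learning: fit a classifier from
# demonstrated states to demonstrated actions. The demonstrations below
# are random placeholders standing in for expert data.
demo_states = np.random.randn(1000, 4)
demo_actions = (demo_states[:, 0] > 0).astype(int)

policy = LogisticRegression().fit(demo_states, demo_actions)

def act(state):
    # imitate the expert: take the action the classifier predicts
    return policy.predict(state.reshape(1, -1))[0]
```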

"Supervised Learning of Behaviors: Deep Learning, Dynamical Systems, and Behavior Cloning" by Sergey Levine video "Learning Policies by Imitating Optimal Control" by Sergey Levine video "Advanced Topics in Imitation Learning and Safety" by Chelsea Finn video

"An Invitation to Imitation" by Andrew Bagnell "Imitation Learning" chapter by Hal Daume

"Global Overview of Imitation Learning" by Attia and Dayan paper "Imitation Learning: A Survey of Learning Methods" by Hussein et al. paper

interesting papers


inverse reinforcement learning
  • infer the underlying reward structure guiding an agent's behavior from observations and a model of the environment
  • learn the reward structure for modelling purposes or for imitating another agent's behavior (apprenticeship); see the maximum-entropy formulation sketched below
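The maximum-entropy formulation (Ziebart et al., cited below) is a common instantiation: trajectories are modelled as exponentially more likely the higher their cumulative reward under the learned reward function, and the parameters θ are fit by maximizing the likelihood of the demonstrations:

```latex
P(\tau \mid \theta) \propto \exp\left( R_{\theta}(\tau) \right)
= \exp\left( \sum_{t} r_{\theta}(s_t, a_t) \right)
```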

introduction (part 2, 20:40) by Pieter Abbeel video; overviews by Chelsea Finn (two videos)

tutorial by Johannes Heidecke

"Apprenticeship Learning and Reinforcement Learning with Application to Robotic Control" by Pieter Abbeel paper "Maximum Entropy Inverse Reinforcement Learning" by Ziebart et al. paper

interesting papers


exploration and intrinsic motivation


hierarchical reinforcement learning
  • simplify the dimensionality of the action space over which we need to reason
  • enable quick planning and execution of low-level actions (such as robot movements)
  • provide a simple mechanism connecting plans and intentions to commands at the level of execution
  • support rapid learning and generalisation (of the kind humans are capable of)

"Learning in Brains and Machines: Synergistic and Modular Action" by Shakir Mohamed

Options framework: overview by Doina Precup video; "Temporal Abstraction in Reinforcement Learning" by Doina Precup video; "Advances in Option Construction: The Option-Critic Architecture" by Pierre-Luc Bacon video; "Progress on Deep Reinforcement Learning with Temporal Abstraction" by Doina Precup video
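For reference, an option in this framework (Sutton, Precup and Singh) is a triple: an initiation set I, an intra-option policy π, and a termination condition β. A minimal structural sketch, with illustrative names and an assumed `env.step` interface:

```python
import random
from dataclasses import dataclass
from typing import Callable

@dataclass
class Option:
    initiation: Callable[[object], bool]    # I: may the option start in this state?
    policy: Callable[[object], object]      # pi: state -> primitive action
    termination: Callable[[object], float]  # beta: state -> probability of stopping

def run_option(env, state, option):
    """Execute an option until its termination condition fires or the episode ends."""
    total_reward = 0.0
    while True:
        state, reward, done = env.step(option.policy(state))
        total_reward += reward
        if done or random.random() < option.termination(state):
            return state, total_reward, done
```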

Feudal framework: overview by David Silver video

Hierarchical RL workshop video; Abstraction in RL workshop video

interesting papers


off-policy learning