ExpectiMinimax Optimal strategy in games with chance nodes, Melkó E., Nagy B. (2007).
Sparse sampling A sparse sampling algorithm for near-optimal planning in large Markov decision processes, Kearns M. et al. (2002).
MCTS Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, Coulom R. (2006).
UCT Bandit based Monte-Carlo Planning, Kocsis L., Szepesvári C. (2006).
Bandit Algorithms for Tree Search, Coquelin P-A., Munos R. (2007).
OPD Optimistic Planning for Deterministic Systems, Hren J., Munos R. (2008).
OLOP Open Loop Optimistic Planning, Bubeck S., Munos R. (2010).
LGP Logic-Geometric Programming: An Optimization-Based Approach to Combined Task and Motion Planning, Toussaint M. (2015). Video
AlphaGo Mastering the game of Go with deep neural networks and tree search, Silver D. et al. (2016).
AlphaGo Zero Mastering the game of Go without human knowledge, Silver D. et al. (2017).
AlphaZero Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, Silver D. et al. (2017).
TrailBlazer Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning, Grill J. B., Valko M., Munos R. (2017).
MCTSnets Learning to search with MCTSnets, Guez A. et al. (2018).
ADI Solving the Rubik's Cube Without Human Knowledge, McAleer S. et al. (2018).
OPC/SOPC Continuous-action planning for discounted infinite-horizon nonlinear optimal control with Lipschitz values, Busoniu L., Pall E., Munos R. (2018).
Control Theory
(book) Constrained Control and Estimation, Goodwin G. (2005).
PI² A Generalized Path Integral Control Approach to Reinforcement Learning, Theodorou E. et al. (2010).
PI²-CMA Path Integral Policy Improvement with Covariance Matrix Adaptation, Stulp F., Sigaud O. (2010).
iLQG A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems, Todorov E. (2005).
iLQG+ Synthesis and stabilization of complex behaviors through online trajectory optimization, Tassa Y. et al. (2012).
Model Predictive Control
(book) Model Predictive Control, Camacho E. (1995).
(book) Predictive Control With Constraints, Maciejowski J. M. (2002).
Linear Model Predictive Control for Lane Keeping and Obstacle Avoidance on Low Curvature Roads, Turri V. et al. (2013).
MPCC Optimization-based autonomous racing of 1:43 scale RC cars, Liniger A. et al. (2014). Video 1 | Video 2
MIQP Optimal trajectory planning for autonomous driving integrating logical constraints: An MIQP perspective, Qian X., Altché F., Bender P., Stiller C., de La Fortelle A. (2016).
Safe Control
Robust Control
Minimax analysis of stochastic problems, Shapiro A., Kleywegt A. (2002).
Robust DP Robust Dynamic Programming, Iyengar G. (2005).
Robust Planning and Optimization, Laumanns M. (2011). (lecture notes)
Robust Markov Decision Processes, Wiesemann W., Kuhn D., Rustem B. (2012).
Coarse-Id On the Sample Complexity of the Linear Quadratic Regulator, Dean S., Mania H., Matni N., Recht B., Tu S. (2017).
Tube-MPPI Robust Sampling Based Model Predictive Control with Sparse Objective Information, Williams G. et al. (2018). Video
Risk-Averse Control
A Comprehensive Survey on Safe Reinforcement Learning, García J., Fernández F. (2015).
RA-QMDP Risk-averse Behavior Planning for Autonomous Driving under Uncertainty, Naghshvar M. et al. (2018).
Constrained Control
ICS Will the Driver Seat Ever Be Empty?, Fraichard T. (2014).
RSS On a Formal Model of Safe and Scalable Self-driving Cars, Shalev-Shwartz S. et al. (2017).
HJI-reachability Safe learning for control: Combining disturbance estimation, reachability analysis and reinforcement learning with systematic exploration, Heidenreich C. (2017).
BFTQ A Fitted-Q Algorithm for Budgeted MDPs, Carrara N. et al. (2018).
MPC-HJI On Infusing Reachability-Based Safety Assurance within Probabilistic Planning Frameworks for Human-Robot Vehicle Interactions, Leung K. et al. (2018).
Uncertain Dynamical Systems
Simulation of Controlled Uncertain Nonlinear Systems, Tibken B., Hofer E. (1995).