Blog | A Roundup of Reinforcement Learning Papers at NIPS over the Past 10 Years (100+ Papers, 2008-2018)

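This article was originally published on the WeChat public account 深度强化学习算法 and is reposted by AI研习社 with permission. Readers are welcome to follow that account and the AI研习社 blog column.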

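NIPS (NeurIPS), in full the Conference and Workshop on Neural Information Processing Systems, is an international conference on machine learning and computational neuroscience. It is held every December and is organized by the NIPS Foundation. NIPS is a top-tier venue in machine learning, and the China Computer Federation (CCF) ranking of international academic conferences lists it as a Class A conference in artificial intelligence. From 1987 to 2000 NIPS was held in Denver, USA, and from 2001 to 2010 in Vancouver, Canada. After that it was held in Granada, Spain (2011), Lake Tahoe (2012 to 2013), and Montreal, Canada (2014 to 2015).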

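Since deep learning took off several years ago, NIPS has become one of the conferences most closely watched by both academia and industry. Attendance climbed from about 2,000 five years earlier to more than 8,000 in 2017. Paper submissions also set a record in 2017, reaching 3,240. Recent statistics show that NIPS 2018 received as many as 4,900 submissions, over 1,600 more than the previous year. The number of papers in every subfield has grown sharply in recent years. This article collects the 108 reinforcement learning papers accepted at NIPS over the past 10 years, summarized below.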

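The per-year counts show that 2012 was a watershed for reinforcement learning: after a surge of interest the number of accepted papers fell back, and then from 2015 it climbed steadily to reach 38 accepted papers. The titles of the accepted papers for each year are listed below.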

2008 (3)

  • Near-optimal Regret Bounds for Reinforcement Learning
  • Stress, noradrenaline, and realistic prediction of mouse behaviour using reinforcement learning
  • Optimization on a Budget: A Reinforcement Learning Approach

2009 (3)

  • Manifold Embeddings for Model-Based Reinforcement Learning under Partial Observability
  • Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining
  • Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference

2010 (5)

  • Nonparametric Bayesian Policy Priors for Reinforcement Learning
  • Constructing Skill Trees for Reinforcement Learning Agents from Demonstration Trajectories
  • Feature Construction for Inverse Reinforcement Learning
  • PAC-Bayesian Model Selection for Reinforcement Learning
  • Interval Estimation for Reinforcement-Learning Algorithms in Continuous-State Domains

2011 (7)

  • Nonlinear Inverse Reinforcement Learning with Gaussian Processes
  • A Reinforcement Learning Theory for Homeostatic Regulation
  • Action-Gap Phenomenon in Reinforcement Learning
  • Optimal Reinforcement Learning for Gaussian Systems
  • Reinforcement Learning using Kernel-Based Stochastic Factorization
  • MAP Inference for Bayesian Inverse Reinforcement Learning
  • Selecting the State-Representation in Reinforcement Learning

2012 (11)

  • Bayesian Hierarchical Reinforcement Learning
  • Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress
  • Nonparametric Bayesian Inverse Reinforcement Learning for Multiple Reward Functions
  • Inverse Reinforcement Learning through Structured Classification
  • Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search
  • On-line Reinforcement Learning Using Incremental Kernel-Based Stochastic Factorization
  • Online Regret Bounds for Undiscounted Continuous Reinforcement Learning
  • Neurally Plausible Reinforcement Learning of Working Memory Tasks
  • Transferring Expectations in Model-based Reinforcement Learning
  • Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems
  • Cost-Sensitive Exploration in Bayesian Reinforcement Learning

2013 (3)

  • Reinforcement Learning in Robust Markov Decision Processes
  • Policy Shaping: Integrating Human Feedback with Reinforcement Learning
  • (More) Efficient Reinforcement Learning via Posterior Sampling

2014 (5)

  • Model-based Reinforcement Learning and the Eluder Dimension
  • Sparse Multi-Task Reinforcement Learning
  • Difference of Convex Functions Programming for Reinforcement Learning
  • Near-optimal Reinforcement Learning in Factored MDPs
  • RAAM: The Benefits of Robustness in Approximating Aggregated MDPs in Reinforcement Learning

2015 (2)

  • Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning
  • Inverse Reinforcement Learning with Locally Consistent Reward Functions

2016 (7)

  • Tree-Structured Reinforcement Learning for Sequential Object Localization
  • Safe and Efficient Off-Policy Reinforcement Learning
  • Contextual-MDPs for PAC Reinforcement Learning with Rich Observations
  • Learning to Communicate with Deep Multi-Agent Reinforcement Learning
  • Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
  • Cooperative Inverse Reinforcement Learning
  • Linear Feature Encoding for Reinforcement Learning

2017 (24)

  • Hybrid Reward Architecture for Reinforcement Learning
  • Shallow Updates for Deep Reinforcement Learning
  • Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning
  • Optimistic posterior sampling for reinforcement learning: worst-case regret bounds
  • Cold-Start Reinforcement Learning with Softmax Policy Gradient
  • Log-normality and Skewness of Estimated State/Action Values in Reinforcement Learning
  • Safe Model-based Reinforcement Learning with Stability Guarantees
  • Data-Efficient Reinforcement Learning in Continuous State-Action Gaussian-POMDPs
  • Deep Reinforcement Learning from Human Preferences
  • EX2: Exploration with Exemplar Models for Deep Reinforcement Learning
  • #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
  • Compatible Reward Inverse Reinforcement Learning
  • Bridging the Gap Between Value and Policy Based Reinforcement Learning
  • Online Reinforcement Learning in Stochastic Games
  • Reinforcement Learning under Model Mismatch
  • A multi-agent reinforcement learning model of common-pool resource appropriation
  • Imagination-Augmented Agents for Deep Reinforcement Learning
  • Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
  • Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning
  • Repeated Inverse Reinforcement Learning
  • A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning
  • Dynamic Safe Interruptibility for Decentralized Multi-Agent Reinforcement Learning
  • Agent Reinforcement Learning

2018 (38)

  • The Importance of Sampling in Meta-Reinforcement Learning
  • Learning Temporal Point Processes via Reinforcement Learning
  • Data-Efficient Hierarchical Reinforcement Learning
  • Fast deep reinforcement learning using online adjustments from the past
  • Inference Aided Reinforcement Learning for Incentive Mechanism Design in Crowdsourcing
  • A Lyapunov-based Approach to Safe Reinforcement Learning
  • Reinforcement Learning of Theorem Proving
  • Simple random search of static linear policies is competitive for reinforcement learning
  • Meta-Gradient Reinforcement Learning
  • Reinforcement Learning for Solving the Vehicle Routing Problem
  • Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning
  • REFUEL: Exploring Sparse Features in Deep Reinforcement Learning for Fast Disease Diagnosis
  • Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents
  • Distributed Multitask Reinforcement Learning with Quadratic Convergence
  • Constrained Cross-Entropy Method for Safe Reinforcement Learning
  • Reinforcement Learning with Multiple Experts: A Bayesian Model Combination Approach
  • Verifiable Reinforcement Learning via Policy Extraction
  • Deep Reinforcement Learning of Marked Temporal Point Processes
  • Evolution-Guided Policy Gradient in Reinforcement Learning
  • Meta-Reinforcement Learning of Structured Exploration Strategies
  • Diversity-Driven Exploration Strategy for Deep Reinforcement Learning
  • Genetic-Gated Networks for Deep Reinforcement Learning
  • Visual Reinforcement Learning with Imagined Goals
  • Unsupervised Video Object Segmentation for Deep Reinforcement Learning
  • Total stochastic gradient algorithms and applications in reinforcement learning
  • Fighting Boredom in Recommender Systems with Linear Reinforcement Learning
  • Randomized Prior Functions for Deep Reinforcement Learning
  • Scalable Coordinated Exploration in Concurrent Reinforcement Learning
  • Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization
  • Negotiable Reinforcement Learning for Pareto Optimal Sequential Decision-Making
  • Teaching Inverse Reinforcement Learners via Features and Demonstrations
  • Hierarchical Reinforcement Learning for Zero-shot Generalization with Subtask Dependencies
  • Lifelong Inverse Reinforcement Learning
  • Multiple-Step Greedy Policies in Approximate and Online Reinforcement Learning
  • Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
  • Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion

Originally published on the WeChat public account AI研习社 (okweiwu)

Original publication date: 2019-05-18
