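[Paper Recommendations] Eight Recent Reinforcement Learning Papers: Residual Networks, QMIX, Meta-Learning, Dynamic Rate Allocation, Hierarchical Reinforcement Learning, Abstractive Summarization, Fast Object Detection, SOM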

[Overview] The Zhuanzhi (专知) content team has compiled eight recent papers on reinforcement learning and introduces them below.

1. BlockDrop: Dynamic Inference Paths in Residual Networks

Authors: Zuxuan Wu, Tushar Nagarajan, Abhishek Kumar, Steven Rennie, Larry S. Davis, Kristen Grauman, Rogerio Feris

Abstract: Very deep convolutional neural networks offer excellent recognition results, yet their computational expense limits their impact for many real-world applications. We introduce BlockDrop, an approach that learns to dynamically choose which layers of a deep network to execute during inference so as to best reduce total computation without degrading prediction accuracy. Exploiting the robustness of Residual Networks (ResNets) to layer dropping, our framework selects on-the-fly which residual blocks to evaluate for a given novel image. In particular, given a pretrained ResNet, we train a policy network in an associative reinforcement learning setting for the dual reward of utilizing a minimal number of blocks while preserving recognition accuracy. We conduct extensive experiments on CIFAR and ImageNet. The results provide strong quantitative and qualitative evidence that these learned policies not only accelerate inference but also encode meaningful visual information. Built upon a ResNet-101 model, our method achieves a speedup of 20% on average, going as high as 36% for some images, while maintaining the same 76.4% top-1 accuracy on ImageNet.

Source: arXiv, March 30, 2018

URL:

http://www.zhuanzhi.ai/document/4df79a0e7ac6a695592bb121575f330a
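
To make the BlockDrop idea above concrete, here is a minimal sketch of executing only the residual blocks that a binary policy keeps, together with a reward that favors using few blocks while staying correct. The toy scalar "blocks", the function names, and the exact reward shape (one minus the squared fraction of blocks used, or a fixed penalty on a wrong prediction) are assumptions for illustration; the abstract only states that the reward balances block usage against accuracy.

```python
from typing import Callable, List


def run_with_policy(x: float,
                    blocks: List[Callable[[float], float]],
                    keep: List[int]) -> float:
    """Apply residual blocks in order, skipping those whose keep flag is 0."""
    for block, k in zip(blocks, keep):
        if k:
            # Executed block: identity shortcut plus the residual branch.
            x = x + block(x)
        # Skipped block: the identity shortcut passes x through unchanged.
    return x


def reward(keep: List[int], correct: bool, penalty: float = 1.0) -> float:
    """Hypothetical reward: fewer executed blocks is better, but only if correct."""
    if not correct:
        return -penalty
    used_fraction = sum(keep) / len(keep)
    return 1.0 - used_fraction ** 2


# Toy residual functions (hypothetical stand-ins for convolutional blocks).
blocks = [lambda v: 0.5 * v, lambda v: 1.0, lambda v: -0.25 * v]

full = run_with_policy(2.0, blocks, [1, 1, 1])         # 3.0
skip_middle = run_with_policy(2.0, blocks, [1, 0, 1])  # 2.25
```

Because a skipped block reduces to the identity shortcut, dropping it changes the output but never breaks the forward pass, which is the property the policy network exploits.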

2. QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

Authors: Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson

Abstract: In many real-world settings, a team of agents must coordinate their behaviour while acting in a decentralised way. At the same time, it is often possible to train the agents in a centralised fashion in a simulated or laboratory setting, where global state information is available and communication constraints are lifted. Learning joint action-values conditioned on extra state information is an attractive way to exploit centralised learning, but the best strategy for then extracting decentralised policies is unclear. Our solution is QMIX, a novel value-based method that can train decentralised policies in a centralised end-to-end fashion. QMIX employs a network that estimates joint action-values as a complex non-linear combination of per-agent values that condition only on local observations. We structurally enforce that the joint-action value is monotonic in the per-agent values, which allows tractable maximisation of the joint action-value in off-policy learning, and guarantees consistency between the centralised and decentralised policies. We evaluate QMIX on a challenging set of StarCraft II micromanagement tasks, and show that QMIX significantly outperforms existing value-based multi-agent reinforcement learning methods.

Source: arXiv, March 20, 2018

URL:

http://www.zhuanzhi.ai/document/c17c70fe3802166c570a0c6153c49697
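
The monotonicity constraint in the QMIX abstract above can be illustrated with a small sketch. It uses the simplest monotonic mixer, a weighted sum with non-negative weights; the real mixing network is non-linear, so this linear form, the fixed weights, and all function names are assumptions made for illustration. The point it shows is the consistency claim: when the joint value is monotonic in each per-agent value, each agent's own greedy action also maximises the joint value.

```python
from itertools import product
from typing import List, Sequence, Tuple


def mix(agent_qs: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Monotonic mixer: taking absolute values keeps every weight non-negative."""
    return sum(abs(w) * q for w, q in zip(weights, agent_qs)) + bias


def decentralised_argmax(q_tables: List[List[float]]) -> Tuple[int, ...]:
    """Each agent greedily picks its own best action from its own values."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in q_tables)


def centralised_argmax(q_tables: List[List[float]],
                       weights: Sequence[float],
                       bias: float) -> Tuple[int, ...]:
    """Brute-force search over all joint actions for the best mixed value."""
    joint_actions = product(*(range(len(q)) for q in q_tables))
    return max(
        joint_actions,
        key=lambda a: mix([q[i] for q, i in zip(q_tables, a)], weights, bias),
    )


q_tables = [[0.1, 0.9, 0.3], [1.2, -0.4, 0.0]]  # per-agent action values
weights, bias = [-2.0, 0.5], 0.3                # hypothetical mixer parameters

local_choice = decentralised_argmax(q_tables)
joint_choice = centralised_argmax(q_tables, weights, bias)
```

The brute-force search grows exponentially with the number of agents, while the per-agent argmax stays linear, which is why the monotonic structure makes maximisation tractable.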

3. Learning to Adapt: Meta-Learning for Model-Based Control

Authors: Ignasi Clavera, Anusha Nagabandi, Ronald S. Fearing, Pieter Abbeel, Sergey Levine, Chelsea Finn

Abstract: Although reinforcement learning methods can achieve impressive results in simulation, the real world presents two major challenges: generating samples is exceedingly expensive, and unexpected perturbations can cause proficient but narrowly-learned policies to fail at test time. In this work, we propose to learn how to quickly and effectively adapt online to new situations as well as to perturbations. To enable sample-efficient meta-learning, we consider learning online adaptation in the context of model-based reinforcement learning. Our approach trains a global model such that, when combined with recent data, the model can be rapidly adapted to the local context. Our experiments demonstrate that our approach can enable simulated agents to adapt their behavior online to novel terrains, to a crippled leg, and in highly-dynamic environments.

Source: arXiv, March 30, 2018

URL:

http://www.zhuanzhi.ai/document/a2a848a9b0ea080048e08686ef4c946c
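
As a rough illustration of "adapting a global model with recent data" from the abstract above, the sketch below takes a few gradient steps on the latest transitions of a toy one-dimensional dynamics model. The linear model, the gradient-step update rule, the learning rate, and the data are all assumptions; the abstract does not specify the adaptation mechanism, and the meta-training of the global parameters is not shown.

```python
from typing import List, Tuple

Transition = Tuple[float, float, float]  # (state, action, next_state)


def predict(theta: float, s: float, a: float) -> float:
    """Toy 1-D dynamics model: next_state = s + theta * a."""
    return s + theta * a


def mse(theta: float, data: List[Transition]) -> float:
    """Mean squared one-step prediction error on a batch of transitions."""
    return sum((predict(theta, s, a) - s2) ** 2 for s, a, s2 in data) / len(data)


def adapt(theta: float, recent: List[Transition],
          lr: float = 0.1, steps: int = 5) -> float:
    """Take a few gradient steps on the most recent transitions only."""
    for _ in range(steps):
        grad = sum(2 * (predict(theta, s, a) - s2) * a
                   for s, a, s2 in recent) / len(recent)
        theta -= lr * grad
    return theta


theta_global = 1.0  # hypothetical parameter of the global model
# Recent transitions from a perturbed system whose true gain is 0.5.
recent = [(0.0, 1.0, 0.5), (0.5, 2.0, 1.5), (1.5, -1.0, 1.0)]
theta_local = adapt(theta_global, recent)
```

After five steps the adapted parameter sits close to the perturbed system's gain, so the locally adapted model predicts the recent transitions better than the global one.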

4. Cache-Enabled Dynamic Rate Allocation via Deep Self-Transfer Reinforcement Learning

Authors: Zhengming Zhang, Yaru Zheng, Meng Hua, Yongming Huang, Luxi Yang

Institution: Southeast University

Abstract: Caching and rate allocation are two promising approaches to support video streaming over wireless networks. However, existing rate allocation designs do not fully exploit the advantages of the two approaches. This paper investigates the problem of cache-enabled, QoE-driven video rate allocation. We establish a mathematical model for this problem, and point out that it is difficult to solve with traditional dynamic programming. We then propose a deep reinforcement learning approach to solve it. First, we model the problem as a Markov decision problem. Then we present a deep Q-learning algorithm with a special knowledge transfer process to find an effective allocation policy. Finally, numerical results are given to demonstrate that the proposed solution can effectively maintain a high-quality user experience for mobile users moving among small cells. We also investigate the impact of critical parameter settings on the performance of our algorithm.

Source: arXiv, March 30, 2018

URL:

http://www.zhuanzhi.ai/document/ee889b39f1abd03fa8fd105bb824cb8d
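
The abstract above builds on Q-learning over a Markov decision problem. The sketch below shows the standard Q-learning update in tabular form, which stands in for the paper's deep network; the knowledge transfer process is not described in the abstract and is left out. The state names, action indices, rewards, and hyperparameters are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Tuple

QTable = DefaultDict[Tuple[Hashable, int], float]


def q_update(q: QTable, state: Hashable, action: int, reward: float,
             next_state: Hashable, n_actions: int,
             alpha: float = 0.5, gamma: float = 0.9) -> float:
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = max(q[(next_state, a)] for a in range(n_actions))
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


q: QTable = defaultdict(float)  # unseen state-action pairs start at 0.0

# Hypothetical states (playback buffer level) and actions (video rate index).
first = q_update(q, "low_buffer", 0, 1.0, "high_buffer", n_actions=2)   # 0.5
second = q_update(q, "high_buffer", 1, 2.0, "low_buffer", n_actions=2)  # about 1.225
```

The second update already bootstraps from the value learned in the first one, which is how reward information propagates backwards through the state space.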

5. Video Captioning via Hierarchical Reinforcement Learning

Authors: Xin Wang, Wenhu Chen, Jiawei Wu, Yuan-Fang Wang, William Yang Wang

Institution: University of California

Abstract: Video captioning is the task of automatically generating a textual description of the actions in a video. Although previous work (e.g. sequence-to-sequence models) has shown promising results in abstracting a coarse description of a short video, it is still very challenging to caption a video containing multiple fine-grained actions with a detailed description. This paper aims to address the challenge by proposing a novel hierarchical reinforcement learning framework for video captioning, where a high-level Manager module learns to design sub-goals and a low-level Worker module recognizes the primitive actions to fulfill the sub-goal. With this compositional framework to reinforce video captioning at different levels, our approach significantly outperforms all the baseline methods on a newly introduced large-scale dataset for fine-grained video captioning. Furthermore, our non-ensemble model has already achieved state-of-the-art results on the widely-used MSR-VTT dataset.

Source: arXiv, March 29, 2018

URL:

http://www.zhuanzhi.ai/document/0e06d68487c1f38c870eed320088047e

6. Deep Communicating Agents for Abstractive Summarization

Authors: Asli Celikyilmaz, Antoine Bosselut, Xiaodong He, Yejin Choi

Institution: University of Washington

Abstract: We present deep communicating agents in an encoder-decoder architecture to address the challenges of representing a long document for abstractive summarization. With deep communicating agents, the task of encoding a long text is divided across multiple collaborating agents, each in charge of a subsection of the input text. These encoders are connected to a single decoder, trained end-to-end using reinforcement learning to generate a focused and coherent summary. Empirical results demonstrate that multiple communicating encoders lead to a higher quality summary compared to several strong baselines, including those based on a single encoder or multiple non-communicating encoders.

Source: arXiv, March 28, 2018

URL:

http://www.zhuanzhi.ai/document/2cb61ebaa88eb3db3f515f9f78fa641e

7. Dynamic Zoom-in Network for Fast Object Detection in Large Images

Authors: Mingfei Gao, Ruichi Yu, Ang Li, Vlad I. Morariu, Larry S. Davis

Abstract: We introduce a generic framework that reduces the computational cost of object detection while retaining accuracy for scenarios where objects with varied sizes appear in high resolution images. Detection progresses in a coarse-to-fine manner, first on a down-sampled version of the image and then on a sequence of higher resolution regions identified as likely to improve the detection accuracy. Built upon reinforcement learning, our approach consists of a model (R-net) that uses coarse detection results to predict the potential accuracy gain for analyzing a region at a higher resolution and another model (Q-net) that sequentially selects regions to zoom in. Experiments on the Caltech Pedestrians dataset show that our approach reduces the number of processed pixels by over 50% without a drop in detection accuracy. The merits of our approach become more significant on a high resolution test set collected from the YFCC100M dataset, where our approach maintains high detection performance while reducing the number of processed pixels by about 70% and the detection time by over 50%.

Source: arXiv, March 27, 2018

URL:

http://www.zhuanzhi.ai/document/3f713d0b55396afaeab8803effe5cc38
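
The coarse-to-fine accounting in the abstract above can be sketched as follows. A simple threshold on predicted gain stands in for the learned Q-net selector, and the gain values stand in for R-net outputs; both substitutions, along with the region names, pixel counts, and threshold, are assumptions for illustration only. The second function shows how the "processed pixels" figure is composed: one coarse pass over the down-sampled image plus the regions revisited at full resolution.

```python
from typing import List, Tuple

# (name, pixel count at full resolution, predicted accuracy gain)
Region = Tuple[str, int, float]


def select_regions(regions: List[Region], min_gain: float) -> List[Region]:
    """Threshold stand-in for the learned selector: zoom where predicted gain is high."""
    ranked = sorted(regions, key=lambda r: r[2], reverse=True)
    return [r for r in ranked if r[2] >= min_gain]


def processed_fraction(full_pixels: int, downsample: int,
                       chosen: List[Region]) -> float:
    """Pixels in the coarse pass plus zoomed regions, relative to full resolution."""
    coarse = full_pixels // (downsample * downsample)
    return (coarse + sum(r[1] for r in chosen)) / full_pixels


# Hypothetical regions of a 4-megapixel image.
regions = [("top_left", 250_000, 0.02),
           ("center", 250_000, 0.30),
           ("right", 250_000, 0.12)]

chosen = select_regions(regions, min_gain=0.10)
fraction = processed_fraction(4_000_000, downsample=2, chosen=chosen)  # 0.375
```

With these made-up numbers, the detector touches 37.5% of the full-resolution pixels, since the low-gain region is never revisited.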

8. Modeling Others using Oneself in Multi-Agent Reinforcement Learning

Authors: Roberta Raileanu, Emily Denton, Arthur Szlam, Rob Fergus

Abstract: We consider the multi-agent reinforcement learning setting with imperfect information in which each agent is trying to maximize its own utility. The reward function depends on the hidden state (or goal) of both agents, so the agents must infer the other players' hidden goals from their observed behavior in order to solve the tasks. We propose a new approach for learning in these domains: Self Other-Modeling (SOM), in which an agent uses its own policy to predict the other agent's actions and update its belief of their hidden state in an online manner. We evaluate this approach on three different tasks and show that the agents are able to learn better policies using their estimate of the other players' hidden states, in both cooperative and adversarial settings.

Source: arXiv, March 23, 2018

URL:

http://www.zhuanzhi.ai/document/630e7bc3e8897b1c5c564c86a98cafaf
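
One way to picture "using one's own policy to update a belief about the other agent's hidden goal", as described in the SOM abstract above, is a Bayesian update in which the agent's own policy serves as the likelihood of the other's observed actions. This Bayes-rule formulation is a stand-in chosen for clarity, not the procedure from the paper, and the goals, actions, and probabilities are hypothetical.

```python
from typing import Dict


def own_policy(goal: str, action: str) -> float:
    """Hypothetical policy: probability that an agent with `goal` takes `action`."""
    table = {
        "reach_left": {"left": 0.8, "right": 0.2},
        "reach_right": {"left": 0.2, "right": 0.8},
    }
    return table[goal][action]


def update_belief(belief: Dict[str, float], observed_action: str) -> Dict[str, float]:
    """Bayes rule, with the agent's own policy as the likelihood of the other's action."""
    unnorm = {g: p * own_policy(g, observed_action) for g, p in belief.items()}
    total = sum(unnorm.values())
    return {g: v / total for g, v in unnorm.items()}


# Start undecided about the other agent's goal, then watch three of its actions.
belief = {"reach_left": 0.5, "reach_right": 0.5}
for action in ["right", "right", "left"]:
    belief = update_belief(belief, action)
```

The belief is revised after every observed action, so a single contradictory move ("left") softens the estimate without discarding the earlier evidence.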

This article was shared from the WeChat official account Zhuanzhi (专知, Quan_Zhuanzhi). Author: Zhuanzhi content team


Originally published: 2018-04-03
