
Continuous Deep Hierarchical Reinforcement Learning for Ground-Air Swarm Shepherding (CS RO)

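Controlling and guiding a multi-robot swarm is a non-trivial problem because of the complexity inherent in the coupled interactions within the group. Whether the swarm is cooperative or non-cooperative, lessons can be learned from sheepdogs herding sheep. Biomimicry of shepherding offers computational methods for swarm control that have the potential to generalize and scale across different environments. Learning to shepherd is nonetheless hard, because the machine learner faces a very large search space. We present a deep hierarchical reinforcement learning approach to shepherding in which an unmanned aerial vehicle (UAV) learns to act as an aerial sheepdog that controls and guides a swarm of unmanned ground vehicles (UGVs). The approach extends our previous work on machine education by decomposing the search space into a hierarchically organized curriculum. Each lesson in the curriculum is learned by a deep reinforcement learning model, and the hierarchy is formed by fusing the outputs of the model. The approach is demonstrated first in a high-fidelity simulation environment based on the Robot Operating System (ROS), and then with physical UGVs and a UAV in an indoor testing facility. We investigate how well the method generalizes when the models are transferred from simulation to the real world and from one scale to another.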

Original title: Continuous Deep Hierarchical Reinforcement Learning for Ground-Air Swarm Shepherding

Original abstract: The control and guidance of multi-robots (swarm) is a non-trivial problem due to the complexity inherent in the coupled interaction among the group. Whether the swarm is cooperative or non cooperative, lessons could be learnt from sheepdogs herding sheep. Biomimicry of shepherding offers computational methods for swarm control with the potential to generalize and scale in different environments. However, learning to shepherd is complex due to the large search space that a machine learner is faced with. We present a deep hierarchical reinforcement learning approach for shepherding, whereby an unmanned aerial vehicle (UAV) learns to act as an Aerial sheepdog to control and guide a swarm of unmanned ground vehicles (UGVs). The approach extends our previous work on machine education to decompose the search space into hierarchically organized curriculum. Each lesson in the curriculum is learnt by a deep reinforcement learning model. The hierarchy is formed by fusing the outputs of the model. The approach is demonstrated first in a high-fidelity robotic-operating-system (ROS)-based simulation environment, then with physical UGVs and a UAV in an in-door testing facility. We investigate the ability of the method to generalize as the models move from simulation to the real-world and as the models move from one scale to another.

Original authors: Hung The Nguyen, Tung Nguyen, Vu Phi Tran, Matthew Garratt, Kathryn Kasmarik, Sreenatha Anavatti, Michael Barlow, Hussein A. Abbass

Original URL: https://arxiv.org/abs/2004.11543
