
Spatial Action Maps for Mobile Manipulation (cs.RO)

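This paper proposes a new action representation for learning complex mobile manipulation tasks. In a typical deep Q-learning setup, a convolutional neural network (ConvNet) is trained to map an image of the current state (for example, a bird's-eye view of a SLAM reconstruction of the scene) to predicted Q-values for a small set of steering commands (step forward, turn right, turn left, and so on). Instead, we propose representing actions in the same domain as the state, as "spatial action maps". In this representation, the set of possible actions is given by the pixels of an image, and each pixel stands for a trajectory to the corresponding scene location along a shortest path around the obstacles of the partially reconstructed scene. One significant advantage is that the spatial position of each state-action value prediction marks a local milestone (local end-point) for the agent's policy, which can be easy to recognize in the local visual patterns of the state image. A second advantage is that a single atomic action can carry out a long-range plan (follow the shortest path to a point on the other side of the scene), which makes it simpler to learn complex behaviors with a deep Q-network. A third advantage is that a fully convolutional network (FCN) with skip connections can efficiently learn the mapping from state images to pixel-aligned action images. In experiments with a robot that learns to push objects to a goal location, policies learned with the proposed action representation perform significantly better than traditional alternatives.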

Original title: Spatial Action Maps for Mobile Manipulation

Original abstract: This paper proposes a new action representation for learning to perform complex mobile manipulation tasks. In a typical deep Q-learning setup, a convolutional neural network (ConvNet) is trained to map from an image representing the current state (e.g., a birds-eye view of a SLAM reconstruction of the scene) to predicted Q-values for a small set of steering command actions (step forward, turn right, turn left, etc.). Instead, we propose an action representation in the same domain as the state: "spatial action maps." In our proposal, the set of possible actions is represented by pixels of an image, where each pixel represents a trajectory to the corresponding scene location along a shortest path through obstacles of the partially reconstructed scene. A significant advantage of this approach is that the spatial position of each state-action value prediction represents a local milestone (local end-point) for the agent's policy, which may be easily recognizable in local visual patterns of the state image. A second advantage is that atomic actions can perform long-range plans (follow the shortest path to a point on the other side of the scene), and thus it is simpler to learn complex behaviors with a deep Q-network. A third advantage is that we can use a fully convolutional network (FCN) with skip connections to learn the mapping from state images to pixel-aligned action images efficiently. During experiments with a robot that learns to push objects to a goal location, we find that policies learned with this proposed action representation achieve significantly better performance than traditional alternatives.

Authors: Jimmy Wu, Xingyuan Sun, Andy Zeng, Shuran Song, Johnny Lee, Szymon Rusinkiewicz, Thomas Funkhouser

Original URL: https://arxiv.org/abs/2004.09141
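
To make the idea in the abstract concrete, here is a minimal sketch of how one action might be chosen and executed under a spatial action map. It is not the authors' implementation. The hand-written `q_map` stands in for the output of the FCN, the occupancy grid stands in for the partially reconstructed scene, and the helper names `select_action` and `shortest_path` are hypothetical. Breadth-first search on a 4-connected grid is assumed here only for illustration; the abstract does not say which shortest-path planner is used.

```python
from collections import deque


def select_action(q_map):
    """Return the (row, col) pixel with the highest predicted Q-value.

    In a spatial action map, each pixel is one action: "move to this location".
    """
    cells = ((r, c) for r in range(len(q_map)) for c in range(len(q_map[0])))
    return max(cells, key=lambda rc: q_map[rc[0]][rc[1]])


def shortest_path(occupancy, start, goal):
    """Breadth-first search on a 4-connected grid (assumed planner).

    occupancy[r][c] == 1 marks an obstacle. Returns a list of cells from
    start to goal inclusive, or None if the goal cannot be reached.
    """
    rows, cols = len(occupancy), len(occupancy[0])
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            in_bounds = 0 <= nr < rows and 0 <= nc < cols
            if in_bounds and occupancy[nr][nc] == 0 and nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    return None


# Toy scene: 0 is free space, 1 is an obstacle (illustrative values only).
occupancy = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
# Stand-in for the network's pixel-aligned Q-value predictions.
q_map = [
    [0.1, 0.2, 0.1, 0.0],
    [0.0, -1.0, -1.0, 0.3],
    [0.1, -1.0, 0.9, 0.2],
    [0.0, 0.1, 0.2, 0.1],
]

robot = (0, 0)
target = select_action(q_map)                    # pixel with the best Q-value
path = shortest_path(occupancy, robot, target)   # one long-range atomic action
print("target:", target)
print("path:", path)
```

The point of the sketch is the division of labor the abstract describes: the learned part only has to score scene locations, while a classical planner turns the chosen location into a trajectory around obstacles, so a single action can cover a long distance.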

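Original-content notice: this article is published on the Cloud+ Community (云+社区) with the author's authorization and may not be reproduced without permission.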

To report infringement, contact yunjia_community@tencent.com to request removal.
