
Robotics arXiv Digest [6.29]

By arXiv每日学术速递 (WeChat official account)
Published 2021-07-02 17:29:36

Visit www.arxivdaily.com for digests with abstracts, covering CS, Physics, Math, Economics, Statistics, Finance, Biology, and Electrical Engineering, plus search, bookmarking, and posting features.

cs.RO (Robotics), 28 papers in total

【1】 If you trust me, I will trust you: the role of reciprocity in human-robot trust

Authors: Joshua Zonca, Anna Folso, Alessandra Sciutti
Affiliations: Cognitive Architecture for Collaborative Technologies (CONTACT) Unit, Italian Institute of Technology, Genoa, Italy; Department of Informatics, Bioengineering, Robotics and Systems Engineering, University of Genoa
Comments: Manuscript under review
Link: https://arxiv.org/abs/2106.14832
Abstract: Trust is fundamental in human-human and human-robot interaction. Extensive evidence has shown that trust among humans is sustained by reciprocity, whereas knowledge on reciprocal dynamics of trust in human-robot interaction is very limited. To investigate reciprocity in human-robot trust, we designed a joint task in which a human participant and a humanoid robot made perceptual judgments while signaling their trust in the partner. The robot's trust was dynamically manipulated throughout the experiment. Results show that participants were less willing to learn from a robot that was showing high trust in them. However, participants were unwilling to overtly disclose their distrust to the robot if they expected future interactions with it, suggesting the emergence of reciprocity in their overt expression of trust. Importantly, these effects disappeared when participants interacted with a computer. Our findings may have an impact on the design of robots that could effectively collaborate with humans.

【2】 Instantaneous Capture Input for Balancing the Variable Height Inverted Pendulum

Authors: Junwei Liu, Hua Chen, Patrick M. Wensing, Wei Zhang
Affiliations: P. M. Wensing is with the Department of Aerospace and Mechanical Engineering, University of Notre Dame
Comments: IEEE Robotics and Automation Letters, accepted
Link: https://arxiv.org/abs/2106.14741
Abstract: Balancing is a fundamental need for legged robots due to their unstable floating-base nature. Balance control has been thoroughly studied for simple models such as the linear inverted pendulum thanks to the concept of the instantaneous capture point (ICP), yet the constant center of mass height assumption limits the application. This paper explores balancing of the variable-height inverted pendulum (VHIP) model by introducing the instantaneous capture input (ICI), an extension of the ICP based on its key properties. Namely, the ICI can be computed as a function of the state, and when this function is used as the control policy, the ICI is rendered stationary and the system will eventually come to a stop. This characterization induces an analytical region of capturable states for the VHIP, which can be used to conceptually guide where to step. To further address state and control constraints during recovery, we present and theoretically analyze an explicit ICI-based controller with online optimal feedback gains. Simulations demonstrate the validity of our controller for capturability maintenance compared to an approach based on the divergent component of motion.
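The ICP that the ICI generalizes has a simple closed form for the constant-height linear inverted pendulum, which may help readers unfamiliar with capturability. A minimal sketch of that baseline concept (not the paper's VHIP extension), using the standard relation xi = x + x_dot / omega with omega = sqrt(g / z_com):

```python
import math

def capture_point(x, x_dot, com_height, g=9.81):
    """Instantaneous capture point (ICP) of the linear inverted pendulum:
    the ground location where stepping brings the pendulum to rest.
    xi = x + x_dot / omega, with natural frequency omega = sqrt(g / z_com).
    """
    omega = math.sqrt(g / com_height)
    return x + x_dot / omega

# CoM above the origin, moving at 0.5 m/s, at a constant 0.9 m height:
xi = capture_point(0.0, 0.5, 0.9)  # step roughly 0.15 m ahead of the CoM
```

The VHIP work in this paper is precisely about relaxing the constant `com_height` assumption baked into this formula.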

【3】 Real-Time Multi-View 3D Human Pose Estimation using Semantic Feedback to Smart Edge Sensors

Authors: Simon Bultmann, Sven Behnke
Affiliations: Autonomous Intelligent Systems, University of Bonn, Germany
Comments: Accepted for Robotics: Science and Systems (RSS), July 2021, 10 pages, 6 figures
Link: https://arxiv.org/abs/2106.14729
Abstract: We present a novel method for estimation of 3D human poses from a multi-camera setup, employing distributed smart edge sensors coupled with a backend through a semantic feedback loop. 2D joint detection for each camera view is performed locally on a dedicated embedded inference processor. Only the semantic skeleton representation is transmitted over the network and raw images remain on the sensor board. 3D poses are recovered from 2D joints on a central backend, based on triangulation and a body model which incorporates prior knowledge of the human skeleton. A feedback channel from backend to individual sensors is implemented on a semantic level. The allocentric 3D pose is backprojected into the sensor views where it is fused with 2D joint detections. The local semantic model on each sensor can thus be improved by incorporating global context information. The whole pipeline is capable of real-time operation. We evaluate our method on three public datasets, where we achieve state-of-the-art results and show the benefits of our feedback architecture, as well as in our own setup for multi-person experiments. Using the feedback signal improves the 2D joint detections and in turn the estimated 3D poses.
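The backend's triangulation step can be illustrated with standard two-view linear (DLT) triangulation. The camera matrices and point below are toy values of ours, not the paper's calibrated multi-sensor setup:

```python
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation of one 3D point from two views.
    P1, P2: 3x4 camera projection matrices; uv1, uv2: 2D observations.
    Builds the homogeneous system A X = 0 and solves it via SVD."""
    A = np.vstack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize

# Two toy cameras looking down the z-axis, the second shifted 1 m along x
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.2, 0.1, 2.0])
uv1 = P1 @ np.append(X_true, 1.0); uv1 = uv1[:2] / uv1[2]
uv2 = P2 @ np.append(X_true, 1.0); uv2 = uv2[:2] / uv2[2]
X_hat = triangulate(P1, P2, uv1, uv2)  # recovers X_true with clean data
```

In the paper this step is combined with a skeleton prior; with noisy real detections the SVD solution is a least-squares estimate rather than an exact recovery.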

【4】 SwarmPaint: Human-Swarm Interaction for Trajectory Generation and Formation Control by DNN-based Gesture Interface

Authors: Valerii Serpiva, Ekaterina Karmanova, Aleksey Fedoseev, Stepan Perminov, Dzmitry Tsetserukou
Affiliations: Skolkovo Institute of Science and Technology
Comments: ICUAS '21. © 20XX IEEE; personal use of this material is permitted.
Link: https://arxiv.org/abs/2106.14698
Abstract: Teleoperation tasks with multi-agent systems have a high potential in supporting human-swarm collaborative teams in exploration and rescue operations. However, it requires an intuitive and adaptive control approach to ensure swarm stability in a cluttered and dynamically shifting environment. We propose a novel human-swarm interaction system, allowing the user to control swarm position and formation by either direct hand motion or by trajectory drawing with a hand gesture interface based on DNN gesture recognition. The key technology of SwarmPaint is the user's ability to perform various tasks with the swarm without additional devices by switching between interaction modes. Two types of interaction were proposed and developed to adjust swarm behavior: free-form trajectory generation control and shaped formation control. Two preliminary user studies were conducted to explore user performance and subjective experience of human-swarm interaction through the developed control modes. The experimental results revealed sufficient accuracy in the trajectory tracing task (mean error of 5.6 cm by gesture draw and 3.1 cm by mouse draw with a pattern of dimension 1 m by 1 m) over three evaluated trajectory patterns, and up to 7.3 cm accuracy in the targeting task with two target patterns of 1 m achieved by the SwarmPaint interface. Moreover, the participants evaluated the trajectory drawing interface as more intuitive (by 12.9%) and requiring less effort to utilize (by 22.7%) than direct shape and position control by gestures, although its physical workload and failure in performance were presumed as more significant (by 9.1% and 16.3%, respectively).

【5】 Optimized Wireless Control and Telemetry Network for Mobile Soccer Robots

Authors: Lucas Cavalcanti, Riei Joaquim, Edna Barros
Affiliations: Centro de Informática, Universidade Federal de Pernambuco, Recife, Brazil
Comments: 12 pages, published in RoboCup Symposium 2021
Link: https://arxiv.org/abs/2106.14617
Abstract: In a diverse set of robotics applications, including RoboCup categories, mobile robots require control commands to interact with the surrounding environment correctly. These control commands should come wirelessly so as not to interfere with the robots' movement; also, the communication has a set of requirements, including low latency and consistent delivery. This paper presents a complete communication architecture consisting of computer communication with a base station, which transmits the data to robots and returns robot telemetry to the computer. With the proposed communication, it is possible to send messages in less than 4.5 ms for six robots with telemetry enabled in all of them.

【6】 Dataset and Benchmarking of Real-Time Embedded Object Detection for RoboCup SSL

Authors: Roberto Fernandes, Walber M. Rodrigues, Edna Barros
Affiliations: Centro de Informática, Universidade Federal de Pernambuco, Recife, Brazil
Link: https://arxiv.org/abs/2106.14597
Abstract: When producing a model for object detection in a specific context, the first obstacle is to have a dataset labeling the desired classes. In RoboCup, some leagues already have more than one dataset to train and evaluate a model. However, in the Small Size League (SSL), there is no such dataset available yet. This paper presents an open-source dataset to be used as a benchmark for real-time object detection in SSL. This work also presents a pipeline to train, deploy, and evaluate Convolutional Neural Network (CNN) models in a low-power embedded system. This pipeline was used to evaluate the proposed dataset with state-of-the-art optimized models. On this dataset, MobileNet SSD v1 achieves 44.88% AP (68.81% AP50) at 94 Frames Per Second (FPS) while running on an SSL robot.

【7】 Active Safety System for Semi-Autonomous Teleoperated Vehicles

Authors: Smit Saparia, Andreas Schimpe, Laura Ferranti
Affiliations: Institute for Automotive Technology, Technical University of Munich (TUM), Germany; Department of Cognitive Robotics, Delft University of Technology
Comments: Accepted at the Workshop on Road Vehicle Teleoperation (WS09) at the 2021 IEEE Intelligent Vehicles Symposium (IV21)
Link: https://arxiv.org/abs/2106.14554
Abstract: Autonomous cars can reduce road traffic accidents and provide a safer mode of transport. However, key technical challenges, such as safe navigation in complex urban environments, need to be addressed before deploying these vehicles on the market. Teleoperation can help smooth the transition from human-operated to fully autonomous vehicles since it still has a human in the loop, providing the scope of fallback to the driver. This paper presents an Active Safety System (ASS) approach for teleoperated driving. The proposed approach helps the operator ensure the safety of the vehicle in complex environments, that is, avoid collisions with static or dynamic obstacles. Our ASS relies on a model predictive control (MPC) formulation to control both the lateral and longitudinal dynamics of the vehicle. By exploiting the ability of the MPC framework to deal with constraints, our ASS restricts the controller's authority to intervene for lateral correction of the human operator's commands, avoiding a counter-intuitive driving experience for the human operator. Further, we design visual feedback to enhance the operator's trust in the ASS. In addition, we propose a novel predictive display based on the MPC's prediction-horizon data to mitigate the effects of large latency in the teleoperation system. We tested the performance of the proposed approach on a high-fidelity vehicle simulator in the presence of dynamic obstacles and latency.
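A predictive display of the kind mentioned can be approximated by dead-reckoning a vehicle model forward over the measured link latency and rendering the predicted pose to the operator. This sketch uses a standard kinematic bicycle model with made-up parameters, not the paper's MPC-based display:

```python
import math

def predict_pose(x, y, yaw, speed, steer, wheelbase, latency, dt=0.01):
    """Dead-reckon a kinematic bicycle model forward over the measured
    teleoperation latency, so the operator sees where the vehicle will
    be when the current command actually takes effect."""
    steps = max(1, int(latency / dt))
    for _ in range(steps):
        x += speed * math.cos(yaw) * dt
        y += speed * math.sin(yaw) * dt
        yaw += speed / wheelbase * math.tan(steer) * dt
    return x, y, yaw

# 10 m/s, zero steering, 2.7 m wheelbase, 0.5 s round-trip latency:
px, py, pyaw = predict_pose(0.0, 0.0, 0.0, 10.0, 0.0, 2.7, 0.5)
```

Using the MPC's own predicted state sequence, as the paper does, replaces this open-loop integration with trajectories that already respect the vehicle constraints.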

【8】 UAV-Based Search and Rescue in Avalanches using ARVA: An Extremum Seeking Approach

Authors: Ilario Antonio Azzollini, Nicola Mimmo, Lorenzo Gentilini, Lorenzo Marconi
Link: https://arxiv.org/abs/2106.14514
Abstract: This work deals with the problem of localizing a victim buried by an avalanche by means of a drone equipped with an ARVA (Appareil de Recherche de Victimes d'Avalanche) sensor. The proposed control solution is based on a "model-free" extremum seeking strategy which is shown to succeed in steering the drone into a neighborhood of the victim's position. The effectiveness and robustness of the proposed algorithm are tested in the Gazebo simulation environment, where a new flight mode and a new controller module have been implemented as an extension of the well-known PX4 open-source flight stack. Finally, to test usability, we present hardware-in-the-loop simulations on a Pixhawk 2 Cube board.

【9】 Design of a user-friendly control system for planetary rovers with CPS feature

Authors: Sebastiano Chiodini, Riccardo Giubilato, Marco Pertile, Annarita Tedesco, Domenico Accardo, Stefano Debei
Affiliations: CISAS "Giuseppe Colombo", University of Padova, Italy; Institute of Robotics and Mechatronics, DLR, Wessling, Germany; Dept. of Industrial Engineering; IMS Laboratory, University of Bordeaux, France; University of Napoli Federico II, Napoli, Italy
Comments: To be presented at the 8th IEEE International Workshop on Metrology for Aerospace (MetroAeroSpace)
Link: https://arxiv.org/abs/2106.14507
Abstract: In this paper, we present a user-friendly planetary rover control system for low-latency surface telerobotics. Thanks to the proposed system, an operator can comfortably give commands through the control base station to a rover using commercially available off-the-shelf (COTS) joysticks or by command sequencing with interactive monitoring on the sensed map of the environment. During operations, high situational awareness is made possible thanks to 3D map visualization. The map of the environment is built on the on-board computer by processing the rover's camera images with a visual Simultaneous Localization and Mapping (SLAM) algorithm. It is transmitted via Wi-Fi and displayed on the control base station screen in near real-time. The navigation stack takes the visual SLAM data as input to build a cost map and find the minimum-cost path. By interacting with the virtual map, the rover exhibits properties of a Cyber-Physical System (CPS) through its self-awareness capabilities. The software architecture is based on the Robot Operating System (ROS) middleware. The system design and preliminary field test results are shown in the paper.

【10】 Real-time Human-Robot Collaborative Manipulations of Cylindrical and Cubic Objects via Geometric Primitives and Depth Information

Authors: Huixu Dong, Jiadong Zhou, Haoyong Yu
Link: https://arxiv.org/abs/2106.14461
Abstract: Many objects commonly found in household and industrial environments are represented by cylindrical and cubic shapes. Robots can therefore manipulate them through real-time detection of the elliptic and rectangular shape primitives formed by the circular and rectangular tops of these objects. We devise a robust grasping system that enables a robot to manipulate cylindrical and cubic objects in collaboration scenarios via the proposed perception strategy, including the detection of elliptic and rectangular shape primitives and depth information. The proposed method of detecting ellipses and rectangles incorporates a one-stage detection backbone and then accommodates the proposed adaptive multi-branch multi-scale net with a designed iterative feature pyramid network, local inception net, and multi-receptive-field feature fusion net to generate object detection recommendations. For manipulating objects with different shapes, we propose a grasp synthesis method to align the grasp pose of the gripper with an object's pose based on the proposed detector and registered depth information. The proposed robotic perception algorithm has been integrated on a robot to demonstrate the ability to carry out human-robot collaborative manipulations of cylindrical and cubic objects in real-time. We show that the robotic manipulator, empowered by the proposed detector, performs well in practical manipulation scenarios. (An experiment video is available on YouTube: https://www.youtube.com/watch?v=Amcs8lwvNK8)

【11】 VAT-Mart: Learning Visual Action Trajectory Proposals for Manipulating 3D ARTiculated Objects

Authors: Ruihai Wu, Yan Zhao, Kaichun Mo, Zizheng Guo, Yian Wang, Tianhao Wu, Qingnan Fan, Xuelin Chen, Leonidas Guibas, Hao Dong
Affiliations: CFCS, CS Dept., PKU; AIIT, PKU; Stanford University; Tencent AI Lab
Link: https://arxiv.org/abs/2106.14440
Abstract: Perceiving and manipulating 3D articulated objects (e.g., cabinets, doors) in human environments is an important yet challenging task for future home-assistant robots. The space of 3D articulated objects is exceptionally rich in their myriad semantic categories, diverse shape geometry, and complicated part functionality. Previous works mostly abstract kinematic structure with estimated joint parameters and part poses as the visual representations for manipulating 3D articulated objects. In this paper, we propose object-centric actionable visual priors as a novel perception-interaction handshaking point in which the perception system outputs more actionable guidance than kinematic structure estimation, by predicting dense geometry-aware, interaction-aware, and task-aware visual action affordance and trajectory proposals. We design an interaction-for-perception framework, VAT-Mart, to learn such actionable visual representations by simultaneously training a curiosity-driven reinforcement learning policy exploring diverse interaction trajectories and a perception module summarizing and generalizing the explored knowledge for pointwise predictions among diverse shapes. Experiments prove the effectiveness of the proposed approach using the large-scale PartNet-Mobility dataset in the SAPIEN environment and show promising generalization capabilities to novel test shapes, unseen object categories, and real-world data. Project page: https://hyperplane-lab.github.io/vat-mart

【12】 Habitat 2.0: Training Home Assistants to Rearrange their Habitat

Authors: Andrew Szot, Alex Clegg, Eric Undersander, Erik Wijmans, Yili Zhao, John Turner, Noah Maestre, Mustafa Mukadam, Devendra Chaplot, Oleksandr Maksymets, Aaron Gokaslan, Vladimir Vondrus, Sameer Dharur, Franziska Meier, Wojciech Galuba, Angel Chang, Zsolt Kira, Vladlen Koltun, Jitendra Malik, Manolis Savva, Dhruv Batra
Affiliations: Facebook AI Research; Georgia Tech; Intel Research; Simon Fraser University; UC Berkeley
Link: https://arxiv.org/abs/2106.14405
Abstract: We introduce Habitat 2.0 (H2.0), a simulation platform for training virtual robots in interactive 3D environments and complex physics-enabled scenarios. We make comprehensive contributions to all levels of the embodied AI stack - data, simulation, and benchmark tasks. Specifically, we present: (i) ReplicaCAD: an artist-authored, annotated, reconfigurable 3D dataset of apartments (matching real spaces) with articulated objects (e.g. cabinets and drawers that can open/close); (ii) H2.0: a high-performance physics-enabled 3D simulator with speeds exceeding 25,000 simulation steps per second (850x real-time) on an 8-GPU node, representing 100x speed-ups over prior work; and, (iii) Home Assistant Benchmark (HAB): a suite of common tasks for assistive robots (tidy the house, prepare groceries, set the table) that test a range of mobile manipulation capabilities. These large-scale engineering contributions allow us to systematically compare deep reinforcement learning (RL) at scale and classical sense-plan-act (SPA) pipelines in long-horizon structured tasks, with an emphasis on generalization to new objects, receptacles, and layouts. We find that (1) flat RL policies struggle on HAB compared to hierarchical ones; (2) a hierarchy with independent skills suffers from 'hand-off problems'; and (3) SPA pipelines are more brittle than RL policies.

【13】 Single RGB-D Camera Teleoperation for General Robotic Manipulation

Authors: Quan Vuong, Yuzhe Qin, Runlin Guo, Xiaolong Wang, Hao Su, Henrik Christensen
Affiliations: University of California San Diego
Link: https://arxiv.org/abs/2106.14396
Abstract: We propose a teleoperation system that uses a single RGB-D camera as the human motion capture device. Our system can perform general manipulation tasks such as cloth folding, hammering, and 3 mm clearance peg-in-hole. We propose the use of a non-Cartesian oblique coordinate frame, dynamic motion scaling, and repositioning of operator frames to increase the flexibility of our teleoperation system. We hypothesize that lowering the barrier of entry to teleoperation will allow for wider deployment of supervised autonomy systems, which will in turn generate realistic datasets that unlock the potential of machine learning for robotic manipulation. A demo of our system is available online at https://sites.google.com/view/manipulation-teleop-with-rgbd

【14】 Kimera-Multi: Robust, Distributed, Dense Metric-Semantic SLAM for Multi-Robot Systems

Authors: Yulun Tian, Yun Chang, Fernando Herrera Arias, Carlos Nieto-Granda, Jonathan P. How, Luca Carlone
Comments: 18 pages, 15 figures
Link: https://arxiv.org/abs/2106.14386
Abstract: This paper presents Kimera-Multi, the first multi-robot system that (i) is robust and capable of identifying and rejecting incorrect inter- and intra-robot loop closures resulting from perceptual aliasing, (ii) is fully distributed and only relies on local (peer-to-peer) communication to achieve distributed localization and mapping, and (iii) builds a globally consistent metric-semantic 3D mesh model of the environment in real-time, where faces of the mesh are annotated with semantic labels. Kimera-Multi is implemented by a team of robots equipped with visual-inertial sensors. Each robot builds a local trajectory estimate and a local mesh using Kimera. When communication is available, robots initiate a distributed place recognition and robust pose graph optimization protocol based on a novel distributed graduated non-convexity algorithm. The proposed protocol allows the robots to improve their local trajectory estimates by leveraging inter-robot loop closures while being robust to outliers. Finally, each robot uses its improved trajectory estimate to correct the local mesh using mesh deformation techniques. We demonstrate Kimera-Multi in photo-realistic simulations, SLAM benchmarking datasets, and challenging outdoor datasets collected using ground robots. Both real and simulated experiments involve long trajectories (e.g., up to 800 meters per robot). The experiments show that Kimera-Multi (i) outperforms the state of the art in terms of robustness and accuracy, (ii) achieves estimation errors comparable to a centralized SLAM system while being fully distributed, (iii) is parsimonious in terms of communication bandwidth, (iv) produces accurate metric-semantic 3D meshes, and (v) is modular and can also be used for standard 3D reconstruction (i.e., without semantic labels) or for trajectory estimation (i.e., without reconstructing a 3D mesh).
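The graduated non-convexity (GNC) ingredient can be shown in miniature on a scalar robust-estimation problem. This Geman-McClure-style sketch is our simplification, not Kimera-Multi's distributed pose-graph solver: a surrogate parameter mu starts large (making the cost nearly convex) and is annealed toward 1 (recovering the robust cost), with a weighted least-squares solve at each step:

```python
def gnc_robust_mean(measurements, c=1.0, mu_init=1e3, mu_factor=1.4, iters=50):
    """Graduated non-convexity with a Geman-McClure robust cost on a
    scalar estimation problem. Alternates (1) a closed-form weighted
    least-squares solve and (2) weight updates, while annealing mu
    from a large, nearly convex surrogate down to the robust cost.
    Outliers end up with near-zero weight."""
    x = sum(measurements) / len(measurements)   # non-robust initial guess
    mu = mu_init
    for _ in range(iters):
        w = [(mu * c**2 / (r * r + mu * c**2))**2
             for r in (m - x for m in measurements)]
        x = sum(wi * m for wi, m in zip(w, measurements)) / sum(w)
        mu = max(1.0, mu / mu_factor)
    return x

# Four inliers near 2.0 and one gross outlier at 50.0:
data = [1.9, 2.0, 2.1, 2.05, 50.0]
robust = gnc_robust_mean(data)   # close to the inlier mean, ~2.01
```

The same annealing idea, applied to pose-graph edges instead of scalar residuals, is what lets the protocol reject spurious inter-robot loop closures.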

【15】 Unsupervised Skill Discovery with Bottleneck Option Learning

Authors: Jaekyeom Kim, Seohong Park, Gunhee Kim
Affiliations: Department of Computer Science and Engineering, Seoul National University (equal contribution by the first two authors)
Comments: Accepted to ICML 2021. Code at this https URL
Link: https://arxiv.org/abs/2106.14305
Abstract: Having the ability to acquire inherent skills from environments without any external rewards or supervision, like humans, is an important problem. We propose a novel unsupervised skill discovery method named Information Bottleneck Option Learning (IBOL). On top of the linearization of environments that promotes more various and distant state transitions, IBOL enables the discovery of diverse skills. It provides the abstraction of the skills learned with the information bottleneck framework for the options, with improved stability and encouraged disentanglement. We empirically demonstrate that IBOL outperforms multiple state-of-the-art unsupervised skill discovery methods on the information-theoretic evaluations and downstream tasks in MuJoCo environments, including Ant, HalfCheetah, Hopper and D'Kitty.

【16】 DONet: Learning Category-Level 6D Object Pose and Size Estimation from Depth Observation

Authors: Haitao Lin, Zichang Liu, Chilam Cheang, Lingwei Zhang, Yanwei Fu, Xiangyang Xue
Affiliations: Fudan University
Link: https://arxiv.org/abs/2106.14193
Abstract: We propose a method of Category-level 6D Object Pose and Size Estimation (COPSE) from a single depth image, without external pose-annotated real-world training data. While previous works exploit visual cues in RGB(D) images, our method makes inferences based on the rich geometric information of the object in the depth channel alone. Essentially, our framework explores such geometric information by learning the unified 3D Orientation-Consistent Representations (3D-OCR) module, further enforced by the property of the Geometry-constrained Reflection Symmetry (GeoReS) module. The magnitude information of object size and the center point is finally estimated by the Mirror-Paired Dimensional Estimation (MPDE) module. Extensive experiments on the category-level NOCS benchmark demonstrate that our framework competes with state-of-the-art approaches that require labeled real-world images. We also deploy our approach on a physical Baxter robot to perform manipulation tasks on unseen but category-known instances, and the results further validate the efficacy of our proposed model. Our videos are available in the supplementary material.

【17】 DenseTNT: Waymo Open Dataset Motion Prediction Challenge 1st Place Solution

Authors: Junru Gu, Qiao Sun, Hang Zhao
Affiliations: IIIS, Tsinghua University
Link: https://arxiv.org/abs/2106.14160
Abstract: In autonomous driving, goal-based multi-trajectory prediction methods have recently proved effective: they first score goal candidates, then select a final set of goals, and finally complete trajectories based on the selected goals. However, these methods usually involve goal predictions based on sparse predefined anchors. In this work, we propose an anchor-free model, named DenseTNT, which performs dense goal probability estimation for trajectory prediction. Our model achieves state-of-the-art performance, ranking 1st on the Waymo Open Dataset Motion Prediction Challenge.
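"Dense goal probability estimation" amounts to scoring a dense grid of candidate goal locations and normalizing the scores into a distribution, instead of committing to sparse predefined anchors. A schematic sketch with toy scores (in the real model a learned network produces the per-candidate scores):

```python
import numpy as np

def dense_goal_probabilities(candidate_xy, scores):
    """Turn per-candidate scores over a dense grid of goal locations
    into a probability distribution via a numerically stable softmax,
    and return the most probable goal."""
    p = np.exp(scores - scores.max())
    p /= p.sum()
    best = candidate_xy[np.argmax(p)]
    return p, best

# A dense 50x50 grid of candidate goals over a 10 m x 10 m area
xs, ys = np.meshgrid(np.linspace(0, 10, 50), np.linspace(0, 10, 50))
candidates = np.stack([xs.ravel(), ys.ravel()], axis=1)
# Toy scores peaked near (7, 3), standing in for a learned scoring head
scores = -np.linalg.norm(candidates - np.array([7.0, 3.0]), axis=1)
probs, best_goal = dense_goal_probabilities(candidates, scores)
```

Because the grid is dense, the argmax goal can land arbitrarily close to the true goal as the grid is refined, which is the advantage over sparse anchors.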

【18】 Improving Grasp Planning Efficiency with Human Grasp Tendencies*

Authors: Chang Cheng, Yadong Yan, Mingjun Guan, Jianan Zhang, Yu Wang
Comments: 8 pages, 9 figures, submitted to IEEE Robotics and Automation Letters (RA-L)
Link: https://arxiv.org/abs/2106.14158
Abstract: After a grasp has been planned, if the object orientation changes, the initial grasp may, but does not always, have to be modified to accommodate the orientation change. For example, rotation of a cylinder by any amount around its centerline does not change its geometric shape relative to the grasper. Objects that can be approximated as solids of revolution or that contain other geometric symmetries are prevalent in everyday life, and this information can be employed to improve the efficiency of existing grasp planning models. This paper experimentally investigates change in human-planned grasps under varied object orientations. With 13,440 recorded human grasps, our results indicate that during pick-and-place tasks of ordinary objects, stable grasps can be achieved with a small subset of grasp types, and the wrist-related parameters follow a normal distribution. Furthermore, we show this knowledge can allow faster convergence of a grasp planning algorithm.
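The reported normal distribution of wrist parameters suggests a simple way a planner could exploit such data: fit a Gaussian prior and evaluate candidate wrist angles nearest the human mean first. A hedged sketch with synthetic angle data of ours (the paper's 13,440-grasp dataset is not reproduced here):

```python
import random
import statistics

def fit_wrist_prior(angles):
    """Fit a normal prior (mean, std) over a wrist parameter from
    recorded human grasps."""
    return statistics.fmean(angles), statistics.stdev(angles)

def propose_wrist_angles(mu, sigma, n, rng):
    """Sample candidate wrist angles from the prior, sorted so a
    planner evaluates the most human-likely candidates first."""
    samples = [rng.gauss(mu, sigma) for _ in range(n)]
    return sorted(samples, key=lambda a: abs(a - mu))

rng = random.Random(0)
# Synthetic wrist-rotation angles (radians) clustered around 0.6 rad
angles = [rng.gauss(0.6, 0.15) for _ in range(500)]
mu, sigma = fit_wrist_prior(angles)
candidates = propose_wrist_angles(mu, sigma, 20, rng)
```

Ordering candidates by prior likelihood is one plausible mechanism for the faster convergence the abstract claims; the paper's actual planner integration may differ.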

【19】 Continuous Control with Deep Reinforcement Learning for Autonomous Vessels 标题:基于深度强化学习的自主船舶连续控制

作者:Nader Zare,Bruno Brandoli,Mahtab Sarvmaili,Amilcar Soares,Stan Matwin 机构:Institute for Big Data Analytics, Dalhousie University, Halifax, Department of Computer Science, Memorial University of, Newfoundland, St. John’s, Institute for Computer Science, Polish Academy of Sciences, Warsaw 链接:https://arxiv.org/abs/2106.14130 摘要:海上自主运输在世界经济全球化进程中发挥了重要作用。深度强化学习(DRL)已被应用于自动路径规划,以模拟船舶在公海中的避碰情况。直接从输入中学习复杂映射的端到端方法,在不同环境下到达目标的泛化能力较差。在这项工作中,我们提出了一种称为"状态-动作旋转"的新策略:通过旋转已获得的经验(状态-动作-状态)并将其保存在回放缓冲区中,提高智能体在未见情形下的性能。我们的模型基于深度确定性策略梯度、局部视图生成器和规划器设计,智能体使用两个深度卷积神经网络来估计策略和动作价值函数。所提模型在蒙特利尔、哈利法克斯等城市真实地图构成的海上场景中经过了详尽的训练与测试。实验结果表明,叠加在 CVN 之上的状态-动作旋转,相对于带规划器和局部视图的船舶导航器(VNPLV),将到达目的地率(RATD)最多提高了11.96%,并在未见过的地图上取得了最多高出30.82%的性能。我们提出的方法在新环境中测试时表现出鲁棒性优势,支持了通过状态-动作旋转实现泛化的思路。 摘要:Maritime autonomous transportation has played a crucial role in the globalization of the world economy. Deep Reinforcement Learning (DRL) has been applied to automatic path planning to simulate vessel collision avoidance situations in open seas. End-to-end approaches that learn complex mappings directly from the input have poor generalization to reach the targets in different environments. In this work, we present a new strategy called state-action rotation to improve agent's performance in unseen situations by rotating the obtained experience (state-action-state) and preserving them in the replay buffer. We designed our model based on Deep Deterministic Policy Gradient, local view maker, and planner. Our agent uses two deep Convolutional Neural Networks to estimate the policy and action-value functions. The proposed model was exhaustively trained and tested in maritime scenarios with real maps from cities such as Montreal and Halifax. Experimental results show that the state-action rotation on top of the CVN consistently improves the rate of arrival to a destination (RATD) by up to 11.96% with respect to the Vessel Navigator with Planner and Local View (VNPLV), as well as it achieves superior performance in unseen mappings by up to 30.82%. Our proposed approach exhibits advantages in terms of robustness when tested in a new environment, supporting the idea that generalization can be achieved by using state-action rotation.
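"状态-动作旋转"这一数据增强思路可以用一个示意性草图说明(坐标表示与接口为假设,非论文实现):把一条经验 (state, action, next_state) 整体旋转若干角度后,连同原经验一并存入回放缓冲区,从而扩充训练数据。这里假设状态和动作都可用二维平面向量表示。

```python
import math

def rotate(v, theta):
    """把二维向量 v 绕原点旋转 theta 弧度。"""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def augment(transition, angles):
    """对一条经验按多个角度旋转,返回原经验加所有旋转副本。"""
    state, action, next_state = transition
    out = [transition]
    for th in angles:
        out.append((rotate(state, th), rotate(action, th), rotate(next_state, th)))
    return out

replay_buffer = []
transition = ((1.0, 0.0), (0.0, 1.0), (1.0, 1.0))  # 虚构的一条经验
replay_buffer.extend(augment(transition, angles=[math.pi / 2, math.pi]))
```

关键在于状态、动作和后继状态必须用同一角度旋转,旋转后的三元组才仍是环境动力学下的一条合法经验。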

【20】 Graph Convolutional Memory for Deep Reinforcement Learning 标题:深度强化学习的图卷积记忆

作者:Steven D. Morad,Stephan Liwicki,Amanda Prorok 机构:Department of Computer Science and Technology, University of Cambridge UK, Toshiba Europe Ltd. 链接:https://arxiv.org/abs/2106.14117 摘要:求解部分可观测马尔可夫决策过程(POMDP)是将深度强化学习(DRL)应用于现实世界机器人问题的关键,因为在现实问题中,智能体对世界的观察是不完整的。我们提出了利用深度强化学习求解POMDP的图卷积记忆(GCM)。与递归神经网络(RNN)或Transformer不同,GCM通过知识图将特定领域的先验知识嵌入到记忆召回过程中。通过在图中封装先验知识,GCM可以适应特定任务,同时仍适用于任何DRL任务。利用图卷积,GCM提取层次化的图特征,类似于卷积神经网络(CNN)中的图像特征。我们发现,GCM在控制、长期非顺序回忆和3D导航任务上优于长短期记忆网络(LSTM)、用于强化学习的门控Transformer(GTrXL)和可微神经计算机(DNC),同时使用的参数明显更少。 摘要:Solving partially-observable Markov decision processes (POMDPs) is critical when applying deep reinforcement learning (DRL) to real-world robotics problems, where agents have an incomplete view of the world. We present graph convolutional memory (GCM) for solving POMDPs using deep reinforcement learning. Unlike recurrent neural networks (RNNs) or transformers, GCM embeds domain-specific priors into the memory recall process via a knowledge graph. By encapsulating priors in the graph, GCM adapts to specific tasks but remains applicable to any DRL task. Using graph convolutions, GCM extracts hierarchical graph features, analogous to image features in a convolutional neural network (CNN). We show GCM outperforms long short-term memory (LSTM), gated transformers for reinforcement learning (GTrXL), and differentiable neural computers (DNCs) on control, long-term non-sequential recall, and 3D navigation tasks while using significantly fewer parameters.
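GCM 的核心机制可以用一个示意性草图体会(非官方实现,邻接矩阵与特征均为虚构):把过去的观测作为图中的节点,由先验知识图给出邻接关系,再用图卷积聚合邻域信息来"回忆"记忆。下面演示最简单的一层图卷积 H' = D^{-1}(A + I)H,即每个节点对自身与邻居特征取平均。

```python
import numpy as np

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)   # 先验知识图的邻接矩阵(假设 3 个记忆节点)
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])               # 每个节点的观测特征

A_hat = A + np.eye(3)                    # 加自环,保留节点自身信息
D_inv = np.diag(1.0 / A_hat.sum(axis=1)) # 按度归一化
H_next = D_inv @ A_hat @ H               # 一次邻域平均:聚合自身与邻居特征
```

真实的 GCM 在聚合时还会乘以可学习权重并堆叠多层以提取层次化图特征,这里只示意"先验图决定哪些记忆被一起召回"这一点。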

【21】 Vision-driven Compliant Manipulation for Reliable, High-Precision Assembly Tasks 标题:视觉驱动的顺应性操作,实现可靠、高精度的装配任务

作者:Andrew S. Morgan,Bowen Wen,Junchi Liang,Abdeslam Boularias,Aaron M. Dollar,Kostas Bekris 机构:Deparmtent of Mechanical Engineering and Materials Science, Yale University, USA, Department of Computer Science, Rutgers University, USA 链接:https://arxiv.org/abs/2106.14070 摘要:高度受限的操作任务对于自主机器人来说仍然是一个挑战,因为它们需要高水平的精度,通常小于1毫米,这与传统的感知系统所能达到的目标是不相容的。本文证明了将最先进的目标跟踪技术与被动自适应机械硬件相结合,可以在工业相关公差(0.25mm)很小的情况下完成精密操作任务。所提出的控制方法通过视觉跟踪相关工作空间中物体的相对6D姿态来实现闭环控制。它通过手内操作调整柔顺机械手和手的控制基准,完成物体插入任务。与以前的插入工作相反,我们的方法不需要昂贵的力传感器、精确的机械手,也不需要耗时的在线学习,因为在线学习需要大量的数据。相反,这项工作利用了机械柔顺性,并利用了离线学习的手的对象不可知操作模型、现成的运动规划和仅用合成数据训练的基于RGBD的对象跟踪器。这些特性使得所提出的系统可以很容易地推广和转移到新的任务和环境中。本文详细描述了该系统的组成部分,并通过大量实验证明了其有效性,包括各种几何图形的紧公差孔内插钉任务以及开放世界约束放置任务。 摘要:Highly constrained manipulation tasks continue to be challenging for autonomous robots as they require high levels of precision, typically less than 1mm, which is often incompatible with what can be achieved by traditional perception systems. This paper demonstrates that the combination of state-of-the-art object tracking with passively adaptive mechanical hardware can be leveraged to complete precision manipulation tasks with tight, industrially-relevant tolerances (0.25mm). The proposed control method closes the loop through vision by tracking the relative 6D pose of objects in the relevant workspace. It adjusts the control reference of both the compliant manipulator and the hand to complete object insertion tasks via within-hand manipulation. Contrary to previous efforts for insertion, our method does not require expensive force sensors, precision manipulators, or time-consuming, online learning, which is data hungry. Instead, this effort leverages mechanical compliance and utilizes an object agnostic manipulation model of the hand learned offline, off-the-shelf motion planning, and an RGBD-based object tracker trained solely with synthetic data. These features allow the proposed system to easily generalize and transfer to new tasks and environments. 
This paper describes in detail the system components and showcases its efficacy with extensive experiments involving tight tolerance peg-in-hole insertion tasks of various geometries as well as open-world constrained placement tasks.
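摘要所述"通过视觉闭环"的思想,可以用一个高度简化的示意性草图说明(增益、公差与一维化建模均为假设,非论文的控制律):视觉跟踪器给出目标与当前控制基准之间的相对位姿误差,控制器按比例把误差反馈到柔顺机械手的基准上,循环直至误差小于公差。

```python
def visual_servo(target, estimate, gain=0.5, tol=1e-3, max_iters=100):
    """迭代修正控制基准,直到跟踪到的误差小于公差 tol(简化为一维标量)。"""
    reference = estimate
    for i in range(max_iters):
        error = target - reference          # 视觉跟踪器给出的相对位姿误差
        if abs(error) < tol:
            return reference, i
        reference += gain * error           # 按比例修正控制基准
    return reference, max_iters

# 模拟收敛到 0.25(与论文的 0.25mm 工业公差数值上呼应,仅作示意)
final_ref, iters = visual_servo(target=0.25, estimate=0.0)
```

机械柔顺性在真实系统中吸收了残余误差,因此闭环只需把基准推进公差邻域即可,这正是"不需要力传感器与高精度机械手"的原因之一。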

【22】 Correspondenceless scan-to-map-scan matching of homoriented 2D scans for mobile robot localisation 标题:用于移动机器人定位的同向2D扫描的无对应扫描到地图扫描匹配

作者:Alexandros Filotheou 机构:Agnostou Stratiotou , Thessaloniki, Greece 备注:19 pages, 19 figures 链接:https://arxiv.org/abs/2106.14003 摘要:本研究的目标是:给定移动机器人在其周围环境二维地图上的位置初始猜测,改善该机器人(可在平面上运动并装有常规二维激光雷达传感器)的位置估计。本文记述了求解两个同朝向二维扫描之间匹配问题的理论推理,其中一个扫描来自机器人的物理传感器,另一个由在地图中模拟其工作得到,且该方法不需要在二者的构成射线之间建立对应关系。本文证明了两个结果,并随后通过实验加以展示。第一,当物理传感器报告无故障测量、且机器人所处环境与机器人对环境的感知之间没有差异时,传感器的真实位置可以以任意精度恢复。第二,当两者之一受到干扰影响时,位置估计被限制在真实位置的一个邻域内,该邻域的半径与所受干扰成正比。 摘要:The objective of this study is improving the location estimate of a mobile robot capable of motion on a plane and mounted with a conventional 2D LIDAR sensor, given an initial guess for its location on a 2D map of its surroundings. Documented herein is the theoretical reasoning behind solving a matching problem between two homoriented 2D scans, one derived from the robot's physical sensor and one derived by simulating its operation within the map, in a manner that does not require the establishing of correspondences between their constituting rays. Two results are proved and subsequently shown through experiments. The first is that the true position of the sensor can be recovered with arbitrary precision when the physical sensor reports faultless measurements and there is no discrepancy between the environment the robot operates in and its perception of it by the robot. The second is that when either is affected by disturbance, the location estimate is bound in a neighbourhood of the true location whose radius is proportional to the affecting disturbance.
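"scan-to-map-scan 无对应匹配"的基本流程可以用一个示意性草图体会(环境为虚构的一维走廊,非论文算法):不为激光射线逐条建立对应,而是把真实扫描与"在候选位置上于地图中模拟出的扫描"作为整体比较,取差异最小的候选位置作为位置估计。

```python
def simulate_scan(x, left_wall=0.0, right_wall=10.0):
    """在一维地图中模拟位于 x 处的传感器读数:到左右墙的距离。"""
    return (x - left_wall, right_wall - x)

def match(real_scan, candidates):
    """整体比较两个扫描(平方差之和),不建立射线间对应关系。"""
    def cost(x):
        sim = simulate_scan(x)
        return sum((r - s) ** 2 for r, s in zip(real_scan, sim))
    return min(candidates, key=cost)

true_x = 3.0
real_scan = simulate_scan(true_x)                 # 无噪声的"物理"扫描
candidates = [i * 0.5 for i in range(21)]         # 候选位置 0.0, 0.5, ..., 10.0
estimate = match(real_scan, candidates)
```

在无噪声情形下估计恰好落在真实位置上,对应摘要的第一个结果;若给 real_scan 加入扰动,估计会偏移到真实位置的一个邻域内,对应第二个结果。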

【23】 Rise of the Autonomous Machines 标题:自主机器的崛起

作者:Shaoshan Liu,Jean-Luc Gaudiot 机构:PerceptIn Inc, University of California, Irvine, U.S.A. 备注:to appear in IEEE Computer Magazine 链接:https://arxiv.org/abs/2106.13987 摘要:经过几十年的不断进步和发展,信息技术已经发展到可以说我们正在进入自主机器时代,但是在实现这一点的道路上还存在许多障碍。在本文中,我们对自主机器的技术和非技术挑战进行了初步的认识和分类;对于我们已经确定的十个领域中的每一个,我们回顾了当前的状况、障碍和潜在的研究方向。希望这将有助于社区为未来定义清晰、有效和更正式的开发目标。 摘要:After decades of uninterrupted progress and growth, information technology has so evolved that it can be said we are entering the age of autonomous machines, but there exist many roadblocks in the way of making this a reality. In this article, we make a preliminary attempt at recognizing and categorizing the technical and non-technical challenges of autonomous machines; for each of the ten areas we have identified, we review current status, roadblocks, and potential research directions. It is hoped that this will help the community define clear, effective, and more formal development goalposts for the future.

【24】 Exploring Temporal Context and Human Movement Dynamics for Online Action Detection in Videos 标题:视频在线动作检测的时间上下文和人体运动动力学研究

作者:Vasiliki I. Vasileiou,Nikolaos Kardaris,Petros Maragos 机构:School of E.C.E., National Technical University of Athens, Athens , Greece 备注:EUSIPCO-2021 链接:https://arxiv.org/abs/2106.13967 摘要:如今,人与机器人之间的交互不断扩展,要求越来越多的人体动作识别应用能够实时运行。然而,大多数时序动作检测与识别工作以离线方式完成,即把按时间分割好的视频作为整体进行分类。本文基于最近提出的时序循环网络(Temporal Recurrent Networks)框架,探讨了如何将时间上下文和人体运动动力学有效地用于在线动作检测。我们的方法使用多种最先进的架构,并恰当地组合所提取的特征以改进动作检测。我们在具有挑战性且被广泛使用的时序动作定位数据集THUMOS'14上评估了该方法。实验表明,我们的方法比基线方法有显著改进,在THUMOS'14上取得了最先进的结果。 摘要:Nowadays, the interaction between humans and robots is constantly expanding, requiring more and more human motion recognition applications to operate in real time. However, most works on temporal action detection and recognition perform these tasks in offline manner, i.e. temporally segmented videos are classified as a whole. In this paper, based on the recently proposed framework of Temporal Recurrent Networks, we explore how temporal context and human movement dynamics can be effectively employed for online action detection. Our approach uses various state-of-the-art architectures and appropriately combines the extracted features in order to improve action detection. We evaluate our method on a challenging but widely used dataset for temporal action localization, THUMOS'14. Our experiments show significant improvement over the baseline method, achieving state-of-the-art results on THUMOS'14.
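在线与离线动作检测的本质差别可以用一个示意性草图说明(分数与阈值均为虚构,且这里用指数滑动平均代替论文中的循环网络来充当最简单的"时间上下文"):在线检测每帧只能利用当前与历史信息,流式地输出该帧的判定。

```python
def online_detect(frame_scores, alpha=0.5, threshold=0.5):
    """流式处理:每帧只依赖历史,输出该帧是否检测到动作。"""
    context, decisions = 0.0, []
    for s in frame_scores:
        context = alpha * s + (1 - alpha) * context   # 融合时间上下文(因果平滑)
        decisions.append(context > threshold)
    return decisions

scores = [0.1, 0.2, 0.9, 0.95, 0.9, 0.2, 0.1]   # 某动作的逐帧置信度(虚构)
decisions = online_detect(scores)
```

离线方法会看完整段视频再分类,而这里的判定在每帧到达时即可给出,这正是"实时运行"所要求的处理方式。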

【25】 OffRoadTranSeg: Semi-Supervised Segmentation using Transformers on OffRoad environments 标题:OffRoadTranSeg:越野环境下基于Transformer的半监督分割

作者:Anukriti Singh,Kartikeya Singh,P. B. Sujit 链接:https://arxiv.org/abs/2106.13963 摘要:我们提出了OffRoadTranSeg,这是第一个在非结构化户外环境中使用Transformer并自动选择数据进行标注的端到端半监督分割框架。越野分割是一种在自动驾驶中广泛使用的场景理解方法。流行的越野分割方法使用全连接卷积层和大量标注数据;然而,由于类别不平衡,会出现若干不匹配,某些类别甚至可能无法被检测到。我们的做法是以半监督方式完成越野分割任务,其目标是提供这样一个模型:用自监督视觉Transformer在越野数据集上进行微调,并借助深度估计进行自监督的数据收集以完成标注。所提方法在RELLIS-3D和RUGD越野数据集上得到了验证。实验表明,OffRoadTranSeg优于其他最先进的模型,并且解决了RELLIS-3D的类别不平衡问题。 摘要:We present OffRoadTranSeg, the first end-to-end framework for semi-supervised segmentation in unstructured outdoor environment using transformers and automatic data selection for labelling. The offroad segmentation is a scene understanding approach that is widely used in autonomous driving. The popular offroad segmentation method is to use fully connected convolution layers and large labelled data, however, due to class imbalance, there will be several mismatches and also some classes may not be detected. Our approach is to do the task of offroad segmentation in a semi-supervised manner. The aim is to provide a model where self supervised vision transformer is used to fine-tune offroad datasets with self-supervised data collection for labelling using depth estimation. The proposed method is validated on RELLIS-3D and RUGD offroad datasets. The experiments show that OffRoadTranSeg outperformed other state of the art models, and also solves the RELLIS-3D class imbalance problem.
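半监督分割中"自动数据选择"常见的一种做法是:让模型给未标注样本打伪标签,只挑选置信度足够高的样本进入训练集。下面是一个示意性草图(阈值与数据均为虚构,仅演示筛选这一步,并非论文使用深度估计的具体流程):

```python
def select_pseudo_labels(predictions, threshold=0.9):
    """predictions: [(样本id, 各类概率列表)];返回置信度达标的 (id, 类别)。"""
    selected = []
    for sample_id, probs in predictions:
        conf = max(probs)
        if conf >= threshold:
            selected.append((sample_id, probs.index(conf)))
    return selected

preds = [("img_0", [0.95, 0.03, 0.02]),
         ("img_1", [0.40, 0.35, 0.25]),   # 置信度不足,暂不纳入训练
         ("img_2", [0.05, 0.91, 0.04])]
labels = select_pseudo_labels(preds)
```

这样低置信度样本被自动排除,可在一定程度上缓解错误伪标签对类别不平衡数据集的放大效应。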

【26】 Discovering Generalizable Skills via Automated Generation of Diverse Tasks 标题:通过自动生成不同的任务来发现可概括的技能

作者:Kuan Fang,Yuke Zhu,Silvio Savarese,Li Fei-Fei 机构: Stanford University, UT Austin, Nvidia 备注:RSS 2021 链接:https://arxiv.org/abs/2106.13935 摘要:智能体的学习效率和泛化能力可以通过使用一组有用的技能得到很大的提高。然而,由于需要大量的精力和专业知识,机器人技能的设计在现实世界的应用中往往是棘手的。在这项工作中,我们介绍了在多样化环境中的技能学习(SLIDE),一种通过自动生成一组多样化任务来发现可泛化技能的方法。以往的无监督技能发现工作激励技能在同一环境中产生不同的结果;与之不同,我们的方法将每一项技能与一个可训练的任务生成器所生成的唯一任务配对。为了促使可泛化技能的涌现,我们的方法训练每一项技能专精于配对的任务,并最大化所生成任务的多样性。基于机器人在生成任务中行为定义的任务判别器被联合训练,用来估计多样性目标的证据下界。所学的技能随后可以在分层强化学习算法中组合,以解决未见过的目标任务。实验结果表明,该方法能在两个桌面操作领域有效地学习多种机器人技能。 摘要:The learning efficiency and generalization ability of an intelligent agent can be greatly improved by utilizing a useful set of skills. However, the design of robot skills can often be intractable in real-world applications due to the prohibitive amount of effort and expertise that it requires. In this work, we introduce Skill Learning In Diversified Environments (SLIDE), a method to discover generalizable skills via automated generation of a diverse set of tasks. As opposed to prior work on unsupervised discovery of skills which incentivizes the skills to produce different outcomes in the same environment, our method pairs each skill with a unique task produced by a trainable task generator. To encourage generalizable skills to emerge, our method trains each skill to specialize in the paired task and maximizes the diversity of the generated tasks. A task discriminator defined on the robot behaviors in the generated tasks is jointly trained to estimate the evidence lower bound of the diversity objective. The learned skills can then be composed in a hierarchical reinforcement learning algorithm to solve unseen target tasks. We demonstrate that the proposed method can effectively learn a variety of robot skills in two tabletop manipulation domains. 
Our results suggest that the learned skills can effectively improve the robot's performance in various unseen target tasks compared to existing reinforcement learning and skill learning methods.

【27】 Closed-form Continuous-Depth Models 标题:闭合形式的连续深度模型

作者:Ramin Hasani,Mathias Lechner,Alexander Amini,Lucas Liebenwein,Max Tschaikowski,Gerald Teschl,Daniela Rus 机构: 3Aalborg University, 4University of Vienna 备注:17 pages 链接:https://arxiv.org/abs/2106.13898 摘要:连续深度神经模型(模型隐藏状态的导数由神经网络定义)具有强大的序列数据处理能力。然而,这些模型依赖于先进的数值微分方程(DE)求解器,在计算成本和模型复杂性方面都带来了巨大开销。在本文中,我们提出了一个新的模型族,称为闭式连续深度(CfC)网络,它描述简单,速度至少快一个数量级,同时与基于ODE的对应模型相比具有同样强大的建模能力。这些模型由时间连续模型的一个表达性子集的解析闭式解导出,从而完全省去了对复杂微分方程求解器的需求。在我们的实验评估中,我们证明了CfC网络在一系列不同的时间序列预测任务(包括具有长期依赖性和不规则采样数据的任务)上优于先进的循环模型。我们相信,我们的发现为在对性能和效率都有要求的资源受限环境中训练和部署丰富的连续神经模型开启了新的机会。 摘要:Continuous-depth neural models, where the derivative of the model's hidden state is defined by a neural network, have enabled strong sequential data processing capabilities. However, these models rely on advanced numerical differential equation (DE) solvers resulting in a significant overhead both in terms of computational cost and model complexity. In this paper, we present a new family of models, termed Closed-form Continuous-depth (CfC) networks, that are simple to describe and at least one order of magnitude faster while exhibiting equally strong modeling abilities compared to their ODE-based counterparts. The models are hereby derived from the analytical closed-form solution of an expressive subset of time-continuous models, thus alleviating the need for complex DE solvers altogether. In our experimental evaluations, we demonstrate that CfC networks outperform advanced, recurrent models over a diverse set of time-series prediction tasks, including those with long-term dependencies and irregularly sampled data. We believe our findings open new opportunities to train and deploy rich, continuous neural models in resource-constrained settings, which demand both performance and efficiency.
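"用解析闭式解取代数值 ODE 求解器"这一动机,可以用最简单的线性时间常数动力学示意(这只是对动机的演示,并非 CfC 架构本身):对 dx/dt = -a(x - b),数值欧拉积分需要很多小步,而闭式解 x(t) = b + (x0 - b)e^{-at} 一步即可得到任意时刻的状态。CfC 对时间连续模型的一个表达性子集做的正是类似的事。

```python
import math

def euler_solve(x0, a, b, t, steps):
    """数值求解 dx/dt = -a (x - b):显式欧拉法,需要多步小步长迭代。"""
    x, dt = x0, t / steps
    for _ in range(steps):
        x += dt * (-a * (x - b))
    return x

def closed_form(x0, a, b, t):
    """同一动力学的解析闭式解,任意 t 一步求值。"""
    return b + (x0 - b) * math.exp(-a * t)

x_numeric = euler_solve(x0=1.0, a=2.0, b=0.5, t=1.0, steps=1000)
x_exact = closed_form(x0=1.0, a=2.0, b=0.5, t=1.0)
```

闭式解还天然支持不规则采样:对任意时间间隔直接代入 t 即可,无需按步长推进,这与摘要中"不规则采样数据"的优势相呼应。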

【28】 Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits 标题:向探索中的演示者学习:多臂老虎机的最优奖励估计

作者:Wenshuo Guo,Kumar Krishna Agrawal,Aditya Grover,Vidya Muthukumar,Ashwin Pananjady 机构:⋄Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, †Facebook AI Research, ‡School of Electrical & Computer Engineering and School of Industrial & Systems Engineering, Georgia Institute of Technology 链接:https://arxiv.org/abs/2106.14866 摘要:通过观察一个低后悔(low-regret)演示者的学习过程,我们引入了从中估计多臂老虎机实例各臂奖励的"逆老虎机"(inverse bandit)问题。针对相关的逆强化学习问题,现有方法假设演示者执行最优策略,因而存在可辨识性问题。相比之下,我们的范式利用演示者在通往最优途中的行为,特别是探索阶段,来获得一致的奖励估计。我们为一类基于置信上界(upper-confidence)的算法的演示开发了简单而高效的奖励估计程序,表明随着算法后悔值的增大,奖励估计会变得越来越容易。我们将这些上界与适用于任何演示算法的信息论下界相匹配,从而刻画了探索与奖励估计之间的最优权衡。在合成数据和来自自然科学的模拟实验设计数据上的大量经验评估印证了我们的理论结果。 摘要:We introduce the "inverse bandit" problem of estimating the rewards of a multi-armed bandit instance from observing the learning process of a low-regret demonstrator. Existing approaches to the related problem of inverse reinforcement learning assume the execution of an optimal policy, and thereby suffer from an identifiability issue. In contrast, our paradigm leverages the demonstrator's behavior en route to optimality, and in particular, the exploration phase, to obtain consistent reward estimates. We develop simple and efficient reward estimation procedures for demonstrations within a class of upper-confidence-based algorithms, showing that reward estimation gets progressively easier as the regret of the algorithm increases. We match these upper bounds with information-theoretic lower bounds that apply to any demonstrator algorithm, thereby characterizing the optimal tradeoff between exploration and reward estimation. Extensive empirical evaluations on both synthetic data and simulated experimental design data from the natural sciences corroborate our theoretical results.
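"逆老虎机"设定的直觉可以用一个示意性草图体会(这不是论文的估计程序,只是一个粗糙的启发式演示):观察者看不到奖励,只看到一个低后悔演示者(这里用 UCB)的拉臂序列;由于 UCB 会越来越偏向高奖励的臂,拉臂频率本身就泄露了各臂奖励的相对大小。

```python
import math
import random

def ucb_demo(true_means, horizon, seed=0):
    """运行一个 UCB 演示者,返回其拉臂序列(观察者只能看到这个序列)。"""
    rng = random.Random(seed)
    k = len(true_means)
    counts, sums, actions = [0] * k, [0.0] * k, []
    for t in range(horizon):
        if t < k:
            a = t                                     # 每臂先各拉一次
        else:
            a = max(range(k), key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2 * math.log(t) / counts[i]))
        r = true_means[a] + rng.gauss(0, 0.1)         # 奖励只有演示者自己看到
        counts[a] += 1; sums[a] += r; actions.append(a)
    return actions

actions = ucb_demo(true_means=[0.2, 0.8, 0.5], horizon=2000)
freq = [actions.count(a) for a in range(3)]
estimated_order = sorted(range(3), key=lambda a: freq[a])  # 频率低 -> 奖励低
```

论文的贡献正是把这种"探索行为泄露奖励信息"的直觉做成了有一致性保证的估计程序,并给出匹配的信息论下界;上面的频率排序只恢复相对大小,无法给出奖励的数值估计。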

本文参与 腾讯云自媒体分享计划,分享自微信公众号。
原始发表:2021-06-29,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 arXiv每日学术速递 微信公众号,前往查看
