
Robotics arXiv Digest [7.22]

Author: arXiv Daily Academic Digest (WeChat Official Account "arXiv每日学术速递")
Published: 2021-07-27 11:10:30
Column: arXiv Daily Academic Digest

Visit www.arxivdaily.com for digests with abstracts, covering CS, Physics, Math, Economics, Statistics, Finance, Biology, and Electrical Engineering, plus search, bookmarking, and posting features. Click "Read the original" to access the site.

cs.RO (Robotics), 9 papers in total

【1】 Demonstration-Guided Reinforcement Learning with Learned Skills

Authors: Karl Pertsch, Youngwoon Lee, Yue Wu, Joseph J. Lim
Affiliation: University of Southern California
Link: https://arxiv.org/abs/2107.10253
Abstract: Demonstration-guided reinforcement learning (RL) is a promising approach for learning complex behaviors by leveraging both reward feedback and a set of target task demonstrations. Prior approaches for demonstration-guided RL treat every new task as an independent learning problem and attempt to follow the provided demonstrations step-by-step, akin to a human trying to imitate a completely unseen behavior by following the demonstrator's exact muscle movements. Naturally, such learning will be slow, but often new behaviors are not completely unseen: they share subtasks with behaviors we have previously learned. In this work, we aim to exploit this shared subtask structure to increase the efficiency of demonstration-guided RL. We first learn a set of reusable skills from large offline datasets of prior experience collected across many tasks. We then propose Skill-based Learning with Demonstrations (SkiLD), an algorithm for demonstration-guided RL that efficiently leverages the provided demonstrations by following the demonstrated skills instead of the primitive actions, resulting in substantial performance improvements over prior demonstration-guided RL approaches. We validate the effectiveness of our approach on long-horizon maze navigation and complex robot manipulation tasks.
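To make the skill-based idea concrete, below is a minimal, hypothetical sketch (not the authors' code): a high-level policy chooses among a small hard-coded skill library, and the demonstration only biases which skill is sampled, rather than being imitated action-by-action. The skill definitions, the exploration rate, and the learning rate are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "skill library": each skill is a short fixed sequence of primitive actions.
# In the paper, skills are learned from large offline datasets; here they are hard-coded.
SKILLS = {
    0: [(1, 0)] * 5,   # move right for 5 steps
    1: [(0, 1)] * 5,   # move up for 5 steps
    2: [(-1, 0)] * 5,  # move left for 5 steps
    3: [(0, -1)] * 5,  # move down for 5 steps
}

def demo_skill_prior(demo_skill_sequence, step):
    """Distribution over skills that favours the skill demonstrated at this stage
    of the task (a stand-in for a learned, demonstration-derived skill posterior)."""
    probs = np.full(len(SKILLS), 0.1)
    probs[demo_skill_sequence[min(step, len(demo_skill_sequence) - 1)]] += 1.0
    return probs / probs.sum()

def train(goal, demo_skill_sequence, episodes=200):
    """Epsilon-greedy high-level policy over skills, with exploration guided by the demo."""
    q = np.zeros(len(SKILLS))
    for _ in range(episodes):
        state = np.zeros(2)
        for step in range(6):
            if rng.random() < 0.3:
                skill = rng.choice(len(SKILLS), p=demo_skill_prior(demo_skill_sequence, step))
            else:
                skill = int(np.argmax(q))
            for a in SKILLS[skill]:           # execute the whole skill, not one primitive action
                state = state + np.array(a)
            reward = -np.linalg.norm(goal - state)
            q[skill] += 0.1 * (reward - q[skill])
    return q

print(np.round(train(goal=np.array([10.0, 5.0]), demo_skill_sequence=[0, 1, 0]), 2))
```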

【2】 Assured Mission Adaptation of UAVs

Authors: Sebastián Zudaire, Leandro Nahabedian, Sebastián Uchitel
Affiliation: Universidad de Buenos Aires
Note: Author-submitted article to journal: Robotics and Autonomous Systems, September 2020
Link: https://arxiv.org/abs/2107.10173
Abstract: The design of systems that can change their behaviour to account for scenarios that were not foreseen at design time remains an open challenge. In this paper we propose an approach for adaptation of mobile robot missions that is not constrained to a predefined set of mission evolutions. We propose applying the MORPH adaptive software architecture to UAVs and show how controller synthesis can be used both to guarantee correct transitioning from the old to the new mission goals and to perform architectural reconfiguration to include new software actuators and sensors if necessary. The architecture brings together architectural concepts that are commonplace in robotics, such as temporal planning and discrete, hybrid and continuous control layers, with architectural concepts from adaptive systems, such as runtime models and runtime synthesis. We validate the architecture by flying several missions taken from the robotics literature on different real and simulated UAVs.

【3】 Enumeration of Polyominoes & Polycubes Composed of Magnetic Cubes

Authors: Yitong Lu, Anuruddha Bhattacharjee, Daniel Biediger, Min Jun Kim, Aaron T. Becker
Affiliation: University of Houston; Southern Methodist University
Note: 8 pages, 9 figures, 2 tables
Link: https://arxiv.org/abs/2107.10167
Abstract: This paper examines a family of designs for magnetic cubes and counts how many configurations are possible for each design as a function of the number of modules. Magnetic modular cubes are cubes with magnets arranged on their faces. The magnets are positioned so that each face has either magnetic south or north pole outward. Moreover, we require that the net magnetic moment of the cube passes through the center of opposing faces. These magnetic arrangements enable coupling when cube faces with opposite polarity are brought in close proximity and enable moving the cubes by controlling the orientation of a global magnetic field. This paper investigates the 2D and 3D shapes that can be constructed by magnetic modular cubes, and describes all possible magnet arrangements that obey these rules. We select ten magnetic arrangements and assign a "color" to each of them for ease of visualization and reference. We provide a method to enumerate the number of unique polyominoes and polycubes that can be constructed from a given set of colored cubes. We use this method to enumerate all arrangements for up to 20 modules in 2D and 16 modules in 3D. We provide a motion planner for 2D assembly and through simulations compare which arrangements require fewer movements to generate and which arrangements are more common. Hardware demonstrations explore the self-assembly and disassembly of these modules in 2D and 3D.
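For readers unfamiliar with the underlying combinatorics, the following is a simplified sketch that enumerates plain fixed (translation-invariant) 2D polyominoes up to a given size. The paper's method additionally tracks the ten magnet "colors" and extends to 3D polycubes; this toy version only illustrates the grow-and-canonicalize enumeration idea.

```python
from itertools import product

def normalize(cells):
    """Canonical form under translation: shift cells so the minimum x and y are zero."""
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    mx, my = min(xs), min(ys)
    return frozenset((x - mx, y - my) for x, y in cells)

def enumerate_polyominoes(n):
    """Count fixed polyominoes of 1..n cells by repeatedly attaching one adjacent cell."""
    counts = {1: 1}
    current = {normalize([(0, 0)])}
    for size in range(2, n + 1):
        nxt = set()
        for shape in current:
            for (x, y), (dx, dy) in product(shape, [(1, 0), (-1, 0), (0, 1), (0, -1)]):
                cell = (x + dx, y + dy)
                if cell not in shape:
                    nxt.add(normalize(list(shape) + [cell]))
        current = nxt
        counts[size] = len(current)
    return counts

print(enumerate_polyominoes(6))  # {1: 1, 2: 2, 3: 6, 4: 19, 5: 63, 6: 216}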

【4】 MarsExplorer: Exploration of Unknown Terrains via Deep Reinforcement Learning and Procedurally Generated Environments

Authors: Dimitrios I. Koutras, Athanasios Ch. Kapoutsis, Angelos A. Amanatiadis, Elias B. Kosmatopoulos
Affiliation: Department of Electrical and Computer Engineering, Democritus University of Thrace, Xanthi, Greece; Information Technologies Institute, Centre for Research & Technology Hellas, Thessaloniki, Greece
Link: https://arxiv.org/abs/2107.09996
Abstract: This paper is an initial endeavor to bridge the gap between powerful Deep Reinforcement Learning methodologies and the problem of exploration/coverage of unknown terrains. Within this scope, MarsExplorer, an openai-gym compatible environment tailored to exploration/coverage of unknown areas, is presented. MarsExplorer translates the original robotics problem into a Reinforcement Learning setup that various off-the-shelf algorithms can tackle. Any learned policy can be straightforwardly applied to a robotic platform without an elaborate simulation model of the robot's dynamics to apply a different learning/adaptation phase. One of its core features is the controllable multi-dimensional procedural generation of terrains, which is the key for producing policies with strong generalization capabilities. Four different state-of-the-art RL algorithms (A3C, PPO, Rainbow, and SAC) are trained on the MarsExplorer environment, and a proper evaluation of their results compared to the average human-level performance is reported. In the follow-up experimental analysis, the effect of the multi-dimensional difficulty setting on the learning capabilities of the best-performing algorithm (PPO) is analyzed. A milestone result is the generation of an exploration policy that follows the Hilbert curve without providing this information to the environment or rewarding directly or indirectly Hilbert-curve-like trajectories. The experimental analysis is concluded by comparing PPO learned policy results with frontier-based exploration context for extended terrain sizes. The source code can be found at: https://github.com/dimikout3/GeneralExplorationPolicy.
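Since the environment is described as openai-gym compatible, interacting with it should look roughly like the sketch below. The environment id "MarsExplorer-v0" and the exact reset/step signature are assumptions for illustration; the actual identifiers and spaces should be taken from the linked repository.

```python
import gym

env = gym.make("MarsExplorer-v0")       # assumed id, for illustration only
obs = env.reset()
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()  # replace with a trained PPO/SAC/A3C/Rainbow policy
    obs, reward, done, info = env.step(action)
    total_reward += reward
print("episode return:", total_reward)
```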

【5】 Levels of Automation for a Mobile Robot Teleoperated by a Caregiver

Authors: Samuel Olatunji, Andre Potenza, Andrey Kiselev, Tal Oron-Gilad, Amy Loutfi, Yael Edan
Affiliation: Department of Industrial Engineering and Management, Ben-Gurion University of the Negev, Be'er Sheva, Israel; Center for Applied Autonomous Sensor Systems (AASS), Örebro University, Örebro, Sweden
Note: 12 pages, 4 figures, 4 tables
Link: https://arxiv.org/abs/2107.09992
Abstract: Caregivers in eldercare can benefit from telepresence robots that allow them to perform a variety of tasks remotely. In order for such robots to be operated effectively and efficiently by non-technical users, it is important to examine if and how the robotic system's level of automation (LOA) impacts their performance. The objective of this work was to develop suitable LOA modes for a mobile robotic telepresence (MRP) system for eldercare and assess their influence on users' performance, workload, awareness of the environment and usability at two different levels of task complexity. For this purpose, two LOA modes were implemented on the MRP platform: assisted teleoperation (low LOA mode) and autonomous navigation (high LOA mode). The system was evaluated in a user study with 20 participants, who, in the role of the caregiver, navigated the robot through a home-like environment to perform various control and perception tasks. Results revealed that performance improved at high LOA when the task complexity was low. However, when task complexity increased, lower LOA improved performance. This opposite trend was also observed in the results for workload and situation awareness. We discuss the results in terms of the LOAs' impact on users' attitude towards automation and implications on usability.

【6】 Multi-Agent Belief Sharing through Autonomous Hierarchical Multi-Level Clustering

Authors: Mirco Theile, Jonathan Ponniah, Or Dantsker, Marco Caccamo
Affiliation: Technical University of Munich (TUM); San Jose State University
Note: Submitted to IEEE Transactions on Robotics; the article extends on this https URL
Link: https://arxiv.org/abs/2107.09973
Abstract: Coordination in multi-agent systems is challenging for agile robots such as unmanned aerial vehicles (UAVs), where relative agent positions frequently change due to unconstrained movement. The problem is exacerbated through the individual take-off and landing of agents for battery recharging leading to a varying number of active agents throughout the whole mission. This work proposes autonomous hierarchical multi-level clustering (MLC), which forms a clustering hierarchy utilizing decentralized methods. Through periodic cluster maintenance executed by cluster-heads, stable multi-level clustering is achieved. The resulting hierarchy is used as a backbone to solve the communication problem for locally-interactive applications such as UAV tracking problems. Using observation aggregation, compression, and dissemination, agents share local observations throughout the hierarchy, giving every agent a total system belief with spatially dependent resolution and freshness. Extensive simulations show that MLC yields a stable cluster hierarchy under different motion patterns and that the proposed belief sharing is highly applicable in wildfire front monitoring scenarios.
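The "spatially dependent resolution" idea can be illustrated with a small sketch: observations are coarsened as they are aggregated up the cluster hierarchy, so an agent's global belief loses detail for regions handled by higher levels. The pooling scheme and grid sizes below are illustrative assumptions; the paper's clustering, maintenance, and dissemination protocols are not modelled here.

```python
import numpy as np

def downsample(grid, factor):
    """Coarsen an observation grid by max-pooling, mimicking aggregation at a higher cluster level."""
    h, w = grid.shape
    return grid.reshape(h // factor, factor, w // factor, factor).max(axis=(1, 3))

local_obs = np.zeros((8, 8))
local_obs[2, 3] = 1.0                    # e.g. a detected fire-front cell
level1 = downsample(local_obs, 2)        # belief shared within the agent's cluster
level2 = downsample(level1, 2)           # coarser belief passed up to the next level
print(local_obs.shape, level1.shape, level2.shape)  # (8, 8) (4, 4) (2, 2)
```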

【7】 Bayesian Controller Fusion: Leveraging Control Priors in Deep Reinforcement Learning for Robotics

Authors: Krishan Rana, Vibhavari Dasagi, Jesse Haviland, Ben Talbot, Michael Milford, Niko Sünderhauf
Affiliation: Queensland University of Technology (QUT) Centre for Robotics
Note: Under review for The International Journal of Robotics Research (IJRR). Project page: this https URL
Link: https://arxiv.org/abs/2107.09822
Abstract: We present Bayesian Controller Fusion (BCF): a hybrid control strategy that combines the strengths of traditional hand-crafted controllers and model-free deep reinforcement learning (RL). BCF thrives in the robotics domain, where reliable but suboptimal control priors exist for many tasks, but RL from scratch remains unsafe and data-inefficient. By fusing uncertainty-aware distributional outputs from each system, BCF arbitrates control between them, exploiting their respective strengths. We study BCF on two real-world robotics tasks involving navigation in a vast and long-horizon environment, and a complex reaching task that involves manipulability maximisation. For both these domains, there exist simple handcrafted controllers that can solve the task at hand in a risk-averse manner but do not necessarily exhibit the optimal solution given limitations in analytical modelling, controller miscalibration and task variation. As exploration is naturally guided by the prior in the early stages of training, BCF accelerates learning, while substantially improving beyond the performance of the control prior, as the policy gains more experience. More importantly, given the risk-aversity of the control prior, BCF ensures safe exploration and deployment, where the control prior naturally dominates the action distribution in states unknown to the policy. We additionally show BCF's applicability to the zero-shot sim-to-real setting and its ability to deal with out-of-distribution states in the real world. BCF is a promising approach for combining the complementary strengths of deep RL and traditional robotic control, surpassing what either can achieve independently. The code and supplementary video material are made publicly available at https://krishanrana.github.io/bcf.
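As a rough intuition for fusing uncertainty-aware distributional outputs, a precision-weighted product of two Gaussian action distributions is one plausible fusion rule: whichever source is more certain dominates the fused distribution. This is only an illustrative sketch, not necessarily the exact formulation used by BCF; consult the paper and project page for the actual method.

```python
import numpy as np

def fuse_gaussians(mu_prior, sigma_prior, mu_rl, sigma_rl):
    """Fuse a controller prior N(mu_prior, sigma_prior^2) with an RL policy N(mu_rl, sigma_rl^2)
    via a precision-weighted product; smaller sigma (higher certainty) dominates."""
    prec_prior, prec_rl = 1.0 / sigma_prior**2, 1.0 / sigma_rl**2
    prec = prec_prior + prec_rl
    mu = (prec_prior * mu_prior + prec_rl * mu_rl) / prec
    return mu, np.sqrt(1.0 / prec)

# Early in training the RL policy is highly uncertain, so the hand-crafted prior dominates:
mu, sigma = fuse_gaussians(mu_prior=0.2, sigma_prior=0.05, mu_rl=-0.8, sigma_rl=1.0)
print(mu, sigma)  # fused mean stays close to the prior's 0.2
```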

【8】 A Factor Graph-based approach to vehicle sideslip angle estimation

Authors: Antonio Leanza, Giulio Reina, Jose-Luis Blanco-Claraco
Note: 15 pages, 9 figures
Link: https://arxiv.org/abs/2107.09815
Abstract: Sideslip angle is an important variable for understanding and monitoring vehicle dynamics, but it lacks an inexpensive method for direct measurement. Therefore, it is typically estimated from inertial and other proprioceptive sensors onboard using filtering methods from the family of the Kalman Filter. As a novel alternative, this work proposes modelling the problem directly as a graphical model (factor graph), which can then be optimized using a variety of methods, such as whole-dataset batch optimization for offline processing or a fixed-lag smoother for on-line operation. Experimental results on real vehicle datasets validate the proposal, with good agreement between estimated and actual sideslip angle, showing similar performance to the state of the art with great potential for future extensions due to the flexible mathematical framework.
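The contrast with recursive Kalman filtering can be illustrated with a toy batch problem: every factor contributes a residual, and the whole trajectory is estimated at once by nonlinear least squares. The signal, noise levels, and factor weights below are made up for illustration; the paper's actual variables, vehicle model, and factors are more involved.

```python
import numpy as np
from scipy.optimize import least_squares

T = 50
true_beta = 0.1 * np.sin(np.linspace(0, 4 * np.pi, T))                # "sideslip-like" signal
meas = true_beta + 0.05 * np.random.default_rng(1).normal(size=T)     # noisy proprioceptive measurements

def residuals(beta):
    r_meas = (beta - meas) / 0.05     # one measurement factor per time step
    r_smooth = np.diff(beta) / 0.02   # motion/smoothness factors linking consecutive states
    return np.concatenate([r_meas, r_smooth])

estimate = least_squares(residuals, x0=np.zeros(T)).x                 # batch optimization over the whole graph
print("RMSE:", np.sqrt(np.mean((estimate - true_beta) ** 2)))
```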

【9】 Track based Offline Policy Learning for Overtaking Maneuvers with Autonomous Racecars

Authors: Jayanth Bhargav, Johannes Betz, Hongrui Zheng, Rahul Mangharam
Affiliation: University of Pennsylvania, School of Engineering and Applied Sciences
Note: Presented at the 1st Workshop "Opportunities and Challenges for Autonomous Racing" at the 2021 International Conference on Robotics and Automation (ICRA 2021)
Link: https://arxiv.org/abs/2107.09782
Abstract: The rising popularity of driverless cars has led to research and development in the field of autonomous racing, and overtaking in autonomous racing is a challenging task. Vehicles have to detect and operate at the limits of dynamic handling, and decisions in the car have to be made at high speeds and high acceleration. One of the most crucial parts of autonomous racing is path planning and decision making for an overtaking maneuver with a dynamic opponent vehicle. In this paper we present the evaluation of a track-based offline policy learning approach for autonomous racing. We define specific track portions and conduct offline experiments to evaluate the probability of an overtaking maneuver based on speed and position of the ego vehicle. Based on these experiments we can define overtaking probability distributions for each of the track portions. Further, we propose a switching MPCC controller setup for incorporating the learnt policies to achieve a higher rate of overtaking maneuvers. By exhaustive simulations, we show that our proposed algorithm is able to increase the number of overtakes at different track portions.
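A minimal sketch of the track-based idea, under assumptions of my own (segment names, the 0.6 threshold, and the controller labels are illustrative, not taken from the paper): estimate an overtaking success probability per track segment from logged offline runs, then switch to the aggressive controller only where that probability is high.

```python
from collections import defaultdict

def segment_overtake_probabilities(logged_attempts):
    """logged_attempts: iterable of (segment_id, succeeded: bool) from offline experiments."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [successes, attempts]
    for segment, succeeded in logged_attempts:
        counts[segment][1] += 1
        counts[segment][0] += int(succeeded)
    return {seg: s / n for seg, (s, n) in counts.items()}

def choose_controller(segment, probs, threshold=0.6):
    """Switch to the overtaking controller only where the learnt probability is high."""
    return "overtaking_mpcc" if probs.get(segment, 0.0) >= threshold else "following_mpcc"

probs = segment_overtake_probabilities([("straight", True), ("straight", True),
                                         ("hairpin", False), ("hairpin", False),
                                         ("straight", False)])
print(probs, choose_controller("straight", probs), choose_controller("hairpin", probs))
```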

Originally published: 2021-07-22
