AI Academic Digest [7.5]

WeChat official account: arXiv Daily Academic Digest

Published 2021-07-27 10:23:32

cs.AI (Artificial Intelligence): 39 papers in total

【1】 How Incomplete is Contrastive Learning? An Inter-intra Variant Dual Representation Method for Self-supervised Video Recognition

Authors: Lin Zhang, Qi She, Zhengyang Shen, Changhu Wang Affiliations: Carnegie Mellon University; ByteDance AI Lab; Peking University Comments: 10 pages with appendix Link: https://arxiv.org/abs/2107.01194 Abstract: Contrastive learning applied to self-supervised representation learning has seen a resurgence in deep models. In this paper, we find that existing contrastive learning based solutions for self-supervised video recognition focus on inter-variance encoding but ignore the intra-variance existing in clips within the same video. We thus propose to learn dual representations for each clip which (i) encode intra-variance through a shuffle-rank pretext task; (ii) encode inter-variance through a temporal coherent contrastive loss. Experiment results show that our method plays an essential role in balancing inter and intra variances and brings consistent performance gains on multiple backbones and contrastive learning frameworks. Integrated with SimCLR and pretrained on Kinetics-400, our method achieves 82.0% and 51.2% downstream classification accuracy on UCF101 and HMDB51 test sets respectively and 46.1% video retrieval accuracy on UCF101, outperforming both pretext-task based and contrastive learning based counterparts.

【2】 CHISEL: Compression-Aware High-Accuracy Embedded Indoor Localization with Deep Learning

Authors: Liping Wang, Saideep Tiku, Sudeep Pasricha Link: https://arxiv.org/abs/2107.01192 Abstract: GPS technology has revolutionized the way we localize and navigate outdoors. However, the poor reception of GPS signals in buildings makes it unsuitable for indoor localization. WiFi fingerprinting-based indoor localization is one of the most promising ways to meet this demand. Unfortunately, most work in the domain fails to resolve challenges associated with deployability on resource-limited embedded devices. In this work, we propose a compression-aware and high-accuracy deep learning framework called CHISEL that outperforms the best-known works in the area while maintaining localization robustness on embedded devices.

【3】 Combinatorial Optimization with Physics-Inspired Graph Neural Networks

Authors: Martin J. A. Schuetz, J. Kyle Brubaker, Helmut G. Katzgraber Affiliations: Amazon Quantum Solutions Lab, Seattle, Washington, USA; AWS Intelligent and Advanced Compute Technologies, Professional Services, Seattle, Washington, USA; AWS Center for Quantum Computing, Pasadena, CA, USA Comments: Manuscript: 13 pages, 5 figures, 1 table. Supplemental Material: 1 page, 1 table Link: https://arxiv.org/abs/2107.01188 Abstract: We demonstrate how graph neural networks can be used to solve combinatorial optimization problems. Our approach is broadly applicable to canonical NP-hard problems in the form of quadratic unconstrained binary optimization problems, such as maximum cut, minimum vertex cover, maximum independent set, as well as Ising spin glasses and higher-order generalizations thereof in the form of polynomial unconstrained binary optimization problems. We apply a relaxation strategy to the problem Hamiltonian to generate a differentiable loss function with which we train the graph neural network and apply a simple projection to integer variables once the unsupervised training process has completed. We showcase our approach with numerical results for the canonical maximum cut and maximum independent set problems. We find that the graph neural network optimizer performs on par or outperforms existing solvers, with the ability to scale beyond the state of the art to problems with millions of variables.
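
As a rough illustration of the relaxation idea in the abstract above, the sketch below writes maximum cut as a QUBO, evaluates the same expression on soft assignments in [0, 1], and rounds back to bits. It is a minimal sketch under stated assumptions, not the authors' implementation: no graph neural network is involved, and the function names (`maxcut_relaxed_loss`, `project`, `cut_size`) and the 0.5 rounding threshold are hypothetical choices made for illustration.

```python
def maxcut_relaxed_loss(p, edges):
    """QUBO energy for maximum cut, evaluated on soft assignments p[i] in [0, 1].

    For binary p the value equals minus the cut size, so minimizing it
    maximizes the cut. (Standard MaxCut QUBO; the names are illustrative.)
    """
    return sum(2 * p[i] * p[j] - p[i] - p[j] for i, j in edges)


def project(p, threshold=0.5):
    """Round soft assignments to binary variables (assumed threshold: 0.5)."""
    return [1 if value >= threshold else 0 for value in p]


def cut_size(x, edges):
    """Number of edges whose endpoints fall on different sides of the cut."""
    return sum(1 for i, j in edges if x[i] != x[j])


# A 4-cycle: the alternating assignment cuts all four edges.
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
soft = [0.1, 0.9, 0.2, 0.8]          # stand-in for a trained model's outputs
bits = project(soft)
print(bits, cut_size(bits, square), maxcut_relaxed_loss(bits, square))
```

In a trained model the soft values would come from the network's output layer; here they are hard-coded so that the loss and the projection step can be inspected in isolation.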

【4】 Ethics Sheets for AI Tasks

Authors: Saif M. Mohammad Affiliations: National Research Council Canada Link: https://arxiv.org/abs/2107.01183 Abstract: Several high-profile events, such as the use of biased recidivism systems and mass testing of emotion recognition systems on vulnerable sub-populations, have highlighted how technology will often lead to more adverse outcomes for those that are already marginalized. In this paper, I will make a case for thinking about ethical considerations not just at the level of individual models and datasets, but also at the level of AI tasks. I will present a new form of such an effort, Ethics Sheets for AI Tasks, dedicated to fleshing out the assumptions and ethical considerations hidden in how a task is commonly framed and in the choices we make regarding the data, method, and evaluation. Finally, I will provide an example ethics sheet for automatic emotion recognition. Together with Data Sheets for datasets and Model Cards for AI systems, Ethics Sheets aid in the development and deployment of responsible AI systems.

【5】 Visual Relationship Forecasting in Videos

Authors: Li Mi, Yangjun Ou, Zhenzhong Chen Affiliations: School of Remote Sensing and Information Engineering, Wuhan University Link: https://arxiv.org/abs/2107.01181 Abstract: Real-world scenarios often require the anticipation of object interactions in unknown future, which would assist the decision-making process of both humans and agents. To meet this challenge, we present a new task named Visual Relationship Forecasting (VRF) in videos to explore the prediction of visual relationships in a reasoning manner. Specifically, given a subject-object pair with H existing frames, VRF aims to predict their future interactions for the next T frames without visual evidence. To evaluate the VRF task, we introduce two video datasets named VRF-AG and VRF-VidOR, with a series of spatio-temporally localized visual relation annotations in a video. These two datasets densely annotate 13 and 35 visual relationships in 1923 and 13447 video clips, respectively. In addition, we present a novel Graph Convolutional Transformer (GCT) framework, which captures both object-level and frame-level dependencies by spatio-temporal Graph Convolution Network and Transformer. Experimental results on both VRF-AG and VRF-VidOR datasets demonstrate that GCT outperforms the state-of-the-art sequence modelling methods on visual relationship forecasting.

【6】 Computing Fuzzy Rough Set based Similarities with Fuzzy Inference and Its Application to Sentence Similarity Computations

Authors: Nidhika Yadav Affiliations: Ph.D., IIT Delhi Comments: 5 figures, 3 tables Link: https://arxiv.org/abs/2107.01170 Abstract: Several research initiatives have been proposed for computing similarity between two Fuzzy Sets in analysis through Fuzzy Rough Sets. These techniques yield two measures, viz. lower similarity and upper similarity, while in most applications only one entity is useful for further analysis and for drawing conclusions. The aim of this paper is to propose a novel technique to combine Fuzzy Rough Set based lower similarity and upper similarity using a Fuzzy Inference Engine. Further, the proposed approach is applied to the problem of computing sentence similarity and has been evaluated on the SICK2014 dataset.

【7】 Collaborative Visual Navigation

Authors: Haiyang Wang, Wenguan Wang, Xizhou Zhu, Jifeng Dai, Liwei Wang Affiliations: Key Laboratory of Machine Perception, MOE, Peking University; SenseTime Research; Computer Vision Lab, ETH Zurich Link: https://arxiv.org/abs/2107.01151 Abstract: As a fundamental problem for Artificial Intelligence, multi-agent system (MAS) is making rapid progress, mainly driven by multi-agent reinforcement learning (MARL) techniques. However, previous MARL methods largely focused on grid-world like or game environments; MAS in visually rich environments has remained less explored. To narrow this gap and emphasize the crucial role of perception in MAS, we propose a large-scale 3D dataset, CollaVN, for multi-agent visual navigation (MAVN). In CollaVN, multiple agents are entailed to cooperatively navigate across photo-realistic environments to reach target locations. Diverse MAVN variants are explored to make our problem more general. Moreover, a memory-augmented communication framework is proposed. Each agent is equipped with a private, external memory to persistently store communication information. This allows agents to make better use of their past communication information, enabling more efficient collaboration and robust long-term planning. In our experiments, several baselines and evaluation metrics are designed. We also empirically verify the efficacy of our proposed MARL approach across different MAVN task settings.

【8】 4C: A Computation, Communication, and Control Co-Design Framework for CAVs

Authors: Liangkai Liu, Shaoshan Liu, Weisong Shi Affiliations: Wayne State University; PerceptIn Comments: 7 pages, 4 figures, accepted by IEEE Wireless Communication Magazine Link: https://arxiv.org/abs/2107.01142 Abstract: Connected and autonomous vehicles (CAVs) are promising due to their potential safety and efficiency benefits and have attracted massive investment and interest from government agencies, industry, and academia. With more computing and communication resources available, both vehicles and edge servers are equipped with a set of camera-based vision sensors, also known as Visual IoT (V-IoT) techniques, for sensing and perception. Tremendous efforts have been made for achieving programmable communication, computation, and control. However, they are conducted mainly in the silo mode, limiting the responsiveness and efficiency of handling challenging scenarios in the real world. To improve the end-to-end performance, we envision that future CAVs require the co-design of communication, computation, and control. This paper presents our vision of the end-to-end design principle for CAVs, called 4C, which extends the V-IoT system by providing a unified communication, computation, and control co-design framework. With programmable communications, fine-grained heterogeneous computation, and efficient vehicle controls in 4C, CAVs can handle critical scenarios and achieve energy-efficient autonomous driving. Finally, we present several challenges to achieving the vision of the 4C framework.

【9】 Ensemble of Loss Functions to Improve Generalizability of Deep Metric Learning methods

Authors: Davood Zabihzadeh Affiliations: Computer Department, Hakim Sabzevari University, Sabzevar, Iran Comments: 27 pages, 12 figures Link: https://arxiv.org/abs/2107.01130 Abstract: Deep Metric Learning (DML) learns a non-linear semantic embedding from input data that brings similar pairs together while keeping dissimilar data away from each other. To this end, many different methods have been proposed in the last decade with promising results in various applications. The success of a DML algorithm greatly depends on its loss function. However, no loss function is perfect, and it deals only with some aspects of an optimal similarity embedding. Besides, the generalizability of the DML on unseen categories during the test stage is an important matter that is not considered by existing loss functions. To address these challenges, we propose novel approaches to combine different losses built on top of a shared deep feature extractor. The proposed ensemble of losses enforces the deep model to extract features that are consistent with all losses. Since the selected losses are diverse and each emphasizes different aspects of an optimal semantic embedding, our effective combining methods yield a considerable improvement over any individual loss and generalize well on unseen categories. Here, there is no limitation in choosing loss functions, and our methods can work with any set of existing ones. 
Besides, they can optimize each loss function as well as its weight in an end-to-end paradigm with no need to adjust any hyper-parameter. We evaluate our methods on some popular datasets from the machine vision domain in conventional Zero-Shot-Learning (ZSL) settings. The results are very encouraging and show that our methods outperform all baseline losses by a large margin in all datasets.

【10】 Decision-Making Technology for Autonomous Vehicles: Learning-Based Methods, Applications and Future Outlook

Authors: Qi Liu, Xueyuan Li, Shihua Yuan, Zirui Li Affiliations: Civil Engineering and Geosciences, Delft University of Technology, Stevinweg, CN Delft, The Netherlands Comments: 8 pages, 1 figure, 5 tables, ITSC 2021 (accepted) Link: https://arxiv.org/abs/2107.01110 Abstract: Autonomous vehicles have a great potential in the application of both civil and military fields, and have become the focus of research with the rapid development of science and economy. This article presents a brief review of learning-based decision-making technology for autonomous vehicles since it is significant for safer and efficient performance of autonomous vehicles. Firstly, the basic outline of decision-making technology is provided. Secondly, related works about learning-based decision-making methods for autonomous vehicles are mainly reviewed with the comparison to classical decision-making methods. In addition, applications of decision-making methods in existing autonomous vehicles are summarized. Finally, promising research topics in the future study of decision-making technology for autonomous vehicles are outlined.

【11】 Cooperative Training and Latent Space Data Augmentation for Robust Medical Image Segmentation

Authors: Chen Chen, Kerstin Hammernik, Cheng Ouyang, Chen Qin, Wenjia Bai, Daniel Rueckert Affiliations: BioMedIA Group, Department of Computing, Imperial College London, UK; Klinikum rechts der Isar, Technical University of Munich, Germany; Institute for Digital Communications, University of Edinburgh, UK; Data Science Institute, Imperial College London, UK Comments: MICCAI 2021 Link: https://arxiv.org/abs/2107.01079 Abstract: Deep learning-based segmentation methods are vulnerable to unforeseen data distribution shifts during deployment, e.g. change of image appearances or contrasts caused by different scanners, unexpected imaging artifacts etc. In this paper, we present a cooperative framework for training image segmentation models and a latent space augmentation method for generating hard examples. Both contributions improve model generalization and robustness with limited data. The cooperative training framework consists of a fast-thinking network (FTN) and a slow-thinking network (STN). The FTN learns decoupled image features and shape features for image reconstruction and segmentation tasks. The STN learns shape priors for segmentation correction and refinement. The two networks are trained in a cooperative manner. The latent space augmentation generates challenging examples for training by masking the decoupled latent space in both channel-wise and spatial-wise manners. We performed extensive experiments on public cardiac imaging datasets. 
Using only 10 subjects from a single site for training, we demonstrated improved cross-site segmentation performance and increased robustness against various unforeseen imaging artifacts compared to strong baseline methods. Particularly, cooperative training with latent space data augmentation yields 15% improvement in terms of average Dice score when compared to a standard training method.
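
The Dice score quoted in the abstract above is the standard overlap measure 2|A ∩ B| / (|A| + |B|) between a predicted and a reference mask. The sketch below is a minimal illustration on flat binary masks, not the paper's evaluation code; the function name `dice_score` and the choice to return 1.0 when both masks are empty are assumptions made here.

```python
def dice_score(pred, target):
    """Dice overlap between two binary masks given as flat sequences of 0/1.

    Returns 2 * |pred AND target| / (|pred| + |target|).
    Convention assumed here: two empty masks count as a perfect match (1.0).
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for a, b in zip(pred, target) if a and b)
    total = sum(1 for a in pred if a) + sum(1 for b in target if b)
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# One of the two predicted foreground pixels overlaps the single reference pixel.
print(dice_score([1, 1, 0, 0], [1, 0, 0, 0]))
```

An average Dice score, as reported in the abstract, would be the mean of this quantity over the evaluated structures or subjects.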

【12】 General Board Game Concepts

Authors: Éric Piette, Matthew Stephenson, Dennis J. N. J. Soemers, Cameron Browne Affiliations: Department of Data Science and Knowledge Engineering, Maastricht University, Maastricht, the Netherlands Link: https://arxiv.org/abs/2107.01078 Abstract: Many games often share common ideas or aspects between them, such as their rules, controls, or playing area. However, in the context of General Game Playing (GGP) for board games, this area remains under-explored. We propose to formalise the notion of "game concept", inspired by terms generally used by game players and designers. Through the Ludii General Game System, we describe concepts for several levels of abstraction, such as the game itself, the moves played, or the states reached. This new GGP feature associated with the ludeme representation of games opens many new lines of research. The creation of a hyper-agent selector, the transfer of AI learning between games, or explaining AI techniques using game terms, can all be facilitated by the use of game concepts. Other applications which can benefit from game concepts are also discussed, such as the generation of plausible reconstructed rules for incomplete ancient games, or the implementation of a board game recommender system.

【13】 Backward-Compatible Prediction Updates: A Probabilistic Approach

Authors: Frederik Träuble, Julius von Kügelgen, Matthäus Kleindessner, Francesco Locatello, Bernhard Schölkopf, Peter Gehler Affiliations: Amazon Tübingen, Germany; Max Planck Institute for Intelligent Systems, Tübingen, Germany; Department of Engineering, University of Cambridge, United Kingdom Link: https://arxiv.org/abs/2107.01057 Abstract: When machine learning systems meet real world applications, accuracy is only one of several requirements. In this paper, we assay a complementary perspective originating from the increasing availability of pre-trained and regularly improving state-of-the-art models. While new improved models develop at a fast pace, downstream tasks vary more slowly or stay constant. Assume that we have a large unlabelled data set for which we want to maintain accurate predictions. Whenever a new and presumably better ML model becomes available, we encounter two problems: (i) given a limited budget, which data points should be re-evaluated using the new model?; and (ii) if the new predictions differ from the current ones, should we update? Problem (i) is about compute cost, which matters for very large data sets and models. Problem (ii) is about maintaining consistency of the predictions, which can be highly relevant for downstream applications; our demand is to avoid negative flips, i.e., changing correct to incorrect predictions. In this paper, we formalize the Prediction Update Problem and present an efficient probabilistic approach as answer to the above questions. 
In extensive experiments on standard classification benchmark data sets, we show that our method outperforms alternative strategies along key metrics for backward-compatible prediction updates.
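
To make the notion of a negative flip concrete, the sketch below counts the fraction of examples whose prediction changes from correct to incorrect when an old model's outputs are replaced by a new model's. It is a minimal illustration of the definition given in the abstract, not the paper's probabilistic method; the function name `negative_flip_rate` is hypothetical, and ground-truth labels are assumed to be available for evaluation even though the deployed data set is unlabelled.

```python
def negative_flip_rate(labels, old_preds, new_preds):
    """Fraction of examples that the old model got right and the new model gets wrong."""
    if not (len(labels) == len(old_preds) == len(new_preds)):
        raise ValueError("all inputs must have the same length")
    if not labels:
        raise ValueError("at least one example is required")
    flips = sum(
        1
        for y, old, new in zip(labels, old_preds, new_preds)
        if old == y and new != y
    )
    return flips / len(labels)


labels = [0, 1, 1, 0]
old_preds = [0, 1, 0, 0]   # old model: 3 of 4 correct
new_preds = [0, 0, 1, 0]   # new model: also 3 of 4 correct, but on different examples
print(negative_flip_rate(labels, old_preds, new_preds))
```

The example shows why accuracy alone does not capture backward compatibility: both models have the same accuracy, yet blindly adopting the new predictions breaks one previously correct answer.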

【14】 The Optimal Size of an Epistemic Congress

Authors: Manon Revel, Tao Lin, Daniel Halpern Affiliations: MIT; Harvard University Link: https://arxiv.org/abs/2107.01042 Abstract: We analyze the optimal size of a congress in a representative democracy. We take an epistemic view where voters decide on a binary issue with one ground truth outcome, and each voter votes correctly according to their competence levels in $[0, 1]$. Assuming that we can sample the best experts to form an epistemic congress, we find that the optimal congress size should be linear in the population size. This result is striking because it holds even when allowing the top representatives to be accurate with arbitrarily high probabilities. We then analyze real world data, finding that the actual sizes of congresses are much smaller than the optimal size our theoretical results suggest. We conclude by analyzing under what conditions congresses of sub-optimal sizes would still outperform direct democracy, in which all voters vote.
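
The epistemic setting in the abstract above can be explored numerically. The sketch below computes the exact probability that a strict majority of a group votes correctly, assuming each member votes correctly and independently with probability equal to their competence. Independence, the strict-majority rule (ties count as failures) and the function name `majority_correct_probability` are assumptions made for illustration; this is not the paper's analysis.

```python
def majority_correct_probability(competences):
    """Probability that more than half of the voters vote correctly.

    competences[i] is the probability that voter i votes correctly;
    votes are assumed independent. Uses dynamic programming over the
    distribution of the number of correct votes.
    """
    n = len(competences)
    dist = [1.0]  # dist[k] = probability of exactly k correct votes so far
    for p in competences:
        updated = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            updated[k] += q * (1.0 - p)   # this voter is wrong
            updated[k + 1] += q * p       # this voter is right
        dist = updated
    return sum(q for k, q in enumerate(dist) if 2 * k > n)


print(majority_correct_probability([0.6, 0.6, 0.6]))
print(majority_correct_probability([0.9]))
print(majority_correct_probability([0.9, 0.6, 0.6]))
```

Under these assumptions, three voters of competence 0.6 reach a correct majority with probability 0.648, while adding two such voters to a single expert of competence 0.9 lowers the group's accuracy from 0.9 to 0.792. This is the kind of size-versus-competence trade-off the abstract studies.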

【15】 Feeling of Presence Maximization: mmWave-Enabled Virtual Reality Meets Deep Reinforcement Learning

Authors: Peng Yang, Tony Q. S. Quek, Jingxuan Chen, Chaoqun You, Xianbin Cao Affiliations: Singapore University of Technology and Design; School of Electronic and Information Engineering, Beihang University Link: https://arxiv.org/abs/2107.01001 Abstract: This paper investigates the problem of providing ultra-reliable and energy-efficient virtual reality (VR) experiences for wireless mobile users. To ensure reliable ultra-high-definition (UHD) video frame delivery to mobile users and enhance their immersive visual experiences, a coordinated multipoint (CoMP) transmission technique and millimeter wave (mmWave) communications are exploited. Owing to user movement and time-varying wireless channels, the wireless VR experience enhancement problem is formulated as a sequence-dependent and mixed-integer problem with a goal of maximizing users' feeling of presence (FoP) in the virtual world, subject to power consumption constraints on access points (APs) and users' head-mounted displays (HMDs). The problem, however, is hard to be directly solved due to the lack of users' accurate tracking information and the sequence-dependent and mixed-integer characteristics. To overcome this challenge, we develop a parallel echo state network (ESN) learning method to predict users' tracking information by training fresh and historical tracking samples separately collected by APs. With the learnt results, we propose a deep reinforcement learning (DRL) based optimization algorithm to solve the formulated problem. 
In this algorithm, we implement deep neural networks (DNNs) as a scalable solution to produce integer decision variables and solve a continuous power control problem to criticize the integer decision variables. Finally, the performance of the proposed algorithm is compared with various benchmark algorithms, and the impact of different design parameters is also discussed. Simulation results demonstrate that the proposed algorithm is 4.14% more energy-efficient than the benchmark algorithms.

【16】 Brain over Brawn -- Using a Stereo Camera to Detect, Track and Intercept a Faster UAV by Reconstructing Its Trajectory

Authors: Antonella Barišić, Frano Petric, Stjepan Bogdan Affiliations: Laboratory for Robotics and Intelligent Control Systems, University of Zagreb, Unska, Zagreb, Croatia Comments: To be published in Field Robotics. UAV-Eagle dataset available at: this https URL Link: https://arxiv.org/abs/2107.00962 Abstract: The work presented in this paper demonstrates our approach to intercepting a faster intruder UAV, inspired by the MBZIRC2020 Challenge 1. By leveraging the knowledge of the shape of the intruder's trajectory we are able to calculate the interception point. Target tracking is based on image processing by a YOLOv3 Tiny convolutional neural network, combined with depth calculation using a gimbal-mounted ZED Mini stereo camera. We use RGB and depth data from ZED Mini to extract the 3D position of the target, for which we devise a histogram-of-depth based processing to reduce noise. Obtained 3D measurements of target's position are used to calculate the position, the orientation and the size of a figure-eight shaped trajectory, which we approximate using lemniscate of Bernoulli. Once the approximation is deemed sufficiently precise, measured by Hausdorff distance between measurements and the approximation, an interception point is calculated to position the intercepting UAV right on the path of the target. The proposed method, which has been significantly improved based on the experience gathered during the MBZIRC competition, has been validated in simulation and through field experiments. 
The results confirmed that an efficient visual perception module, which extracts information related to the motion of the target UAV as a basis for the interception, has been developed. The system is able to track and intercept a target that is 30% faster than the interceptor in the majority of simulation experiments. Tests in the unstructured environment yielded 9 out of 12 successful results.
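
The two geometric ingredients named in the abstract above, the lemniscate of Bernoulli and the Hausdorff distance, can be sketched in a few lines. This is a simplified planar illustration under stated assumptions, not the authors' pipeline: the curve is centred at the origin with no 3D position or orientation, the point sets are finite samples, and the function names (`lemniscate_point`, `sample_lemniscate`, `hausdorff_distance`) are hypothetical.

```python
import math


def lemniscate_point(a, t):
    """Point on the lemniscate (x^2 + y^2)^2 = a^2 (x^2 - y^2) at parameter t."""
    denom = 1.0 + math.sin(t) ** 2
    return (a * math.cos(t) / denom, a * math.sin(t) * math.cos(t) / denom)


def sample_lemniscate(a, n):
    """n points on the curve, evenly spaced in the parameter t over [0, 2*pi)."""
    return [lemniscate_point(a, 2.0 * math.pi * k / n) for k in range(n)]


def hausdorff_distance(points_a, points_b):
    """Symmetric Hausdorff distance between two finite, non-empty point sets."""
    def directed(src, dst):
        # Largest distance from a point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)
    return max(directed(points_a, points_b), directed(points_b, points_a))


# Compare a candidate figure-eight with slightly shifted "measurements".
model = sample_lemniscate(10.0, 200)
measurements = [(x + 0.3, y - 0.2) for x, y in sample_lemniscate(10.0, 40)]
print(hausdorff_distance(measurements, model))
```

In the setting described by the abstract, a fit would be accepted once such a distance falls below a chosen threshold; the threshold value is not given in the digest, so none is assumed here.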

【17】 SocialAI: Benchmarking Socio-Cognitive Abilities in Deep Reinforcement Learning Agents

Authors: Grgur Kovač, Rémy Portelas, Katja Hofmann, Pierre-Yves Oudeyer Affiliations: Inria (FR); Microsoft Research (UK) Comments: Under review. arXiv admin note: substantial text overlap with arXiv:2104.13207 Link: https://arxiv.org/abs/2107.00956 Abstract: Building embodied autonomous agents capable of participating in social interactions with humans is one of the main challenges in AI. Within the Deep Reinforcement Learning (DRL) field, this objective motivated multiple works on embodied language use. However, current approaches focus on language as a communication tool in very simplified and non-diverse social situations: the "naturalness" of language is reduced to the concept of high vocabulary size and variability. In this paper, we argue that aiming towards human-level AI requires a broader set of key social skills: 1) language use in complex and variable social contexts; 2) beyond language, complex embodied communication in multimodal settings within constantly evolving social worlds. We explain how concepts from cognitive sciences could help AI to draw a roadmap towards human-like intelligence, with a focus on its social dimensions. As a first step, we propose to expand current research to a broader set of core social skills. To do this, we present SocialAI, a benchmark to assess the acquisition of social skills of DRL agents using multiple grid-world environments featuring other (scripted) social agents. 
We then study the limits of a recent SOTA DRL approach when tested on SocialAI and discuss important next steps towards proficient social agents. Videos and code are available at https://sites.google.com/view/socialai.

【18】 Embodiment and Computational Creativity

Authors: Christian Guckelsberger, Anna Kantosalo, Santiago Negrete-Yankelevich, Tapio Takala Affiliations: Finnish Center for Artificial Intelligence, Department of Computer Science, Aalto University, Espoo, Finland; School of Electronic Engineering and Computer Science, Queen Mary University of London, London, UK Comments: 10 pages, 1 Table, 1 Figure. Accepted as full paper at the International Conference on Computational Creativity (ICCC) 2021 Link: https://arxiv.org/abs/2107.00949 Abstract: We conjecture that creativity and the perception of creativity are, at least to some extent, shaped by embodiment. This makes embodiment highly relevant for Computational Creativity (CC) research, but existing research is scarce and the use of the concept highly ambiguous. We overcome this situation by means of a systematic review and a prescriptive analysis of publications at the International Conference on Computational Creativity. We adopt and extend an established typology of embodiment to resolve ambiguity through identifying and comparing different usages of the concept. We collect, contextualise and highlight opportunities and challenges in embracing embodiment in CC as a reference for research, and put forward important directions to further the embodied CC research programme.

【19】 A Novel Deep Reinforcement Learning Based Stock Direction Prediction using Knowledge Graph and Community Aware Sentiments

Authors: Anil Berk Altuner, Zeynep Hilal Kilimci Affiliations: Department of Information Systems Engineering, University of Kocaeli, Kocaeli Comments: 15 pages Link: https://arxiv.org/abs/2107.00931 Abstract: Stock market prediction has been an important topic for investors, researchers, and analysts. Because it is affected by too many factors, stock market prediction is a difficult task to handle. In this study, we propose a novel method that is based on deep reinforcement learning methodologies for the direction prediction of stocks using sentiments of community and knowledge graph. For this purpose, we firstly construct a social knowledge graph of users by analyzing relations between connections. After that, time series analysis of the related stock and sentiment analysis are blended with the deep reinforcement methodology. The Turkish version of Bidirectional Encoder Representations from Transformers (BerTurk) is employed to analyze the sentiments of the users, while the deep Q-learning methodology is used for the deep reinforcement learning side of the proposed model to construct the deep Q network. In order to demonstrate the effectiveness of the proposed model, Garanti Bank (GARAN), Akbank (AKBNK), and Türkiye İş Bankası (ISCTR) stocks on the Istanbul Stock Exchange are used as a case study. Experiment results show that the proposed novel model achieves remarkable results for the stock market prediction task.

【20】 Online Multi-Agent Forecasting with Interpretable Collaborative Graph Neural Network

Authors: Maosen Li, Siheng Chen, Yanning Shen, Genjia Liu, Ivor W. Tsang, Ya Zhang Comments: Submitted to IEEE-TNNLS SI-Deep Neural Networks for Graphs: Theory, Models, Algorithms and Applications Link: https://arxiv.org/abs/2107.00894 Abstract: This paper considers predicting future statuses of multiple agents in an online fashion by exploiting dynamic interactions in the system. We propose a novel collaborative prediction unit (CoPU), which aggregates the predictions from multiple collaborative predictors according to a collaborative graph. Each collaborative predictor is trained to predict the status of an agent by considering the impact of another agent. The edge weights of the collaborative graph reflect the importance of each predictor. The collaborative graph is adjusted online by multiplicative update, which can be motivated by minimizing an explicit objective. With this objective, we also conduct regret analysis to indicate that, along with training, our CoPU achieves performance similar to that of the best individual collaborative predictor in hindsight. This theoretical interpretability distinguishes our method from many other graph networks. To progressively refine predictions, multiple CoPUs are stacked to form a collaborative graph neural network. Extensive experiments are conducted on three tasks: online simulated trajectory prediction, online human motion prediction and online traffic speed prediction, and our methods outperform state-of-the-art works on the three tasks by 28.6%, 17.4% and 21.0% on average, respectively.
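
The multiplicative update mentioned in the abstract above can be illustrated with a generic Hedge-style (multiplicative weights) step. This is only a minimal sketch: the abstract does not give the paper's exact update rule, so the exponential form, the learning rate `eta`, the function names, and the toy loss stream are all assumptions made for illustration.

```python
import math


def multiplicative_update(weights, losses, eta=0.5):
    """One Hedge-style step (assumed form): scale each weight by exp(-eta * loss), then renormalize."""
    scaled = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


def aggregate(weights, predictions):
    """Weighted combination of the individual predictors' outputs."""
    return sum(w * p for w, p in zip(weights, predictions))


# Hypothetical stream: predictor 0 is consistently the most accurate.
weights = [1 / 3, 1 / 3, 1 / 3]
for _ in range(20):
    losses = [0.1, 0.5, 0.9]  # made-up per-round losses
    weights = multiplicative_update(weights, losses)

combined = aggregate(weights, [1.0, 2.0, 3.0])
```

Under this kind of update the weight mass concentrates on the predictor with the lowest cumulative loss, which is the behaviour that regret bounds against "the best predictor in hindsight" typically formalize.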

【21】 On-Demand and Lightweight Knowledge Graph Generation -- a Demonstration with DBpedia

Authors: Malte Brockmeier, Yawen Liu, Sunita Pateer, Sven Hertling, Heiko Paulheim Affiliations: Data and Web Science Group, University of Mannheim, Germany Comments: Accepted at Semantics 2021 Link: https://arxiv.org/abs/2107.00873 Abstract: Modern large-scale knowledge graphs, such as DBpedia, are datasets which require large computational resources to serve and process. Moreover, they often have longer release cycles, which leads to outdated information in those graphs. In this paper, we present DBpedia on Demand -- a system which serves DBpedia resources on demand without the need to materialize and store the entire graph, and which even provides limited querying functionality.

【22】 Learning Primal Heuristics for Mixed Integer Programs

Authors: Yunzhuang Shen, Yuan Sun, Andrew Eberhard, Xiaodong Li Affiliations: Computing Technologies, RMIT University, Melbourne, Australia, School of Mathematics, Monash University, School of Science Comments: Accepted by IJCNN'21 Link: https://arxiv.org/abs/2107.00866 Abstract: This paper proposes a novel primal heuristic for Mixed Integer Programs, by employing machine learning techniques. Mixed Integer Programming is a general technique for formulating combinatorial optimization problems. Inside a solver, primal heuristics play a critical role in finding good feasible solutions that enable one to tighten the duality gap from the outset of the Branch-and-Bound algorithm (B&B), greatly improving its performance by pruning the B&B tree aggressively. In this paper, we investigate whether effective primal heuristics can be automatically learned via machine learning. We propose a new method to represent an optimization problem as a graph, and train a Graph Convolutional Network on solved problem instances with known optimal solutions. This in turn can predict the values of decision variables in the optimal solution for an unseen problem instance of a similar type. The prediction of variable solutions is then leveraged by a novel configuration of the B&B method, Probabilistic Branching with guided Depth-first Search (PB-DFS) approach, aiming to find (near-)optimal solutions quickly. The experimental results show that this new heuristic can find better primal solutions at a much earlier stage of the solving process, compared to other state-of-the-art primal heuristics.

【23】 User Role Discovery and Optimization Method based on K-means + Reinforcement learning in Mobile Applications

Authors: Yuanbang Li Affiliations: Zhoukou Normal University, Zhoukou, Henan Province, China Link: https://arxiv.org/abs/2107.00862 Abstract: With the widespread use of mobile phones, users can share their location and activity anytime, anywhere, in the form of check-in data. These data reflect user features. Features that are stable over the long term and shared by a set of users can be abstracted as user roles. A role is closely related to the user's social background, occupation, and living habits. This study provides four main contributions. Firstly, user feature models from different views are constructed for each user from the analysis of check-in data. Secondly, the K-means algorithm is used to discover user roles from user features. Thirdly, a reinforcement learning algorithm is proposed to strengthen the clustering effect of user roles and improve the stability of the clustering result. Finally, experiments are used to verify the validity of the method, and the results show its effectiveness.

【24】 An Experience Report on Machine Learning Reproducibility: Guidance for Practitioners and TensorFlow Model Garden Contributors

Authors: Vishnu Banna, Akhil Chinnakotla, Zhengxin Yan, Ani Vegesana, Naveen Vivek, Kruthi Krishnappa, Wenxin Jiang, Yung-Hsiang Lu, George K. Thiruvathukal, James C. Davis Affiliations: Department of Electrical & Computer Engineering, Purdue University, Department of Computer Science, Loyola University Chicago Comments: Technical Report Link: https://arxiv.org/abs/2107.00821 Abstract: Machine learning techniques are becoming a fundamental tool for scientific and engineering progress. These techniques are applied in contexts as diverse as astronomy and spam filtering. However, correctly applying these techniques requires careful engineering. Much attention has been paid to the technical potential; relatively little attention has been paid to the software engineering process required to bring research-based machine learning techniques into practical utility. Technology companies have supported the engineering community through machine learning frameworks such as TensorFlow and PyTorch, but the details of how to engineer complex machine learning models in these frameworks have remained hidden. To promote best practices within the engineering community, academic institutions and Google have partnered to launch a Special Interest Group on Machine Learning Models (SIGMODELS) whose goal is to develop exemplary implementations of prominent machine learning models in community locations such as the TensorFlow Model Garden (TFMG). The purpose of this report is to define a process for reproducing a state-of-the-art machine learning model at a level of quality suitable for inclusion in the TFMG. We define the engineering process and elaborate on each step, from paper analysis to model release. We report on our experiences implementing the YOLO model family with a team of 26 student researchers, share the tools we developed, and describe the lessons we learned along the way.

【25】 The Causal Neural Connection: Expressiveness, Learnability, and Inference

Authors: Kevin Xia, Kai-Zhan Lee, Yoshua Bengio, Elias Bareinboim Affiliations: Columbia University, MILA, Université de Montréal Comments: 10 pages main body (53 total pages with references and appendix), 5 figures in main body (20 total figures including appendix) Link: https://arxiv.org/abs/2107.00793 Abstract: One of the central elements of any causal inference is an object called a structural causal model (SCM), which represents a collection of mechanisms and exogenous sources of random variation of the system under investigation (Pearl, 2000). An important property of many kinds of neural networks is universal approximability: the ability to approximate any function to arbitrary precision. Given this property, one may be tempted to surmise that a collection of neural nets is capable of learning any SCM by training on data generated by that SCM. In this paper, we show this is not the case by disentangling the notions of expressivity and learnability. Specifically, we show that the causal hierarchy theorem (Thm. 1, Bareinboim et al., 2020), which describes the limits of what can be learned from data, still holds for neural models. For instance, an arbitrarily complex and expressive neural net is unable to predict the effects of interventions given observational data alone. Given this result, we introduce a special type of SCM called a neural causal model (NCM), and formalize a new type of inductive bias to encode structural constraints necessary for performing causal inferences. Building on this new class of models, we focus on solving two canonical tasks found in the literature known as causal identification and estimation. Leveraging the neural toolbox, we develop an algorithm that is both sufficient and necessary to determine whether a causal effect can be learned from data (i.e., causal identifiability); it then estimates the effect whenever identifiability holds (causal estimation). Simulations corroborate the proposed approach.

【26】 On Bridging Generic and Personalized Federated Learning

Authors: Hong-You Chen, Wei-Lun Chao Affiliations: The Ohio State University, USA Link: https://arxiv.org/abs/2107.00778 Abstract: Federated learning is promising for its ability to collaboratively train models with multiple clients without accessing their data, but vulnerable when clients' data distributions diverge from each other. This divergence further leads to a dilemma: "Should we prioritize the learned model's generic performance (for future use at the server) or its personalized performance (for each client)?" These two seemingly competing goals have divided the community to focus on one or the other, yet in this paper we show that it is possible to approach both at the same time. Concretely, we propose a novel federated learning framework that explicitly decouples a model's dual duties with two prediction tasks. On the one hand, we introduce a family of losses that are robust to non-identical class distributions, enabling clients to train a generic predictor with a consistent objective across them. On the other hand, we formulate the personalized predictor as a lightweight adaptive module that is learned to minimize each client's empirical risk on top of the generic predictor. With this two-loss, two-predictor framework, which we name Federated Robust Decoupling (Fed-RoD), the learned model can simultaneously achieve state-of-the-art generic and personalized performance, essentially bridging the two tasks.

【27】 Autonomous Navigation for Quadrupedal Robots with Optimized Jumping through Constrained Obstacles

Authors: Scott Gilroy, Derek Lau, Lizhi Yang, Ed Izaguirre, Kristen Biermayer, Anxing Xiao, Mengti Sun, Ayush Agrawal, Jun Zeng, Zhongyu Li, Koushil Sreenath Comments: Accepted to 2021 IEEE 17th International Conference on Automation Science and Engineering (CASE 2021) Link: https://arxiv.org/abs/2107.00773 Abstract: Quadrupeds are strong candidates for navigating challenging environments because of their agile and dynamic designs. This paper presents a methodology that extends the range of exploration for quadrupedal robots by creating an end-to-end navigation framework that exploits walking and jumping modes. To obtain a dynamic jumping maneuver while avoiding obstacles, dynamically-feasible trajectories are optimized offline through collocation-based optimization where safety constraints are imposed. Such an optimization scheme allows the robot to jump through window-shaped obstacles by considering both obstacles in the air and on the ground. The resulting jumping mode is utilized in an autonomous navigation pipeline that leverages a search-based global planner and a local planner to enable the robot to reach the goal location by walking. A state machine together with a decision making strategy allows the system to switch behaviors between walking around obstacles or jumping through them. The proposed framework is experimentally deployed and validated on a quadrupedal robot, a Mini Cheetah, to enable the robot to autonomously navigate through an environment while avoiding obstacles and jumping over a maximum height of 13 cm to pass through a window-shaped opening in order to reach its goal.

【28】 Proof of the impossibility of probabilistic induction

Authors: Vaden Masrani Link: https://arxiv.org/abs/2107.00749 Abstract: In this short note I restate and simplify the proof of the impossibility of probabilistic induction from Popper (1992). Other proofs are possible (cf. Popper (1985)).

【29】 q-Paths: Generalizing the Geometric Annealing Path using Power Means

Authors: Vaden Masrani, Rob Brekelmans, Thang Bui, Frank Nielsen, Aram Galstyan, Greg Ver Steeg, Frank Wood Affiliations: University of British Columbia, USC Information Sciences Institute, University of Sydney, Sony CSL, MILA Comments: arXiv admin note: text overlap with arXiv:2012.07823 Link: https://arxiv.org/abs/2107.00745 Abstract: Many common machine learning methods involve the geometric annealing path, a sequence of intermediate densities between two distributions of interest constructed using the geometric average. While alternatives such as the moment-averaging path have demonstrated performance gains in some settings, their practical applicability remains limited by exponential family endpoint assumptions and a lack of closed-form energy function. In this work, we introduce $q$-paths, a family of paths which is derived from a generalized notion of the mean, includes the geometric and arithmetic mixtures as special cases, and admits a simple closed form involving the deformed logarithm function from nonextensive thermodynamics. Following previous analysis of the geometric path, we interpret our $q$-paths as corresponding to a $q$-exponential family of distributions, and provide a variational representation of intermediate densities as minimizing a mixture of $\alpha$-divergences to the endpoints. We show that small deviations away from the geometric path yield empirical gains for Bayesian inference using Sequential Monte Carlo and generative model evaluation using Annealed Importance Sampling.
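
To make the "power mean" idea in the abstract above concrete, here is a small sketch of an unnormalized intermediate density built as a power mean of two endpoint density values. The parameterization (exponent `1 - q`, mixing weight `beta`) and the function name `q_path` are assumptions chosen so that `q = 0` gives the arithmetic mixture and `q -> 1` recovers the geometric path, matching the special cases the abstract names; the paper itself should be consulted for the exact definition.

```python
import math


def q_path(p0, p1, beta, q):
    """Unnormalized intermediate density value: power mean of p0 and p1 with exponent 1 - q.

    Assumed parameterization: q = 0 is the arithmetic mixture, q -> 1 the geometric mean.
    """
    r = 1.0 - q
    if abs(r) < 1e-12:
        # Limit q -> 1: weighted geometric mean of the endpoint values.
        return math.exp((1.0 - beta) * math.log(p0) + beta * math.log(p1))
    return ((1.0 - beta) * p0 ** r + beta * p1 ** r) ** (1.0 / r)


# Endpoint density values at a single point z (made-up numbers).
arithmetic = q_path(0.2, 0.8, beta=0.5, q=0.0)
geometric = q_path(0.2, 0.8, beta=0.5, q=1.0)
near_geometric = q_path(0.2, 0.8, beta=0.5, q=0.999)
```

Values of `q` slightly below 1 give paths that sit close to the geometric one, which is the regime the abstract reports as yielding empirical gains.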

【30】 Neural Task Success Classifiers for Robotic Manipulation from Few Real Demonstrations

Authors: Abdalkarim Mohtasib, Amir Ghalamzan E., Nicola Bellotto, Heriberto Cuayáhuitl Affiliations: School of Computer Science, University of Lincoln, Lincoln, UK, Lincoln Institute for Agri-Food Technology Comments: 8 pages Link: https://arxiv.org/abs/2107.00722 Abstract: Robots learning a new manipulation task from a small number of demonstrations are increasingly demanded in different workspaces. A classifier model assessing the quality of actions can predict the successful completion of a task, which can be used by intelligent agents for action-selection. This paper presents a novel classifier that learns to classify task completion only from a few demonstrations. We carry out a comprehensive comparison of different neural classifiers, e.g. fully connected-based, fully convolutional-based, sequence2sequence-based, and domain adaptation-based classification. We also present a new dataset including five robot manipulation tasks, which is publicly available. We compared the performances of our novel classifier and the existing models using our dataset and the MIME dataset. The results suggest domain adaptation and timing-based features improve success prediction. Our novel model, i.e. fully convolutional neural network with domain adaptation and timing features, achieves an average classification accuracy of 97.3% and 95.5% across tasks in both datasets whereas state-of-the-art classifiers without domain adaptation and timing-features only achieve 82.4% and 90.3%, respectively.

【31】 Long-Short Ensemble Network for Bipolar Manic-Euthymic State Recognition Based on Wrist-worn Sensors

Authors: Ulysse Côté-Allard, Petter Jakobsen, Andrea Stautland, Tine Nordgreen, Ole Bernt Fasmer, Ketil Joachim Oedegaard, Jim Torresen Affiliations: Department of Informatics, University of Oslo, Oslo, Norway, NORMENT, Division of Psychiatry, Haukeland University Hospital, Bergen, Norway, Department of Clinical Medicine, University of Bergen, Norway Comments: Submitted for peer-review. 11 pages + 3. 2 Figures and 1 table Link: https://arxiv.org/abs/2107.00710 Abstract: Manic episodes of bipolar disorder can lead to uncritical behaviour and delusional psychosis, often with destructive consequences for those affected and their surroundings. Early detection and intervention of a manic episode are crucial to prevent escalation, hospital admission and premature death. However, people with bipolar disorder may not recognize that they are experiencing a manic episode and symptoms such as euphoria and increased productivity can also deter affected individuals from seeking help. This work proposes to perform user-independent, automatic mood-state detection based on actigraphy and electrodermal activity acquired from a wrist-worn device during mania and after recovery (euthymia). This paper proposes a new deep learning-based ensemble method leveraging long (20h) and short (5 minutes) time-intervals to discriminate between the mood-states. When tested on 47 bipolar patients, the proposed classification scheme achieves an average accuracy of 91.59% in euthymic/manic mood-state recognition.

【32】 Distilling Reinforcement Learning Tricks for Video Games

Authors: Anssi Kanervisto, Christian Scheller, Yanick Schraner, Ville Hautamäki Affiliations: School of Computing, University of Eastern Finland, Joensuu, Finland, Institute for Data Science, University of Applied Sciences, Northwestern Switzerland, Windisch, Switzerland Comments: To appear in IEEE Conference on Games 2021. Experiment code is available at this https URL Link: https://arxiv.org/abs/2107.00703 Abstract: Reinforcement learning (RL) research focuses on general solutions that can be applied across different domains. This results in methods that RL practitioners can use in almost any domain. However, recent studies often lack the engineering steps ("tricks") which may be needed to effectively use RL, such as reward shaping, curriculum learning, and splitting a large task into smaller chunks. Such tricks are common, if not necessary, to achieve state-of-the-art results and win RL competitions. To ease the engineering efforts, we distill descriptions of tricks from state-of-the-art results and study how well these tricks can improve a standard deep Q-learning agent. The long-term goal of this work is to enable combining proven RL methods with domain-specific tricks by providing a unified software framework and accompanying insights in multiple domains.

【33】 Deep Semantic Segmentation at the Edge for Autonomous Navigation in Vineyard Rows

Authors: Diego Aghi, Simone Cerrato, Vittorio Mazzia, Marcello Chiaberge Comments: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2021) Link: https://arxiv.org/abs/2107.00700 Abstract: Precision agriculture is a fast-growing field that aims at introducing affordable and effective automation into agricultural processes. Nowadays, algorithmic solutions for navigation in vineyards require expensive sensors and high computational workloads that preclude large-scale applicability of autonomous robotic platforms in real business case scenarios. From this perspective, our novel proposed control leverages the latest advancement in machine perception and edge AI techniques to achieve highly affordable and reliable navigation inside vineyard rows with low computational and power consumption. Indeed, using a custom-trained segmentation network and a low-range RGB-D camera, we are able to take advantage of the semantic information of the environment to produce smooth trajectories and stable control in different vineyard scenarios. Moreover, the segmentation maps generated by the control algorithm itself could be directly exploited as filters for a vegetative assessment of the crop status. Extensive experiments and evaluations against real-world data and simulated environments demonstrated the effectiveness and intrinsic robustness of our methodology.

【34】 Active Learning of Abstract Plan Feasibility

Authors: Michael Noseworthy, Caris Moses, Isaiah Brand, Sebastian Castro, Leslie Kaelbling, Tomás Lozano-Pérez, Nicholas Roy Affiliations: MIT, CSAIL Comments: To appear in Robotics: Science and Systems 2021 Link: https://arxiv.org/abs/2107.00683 Abstract: Long horizon sequential manipulation tasks are effectively addressed hierarchically: at a high level of abstraction the planner searches over abstract action sequences, and when a plan is found, lower level motion plans are generated. Such a strategy hinges on the ability to reliably predict that a feasible low level plan will be found which satisfies the abstract plan. However, computing Abstract Plan Feasibility (APF) is difficult because the outcome of a plan depends on real-world phenomena that are difficult to model, such as noise in estimation and execution. In this work, we present an active learning approach to efficiently acquire an APF predictor through task-independent, curious exploration on a robot. The robot identifies plans whose outcomes would be informative about APF, executes those plans, and learns from their successes or failures. Critically, we leverage an infeasible subsequence property to prune candidate plans in the active learning strategy, allowing our system to learn from less data. We evaluate our strategy in simulation and on a real Franka Emika Panda robot with integrated perception, experimentation, planning, and execution. In a stacking domain where objects have non-uniform mass distributions, we show that our system permits real robot learning of an APF model in four hundred self-supervised interactions, and that our learned model can be used effectively in multiple downstream tasks.
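
The pruning idea in the abstract above can be sketched as follows. The abstract does not define the infeasible subsequence property precisely, so this example assumes one plausible reading: any plan that contains a contiguous action sequence already known to be infeasible is itself infeasible and need not be executed. The function names, the plan encoding as tuples of block labels, and the toy data are all hypothetical.

```python
def contains_subsequence(plan, sub):
    """True if `sub` occurs as a contiguous run inside `plan`."""
    n, m = len(plan), len(sub)
    return any(tuple(plan[i:i + m]) == tuple(sub) for i in range(n - m + 1))


def prune(candidates, known_infeasible):
    """Drop candidate plans that contain any sequence already observed to fail (assumed property)."""
    return [
        plan for plan in candidates
        if not any(contains_subsequence(plan, bad) for bad in known_infeasible)
    ]


# Hypothetical stacking plans: each tuple lists blocks in placement order.
known_infeasible = [("B", "A")]  # suppose placing A directly after B was observed to fail
candidates = [
    ("A", "B", "C"),
    ("B", "A", "C"),
    ("C", "B", "A"),
    ("A", "C", "B"),
]
remaining = prune(candidates, known_infeasible)
```

Only the surviving candidates would then be scored for informativeness and executed, which is how such a property can reduce the number of real-robot interactions needed.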

【35】 Simpler, Faster, Stronger: Breaking The log-K Curse On Contrastive Learners With FlatNCE

Authors: Junya Chen, Zhe Gan, Xuan Li, Qing Guo, Liqun Chen, Shuyang Gao, Tagyoung Chung, Yi Xu, Belinda Zeng, Wenlian Lu, Fan Li, Lawrence Carin, Chenyang Tao Affiliations: Duke University, Microsoft, Virginia Tech, Amazon, Fudan University, KAUST Link: https://arxiv.org/abs/2107.01152 Abstract: InfoNCE-based contrastive representation learners, such as SimCLR, have been tremendously successful in recent years. However, these contrastive schemes are notoriously resource demanding, as their effectiveness breaks down with small-batch training (i.e., the log-K curse, where K is the batch size). In this work, we reveal mathematically why contrastive learners fail in the small-batch-size regime, and present a novel simple, non-trivial contrastive objective named FlatNCE, which fixes this issue. Unlike InfoNCE, our FlatNCE no longer explicitly appeals to a discriminative classification goal for contrastive learning. Theoretically, we show FlatNCE is the mathematical dual formulation of InfoNCE, thus bridging the classical literature on energy modeling; and empirically, we demonstrate that, with minimal modification of code, FlatNCE enables immediate performance boost independent of the subject-matter engineering efforts. The significance of this work is furthered by the powerful generalization of contrastive learning techniques, and the introduction of new tools to monitor and diagnose contrastive training. We substantiate our claims with empirical evidence on CIFAR10, ImageNet, and other datasets, where FlatNCE consistently outperforms InfoNCE.
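
The "log-K curse" mentioned in the abstract above refers to a well-known ceiling of the InfoNCE estimator: with a batch of K pairs, the estimate cannot exceed log K however good the critic is. The sketch below illustrates that ceiling for standard InfoNCE only; it is not an implementation of FlatNCE, and the helper names and the idealized critic scores are assumptions made for the demonstration.

```python
import math


def info_nce_estimate(scores):
    """InfoNCE mutual-information estimate from a K x K critic score matrix.

    scores[i][j] is the critic value for (x_i, y_j); diagonal entries are the positive pairs.
    """
    k = len(scores)
    total = 0.0
    for i, row in enumerate(scores):
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in row))
        total += row[i] - log_sum_exp + math.log(k)
    return total / k


def ideal_scores(k, margin):
    """Hypothetical near-perfect critic: positives score `margin`, negatives score 0."""
    return [[margin if i == j else 0.0 for j in range(k)] for i in range(k)]


est_small = info_nce_estimate(ideal_scores(8, 50.0))
est_large = info_nce_estimate(ideal_scores(512, 50.0))
```

Even with an essentially perfect critic the estimate saturates at log K, so a small batch caps how much information the objective can register; this is the limitation the abstract says FlatNCE is designed to remove.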

【36】 Tight Mutual Information Estimation With Contrastive Fenchel-Legendre Optimization

Authors: Qing Guo, Junya Chen, Dong Wang, Yuewei Yang, Xinwei Deng, Lawrence Carin, Fan Li, Chenyang Tao Affiliations: Duke University, Virginia Tech, KAUST Link: https://arxiv.org/abs/2107.01131 Abstract: Successful applications of InfoNCE and its variants have popularized the use of contrastive variational mutual information (MI) estimators in machine learning. While featuring superior stability, these estimators crucially depend on costly large-batch training, and they sacrifice bound tightness for variance reduction. To overcome these limitations, we revisit the mathematics of popular variational MI bounds from the lens of unnormalized statistical modeling and convex optimization. Our investigation not only yields a new unified theoretical framework encompassing popular variational MI bounds but also leads to a novel, simple, and powerful contrastive MI estimator named FLO. Theoretically, we show that the FLO estimator is tight, and it provably converges under stochastic gradient descent. Empirically, our FLO estimator overcomes the limitations of its predecessors and learns more efficiently. The utility of FLO is verified using an extensive set of benchmarks, which also reveals the trade-offs in practical MI estimation.

【37】 Molecular structure prediction based on graph convolutional networks

Authors: Xiaohui Lin, Yongquan Jiang, Yan Yang Affiliations: School of Computing and Artificial Intelligence, Southwest Jiaotong University; Institute of Artificial Intelligence, Southwest Jiaotong University Comments: 11 pages, 5 figures, 4 tables Link: https://arxiv.org/abs/2107.01035 Abstract: Molecular structure has important applications in many fields, but determining it by experimental means or with traditional density functional theory is often time consuming. In view of this, a new Model Structure based on Graph Convolutional Neural network (MSGCN) is proposed, which determines the molecular structure by predicting the distance between two atoms. To verify the effectiveness of the MSGCN model, it is compared with the method for calculating three-dimensional molecular conformations in RDKit, and MSGCN gives better results. In addition, the distances predicted by the MSGCN model and the distances calculated from the QM9 dataset were both used to predict molecular properties, thus demonstrating the effectiveness of the distances predicted by the MSGCN model.

【38】 MegazordNet: combining statistical and machine learning standpoints for time series forecasting

Authors: Angelo Garangau Menezes, Saulo Martiello Mastelini Affiliations: Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, Av. Trabalhador São Carlense, São Carlos, SP, Brasil Link: https://arxiv.org/abs/2107.01017 Abstract: Forecasting financial time series is considered to be a difficult task due to the chaotic nature of the series. Statistical approaches have shown solid results in some specific problems such as predicting market direction and the single price of stocks; however, with the recent advances in deep learning and big data techniques, promising new options have arisen to tackle financial time series forecasting. Moreover, recent literature has shown that employing a combination of statistics and machine learning may improve forecast accuracy in comparison to single solutions. Taking these aspects into consideration, in this work we propose MegazordNet, a framework that explores statistical features within a financial series combined with a structured deep learning model for time series forecasting. We evaluated our approach by predicting the closing price of stocks in the S&P 500 using different metrics, and we were able to beat single statistical and machine learning methods.

【39】 EMG-Based Feature Extraction and Classification for Prosthetic Hand Control

Authors: Reza Bagherian Azhiri, Mohammad Esmaeili, Mehrdad Nourani Affiliations: Department of Mechanical Engineering, University of Texas at Dallas, Richardson, Texas, USA; Department of Electrical and Computer Engineering, University of Texas at Dallas, Richardson, Texas, USA Link: https://arxiv.org/abs/2107.00733 Abstract: In recent years, real-time control of prosthetic hands has gained a great deal of attention. In particular, real-time analysis of Electromyography (EMG) signals poses several challenges in achieving acceptable accuracy and execution delay. In this paper, we address some of these challenges by improving the accuracy at a shorter signal length. We first introduce a set of new feature extraction functions applied at each level of wavelet decomposition. Then, we propose a postprocessing approach to process the neural network outputs. The experimental results illustrate that the proposed method enhances the accuracy of real-time classification of EMG signals up to $95.5\%$ for a signal length of $800$ msec. The proposed postprocessing method achieves higher consistency compared with conventional majority voting and Bayesian fusion methods.
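Editor's note: the abstract compares its postprocessing against conventional majority voting. As a rough illustration of that baseline only (not the paper's proposed method, whose details are not given in the abstract), a sliding-window majority vote over a stream of classifier outputs might look like the sketch below. The function name `majority_vote_stream`, the window size, and the gesture labels are all hypothetical.

```python
from collections import Counter, deque


def majority_vote_stream(predictions, window=5):
    """Smooth a stream of class labels with a sliding-window majority vote.

    For each new prediction, the output is the most frequent label among
    the last `window` predictions (fewer at the start of the stream).
    """
    recent = deque(maxlen=window)  # automatically drops the oldest label
    smoothed = []
    for label in predictions:
        recent.append(label)
        # most_common(1) returns a list holding the (label, count) pair
        # with the highest count in the current window.
        smoothed.append(Counter(recent).most_common(1)[0][0])
    return smoothed


if __name__ == "__main__":
    # Hypothetical per-window outputs of a gesture classifier,
    # with one spurious "grip" in the middle of a "rest" segment.
    raw = ["rest", "rest", "grip", "rest", "rest"]
    print(majority_vote_stream(raw, window=3))
```

A larger window suppresses more isolated misclassifications but delays the response to a genuine change of gesture, which is the accuracy versus execution-delay trade-off the abstract refers to.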

This article is part of the Tencent Cloud self-media syndication program and is shared from a WeChat official account.

Originally published: 2021-07-05. In case of copyright infringement, contact cloudcommunity@tencent.com to request removal.
