cs.AI (Artificial Intelligence), 75 papers in total
【1】 What Makes for Hierarchical Vision Transformer?
Authors: Yuxin Fang, Xinggang Wang, Rui Wu, Jianwei Niu, Wenyu Liu
Affiliations: School of EIC, Huazhong University of Science & Technology, Horizon Robotics
Comments: Preprint. Work in progress
Link: https://arxiv.org/abs/2107.02174
Abstract: Recent studies show that hierarchical Vision Transformer with interleaved non-overlapped intra window self-attention and shifted window self-attention is able to achieve state-of-the-art performance in various visual recognition tasks and challenges CNN's dense sliding window paradigm. Most follow-up works try to replace shifted window operation with other kinds of cross window communication while treating self-attention as the de-facto standard for intra window information aggregation. In this short preprint, we question whether self-attention is the only choice for hierarchical Vision Transformer to attain strong performance, and what makes for hierarchical Vision Transformer? We replace self-attention layers in Swin Transformer and Shuffle Transformer with simple linear mapping and keep other components unchanged. The resulting architecture with 25.4M parameters and 4.2G FLOPs achieves 80.5% Top-1 accuracy, compared to 81.3% for Swin Transformer with 28.3M parameters and 4.5G FLOPs. We also experiment with other alternatives to self-attention for context aggregation inside each non-overlapped window, which all give similar competitive results under the same architecture. Our study reveals that the macro architecture of Swin model families (i.e., interleaved intra window and cross window communications), other than specific aggregation layers or specific means of cross window communication, may be more responsible for its strong performance and is the real challenger to CNN's dense sliding window paradigm.
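The idea of swapping window self-attention for a linear mapping can be illustrated with a toy sketch. The abstract does not spell out the exact layer, so this assumes a fixed token-mixing matrix applied inside each non-overlapping window of a 1-D token sequence; `window_partition`, `linear_token_mixing` and the `mix` matrix are hypothetical names for illustration, not the authors' implementation.

```python
def window_partition(tokens, window):
    """Split a token sequence into non-overlapping windows of equal size."""
    assert len(tokens) % window == 0, "sequence length must be a multiple of the window size"
    return [tokens[i:i + window] for i in range(0, len(tokens), window)]


def linear_token_mixing(tokens, window, mix):
    """Aggregate context inside each window with a linear map instead of attention.

    tokens: list of feature vectors (lists of floats)
    mix:    window x window matrix; unlike attention weights it does not depend
            on the input (assumption: a learned parameter in a real model)
    """
    out = []
    for win in window_partition(tokens, window):
        dim = len(win[0])
        for i in range(window):
            # Output token i is a fixed weighted sum of the tokens in its window.
            out.append([sum(mix[i][j] * win[j][d] for j in range(window))
                        for d in range(dim)])
    return out


tokens = [[1.0], [2.0], [3.0], [4.0]]
averaging = [[0.5, 0.5], [0.5, 0.5]]
mixed = linear_token_mixing(tokens, window=2, mix=averaging)
```

Information never crosses a window boundary here; in the architectures the abstract describes, that job is left to the separate cross-window communication step.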
【2】 On Model Calibration for Long-Tailed Object Detection and Instance Segmentation
Authors: Tai-Yu Pan, Cheng Zhang, Yandong Li, Hexiang Hu, Dong Xuan, Soravit Changpinyo, Boqing Gong, Wei-Lun Chao
Affiliations: The Ohio State University, Google Research, University of Southern California
Link: https://arxiv.org/abs/2107.02170
Abstract: Vanilla models for object detection and instance segmentation suffer from a heavy bias toward detecting frequent objects in the long-tailed setting. Existing methods address this issue mostly during training, e.g., by re-sampling or re-weighting. In this paper, we investigate a largely overlooked approach -- post-processing calibration of confidence scores. We propose NorCal, Normalized Calibration for long-tailed object detection and instance segmentation, a simple and straightforward recipe that reweighs the predicted scores of each class by its training sample size. We show that separately handling the background class and normalizing the scores over classes for each proposal are keys to achieving superior performance. On the LVIS dataset, NorCal can effectively improve nearly all the baseline models not only on rare classes but also on common and frequent classes. Finally, we conduct extensive analysis and ablation studies to offer insights into various modeling choices and mechanisms of our approach.
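The recipe in the abstract (reweigh each class score by its training sample size, leave the background class alone, renormalize per proposal) might look roughly like the sketch below. The exact functional form is an assumption: dividing each foreground exp-logit by `count ** gamma` is one plausible reading, and `gamma` is a hypothetical strength hyperparameter that the abstract does not mention.

```python
import math


def norcal(logits, bg_logit, class_counts, gamma=0.6):
    """Post-hoc calibration of one proposal's class scores (illustrative only).

    logits:       foreground class logits for a single proposal
    bg_logit:     background logit, handled separately and not reweighted
    class_counts: number of training samples per foreground class
    gamma:        assumed hyperparameter; gamma = 0 reduces to a plain softmax
    """
    shift = max(max(logits), bg_logit)  # subtract the max for numerical stability
    fg = [math.exp(z - shift) / (n ** gamma) for z, n in zip(logits, class_counts)]
    bg = math.exp(bg_logit - shift)
    total = sum(fg) + bg  # normalize over all classes for this proposal
    return [v / total for v in fg], bg / total


# Two classes with identical logits: the rare one is boosted after calibration.
scores, bg_score = norcal([2.0, 2.0], 0.0, class_counts=[1000, 10])
```

Because the background term stays out of the reweighting, a class with few training samples gains score relative to frequent classes without the background score being inflated by the same factor.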
【3】 Do Different Tracking Tasks Require Different Appearance Models?
Authors: Zhongdao Wang, Hengshuang Zhao, Ya-Li Li, Shengjin Wang, Philip H. S. Torr, Luca Bertinetto
Affiliations: Tsinghua University, University of Oxford, FiveAI
Link: https://arxiv.org/abs/2107.02156
Abstract: Tracking objects of interest in a video is one of the most popular and widely applicable problems in computer vision. However, over the years, a Cambrian explosion of use cases and benchmarks has fragmented the problem into a multitude of different experimental setups. As a consequence, the literature has fragmented too, and now the novel approaches proposed by the community are usually specialised to fit only one specific setup. To understand to what extent this specialisation is actually necessary, in this work we present UniTrack, a unified tracking solution to address five different tasks within the same framework. UniTrack consists of a single, task-agnostic appearance model, which can be learned in a supervised or self-supervised fashion, and multiple "heads" that address individual tasks and do not require training. We show how most tracking tasks can be solved within this framework, and that the same appearance model can be used to obtain performance that is competitive against specialised methods for all five tasks considered. The framework also allows us to analyse appearance models obtained with the most recent self-supervised methods, thus significantly extending their evaluation and comparison to a larger variety of important problems. Code available at https://github.com/Zhongdao/UniTrack.
【4】 FaVIQ: FAct Verification from Information-seeking Questions
Authors: Jungsoo Park, Sewon Min, Jaewoo Kang, Luke Zettlemoyer, Hannaneh Hajishirzi
Affiliations: Korea University, University of Washington, Allen Institute of AI
Comments: 12 pages, 3 figures; Data & Code available at this https URL
Link: https://arxiv.org/abs/2107.02153
Abstract: Despite significant interest in developing general purpose fact checking models, it is challenging to construct a large-scale fact verification dataset with realistic claims that would occur in the real world. Existing claims are either authored by crowdworkers, thereby introducing subtle biases that are difficult to control for, or manually verified by professional fact checkers, causing them to be expensive and limited in scale. In this paper, we construct a challenging, realistic, and large-scale fact verification dataset called FaVIQ, using information-seeking questions posed by real users who do not know how to answer. The ambiguity in information-seeking questions enables automatically constructing true and false claims that reflect confusions arising from users (e.g., the year of the movie being filmed vs. being released). Our claims are verified to be natural, contain little lexical bias, and require a complete understanding of the evidence for verification. Our experiments show that the state-of-the-art models are far from solving our new task. Moreover, training on our data helps in professional fact-checking, outperforming models trained on the most widely used dataset FEVER or in-domain data by up to 17% absolute. Altogether, our data will serve as a challenging benchmark for natural language understanding and support future progress in professional fact checking.
【5】 Feature Cross Search via Submodular Optimization
Authors: Lin Chen, Hossein Esfandiari, Gang Fu, Vahab S. Mirrokni, Qian Yu
Comments: Accepted to ESA 2021. Authors are ordered alphabetically
Link: https://arxiv.org/abs/2107.02139
Abstract: In this paper, we study feature cross search as a fundamental primitive in feature engineering. The importance of feature cross search especially for the linear model has been known for a while, with well-known textbook examples. In this problem, the goal is to select a small subset of features, combine them to form a new feature (called the crossed feature) by considering their Cartesian product, and find feature crosses to learn an accurate model. In particular, we study the problem of maximizing a normalized Area Under the Curve (AUC) of the linear model trained on the crossed feature column. First, we show that it is not possible to provide an $n^{1/\log\log n}$-approximation algorithm for this problem unless the exponential time hypothesis fails. This result also rules out the possibility of solving this problem in polynomial time unless $\mathsf{P}=\mathsf{NP}$. On the positive side, by assuming the \naive\ assumption, we show that there exists a simple greedy $(1-1/e)$-approximation algorithm for this problem. This result is established by relating the AUC to the total variation of the commutator of two probability measures and showing that the total variation of the commutator is monotone and submodular. To show this, we relate the submodularity of this function to the positive semi-definiteness of a corresponding kernel matrix. Then, we use Bochner's theorem to prove the positive semi-definiteness by showing that its inverse Fourier transform is non-negative everywhere. Our techniques and structural results might be of independent interest.
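The $(1-1/e)$ guarantee mentioned above is the classical bound for greedy maximization of a monotone submodular set function under a cardinality constraint. As a generic illustration (not the paper's AUC objective), the sketch below runs that greedy loop on a toy coverage function; `greedy_max`, `coverage` and the `sets` data are made up for the example.

```python
def greedy_max(ground_set, f, k):
    """Pick up to k elements, each time adding the one with the largest marginal gain."""
    chosen = []
    for _ in range(k):
        best, best_gain = None, 0.0
        for e in ground_set:
            if e in chosen:
                continue
            gain = f(chosen + [e]) - f(chosen)
            if gain > best_gain:
                best, best_gain = e, gain
        if best is None:  # no remaining element improves the objective
            break
        chosen.append(best)
    return chosen


# Toy monotone submodular objective: number of items covered by the chosen sets.
sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}


def coverage(names):
    covered = set()
    for name in names:
        covered |= sets[name]
    return len(covered)


picked = greedy_max(list(sets), coverage, k=2)
```

In the paper's setting the elements would be candidate features and the objective the quantity shown to be monotone and submodular; the loop itself stays the same.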
【6】 Training Adaptive Computation for Open-Domain Question Answering with Computational Constraints
Authors: Yuxiang Wu, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel
Affiliations: University College London
Comments: 7 pages, 1 figure, to be published in ACL-IJCNLP 2021
Link: https://arxiv.org/abs/2107.02102
Abstract: Adaptive Computation (AC) has been shown to be effective in improving the efficiency of Open-Domain Question Answering (ODQA) systems. However, current AC approaches require tuning of all model parameters, and training state-of-the-art ODQA models requires significant computational resources that may not be available for most researchers. We propose Adaptive Passage Encoder, an AC method that can be applied to an existing ODQA model and can be trained efficiently on a single GPU. It keeps the parameters of the base ODQA model fixed, but it overrides the default layer-by-layer computation of the encoder with an AC policy that is trained to optimise the computational efficiency of the model. Our experimental results show that our method improves upon a state-of-the-art model on two datasets, and is also more accurate than previous AC methods due to the stronger base ODQA model. All source code and datasets are available at https://github.com/uclnlp/APE.
【7】 One-Cycle Pruning: Pruning ConvNets Under a Tight Training Budget
Authors: Nathan Hubens, Matei Mancas, Bernard Gosselin, Marius Preda, Titus Zaharia
Affiliations: ISIA Lab (UMONS), Artemis (IP Paris)
Comments: Accepted at Sparsity in Neural Networks (SNN 2021)
Link: https://arxiv.org/abs/2107.02086
Abstract: Introducing sparsity in a neural network has been an efficient way to reduce its complexity while keeping its performance almost intact. Most of the time, sparsity is introduced using a three-stage pipeline: 1) train the model to convergence, 2) prune the model according to some criterion, 3) fine-tune the pruned model to recover performance. The last two steps are often performed iteratively, leading to reasonable results but also to a time-consuming and complex process. In our work, we propose to get rid of the first step of the pipeline and to combine the two other steps in a single pruning-training cycle, allowing the model to jointly learn the optimal weights while being pruned. We do this by introducing a novel pruning schedule, named One-Cycle Pruning, which prunes from the beginning of training until its very end. Adopting such a schedule not only leads to better performing pruned models but also drastically reduces the training budget required to prune a model. Experiments are conducted on a variety of architectures (VGG-16 and ResNet-18) and datasets (CIFAR-10, CIFAR-100 and Caltech-101), and for relatively high sparsity values (80%, 90%, 95% of weights removed). Our results show that One-Cycle Pruning consistently outperforms commonly used pruning schedules such as One-Shot Pruning, Iterative Pruning and Automated Gradual Pruning, on a fixed training budget.
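To make "pruning schedule" concrete: a schedule maps the training step to a target sparsity, and at each step the smallest-magnitude weights are zeroed to meet it. The abstract does not give the One-Cycle formula, so the sketch below instead shows the cubic ramp commonly associated with Automated Gradual Pruning, one of the baselines named above. The function names and the flat weight list are illustrative assumptions.

```python
def agp_sparsity(step, start_step, end_step, initial=0.0, final=0.9):
    """Cubic sparsity ramp from `initial` to `final` between two training steps."""
    if step <= start_step:
        return initial
    if step >= end_step:
        return final
    progress = (step - start_step) / (end_step - start_step)
    return final + (initial - final) * (1.0 - progress) ** 3


def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(round(sparsity * len(weights)))
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(by_magnitude[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


# Prune a toy weight vector halfway through a 100-step schedule.
weights = [0.5, -0.1, 2.0, 0.05]
target = agp_sparsity(50, start_step=0, end_step=100, initial=0.0, final=0.8)
pruned = magnitude_prune(weights, 0.5)
```

A schedule that covers the whole run, as the abstract describes for One-Cycle Pruning, would correspond to `start_step` at the first training step and `end_step` at the last; the shape of its curve is not specified here.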
【8】 Modeling Interactions of Multimodal Road Users in Shared Spaces
Authors: Fatema T. Johora, Jörg P. Müller
Link: https://arxiv.org/abs/2107.02083
Abstract: In shared spaces, motorized and non-motorized road users share the same space with equal priority. Their movements are not regulated by traffic rules, hence they interact more frequently to negotiate priority over the shared space. To estimate the safety and efficiency of shared spaces, reproducing the traffic behavior in such traffic places is important. In this paper, we consider and combine different levels of interaction between pedestrians and cars in shared space environments. Our proposed model consists of three layers: a layer to plan trajectories of road users; a force-based modeling layer to reproduce free flow movement and simple interactions; and a game-theoretic decision layer to handle complex situations where road users need to make a decision over different alternatives. We validate our model by simulating scenarios involving various interactions between pedestrians and cars and also car-to-car interaction. The results indicate that simulated behaviors match observed behaviors well.
【9】 MixStyle Neural Networks for Domain Generalization and Adaptation
Authors: Kaiyang Zhou, Yongxin Yang, Yu Qiao, Tao Xiang
Comments: Extension of this https URL. Code available at this https URL
Link: https://arxiv.org/abs/2107.02053
Abstract: Convolutional neural networks (CNNs) often have poor generalization performance under domain shift. One way to improve domain generalization is to collect diverse source data from multiple relevant domains so that a CNN model is allowed to learn more domain-invariant, and hence generalizable representations. In this work, we address domain generalization with MixStyle, a plug-and-play, parameter-free module that is simply inserted to shallow CNN layers and requires no modification to training objectives. Specifically, MixStyle probabilistically mixes feature statistics between instances. This idea is inspired by the observation that visual domains can often be characterized by image styles which are in turn encapsulated within instance-level feature statistics in shallow CNN layers. Therefore, inserting MixStyle modules in effect synthesizes novel domains albeit in an implicit way. MixStyle is not only simple and flexible, but also versatile -- it can be used for problems whereby unlabeled images are available, such as semi-supervised domain generalization and unsupervised domain adaptation, with a simple extension to mix feature statistics between labeled and pseudo-labeled instances. We demonstrate through extensive experiments that MixStyle can significantly boost the out-of-distribution generalization performance across a wide range of tasks including object recognition, instance retrieval, and reinforcement learning.
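"Probabilistically mixes feature statistics between instances" can be pictured as follows: normalize each instance's feature map per channel, then re-scale it with a blend of its own mean and standard deviation and those of another instance in the batch. The sketch is a pure-Python approximation under stated assumptions: the Beta-distributed mixing weight, `alpha`, the activation probability `p`, and the helper names are illustrative choices that the abstract does not specify.

```python
import math
import random


def channel_stats(x, eps=1e-6):
    """Per-channel mean and std of one instance; x is a list of channels (lists of floats)."""
    means = [sum(ch) / len(ch) for ch in x]
    stds = [math.sqrt(sum((v - m) ** 2 for v in ch) / len(ch) + eps)
            for ch, m in zip(x, means)]
    return means, stds


def mix_with(batch, perm, lam):
    """Blend each instance's statistics with those of instance perm[i], weight lam on its own."""
    stats = [channel_stats(x) for x in batch]
    out = []
    for i, x in enumerate(batch):
        (mu, sig), (mu2, sig2) = stats[i], stats[perm[i]]
        mixed = []
        for c, ch in enumerate(x):
            mu_mix = lam * mu[c] + (1.0 - lam) * mu2[c]
            sig_mix = lam * sig[c] + (1.0 - lam) * sig2[c]
            # Normalize with the instance's own statistics, then apply the mixed ones.
            mixed.append([sig_mix * (v - mu[c]) / sig[c] + mu_mix for v in ch])
        out.append(mixed)
    return out


def mixstyle(batch, alpha=0.1, p=0.5, rng=random):
    """Apply the mixing with probability p (intended for training time only)."""
    if rng.random() > p:
        return batch
    perm = list(range(len(batch)))
    rng.shuffle(perm)
    lam = rng.betavariate(alpha, alpha)  # assumed sampling scheme for the mixing weight
    return mix_with(batch, perm, lam)


batch = [[[1.0, 3.0]], [[10.0, 30.0]]]       # two instances, one channel each
swapped = mix_with(batch, perm=[1, 0], lam=0.0)  # fully adopt the other instance's style
```

The module has no learnable parameters, which matches the "parameter-free" description: everything it uses is computed from the batch itself.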
【10】 Dealing with Adversarial Player Strategies in the Neural Network Game iNNk through Ensemble Learning
Authors: Mathias Löwe, Jennifer Villareale, Evan Freed, Aleksanteri Sladek, Jichen Zhu, Sebastian Risi
Affiliations: IT University of Copenhagen, Copenhagen, Denmark, Drexel University, Philadelphia, Pennsylvania, USA
Comments: 10 pages, 4 figures. Accepted for publishing at the 16th International Conference on the Foundations of Digital Games (FDG) 2021
Link: https://arxiv.org/abs/2107.02052
Abstract: Applying neural network (NN) methods in games can lead to various new and exciting game dynamics not previously possible. However, they also lead to new challenges such as the lack of large, clean datasets, varying player skill levels, and changing gameplay strategies. In this paper, we focus on the adversarial player strategy aspect in the game iNNk, in which players try to communicate secret code words through drawings with the goal of not being deciphered by a NN. Some strategies exploit weaknesses in the NN that consistently trick it into making incorrect classifications, leading to unbalanced gameplay. We present a method that combines transfer learning and ensemble methods to obtain a data-efficient adaptation to these strategies. This combination significantly outperforms the baseline NN across all adversarial player strategies despite only being trained on a limited set of adversarial examples. We expect the methods developed in this paper to be useful for the rapidly growing field of NN-based games, which will require new approaches to deal with unforeseen player creativity.
【11】 A Knowledge-based Approach for Answering Complex Questions in Persian
Authors: Romina Etezadi, Mehrnoush Shamsfard
Affiliations: Shahid Beheshti University
Comments: 9 pages, 5 figures
Link: https://arxiv.org/abs/2107.02040
Abstract: Research on open-domain question answering (QA) has a long tradition. A challenge in this domain is answering complex questions (CQA) that require complex inference methods and large amounts of knowledge. In low resource languages, such as Persian, there are not many datasets for open-domain complex questions and the language processing toolkits are not very accurate. In this paper, we propose a knowledge-based approach for answering Persian complex questions using Farsbase, the Persian knowledge graph, and exploiting PeCoQ, the newly created complex Persian question dataset. In this work, we handle multi-constraint and multi-hop questions by building their set of possible corresponding logical forms. Then Multilingual-BERT is used to select the logical form that best describes the input complex question syntactically and semantically. The answer to the question is built from the answer to the logical form, extracted from the knowledge graph. Experiments show that our approach outperforms other approaches in Persian CQA.
【12】 Power Law Graph Transformer for Machine Translation and Representation Learning
Authors: Burc Gokden
Affiliations: Fromthesky Research Labs LLC, Oregon, USA
Comments: 55 pages, 39 figures
Link: https://arxiv.org/abs/2107.02039
Abstract: We present the Power Law Graph Transformer, a transformer model with well defined deductive and inductive tasks for prediction and representation learning. The deductive task learns the dataset level (global) and instance level (local) graph structures in terms of learnable power law distribution parameters. The inductive task outputs the prediction probabilities using the deductive task output, similar to a transductive model. We trained our model with Turkish-English and Portuguese-English datasets from TED talk transcripts for machine translation and compared the model performance and characteristics to a transformer model with scaled dot product attention trained on the same experimental setup. We report BLEU scores of 17.79 and 28.33 on the Turkish-English and Portuguese-English translation tasks with our model, respectively. We also show how a duality between a quantization set and N-dimensional manifold representation can be leveraged to transform between local and global deductive-inductive outputs using successive application of linear and non-linear transformations end-to-end.
【13】 Quality Metrics for Transparent Machine Learning With and Without Humans In the Loop Are Not Correlated
Authors: Felix Biessmann, Dionysius Refiano
Affiliations: Beuth University of Applied Sciences
Comments: Proceedings of the ICML Workshop on Theoretical Foundations, Criticism, and Application Trends of Explainable AI, held in conjunction with the 38th International Conference on Machine Learning (ICML); a non-peer-reviewed longer version was previously published as preprint arXiv:1912.05011
Link: https://arxiv.org/abs/2107.02033
Abstract: The field of explainable artificial intelligence (XAI) has brought about an arsenal of methods to render Machine Learning (ML) predictions more interpretable. But how useful explanations provided by transparent ML methods are for humans remains difficult to assess. Here we investigate the quality of interpretable computer vision algorithms using techniques from psychophysics. In crowdsourced annotation tasks we study the impact of different interpretability approaches on annotation accuracy and task time. We compare these quality metrics with classical XAI, automated quality metrics. Our results demonstrate that psychophysical experiments allow for robust quality assessment of transparency in machine learning. Interestingly, the quality metrics computed without humans in the loop did not provide a consistent ranking of interpretability methods, nor were they representative of how useful an explanation was for humans. These findings highlight the potential of methods from classical psychophysics for modern machine learning applications. We hope that our results provide convincing arguments for evaluating interpretability in its natural habitat, human-ML interaction, if the goal is to obtain an authentic assessment of interpretability.
【14】 Statistical Analysis of Perspective Scores on Hate Speech Detection
Authors: Hadi Mansourifar, Dana Alsagheer, Weidong Shi, Lan Ni, Yan Huang
Affiliations: Computer Science Department, University of Houston, Valenti School of Communication, University of Houston
Comments: Accepted paper in International IJCAI Workshop on Artificial Intelligence for Social Good 2021
Link: https://arxiv.org/abs/2107.02024
Abstract: Hate speech detection has become a hot topic in recent years due to the exponential growth of offensive language in social media. It has been shown that state-of-the-art hate speech classifiers are efficient only when tested on data with the same feature distribution as the training data. As a consequence, model architecture plays a secondary role in improving the current results. With such diverse data distributions, relying on low level features is the main cause of deficiency due to natural bias in the data. That is why we need to use high level features to avoid a biased judgement. In this paper, we statistically analyze the Perspective Scores and their impact on hate speech detection. We show that different hate speech datasets are very similar when it comes to their extracted Perspective Scores. Finally, we show that over-sampling the Perspective Scores of a hate speech dataset can significantly improve the generalization performance when testing on other hate speech datasets.
【15】 Fast and Scalable Optimal Transport for Brain Tractograms
Authors: Jean Feydy, Pierre Roussillon, Alain Trouvé, Pietro Gori
Affiliations: CMLA, ENS Paris-Saclay, France; DMA, École Normale Supérieure, Paris, France; LTCI, Télécom ParisTech, Institut Mines Télécom, Paris, France
Link: https://arxiv.org/abs/2107.02010
Abstract: We present a new multiscale algorithm for solving regularized Optimal Transport problems on the GPU, with a linear memory footprint. Relying on Sinkhorn divergences which are convex, smooth and positive definite loss functions, this method enables the computation of transport plans between millions of points in a matter of minutes. We show the effectiveness of this approach on brain tractograms modeled either as bundles of fibers or as track density maps. We use the resulting smooth assignments to perform label transfer for atlas-based segmentation of fiber tractograms. The parameters -- blur and reach -- of our method are meaningful, defining the minimum and maximum distance at which two fibers are compared with each other. They can be set according to anatomical knowledge. Furthermore, we also propose to estimate a probabilistic atlas of a population of track density maps as a Wasserstein barycenter. Our CUDA implementation is endowed with a user-friendly PyTorch interface, freely available on the PyPi repository (pip install geomloss) and at www.kernel-operations.io/geomloss.
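For readers unfamiliar with regularized Optimal Transport, the following is a minimal sketch of the textbook Sinkhorn iteration on a tiny problem. It is not the multiscale GPU algorithm of the paper, it does not compute a Sinkhorn divergence, and it does not use the geomloss API; `sinkhorn_plan` and `eps` (the entropic regularization strength, which plays a role loosely comparable to the blur scale) are names chosen for this illustration, and the unbalanced `reach` parameter is left out.

```python
import math


def sinkhorn_plan(a, b, cost, eps=0.1, iters=200):
    """Entropy-regularized transport plan between histograms a and b.

    a, b: source and target weights (each summing to 1)
    cost: cost[i][j] between source point i and target point j
    eps:  regularization strength; smaller values give sharper assignments
    """
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Two points on a line matched to the same two points, squared-distance cost.
xs, ys = [0.0, 1.0], [0.0, 1.0]
cost = [[(x - y) ** 2 for y in ys] for x in xs]
plan = sinkhorn_plan([0.5, 0.5], [0.5, 0.5], cost, eps=0.05)
```

Each row of `plan` is a soft assignment of one source point to the targets, which is the kind of smooth assignment the abstract uses for label transfer. This dense version needs memory quadratic in the number of points, which is exactly what the linear-memory implementation described above avoids.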
【16】 Explainability via Interactivity? Supporting Nonexperts' Sensemaking of Pretrained CNN by Interacting with Their Daily Surroundings
Authors: Chao Wang, Pengcheng An
Affiliations: David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada
Comments: 8 pages
Link: https://arxiv.org/abs/2107.01996
Abstract: Current research on Explainable AI (XAI) heavily targets expert users (data scientists or AI developers). However, increasing importance has been argued for making AI more understandable to nonexperts, who are expected to leverage AI techniques, but have limited knowledge about AI. We present a mobile application to support nonexperts to interactively make sense of Convolutional Neural Networks (CNN); it allows users to play with a pretrained CNN by taking pictures of their surrounding objects. We use an up-to-date XAI technique (Class Activation Map) to intuitively visualize the model's decision (the most important image regions that lead to a certain result). Deployed in a university course, this playful learning tool was found to support design students to gain vivid understandings about the capabilities and limitations of pretrained CNNs in real-world environments. Concrete examples of students' playful explorations are reported to characterize their sensemaking processes reflecting different depths of thought.
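A Class Activation Map, in its basic form, is a weighted sum of the final convolutional feature maps, using the classifier weights of the class of interest. The sketch below computes that sum on nested lists and rescales it to [0, 1] for display. The function name and the toy inputs are illustrative, and the app described above may use a different variant of the technique.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of K feature maps (each H x W), rescaled to the range [0, 1].

    feature_maps:  list of K maps from the last convolutional layer
    class_weights: K classifier weights for the class being explained
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[sum(w * fm[y][x] for w, fm in zip(class_weights, feature_maps))
            for x in range(width)]
           for y in range(height)]
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    span = (hi - lo) or 1.0  # avoid dividing by zero on a constant map
    return [[(v - lo) / span for v in row] for row in cam]


# Two 2x2 feature maps; the class relies on both, with different weights.
maps = [[[1.0, 0.0], [0.0, 0.0]],
        [[0.0, 0.0], [0.0, 2.0]]]
heatmap = class_activation_map(maps, class_weights=[1.0, 0.5])
```

In practice the low-resolution heatmap is upsampled to the input image size and overlaid on the photo, which is how the most important image regions become visible to a nonexpert.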
【17】 Imputation-Free Learning from Incomplete Observations
Authors: Qitong Gao, Dong Wang, Joshua D. Amason, Siyang Yuan, Chenyang Tao, Ricardo Henao, Majda Hadziahmetovic, Lawrence Carin, Miroslav Pajic
Affiliations: Department of Electrical and Computer Engineering, Duke University, Department of Ophthalmology
Link: https://arxiv.org/abs/2107.01983
Abstract: Although recent works have developed methods that can generate estimations (or imputations) of the missing entries in a dataset to facilitate downstream analysis, most depend on assumptions that may not align with real-world applications and could suffer from poor performance in subsequent tasks. This is particularly true if the data have large missingness rates or a small population. More importantly, the imputation error could be propagated into the prediction step that follows, causing the gradients used to train the prediction models to be biased. Consequently, in this work, we introduce the importance guided stochastic gradient descent (IGSGD) method to train multilayer perceptrons (MLPs) and long short-term memories (LSTMs) to directly perform inference from inputs containing missing values without imputation. Specifically, we employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation. This not only reduces bias but allows the model to exploit the underlying information behind missingness patterns. We test the proposed approach on real-world time-series (i.e., MIMIC-III), tabular data obtained from an eye clinic, and a standard dataset (i.e., MNIST), where our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
【18】 The MineRL BASALT Competition on Learning from Human Feedback 标题:MineRL BASALT 人类反馈学习竞赛
作者:Rohin Shah,Cody Wild,Steven H. Wang,Neel Alex,Brandon Houghton,William Guss,Sharada Mohanty,Anssi Kanervisto,Stephanie Milani,Nicholay Topin,Pieter Abbeel,Stuart Russell,Anca Dragan 机构:UC Berkeley, OpenAI, Carnegie Mellon University, AIcrowd, University of Eastern Finland 备注:NeurIPS 2021 Competition Track 链接:https://arxiv.org/abs/2107.01969 摘要:在过去的十年里,人们对深度学习研究的兴趣显著增加,许多公开的成功证明了它的潜力。因此,这些系统现在正被纳入商业产品。随之而来的是另一个挑战:我们如何构建人工智能系统来解决那些没有清晰、定义良好的规范的任务?虽然提出了多种解决方案,但在本次比赛中,我们特别关注一个:从人类反馈中学习。我们不是使用预定义的奖励函数或使用带有预定义类别集的标记数据集来训练人工智能系统,而是使用来自某种形式的人类反馈的学习信号来训练人工智能系统,随着对任务的理解的改变或人工智能系统能力的提高,学习信号可以随着时间的推移而演变。MineRL BASALT竞赛旨在推动这一重要技术类别的前沿研究。我们在Minecraft中设计了一套四个任务,我们预计很难为这些任务写下硬编码的奖励函数。这些任务是由一段自然语言定义的:例如,“创建一个瀑布并拍下它的风景图片”,还有其他的澄清细节。参与者必须为每个任务训练一个单独的智能体,使用他们想要的任何方法。然后由已阅读任务描述的人员对智能体进行评估。为了帮助参与者入门,我们提供了四项任务中每个任务的人类演示数据集,以及利用这些演示的模仿学习基线。我们的希望是,这场比赛将提高我们的能力,建立人工智能系统,做他们的设计师打算他们做的,即使意图不能很容易地形式化。除了让人工智能解决更多的任务外,这还可以使人工智能系统得到更有效的监管,并在价值对齐问题上取得进展。 摘要:The last decade has seen a significant increase of interest in deep learning research, with many public successes that have demonstrated its potential. As such, these systems are now being incorporated into commercial products. With this comes an additional challenge: how can we build AI systems that solve tasks where there is not a crisp, well-defined specification? While multiple solutions have been proposed, in this competition we focus on one in particular: learning from human feedback. Rather than training AI systems using a predefined reward function or using a labeled dataset with a predefined set of categories, we instead train the AI system using a learning signal derived from some form of human feedback, which can evolve over time as the understanding of the task changes, or as the capabilities of the AI system improve. The MineRL BASALT competition aims to spur forward research on this important class of techniques. We design a suite of four tasks in Minecraft for which we expect it will be hard to write down hardcoded reward functions. 
These tasks are defined by a paragraph of natural language: for example, "create a waterfall and take a scenic picture of it", with additional clarifying details. Participants must train a separate agent for each task, using any method they want. Agents are then evaluated by humans who have read the task description. To help participants get started, we provide a dataset of human demonstrations on each of the four tasks, as well as an imitation learning baseline that leverages these demonstrations. Our hope is that this competition will improve our ability to build AI systems that do what their designers intend them to do, even when the intent cannot be easily formalized. Besides allowing AI to solve more tasks, this can also enable more effective regulation of AI systems, as well as making progress on the value alignment problem.
【19】 Android Malware Category and Family Detection and Identification using Machine Learning 标题:基于机器学习的Android恶意软件类别与家族检测和识别
作者:Ahmed Hashem El Fiky,Ayman El Shenawy,Mohamed Ashraf Madkour 机构: Systems and Computer Engineering Dept., Al-Azhar University, Cairo, Egypt., Software Engineering and Information Technology, Egyptian Chinese 链接:https://arxiv.org/abs/2107.01927 摘要:Android恶意软件是互联网上最危险的威胁之一,而且在过去几年中一直呈上升趋势。尽管在从无害的android应用程序中检测和分类android恶意软件方面做出了重大努力,但仍有很长的路要走。因此,需要对最常见的Android恶意软件类别和家族所显示的行为提供基本的了解。每个Android恶意软件家族和类别都有不同的目标。因此,它已经影响到每个公司领域,包括医疗保健、银行、交通、政府和电子商务。本文提出了两种用于Android恶意软件动态分析的机器学习方法:一种用于检测和识别Android恶意软件类别,另一种用于检测和识别Android恶意软件家族,这是通过在动态层上分析CCCS-CIC-AndMal2020数据集中包含14个主要恶意软件类别和180个主要恶意软件家族的大规模恶意软件数据实现的。我们的方法在Android恶意软件类别检测中达到96%以上的准确率,在Android恶意软件家族检测中达到99%以上的准确率。我们的方法提供了一种对Android恶意软件进行高精度动态分析的方法,同时也缩短了分析智能手机恶意软件所需的时间。 摘要:Android malware is one of the most dangerous threats on the internet, and it has been on the rise for several years. Despite significant efforts in detecting and classifying Android malware from innocuous Android applications, there is still a long way to go. As a result, there is a need to provide a basic understanding of the behavior displayed by the most common Android malware categories and families. Each Android malware family and category has a distinct objective. As a result, it has impacted every corporate area, including healthcare, banking, transportation, government, and e-commerce. In this paper, we present two machine-learning approaches for Dynamic Analysis of Android Malware: one for detecting and identifying Android Malware Categories and the other for detecting and identifying Android Malware Families, which was accomplished by analyzing a massive malware dataset with 14 prominent malware categories and 180 prominent malware families of the CCCS-CIC-AndMal2020 dataset on Dynamic Layers. Our approach achieves more than 96% accuracy in Android Malware Category detection and more than 99% accuracy in Android Malware Family detection. 
Our approach provides a method for high-accuracy Dynamic Analysis of Android Malware while also shortening the time required to analyze smartphone malware.
【20】 Logic Locking at the Frontiers of Machine Learning: A Survey on Developments and Opportunities 标题:机器学习前沿的逻辑锁定:发展与机遇综述
作者:Dominik Sisejkovic,Lennart M. Reimann,Elmira Moussavi,Farhad Merchant,Rainer Leupers 机构:Institute for Communication Technologies and Embedded Systems, RWTH Aachen University, Germany 备注:6 pages, 3 figures, accepted at VLSI-SOC 2021 链接:https://arxiv.org/abs/2107.01915 摘要:在过去的十年里,逻辑锁的设计和评估取得了很大的进展;在整个电子供应链中保护集成电路完整性的首要技术。然而,机器学习的广泛应用为逻辑锁方案的评估提供了新的途径。本文综述了现代机器学习模型前沿的逻辑锁定攻击和对策的最新进展。在此基础上,提出了下一代逻辑锁的设计建议。 摘要:In the past decade, a lot of progress has been made in the design and evaluation of logic locking; a premier technique to safeguard the integrity of integrated circuits throughout the electronics supply chain. However, the widespread proliferation of machine learning has recently introduced a new pathway to evaluating logic locking schemes. This paper summarizes the recent developments in logic locking attacks and countermeasures at the frontiers of contemporary machine learning models. Based on the presented work, the key takeaways, opportunities, and challenges are highlighted to offer recommendations for the design of next-generation logic locking.
【21】 Creating Unbiased Public Benchmark Datasets with Data Leakage Prevention for Predictive Process Monitoring 标题:为预测性过程监控创建具有数据防泄漏功能的无偏公共基准数据集
作者:Hans Weytjens,Jochen De Weerdt 机构:Research Centre for Information Systems Engineering (LIRIS), KU Leuven, Leuven, Belgium 备注:Accepted for AI4BPM workshop at BMP2021 conferences 链接:https://arxiv.org/abs/2107.01905 摘要:人工智能,特别是机器学习的发展,越来越引起人们对预测过程监控的研究兴趣和努力,预测过程监控是过程挖掘的一个子领域,它涉及预测下一个事件、过程结果和剩余执行时间。不幸的是,研究人员使用各种各样的数据集和方法将它们分成训练集和测试集。这些预处理步骤的文档并不总是完整的。因此,研究成果很难复制,甚至不可能在论文之间进行比较。有时,非公共领域知识的使用进一步阻碍了思想的公平竞争。通常,训练集和测试集并没有完全分离,这是预测过程监控特有的数据泄漏问题。此外,测试集通常在案例持续时间和运行案例数量的混合方面都存在偏差。这些障碍对该领域的进展构成了挑战。本文的贡献在于识别和证明这些障碍的重要性,并提出预处理步骤,以原则性的方式获得无偏基准数据集,从而创建无数据泄漏的代表性测试集,以达到公平竞争的目的,促进开放科学,促进预测过程监测的更快进展。 摘要:Advances in AI, and especially machine learning, are increasingly drawing research interest and efforts towards predictive process monitoring, the subfield of process mining (PM) that concerns predicting next events, process outcomes and remaining execution times. Unfortunately, researchers use a variety of datasets and ways to split them into training and test sets. The documentation of these preprocessing steps is not always complete. Consequently, research results are hard or even impossible to reproduce and to compare between papers. At times, the use of non-public domain knowledge further hampers the fair competition of ideas. Often the training and test sets are not completely separated, a data leakage problem particular to predictive process monitoring. Moreover, test sets usually suffer from bias in terms of both the mix of case durations and the number of running cases. These obstacles pose a challenge to the field's progress. The contribution of this paper is to identify and demonstrate the importance of these obstacles and to propose preprocessing steps to arrive at unbiased benchmark datasets in a principled way, thus creating representative test sets without data leakage with the aim of levelling the playing field, promoting open science and contributing to more rapid progress in predictive process monitoring.
【22】 Ensemble and Auxiliary Tasks for Data-Efficient Deep Reinforcement Learning 标题:数据高效深度强化学习的集成和辅助任务
作者:Muhammad Rizki Maulana,Wee Sun Lee 机构:School of Computing, National University of Singapore, Singapore 备注:ECML-PKDD 2021 链接:https://arxiv.org/abs/2107.01904 摘要:在数据有限的情况下,集成和辅助任务都可以提高机器学习模型的性能。然而,这两种方法之间的相互作用还没有得到很好的研究,特别是在深度强化学习的背景下。本文研究了集成和辅助任务与深度Q学习算法相结合时的效果。我们对有限数据约束下的Atari游戏进行了案例研究。此外,我们推导了一个改进的偏差-方差-协方差分解来分析学习集成和使用辅助任务的不同方式,并使用该分析来帮助理解案例研究。我们的代码是开源的,可以在 https://github.com/NUS-LID/RENAULT 获取。 摘要:Ensemble and auxiliary tasks are both well known to improve the performance of machine learning models when data is limited. However, the interaction between these two methods is not well studied, particularly in the context of deep reinforcement learning. In this paper, we study the effects of ensemble and auxiliary tasks when combined with the deep Q-learning algorithm. We perform a case study on ATARI games under limited data constraint. Moreover, we derive a refined bias-variance-covariance decomposition to analyze the different ways of learning ensembles and using auxiliary tasks, and use the analysis to help provide some understanding of the case study. Our code is open source and available at https://github.com/NUS-LID/RENAULT.
【23】 SM-SGE: A Self-Supervised Multi-Scale Skeleton Graph Encoding Framework for Person Re-Identification 标题:SM-SGE:一种用于身份识别的自监督多尺度骨架图编码框架
作者:Haocong Rao,Xiping Hu,Jun Cheng,Bin Hu 机构:Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, The Chinese University of Hong Kong, Hong Kong, Lanzhou University, Beijing Institute of Technology 备注:Accepted at ACMMM 2021 Main Track. Sole copyright holder is ACMMM. Codes are available at this https URL 链接:https://arxiv.org/abs/2107.01903 摘要:基于三维骨架的人员再识别是一个新兴的研究课题,在安全领域有着巨大的应用潜力。现有的方法通常是从人体关节轨迹中学习人体和运动特征,然而它们缺乏一种系统的方法来模拟人体结构和超出人体关节尺度的人体构件之间的潜在关系。在本文中,我们首次提出了一种自监督多尺度骨架图编码(SM-SGE)框架,该框架从不同尺度的未标记骨架图中综合建模人体、构件关系和骨架动力学,以学习有效的人体骨架表示,首先设计多尺度人体骨架图,将人体从粗到细进行划分,使我们能够在多个层次上对人体结构和骨架动力学进行建模。其次,为了挖掘骨骼运动中人体各组成部分之间的内在关联,提出了一种多尺度图关系网络来学习相邻人体各组成部分节点之间的结构关系和不同尺度节点之间的协作关系,从而获取更具鉴别力的骨骼图特征。最后,我们提出了一种新的多尺度骨架重建机制,使得我们的框架能够从未标记的骨架图中编码骨架动力学和高级语义,大量的实验表明,SM-SGE的性能优于最先进的基于骨架的方法。我们进一步证明了该算法对大规模RGB视频中的三维骨架数据估计的有效性。我们的密码在https://github.com/Kali-Hac/SM-SGE. 摘要:Person re-identification via 3D skeletons is an emerging topic with great potential in security-critical applications. Existing methods typically learn body and motion features from the body-joint trajectory, whereas they lack a systematic way to model body structure and underlying relations of body components beyond the scale of body joints. In this paper, we for the first time propose a Self-supervised Multi-scale Skeleton Graph Encoding (SM-SGE) framework that comprehensively models human body, component relations, and skeleton dynamics from unlabeled skeleton graphs of various scales to learn an effective skeleton representation for person Re-ID. Specifically, we first devise multi-scale skeleton graphs with coarse-to-fine human body partitions, which enables us to model body structure and skeleton dynamics at multiple levels. 
Second, to mine inherent correlations between body components in skeletal motion, we propose a multi-scale graph relation network to learn structural relations between adjacent body-component nodes and collaborative relations among nodes of different scales, so as to capture more discriminative skeleton graph features. Last, we propose a novel multi-scale skeleton reconstruction mechanism to enable our framework to encode skeleton dynamics and high-level semantics from unlabeled skeleton graphs, which encourages learning a discriminative skeleton representation for person Re-ID. Extensive experiments show that SM-SGE outperforms most state-of-the-art skeleton-based methods. We further demonstrate its effectiveness on 3D skeleton data estimated from large-scale RGB videos. Our codes are open at https://github.com/Kali-Hac/SM-SGE.
【24】 Faster-LTN: a neuro-symbolic, end-to-end object detection architecture 标题:FASTER-LTN:一种神经符号化的端到端目标检测体系结构
作者:Francesco Manigrasso,Filomeno Davide Miro,Lia Morra,Fabrizio Lamberti 机构:Politecnico di Torino, Dipartimento di Automatica e Informatica, Torino, Italy 备注:accepted for presentation at ICANN 2021 链接:https://arxiv.org/abs/2107.01877 摘要:图像中物体间语义关系的检测是图像判读中的一个基本问题。神经符号技术,如逻辑张量网络(LTNs),允许将语义知识表示和推理与从神经网络典型示例中有效学习的能力相结合。本文提出了一种更快的LTN,它是由卷积主干和LTN组成的目标检测器。据我们所知,这是首次尝试在端到端训练环境中结合这两种框架。该体系结构是通过优化一个扎根的理论来训练的,该理论以逻辑公理的形式将标记的例子与先验知识相结合。实验比较表明,与传统的更快的R-CNN结构相比,具有更好的性能。 摘要:The detection of semantic relationships between objects represented in an image is one of the fundamental challenges in image interpretation. Neural-Symbolic techniques, such as Logic Tensor Networks (LTNs), allow the combination of semantic knowledge representation and reasoning with the ability to efficiently learn from examples typical of neural networks. We here propose Faster-LTN, an object detector composed of a convolutional backbone and an LTN. To the best of our knowledge, this is the first attempt to combine both frameworks in an end-to-end training setting. This architecture is trained by optimizing a grounded theory which combines labelled examples with prior knowledge, in the form of logical axioms. Experimental comparisons show competitive performance with respect to the traditional Faster R-CNN architecture.
【25】 DeepRapper: Neural Rap Generation with Rhyme and Rhythm Modeling 标题:DeepRapper:基于韵律和节奏建模的神经Rap生成
作者:Lanqing Xue,Kaitao Song,Duocai Wu,Xu Tan,Nevin L. Zhang,Tao Qin,Wei-Qiang Zhang,Tie-Yan Liu 机构:The Hong Kong University of Science and Technology, Nanjing University of Science and Technology, Fudan University, Microsoft Research Asia, Tsinghua University 备注:Accepted by ACL 2021 main conference 链接:https://arxiv.org/abs/2107.01875 摘要:说唱生成的目的是产生歌词和相应的歌唱节拍,它需要同时模拟押韵和节奏。以前的说唱作品主要集中在押韵的歌词上,而忽略了对说唱表演很重要的节奏。在本文中,我们开发了DeepRapper,一个基于Transformer的rap生成系统,可以同时模拟韵律和节奏。由于没有可用的rap节奏数据集,我们开发了一个数据挖掘管道来收集一个大规模的rap数据集,其中包括大量的rap歌曲与对齐的歌词和节奏。其次,我们设计了一个基于变换器的自回归语言模型,对韵律和韵律进行了细致的建模。具体地说,我们以相反的顺序生成歌词,用押韵表示和约束来增强押韵,并在歌词中插入节拍符号来进行节奏/节拍建模。据我们所知,DeepRapper是第一个同时产生押韵和节奏的说唱系统。客观和主观评价都表明,深说唱产生了富有创意和高质量的说唱韵律和节奏。代码将在GitHub上发布。 摘要:Rap generation, which aims to produce lyrics and corresponding singing beats, needs to model both rhymes and rhythms. Previous works for rap generation focused on rhyming lyrics but ignored rhythmic beats, which are important for rap performance. In this paper, we develop DeepRapper, a Transformer-based rap generation system that can model both rhymes and rhythms. Since there is no available rap dataset with rhythmic beats, we develop a data mining pipeline to collect a large-scale rap dataset, which includes a large number of rap songs with aligned lyrics and rhythmic beats. Second, we design a Transformer-based autoregressive language model which carefully models rhymes and rhythms. Specifically, we generate lyrics in the reverse order with rhyme representation and constraint for rhyme enhancement and insert a beat symbol into lyrics for rhythm/beat modeling. To our knowledge, DeepRapper is the first system to generate rap with both rhymes and rhythms. Both objective and subjective evaluations demonstrate that DeepRapper generates creative and high-quality raps with rhymes and rhythms. Code will be released on GitHub.
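The abstract of 【25】 describes generating lyrics in reverse order and inserting a beat symbol into the lyrics. A minimal sketch of what such an encoding could look like is given below. The `[BEAT]` token, the per-line reversal, and the function names are assumptions made for illustration; the abstract does not specify the actual token format used by DeepRapper.

```python
BEAT = "[BEAT]"  # hypothetical beat symbol, not taken from the paper


def encode_line(tokens, beat_positions):
    """Emit a lyric line right-to-left, placing BEAT before each token on a beat.

    Reversing the line puts the line-final (rhyming) word first, so a
    left-to-right autoregressive model would generate the rhyme word first.
    """
    encoded = []
    for i in reversed(range(len(tokens))):
        if i in beat_positions:
            encoded.append(BEAT)
        encoded.append(tokens[i])
    return encoded


def decode_line(encoded):
    """Drop beat symbols and restore the natural left-to-right word order."""
    words = [tok for tok in encoded if tok != BEAT]
    return list(reversed(words))


line = ["we", "rise", "up", "high"]
encoded = encode_line(line, beat_positions={0, 2})
print(encoded)
```

Decoding the encoded line recovers the original word order, which is what a generation pipeline of this kind would need before presenting the lyrics.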
【26】 Detecting Concept Drift With Neural Network Model Uncertainty 标题:基于神经网络模型不确定性的概念漂移检测
作者:Lucas Baier,Tim Schlör,Jakob Schöffer,Niklas Kühl 机构:Karlsruhe Institute of Technology, Karlsruhe, Germany 链接:https://arxiv.org/abs/2107.01873 摘要:部署的机器学习模型面临着数据随时间变化的问题,这种现象也被称为概念漂移。虽然现有的概念漂移检测方法已经显示出令人信服的结果,但它们需要真实的标签作为成功漂移检测的先决条件。特别是在许多现实世界的应用场景中,如本工作中所涉及的场景中,真正的标签是稀缺的,并且它们的获取是昂贵的。因此,我们提出了一种新的漂移检测算法,即不确定性漂移检测(UDD),它能够在没有真实标签的情况下检测漂移。我们的方法是基于深度神经网络与蒙特卡罗Dropout相结合提供的不确定性估计。通过对不确定性估计应用ADWIN技术来检测结构随时间的变化,并且检测到的漂移触发预测模型的再训练。与基于输入数据的漂移检测相比,我们的方法考虑了当前输入数据对预测模型属性的影响,而不是只检测输入数据的变化(这可能导致不必要的重训练)。我们发现,在回归和分类任务中,UDD在两个合成数据集和十个真实数据集上都优于其他最先进的策略。 摘要:Deployed machine learning models are confronted with the problem of changing data over time, a phenomenon also called concept drift. While existing approaches of concept drift detection already show convincing results, they require true labels as a prerequisite for successful drift detection. Especially in many real-world application scenarios, like the ones covered in this work, true labels are scarce, and their acquisition is expensive. Therefore, we introduce a new algorithm for drift detection, Uncertainty Drift Detection (UDD), which is able to detect drifts without access to true labels. Our approach is based on the uncertainty estimates provided by a deep neural network in combination with Monte Carlo Dropout. Structural changes over time are detected by applying the ADWIN technique on the uncertainty estimates, and detected drifts trigger a retraining of the prediction model. In contrast to input data-based drift detection, our approach considers the effects of the current input data on the properties of the prediction model rather than detecting change on the input data only (which can lead to unnecessary retrainings). We show that UDD outperforms other state-of-the-art strategies on two synthetic as well as ten real-world data sets for both regression and classification tasks.
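The pipeline described in 【26】 (Monte Carlo Dropout uncertainty fed into a change detector, with no labels needed) can be illustrated with a small sketch. Everything below is an assumption for illustration: the "network" is a toy linear model with input dropout, and a simplified ADWIN-style cut test over a growing window stands in for the full ADWIN algorithm (no bucket compression, no window shrinking). It is not the authors' implementation.

```python
import math
import random
import statistics


def mc_dropout_uncertainty(x, weights, rng, passes=30, p_drop=0.5):
    """Standard deviation of predictions over several stochastic forward passes."""
    preds = []
    for _ in range(passes):
        # Inverted dropout on the inputs of a toy linear model.
        y = sum(w * xi / (1.0 - p_drop)
                for w, xi in zip(weights, x) if rng.random() >= p_drop)
        preds.append(y)
    # Clip to [0, 1] so the Hoeffding-style bound below applies.
    return min(1.0, statistics.stdev(preds))


def window_drift(window, delta=0.002):
    """Return a split index if two sub-window means differ significantly, else None."""
    n = len(window)
    total = sum(window)
    head = 0.0
    for i in range(1, n):
        head += window[i - 1]
        n0, n1 = i, n - i
        m = 1.0 / (1.0 / n0 + 1.0 / n1)
        eps = math.sqrt(math.log(4.0 * n / delta) / (2.0 * m))
        if abs(head / n0 - (total - head) / n1) > eps:
            return i
    return None


def first_drift_index(stream, weights, seed=0):
    """Index of the first input at which drift is flagged, or None."""
    rng = random.Random(seed)
    window = []
    for t, x in enumerate(stream):
        window.append(mc_dropout_uncertainty(x, weights, rng))
        if window_drift(window) is not None:
            return t  # a real system would retrain the model here
    return None


weights = [0.25] * 4
# Inputs become six times larger after step 100, which raises the dropout variance.
stream = [[0.2] * 4] * 100 + [[1.2] * 4] * 100
print(first_drift_index(stream, weights))
```

No true labels appear anywhere in the sketch: the detector only sees how much the model's own stochastic predictions disagree, which is the property the abstract emphasizes.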
【27】 Control of rough terrain vehicles using deep reinforcement learning 标题:基于深度强化学习的崎岖地形车辆控制
作者:Viktor Wiberg,Erik Wallin,Martin Servin,Tomas Nordfjell 机构:Umeå University, Swedish University of Agricultural Sciences 备注:16 pages, 13 figures 链接:https://arxiv.org/abs/2107.01867 摘要:我们探讨了在人工操作和传统控制方法不足的情况下,使用深度强化学习控制地形车辆的可能性。这封信提出了一个控制器,感知,计划,并成功地控制一个16吨的林业车辆与两个框架铰接关节,六个车轮,以及他们的主动铰接悬挂横穿崎岖的地形。精心塑造的奖励信号促进了安全、环保和高效驾驶,这导致了前所未有的驾驶技能的出现。我们在虚拟环境中测试所学的技能,包括用高密度激光扫描森林遗址重建的地形。控制器显示处理障碍物、高达27$^\circ$的斜坡和各种自然地形的能力,所有这些都具有有限的车轮打滑、平稳且直立的行进以及智能使用主动悬架。结果表明,与人工操作或传统的控制方法相比,深度强化学习对具有复杂动力学和高维观测数据的车辆具有增强控制的潜力,特别是在崎岖地形下。 摘要:We explore the potential to control terrain vehicles using deep reinforcement learning in scenarios where human operators and traditional control methods are inadequate. This letter presents a controller that perceives, plans, and successfully controls a 16-tonne forestry vehicle with two frame articulation joints, six wheels, and their actively articulated suspensions to traverse rough terrain. The carefully shaped reward signal promotes safe, environmental, and efficient driving, which leads to the emergence of unprecedented driving skills. We test learned skills in a virtual environment, including terrains reconstructed from high-density laser scans of forest sites. The controller displays the ability to handle obstructing obstacles, slopes up to 27$^\circ$, and a variety of natural terrains, all with limited wheel slip, smooth, and upright traversal with intelligent use of the active suspensions. The results confirm that deep reinforcement learning has the potential to enhance control of vehicles with complex dynamics and high-dimensional observation data compared to human operators or traditional control methods, especially in rough terrain.
【28】 Winning at Any Cost -- Infringing the Cartel Prohibition With Reinforcement Learning 标题:不惜一切代价取胜--强化学习侵犯卡特尔禁令
作者:Michael Schlechtinger,Damaris Kosack,Heiko Paulheim,Thomas Fetzer 机构: University of Mannheim, Chair of Data Science, Mannheim, Germany, University of Mannheim, Chair of Public Law, Regulatory Law and Tax Law 备注:accepted at the 19th International Conference on Practical Applications of Agents and Multi-Agent Systems (PAAMS 2021) 链接:https://arxiv.org/abs/2107.01856 摘要:人工智能越来越多地做出定价决策。由于他们能够在动态决策时利用实时市场数据进行训练,深度强化学习算法在做出此类定价决策时尤其有效。在电子商务场景中,多个强化学习代理可以根据其竞争对手的价格来设定价格。因此,研究表明,从长远来看,代理人可能最终处于合谋状态。为了进一步分析这个问题,我们构建了一个基于囚徒困境的改进版本的场景,其中三个代理玩石头布剪刀的游戏。我们的研究结果表明,行为选择可以被分解为特定的阶段,建立了一种可能性,以发展合谋预防系统,能够识别可能导致竞争对手之间合谋的情况。此外,我们还提供了证据的情况下,代理人有能力执行默契合作战略,没有明确的训练这样做。 摘要:Pricing decisions are increasingly made by AI. Thanks to their ability to train with live market data while making decisions on the fly, deep reinforcement learning algorithms are especially effective in taking such pricing decisions. In e-commerce scenarios, multiple reinforcement learning agents can set prices based on their competitor's prices. Therefore, research states that agents might end up in a state of collusion in the long run. To further analyze this issue, we build a scenario that is based on a modified version of a prisoner's dilemma where three agents play the game of rock paper scissors. Our results indicate that the action selection can be dissected into specific stages, establishing the possibility to develop collusion prevention systems that are able to recognize situations which might lead to a collusion between competitors. We furthermore provide evidence for a situation where agents are capable of performing a tacit cooperation strategy without being explicitly trained to do so.
【29】 Poisoning Attack against Estimating from Pairwise Comparisons 标题:对来自成对比较的估计的毒害攻击
作者:Ke Ma,Qianqian Xu,Jinshan Zeng,Xiaochun Cao,Qingming Huang 机构:State Key Laboratory of Information Security (SKLOIS), Institute of Information Engineering 备注:31 pages 链接:https://arxiv.org/abs/2107.01854 摘要:随着两两排名被广泛应用于选举、体育比赛、推荐等等,攻击者有很强的动机和激励去操纵排名列表。他们可以在训练数据中注入恶意的比较来欺骗受害者。这种技术在回归和分类任务中称为中毒攻击。本文首次系统地研究了成对排序算法中的数据中毒攻击,将其形式化为排序者与攻击者之间的动态和静态博弈,并将其建模为若干整数规划问题。为了突破整数规划问题的计算障碍,我们将其转化为计算上可处理的分布鲁棒优化(DRO)问题。基于这样的DRO公式,我们提出了两种有效的中毒攻击算法,并建立了相应的理论保证。通过一系列的玩具模拟和实际数据实验,验证了所提出的中毒攻击策略的有效性。实验结果表明,所提出的方法可以显著降低ranker的性能,因为真实排名列表与聚合结果之间的相关性可以显著降低。 摘要:As pairwise ranking becomes broadly employed for elections, sports competitions, recommendations, and so on, attackers have strong motivation and incentives to manipulate the ranking list. They could inject malicious comparisons into the training data to fool the victim. Such a technique is called poisoning attack in regression and classification tasks. In this paper, to the best of our knowledge, we initiate the first systematic investigation of data poisoning attacks on pairwise ranking algorithms, which can be formalized as the dynamic and static games between the ranker and the attacker and can be modeled as certain kinds of integer programming problems. To break the computational hurdle of the underlying integer programming problems, we reformulate them into the distributionally robust optimization (DRO) problems, which are computationally tractable. Based on such DRO formulations, we propose two efficient poisoning attack algorithms and establish the associated theoretical guarantees. The effectiveness of the suggested poisoning attack strategies is demonstrated by a series of toy simulations and several real data experiments. These experimental results show that the proposed methods can significantly reduce the performance of the ranker in the sense that the correlation between the true ranking list and the aggregated results can be decreased dramatically.
【30】 GraspME -- Grasp Manifold Estimator 标题:GraspME--抓取流形估计器
作者:Janik Hager,Ruben Bauer,Marc Toussaint,Jim Mainprice 机构:Machine Learning and Robotics Lab, IPVS, University of Stuttgart, Germany; Max Planck Institute for Intelligent Systems (IS-MPI), Tübingen/Stuttgart, Germany; Technische Universität Berlin (TUB), Germany 备注:Accepted to RoMan 2021 链接:https://arxiv.org/abs/2107.01836 摘要:在本文中,我们引入了一个抓取流形估计器(GraspME)来直接检测二维相机图像中物体的抓取可供性。为了自主地执行操作任务,机器人必须对周围物体建立这样的可抓取性模型。抓取流形具有提供连续无限多个抓取的优点,这在使用其他抓取表示(例如预定义的抓取点)时不是这种情况。例如,在运动优化中可以利用这个特性,将目标集定义为机器人配置空间中的隐式曲面约束。在这项工作中,我们限制自己的情况下估计可能的末端效应器位置直接从二维相机图像。为此,我们通过一组关键点来定义抓取流形,并使用Mask R-CNN主干在图像中定位它们。使用学习到的特征可以泛化到不同的视角、具有潜在噪声的图像和不属于训练集的对象。我们只依赖模拟数据,对简单和复杂的物体进行实验,包括未见过的物体。该框架在GPU上的推理速度为11.5fps,关键点估计的平均精度为94.5%,平均像素距离仅为1.29。这表明,通过边界框和分割掩模可以很好地估计出目标,并且可以逼近正确的抓取流形的关键点坐标。 摘要:In this paper, we introduce a Grasp Manifold Estimator (GraspME) to detect grasp affordances for objects directly in 2D camera images. To perform manipulation tasks autonomously it is crucial for robots to have such graspability models of the surrounding objects. Grasp manifolds have the advantage of providing continuously infinitely many grasps, which is not the case when using other grasp representations such as predefined grasp points. For instance, this property can be leveraged in motion optimization to define goal sets as implicit surface constraints in the robot configuration space. In this work, we restrict ourselves to the case of estimating possible end-effector positions directly from 2D camera images. To this end, we define grasp manifolds via a set of key points and locate them in images using a Mask R-CNN backbone. Using learned features allows generalizing to different view angles, with potentially noisy images, and objects that were not part of the training set. We rely on simulation data only and perform experiments on simple and complex objects, including unseen ones. 
Our framework achieves an inference speed of 11.5 fps on a GPU, an average precision for keypoint estimation of 94.5% and a mean pixel distance of only 1.29. This shows that we can estimate the objects very well via bounding boxes and segmentation masks as well as approximate the correct grasp manifold's keypoint coordinates.
【31】 Provable Convergence of Nesterov Accelerated Method for Over-Parameterized Neural Networks 标题:过参数神经网络Nesterov加速法的可证明收敛性
作者:Xin Liu,Zhisong Pan 机构:College of Command and Control, Army Engineering University of PLA, Nanjing, China 备注:9 pages 链接:https://arxiv.org/abs/2107.01832 摘要:尽管深度学习在经验上取得了成功,但对于用一阶优化方法训练的随机初始化神经网络,尽管其损失曲面是非凸的、非光滑的,却能达到零训练损失的原因,仍然缺乏理论上的理解。近年来,在过参数化的情形下,有一些工作试图揭开这一现象的神秘面纱。在这项工作中,我们考虑了一种常用的动量优化算法:Nesterov加速法(NAG),在这方面取得了进一步的进展。分析了具有ReLU激活的两层全连通神经网络NAG的收敛性。具体地说,我们证明了NAG的误差在线性收敛速度$1-\Theta(1/\sqrt{\kappa})$下收敛到零,其中$\kappa>1$由神经网络的初始化和结构决定。与梯度下降的速率$1-\Theta(1/\kappa)$相比,NAG实现了加速。此外,还验证了NAG方法和Heavy-ball方法具有相似的收敛速度。 摘要:Despite the empirical success of deep learning, there is still a lack of theoretical understanding of why a randomly initialized neural network trained by first-order optimization methods is able to achieve zero training loss, even though its landscape is non-convex and non-smooth. Recently, some works have attempted to demystify this phenomenon in the over-parameterized regime. In this work, we make further progress in this area by considering a commonly used momentum optimization algorithm: Nesterov accelerated method (NAG). We analyze the convergence of NAG for a two-layer fully connected neural network with ReLU activation. Specifically, we prove that the error of NAG converges to zero at a linear convergence rate $1-\Theta(1/\sqrt{\kappa})$, where $\kappa > 1$ is determined by the initialization and the architecture of the neural network. Compared to the rate $1-\Theta(1/\kappa)$ of gradient descent, NAG achieves an acceleration. Besides, it also validates that NAG and the Heavy-ball method can achieve a similar convergence rate.
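The gap between the rates $1-\Theta(1/\kappa)$ and $1-\Theta(1/\sqrt{\kappa})$ quoted in 【31】 can be seen numerically on a much simpler problem than the paper's two-layer ReLU network. The sketch below runs gradient descent and NAG on a diagonal strongly convex quadratic whose condition number plays the role of $\kappa$. The quadratic, the step size $1/L$, and the momentum $(\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)$ are textbook choices assumed here for illustration; they are not taken from the paper.

```python
import math


def compare_gd_nag(kappa=100.0, steps=200, dim=5):
    """Run GD and NAG on f(x) = 0.5 * sum(lam_i * x_i^2); return both error norms."""
    L = 1.0
    mu = L / kappa
    # Eigenvalues evenly spaced between mu and L, so the condition number is kappa.
    lams = [mu + (L - mu) * i / (dim - 1) for i in range(dim)]

    def grad(x):
        return [lam * xi for lam, xi in zip(lams, x)]

    eta = 1.0 / L
    beta = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)

    x_gd = [1.0] * dim
    x_nag = [1.0] * dim
    x_prev = [1.0] * dim
    for _ in range(steps):
        # Plain gradient descent step.
        x_gd = [xi - eta * gi for xi, gi in zip(x_gd, grad(x_gd))]
        # NAG: extrapolate with momentum, then take the gradient step at that point.
        y = [xi + beta * (xi - pi) for xi, pi in zip(x_nag, x_prev)]
        x_prev = x_nag
        x_nag = [yi - eta * gi for yi, gi in zip(y, grad(y))]

    def norm(v):
        return math.sqrt(sum(t * t for t in v))

    return norm(x_gd), norm(x_nag)


gd_err, nag_err = compare_gd_nag()
print(f"GD error:  {gd_err:.3e}")
print(f"NAG error: {nag_err:.3e}")
```

With $\kappa = 100$, the slowest mode contracts by a factor of $0.99$ per step under gradient descent but by roughly $0.9$ per step under NAG, so after 200 steps the NAG error is many orders of magnitude smaller.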
【32】 ARM-Net: Adaptive Relation Modeling Network for Structured Data 标题:ARM-Net:结构化数据的自适应关系建模网络
作者:Shaofeng Cai,Kaiping Zheng,Gang Chen,H. V. Jagadish,Beng Chin Ooi,Meihui Zhang 机构:National University of Singapore, Zhejiang University, University of Michigan, Beijing Institute of Technology 备注:14 pages, 11 figures, 5 tables, published as a conference paper in ACM SIGMOD 2020 链接:https://arxiv.org/abs/2107.01830 摘要:关系数据库是存储和查询结构化数据的事实标准,从结构化数据中提取见解需要高级分析。深度神经网络(DNNs)在特定的数据类型(如图像)中取得了超人的预测性能。然而,当应用于结构化数据时,现有的dnn可能不会产生有意义的结果。原因是表中属性值的组合之间存在相关性和依赖性,这些相关性和依赖性不遵循DNN可以轻松模仿的简单相加模式。这种可能的交叉特征的数量是组合的,这使得它们在计算上无法建模。此外,在实际应用中部署学习模型也突出了对可解释性的需要,特别是对于高风险应用,这仍然是DNNs关注的另一个问题。在本文中,我们提出了ARM-Net,一个为结构化数据定制的自适应关系建模网络,以及一个基于ARM-Net的轻量级关系数据分析框架。其核心思想是通过将输入特征转化为指数空间,然后自适应地确定每个交叉特征的交互顺序和交互权重,有选择地、动态地建立具有交叉特征的特征交互模型。我们提出了一种新的稀疏注意机制,在给定输入元组的情况下动态生成交互权重,从而可以显式地对任意阶的交叉特征进行建模,并对噪声特征进行选择性过滤。然后在模型推理过程中,ARM网络可以指定用于每个预测的交叉特征,以获得更高的精度和更好的解释性。我们在真实世界数据集上的大量实验表明,ARM-Net始终优于现有的模型,并为数据驱动的决策提供了更具解释性的预测。 摘要:Relational databases are the de facto standard for storing and querying structured data, and extracting insights from structured data requires advanced analytics. Deep neural networks (DNNs) have achieved super-human prediction performance in particular data types, e.g., images. However, existing DNNs may not produce meaningful results when applied to structured data. The reason is that there are correlations and dependencies across combinations of attribute values in a table, and these do not follow simple additive patterns that can be easily mimicked by a DNN. The number of possible such cross features is combinatorial, making them computationally prohibitive to model. Furthermore, the deployment of learning models in real-world applications has also highlighted the need for interpretability, especially for high-stakes applications, which remains another issue of concern to DNNs. In this paper, we present ARM-Net, an adaptive relation modeling network tailored for structured data, and a lightweight framework ARMOR based on ARM-Net for relational data analytics. 
The key idea is to model feature interactions with cross features selectively and dynamically, by first transforming the input features into exponential space, and then determining the interaction order and interaction weights adaptively for each cross feature. We propose a novel sparse attention mechanism to dynamically generate the interaction weights given the input tuple, so that we can explicitly model cross features of arbitrary orders with noisy features filtered selectively. Then during model inference, ARM-Net can specify the cross features being used for each prediction for higher accuracy and better interpretability. Our extensive experiments on real-world datasets demonstrate that ARM-Net consistently outperforms existing models and provides more interpretable predictions for data-driven decision making.
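The "exponential space" idea in 【32】 can be read as follows: a weighted sum of log-features, once exponentiated, becomes a product of features raised to powers, so the weights set the interaction order and a zero weight drops a feature from the cross feature. The sketch below illustrates this on positive scalar features. Using sparsemax to produce sparse weights is an assumption made here as one well-known sparse attention transform; the abstract does not state which sparse transform ARM-Net uses, and the real model operates on learned embeddings rather than raw scalars.

```python
import math


def sparsemax(scores):
    """Project scores onto the probability simplex; low scores get exactly zero."""
    z = sorted(scores, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for j, zj in enumerate(z, start=1):
        cumsum += zj
        if 1.0 + j * zj > cumsum:
            tau = (cumsum - 1.0) / j  # threshold for the current support size
    return [max(s - tau, 0.0) for s in scores]


def cross_feature(features, weights):
    """exp(sum_i w_i * log x_i) = prod_i x_i ** w_i, for strictly positive x_i."""
    return math.exp(sum(w * math.log(x) for w, x in zip(weights, features)))


features = [4.0, 9.0, 7.0]
weights = sparsemax([1.0, 0.8, -1.0])   # third feature is filtered out
print(weights)
print(cross_feature(features, weights))
print(cross_feature(features, [1.0, 1.0, 0.0]))  # plain second-order cross x1 * x2
```

Because the weights are real-valued, the same formula covers fractional and higher interaction orders, and the non-zero weights can be inspected to see which features contributed to a prediction.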
【33】 A System for Traded Control Teleoperation of Manipulation Tasks using Intent Prediction from Hand Gestures 标题:基于手势意图预测的操作任务交换控制遥操作系统
作者:Yoojin Oh,Marc Toussaint,Jim Mainprice 机构:Machine Learning and Robotics Lab, IPVS, University of Stuttgart, Germany; Max Planck Institute for Intelligent Systems (MPI-IS), Tübingen/Stuttgart, Germany; Technische Universität Berlin (TUB), Germany 备注:Accepted to IEEE-RoMAN 2021 链接:https://arxiv.org/abs/2107.01829 摘要:提出了一种包含机器人感知和手势意图预测的遥操作系统。感知模块识别机器人工作空间中存在的对象,意图预测模块识别用户可能想要抓住的对象。该体系结构允许该方法依赖于交换控制而不是直接控制:我们使用手势来指定连续操作任务的目标对象,然后机器人通过轨迹优化自主生成抓取或取回运动。感知模块依靠基于模型的跟踪器精确跟踪目标的6D姿态,并利用最新的基于学习的目标检测和分割方法,通过自动检测场景中的目标来初始化跟踪器。利用训练好的多层感知器分类器从用户手势中识别目标对象。在介绍了系统的所有组成部分及其经验评估之后,我们给出了将我们的管道与直接交换控制方法(即不使用预测的方法)进行比较的实验结果,这表明使用意图预测可以降低总体任务执行时间。 摘要:This paper presents a teleoperation system that includes robot perception and intent prediction from hand gestures. The perception module identifies the objects present in the robot workspace, and the intent prediction module identifies which object the user likely wants to grasp. This architecture allows the approach to rely on traded control instead of direct control: we use hand gestures to specify the goal objects for a sequential manipulation task, and the robot then autonomously generates a grasping or a retrieving motion using trajectory optimization. The perception module relies on a model-based tracker to precisely track the 6D pose of the objects and makes use of a state-of-the-art learning-based object detection and segmentation method to initialize the tracker by automatically detecting objects in the scene. Goal objects are identified from user hand gestures using a trained multi-layer perceptron classifier. After presenting all the components of the system and their empirical evaluation, we present experimental results comparing our pipeline to a direct traded control approach (i.e., one that does not use prediction), which show that using intent prediction reduces the overall task execution time.
【34】 Sample Efficient Reinforcement Learning via Model-Ensemble Exploration and Exploitation
Authors: Yao Yao, Li Xiao, Zhicheng An, Wanpeng Zhang, Dijun Luo Affiliations: Tsinghua Shenzhen International Graduate School, Tsinghua University Comments: 7 pages, 5 figures, accepted by IEEE International Conference on Robotics and Automation 2021 (IEEE ICRA 2021) Link: https://arxiv.org/abs/2107.01825 Abstract: Model-based deep reinforcement learning has achieved success in various domains that require high sample efficiency, such as Go and robotics. However, some issues remain, such as planning efficient explorations to learn more accurate dynamic models, evaluating the uncertainty of the learned models, and making more rational use of models. To mitigate these issues, we present MEEE, a model-ensemble method that consists of optimistic exploration and weighted exploitation. During exploration, unlike prior methods that directly select the optimal action maximizing the expected accumulative return, our agent first generates a set of action candidates and then seeks out the optimal action that takes both expected return and future observation novelty into account. During exploitation, different discounted weights are assigned to imagined transition tuples according to their model uncertainty, which prevents model prediction errors from propagating in agent training. Experiments on several challenging continuous control benchmark tasks demonstrate that our approach outperforms other model-free and model-based state-of-the-art methods, especially in sample complexity.
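The two steps described in the MEEE abstract (scoring action candidates by return plus novelty, and down-weighting uncertain imagined transitions) can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the additive score with a `beta` coefficient, the use of ensemble standard deviation as the uncertainty measure, and the `1 / (1 + scale * uncertainty)` weight are hypothetical choices, not the formulas from the paper.

```python
import statistics

def select_action(candidates, expected_return, novelty, beta=1.0):
    """Pick the candidate with the best return-plus-novelty score (assumed additive form)."""
    return max(candidates, key=lambda a: expected_return[a] + beta * novelty[a])

def transition_weight(ensemble_predictions, scale=1.0):
    """Down-weight an imagined transition when the ensemble members disagree.

    Uncertainty is taken to be the population standard deviation of the
    ensemble predictions; this is a hypothetical stand-in for the paper's measure.
    """
    uncertainty = statistics.pstdev(ensemble_predictions)
    return 1.0 / (1.0 + scale * uncertainty)

# Toy numbers, invented for the example.
expected_return = {"left": 1.0, "stay": 0.9, "right": 0.2}
novelty = {"left": 0.0, "stay": 0.5, "right": 0.1}

explore_choice = select_action(list(expected_return), expected_return, novelty, beta=1.0)
greedy_choice = select_action(list(expected_return), expected_return, novelty, beta=0.0)
```

With `beta=0.0` the selection reduces to the purely greedy choice that the abstract contrasts against; a positive `beta` lets a slightly lower-return but more novel candidate win.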
【35】 An Explainable AI System for the Diagnosis of High Dimensional Biomedical Data
Authors: Alfred Ultsch, Jörg Hoffmann, Maximilian Röhnert, Malte Von Bonin, Uta Oelschlägel, Cornelia Brendel, Michael C. Thrun Affiliations: Databionics, Mathematics and Computer Science, Philipps-Universität Marburg, Hans-Meerwein-Straße, Marburg; Department of Hematology, Oncology and Immunology, Philipps-University, Baldinger Str., Marburg Comments: 22 pages, 1 figure, 5 tables Link: https://arxiv.org/abs/2107.01820 Abstract: Typical state-of-the-art flow cytometry data samples consist of measurements of more than 100,000 cells in 10 or more features. AI systems are able to diagnose such data with almost the same accuracy as human experts. However, there is one central challenge in such systems: their decisions have far-reaching consequences for the health and life of people, and therefore the decisions of AI systems need to be understandable and justifiable by humans. In this work, we present a novel explainable AI method, called ALPODS, which is able to classify (diagnose) cases based on clusters, i.e., subpopulations, in the high-dimensional data. ALPODS is able to explain its decisions in a form that is understandable for human experts. For the identified subpopulations, fuzzy reasoning rules expressed in the typical language of domain experts are generated. A visualization method based on these rules allows human experts to understand the reasoning used by the AI system. A comparison to a selection of state-of-the-art explainable AI systems shows that ALPODS operates efficiently on known benchmark data and also on everyday routine case data.
【36】 Boosting Transferability of Targeted Adversarial Examples via Hierarchical Generative Networks
Authors: Xiao Yang, Yinpeng Dong, Tianyu Pang, Hang Su, Jun Zhu Affiliations: Dept. of Comp. Sci. and Tech., Institute for AI, BNRist Center, Tsinghua-Bosch Joint ML Center, THBI Lab, Tsinghua University, Beijing, China Link: https://arxiv.org/abs/2107.01809 Abstract: Transfer-based adversarial attacks can effectively evaluate model robustness in the black-box setting. Though several methods have demonstrated impressive transferability of untargeted adversarial examples, targeted adversarial transferability is still challenging. The existing methods either have low targeted transferability or sacrifice computational efficiency. In this paper, we develop a simple yet practical framework to efficiently craft targeted transfer-based adversarial examples. Specifically, we propose a conditional generative attacking model, which can generate adversarial examples targeted at different classes by simply altering the class embedding while sharing a single backbone. Extensive experiments demonstrate that our method improves the success rates of targeted black-box attacks by a significant margin over the existing methods: it reaches an average success rate of 29.6% against six diverse models based only on one substitute white-box model in the standard testing of the NeurIPS 2017 competition, which outperforms the state-of-the-art gradient-based attack methods (with an average success rate of <2%) by a large margin. Moreover, the proposed method is also more than an order of magnitude more efficient than gradient-based methods.
【37】 Why is Pruning at Initialization Immune to Reinitializing and Shuffling?
Authors: Sahib Singh, Rosanne Liu Affiliations: ML Collective, Ford Motor Company Comments: 6 pages, 2 figures Link: https://arxiv.org/abs/2107.01808 Abstract: Recent studies assessing the efficacy of neural network pruning methods uncovered a surprising finding: when conducting ablation studies on existing pruning-at-initialization methods, namely SNIP, GraSP, SynFlow, and magnitude pruning, the performance of these methods remains unchanged and sometimes even improves when randomly shuffling the mask positions within each layer (Layerwise Shuffling) or sampling new initial weight values (Reinit), while keeping pruning masks the same. We attempt to understand the reason behind such network immunity towards weight/mask modifications by studying layer-wise statistics before and after randomization operations. We found that under each of the pruning-at-initialization methods, the distribution of unpruned weights changed minimally with randomization operations.
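The two randomization ablations named in the abstract above, Layerwise Shuffling and Reinit, can be sketched on a single toy layer. This is a minimal illustration under stated assumptions: the layer is a flat list of weights, the mask is a 0/1 list, and the Gaussian initializer with `std=0.1` is a hypothetical choice rather than the initialization used in the paper.

```python
import random

def layerwise_shuffle(mask, rng):
    """Permute the positions of a layer's 0/1 pruning mask; sparsity is unchanged."""
    shuffled = list(mask)
    rng.shuffle(shuffled)
    return shuffled

def reinit(weights, rng, std=0.1):
    """Sample fresh initial weights of the same length (assumed Gaussian init)."""
    return [rng.gauss(0.0, std) for _ in weights]

def apply_mask(weights, mask):
    """Zero out pruned weights; a mask entry of 1 keeps the weight."""
    return [w * m for w, m in zip(weights, mask)]

rng = random.Random(0)
layer_weights = [rng.gauss(0.0, 0.1) for _ in range(8)]
layer_mask = [1, 1, 0, 0, 1, 0, 0, 1]

# Ablation 1: same weights, mask positions shuffled within the layer.
shuffled_mask = layerwise_shuffle(layer_mask, rng)
pruned_after_shuffle = apply_mask(layer_weights, shuffled_mask)

# Ablation 2: same mask, new initial weight values.
new_weights = reinit(layer_weights, rng)
pruned_after_reinit = apply_mask(new_weights, layer_mask)
```

Both operations preserve the per-layer sparsity, which is why comparing layer-wise statistics of the surviving weights before and after them is a natural diagnostic.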
【38】 Continual Contrastive Self-supervised Learning for Image Classification
Authors: Zhiwei Lin, Yongtao Wang, Hongxiang Lin Affiliations: Wangxuan Institute of Computer Technology, Peking University, Beijing, China Link: https://arxiv.org/abs/2107.01776 Abstract: For artificial learning systems, continual learning over time from a stream of data is essential. Studies on supervised continual learning have made great progress, while catastrophic forgetting in unsupervised learning remains unexplored. Among unsupervised learning methods, self-supervised learning shows tremendous potential for learning visual representations at scale without any labeled data. To improve the visual representations of self-supervised learning, larger and more varied data is needed. In the real world, unlabeled data is generated at all times, which is a huge advantage for self-supervised methods. However, in the current paradigm, packing previous data and current data together and training on it again is a waste of time and resources. Thus, a continual self-supervised learning method is badly needed. In this paper, we make the first attempt to implement continual contrastive self-supervised learning by proposing a rehearsal method, which keeps a few exemplars from the previous data. Instead of directly combining saved exemplars with the current data set for training, we leverage self-supervised knowledge distillation to transfer contrastive information among previous data to the current network by mimicking the similarity score distribution inferred by the old network over a set of saved exemplars. Moreover, we build an extra sample queue to help the network distinguish between previous and current data and to prevent mutual interference while learning their own feature representations. Experimental results show that our method performs well on CIFAR100 and ImageNet-Sub. Compared with self-supervised baselines, which learn tasks one by one without any continual learning technique, we improve the image classification top-1 accuracy by 1.60% on CIFAR100 and 2.86% on ImageNet-Sub under the setting of 10 incremental steps.
【39】 Single Model for Influenza Forecasting of Multiple Countries by Multi-task Learning
Authors: Taichi Murayama, Shoko Wakamiya, Eiji Aramaki Affiliations: Nara Institute of Science and Technology (NAIST), Japan Comments: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), 2021 Link: https://arxiv.org/abs/2107.01760 Abstract: The accurate forecasting of infectious epidemic diseases such as influenza is a crucial task undertaken by medical institutions. Although numerous flu forecasting methods and models based mainly on historical flu activity data and online user-generated content have been proposed in previous studies, no flu forecasting model targeting multiple countries using both types of data exists at present. Our paper leverages multi-task learning to tackle the challenge of building one flu forecasting model targeting multiple countries, with each country treated as one task. Also, to develop a flu prediction model with higher performance, we address two issues: finding suitable search queries, which are part of the user-generated content, and leveraging search queries efficiently in model creation. For the first issue, we propose transfer approaches from English to other languages. For the second issue, we propose a novel flu forecasting model that takes advantage of search queries using an attention mechanism and extend the model to a multi-task model for multiple countries' flu forecasts. Experiments on forecasting flu epidemics in five countries demonstrate that our model significantly improves performance over the baselines by leveraging search queries and multi-task learning.
【40】 Learning Delaunay Triangulation using Self-attention and Domain Knowledge
Authors: Jaeseung Lee, Woojin Choi, Jibum Kim Affiliations: Department of Computer Science and Engineering, Incheon National University, Incheon, South Korea Link: https://arxiv.org/abs/2107.01759 Abstract: Delaunay triangulation is a well-known geometric combinatorial optimization problem with various applications. Many algorithms can generate a Delaunay triangulation given an input point set, but most are nontrivial algorithms requiring an understanding of geometry or the performance of additional geometric operations, such as the edge flip. Deep learning has been used to solve various combinatorial optimization problems; however, generating a Delaunay triangulation based on deep learning remains a difficult problem, and very little research has been conducted due to its complexity. In this paper, we propose a novel deep-learning-based approach for learning Delaunay triangulation using a new attention mechanism based on self-attention and domain knowledge. The proposed model is designed to efficiently learn point-to-point relationships using self-attention in the encoder. In the decoder, a new attention score function using domain knowledge is proposed to provide a high penalty when the geometric requirement is not satisfied. The strength of the proposed attention score function lies in its ability to extend to other combinatorial optimization problems involving geometry. When the proposed neural net model is well trained, it is simple and efficient because it automatically predicts the Delaunay triangulation for an input point set without requiring any additional geometric operations. We conduct experiments to demonstrate the effectiveness of the proposed model and conclude that it exhibits better performance compared with other deep-learning-based approaches.
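The abstract above refers to a "geometric requirement" without spelling it out. For a Delaunay triangulation the standard requirement is the empty-circumcircle property: no input point lies strictly inside the circumcircle of any triangle. The sketch below checks that property with the classical determinant test. It is a plain geometric check, not the paper's attention score function, and the helper names and the four sample points are hypothetical.

```python
def orient(a, b, c):
    """Twice the signed area of triangle abc; positive when counterclockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def in_circumcircle(a, b, c, d):
    """True if point d lies strictly inside the circumcircle of triangle abc."""
    if orient(a, b, c) < 0:  # the determinant test assumes counterclockwise order
        b, c = c, b
    ax, ay = a[0] - d[0], a[1] - d[1]
    bx, by = b[0] - d[0], b[1] - d[1]
    cx, cy = c[0] - d[0], c[1] - d[1]
    det = ((ax * ax + ay * ay) * (bx * cy - by * cx)
           - (bx * bx + by * by) * (ax * cy - ay * cx)
           + (cx * cx + cy * cy) * (ax * by - ay * bx))
    return det > 0

def is_delaunay(points, triangles):
    """Check the empty-circumcircle property for every triangle (index triples)."""
    for i, j, k in triangles:
        for idx, p in enumerate(points):
            if idx in (i, j, k):
                continue
            if in_circumcircle(points[i], points[j], points[k], p):
                return False
    return True

# A kite-shaped quadrilateral: the short diagonal gives the Delaunay triangulation.
points = [(0, 0), (4, 0), (2, 1), (2, -1)]
short_diagonal = [(0, 3, 2), (3, 1, 2)]
long_diagonal = [(0, 1, 2), (0, 3, 1)]
```

Swapping the long diagonal for the short one is exactly the edge flip that the abstract mentions as an additional geometric operation in classical algorithms.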
【41】 Improve Agents without Retraining: Parallel Tree Search with Off-Policy Correction
Authors: Assaf Hallak, Gal Dalal, Steven Dalton, Iuri Frosio, Shie Mannor, Gal Chechik Link: https://arxiv.org/abs/2107.01715 Abstract: Tree Search (TS) is crucial to some of the most influential successes in reinforcement learning. Here, we tackle two major challenges with TS that limit its usability: distribution shift and scalability. We first discover and analyze a counter-intuitive phenomenon: action selection through TS and a pre-trained value function often leads to lower performance compared to the original pre-trained agent, even when having access to the exact state and reward in future steps. We show this is due to a distribution shift to areas where value estimates are highly inaccurate and analyze this effect using Extreme Value theory. To overcome this problem, we introduce a novel off-policy correction term that accounts for the mismatch between the pre-trained value and its corresponding TS policy by penalizing under-sampled trajectories. We prove that our correction eliminates the above mismatch and bounds the probability of sub-optimal action selection. Our correction significantly improves pre-trained Rainbow agents without any further training, often more than doubling their scores on Atari games. Next, we address the scalability issue caused by the computational complexity of exhaustive TS, which scales exponentially with the tree depth. We introduce Batch-BFS: a GPU breadth-first search that advances all nodes in each depth of the tree simultaneously. Batch-BFS reduces runtime by two orders of magnitude and, beyond inference, also enables training with TS at depths that were not feasible before. We train DQN agents from scratch using TS and show improvement in several Atari games compared to both the original DQN and the more advanced Rainbow.
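The idea of advancing every node at a given depth together, as the Batch-BFS description above states, can be illustrated on the CPU with a plain Python frontier. This is only a sketch of exhaustive depth-limited breadth-first search: it is not GPU code, it omits the off-policy correction term, and the `step` and `value` callables and the toy chain environment are hypothetical.

```python
def batch_bfs(root, step, value, actions, depth, gamma=0.99):
    """Exhaustive depth-limited tree search that expands one whole depth per iteration.

    Returns the best first action and the best discounted return found under
    each first action. Requires depth >= 1.
    """
    # Each frontier entry: (state, discounted return so far, first action taken).
    frontier = [(root, 0.0, None)]
    for d in range(depth):
        next_frontier = []
        for state, ret, first in frontier:  # a GPU version would batch this loop
            for a in actions:
                nxt, reward = step(state, a)
                next_frontier.append(
                    (nxt, ret + (gamma ** d) * reward, a if first is None else first)
                )
        frontier = next_frontier
    best = {}
    for state, ret, first in frontier:
        total = ret + (gamma ** depth) * value(state)  # bootstrap at the leaves
        if first not in best or total > best[first]:
            best[first] = total
    return max(best, key=best.get), best

# Toy chain environment: reward 1.0 for arriving at state 3.
def chain_step(state, action):
    nxt = state + action
    return nxt, (1.0 if nxt == 3 else 0.0)

best_action, returns_by_action = batch_bfs(
    root=0, step=chain_step, value=lambda s: 0.0, actions=(-1, 1), depth=3, gamma=1.0
)
```

The frontier holds `len(actions) ** depth` leaves, which makes the exponential growth mentioned in the abstract concrete and shows why batching each depth on parallel hardware pays off.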
【42】 Low-Dimensional State and Action Representation Learning with MDP Homomorphism Metrics
Authors: Nicolò Botteghi, Mannes Poel, Beril Sirmacek, Christoph Brune Affiliations: Robotics and Mechatronics, University of Twente, Enschede, The Netherlands, Datamanagement and Biometrics, Department of Smart Cities, Saxion University of Applied Sciences, Applied Analysis Link: https://arxiv.org/abs/2107.01677 Abstract: Deep Reinforcement Learning has shown its ability to solve complicated problems directly from high-dimensional observations. However, in end-to-end settings, Reinforcement Learning algorithms are not sample-efficient and require long training times and large quantities of data. In this work, we propose a framework for sample-efficient Reinforcement Learning that takes advantage of state and action representations to transform a high-dimensional problem into a low-dimensional one. Moreover, we seek to find the optimal policy mapping latent states to latent actions. Because the policy is now learned on abstract representations, we enforce, using auxiliary loss functions, the lifting of such a policy to the original problem domain. Results show that the novel framework can efficiently learn low-dimensional and interpretable state and action representations and the optimal latent policy.
【43】 Low Dimensional State Representation Learning with Robotics Priors in Continuous Action Spaces
Authors: Nicolò Botteghi, Khaled Alaa, Mannes Poel, Beril Sirmacek, Christoph Brune, Abeje Mersha, Stefano Stramigioli Affiliations: Jönköping University Comments: Paper accepted at IROS 2021. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible Link: https://arxiv.org/abs/2107.01667 Abstract: Autonomous robots require high degrees of cognitive and motoric intelligence to come into our everyday life. In non-structured environments and in the presence of uncertainties, such degrees of intelligence are not easy to obtain. Reinforcement learning algorithms have proven to be capable of solving complicated robotics tasks in an end-to-end fashion without any need for hand-crafted features or policies. Especially in the context of robotics, in which the cost of real-world data is usually extremely high, reinforcement learning solutions achieving high sample efficiency are needed. In this paper, we propose a framework combining the learning of a low-dimensional state representation, from high-dimensional observations coming from the robot's raw sensory readings, with the learning of the optimal policy, given the learned state representation. We evaluate our framework in the context of mobile robot navigation with continuous state and action spaces. Moreover, we study the problem of transferring what is learned in the simulated virtual environment to the real robot, without further retraining on real-world data, in the presence of visual and depth distractors such as lighting changes and moving obstacles.
【44】 IITP at WAT 2021: System description for English-Hindi Multimodal Translation Task
Authors: Baban Gain, Dibyanayan Bandyopadhyay, Asif Ekbal Affiliations: Indian Institute of Technology Patna, Patna, India Link: https://arxiv.org/abs/2107.01656 Abstract: Neural Machine Translation (NMT) is the predominant machine translation technology nowadays because of its end-to-end trainable flexibility. However, NMT still struggles to translate properly in low-resource settings, specifically on distant language pairs. One way to overcome this is to use information from other modalities if available. The idea is that despite differences in languages, both the source and target language speakers see the same thing, and the visual representation of both the source and target is the same, which can positively assist the system. Multimodal information can help the NMT system improve the translation by removing ambiguity in some phrases or words. We participate in the 8th Workshop on Asian Translation (WAT 2021) for the English-Hindi multimodal translation task and achieve 42.47 and 37.50 BLEU points on the Evaluation and Challenge subsets, respectively.
【45】 Efficient Explanations for Knowledge Compilation Languages
Authors: Xuanxiang Huang, Yacine Izza, Alexey Ignatiev, Martin C. Cooper, Nicholas Asher, Joao Marques-Silva Affiliations: Université de Toulouse, Toulouse, France, Monash University, Melbourne, Australia, Université Paul Sabatier, IRIT, Toulouse, France, IRIT, CNRS, Toulouse, France Link: https://arxiv.org/abs/2107.01654 Abstract: Knowledge compilation (KC) languages find a growing number of practical uses, including in Constraint Programming (CP) and in Machine Learning (ML). In most applications, one natural question is how to explain the decisions made by models represented by a KC language. This paper shows that for many of the best known KC languages, well-known classes of explanations can be computed in polynomial time. These languages include deterministic decomposable negation normal form (d-DNNF), and therefore any KC language that is strictly less succinct than d-DNNF. Furthermore, the paper also investigates the conditions under which polynomial time computation of explanations can be extended to KC languages more succinct than d-DNNF.
【46】 The Composability of Intermediate Values in Composable Inductive Programming
Authors: Edward McDaid, Sarah McDaid Affiliations: Chief Technology Officer, Zoea Ltd, Head of Digital Comments: 8 pages, 9 figures Link: https://arxiv.org/abs/2107.01621 Abstract: It is believed that mechanisms including intermediate values enable composable inductive programming (CIP) to be used to produce software of any size. We present the results of a study that investigated the relationships between program size, the number of intermediate values and the number of test cases used to specify programs using CIP. In the study, 96,000 programs of various sizes were randomly generated, decomposed into fragments and transformed into test cases. The test cases were then used to regenerate new versions of the original programs using Zoea. The results show linear relationships between the number of intermediate values and regenerated program size, and between the number of test cases and regenerated program size, within the size range studied. In addition, as program size increases there is increasing scope for trading off the number of test cases against the number of intermediate values and vice versa.
【47】 A Typology of Data Anomalies
Authors: Ralph Foorthuis Affiliations: UWV, La Guardiaweg , HG Amsterdam, The Netherlands Comments: 13 pages, 5 figures. Presented at the 17th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU 2018). Note: for a fully developed and more detailed typology of anomalies, see the follow-up publication 'On the Nature and Types of Anomalies: A Review of Deviations in Data'. arXiv admin note: text overlap with arXiv:2007.15634 Link: https://arxiv.org/abs/2107.01615 Abstract: Anomalies are cases that are in some way unusual and do not appear to fit the general patterns present in the dataset. Several conceptualizations exist to distinguish between different types of anomalies. However, these are either too specific to be generally applicable or so abstract that they neither provide concrete insight into the nature of anomaly types nor facilitate the functional evaluation of anomaly detection algorithms. With the recent criticism of 'black box' algorithms and analytics, it has become clear that this is an undesirable situation. This paper therefore introduces a general typology of anomalies that offers a clear and tangible definition of the different types of anomalies in datasets. The typology also facilitates the evaluation of the functional capabilities of anomaly detection algorithms and, as a framework, assists in analyzing the conceptual levels of data, patterns and anomalies. Finally, it serves as an analytical tool for studying anomaly types from other typologies.
【48】 Leveraging Evidential Deep Learning Uncertainties with Graph-based Clustering to Detect Anomalies
Authors: Sandeep Kumar Singh, Jaya Shradha Fowdur, Jakob Gawlikowski, Daniel Medina Affiliations: IEEE Comments: Under submission in a Journal Link: https://arxiv.org/abs/2107.01557 Abstract: Understanding and representing traffic patterns are key to detecting anomalies in the maritime domain. To this end, we propose a novel graph-based traffic representation and association scheme to cluster trajectories of vessels using automatic identification system (AIS) data. We utilize the (un)clustered data to train a recurrent neural network (RNN)-based evidential regression model, which can predict a vessel's trajectory at future timesteps with its corresponding prediction uncertainty. This paper proposes the use of deep learning (DL)-based uncertainty estimation for detecting maritime anomalies, such as unusual vessel maneuvering. Furthermore, we utilize evidential deep learning classifiers to detect unusual turns of vessels and the loss of AIS signal using predicted class probabilities with associated uncertainties. Our experimental results suggest that using graph-based clustered data improves the ability of the DL models to learn the temporal-spatial correlation of data and associated uncertainties. Using different AIS datasets and experiments, we demonstrate that the estimated prediction uncertainty yields fundamental information for the detection of traffic anomalies in the maritime domain and possibly in other domains.
【49】 Incorporating Reachability Knowledge into a Multi-Spatial Graph Convolution Based Seq2Seq Model for Traffic Forecasting
Authors: Jiexia Ye, Furong Zheng, Juanjuan Zhao, Kejiang Ye, Chengzhong Xu Comments: 12 pages, 9 figures Link: https://arxiv.org/abs/2107.01528 Abstract: Accurate traffic state prediction is the foundation of transportation control and guidance. It is very challenging due to the complex spatiotemporal dependencies in traffic data. Existing works do not perform well for multi-step traffic prediction over a long future time period. The dilution of spatiotemporal information becomes severe when the time gap between the input step and the predicted step is large, especially when traffic data is insufficient or noisy. To address this issue, we propose a multi-spatial graph convolution based Seq2Seq model. Our main contributions are threefold: (1) We enrich the spatiotemporal information of model inputs by fusing multi-view features (time, location and traffic states). (2) We build multiple kinds of spatial correlations based on both prior knowledge and data-driven knowledge to improve model performance, especially when data is insufficient or noisy. (3) We design a spatiotemporal attention mechanism based on reachability knowledge that produces high-level features fed directly into the Seq2Seq decoder to ease information dilution. Our model is evaluated on two real-world traffic datasets and achieves better performance than other competitors.
【50】 AdaL: Adaptive Gradient Transformation Contributes to Convergences and Generalizations
Authors: Hongwei Zhang, Weidong Zou, Hongbo Zhao, Qi Ming, Tijin Yan, Yuanqing Xia, Weipeng Cao Affiliations: Beijing Institute of Technology, Shenzhen University Link: https://arxiv.org/abs/2107.01525 Abstract: Adaptive optimization methods have been widely used in deep learning. They scale the learning rates adaptively according to past gradients, which has been shown to be effective in accelerating convergence. However, they suffer from poor generalization performance compared with SGD. Recent studies indicate that smoothing exponential gradient noise degrades generalization. Inspired by this, we propose AdaL, which applies a transformation to the original gradient. AdaL accelerates convergence by amplifying the gradient in the early stage, and dampens oscillation and stabilizes optimization by shrinking the gradient later. This modification reduces the smoothness of the gradient noise, which produces better generalization performance. We have theoretically proved the convergence of AdaL and demonstrated its effectiveness on several benchmarks.
【51】 A Data-Driven Method for Recognizing Automated Negotiation Strategies
Authors: Ming Li, Pradeep K. Murukannaiah, Catholijn M. Jonker Affiliations: Delft University of Technology Comments: 17 pages Link: https://arxiv.org/abs/2107.01496 Abstract: Understanding an opponent agent helps in negotiating with it. Existing works on understanding opponents focus on preference modeling (or estimating the opponent's utility function). An important but largely unexplored direction is recognizing an opponent's negotiation strategy, which captures the opponent's tactics, e.g., to be tough at the beginning but to concede toward the deadline. Recognizing complex, state-of-the-art negotiation strategies is extremely challenging, and simple heuristics may not be adequate for this purpose. We propose a novel data-driven approach for recognizing an opponent's negotiation strategy. Our approach includes a data generation method for an agent to generate domain-independent sequences by negotiating with a variety of opponents across domains, a feature engineering method for representing negotiation data as time series with time-step features and overall features, and a hybrid (recurrent neural network-based) deep learning method for recognizing an opponent's strategy from the time series of bids. We perform extensive experiments, spanning four problem scenarios, to demonstrate the effectiveness of our approach.
【52】 On Positional and Structural Node Features for Graph Neural Networks on Non-attributed Graphs
Authors: Hejie Cui, Zijie Lu, Pan Li, Carl Yang Affiliations: Department of Computer Science, Emory University; Department of Computer Science, University of Illinois at Urbana-Champaign; Department of Computer Science, Purdue University Comments: Accepted to the Sixth International Workshop on Deep Learning on Graphs (DLG-KDD'21), co-located with KDD'21 Link: https://arxiv.org/abs/2107.01495 Abstract: Graph neural networks (GNNs) have been widely used in various graph-related problems such as node classification and graph classification, where the superior performance is mainly established when natural node features are available. However, it is not well understood how GNNs work without natural node features, especially regarding the various ways to construct artificial ones. In this paper, we point out the two types of artificial node features, i.e., positional and structural node features, and provide insights on why each of them is more appropriate for certain tasks, i.e., positional node classification, structural node classification, and graph classification. Extensive experimental results on 10 benchmark datasets validate our insights, thus leading to a practical guideline on the choices between different artificial node features for GNNs on non-attributed graphs. The code is available at https://github.com/zjzijielu/gnn-exp/.
【53】 Development of a Conversation State Recognition System
Authors: Sujay Uday Rittikar Affiliations: DKTE's Textile and Engineering Institute, India Link: https://arxiv.org/abs/2107.01462 Abstract: With the evolution of speaker diarization using LSTMs, it has become relatively easier to identify the speakers for specific segments of an input audio stream than to tag the data manually. Given this, it is highly desirable to consider using the identified speaker identities to aid in recognizing the speaker states in a conversation. In this study, Markov chains are used to identify and update the speaker states for the next conversations between the same set of speakers, to enable identification of their states in the most natural and long conversations. The model is based on several audio samples from natural conversations of three or more speakers in two datasets, with overall total error percentages for recognized states less than or equal to 12%. The findings imply that the proposed extension to speaker diarization is effective in predicting the states of a conversation.
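As a rough illustration of the Markov-chain idea in this abstract, the sketch below estimates first-order transition probabilities from an observed sequence of speaker states and picks the most likely next state. The state labels and function names are hypothetical; the abstract does not specify the paper's state definitions or update rule, so this is only a generic first-order chain.

```python
from collections import Counter, defaultdict


def fit_transitions(states):
    """Estimate first-order transition probabilities from a state sequence."""
    counts = defaultdict(Counter)
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    # Normalize each row of counts into probabilities.
    return {
        current: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for current, row in counts.items()
    }


def most_likely_next(transitions, state):
    """Return the successor state with the highest estimated probability."""
    row = transitions[state]
    return max(row, key=row.get)


# Hypothetical per-segment states for one speaker (labels are assumptions).
observed = ["speaking", "listening", "speaking", "listening", "idle", "speaking"]
transitions = fit_transitions(observed)
print(transitions)
print(most_likely_next(transitions, "speaking"))
```

Re-running `fit_transitions` on the concatenation of old and new conversations is one simple way to update the chain as more data from the same speakers arrives.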
【54】 Solving Infinite-Domain CSPs Using the Patchwork Property
Authors: Konrad K. Dabrowski, Peter Jonsson, Sebastian Ordyniak, George Osipov Affiliations: School of Computing, University of Leeds, UK; Department of Computer and Information Science, Linköpings Universitet, Sweden Comments: 34 pages, 2 figures. Parts of this article appeared in the proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI 2021) Link: https://arxiv.org/abs/2107.01428 Abstract: The constraint satisfaction problem (CSP) has important applications in computer science and AI. In particular, infinite-domain CSPs have been intensively used in subareas of AI such as spatio-temporal reasoning. Since constraint satisfaction is a computationally hard problem, much work has been devoted to identifying restricted problems that are efficiently solvable. One way of doing this is to restrict the interactions of variables and constraints, and a highly successful approach is to bound the treewidth of the underlying primal graph. Bodirsky & Dalmau [J. Comput. System. Sci. 79(1), 2013] and Huang et al. [Artif. Intell. 195, 2013] proved that CSP$(\Gamma)$ can be solved in $n^{f(w)}$ time (where $n$ is the size of the instance, $w$ is the treewidth of the primal graph and $f$ is a computable function) for certain classes of constraint languages $\Gamma$. We improve this bound to $f(w) \cdot n^{O(1)}$, where the function $f$ only depends on the language $\Gamma$, for CSPs whose basic relations have the patchwork property.
Hence, such problems are fixed-parameter tractable and our algorithm is asymptotically faster than the previous ones. Additionally, our approach is not restricted to binary constraints, so it is applicable to a strictly larger class of problems than that of Huang et al. However, there exist natural problems that are covered by Bodirsky & Dalmau's algorithm but not by ours, and we begin investigating ways of generalising our results to larger families of languages. We also analyse our algorithm with respect to its running time and show that it is optimal (under the Exponential Time Hypothesis) for certain languages such as Allen's Interval Algebra.
【55】 Isotonic Data Augmentation for Knowledge Distillation
Authors: Wanyun Cui, Sen Yan Affiliations: Shanghai University of Finance and Economics Comments: 7 pages Link: https://arxiv.org/abs/2107.01412 Abstract: Knowledge distillation uses both real hard labels and soft labels predicted by teacher models as supervision. Intuitively, we expect the soft labels and hard labels to be concordant w.r.t. their orders of probabilities. However, we found critical order violations between hard labels and soft labels in augmented samples. For example, for an augmented sample $x=0.7*panda+0.3*cat$, we expect the order of meaningful soft labels to be $P_\text{soft}(panda|x)>P_\text{soft}(cat|x)>P_\text{soft}(other|x)$. But real soft labels usually violate the order, e.g. $P_\text{soft}(tiger|x)>P_\text{soft}(panda|x)>P_\text{soft}(cat|x)$. We attribute this to the unsatisfactory generalization ability of the teacher, which leads to the prediction error of augmented samples. Empirically, we found the violations are common and injure the knowledge transfer. In this paper, we introduce order restrictions to data augmentation for knowledge distillation, which is denoted as isotonic data augmentation (IDA). We use isotonic regression (IR), a classic technique from statistics, to eliminate the order violations. We show that IDA can be modeled as a tree-structured IR problem. 
We thereby adapt the classical IRT-BIN algorithm for optimal solutions with $O(c \log c)$ time complexity, where $c$ is the number of labels. In order to further reduce the time complexity, we also propose a GPU-friendly approximation with linear time complexity. We have verified on various datasets and data augmentation techniques that our proposed IDA algorithms effectively increase the accuracy of knowledge distillation by eliminating the rank violations.
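To make the order-restriction idea concrete, here is a minimal sketch of isotonic regression on a simple chain order using the pool-adjacent-violators algorithm (PAVA). This is a deliberate simplification: the paper works with a tree-structured order and the IRT-BIN algorithm, neither of which is reproduced here, and the soft-label values below are invented for illustration.

```python
def pava_nonincreasing(values):
    """Least-squares fit of a non-increasing sequence to `values` (PAVA)."""
    blocks = []  # each block stores [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge while the previous block's mean is below the current block's mean,
        # i.e. while the non-increasing order is violated.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)  # every member of a block gets the block mean
    return fitted


# Hypothetical teacher soft labels for x = 0.7*panda + 0.3*cat, listed in the
# order we expect them to be ranked: panda, cat, tiger.
soft = [0.35, 0.20, 0.45]  # tiger > panda > cat violates the expected order
calibrated = pava_nonincreasing(soft)
print(calibrated)
```

Because each pooled block is replaced by its mean, the fitted values keep the original sum, so a probability vector stays a probability vector after the correction.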
【56】 Maximum Entropy Weighted Independent Set Pooling for Graph Neural Networks
Authors: Amirhossein Nouranizadeh, Mohammadjavad Matinkia, Mohammad Rahmati, Reza Safabakhsh Affiliations: Department of Computer Engineering, Amirkabir University of Technology, Tehran, Iran Comments: 21 pages, 12 figures, under review at the 35th Conference on Neural Information Processing Systems (NeurIPS 2021) Link: https://arxiv.org/abs/2107.01410 Abstract: In this paper, we propose a novel pooling layer for graph neural networks based on maximizing the mutual information between the pooled graph and the input graph. Since the maximum mutual information is difficult to compute, we employ the Shannon capacity of a graph as an inductive bias to our pooling method. More precisely, we show that the input graph to the pooling layer can be viewed as a representation of a noisy communication channel. For such a channel, sending the symbols belonging to an independent set of the graph yields a reliable and error-free transmission of information. We show that reaching the maximum mutual information is equivalent to finding a maximum weight independent set of the graph where the weights convey entropy contents. Through this communication theoretic standpoint, we provide a distinct perspective for posing the problem of graph pooling as maximizing the information transmission rate across a noisy communication channel, implemented by a graph neural network. We evaluate our method, referred to as Maximum Entropy Weighted Independent Set Pooling (MEWISPool), on graph classification tasks and the combinatorial optimization problem of the maximum independent set. 
Empirical results demonstrate that our method achieves state-of-the-art and competitive results on graph classification tasks and the maximum independent set problem in several benchmark datasets.
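The combinatorial core mentioned above, a maximum weight independent set, can be illustrated with a simple greedy heuristic. This sketch is not the MEWISPool layer (which is learned by a graph neural network) and the greedy rule is not guaranteed to be optimal; the graph and the node weights, which stand in for entropy scores, are hypothetical.

```python
def greedy_weighted_independent_set(adjacency, weight):
    """Pick nodes in decreasing weight order, skipping neighbors of chosen nodes."""
    chosen = set()
    blocked = set()
    for node in sorted(adjacency, key=lambda n: weight[n], reverse=True):
        if node not in blocked:
            chosen.add(node)
            blocked.add(node)
            blocked.update(adjacency[node])  # neighbors can no longer be chosen
    return chosen


# Hypothetical path graph 0 - 1 - 2 - 3 with made-up per-node weights.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
weight = {0: 1.0, 1: 3.0, 2: 2.5, 3: 2.0}
pooled_nodes = greedy_weighted_independent_set(adjacency, weight)
print(sorted(pooled_nodes))
```

In a pooling context, the selected nodes would become the nodes of the coarsened graph; no two of them are adjacent in the input graph.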
【57】 Where is the Grass Greener? Revisiting Generalized Policy Iteration for Offline Reinforcement Learning
Authors: Lionel Blondé, Alexandros Kalousis Affiliations: University of Geneva, HES-SO, Switzerland Link: https://arxiv.org/abs/2107.01407 Abstract: The performance of state-of-the-art baselines in the offline RL regime varies widely over the spectrum of dataset qualities, ranging from "far-from-optimal" random data to "close-to-optimal" expert demonstrations. We re-implement these under a fair, unified, and highly factorized framework, and show that when a given baseline outperforms its competing counterparts on one end of the spectrum, it never does on the other end. This consistent trend prevents us from naming a victor that outperforms the rest across the board. We attribute the asymmetry in performance between the two ends of the quality spectrum to the amount of inductive bias injected into the agent to entice it to posit that the behavior underlying the offline dataset is optimal for the task. The more bias is injected, the higher the agent performs, provided the dataset is close-to-optimal. Otherwise, its effect is brutally detrimental. Adopting an advantage-weighted regression template as base, we conduct an investigation which corroborates that injections of such optimality inductive bias, when not done parsimoniously, make the agent subpar on the datasets where it was dominant as soon as the offline policy is sub-optimal. 
In an effort to design methods that perform well across the whole spectrum, we revisit the generalized policy iteration scheme for the offline regime, and study the impact of nine distinct newly-introduced proposal distributions over actions, involved in the proposed generalizations of the policy evaluation and policy improvement update rules. We show that certain orchestrations strike the right balance and can improve the performance on one end of the spectrum without harming it on the other end.
【58】 Demiguise Attack: Crafting Invisible Semantic Adversarial Perturbations with Perceptual Similarity
Authors: Yajie Wang, Shangbo Wu, Wenyi Jiang, Shengang Hao, Yu-an Tan, Quanxin Zhang Affiliations: School of Computer Science and Technology, Beijing Institute of Technology; School of Cyberspace Science and Technology, Beijing Institute of Technology; School of Computer Science and Technology, Nanyang Normal University Link: https://arxiv.org/abs/2107.01396 Abstract: Deep neural networks (DNNs) have been found to be vulnerable to adversarial examples. Adversarial examples are malicious images with visually imperceptible perturbations. While these carefully crafted perturbations restricted with tight $L_p$ norm bounds are small, they are still easily perceivable by humans. These perturbations also have limited success rates when attacking black-box models or models with defenses like noise reduction filters. To solve these problems, we propose Demiguise Attack, crafting "unrestricted" perturbations with Perceptual Similarity. Specifically, we can create powerful and photorealistic adversarial examples by manipulating semantic information based on Perceptual Similarity. Adversarial examples we generate are friendly to the human visual system (HVS), although the perturbations are of large magnitudes. We extend widely-used attacks with our approach, enhancing adversarial effectiveness impressively while contributing to imperceptibility. Extensive experiments show that the proposed method not only outperforms various state-of-the-art attacks in terms of fooling rate, transferability, and robustness against defenses but can also improve attacks effectively. 
In addition, we also notice that our implementation can simulate illumination and contrast changes that occur in real-world scenarios, which will contribute to exposing the blind spots of DNNs.
【59】 Memory and attention in deep learning
Authors: Hung Le Affiliations: Deakin University Comments: PhD thesis, submitted in fulfilment of the requirements for the degree of Doctor of Philosophy, August 2019 Link: https://arxiv.org/abs/2107.01390 Abstract: Intelligence necessitates memory. Without memory, humans fail to perform various nontrivial tasks such as reading novels, playing games or solving maths. As the ultimate goal of machine learning is to derive intelligent systems that learn and act automatically just like humans, memory construction for machines is inevitable. Artificial neural networks model neurons and synapses in the brain by interconnecting computational units via weights, which is a typical class of machine learning algorithms that resembles memory structure. Their descendants with more complicated modeling techniques (a.k.a. deep learning) have been successfully applied to many practical problems and demonstrated the importance of memory in the learning process of machinery systems. Recent progress on modeling memory in deep learning has revolved around external memory constructions, which are highly inspired by computational Turing models and biological neuronal systems. Attention mechanisms are derived to support acquisition and retention operations on the external memory. Despite the lack of theoretical foundations, these approaches have shown promise in helping machinery systems reach a higher level of intelligence. The aim of this thesis is to advance the understanding of memory and attention in deep learning. 
Its contributions include: (i) presenting a collection of taxonomies for memory, (ii) constructing new memory-augmented neural networks (MANNs) that support multiple control and memory units, (iii) introducing variability via memory in sequential generative models, (iv) searching for optimal writing operations to maximise the memorisation capacity in slot-based memory networks, and (v) simulating the Universal Turing Machine via Neural Stored-program Memory, a new kind of external memory for neural networks.
【60】 Can Transformers Jump Around Right in Natural Language? Assessing Performance Transfer from SCAN
Authors: Rahma Chaabouni, Roberto Dessì, Eugene Kharitonov Affiliations: Ecole Normale Superieure, Facebook AI, Pompeu Fabra Link: https://arxiv.org/abs/2107.01366 Abstract: Despite their practical success, modern seq2seq architectures are unable to generalize systematically on several SCAN tasks. Hence, it is not clear if SCAN-style compositional generalization is useful in realistic NLP tasks. In this work, we study the benefit that such compositionality brings to several machine translation tasks. We present several focused modifications of Transformer that greatly improve generalization capabilities on SCAN and select one that remains on par with a vanilla Transformer on a standard machine translation (MT) task. Next, we study its performance in low-resource settings and on a newly introduced distribution-shifted English-French translation task. Overall, we find that improvements of a SCAN-capable model do not directly transfer to the resource-rich MT setup. In contrast, in the low-resource setup, general modifications lead to an improvement of up to 13.1% BLEU score w.r.t. a vanilla Transformer. Similarly, an improvement of 14% in an accuracy-based metric is achieved in the introduced compositional English-French translation task. This provides experimental evidence that the compositional generalization assessed in SCAN is particularly useful in resource-starved and domain-shifted scenarios.
【61】 Sensor-invariant Fingerprint ROI Segmentation Using Recurrent Adversarial Learning
Authors: Indu Joshi, Ayush Utkarsh, Riya Kothari, Vinod K Kurmi, Antitza Dantcheva, Sumantra Dutta Roy, Prem Kumar Kalra Affiliations: IIT Delhi, India; Independent Researcher, India; USC, USA; IIT Kanpur, India; Inria Sophia Antipolis, France Link: https://arxiv.org/abs/2107.01361 Abstract: A fingerprint region of interest (ROI) segmentation algorithm is designed to separate the foreground fingerprint from the background noise. All the learning-based state-of-the-art fingerprint ROI segmentation algorithms proposed in the literature are benchmarked on scenarios where both training and testing databases consist of fingerprint images acquired from the same sensors. However, when testing is conducted on a different sensor, the segmentation performance obtained is often unsatisfactory. As a result, every time a new fingerprint sensor is used for testing, the fingerprint ROI segmentation model needs to be re-trained with the fingerprint images acquired from the new sensor and their corresponding manually marked ROIs. Manually marking fingerprint ROIs is expensive because, firstly, it is time-consuming and, more importantly, it requires domain expertise. In order to save the human effort in generating annotations required by state-of-the-art methods, we propose a fingerprint ROI segmentation model which aligns the features of fingerprint images derived from the unseen sensor such that they are similar to the ones obtained from the fingerprints whose ground truth ROI masks are available for training. 
Specifically, we propose a recurrent adversarial learning based feature alignment network that helps the fingerprint ROI segmentation model to learn sensor-invariant features. Consequently, sensor-invariant features learnt by the proposed ROI segmentation model help it to achieve improved segmentation performance on fingerprints acquired from the new sensor. Experiments on publicly available FVC databases demonstrate the efficacy of the proposed work.
【62】 CInC Flow: Characterizable Invertible 3x3 Convolution
Authors: Sandeep Nagar, Marius Dufraisse, Girish Varma Affiliations: Machine Learning Lab, International Institute of Information Technology, Hyderabad, India; Computer Science Dept., École Normale Supérieure (ENS), Paris-Saclay, France Comments: Accepted for the 4th Workshop on Tractable Probabilistic Modeling (UAI 2021) Link: https://arxiv.org/abs/2107.01358 Abstract: Normalizing flows are an essential alternative to GANs for generative modelling, which can be optimized directly on the maximum likelihood of the dataset. They also allow computation of the exact latent vector corresponding to an image since they are composed of invertible transformations. However, the requirement of invertibility of the transformation prevents standard and expressive neural network models such as CNNs from being directly used. Emergent convolutions were proposed to construct an invertible 3$\times$3 CNN layer using a pair of masked CNN layers, making them inefficient. We study conditions such that 3$\times$3 CNNs are invertible, allowing them to construct expressive normalizing flows. We derive necessary and sufficient conditions on a padded CNN for it to be invertible. Our conditions for invertibility are simple and can easily be maintained during the training process. Since we require only a single CNN layer for every effective invertible CNN layer, our approach is more efficient than emergent convolutions. We also propose a coupling method, Quad-coupling. We benchmark our approach and show similar performance results to emergent convolutions while improving the model's efficiency.
【63】 Pool of Experts: Realtime Querying Specialized Knowledge in Massive Neural Networks
Authors: Hakbin Kim, Dong-Wan Choi Affiliations: Inha University, Incheon, South Korea Link: https://arxiv.org/abs/2107.01354 Abstract: In spite of the great success of deep learning technologies, training and delivery of a practically serviceable model is still a highly time-consuming process. Furthermore, a resulting model is usually too generic and heavyweight, and hence essentially goes through another expensive model compression phase to fit in a resource-limited device like embedded systems. Inspired by the fact that a machine learning task specifically requested by mobile users is often much simpler than what a massive generic model supports, this paper proposes a framework, called Pool of Experts (PoE), that instantly builds a lightweight and task-specific model without any training process. For a realtime model querying service, PoE first extracts a pool of primitive components, called experts, from a well-trained and sufficiently generic network by exploiting a novel conditional knowledge distillation method, and then performs our train-free knowledge consolidation to quickly combine necessary experts into a lightweight network for a target task. Thanks to this train-free property, in our thorough empirical study, PoE can build a fairly accurate yet compact model in a realtime manner, whereas it takes a few minutes per query for the other training methods to achieve a similar level of accuracy.
【64】 Spatiotemporal convolutional network for time-series prediction and causal inference
Authors: Hao Peng, Pei Chen, Rui Liu, Luonan Chen Affiliations: School of Mathematics, South China University of Technology, Guangzhou, China; Pazhou Lab, Guangzhou, China; State Key Laboratory of Cell Biology, Shanghai Institute of Biochemistry and Cell Biology Comments: 23 pages, 6 figures Link: https://arxiv.org/abs/2107.01353 Abstract: Making predictions in a robust way is not easy for nonlinear systems. In this work, a neural network computing framework, i.e., a spatiotemporal convolutional network (STCN), was developed to efficiently and accurately render a multistep-ahead prediction of a time series by employing a spatial-temporal information (STI) transformation. The STCN combines the advantages of both the temporal convolutional network (TCN) and the STI equation, which maps the high-dimensional/spatial data to the future temporal values of a target variable, thus naturally providing the prediction of the target variable. From the observed variables, the STCN also infers the causal factors of the target variable in the sense of Granger causality, which are in turn selected as effective spatial information to improve the prediction robustness. The STCN was successfully applied to both benchmark systems and real-world datasets, all of which show superior and robust performance in multistep-ahead prediction, even when the data were perturbed by noise. 
From both theoretical and computational viewpoints, the STCN has great potential in practical applications in artificial intelligence (AI) or machine learning fields as a model-free method based only on the observed data, and also opens a new way to explore the observed high-dimensional data in a dynamical manner for machine learning.
【65】 Split-and-Bridge: Adaptable Class Incremental Learning within a Single Neural Network
Authors: Jong-Yeong Kim, Dong-Wan Choi Affiliations: Department of Computer Science and Engineering, Inha University, South Korea Link: https://arxiv.org/abs/2107.01349 Abstract: Continual learning has been a major problem in the deep learning community, where the main challenge is how to effectively learn a series of newly arriving tasks without forgetting the knowledge of previous tasks. Initiated by Learning without Forgetting (LwF), many of the existing works report that knowledge distillation is effective to preserve the previous knowledge, and hence they commonly use a soft label for the old task, namely a knowledge distillation (KD) loss, together with a class label for the new task, namely a cross entropy (CE) loss, to form a composite loss for a single neural network. However, this approach has difficulty learning knowledge through the CE loss, because the KD loss often influences the objective function more strongly when the two losses compete within a single network. This could be a critical problem particularly in a class incremental scenario, where the knowledge across tasks as well as within the new task, both of which can only be acquired by a CE loss, is essentially learned due to the existence of a unified classifier. In this paper, we propose a novel continual learning method, called Split-and-Bridge, which can successfully address the above problem by partially splitting a neural network into two partitions for training the new task separated from the old task and re-connecting them for learning the knowledge across tasks. 
In our thorough experimental analysis, our Split-and-Bridge method outperforms the state-of-the-art competitors in KD-based continual learning.
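The composite KD-plus-CE objective that this abstract attributes to LwF-style methods can be sketched in a few lines. This is the baseline loss the paper criticizes, not the Split-and-Bridge procedure itself; the logits, the temperature, and the `kd_weight` parameter are hypothetical choices for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs):
    """Cross entropy between a target distribution and a prediction."""
    return -sum(t * math.log(p) for t, p in zip(target_probs, predicted_probs) if t > 0)


def composite_loss(student_logits, old_model_logits, new_label, temperature=2.0, kd_weight=1.0):
    """KD loss on the old-class outputs plus CE loss over the unified classifier."""
    n_old = len(old_model_logits)
    # KD term: the student's old-class outputs should match the old model's soft labels.
    kd = cross_entropy(softmax(old_model_logits, temperature),
                       softmax(student_logits[:n_old], temperature))
    # CE term: hard label for the new task, taken over all (old + new) classes.
    ce = -math.log(softmax(student_logits)[new_label])
    return kd_weight * kd + ce, kd, ce


# Hypothetical example: two old classes, one new class (index 2).
total, kd, ce = composite_loss([2.0, 0.5, 1.0], [2.0, 0.5], new_label=2)
print(total, kd, ce)
```

Both terms are summed into a single scalar for one network, which is where the competition between the KD and CE terms described in the abstract arises.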
【66】 Examining average and discounted reward optimality criteria in reinforcement learning
Authors: Vektor Dewanto, Marcus Gallagher Affiliations: School of Information Technology and Electrical Engineering, University of Queensland, Australia Comments: 14 pages, 3 figures, 10-page main content Link: https://arxiv.org/abs/2107.01348 Abstract: In reinforcement learning (RL), the goal is to obtain an optimal policy, for which the optimality criterion is fundamentally important. Two major optimality criteria are average and discounted rewards, where the latter is typically considered as an approximation to the former. While the discounted reward is more popular, it is problematic to apply in environments that have no natural notion of discounting. This motivates us to revisit a) the progression of optimality criteria in dynamic programming, b) justification for and complication of an artificial discount factor, and c) benefits of directly maximizing the average reward. Our contributions include a thorough examination of the relationship between average and discounted rewards, as well as a discussion of their pros and cons in RL. We emphasize that average-reward RL methods possess the ingredient and mechanism for developing the general discounting-free optimality criterion (Veinott, 1969) in RL.
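The sense in which the discounted reward approximates the average reward can be checked numerically on a toy problem. The sketch below uses a hypothetical deterministic two-step reward cycle (not an example from the paper) and relies on the standard fact that $(1-\gamma)\,v_\gamma$ approaches the average reward as $\gamma \to 1$.

```python
def discounted_value(cycle_rewards, gamma):
    """Discounted return of a reward cycle repeated forever (closed form)."""
    n = len(cycle_rewards)
    one_cycle = sum(gamma ** k * r for k, r in enumerate(cycle_rewards))
    # Each further pass through the cycle is discounted by gamma**n: geometric series.
    return one_cycle / (1.0 - gamma ** n)


def average_reward(cycle_rewards):
    """Long-run average reward per step of the same cycle."""
    return sum(cycle_rewards) / len(cycle_rewards)


rewards = [1.0, 3.0]  # hypothetical rewards received alternately, forever
for gamma in (0.9, 0.99, 0.999):
    print(gamma, (1.0 - gamma) * discounted_value(rewards, gamma))
print("average reward:", average_reward(rewards))
```

For this cycle, $(1-\gamma)\,v_\gamma = (1+3\gamma)/(1+\gamma)$, which tends to 2, the average reward; the gap at any fixed $\gamma < 1$ is the approximation error introduced by discounting.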
【67】 Traffic Signal Control with Communicative Deep Reinforcement Learning Agents: a Case Study 标题:基于通信型深度强化学习智能体的交通信号控制实例研究
作者:Paolo Fazzini,Isaac Wheeler,Francesco Petracchini 机构:Institute of Atmospheric Pollution Research, CNR, Rome, Italy, Department of Chemical Engineering, Brigham Young University, Provo, UT, United States, Address: co CNR-IIA, Via Salaria Km , - , Monterotondo(Rome) 备注:41 pages, 16 figures 链接:https://arxiv.org/abs/2107.01347 摘要:本文从理论和实验两方面分析了多智能体优势-行为-批评(MA2C)和独立优势-行为-批评(IA2C)这两种新近提出的可应用于城市交通信号控制的多智能体强化学习方法。这两种方法在使用本地或全球计算的奖励以及管理代理的通信方面有所不同。我们利用非马尔可夫决策过程提供的框架对这些方法进行了理论分析,为算法分析提供了有益的启示。此外,我们通过在意大利博洛尼亚地区的两个交通区域进行实验,并用SUMO软件进行模拟,分析了这些方法的有效性和鲁棒性。实验结果表明,MA2C在大多数情况下都取得了最好的性能,优于所考虑的替代方法,并且在学习过程中表现出足够的稳定性。 摘要:In this work we theoretically and experimentally analyze Multi-Agent Advantage Actor-Critic (MA2C) and Independent Advantage Actor-Critic (IA2C), two recently proposed multi-agent reinforcement learning methods that can be applied to control traffic signals in urban areas. The two methods differ in their use of a reward calculated locally or globally and in the management of agents' communication. We analyze the methods theoretically with the framework provided by non-Markov decision processes, which provides useful insights in the analysis of the algorithms. Moreover, we analyze the efficacy and the robustness of the methods experimentally by testing them in two traffic areas in the Bologna (Italy) area, simulated by SUMO, a software tool. The experimental results indicate that MA2C achieves the best performance in the majority of cases, outperforms the alternative method considered, and displays sufficient stability during the learning process.
【68】 SHORING: Design Provable Conditional High-Order Interaction Network via Symbolic Testing 标题:支撑:通过符号测试设计可证明的条件高阶交互网络
作者:Hui Li,Xing Fu,Ruofan Wu,Jinyu Xu,Kai Xiao,Xiaofu Chang,Weiqiang Wang,Shuai Chen,Leilei Shi,Tao Xiong,Yuan Qi 机构:Ant Group, Hang Zhou, China 备注:18 pages, 4 figures 链接:https://arxiv.org/abs/2107.01326 摘要:深度学习为以端到端的方式从原始数据中提取有效表示提供了一种很有前途的方法,并已在计算机视觉、自然语言处理等领域证明了其有效性。然而,在内容/产品推荐和风险管理等领域,在事件数据序列是最常用的原始数据形式,而专家衍生的特征更常用的情况下,深度学习模型很难主宰游戏。在本文中,我们提出了一个符号测试框架,有助于回答什么样的专家衍生特征可以通过神经网络学习的问题。受此测试框架的启发,我们引入了一个名为SHORING的高效体系结构,它包含两个组件:\textit{event network}和\textit{sequence network}。通过一个可证明的重参数化技巧,\textit{sequence}网络从\textit{event level}嵌入序列聚合而来,\textit{event}网络可以任意但有效地学习高阶\textit{event level}嵌入。我们认为支撑能够学习标准的多头自我注意网络无法学习的某些标准符号表达,并对四个合成数据集和三个真实数据集进行了综合实验和烧蚀研究。结果表明,支护方法在经验上优于最先进的方法。 摘要:Deep learning provides a promising way to extract effective representations from raw data in an end-to-end fashion and has proven its effectiveness in various domains such as computer vision, natural language processing, etc. However, in domains such as content/product recommendation and risk management, where sequence of event data is the most used raw data form and experts derived features are more commonly used, deep learning models struggle to dominate the game. In this paper, we propose a symbolic testing framework that helps to answer the question of what kinds of expert-derived features could be learned by a neural network. Inspired by this testing framework, we introduce an efficient architecture named SHORING, which contains two components: \textit{event network} and \textit{sequence network}. The \textit{event} network learns arbitrarily yet efficiently high-order \textit{event-level} embeddings via a provable reparameterization trick, the \textit{sequence} network aggregates from sequence of \textit{event-level} embeddings. 
We argue that SHORING is capable of learning certain standard symbolic expressions which the standard multi-head self-attention network fails to learn, and conduct comprehensive experiments and ablation studies on four synthetic datasets and three real-world datasets. The results show that SHORING empirically outperforms the state-of-the-art methods.
【69】 Fair Decision Rules for Binary Classification 标题:二分类的公平决策规则
作者:Connor Lawless,Oktay Gunluk 机构:School of Operations Research and Information Engineering, Cornell University 链接:https://arxiv.org/abs/2107.01325 摘要:近年来,机器学习已经开始在大学招生、信贷和刑事判决等领域实现决策自动化。其中一些应用程序的社会敏感性以及日益增加的监管限制要求采用既公平又可解释的算法。在本文中,我们考虑在析取范式(DNF)中建立布尔规则集的问题,该模型是一个可解释的二元分类模型,服从公平约束。我们将问题描述为一个整数规划,在两种不同的分类奇偶性度量(机会均等和均等化几率)上有明确的约束条件,使分类精度最大化。列生成框架采用了一种新的形式,可以有效地搜索指数级的多个可能规则。当与更快的启发式算法相结合时,我们的方法可以处理大数据集。与其他公平和可解释的分类器相比,我们的方法能够找到满足更严格公平概念的规则集,并且在准确性上有适度的折衷。 摘要:In recent years, machine learning has begun automating decision making in fields as varied as college admissions, credit lending, and criminal sentencing. The socially sensitive nature of some of these applications, together with increasing regulatory constraints, has created the need for algorithms that are both fair and interpretable. In this paper we consider the problem of building Boolean rule sets in disjunctive normal form (DNF), an interpretable model for binary classification, subject to fairness constraints. We formulate the problem as an integer program that maximizes classification accuracy with explicit constraints on two different measures of classification parity: equality of opportunity and equalized odds. A column generation framework with a novel formulation is used to efficiently search over exponentially many possible rules. When combined with faster heuristics, our method can deal with large datasets. Compared to other fair and interpretable classifiers, our method is able to find rule sets that meet stricter notions of fairness with a modest trade-off in accuracy.
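The two classification-parity measures named in the abstract above can be made concrete with a short sketch. It uses the commonly cited definitions (equality of opportunity compares true positive rates across groups; equalized odds compares both true positive and false positive rates), which may differ in detail from the constraints in the paper's integer program. The two-group encoding (0 and 1) and the function names are assumptions for this example.

```python
def group_rates(y_true, y_pred):
    """Return (true positive rate, false positive rate) for one group."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    # A rate is reported as 0.0 when its denominator is empty (a simplification).
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


def fairness_gaps(y_true, y_pred, group):
    """Gaps between group 0 and group 1 under the two parity measures."""
    rates = {}
    for g in (0, 1):
        idx = [i for i, member in enumerate(group) if member == g]
        rates[g] = group_rates([y_true[i] for i in idx], [y_pred[i] for i in idx])
    tpr_gap = abs(rates[0][0] - rates[1][0])
    fpr_gap = abs(rates[0][1] - rates[1][1])
    return {
        "equality_of_opportunity_gap": tpr_gap,
        "equalized_odds_gap": max(tpr_gap, fpr_gap),
    }


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 1, 1, 0, 0]
    y_pred = [1, 0, 0, 0, 1, 1, 1, 1]
    group = [0, 0, 0, 0, 1, 1, 1, 1]
    print(fairness_gaps(y_true, y_pred, group))
```

A classifier satisfies a parity constraint with tolerance epsilon when the corresponding gap is at most epsilon; equalized odds is the stricter of the two because it bounds both rate differences.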
【70】 Non-Comparative Fairness for Human-Auditing and Its Relation to Traditional Fairness Notions 标题:人力审计的非比较公正性及其与传统公正观的关系
作者:Mukund Telukunta,Venkata Sriram Siddhardh Nadendla 机构:Computer Science Department, Missouri University of Science and Technology, Rolla, Missouri 备注:arXiv admin note: substantial text overlap with arXiv:2009.04383 链接:https://arxiv.org/abs/2107.01277 摘要:在基于机器学习的服务(MLS)中,基于传统算法公平性概念(依赖于比较原则)的偏差评估实际上是困难的,因此有必要依赖于人工审计反馈。然而,尽管对各种比较公平的概念进行了严格的训练,但众所周知,人类审核员在实践中对公平概念的各个方面存在分歧,因此很难收集到可靠的反馈。本文提出了一种基于非比较正义原则的新的公平概念,从而实现了向算法公平领域的范式转换。与传统的公平性概念(两个人/群体的结果进行比较)相反,我们提出的概念将MLS的结果与每个输入的期望结果进行比较。这个期望的结果自然地描述了人类审核员的期望,并且可以很容易地用于评估群组审核平台上的MLS。我们表明,任何MLS都可以被认为是公平的,从比较公平的角度来看(无论是在个人公平,统计均等,机会均等或校准方面),如果它是不相对公平的公平审计师。我们还表明,反过来成立的背景下,个人公平。鉴于这种评估依赖于审计师的可信度,我们还提出了一种方法,通过估计审计师对一组给定敏感属性的偏差来确定公平可靠的审计师,并量化给定MLS中偏差估计的不确定性。此外,上述所有结果也在COMPAS、德国信贷和成人人口普查收入数据集上进行了验证。 摘要:Bias evaluation in machine-learning based services (MLS) based on traditional algorithmic fairness notions that rely on comparative principles is practically difficult, making it necessary to rely on human auditor feedback. However, despite rigorous training on various comparative fairness notions, human auditors are known to disagree on various aspects of fairness notions in practice, making it difficult to collect reliable feedback. This paper offers a paradigm shift to the domain of algorithmic fairness by proposing a new fairness notion based on the principle of non-comparative justice. In contrast to traditional fairness notions where the outcomes of two individuals/groups are compared, our proposed notion compares the MLS' outcome with a desired outcome for each input. This desired outcome naturally describes a human auditor's expectation, and can be easily used to evaluate MLS on crowd-auditing platforms. We show that any MLS can be deemed fair from the perspective of comparative fairness (be it in terms of individual fairness, statistical parity, equal opportunity or calibration) if it is non-comparatively fair with respect to a fair auditor. 
We also show that the converse holds true in the context of individual fairness. Given that such an evaluation relies on the trustworthiness of the auditor, we also present an approach to identify fair and reliable auditors by estimating their biases with respect to a given set of sensitive attributes, as well as quantify the uncertainty in the estimation of biases within a given MLS. Furthermore, all of the above results are also validated on COMPAS, German credit and Adult Census Income datasets.
【71】 Designing Machine Learning Pipeline Toolkit for AutoML Surrogate Modeling Optimization 标题:面向AutoML代理建模优化的机器学习流水线工具包设计
作者:Paulito P. Palmes,Akihiro Kishimoto,Radu Marinescu,Parikshit Ram,Elizabeth Daly 机构:IBM Research, Mulhuddart, Dublin , Ireland, Sandy Springs, GA, United States 链接:https://arxiv.org/abs/2107.01253 摘要:机器学习中的流水线优化问题需要流水线结构的同时优化和流水线元素的参数自适应。用一种优雅的方式来表达这些结构可以帮助减少管理和分析它们的性能以及选择不同的优化策略的复杂性。考虑到这些问题,我们创建了amlptoolkit,它可以使用简单的表达式创建和评估复杂的机器学习管道结构。我们使用AMLP来寻找最佳的管道特征,对它们进行数据挖掘,并使用这些数据挖掘的特征来加速学习和预测。我们在AMLP中建立了一个两阶段的管道优化模型,在不到5分钟的AMLP计算时间内,它比其他AutoML方法有更好的性能。 摘要:The pipeline optimization problem in machine learning requires simultaneous optimization of pipeline structures and parameter adaptation of their elements. Having an elegant way to express these structures can help lessen the complexity in the management and analysis of their performances together with the different choices of optimization strategies. With these issues in mind, we created the AMLP toolkit which facilitates the creation and evaluation of complex machine learning pipeline structures using simple expressions. We use AMLP to find optimal pipeline signatures, datamine them, and use these datamined features to speed-up learning and prediction. We formulated a two-stage pipeline optimization with surrogate modeling in AMLP which outperforms other AutoML approaches with a 4-hour time budget in less than 5 minutes of AMLP computation time.
【72】 Data Uncertainty Guided Noise-aware Preprocessing Of Fingerprints 标题:数据不确定性引导的指纹噪声感知预处理
作者:Indu Joshi,Ayush Utkarsh,Riya Kothari,Vinod K Kurmi,Antitza Dantcheva,Sumantra Dutta Roy,Prem Kumar Kalra 机构:IIT Delhi, India, Independent Researcher, India, USC, USA, IIT Kanpur, India, Inria Sophia Antipolis, France 备注:IJCNN 2021 (Accepted) 链接:https://arxiv.org/abs/2107.01248 摘要:基于指纹的认证系统对高质量指纹的有效性早就建立起来了。然而,标准指纹匹配系统在噪声和低质量指纹上的性能却远不能令人满意。为此,我们提出了一种基于数据不确定性的指纹预处理框架,该框架使得最先进的指纹预处理模型能够量化输入图像中的噪声,识别出背景噪声和纹线清晰度较差的指纹区域。噪声的量化对模型有两方面的帮助:一是使目标函数对特定输入指纹中的噪声具有自适应性,从而有助于在噪声和畸变指纹区域实现鲁棒性。其次,它提供了一个噪声方差图来指示输入指纹图像中的噪声像素。预测的噪声方差映射使得最终用户能够理解由于输入图像中存在噪声而导致的错误预测。对13个公开的指纹数据库、不同的体系结构选择和两个指纹处理任务进行了广泛的实验评估,证明了该框架的有效性。 摘要:The effectiveness of fingerprint-based authentication systems on good quality fingerprints was established long ago. However, the performance of standard fingerprint matching systems on noisy and poor quality fingerprints is far from satisfactory. To address this, we propose a data uncertainty-based framework which enables state-of-the-art fingerprint preprocessing models to quantify noise present in the input image and identify fingerprint regions with background noise and poor ridge clarity. Quantification of noise helps the model in two ways: firstly, it makes the objective function adaptive to the noise in a particular input fingerprint and consequently helps to achieve robust performance on noisy and distorted fingerprint regions. Secondly, it provides a noise variance map which indicates noisy pixels in the input fingerprint image. The predicted noise variance map enables end-users to understand erroneous predictions due to noise present in the input image. Extensive experimental evaluation on 13 publicly available fingerprint databases, across different architectural choices and two fingerprint processing tasks, demonstrates the effectiveness of the proposed framework.
【73】 UCSL : A Machine Learning Expectation-Maximization framework for Unsupervised Clustering driven by Supervised Learning 标题:UCSL:一种监督学习驱动的无监督聚类机器学习期望最大化框架
作者:Robin Louiset,Pietro Gori,Benoit Dufumier,Josselin Houenou,Antoine Grigis,Edouard Duchesnay 机构:Université Paris-Saclay, CEA, Neurospin, Gif-sur-Yvette, France, LTCI, Télécom Paris, Institut Polytechnique de Paris, France 链接:https://arxiv.org/abs/2107.01988 摘要:子类型发现是指发现数据集中可解释且一致的子部分,这些子部分也与某个受监督的任务相关。从数学的角度来看,这可以被定义为一个由监督学习驱动的聚类任务,以发现符合监督预测的子群。本文提出了一个通用的期望最大化集成框架UCSL(Unsupervised Clustering driven by Supervised Learning)。我们的方法是通用的,它可以集成任何聚类方法,可以由二元分类和回归驱动。我们建议通过合并多个线性估计器来构造一个非线性模型,每个簇一个线性估计器。对每个超平面进行估计,使其能够正确识别或预测一个簇。我们使用SVC或Logistic回归进行分类,使用SVR进行回归。此外,为了在更合适的空间内进行聚类分析,我们还提出了一种降维算法,将数据投影到与监督任务相关的正交空间中。利用合成数据和实验数据分析了算法的鲁棒性和泛化能力。特别是,我们验证了它的能力,以确定合适的一致的子类型进行精神疾病聚类分析与已知的地面真相标签。在平衡精度方面,所提出的方法比以前最先进的技术的增益约为+1.9点。最后,我们在一个scikit-learn-compatible Python包中提供代码和示例https://github.com/neurospin-projects/2021_rlouiset_ucsl 摘要:Subtype Discovery consists in finding interpretable and consistent sub-parts of a dataset, which are also relevant to a certain supervised task. From a mathematical point of view, this can be defined as a clustering task driven by supervised learning in order to uncover subgroups in line with the supervised prediction. In this paper, we propose a general Expectation-Maximization ensemble framework entitled UCSL (Unsupervised Clustering driven by Supervised Learning). Our method is generic: it can integrate any clustering method and can be driven by both binary classification and regression. We propose to construct a non-linear model by merging multiple linear estimators, one per cluster. Each hyperplane is estimated so that it correctly discriminates, or predicts, only one cluster. We use SVC or Logistic Regression for classification and SVR for regression. Furthermore, to perform cluster analysis within a more suitable space, we also propose a dimension-reduction algorithm that projects the data onto an orthonormal space relevant to the supervised task. 
We analyze the robustness and generalization capability of our algorithm using synthetic and experimental datasets. In particular, we validate its ability to identify suitable consistent subtypes by conducting a psychiatric-disease cluster analysis with known ground-truth labels. The gain of the proposed method over previous state-of-the-art techniques is about +1.9 points in terms of balanced accuracy. Finally, we make code and examples available in a scikit-learn-compatible Python package at https://github.com/neurospin-projects/2021_rlouiset_ucsl
【74】 A convolutional neural network for prestack fracture detection 标题:一种用于叠前裂缝检测的卷积神经网络
作者:Zhenyu Yuan,Yuxin Jiang,Jingjing Li,Handong Huang 链接:https://arxiv.org/abs/2107.01466 摘要:裂缝在油气藏中广泛发育,构成了油气的聚集空间和运移通道。裂缝检测是储层描述的一项基本任务。从叠前地震道集出发,通常采用各向异性分析和反演来表征裂缝的优势方向和相对强度。然而,现有的方法大多是基于垂直定向断裂假设,无法识别裂缝倾角。此外,现有的方法很难或不切实际地获得真实的裂缝密度。基于数据驱动的深度学习,设计了一种用于叠前裂缝检测的卷积神经网络。利用地震响应与裂缝参数之间的关系,通过裂缝有效介质建模和各向异性平面波分析,首次生成了合适的方位数据集。构造了多输入多输出的卷积神经网络,实现了裂缝密度、倾角和走向方位的同时检测。在实际调查中的应用验证了该模型的有效性。 摘要:Fractures are widely developed in hydrocarbon reservoirs and constitute the accumulation spaces and transport channels of oil and gas. Fracture detection is a fundamental task for reservoir characterization. From prestack seismic gathers, anisotropic analysis and inversion are commonly applied to characterize the dominant orientations and relative intensities of fractures. However, existing methods are mostly based on the vertically aligned fracture hypothesis, so they cannot recognize fracture dip. Furthermore, it is difficult or impractical for existing methods to attain the real fracture densities. Based on data-driven deep learning, this paper designs a convolutional neural network to perform prestack fracture detection. Capitalizing on the connections between seismic responses and fracture parameters, a suitable azimuth dataset is first generated through fracture effective medium modeling and anisotropic plane wave analysis. Then a multi-input and multi-output convolutional neural network is constructed to simultaneously detect fracture density, dip and strike azimuth. The application to a practical survey validates the effectiveness of the proposed CNN model.
【75】 QKSA: Quantum Knowledge Seeking Agent 标题:QKSA:量子知识寻求者
作者:Aritra Sarkar 机构:Department of Quantum & Computer Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands 备注:pre-print: motivation, core thesis and baseline framework 链接:https://arxiv.org/abs/2107.01429 摘要:在这篇文章中,我们提出的动机和核心论文对实现量子知识寻求代理(QKSA)。QKSA是一种通用的强化学习代理,可以用来模拟经典和量子动力学。它融合了通用人工智能、构造器理论和遗传编程的思想,构建了一个健壮的通用框架,用于在各种环境中测试agent的能力。它采取人工生命(或,动画)的路径,以人工一般智能人口的智能代理实例化,探索有效的方式建模的看法。智能体的多样性和生存性是由一个资源有限的环境计算模型的可解释性和可预测性所决定的。然后利用这种通用的学习方法,基于主体的主观观察状态,对环境的物理进行建模。给出了一个量子过程层析成像作为一般建模原理的具体实例。本文讨论了各种背景思想和基线形式主义,为目前正在积极开发的QKSA的实现奠定了基础。 摘要:In this article we present the motivation and the core thesis towards the implementation of a Quantum Knowledge Seeking Agent (QKSA). QKSA is a general reinforcement learning agent that can be used to model classical and quantum dynamics. It merges ideas from universal artificial general intelligence, constructor theory and genetic programming to build a robust and general framework for testing the capabilities of the agent in a variety of environments. It takes the artificial life (or, animat) path to artificial general intelligence where a population of intelligent agents are instantiated to explore valid ways of modelling the perceptions. The multiplicity and survivability of the agents are defined by the fitness, with respect to the explainability and predictability, of a resource-bounded computational model of the environment. This general learning approach is then employed to model the physics of an environment based on subjective observer states of the agents. A specific case of quantum process tomography as a general modelling principle is presented. The various background ideas and a baseline formalism are discussed in this article which sets the groundwork for the implementations of the QKSA that are currently in active development.