
Artificial Intelligence Academic Digest [6.18]

Author: 公众号-arXiv每日学术速递 (arXiv Daily Academic Digest, WeChat public account)
Published: 2021-07-02 19:05:09

Visit www.arxivdaily.com for the digest with abstracts, covering CS, physics, mathematics, economics, statistics, finance, biology, and electrical engineering, with search, bookmarking, posting, and more!

cs.AI (Artificial Intelligence): 51 papers in total

【1】 Orthogonal-Padé Activation Functions: Trainable Activation Functions for Smooth and Faster Convergence in Deep Networks

Authors: Koushik Biswas, Shilpak Banerjee, Ashish Kumar Pandey
Note: 11 pages
Link: https://arxiv.org/abs/2106.09693
Abstract: We propose orthogonal-Padé activation functions, a family of trainable activation functions, and show that they learn faster and improve accuracy on standard deep learning datasets and models. Based on our experiments, we found the two best candidates out of six orthogonal-Padé activations, which we call safe Hermite-Padé (HP) activation functions, namely HP-1 and HP-2. Compared to ReLU, HP-1 and HP-2 increase top-1 accuracy by 5.06% and 4.63% respectively in PreActResNet-34 and by 3.02% and 2.75% respectively in MobileNet V2 on the CIFAR100 dataset, while on the CIFAR10 dataset top-1 accuracy increases by 2.02% and 1.78% respectively in PreActResNet-34, by 2.24% and 2.06% respectively in LeNet, and by 2.15% and 2.03% respectively in EfficientNet B0.
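
The abstract does not give the exact parameterization, but the "safe" rational-activation idea can be sketched as a numerator polynomial over a denominator bounded away from zero, so the function can never blow up. The coefficient values and the form of Q below are illustrative assumptions, not the authors' orthogonal-basis construction:

```python
# Minimal sketch of a "safe" Padé-style rational activation:
# f(x) = P(x) / Q(x), with Q(x) kept >= 1 so the denominator can
# never vanish. Coefficients p, q are hypothetical; in the paper
# they would be trainable parameters of the network.

def safe_pade(x, p=(0.0, 1.0, 0.1), q=(0.5, 0.1)):
    """Evaluate P(x)/Q(x) with Q(x) = 1 + |q1*x + q2*x^2|."""
    num = sum(c * x**i for i, c in enumerate(p))                  # P(x) = p0 + p1*x + p2*x^2
    den = 1.0 + abs(sum(c * x**(i + 1) for i, c in enumerate(q))) # Q(x) >= 1 always
    return num / den
```

Because Q(x) >= 1 for all x, the activation stays finite for any input, which is the "safe" property the name suggests.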

【2】 Hi-Phy: A Benchmark for Hierarchical Physical Reasoning

Authors: Cheng Xue, Vimukthini Pinto, Chathura Gamage, Peng Zhang, Jochen Renz
Affiliations: School of Computing, The Australian National University, Canberra, Australia
Link: https://arxiv.org/abs/2106.09692
Abstract: Reasoning about the behaviour of physical objects is a key capability of agents operating in physical worlds. Humans are very experienced in physical reasoning, while it remains a major challenge for AI. To facilitate research addressing this problem, several benchmarks have been proposed recently. However, these benchmarks do not enable us to measure an agent's granular physical reasoning capabilities when solving a complex reasoning task. In this paper, we propose a new benchmark for physical reasoning that allows us to test individual physical reasoning capabilities. Inspired by how humans acquire these capabilities, we propose a general hierarchy of physical reasoning capabilities of increasing complexity. Our benchmark tests capabilities according to this hierarchy through physical reasoning tasks generated in the video game Angry Birds. This benchmark enables a comprehensive agent evaluation by measuring an agent's granular physical reasoning capabilities. We conduct an evaluation with human players, learning agents, and heuristic agents and determine their capabilities. Our evaluation shows that learning agents, despite good local generalization ability, still struggle to learn the underlying physical reasoning capabilities and perform worse than current state-of-the-art heuristic agents and humans. We believe this benchmark will encourage researchers to develop intelligent agents with advanced, human-like physical reasoning capabilities. Code: https://github.com/Cheng-Xue/Hi-Phy

【3】 How Low Can We Go: Trading Memory for Error in Low-Precision Training

Authors: Chengrun Yang, Ziyang Wu, Jerry Chee, Christopher De Sa, Madeleine Udell
Affiliations: Cornell University
Link: https://arxiv.org/abs/2106.09686
Abstract: Low-precision arithmetic trains deep learning models using less energy, less memory, and less time. However, we pay a price for the savings: lower precision may yield larger round-off error and hence larger prediction error. As applications proliferate, users must choose which precision to use to train a new model, and chip manufacturers must decide which precisions to manufacture. We view these precision choices as a hyperparameter tuning problem and borrow ideas from meta-learning to learn the tradeoff between memory and error. In this paper, we introduce Pareto Estimation to Pick the Perfect Precision (PEPPP). We use matrix factorization to find non-dominated configurations (the Pareto frontier) with a limited number of network evaluations. For any given memory budget, the precision that minimizes error is a point on this frontier. Practitioners can use the frontier to trade memory for error and choose the best precision for their goals.
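
The frontier-selection step that PEPPP relies on can be sketched directly: given measured (memory, error) pairs for candidate precision configurations, keep only the non-dominated ones. The matrix-factorization part of the paper, which estimates unmeasured configurations, is not shown here:

```python
# Sketch of Pareto-frontier extraction over (memory, error) pairs:
# a configuration is kept only if no other configuration uses less
# (or equal) memory AND achieves strictly lower error.

def pareto_frontier(configs):
    """configs: list of (memory, error) tuples.
    Returns the non-dominated subset, sorted by memory."""
    frontier = []
    best_err = float("inf")
    for mem, err in sorted(configs):   # scan by increasing memory
        if err < best_err:             # strictly lower error -> non-dominated
            frontier.append((mem, err))
            best_err = err
    return frontier
```

For a fixed memory budget, the best precision corresponds to the last frontier point whose memory fits the budget.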

【4】 LoRA: Low-Rank Adaptation of Large Language Models

Authors: Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Weizhu Chen
Affiliations: Microsoft Corporation
Link: https://arxiv.org/abs/2106.09685
Abstract: The dominant paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, conventional fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example, deploying many independent instances of fine-tuned models, each with 175B parameters, is extremely expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank-decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. For GPT-3, LoRA can reduce the number of trainable parameters by 10,000 times and the computational hardware requirement by 3 times compared to full fine-tuning. LoRA performs on par with or better than fine-tuning in model quality on both GPT-3 and GPT-2, despite having fewer trainable parameters, a higher training throughput, and no additional inference latency. We also provide an empirical investigation into rank-deficiency in language model adaptation, which sheds light on the efficacy of LoRA. We release our implementation for GPT-2 at https://github.com/microsoft/LoRA .
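
The core idea admits a compact sketch: the frozen weight is augmented by a trainable low-rank product scaled by alpha/r, and initializing B to zero makes the adapted layer start out identical to the pretrained one. This is a minimal dependency-free illustration, not the released implementation:

```python
# Sketch of a LoRA-style forward pass for one linear layer:
# h = W0 @ x + (alpha / r) * B @ (A @ x), where W0 is frozen and
# only the rank-r factors A (r x d_in) and B (d_out x r) train.

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(W0, A, B, x, alpha=1.0):
    r = len(A)                          # rank of the update
    base = matvec(W0, x)                # frozen pretrained path
    delta = matvec(B, matvec(A, x))     # trainable low-rank path
    return [b + (alpha / r) * d for b, d in zip(base, delta)]
```

With B initialized to all zeros, the delta path contributes nothing, so training starts from the pretrained model's behavior; at deployment, B @ A can be merged into W0, which is why LoRA adds no inference latency.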

【5】 SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies

Authors: Linxi Fan, Guanzhi Wang, De-An Huang, Zhiding Yu, Li Fei-Fei, Yuke Zhu, Anima Anandkumar
Affiliations: The University of Texas at Austin; California Institute of Technology
Note: ICML 2021. Website: this https URL
Link: https://arxiv.org/abs/2106.09678
Abstract: Generalization has been a long-standing challenge for reinforcement learning (RL). Visual RL, in particular, can be easily distracted by irrelevant factors in its high-dimensional observation space. In this work, we consider robust policy learning, which targets zero-shot generalization to unseen visual environments with large distributional shift. We propose SECANT, a novel self-expert cloning technique that leverages image augmentation in two stages to decouple robust representation learning from policy optimization. Specifically, an expert policy is first trained by RL from scratch with weak augmentations. A student network then learns to mimic the expert policy by supervised learning with strong augmentations, making its representation more robust against visual variations compared to the expert. Extensive experiments demonstrate that SECANT significantly advances the state of the art in zero-shot generalization across 4 challenging domains. Our average reward improvements over prior SOTAs are: DeepMind Control (+26.5%), robotic manipulation (+337.8%), vision-based autonomous driving (+47.7%), and indoor object navigation (+15.8%). Code and videos are available at https://linxifan.github.io/secant-site/.

【6】 Adaptive Low-Rank Regularization with Damping Sequences to Restrict Lazy Weights in Deep Networks

Authors: Mohammad Mahdi Bejani, Mehdi Ghatee
Affiliations: Department of Mathematics and Computer Science, Amirkabir University of Technology
Note: Preprint of a paper submitted to Neural Networks; 27 pages, 4 tables, and 6 figures. arXiv admin note: text overlap with arXiv:2005.01995
Link: https://arxiv.org/abs/2106.09677
Abstract: Overfitting is one of the critical problems in deep neural networks. Many regularization schemes try to prevent overfitting blindly, but they decrease the convergence speed of training algorithms. Adaptive regularization schemes can address overfitting more intelligently; they usually do not affect the entire network's weights. This paper detects a subset of the weighting layers that cause overfitting, where overfitting is recognized via matrix and tensor condition numbers. An adaptive regularization scheme, entitled Adaptive Low-Rank (ALR), is proposed that converges a subset of the weighting layers to their Low-Rank Factorization (LRF). It does so by minimizing a new Tikhonov-based loss function. ALR also encourages lazy weights to contribute to the regularization as epochs grow, using a damping sequence to increase the layer-selection likelihood in the last generations. Thus, before the training accuracy falls, ALR reduces the lazy weights and regularizes the network substantially. Experimental results show that ALR regularizes deep networks well, with high training speed and low resource usage.

【7】 An Attract-Repel Decomposition of Undirected Networks

Authors: Alexander Peysakhovich, Leon Bottou
Affiliations: Facebook AI Research
Link: https://arxiv.org/abs/2106.09671
Abstract: Dot-product latent space embedding is a common form of representation learning in undirected graphs (e.g. social networks, co-occurrence networks). We show that such models have problems dealing with "intransitive" situations where A is linked to B, B is linked to C, but A is not linked to C. Such situations occur in social networks when opposites attract (heterophily) and in co-occurrence networks when there are substitute nodes (e.g. the presence of Pepsi or Coke, but rarely both, in otherwise similar purchase baskets). We present a simple expansion which we call the attract-repel (AR) decomposition: a set of latent attributes on which similar nodes attract and another set of latent attributes on which similar nodes repel. We demonstrate the AR decomposition in real social networks and show that it can be used to measure the amount of latent homophily and heterophily. In addition, it can be applied to co-occurrence networks to discover roles in teams and to find substitutable ingredients in recipes.
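
A minimal sketch of the decomposition: each node gets attract and repel coordinates, and link affinity is the attract dot product minus the repel dot product. The toy one-dimensional embeddings below are hypothetical, chosen to show the intransitive triangle (A-B and B-C score high, A-C scores low) that a plain dot-product model cannot express:

```python
# Attract-repel (AR) scoring sketch: nodes that are similar on the
# repel coordinates are pushed apart even when their attract
# coordinates match.

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def ar_score(node_i, node_j):
    """node = (attract_vec, repel_vec); higher score = more likely link."""
    (a_i, r_i), (a_j, r_j) = node_i, node_j
    return dot(a_i, a_j) - dot(r_i, r_j)

# Hypothetical embeddings: A and C share a strong repel coordinate
# (substitutes), while B is neutral on it, so A-B and B-C link but A-C does not.
A = ([1.0], [1.0])
B = ([1.0], [0.0])
C = ([1.0], [0.9])
```

Here ar_score(A, B) and ar_score(B, C) are both 1.0, while ar_score(A, C) is near zero, which is exactly the intransitive pattern the abstract describes.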

【8】 Prototypical Graph Contrastive Learning

Authors: Shuai Lin, Pan Zhou, Zi-Yuan Hu, Shuojia Wang, Ruihui Zhao, Yefeng Zheng, Liang Lin, Eric Xing, Xiaodan Liang
Affiliations: Sun Yat-sen University; Sea AI Lab; Tencent Jarvis Lab; DarkMatter AI Research; Carnegie Mellon University
Link: https://arxiv.org/abs/2106.09645
Abstract: Graph-level representations are critical in various real-world applications, such as predicting the properties of molecules. But in practice, precise graph annotations are generally very expensive and time-consuming. To address this issue, graph contrastive learning constructs an instance discrimination task that pulls together positive pairs (augmentation pairs of the same graph) and pushes away negative pairs (augmentation pairs of different graphs) for unsupervised representation learning. However, since for a query its negatives are uniformly sampled from all graphs, existing methods suffer from a critical sampling bias issue: the negatives likely share the query's semantic structure, leading to performance degradation. To mitigate this sampling bias, in this paper we propose a Prototypical Graph Contrastive Learning (PGCL) approach. Specifically, PGCL models the underlying semantic structure of the graph data by clustering semantically similar graphs into the same group, while encouraging clustering consistency across different augmentations of the same graph. Then, given a query, it performs negative sampling by drawing graphs from clusters that differ from the query's cluster, which ensures the semantic difference between the query and its negative samples. Moreover, PGCL reweights a query's negative samples based on the distance between their prototypes (cluster centroids) and the query prototype, such that negatives with moderate prototype distance enjoy relatively large weights. This reweighting strategy is shown to be more effective than uniform sampling. Experimental results on various graph benchmarks attest to the advantages of our PGCL over state-of-the-art methods.

【9】 MetaBalance: High-Performance Neural Networks for Class-Imbalanced Data

Authors: Arpit Bansal, Micah Goldblum, Valeriia Cherepanova, Avi Schwarzschild, C. Bayan Bruss, Tom Goldstein
Affiliations: University of Maryland, College Park; Capital One
Link: https://arxiv.org/abs/2106.09643
Abstract: Class-imbalanced data, in which some classes contain far more samples than others, is ubiquitous in real-world applications. Standard techniques for handling class imbalance usually work by training on a re-weighted loss or on re-balanced data. Unfortunately, training overparameterized neural networks on such objectives causes rapid memorization of minority class data. To avoid this trap, we harness meta-learning, which uses both an "outer-loop" and an "inner-loop" loss, each of which may be balanced using different strategies. We evaluate our method, MetaBalance, on image classification, credit-card fraud detection, loan default prediction, and facial recognition tasks with severely imbalanced data, and we find that MetaBalance outperforms a wide array of popular re-sampling strategies.
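
The abstract does not spell out the balancing strategies, but one standard ingredient it alludes to, weighting a loss by inverse class frequency, can be sketched as follows. This is only the reweighting piece, not the bilevel inner/outer meta-learning loop itself:

```python
# Sketch of inverse-frequency class reweighting, the kind of
# strategy that could be applied to one of the two losses
# (e.g. the outer loop) while the other stays unweighted.

from collections import Counter

def class_weights(labels):
    """Weight each class inversely to its frequency, normalized so
    the average weight across classes is 1."""
    counts = Counter(labels)
    inv = {c: 1.0 / n for c, n in counts.items()}
    scale = len(inv) / sum(inv.values())
    return {c: w * scale for c, w in inv.items()}

def balanced_loss(per_sample_losses, labels):
    w = class_weights(labels)
    return sum(w[y] * l for l, y in zip(per_sample_losses, labels)) / len(labels)
```

With labels [0, 0, 0, 1], the minority class 1 receives three times the weight of class 0, so its few samples count as much in the objective as the majority's many.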

【10】 Learning Knowledge Graph-based World Models of Textual Environments

Authors: Prithviraj Ammanabrolu, Mark O. Riedl
Affiliations: School of Interactive Computing, Georgia Institute of Technology
Note: Preprint; under review
Link: https://arxiv.org/abs/2106.09608
Abstract: World models improve a learning agent's ability to operate efficiently in interactive and situated environments. This work focuses on the task of building world models of text-based game environments. Text-based games, or interactive narratives, are reinforcement learning environments in which agents perceive and interact with the world using textual natural language. These environments contain long, multi-step puzzles or quests woven through a world that is filled with hundreds of characters, locations, and objects. Our world model learns simultaneously to: (1) predict changes in the world caused by an agent's actions when representing the world as a knowledge graph; and (2) generate the set of contextually relevant natural language actions required to operate in the world. We frame this task as a Set-of-Sequences generation problem by exploiting the inherent structure of knowledge graphs and actions, and introduce both a transformer-based multi-task architecture and a loss function to train it. A zero-shot ablation study on never-before-seen textual worlds shows that our methodology significantly outperforms existing textual world modeling techniques and demonstrates the importance of each of our contributions.

【11】 Modeling Worlds in Text

Authors: Prithviraj Ammanabrolu, Mark O. Riedl
Affiliations: School of Interactive Computing, Georgia Institute of Technology
Note: Preprint; under review. Benchmark can be found at this https URL
Link: https://arxiv.org/abs/2106.09578
Abstract: We provide a dataset that enables the creation of learning agents that can build knowledge graph-based world models of interactive narratives. Interactive narratives, or text-adventure games, are partially observable environments structured as long puzzles or quests in which an agent perceives and interacts with the world purely through textual natural language. Each individual game typically contains hundreds of locations, characters, and objects, each with its own unique description, providing an opportunity to study the problem of giving language-based agents the structured memory necessary to operate in such worlds. Our dataset provides 24198 mappings between rich natural language observations and: (1) knowledge graphs that reflect the world state in the form of a map; (2) natural language actions that are guaranteed to cause a change in that particular world state. The training data is collected across 27 games in multiple genres, and the test set contains a further 7836 held-out instances over 9 additional games. We also provide baseline models using rules-based, question-answering, and sequence learning approaches, in addition to an analysis of the data and corresponding learning tasks.

【12】 Knowledge Distillation from Multi-Modal to Mono-Modal Segmentation Networks

Authors: Minhao Hu, Matthis Maillard, Ya Zhang, Tommaso Ciceri, Giammarco La Barbera, Isabelle Bloch, Pietro Gori
Affiliations: CMIC, Shanghai Jiao Tong University, Shanghai, China; LTCI, Télécom Paris, Institut Polytechnique de Paris, France
Link: https://arxiv.org/abs/2106.09564
Abstract: The joint use of multiple imaging modalities for medical image segmentation has been widely studied in recent years. The fusion of information from different modalities has been demonstrated to improve segmentation accuracy, with respect to mono-modal segmentation, in several applications. However, acquiring multiple modalities is usually not possible in a clinical setting, due to a limited number of physicians and scanners and to limits on cost and scan time; most of the time, only one modality is acquired. In this paper, we propose KD-Net, a framework to transfer knowledge from a trained multi-modal network (teacher) to a mono-modal one (student). The proposed method is an adaptation of the generalized distillation framework, where the student network is trained on a subset (1 modality) of the teacher's inputs (n modalities). We illustrate the effectiveness of the proposed framework in brain tumor segmentation with the BraTS 2018 dataset. Using different architectures, we show that the student network effectively learns from the teacher and always outperforms the baseline mono-modal network in terms of segmentation accuracy.
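
The student-mimics-teacher objective in distillation setups is typically a temperature-softened divergence between the two networks' output distributions. The sketch below shows that standard KD term only; it is not KD-Net's full loss, and the temperature value is an illustrative default:

```python
# Sketch of a standard knowledge-distillation term: KL divergence
# between temperature-softened teacher and student distributions.
# The teacher sees n modalities; the student sees only one.

import math

def softmax(logits, T=1.0):
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-T softened outputs."""
    p = softmax(teacher_logits, T)   # soft targets from the teacher
    q = softmax(student_logits, T)   # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

The loss is zero exactly when the student reproduces the teacher's soft output, and positive otherwise; in practice it is combined with a supervised segmentation loss on the ground-truth labels.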

【13】 Exploring Deterministic Frequency Deviations with Explainable AI

Authors: Johannes Kruse, Benjamin Schäfer, Dirk Witthaut
Affiliations: Institute for Energy and Climate Research (IEK-STE), Forschungszentrum Jülich, Germany; Institute for Theoretical Physics, University of Cologne, Köln, Germany
Note: 7 pages, 4 figures
Link: https://arxiv.org/abs/2106.09538
Abstract: Deterministic frequency deviations (DFDs) critically affect power grid frequency quality and power system stability. A better understanding of these events is urgently needed, as frequency deviations have been growing in the European grid in recent years. DFDs are partially explained by the rapid adjustment of power generation following the intervals of electricity trading, but this intuitive picture fails especially before and around noon. In this article, we provide a detailed analysis of DFDs and their relation to external features using methods from explainable artificial intelligence. We establish a machine learning model that describes the daily cycle of DFDs well and elucidate key interdependencies using SHapley Additive exPlanations (SHAP). Thereby, we identify solar ramps as critical to explaining patterns in the Rate of Change of Frequency (RoCoF).
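
SHAP approximates Shapley values; for a toy model with only a few features they can be computed exactly by averaging each feature's marginal contribution over all feature orderings. The model and baseline below are illustrative stand-ins, not the paper's grid-frequency model:

```python
# Sketch of the exact Shapley values that SHAP approximates:
# phi_i = average over all feature orderings of the change in model
# output when feature i is revealed (switched from baseline to x).
# Only feasible for tiny feature counts, since it enumerates n! orders.

import math
from itertools import permutations

def shapley_values(model, x, baseline):
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        features = list(baseline)         # start from the baseline input
        prev = model(features)
        for i in order:                   # reveal features one at a time
            features[i] = x[i]
            cur = model(features)
            phi[i] += cur - prev          # marginal contribution of feature i
            prev = cur
    return [p / math.factorial(n) for p in phi]
```

For an additive model the Shapley values recover each feature's coefficient times its deviation from baseline, and they always sum to model(x) minus model(baseline) (the efficiency property).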

【14】 Exploring the Properties and Evolution of Neural Network Eigenspaces during Training

Authors: Mats L. Richter, Leila Malihi, Anne-Kathrin Patricia Windler, Ulf Krumnack
Affiliations: Department of Cognitive Science, University of Osnabrück, Germany
Link: https://arxiv.org/abs/2106.09526
Abstract: In this work we explore the information processing inside neural networks using logistic regression probes and the saturation metric. We show that problem difficulty and neural network capacity affect predictive performance in an antagonistic manner, opening up the possibility of detecting over- and under-parameterization of neural networks for a given task. We further show that the observed effects are independent of previously reported pathological patterns, such as the "tail pattern" described in prior work on feature-space saturation. Finally, we show that saturation patterns converge early during training, allowing for a quicker cycle time during analysis.

【15】 Modelling Resource Allocation in Uncertain System Environments through Deep Reinforcement Learning

Authors: Neel Gandhi, Shakti Mishra
Affiliations: School of Technology, Pandit Deendayal Energy University, Gandhinagar, Gujarat, India
Note: Accepted at IRMAS'21
Link: https://arxiv.org/abs/2106.09461
Abstract: Reinforcement learning has applications in mechatronics, robotics, and other resource-constrained control systems. The problem of resource allocation is primarily solved using traditional predefined techniques and modern deep learning methods. The drawback of predefined and most deep learning methods for resource allocation is that they fail to meet requirements when the system environment is uncertain. Deep reinforcement learning allows us to approach resource allocation in an uncertain system environment while satisfying given criteria, and it can adapt to a new uncertain environment over a prolonged period of time. The paper provides a detailed comparative analysis of various deep reinforcement learning methods, modifying the architecture with noisy layers, prioritized replay, bagging, duelling networks, and related combinations to obtain improvements in performance and reductions in computational cost. The paper finds that resource allocation in an uncertain environment can be effectively solved using a Noisy Bagging duelling double deep Q-network, achieving an efficiency of 97.7% by maximizing reward with significant exploration in the given simulated environment.

【16】 Conference Proceedings KI4Industry AI for SMEs -- the Online Congress for Practical Entry into AI for SMEs

Authors: Matthias Feiner, Manuel Schoellhorn
Note: 56 pages, in German. Editors: Matthias Feiner and Manuel Schoellhorn
Link: https://arxiv.org/abs/2106.09455
Abstract: The Institute of Materials and Processes (IMP) of the University of Applied Sciences in Karlsruhe, Germany, in cooperation with VDI Verein Deutscher Ingenieure e.V., the AEN Automotive Engineering Network, and their cooperation partners, presents its competences in AI-based solution approaches in the production engineering field. The online congress KI 4 Industry, held on November 12 and 13, 2020, showed what opportunities the use of artificial intelligence offers for medium-sized manufacturing companies (SMEs) and where potential fields of application lie. The main purpose of KI 4 Industry is to increase the transfer of knowledge, research, and technology from universities to small and medium-sized enterprises, to demystify the term AI, and to encourage companies to use AI-based solutions in their own value chain or in their products.

【17】 Multi-Agent Training beyond Zero-Sum with Correlated Equilibrium Meta-Solvers

Authors: Luke Marris, Paul Muller, Marc Lanctot, Karl Tuyls, Thore Graepel
Affiliations: DeepMind; University College London; Université Gustave Eiffel
Note: ICML 2021, 9 pages; coded implementation available at this https URL (jpsro.py in examples)
Link: https://arxiv.org/abs/2106.09435
Abstract: Two-player, constant-sum games are well studied in the literature, but there has been limited progress outside of this setting. We propose Joint Policy-Space Response Oracles (JPSRO), an algorithm for training agents in n-player, general-sum extensive-form games, which provably converges to an equilibrium. We further suggest correlated equilibria (CE) as promising meta-solvers, and propose a novel solution concept, Maximum Gini Correlated Equilibrium (MGCE), a principled and computationally efficient family of solutions for solving the correlated equilibrium selection problem. We conduct several experiments using CE meta-solvers for JPSRO and demonstrate convergence on n-player, general-sum games.

【18】 Towards Heterogeneous Clients with Elastic Federated Learning

Authors: Zichen Ma, Yu Lu, Zihan Lu, Wenye Li, Jinfeng Yi, Shuguang Cui
Affiliations: The Chinese University of Hong Kong, Shenzhen; JD AI Lab; Ping An Technology
Note: Under review
Link: https://arxiv.org/abs/2106.09433
Abstract: Federated learning involves training machine learning models over devices or data silos, such as edge processors or data warehouses, while keeping the data local. Training in heterogeneous and potentially massive networks introduces bias into the system, originating from the non-IID data and the low participation rates seen in practice. In this paper, we propose Elastic Federated Learning (EFL), an unbiased algorithm that tackles the heterogeneity in the system, makes the most informative parameters less volatile during training, and utilizes incomplete local updates. It is an efficient and effective algorithm that compresses both upstream and downstream communication. Theoretically, the algorithm has a convergence guarantee when training on non-IID data at a low participation rate. Empirical experiments corroborate the competitive performance of the EFL framework in terms of robustness and efficiency.

【19】 Interpretable Machine Learning Classifiers for Brain Tumour Survival Prediction

Authors: Colleen E. Charlton, Michael Tin Chung Poon, Paul M. Brennan, Jacques D. Fleuriot
Affiliations: Artificial Intelligence and its Applications Institute, School of Informatics, University of Edinburgh, UK; Cancer Research UK Brain Tumour Centre of Excellence, CRUK Edinburgh Centre
Link: https://arxiv.org/abs/2106.09424
Abstract: Prediction of survival in patients diagnosed with a brain tumour is challenging because of heterogeneous tumour behaviours and responses to treatment. Better estimation of prognosis would support treatment planning and patient support. Advances in machine learning have informed the development of clinical predictive models, but their integration into clinical practice is almost non-existent. One reason for this is the lack of interpretability of models. In this paper, we use a novel brain tumour dataset to compare two interpretable rule list models against popular machine learning approaches for brain tumour survival prediction. All models are quantitatively evaluated using standard performance metrics. The rule lists are also qualitatively assessed for their interpretability and clinical utility. The interpretability of the black-box machine learning models is evaluated using two post-hoc explanation techniques, LIME and SHAP. Our results show that the rule lists were only slightly outperformed by the black-box models. We demonstrate that rule list algorithms produce simple decision lists that align with clinical expertise. By comparison, post-hoc interpretability methods applied to black-box models may produce unreliable explanations of local model predictions. Model interpretability is essential for understanding differences in predictive performance and for integration into clinical practice.

【20】 CRIL: Continual Robot Imitation Learning via Generative and Prediction Model 标题:CRIL:基于产生式和预测性模型的连续机器人模仿学习

作者:Chongkai Gao,Haichuan Gao,Shangqi Guo,Tianren Zhang,Feng Chen 机构:Department of Automation, Tsinghua University 链接:https://arxiv.org/abs/2106.09422 摘要:模仿学习(IL)算法在机器人从专家演示中学习技能方面取得了很好的效果。然而,对于现在需要学习不同任务的多功能机器人来说,同时提供和学习多任务演示都是困难的。为了解决这一问题,本文研究了如何实现连续模仿学习能力,使机器人能够一个接一个地不断学习新任务,从而减轻多任务学习的负担,同时加快新任务学习的进程。提出了一种新的轨迹生成模型,在新的任务学习过程中,利用生成对抗网络和动态预测模型从所有学习到的任务中生成伪轨迹,以实现连续的模仿学习能力。在模拟和真实操作任务上的实验证明了该方法的有效性。 摘要:Imitation learning (IL) algorithms have shown promising results for robots to learn skills from expert demonstrations. However, for versatile robots nowadays that need to learn diverse tasks, providing and learning the multi-task demonstrations all at once are both difficult. To solve this problem, in this work we study how to realize continual imitation learning ability that empowers robots to continually learn new tasks one by one, thus reducing the burden of multi-task IL and accelerating the process of new task learning at the same time. We propose a novel trajectory generation model that employs both a generative adversarial network and a dynamics prediction model to generate pseudo trajectories from all learned tasks in the new task learning process to achieve continual imitation learning ability. Our experiments on both simulation and real world manipulation tasks demonstrate the effectiveness of our method.

【21】 Class Balancing GAN with a Classifier in the Loop 标题:在回路中使用分类器的类平衡GAN

作者:Harsh Rangwani,Konda Reddy Mopuri,R. Venkatesh Babu 机构:Indian Institute of Science, Bengaluru, Indian Institute of Technology Tirupati 备注:UAI 2021 链接:https://arxiv.org/abs/2106.09402 摘要:生成对抗网络(Generative Adversarial Networks,GANs)已经迅速演化为能够模拟日益复杂的图像分布。然而,大多数的发展集中在GANs在平衡数据集上的性能上。我们发现,在平衡数据集上表现良好的现有GANs及其训练机制,在不平衡(即长尾)数据集上并不有效。在这项工作中,我们提出了一种新的、有理论依据的类平衡正则化器用于GAN训练。我们的正则化器利用来自预先训练的分类器的知识来确保数据集中所有类的均衡学习。这是通过建模基于神经网络中观察到的指数遗忘的有效类频率,并鼓励GAN关注代表性不足的类来实现的。通过在多个数据集上取得比现有方法更好的性能,我们证明了正则化器在长尾分布表示学习中的实用性。具体来说,当应用于无条件GAN时,它将长尾iNaturalist-2019数据集上的FID从13.03改善到9.01。 摘要:Generative Adversarial Networks (GANs) have swiftly evolved to imitate increasingly complex image distributions. However, majority of the developments focus on performance of GANs on balanced datasets. We find that the existing GANs and their training regimes which work well on balanced datasets fail to be effective in case of imbalanced (i.e. long-tailed) datasets. In this work we introduce a novel theoretically motivated Class Balancing regularizer for training GANs. Our regularizer makes use of the knowledge from a pre-trained classifier to ensure balanced learning of all the classes in the dataset. This is achieved via modelling the effective class frequency based on the exponential forgetting observed in neural networks and encouraging the GAN to focus on underrepresented classes. We demonstrate the utility of our regularizer in learning representations for long-tailed distributions via achieving better performance than existing approaches over multiple datasets. Specifically, when applied to an unconditional GAN, it improves the FID from $13.03$ to $9.01$ on the long-tailed iNaturalist-$2019$ dataset.
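摘要中提到"基于神经网络中观察到的指数遗忘建模有效类频率"。下面是一个纯Python的极简示意(并非论文的正式实现;衰减系数alpha与倒数加权方式均为演示用的假设),用指数滑动平均近似有效类频率,并据此构造偏向低频类的权重:

```python
def effective_class_frequency(counts_per_step, alpha=0.9):
    # 指数滑动平均:近似"指数遗忘"下的有效类频率,
    # 越近的训练批次对频率的贡献越大
    num_classes = len(counts_per_step[0])
    freq = [0.0] * num_classes
    for counts in counts_per_step:
        freq = [alpha * f + (1 - alpha) * c for f, c in zip(freq, counts)]
    return freq

def class_weights(freq, eps=1e-8):
    # 倒数归一化:鼓励生成器关注代表性不足(低频)的类
    inv = [1.0 / (f + eps) for f in freq]
    z = sum(inv)
    return [w / z for w in inv]
```

例如,两个训练步中类0各出现10次、类1各出现1次时,类1会得到明显更大的权重。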

【22】 Deep Subdomain Adaptation Network for Image Classification 标题:用于图像分类的深子域自适应网络

作者:Yongchun Zhu,Fuzhen Zhuang,Jindong Wang,Guolin Ke,Jingwu Chen,Jiang Bian,Hui Xiong,Qing He 机构:Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing , China, University of Chinese Academy of Sciences, Beijing , China, Microsoft Research, ByteDance 备注:published on TNNLS 链接:https://arxiv.org/abs/2106.09388 摘要:对于标记数据不可用的目标任务,域自适应可以从一个不同的源域迁移学习器。以往的深度域自适应方法主要学习全局域偏移,即在不考虑不同域中同一类别的两个子域之间关系的情况下,对齐全局源分布和目标分布,由于没有捕获细粒度信息,导致迁移学习性能不理想。近年来,子域自适应越来越受到研究者的重视,它的核心是精确地对齐相关子域的分布。然而,这些方法大多是对抗性的,包含多个损失函数,收敛速度慢。在此基础上,我们提出了一种基于局部最大平均差异(LMMD)的深度子域自适应网络(DSAN),它通过在不同域之间对齐域特定层激活的相关子域分布来学习迁移网络。我们的DSAN算法简单有效,不需要对抗性训练,收敛速度快。对于大多数前馈网络模型,利用LMMD损失对其进行扩展,可以很容易地实现自适应,而LMMD损失可以通过反向传播进行有效的训练。实验表明,DSAN在目标识别和数字分类任务上都能取得显著的效果。我们的代码将在以下网址提供:https://github.com/easezyc/deep-transfer-learning 摘要:For a target task where labeled data is unavailable, domain adaptation can transfer a learner from a different source domain. Previous deep domain adaptation methods mainly learn a global domain shift, i.e., align the global source and target distributions without considering the relationships between two subdomains within the same category of different domains, leading to unsatisfying transfer learning performance without capturing the fine-grained information. Recently, more and more researchers pay attention to Subdomain Adaptation which focuses on accurately aligning the distributions of the relevant subdomains. However, most of them are adversarial methods which contain several loss functions and converge slowly. Based on this, we present Deep Subdomain Adaptation Network (DSAN) which learns a transfer network by aligning the relevant subdomain distributions of domain-specific layer activations across different domains based on a local maximum mean discrepancy (LMMD). Our DSAN is very simple but effective which does not need adversarial training and converges fast. 
The adaptation can be achieved easily with most feed-forward network models by extending them with LMMD loss, which can be trained efficiently via back-propagation. Experiments demonstrate that DSAN can achieve remarkable results on both object recognition tasks and digit classification tasks. Our code will be available at: https://github.com/easezyc/deep-transfer-learning
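摘要中的LMMD本质上是按类别(子域)计算的最大平均差异。下面给出一个纯Python示意(假设性的简化:使用硬标签与高斯核;论文实现是可微的,并用目标域的软伪标签加权):

```python
import math

def gaussian_kernel(x, y, gamma=0.5):
    # 两个特征向量间的RBF(高斯)核
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * d2)

def mmd2(xs, ys, gamma=0.5):
    # 两组样本间平方MMD的有偏估计
    kxx = sum(gaussian_kernel(a, b, gamma) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(gaussian_kernel(a, b, gamma) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(gaussian_kernel(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy

def local_mmd(src_feats, src_labels, tgt_feats, tgt_labels, num_classes, gamma=0.5):
    # 逐类(子域)计算MMD后取平均:对齐相关子域的分布
    total, used = 0.0, 0
    for c in range(num_classes):
        xs = [f for f, l in zip(src_feats, src_labels) if l == c]
        ys = [f for f, l in zip(tgt_feats, tgt_labels) if l == c]
        if xs and ys:
            total += mmd2(xs, ys, gamma)
            used += 1
    return total / max(used, 1)
```

同分布的两组样本MMD接近0;若子域错配(类别被打乱),该值会显著增大。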

【23】 Unsupervised Path Representation Learning with Curriculum Negative Sampling 标题:基于课程负抽样的无监督路径表示学习

作者:Sean Bin Yang,Chenjuan Guo,Jilin Hu,Jian Tang,Bin Yang 机构:Department of Computer Science, Aalborg University, Denmark, Mila-Quebec AI Institute, HEC Montreal, Canada, CIFAR AI Research Chair 备注:This paper has been accepted by IJCAI-21 链接:https://arxiv.org/abs/2106.09373 摘要:路径表示在各种交通应用中都是非常关键的,例如在路径推荐系统中估计路径排名,在导航系统中估计路径旅行时间。现有的研究通常是以有监督的方式学习特定于任务的路径表示,这需要大量的有标记的训练数据,对其他任务的泛化能力较差。我们提出一个无监督学习框架Path InfoMax(PIM)来学习适用于不同下游任务的通用路径表示。我们首先提出了一种课程负采样方法,根据课程学习的原则,对每个输入路径生成少量的负路径。接下来,PIM使用互信息最大化从全局和局部视图学习路径表示。在全局视图中,PIM区分了输入路径和负路径的表示。在局部视图中,PIM将输入路径表示与仅出现在负路径中的节点表示区分开来。这使得学习到的路径表示能够以不同的尺度对全局和局部信息进行编码。利用两个路网数据集对两个下游任务(排名分数估计和行程时间估计)进行的大量实验表明,PIM方法的性能明显优于其他无监督方法,也可以作为一种预训练方法来增强有监督路径表示学习。 摘要:Path representations are critical in a variety of transportation applications, such as estimating path ranking in path recommendation systems and estimating path travel time in navigation systems. Existing studies often learn task-specific path representations in a supervised manner, which require a large amount of labeled training data and generalize poorly to other tasks. We propose an unsupervised learning framework Path InfoMax (PIM) to learn generic path representations that work for different downstream tasks. We first propose a curriculum negative sampling method, for each input path, to generate a small amount of negative paths, by following the principles of curriculum learning. Next, PIM employs mutual information maximization to learn path representations from both a global and a local view. In the global view, PIM distinguishes the representations of the input paths from those of the negative paths. In the local view, PIM distinguishes the input path representations from the representations of the nodes that appear only in the negative paths. This enables the learned path representations to encode both global and local information at different scales. 
Extensive experiments on two downstream tasks, ranking score estimation and travel time estimation, using two road network datasets suggest that PIM significantly outperforms other unsupervised methods and is also able to be used as a pre-training method to enhance supervised path representation learning.
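摘要中"区分输入路径与负路径的表示"这类互信息最大化目标,常用InfoNCE形式的对比损失实现。下面是一个假设性的纯Python示意(点积打分与温度参数均为演示用假设,并非论文使用的具体判别器):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def infonce_loss(anchor, positive, negatives, temperature=0.1):
    # -log softmax(正样本得分):把输入路径的表示(positive)
    # 从课程负采样得到的负路径表示(negatives)中区分出来
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, n) / temperature for n in negatives]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(s - m) for s in logits))  # 数值稳定的logsumexp
    return log_z - logits[0]
```

当anchor与正样本表示对齐时,该损失更小。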

【24】 Virtual Reality based Digital Twin System for remote laboratories and online practical learning 标题:基于虚拟现实的远程实验室与在线实践学习数字孪生系统

作者:Claire Palmer,Ben Roullier,Muhammad Aamir,Leonardo Stella,Uchenna Diala,Ashiq Anjum,Frank Mcquade,Keith Cox,Alex Calvert 机构:a University of Derby, Kedleston Road, Derby, DE,GB, b Bloc Digital,nd Floor, Enterprise Centre, Bridge St, Derby, DE,LD, UK, University of Leicester, University Road, Leicester, LE,RH 备注:6 pages, 4 figures, accepted for publication ICMR2021 18th International Conference in Manufacturing Research Virtual Conference hosted by the University of Derby, UK 7 - 10 September 2021 链接:https://arxiv.org/abs/2106.09344 摘要:正如当前疫情所表明的,人们需要远程学习和虚拟学习应用程序,例如虚拟现实(VR)和基于平板电脑的解决方案。开发人员创建复杂的学习场景非常耗时,可能需要一年多的时间。需要提供一种简单的方法,使讲师能够为他们的实验室教程创建自己的内容。目前正在研究开发通用模型,以实现虚拟学习应用程序的半自动创建。一个案例研究描述了一个电气实验室教程的虚拟学习应用程序的创建。 摘要:There is a need for remote learning and virtual learning applications such as virtual reality (VR) and tablet-based solutions which the current pandemic has demonstrated. Creating complex learning scenarios by developers is highly time-consuming and can take over a year. There is a need to provide a simple method to enable lecturers to create their own content for their laboratory tutorials. Research is currently being undertaken into developing generic models to enable the semi-automatic creation of a virtual learning application. A case study describing the creation of a virtual learning application for an electrical laboratory tutorial is presented.

【25】 Towards bio-inspired unsupervised representation learning for indoor aerial navigation 标题:面向室内航空导航的仿生无监督表征学习

作者:Ni Wang,Ozan Catal,Tim Verbelen,Matthias Hartmann,Bart Dhoedt 机构:IDLab, Ghent University - imec, Belgium; imec, Belgium 链接:https://arxiv.org/abs/2106.09326 摘要:在GPS拒止的室内环境中进行空中导航仍然是一个开放的挑战。无人机可以从更丰富的视角感知环境,但比其他自主平台有更严格的计算和能量限制。为了解决这一问题,本研究展示了一种受生物启发的、用于同步定位与建图(SLAM)的深度学习算法及其在无人机导航系统中的应用。我们提出了一种无监督的表示学习方法,该方法产生低维的潜在状态描述符,降低了对感知混叠的敏感性,并且适用于节能的嵌入式硬件。在一个室内仓库环境中采集的数据集上对所设计的算法进行了评估,初步结果表明了鲁棒室内空中导航的可行性。 摘要:Aerial navigation in GPS-denied, indoor environments, is still an open challenge. Drones can perceive the environment from a richer set of viewpoints, while having more stringent compute and energy constraints than other autonomous platforms. To tackle that problem, this research displays a biologically inspired deep-learning algorithm for simultaneous localization and mapping (SLAM) and its application in a drone navigation system. We propose an unsupervised representation learning method that yields low-dimensional latent state descriptors, that mitigates the sensitivity to perceptual aliasing, and works on power-efficient, embedded hardware. The designed algorithm is evaluated on a dataset collected in an indoor warehouse environment, and initial results show the feasibility for robust indoor aerial navigation.

【26】 Central Kurdish machine translation: First large scale parallel corpus and experiments 标题:中央库尔德语机器翻译:首次大规模平行语料库与实验

作者:Zhila Amini,Mohammad Mohammadamini,Hawre Hosseini,Mehran Mansouri,Daban Jaff 机构:Laboratoire Informatique d’Avignon (LIA), Avignon University, Avignon, France, Laboratory for Systems, Software and Semantics (LS,), Ryerson University, Toronto, Canada, Daban Q Jaff 链接:https://arxiv.org/abs/2106.09325 摘要:虽然库尔德语的计算处理经历了相对的增长,但这种语言的机器翻译似乎缺乏大量的科学工作。这在一定程度上是由于缺乏专门为这项任务策划的资源。在本文中,我们提出了第一个中央库尔德语-英语大规模平行语料库Awta,包含229,222对人工对齐的翻译。我们的语料库收集自不同的文本体裁和领域,以期构建更鲁棒、更贴近真实世界的机器翻译应用。为了促进这一领域的研究,我们公开了这一语料库的一部分。此外,我们建立了几个神经机器翻译模型,为库尔德语机器翻译任务提供基准。我们还对结果进行了广泛的实验分析,以确定中央库尔德语机器翻译面临的主要挑战。这些挑战包括本文所分类的语言相关与语言无关两类,其中第一类涉及中央库尔德语在形态、句法和语义等不同层面上的语言特性。我们性能最好的系统在Ku→En和En→Ku方向上分别取得了22.72和16.81的BLEU分数。 摘要:While the computational processing of Kurdish has experienced a relative increase, the machine translation of this language seems to be lacking a considerable body of scientific work. This is in part due to the lack of resources especially curated for this task. In this paper, we present the first large scale parallel corpus of Central Kurdish-English, Awta, containing 229,222 pairs of manually aligned translations. Our corpus is collected from different text genres and domains in an attempt to build more robust and real-world applications of machine translation. We make a portion of this corpus publicly available in order to foster research in this area. Further, we build several neural machine translation models in order to benchmark the task of Kurdish machine translation. Additionally, we perform extensive experimental analysis of results in order to identify the major challenges that Central Kurdish machine translation faces. These challenges include language-dependent and -independent ones as categorized in this paper, the first group of which are aware of Central Kurdish linguistic properties on different morphological, syntactic and semantic levels. Our best performing systems achieve 22.72 and 16.81 in BLEU score for Ku$\rightarrow$EN and En$\rightarrow$Ku, respectively.

【27】 EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model 标题:EMOVIE:一个具有简单情感文语转换模型的汉语情感语音数据集

作者:Chenye Cui,Yi Ren,Jinglin Liu,Feiyang Chen,Rongjie Huang,Ming Lei,Zhou Zhao 机构:Zhejiang University, Alibaba Group 备注:Accepted by Interspeech 2021 链接:https://arxiv.org/abs/2106.09317 摘要:近年来,神经语音合成受到越来越多的关注。在深度神经网络实现文本到语音(TTS)任务的最新成果的同时,由于缺乏高质量的情感语音数据集和先进的情感TTS模型,如何生成更具情感性和表现力的语音成为研究者面临的新挑战。本文首先简要介绍并公开发布了一个包含9724个语音样本的汉语情感语音数据集及其情感标注。在此基础上,我们提出了一种简单而有效的情感语音合成架构EMSpeech。与那些需要额外的参考音频作为输入的模型不同,我们的模型能够仅从输入文本中预测情感标签,并在情感嵌入的条件下生成更具表现力的语音。在实验阶段,我们首先通过情绪分类任务来验证数据集的有效性。然后我们在提出的数据集上训练我们的模型,并进行一系列的主观评价。最后,通过在情感语音合成任务中的对比实验,证明了该模型的有效性。 摘要:Recently, there has been an increasing interest in neural speech synthesis. While the deep neural network achieves the state-of-the-art result in text-to-speech (TTS) tasks, how to generate a more emotional and more expressive speech is becoming a new challenge to researchers due to the scarcity of high-quality emotion speech dataset and the lack of advanced emotional TTS model. In this paper, we first briefly introduce and publicly release a Mandarin emotion speech dataset including 9,724 samples with audio files and its emotion human-labeled annotation. After that, we propose a simple but efficient architecture for emotional speech synthesis called EMSpeech. Unlike those models which need additional reference audio as input, our model could predict emotion labels just from the input text and generate more expressive speech conditioned on the emotion embedding. In the experiment phase, we first validate the effectiveness of our dataset by an emotion classification task. Then we train our model on the proposed dataset and conduct a series of subjective evaluations. Finally, by showing a comparable performance in the emotional speech synthesis task, we successfully demonstrate the ability of the proposed model.

【28】 PEN4Rec: Preference Evolution Networks for Session-based Recommendation 标题:PEN4Rec:基于会话推荐的偏好进化网络

作者:Dou Hu,Lingwei Wei,Wei Zhou,Xiaoyong Huai,Zhiqi Fang,Songlin Hu 机构: National Computer System Engineering Research Institute of China, Institute of Information Engineering, Chinese Academy of Sciences, School of Cyber Security, University of Chinese Academy of Sciences 备注:12 pages, accepted by KSEM 2021 链接:https://arxiv.org/abs/2106.09306 摘要:基于会话的推荐旨在根据匿名会话中的历史行为预测用户的下一步行为。为了获得更好的推荐,捕捉用户的偏好及其动态是至关重要的。此外,用户偏好随时间动态演化,每个偏好都有自己的演化轨迹。然而,以往的研究大多忽略了偏好的演化趋势,容易受到偏好漂移效应的干扰。本文提出了一种新的基于会话推荐的偏好演化网络(PEN4Rec),通过两阶段的历史上下文检索来建模偏好演化过程。具体而言,第一阶段过程根据最近的项目整合相关行为。然后,第二阶段过程动态建模偏好随时间的演化轨迹,推断出丰富的偏好。该过程可以增强偏好演化过程中相关序贯行为的影响,减弱偏好漂移的干扰。在三个公共数据集上的大量实验证明了该模型的有效性和优越性。 摘要:Session-based recommendation aims to predict the user's next action based on historical behaviors in an anonymous session. For better recommendations, it is vital to capture user preferences as well as their dynamics. Besides, user preferences evolve over time dynamically and each preference has its own evolving track. However, most previous works neglect the evolving trend of preferences and can be easily disturbed by the effect of preference drifting. In this paper, we propose a novel Preference Evolution Networks for session-based Recommendation (PEN4Rec) to model preference evolving process by a two-stage retrieval from historical contexts. Specifically, the first-stage process integrates relevant behaviors according to recent items. Then, the second-stage process models the preference evolving trajectory over time dynamically and infer rich preferences. The process can strengthen the effect of relevant sequential behaviors during the preference evolution and weaken the disturbance from preference drifting. Extensive experiments on three public datasets demonstrate the effectiveness and superiority of the proposed model.

【29】 Time Series is a Special Sequence: Forecasting with Sample Convolution and Interaction 标题:时间序列是一种特殊的序列:样本卷积和交互作用预测

作者:Minhao Liu,Ailing Zeng,Qiuxia Lai,Qiang Xu 机构:The Chinese University of Hong Kong 链接:https://arxiv.org/abs/2106.09305 摘要:时间序列是一种特殊类型的序列数据,即一组以相等时间间隔收集并按时间顺序排列的观测数据。现有的深度学习技术使用一般的序列模型(例如,递归神经网络、Transformer模型或时间卷积网络)进行时间序列分析,而忽略了它的一些独特特性。例如,时间序列数据的下采样通常会保留数据中的大部分信息,而对于文本序列和DNA序列等一般序列数据则不是这样。基于此,本文提出了一种新的神经网络结构,并将其应用于时间序列预测问题,在多分辨率下进行样本卷积和交互,实现时间序列建模。提出的体系结构名为SCINet,有助于提取具有增强可预测性的特征。实验结果表明,SCINet在各种实际时间序列预测数据集的预测精度上都比现有的方法有了显著的提高。特别是,它可以在不使用复杂的空间建模技术的情况下,对那些时空数据集实现高精度的预测。我们的代码和数据见补充材料。 摘要:Time series is a special type of sequence data, a set of observations collected at even intervals of time and ordered chronologically. Existing deep learning techniques use generic sequence models (e.g., recurrent neural network, Transformer model, or temporal convolutional network) for time series analysis, which ignore some of its unique properties. For example, the downsampling of time series data often preserves most of the information in the data, while this is not true for general sequence data such as text sequence and DNA sequence. Motivated by the above, in this paper, we propose a novel neural network architecture and apply it for the time series forecasting problem, wherein we conduct sample convolution and interaction at multiple resolutions for temporal modeling. The proposed architecture, namely SCINet, facilitates extracting features with enhanced predictability. Experimental results show that SCINet achieves significant prediction accuracy improvement over existing solutions across various real-world time series forecasting datasets. In particular, it can achieve high forecasting accuracy for those temporal-spatial datasets without using sophisticated spatial modeling techniques. Our codes and data are presented in the supplemental material.
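摘要强调"时间序列的下采样通常会保留数据中的大部分信息"。下面用纯Python演示此类方法常用的奇偶交错降采样及其逆操作(仅为示意,并非SCINet的实际网络结构):

```python
def interleaved_downsample(series):
    # 按奇偶下标把序列拆成两个半分辨率子序列;
    # 对等间隔采样的时间序列,这一操作几乎不丢失信息
    return series[::2], series[1::2]

def interleave(even, odd):
    # 逆操作:把两个子序列重新交错,还原到原分辨率
    out = []
    for i in range(len(even)):
        out.append(even[i])
        if i < len(odd):
            out.append(odd[i])
    return out
```

两个子序列可各自做卷积并交互,再交错回原分辨率,这正是"多分辨率样本卷积与交互"的雏形。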

【30】 Voice2Series: Reprogramming Acoustic Models for Time Series Classification 标题:Voice2Series:用于时间序列分类的重新编程声学模型

作者:Chao-Han Huck Yang,Yun-Yun Tsai,Pin-Yu Chen 机构:Georgia Institute of Technology, Columbia University, IBM Research 备注:Accepted to ICML 2021, 16 Pages 链接:https://arxiv.org/abs/2106.09296 摘要:如何利用有限的数据对时间序列进行分类是一个实际而又富有挑战性的问题。目前的方法主要是基于手工设计的特征提取规则或特定领域的数据扩充。基于深度语音处理模型的发展和语音数据是单变量时间信号这一事实,本文提出了Voice2Series(V2S),一种新的端到端方法,通过输入变换学习和输出标签映射对声学模型进行重编程以进行时间序列分类。利用大规模预训练语音处理模型的表征学习能力,在30个不同的时间序列任务上,V2S在20个任务上的表现优于或持平于最先进的方法,并将其平均准确率提高了1.84%。通过证明V2S的总体风险以源风险加上一个衡量重编程特征对齐的Wasserstein距离为上界,我们进一步从理论上证明了V2S的合理性。研究结果为时间序列分类提供了新的有效手段。 摘要:Learning to classify time series with limited data is a practical yet challenging problem. Current methods are primarily based on hand-designed feature extraction rules or domain-specific data augmentation. Motivated by the advances in deep speech processing models and the fact that voice data are univariate temporal signals, in this paper, we propose Voice2Series (V2S), a novel end-to-end approach that reprograms acoustic models for time series classification, through input transformation learning and output label mapping. Leveraging the representation learning power of a large-scale pre-trained speech processing model, on 30 different time series tasks we show that V2S either outperforms or is tied with state-of-the-art methods on 20 tasks, and improves their average accuracy by 1.84%. We further provide a theoretical justification of V2S by proving its population risk is upper bounded by the source risk and a Wasserstein distance accounting for feature alignment via reprogramming. Our results offer new and effective means to time series classification.
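摘要中的"输入变换学习与输出标签映射"可以用很少的代码勾勒出来。下面是一个纯Python示意(假设性的简化:用零填充加可训练扰动delta表示输入重编程,用多对一映射聚合源类别概率;并非论文的实现):

```python
def reprogram_input(series, target_len, delta):
    # 把短的单变量时间序列零填充到声学模型的输入长度,
    # 再叠加可训练扰动delta(重编程中唯一需要训练的部分)
    padded = list(series) + [0.0] * (target_len - len(series))
    return [p + d for p, d in zip(padded, delta)]

def map_labels(source_probs, label_map):
    # 输出标签映射:把多个源(语音)类别的概率聚合到目标类别上
    target_probs = {}
    for src_idx, p in enumerate(source_probs):
        tgt = label_map[src_idx]
        target_probs[tgt] = target_probs.get(tgt, 0.0) + p
    return target_probs
```

预训练声学模型本身保持冻结;只学习delta与标签映射即可完成目标任务。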

【31】 MHNF: Multi-hop Heterogeneous Neighborhood information Fusion graph representation learning 标题:MHNF:多跳异构邻域信息融合图表示学习

作者:Dongjie Zhu,Yundong Sun,Haiwen Du,Zhaoshuo Tian 机构:School of Computer Science and Technology, Harbin Institute of Technology 链接:https://arxiv.org/abs/2106.09289 摘要:注意机制使图神经网络(GNNs)能够学习目标节点与其单跳邻居之间的注意权重,进一步提高了性能。然而,现有的GNN大多是面向同构图的,每一层只能聚合一跳邻居的信息。多层网络的叠加会引入大量的噪声,容易导致过度平滑。为此,我们提出了一种多跳异构邻域信息融合图表示学习方法(MHNF)。具体地说,我们首先提出一个混合元路径自主抽取模型来有效地抽取多跳混合邻居。然后,提出了一种跳级异构信息聚合模型,该模型在同一混合元路径中选择性地聚合不同跳数的邻域信息。最后,提出了一种分层语义注意融合模型(HSAF),该模型能有效地融合不同跳数和不同路径的邻域信息。本文解决了多跳邻域信息的聚合问题,并能针对目标任务学习混合元路径,减少了人工指定元路径的限制。此外,HSAF还可以提取元路径的内部节点信息,更好地整合不同层次的语义信息。在真实数据集上的实验结果表明,MHNF在节点分类和聚类任务上优于现有的方法(平均相对改进率分别为10.94%-69.09%和11.58%-394.93%)。 摘要:Attention mechanism enables the Graph Neural Networks (GNNs) to learn the attention weights between the target node and its one-hop neighbors, the performance is further improved. However, the most existing GNNs are oriented to homogeneous graphs and each layer can only aggregate the information of one-hop neighbors. Stacking multi-layer networks will introduce a lot of noise and easily lead to over smoothing. We propose a Multi-hop Heterogeneous Neighborhood information Fusion graph representation learning method (MHNF). Specifically, we first propose a hybrid metapath autonomous extraction model to efficiently extract multi-hop hybrid neighbors. Then, we propose a hop-level heterogeneous Information aggregation model, which selectively aggregates different-hop neighborhood information within the same hybrid metapath. Finally, a hierarchical semantic attention fusion model (HSAF) is proposed, which can efficiently integrate different-hop and different-path neighborhood information respectively. This paper can solve the problem of aggregating the multi-hop neighborhood information and can learn hybrid metapaths for target task, reducing the limitation of manually specifying metapaths. In addition, HSAF can extract the internal node information of the metapaths and better integrate the semantic information of different levels. 
Experimental results on real datasets show that MHNF is superior to state-of-the-art methods in node classification and clustering tasks (10.94% - 69.09% and 11.58% - 394.93% relative improvement on average, respectively).
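摘要中的"分层语义注意融合"在最简单的情形下可以写成对各元路径嵌入的softmax加权求和。下面是一个假设性的纯Python示意(打分用点积、query向量均为演示用假设,并非HSAF的完整结构):

```python
import math

def semantic_attention(path_embeddings, query):
    # 对每条元路径的融合嵌入打分 -> softmax归一化 -> 加权求和
    scores = [sum(q * e for q, e in zip(query, emb)) for emb in path_embeddings]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # 数值稳定的softmax
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(path_embeddings[0])
    fused = [sum(w * emb[d] for w, emb in zip(weights, path_embeddings))
             for d in range(dim)]
    return fused, weights
```

权重反映各元路径语义对目标任务的相对重要性,融合结果是各路径嵌入的凸组合。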

【32】 MatES: Web-based Forward Chaining Expert System for Maternal Care 标题:Mate:基于Web的产妇护理正向链接专家系统

作者:Haile Misgna,Moges Ahmed,Anubhav Kumar 机构:School of Computing, EiT-M, Mekelle University, Ethiopia 链接:https://arxiv.org/abs/2106.09281 摘要:预防孕产妇并发症的方法是已知的,并且可以由训练有素的卫生专业人员加以预防。但在埃塞俄比亚这样的国家,病人与医生的比率是1名医生对1000名病人,孕产妇死亡率和发病率很高。为了填补训练有素的卫生专业人员的缺口,埃塞俄比亚推出了卫生推广方案。向卫生推广工作者(HEW)转移任务有助于降低埃塞俄比亚的死亡率和发病率。知识差距一直是HEW面临的主要挑战之一,原因是培训没有定期进行,没有助产士、妇科医生或医生在场咨询,而且所有指南都是纸质的,容易受到损害。本文介绍了一个基于Web的孕产妇保健专家系统的设计与实现。我们只针对撒哈拉以南非洲常见的10种主要疾病和产妇健康并发症。该专家系统可以通过计算机和智能手机的Web浏览器访问。系统采用基于前向链接的规则专家系统,从知识库中给出建议并产生新知识。该专家系统可用于孕产妇保健领域的HEW培训。关键词:专家系统,孕产妇保健,前向链接,基于规则的专家系统,PHLIPS 摘要:The solution to prevent maternal complications are known and preventable by trained health professionals. But in countries like Ethiopia where the patient to physician ratio is 1 doctor to 1000 patients, maternal mortality and morbidity rate is high. To fill the gap of highly trained health professionals, Ethiopia introduced health extension programs. Task shifting to health extension workers (HEWs) contributed in decreasing mortality and morbidity rate in Ethiopia. Knowledge-gap has been one of the major challenges to HEWs. The reasons are trainings are not given in regular manner, there is no midwife, gynecologists or doctors around for consultation, and all guidelines are paper-based which are easily exposed to damage. In this paper, we describe the design and implementation of a web-based expert system for maternal care. We only targeted the major 10 diseases and complication of maternal health issues seen in Sub-Saharan Africa. The expert system can be accessed through the use of web browsers from computers as well as smart phones. Forward chaining rule-based expert system is used in order to give suggestions and create a new knowledge from the knowledge-base. This expert system can be used to train HEWs in the field of maternal health. Keywords: expert system, maternal care, forward-chaining, rule-based expert system, PHLIPS
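摘要所述的前向链接(forward chaining)推理可以用几行Python写出一个朴素版本。下面的规则内容纯属演示用假设,并非该系统真实的医学知识库:

```python
def forward_chain(facts, rules):
    # 朴素前向链接:只要某条规则的全部前提都已满足,
    # 就把其结论加入事实集,直到不再产生新事实为止
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in facts and set(premises) <= facts:
                facts.add(conclusion)
                changed = True
    return facts

# 演示用的假设性规则:(前提元组, 结论)
RULES = [
    (("headache", "blurred_vision", "high_bp"), "suspect_preeclampsia"),
    (("suspect_preeclampsia",), "refer_to_health_center"),
]
```

规则可以链式触发:第一条规则的结论成为第二条规则的前提,从而"从知识库中产生新知识"。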

【33】 Cooperative Multi-Agent Reinforcement Learning Based Distributed Dynamic Spectrum Access in Cognitive Radio Networks 标题:认知无线电网络中基于协作多Agent强化学习的分布式动态频谱接入

作者:Xiang Tan,Li Zhou,Haijun Wang,Yuli Sun,Haitao Zhao,Boon-Chong Seet,Jibo Wei,Victor C. M. Leung 机构:Auckland University of Technology 链接:https://arxiv.org/abs/2106.09274 摘要:随着5G和物联网的发展,大量的无线设备需要共享有限的频谱资源。动态频谱接入(DSA)是一种很有前途的范式,可以弥补历史上"命令与控制"式频谱分配方法所导致的频谱利用率低下的问题。本文研究了典型多信道认知无线电网络中多用户的分布式DSA问题。将该问题描述为一个分散的部分可观测马尔可夫决策过程(Dec-POMDP),并提出了一种基于协作多智能体强化学习(MARL)的集中式离线训练和分布式在线执行框架。我们采用深度递归Q网络(DRQN)来解决每个认知用户状态的部分可观测性问题。最终的目标是学习一种协作策略,在认知用户之间不进行协同信息交换的情况下,以分布式方式最大化认知无线电网络的总吞吐量。最后,通过大量的实验验证了该算法在不同环境下的有效性。仿真结果表明,该算法收敛速度快,性能接近最优。 摘要:With the development of the 5G and Internet of Things, amounts of wireless devices need to share the limited spectrum resources. Dynamic spectrum access (DSA) is a promising paradigm to remedy the problem of inefficient spectrum utilization brought upon by the historical command-and-control approach to spectrum allocation. In this paper, we investigate the distributed DSA problem for multi-user in a typical multi-channel cognitive radio network. The problem is formulated as a decentralized partially observable Markov decision process (Dec-POMDP), and we proposed a centralized off-line training and distributed on-line execution framework based on cooperative multi-agent reinforcement learning (MARL). We employ the deep recurrent Q-network (DRQN) to address the partial observability of the state for each cognitive user. The ultimate goal is to learn a cooperative strategy which maximizes the sum throughput of cognitive radio network in distributed fashion without coordination information exchange between cognitive users. Finally, we validate the proposed algorithm in various settings through extensive experiments. From the simulation results, we can observe that the proposed algorithm can converge fast and achieve almost the optimal performance.

【34】 Pruning Randomly Initialized Neural Networks with Iterative Randomization 标题:用迭代随机化方法修剪随机初始化的神经网络

作者:Daiki Chijiwa,Shin'ya Yamaguchi,Yasutoshi Ida,Kenji Umakoshi,Tomohiro Inoue 机构:NTT Software Innovation Center, NTT Corporation 备注:Code will be available at this https URL 链接:https://arxiv.org/abs/2106.09269 摘要:随机初始化神经网络权值的剪枝在彩票假设中起着重要的作用。Ramanujan等人(2020)的实证研究表明,不必优化权重值,仅通过剪枝权重就能获得显著的性能。然而,为了达到与权重优化相同的性能水平,剪枝方法要求剪枝前的网络具有更多的参数,从而需要更多的内存空间。为了克服这种参数低效的问题,我们引入了一个新的框架,通过迭代地重新随机化权重值来修剪随机初始化的神经网络(IteRand)。理论上,我们在该框架中证明了一个逼近定理,这表明随机化操作能够可证明地减少所需的参数数目。我们还在CIFAR-10和ImageNet上的多次实验中验证了其参数效率。 摘要:Pruning the weights of randomly initialized neural networks plays an important role in the context of lottery ticket hypothesis. Ramanujan et al. (2020) empirically showed that only pruning the weights can achieve remarkable performance instead of optimizing the weight values. However, to achieve the same level of performance as the weight optimization, the pruning approach requires more parameters in the networks before pruning and thus more memory space. To overcome this parameter inefficiency, we introduce a novel framework to prune randomly initialized neural networks with iteratively randomizing weight values (IteRand). Theoretically, we prove an approximation theorem in our framework, which indicates that the randomizing operations are provably effective to reduce the required number of the parameters. We also empirically demonstrate the parameter efficiency in multiple experiments on CIFAR-10 and ImageNet.
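摘要的核心操作是"剪枝 + 迭代地重新随机化被剪掉的权重"。下面是一个纯Python示意(打分方式与随机初始化分布均为演示用假设;论文中这一操作在训练迭代中周期性进行):

```python
import random

def prune_with_rerandomize(weights, scores, keep_ratio, rerandomize=True,
                           init=lambda: random.uniform(-1.0, 1.0)):
    # 按得分保留top-k权重;被剪掉的位置可重新随机化,
    # 使其在后续迭代中有机会被重新选入子网络
    k = max(1, int(len(weights) * keep_ratio))
    threshold = sorted(scores, reverse=True)[k - 1]
    new_weights, mask = [], []
    for w, s in zip(weights, scores):
        if s >= threshold:
            new_weights.append(w)
            mask.append(1)
        else:
            new_weights.append(init() if rerandomize else w)
            mask.append(0)
    return new_weights, mask
```

得分无并列时恰好保留k个权重;`rerandomize=False`即退化为普通的"只剪枝不随机化"。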

【35】 A Random CNN Sees Objects: One Inductive Bias of CNN and Its Applications 标题:随机CNN看对象:CNN的一种归纳偏向及其应用

作者:Yun-Hao Cao,Jianxin Wu 机构:National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China 备注:17 pages, 9 figures, 10 tables 链接:https://arxiv.org/abs/2106.09259 摘要:本文首先揭示了一个令人惊讶的发现:在没有任何学习的情况下,一个随机初始化的CNN就可以很好地定位物体。也就是说,CNN有一种自然聚焦于物体的归纳偏向,本文称之为Tobias("物体在视线中")。我们进一步分析了这种经验归纳偏向,并将其成功地应用于自监督学习:通过将每个图像转换成具有不同背景的多个版本(前景与背景的分离由Tobias引导),鼓励CNN学习聚焦于前景对象的表示。实验结果表明,本文提出的Tobias能显著改善下游任务,尤其是目标检测。本文还表明,Tobias在不同大小的训练集上有一致的改进,并且对图像增强的变化更具弹性。我们的代码将在 https://github.com/CupidJay/Tobias 提供。 摘要:This paper starts by revealing a surprising finding: without any learning, a randomly initialized CNN can localize objects surprisingly well. That is, a CNN has an inductive bias to naturally focus on objects, named as Tobias (``The object is at sight'') in this paper. This empirical inductive bias is further analyzed and successfully applied to self-supervised learning. A CNN is encouraged to learn representations that focus on the foreground object, by transforming every image into various versions with different backgrounds, where the foreground and background separation is guided by Tobias. Experimental results show that the proposed Tobias significantly improves downstream tasks, especially for object detection. This paper also shows that Tobias has consistent improvements on training sets of different sizes, and is more resilient to changes in image augmentations. Our codes will be available at https://github.com/CupidJay/Tobias.

【36】 Knowledge Graphs and Machine Learning in biased C4I applications 标题:有偏C4I应用中的知识图与机器学习

作者:Evangelos Paparidis,Konstantinos Kotis 机构:Intelligent Systems Lab, Dept. of Cultural Technology and Communication, University of the Aegean, Mytilene, Greece 链接:https://arxiv.org/abs/2106.09258 摘要:本文介绍了我们对最近出现在人工智能应用中的偏差这一关键问题的立场。具体来说,我们讨论了人工智能应用中当前技术的组合,即机器学习和知识图谱,并指出它们与C4I领域(去)偏差应用的关联。尽管这是一个目前出现在不同应用领域的更广泛的问题,但由于其与安全相关的性质,偏差问题在C4I领域比在其他领域显得更为关键。虽然我们提出了为C4I应用去偏所应采取的若干措施,但我们也承认这一主题在知识图谱和语义Web社区中尚不成熟。 摘要:This paper introduces our position on the critical issue of bias that recently appeared in AI applications. Specifically, we discuss the combination of current technologies used in AI applications i.e., Machine Learning and Knowledge Graphs, and point to their involvement in (de)biased applications of the C4I domain. Although this is a wider problem that currently emerges from different application domains, bias appears more critical in C4I than in others due to its security-related nature. While proposing certain actions to be taken towards debiasing C4I applications, we acknowledge the immature aspect of this topic within the Knowledge Graph and Semantic Web communities.

【37】 Seeing Differently, Acting Similarly: Imitation Learning with Heterogeneous Observations 标题:看待不同,行动相似:异质观察的模仿学习

作者:Xin-Qiang Cai,Yao-Xiang Ding,Zi-Xuan Chen,Yuan Jiang,Masashi Sugiyama,Zhi-Hua Zhou 机构:National Key Laboratory for Novel Software Technology Nanjing University, Nanjing, China., RIKEN Center for Advanced Intelligence Project, Tokyo, Japan., The University of Tokyo, Tokyo, Japan. 备注:17 pages, 25 figures 链接:https://arxiv.org/abs/2106.09256 摘要:在许多真实世界的模仿学习任务中,演示者和学习者必须在不同但完整的观察空间中行动。这种情况对现有的模仿学习方法造成了重大障碍,即使它们与传统的空间适应技术相结合。主要的挑战在于,在不同的观察空间下,将专家的占用度量与学习者动态变化的占用度量联系起来。在这项工作中,我们将上述学习问题建模为异质观察模仿学习(HOIL)。我们提出了基于重要性加权、拒绝学习和主动查询技术的重要性加权拒绝算法(IWRE),来解决占用度量匹配这一关键挑战。实验结果表明,IWRE能够成功地解决HOIL任务,包括将基于视觉的演示转化为Atari域下基于随机存取存储器(RAM)策略的挑战性任务。 摘要:In many real-world imitation learning tasks, the demonstrator and the learner have to act in different but full observation spaces. This situation generates significant obstacles for existing imitation learning approaches to work, even when they are combined with traditional space adaptation techniques. The main challenge lies in bridging expert's occupancy measures to learner's dynamically changing occupancy measures under the different observation spaces. In this work, we model the above learning problem as Heterogeneous Observations Imitation Learning (HOIL). We propose the Importance Weighting with REjection (IWRE) algorithm based on the techniques of importance-weighting, learning with rejection, and active querying to solve the key challenge of occupancy measure matching. Experimental results show that IWRE can successfully solve HOIL tasks, including the challenging task of transforming the vision-based demonstrations to random access memory (RAM)-based policies under the Atari domain.

【38】 CoANE: Modeling Context Co-occurrence for Attributed Network Embedding 标题:CoANE:面向属性网络嵌入的上下文共现建模

作者:I-Chung Hsieh,Cheng-Te Li 机构: National Cheng Kung University 备注:Accepted to IEEE TKDE 2021. Code can be accessed via this https URL 链接:https://arxiv.org/abs/2106.09241 摘要:属性网络嵌入(ANE)旨在学习低维向量,使网络结构和节点属性都能保留在嵌入空间中。现有的ANE模型没有考虑图结构和属性之间的具体组合方式。每个节点虽然都有自身的结构特征,例如高度互联的邻居及其特定的属性分布模式,但对每个节点邻域的刻画不应仅限于多跳节点,还应考虑特定的簇或社交圈。为了对这类信息建模,本文提出了一种新的ANE模型——上下文共现感知属性网络嵌入(CoANE)。CoANE的基本思想是,对每个节点所涉及的多样化模式的上下文属性进行建模,并把每个属性视为一个通道,应用卷积机制来编码位置信息。对上下文共现的学习可以捕获每个节点潜在的社交圈。为了更好地编码节点的结构与语义知识,我们设计了由正图似然、上下文负采样和属性重构组成的三路目标函数。我们在五个真实数据集上进行了链接预测、节点标签分类和节点聚类实验。结果表明,CoANE的性能显著优于最先进的ANE模型。 摘要:Attributed network embedding (ANE) is to learn low-dimensional vectors so that not only the network structure but also node attributes can be preserved in the embedding space. Existing ANE models do not consider the specific combination between graph structure and attributes. While each node has its structural characteristics, such as highly-interconnected neighbors along with their certain patterns of attribute distribution, each node's neighborhood should be not only depicted by multi-hop nodes, but consider certain clusters or social circles. To model such information, in this paper, we propose a novel ANE model, Context Co-occurrence-aware Attributed Network Embedding (CoANE). The basic idea of CoANE is to model the context attributes that each node's involved diverse patterns, and apply the convolutional mechanism to encode positional information by treating each attribute as a channel. The learning of context co-occurrence can capture the latent social circles of each node. To better encode structural and semantic knowledge of nodes, we devise a three-way objective function, consisting of positive graph likelihood, contextual negative sampling, and attribute reconstruction. We conduct experiments on five real datasets in the tasks of link prediction, node label classification, and node clustering. The results exhibit that CoANE can significantly outperform state-of-the-art ANE models.

【39】 Knowledgeable or Educated Guess? Revisiting Language Models as Knowledge Bases 标题:是博学还是有根据的猜测?作为知识库的语言模型再审视

作者:Boxi Cao,Hongyu Lin,Xianpei Han,Le Sun,Lingyong Yan,Meng Liao,Tong Xue,Jin Xu 机构:Chinese Information Processing Laboratory, State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, China, University of Chinese Academy of Sciences, Beijing, China, Data Quality Team, WeChat, Tencent Inc., China 备注:Accepted to ACL2021(main conference) 链接:https://arxiv.org/abs/2106.09231 摘要:已有文献表明,BERT等经过预训练的掩码语言模型(MLMs)在某些数据集上可以取得有竞争力的事实知识抽取性能,这表明MLMs有潜力成为可靠的知识源。本文进行了严谨的研究,以探索MLMs在不同抽取范式下的内在预测机制。通过考察MLMs的行为,我们发现以往不俗的性能主要归因于过拟合了数据集伪影的有偏提示。此外,引入示例和外部上下文之所以能提升知识预测,主要缘于实体类型引导与标准答案泄漏。我们的发现揭示了MLMs的内在预测机制,并对当前MLMs可作为可靠事实知识库这一先前结论提出了强烈质疑。 摘要:Previous literatures show that pre-trained masked language models (MLMs) such as BERT can achieve competitive factual knowledge extraction performance on some datasets, indicating that MLMs can potentially be a reliable knowledge source. In this paper, we conduct a rigorous study to explore the underlying predicting mechanisms of MLMs over different extraction paradigms. By investigating the behaviors of MLMs, we find that previous decent performance mainly owes to the biased prompts which overfit dataset artifacts. Furthermore, incorporating illustrative cases and external contexts improve knowledge prediction mainly due to entity type guidance and golden answer leakage. Our findings shed light on the underlying predicting mechanisms of MLMs, and strongly question the previous conclusion that current MLMs can potentially serve as reliable factual knowledge bases.

【40】 JSI at the FinSim-2 task: Ontology-Augmented Financial Concept Classification 标题:JSI在FinSim-2任务中的方案:本体增强的金融概念分类

作者:Timen Stepišnik Perdih,Senja Pollak,Blaž Škrlj 机构:Jožef Stefan Institute and Jožef Stefan International Postgraduate School, Ljubljana, Slovenia 链接:https://arxiv.org/abs/2106.09230 摘要:近年来,本体越来越多地被用于机器推理。如果存在从目标标签到相关本体的映射,本体既可以为概念提供解释,也可以用于概念分类。使用本体的另一个优点是不需要学习过程,即在使用之前无需训练数据或训练时间。本文展示了本体在金融领域一个分类问题上的实际应用:先将给定本体转换为图,再进行泛化,以找到输入金融概念集合的共同语义描述。我们给出了金融领域语义相似度学习共享任务(FinSim-2任务)的一个解决方案。该任务要求设计一个系统,能将金融领域的概念自动归类到外部本体——金融行业业务本体——中最相关的上位词概念。我们提出的方法将给定概念映射到上述本体,并通过图搜索寻找最相关的上位词;同时辅以词向量化方法和机器学习分类器,为每个概念给出按相关性排序的标签列表。 摘要:Ontologies are increasingly used for machine reasoning over the last few years. They can provide explanations of concepts or be used for concept classification if there exists a mapping from the desired labels to the relevant ontology. Another advantage of using ontologies is that they do not need a learning process, meaning that we do not need the train data or time before using them. This paper presents a practical use of an ontology for a classification problem from the financial domain. It first transforms a given ontology to a graph and proceeds with generalization with the aim to find common semantic descriptions of the input sets of financial concepts. We present a solution to the shared task on Learning Semantic Similarities for the Financial Domain (FinSim-2 task). The task is to design a system that can automatically classify concepts from the Financial domain into the most relevant hypernym concept in an external ontology - the Financial Industry Business Ontology. We propose a method that maps given concepts to the mentioned ontology and performs a graph search for the most relevant hypernyms. We also employ a word vectorization method and a machine learning classifier to supplement the method with a ranked list of labels for each concept.

【41】 On the Capabilities of Pointer Networks for Deep Deductive Reasoning 标题:论指针网络的深度演绎推理能力

作者:Monireh Ebrahimi,Aaron Eberhart,Pascal Hitzler 机构:Department of Computer Science, Kansas State University 备注:14 pages, 1 figures 链接:https://arxiv.org/abs/2106.09225 摘要:构建能够学会推理的神经网络,其重要性在神经符号社区已得到充分认识。本文应用神经指针网络对符号知识库进行推理。在此过程中,我们探讨了编码器-解码器体系结构总体上、尤其是指针网络在开发准确、可泛化且鲁棒的神经符号推理器方面的优点与局限。实验结果表明,指针网络在多个推理任务中表现出色,并以显著优势超过了此前报道的最新水平。我们观察到,即使面对其从未遇到过的领域/词汇的知识图谱,指针网络也能保持其性能。据我们所知,这是首个使用指针网络进行神经符号推理的研究。我们希望这些在推理问题上令人瞩目的结果,能鼓励学界更广泛地探索指针网络在更复杂逻辑及其他神经符号问题上的推理能力。 摘要:The importance of building neural networks that can learn to reason has been well recognized in the neuro-symbolic community. In this paper, we apply neural pointer networks for conducting reasoning over symbolic knowledge bases. In doing so, we explore the benefits and limitations of encoder-decoder architectures in general and pointer networks in particular for developing accurate, generalizable and robust neuro-symbolic reasoners. Based on our experimental results, pointer networks performs remarkably well across multiple reasoning tasks while outperforming the previously reported state of the art by a significant margin. We observe that the Pointer Networks preserve their performance even when challenged with knowledge graphs of the domain/vocabulary it has never encountered before. To the best of our knowledge, this is the first study on neuro-symbolic reasoning using Pointer Networks. We hope our impressive results on these reasoning problems will encourage broader exploration of pointer networks' capabilities for reasoning over more complex logics and for other neuro-symbolic problems.

【42】 Long-Short Temporal Contrastive Learning of Video Transformers 标题:视频Transformer的长短时间对比学习

作者:Jue Wang,Gedas Bertasius,Du Tran,Lorenzo Torresani 机构: Facebook AI, Dartmouth College 备注:Technical report 链接:https://arxiv.org/abs/2106.09212 摘要:视频Transformer最近已成为视频理解领域中3D CNN的有力替代方案。然而,由于参数量巨大且归纳偏置较弱,这类模型需要在大规模图像数据集上进行有监督预训练才能达到最佳性能。本文通过实验证明:仅在视频数据集上对视频Transformer进行自监督预训练,其动作识别结果可以与在大规模图像数据集(甚至像ImageNet-21K这样的海量数据集)上的有监督预训练相当甚至更好。由于基于Transformer的模型能够有效捕获较长时间跨度上的依赖关系,我们提出了一个简单的学习过程,迫使模型将同一视频的长时视图与短时视图相匹配。我们的方法称为长短时对比学习(LSTCL),它通过预测从更长时间范围内捕获的时间上下文,使视频Transformer学到有效的片段级表示。为了证明结论的普适性,我们在三种不同的自监督对比学习框架(MoCo v3、BYOL、SimSiam)下,使用两种不同的视频Transformer结构(包括一种加入时空注意力的改进版Swin Transformer)实现并验证了该方法。我们进行了深入的消融研究,结果表明LSTCL在多个视频基准上取得了有竞争力的性能,是有监督的基于图像预训练的一个令人信服的替代方案。 摘要:Video transformers have recently emerged as a competitive alternative to 3D CNNs for video understanding. However, due to their large number of parameters and reduced inductive biases, these models require supervised pretraining on large-scale image datasets to achieve top performance. In this paper, we empirically demonstrate that self-supervised pretraining of video transformers on video-only datasets can lead to action recognition results that are on par or better than those obtained with supervised pretraining on large-scale image datasets, even massive ones such as ImageNet-21K. Since transformer-based models are effective at capturing dependencies over extended temporal spans, we propose a simple learning procedure that forces the model to match a long-term view to a short-term view of the same video. Our approach, named Long-Short Temporal Contrastive Learning (LSTCL), enables video transformers to learn an effective clip-level representation by predicting temporal context captured from a longer temporal extent. To demonstrate the generality of our findings, we implement and validate our approach under three different self-supervised contrastive learning frameworks (MoCo v3, BYOL, SimSiam) using two distinct video-transformer architectures, including an improved variant of the Swin Transformer augmented with space-time attention. We conduct a thorough ablation study and show that LSTCL achieves competitive performance on multiple video benchmarks and represents a convincing alternative to supervised image-based pretraining.

【43】 Amortized Auto-Tuning: Cost-Efficient Transfer Optimization for Hyperparameter Recommendation 标题:摊余自动调谐:超参数推荐的低成本传输优化

作者:Yuxin Xiao,Eric P. Xing,Willie Neiswanger 机构:Carnegie Mellon University,Stanford University,Petuum,MBZUAI 链接:https://arxiv.org/abs/2106.09179 摘要:随着现代机器学习模型的超参数数量和训练时间的激增,超参数调优的代价越来越高。尽管已有方法提出通过知识迁移来加速调优,但它们通常需要超参数的最终性能,且不关注低保真度信息。这种常见做法是次优的,可能造成不必要的资源消耗。更具成本效益的做法是,利用低保真度的调优观测来度量任务间相似性,并据此将知识从已有任务迁移到新任务。然而,在迁移设定下进行多保真度调优也有其自身的挑战:额外观测中的噪声以及对性能预测的需求。因此,我们对多任务多保真度贝叶斯优化框架进行了深入分析,从而得到其最佳实例化——摊余自动调谐(AT2)。我们进一步发布了一个离线计算的、包含27个任务的超参数推荐(HyperRec)数据库以服务社区。在HyperRec和其他真实数据库上的大量实验表明了我们AT2方法的有效性。 摘要:With the surge in the number of hyperparameters and training times of modern machine learning models, hyperparameter tuning is becoming increasingly expensive. Although methods have been proposed to speed up tuning via knowledge transfer, they typically require the final performance of hyperparameters and do not focus on low-fidelity information. Nevertheless, this common practice is suboptimal and can incur an unnecessary use of resources. It is more cost-efficient to instead leverage the low-fidelity tuning observations to measure inter-task similarity and transfer knowledge from existing to new tasks accordingly. However, performing multi-fidelity tuning comes with its own challenges in the transfer setting: the noise in the additional observations and the need for performance forecasting. Therefore, we conduct a thorough analysis of the multi-task multi-fidelity Bayesian optimization framework, which leads to the best instantiation--amortized auto-tuning (AT2). We further present an offline-computed 27-task hyperparameter recommendation (HyperRec) database to serve the community. Extensive experiments on HyperRec and other real-world databases illustrate the effectiveness of our AT2 method.

【44】 Can I Be of Further Assistance? Using Unstructured Knowledge Access to Improve Task-oriented Conversational Modeling 标题:我能为您做些什么吗?利用非结构化知识获取改进面向任务的会话建模

作者:Di Jin,Seokhwan Kim,Dilek Hakkani-Tur 机构:Amazon Alexa AI 备注:Presented as a DIALDOC workshop paper at ACL 2021 链接:https://arxiv.org/abs/2106.09174 摘要:以往大多数面向任务的对话系统工作都局限于有限的领域API覆盖范围。然而,用户常常提出超出这些API范围的请求。本工作的重点是通过引入外部的非结构化知识源,来回应这些超出API覆盖范围的用户话轮。我们的方法以流水线方式工作,依次进行知识寻求话轮检测、知识选择和响应生成。我们为前两个步骤引入了新的数据增强方法,并证明利用从对话上下文中提取的信息可以提升知识选择与端到端性能。通过实验,我们在DSTC9 Track 1基准数据集上的自动和人工评估指标均达到最新水平,验证了我们贡献的有效性。 摘要:Most prior work on task-oriented dialogue systems are restricted to limited coverage of domain APIs. However, users oftentimes have requests that are out of the scope of these APIs. This work focuses on responding to these beyond-API-coverage user turns by incorporating external, unstructured knowledge sources. Our approach works in a pipelined manner with knowledge-seeking turn detection, knowledge selection, and response generation in sequence. We introduce novel data augmentation methods for the first two steps and demonstrate that the use of information extracted from dialogue context improves the knowledge selection and end-to-end performances. Through experiments, we achieve state-of-the-art performance for both automatic and human evaluation metrics on the DSTC9 Track 1 benchmark dataset, validating the effectiveness of our contributions.

【45】 Work in Progress: Mobile or FPGA? A Comprehensive Evaluation on Energy Efficiency and a Unified Optimization Framework 标题:进行中的工作:移动还是FPGA?能效综合评价与统一优化框架

作者:Geng Yuan,Peiyan Dong,Mengshu Sun,Wei Niu,Zhengang Li,Yuxuan Cai,Jun Liu,Weiwen Jiang,Xue Lin,Bin Ren,Xulong Tang,Yanzhi Wang 机构:Northeastern University,College of William and Mary,Carnegie Mellon University, University of Notre Dame,University of Pittsburgh 备注:Poster in the 27th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), 2021 链接:https://arxiv.org/abs/2106.09166 摘要:深度神经网络(DNN)在边缘设备(即fpga和移动平台)上的有效部署是非常具有挑战性的,尤其是在DNN模型规模和复杂性不断增加的情况下。虽然各种优化方法已经被证明在许多边缘设备上的dnn中是有效的,但是大多数最新的工作集中在ad-hoc优化上,并且没有深入的研究来全面揭示不同边缘设备在考虑不同优化时的潜力和限制。本文对基于FPGA和基于移动的DNN执行的能量效率进行了定性和定量的比较,并进行了详细的分析。 摘要:Efficient deployment of Deep Neural Networks (DNNs) on edge devices (i.e., FPGAs and mobile platforms) is very challenging, especially under a recent witness of the increasing DNN model size and complexity. Although various optimization approaches have been proven to be effective in many DNNs on edge devices, most state-of-the-art work focuses on ad-hoc optimizations, and there lacks a thorough study to comprehensively reveal the potentials and constraints of different edge devices when considering different optimizations. In this paper, we qualitatively and quantitatively compare the energy-efficiency of FPGA-based and mobile-based DNN executions, and provide detailed analysis.

【46】 mPyPl: Python Monadic Pipeline Library for Complex Functional Data Processing 标题:mPyPl:用于复杂函数式数据处理的Python单子流水线库

作者:Dmitry Soshnikov,Yana Valieva 备注:Published in Microsoft Journal of Applied Research, Dec.2019., Vol. 12 链接:https://arxiv.org/abs/2106.09164 摘要:在本文中,我们提出了一个名为mPyPl的新Python库,旨在用函数式方法简化复杂的数据处理任务。该库定义了对以生成器表示的命名字典惰性数据流(所谓的多字段数据流)的操作,并允许在数据准备和特征提取过程中用更多的"字段"来丰富这些数据流。因此,大多数数据准备任务都可以表示为简洁的线性"管道"形式,其语法类似于UNIX管道,或F#中的|>函数组合运算符。我们定义了多字段数据流上的基本操作,这些操作类似于经典的单子操作,并展示了所提方法与函数式编程中单子的相似性。我们还展示了该库如何用于视频事件检测这一复杂的深度学习任务,并讨论了在内存与性能之间进行不同折衷的多种求值策略。 摘要:In this paper, we present a new Python library called mPyPl, which is intended to simplify complex data processing tasks using functional approach. This library defines operations on lazy data streams of named dictionaries represented as generators (so-called multi-field datastreams), and allows enriching those data streams with more 'fields' in the process of data preparation and feature extraction. Thus, most data preparation tasks can be expressed in the form of neat linear 'pipeline', similar in syntax to UNIX pipes, or |> functional composition operator in F#. We define basic operations on multi-field data streams, which resemble classical monadic operations, and show similarity of the proposed approach to monads in functional programming. We also show how the library was used in complex deep learning tasks of event detection in video, and discuss different evaluation strategies that allow for different compromises in terms of memory and performance.
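摘要中"命名字典的惰性数据流 + 线性管道"的思想,可以用普通Python生成器简单示意如下(注意:这并非mPyPl的真实API,算子名均为假设,仅为概念草图):

```python
class Pipe:
    """极简的惰性管道算子:包装一个作用于生成器的函数,
    使各阶段可以用 | 运算符串联(模仿 UNIX 管道 / F# 的 |>)。"""
    def __init__(self, fn):
        self.fn = fn
    def __ror__(self, stream):          # stream | Pipe(fn) -> fn(stream)
        return self.fn(stream)

def add_field(name, fn):
    """管道阶段:为数据流中的每个字典惰性地添加一个计算字段。"""
    def run(stream):
        for item in stream:
            yield dict(item, **{name: fn(item)})
    return Pipe(run)

# 用法:对记录流逐步"丰富字段",整个过程保持惰性求值。
data = ({'x': i} for i in range(3))
out = data | add_field('sq', lambda d: d['x'] ** 2) \
           | add_field('label', lambda d: 'even' if d['x'] % 2 == 0 else 'odd')
result = list(out)
```

由于每个阶段都返回生成器,数据逐条流过整条管道,不会一次性物化到内存中,这正是此类库在处理大规模视频/特征流时的关键设计。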

【47】 Contrastive Reinforcement Learning of Symbolic Reasoning Domains 标题:符号推理域的对比强化学习

作者:Gabriel Poesia,WenXin Dong,Noah Goodman 机构:Stanford University, Noah D. Goodman 链接:https://arxiv.org/abs/2106.09146 摘要:抽象符号推理是人类智能的重要组成部分,是数学、逻辑等领域所必需的能力。这些领域的求解器有着重要的应用,尤其是在计算机辅助教育中。但对机器学习算法而言,学会求解符号问题极具挑战性。现有模型要么从人类解答中学习,要么使用手工设计的特征,这使它们在新领域的应用成本高昂。本文转而将符号领域视为简单的环境:状态和动作以非结构化文本给出,二元奖励指示问题是否被解决。这种灵活的设定使得指定新领域变得容易,但搜索和规划随之变得困难。我们介绍了四个受数学共同核心课程启发的环境,并观察到现有强化学习基线表现不佳。随后我们提出一种新的学习算法——对比策略学习(ConPoLe),它显式优化InfoNCE损失;该损失是当前状态与通往解的路径上后继状态之间互信息的下界。ConPoLe成功求解了全部四个领域。此外,ConPoLe学到的问题表示能够准确预测真实数学课程中问题的类别。我们的结果为符号领域的强化学习以及数学教育应用指出了新的方向。 摘要:Abstract symbolic reasoning, as required in domains such as mathematics and logic, is a key component of human intelligence. Solvers for these domains have important applications, especially to computer-assisted education. But learning to solve symbolic problems is challenging for machine learning algorithms. Existing models either learn from human solutions or use hand-engineered features, making them expensive to apply in new domains. In this paper, we instead consider symbolic domains as simple environments where states and actions are given as unstructured text, and binary rewards indicate whether a problem is solved. This flexible setup makes it easy to specify new domains, but search and planning become challenging. We introduce four environments inspired by the Mathematics Common Core Curriculum, and observe that existing Reinforcement Learning baselines perform poorly. We then present a novel learning algorithm, Contrastive Policy Learning (ConPoLe) that explicitly optimizes the InfoNCE loss, which lower bounds the mutual information between the current state and next states that continue on a path to the solution. ConPoLe successfully solves all four domains. Moreover, problem representations learned by ConPoLe enable accurate prediction of the categories of problems in a real mathematics curriculum. Our results suggest new directions for reinforcement learning in symbolic domains, as well as applications to mathematics education.
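ConPoLe优化的InfoNCE损失有标准形式:对一个锚点,把解路径上的后继状态当作正样本、其余状态当作负样本,最小化 -log(exp(s⁺/τ) / Σ exp(s/τ))。下面给出单个锚点的InfoNCE计算草图(这里直接用向量表示状态编码,属于示意性假设):

```python
import math

def info_nce(anchor, positive, negatives, temperature=0.1):
    """单锚点 InfoNCE:-log( e^{s+/t} / sum_j e^{s_j/t} ),
    其中 s 为余弦相似度;最小化该损失即最大化锚点与正样本
    之间互信息的一个下界。"""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) *
                      math.sqrt(sum(b * b for b in v)))
    logits = [cos(anchor, positive) / temperature] + \
             [cos(anchor, n) / temperature for n in negatives]
    m = max(logits)                        # log-sum-exp 数值稳定化
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]               # 即 -(s+/t - log Z)
```

当锚点与正样本相似度越高、与负样本相似度越低时,损失越小;温度τ控制分布的尖锐程度。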

【48】 Explainable AI for Natural Adversarial Images 标题:自然对抗性图像的可解释人工智能

作者:Tomas Folke,ZhaoBin Li,Ravi B. Sojitra,Scott Cheng-Hsin Yang,Patrick Shafto 机构:Department of of Mathematics and Computer Science, Rutgers University, Newark, NJ , USA 链接:https://arxiv.org/abs/2106.09106 摘要:对抗图像凸显了现代图像分类器对训练集之外的扰动何等脆弱。人工监督或许能缓解这一弱点,但前提是人类对AI的理解足以预测它何时可能出错。在此前的工作中,我们发现人类倾向于假定AI的决策过程与自己的决策过程一致。在本文中,我们评估可解释人工智能的方法能否打破这一假定,从而帮助参与者预测AI对对抗图像和标准图像的分类。我们发现显著性图和示例都有助于发现AI的错误,但二者的效果并不可加,且显著性图比示例更有效。 摘要:Adversarial images highlight how vulnerable modern image classifiers are to perturbations outside of their training set. Human oversight might mitigate this weakness, but depends on humans understanding the AI well enough to predict when it is likely to make a mistake. In previous work we have found that humans tend to assume that the AI's decision process mirrors their own. Here we evaluate if methods from explainable AI can disrupt this assumption to help participants predict AI classifications for adversarial and standard images. We find that both saliency maps and examples facilitate catching AI errors, but their effects are not additive, and saliency maps are more effective than examples.
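作为背景,基于梯度的显著性图的基本思想是度量分类得分对每个输入特征的敏感度。下面用有限差分给出一个极小的纯Python示意(玩具打分函数是假设的;真实图像场景通常通过反向传播求像素梯度):

```python
def saliency_map(score_fn, x, eps=1e-4):
    """有限差分显著性:对每个输入特征,计算分类得分
    对该特征微小扰动的敏感度(梯度幅值的近似)。"""
    base = score_fn(x)
    sal = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += eps
        sal.append(abs(score_fn(xp) - base) / eps)
    return sal

# 玩具"分类器":得分主要取决于特征0,轻微取决于特征2。
score = lambda v: 3.0 * v[0] + 0.1 * v[2]
sal = saliency_map(score, [0.5, 0.5, 0.5])  # 特征0最显著
```

把逐像素的敏感度渲染成热力图,就得到论文中呈现给参与者的显著性图。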

【49】 Learned Belief Search: Efficiently Improving Policies in Partially Observable Settings 标题:学习型信念搜索:部分可观测环境下策略的有效改进

作者:Hengyuan Hu,Adam Lerer,Noam Brown,Jakob Foerster 链接:https://arxiv.org/abs/2106.09086 摘要:搜索是在单智能体和多智能体环境中计算有效策略的重要工具,对于在多个完全可观测与部分可观测的基准博弈中实现超人表现至关重要。然而,先前针对部分可观测环境的搜索方法有一个主要局限:其计算代价随隐藏信息量的增长而急剧上升。本文提出了一种面向部分可观测环境的高计算效率搜索过程——学习型信念搜索(Learned Belief Search,LBS)。LBS不维护精确的信念分布,而是使用一个以监督方式学习得到的近似自回归反事实信念。在多智能体设定中,LBS为底层策略采用了一种新颖的公共-私有模型体系结构,以便在rollout过程中高效地评估这些策略。在基准领域Hanabi中,LBS可以获得精确搜索55%~91%的收益,同时将计算需求降低35.8倍~4.6倍,使其能够扩展到以前的搜索方法无法企及的更大规模设定。 摘要:Search is an important tool for computing effective policies in single- and multi-agent environments, and has been crucial for achieving superhuman performance in several benchmark fully and partially observable games. However, one major limitation of prior search approaches for partially observable environments is that the computational cost scales poorly with the amount of hidden information. In this paper we present \emph{Learned Belief Search} (LBS), a computationally efficient search procedure for partially observable environments. Rather than maintaining an exact belief distribution, LBS uses an approximate auto-regressive counterfactual belief that is learned as a supervised task. In multi-agent settings, LBS uses a novel public-private model architecture for underlying policies in order to efficiently evaluate these policies during rollouts. In the benchmark domain of Hanabi, LBS can obtain 55% ~ 91% of the benefit of exact search while reducing compute requirements by $35.8 \times$ ~ $4.6 \times$, allowing it to scale to larger settings that were inaccessible to previous search methods.

【50】 Unsupervised Video Prediction from a Single Frame by Estimating 3D Dynamic Scene Structure 标题:基于三维动态场景结构估计的单帧无监督视频预测

作者:Paul Henderson,Christoph H. Lampert,Bernd Bickel 机构:Institute of Science and Technology (IST) Austria 链接:https://arxiv.org/abs/2106.09051 摘要:本工作的目标是在仅给定一个初始帧作为输入的情况下生成逼真的视频。现有的无监督方法没有考虑到:视频通常呈现的是一个三维环境,即使相机和物体在移动,场景在帧与帧之间也应保持连贯。为此,我们构建了一个模型,先估计场景的潜在三维结构,包括对所有运动物体的分割;随后通过模拟物体和相机的动态并渲染所得视图来预测未来帧。重要的是,该模型仅使用预测未来帧这一无监督目标进行端到端训练,不需要任何三维信息或分割标注。在两个具有挑战性的自然视频数据集上的实验表明,我们的模型能够从单帧中估计三维结构和运动分割,从而生成合理且多样的预测。 摘要:Our goal in this work is to generate realistic videos given just one initial frame as input. Existing unsupervised approaches to this task do not consider the fact that a video typically shows a 3D environment, and that this should remain coherent from frame to frame even as the camera and objects move. We address this by developing a model that first estimates the latent 3D structure of the scene, including the segmentation of any moving objects. It then predicts future frames by simulating the object and camera dynamics, and rendering the resulting views. Importantly, it is trained end-to-end using only the unsupervised objective of predicting future frames, without any 3D information nor segmentation annotations. Experiments on two challenging datasets of natural videos show that our model can estimate 3D structure and motion segmentation from a single frame, and hence generate plausible and varied predictions.

【51】 A Short Note of PAGE: Optimal Convergence Rates for Nonconvex Optimization 标题:PAGE的一个简短注记:非凸优化的最优收敛速度

作者:Zhize Li 机构:KAUST 备注:4 pages 链接:https://arxiv.org/abs/2106.09663 摘要:在本文中,我们首先回顾非凸问题设定,并介绍最优的PAGE算法(Li等人,ICML'21)。然后,我们对PAGE给出一个简洁明了的收敛性分析,证明其达到最优收敛速度。此外,PAGE及其分析可以很容易地被其他工作采用和推广。希望本文能为后续工作提供启示和帮助。 摘要:In this note, we first recall the nonconvex problem setting and introduce the optimal PAGE algorithm (Li et al., ICML'21). Then we provide a simple and clean convergence analysis of PAGE for achieving optimal convergence rates. Moreover, PAGE and its analysis can be easily adopted and generalized to other works. We hope that this note provides the insights and is helpful for future works.
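PAGE梯度估计器的核心规则:以概率p重新计算一次全梯度,否则复用上一步的估计并加上小批量修正 g_{t+1} = g_t + ∇f_b(x_{t+1}) − ∇f_b(x_t)。下面在一个有限和二次目标上给出纯Python示意(步长、概率、步数等超参数均为随意选取的假设):

```python
import random

def page_sgd(a, x0, lr=0.1, p=0.5, steps=200, seed=0):
    """在 f(x) = (1/n) * sum_i (x - a_i)^2 / 2 上运行 PAGE:
    以概率 p 计算一次全梯度,否则用单样本修正项更新梯度估计
    (方差缩减),期望意义下保持无偏。"""
    rng = random.Random(seed)
    n = len(a)
    grad_i = lambda x, i: x - a[i]                       # 单样本梯度
    full_grad = lambda x: sum(x - ai for ai in a) / n    # 全梯度
    x, g = x0, full_grad(x0)
    for _ in range(steps):
        x_new = x - lr * g
        if rng.random() < p:
            g = full_grad(x_new)                 # 偶尔做一次全梯度
        else:
            i = rng.randrange(n)                 # 廉价的单样本修正
            g = g + grad_i(x_new, i) - grad_i(x, i)
        x = x_new
    return x

x_star = page_sgd([1.0, 2.0, 3.0], x0=10.0)  # 收敛到均值 2.0 附近
```

对这个二次目标,单样本修正恰好是精确的,梯度估计始终等于全梯度;一般情形下修正仅在期望上无偏,这正是PAGE以低成本逼近全梯度、获得最优收敛速度的方式。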

原始发表:2021-06-18

本文分享自 arXiv每日学术速递 微信公众号。
