
Machine Learning arXiv Digest [9.10]

By 公众号-arXiv每日学术速递 | Published 2021-09-16 16:55:51

Update! The H5 page now supports collapsible abstracts for a better reading experience! Click "Read the original" to visit arxivdaily.com, which covers CS | Physics | Mathematics | Economics | Statistics | Finance | Biology | Electrical Engineering, and offers search, bookmarking, and more!

cs.LG: 115 papers in total today.

Graph (graph learning | graph neural networks | graph optimization, etc.) (9 papers)

【1】 fGOT: Graph Distances based on Filters and Optimal Transport
Link: https://arxiv.org/abs/2109.04442

Authors: Hermina Petric Maretic, Mireille El Gheche, Giovanni Chierchia, Pascal Frossard
Affiliations: EPFL, LTS, Lausanne, Switzerland; Sony AI, Zürich, Switzerland; Université Paris-Est, LIGM (UMR), CNRS, ENPC, ESIEE Paris, UPEM, Noisy-le-Grand, France
Abstract: Graph comparison deals with identifying similarities and dissimilarities between graphs. A major obstacle is the unknown alignment of graphs, as well as the lack of accurate and inexpensive comparison metrics. In this work we introduce the filter graph distance. It is an optimal transport based distance which drives graph comparison through the probability distribution of filtered graph signals. This creates a highly flexible distance, capable of prioritising different spectral information in observed graphs, offering a wide range of choices for a comparison metric. We tackle the problem of graph alignment by computing graph permutations that minimise our new filter distances, which implicitly solves the graph comparison problem. We then propose a new approximate cost function that circumvents many computational difficulties inherent to graph comparison and permits the exploitation of fast algorithms such as mirror gradient descent, without grossly sacrificing the performance. We finally propose a novel algorithm derived from a stochastic version of mirror gradient descent, which accommodates the non-convexity of the alignment problem, offering a good trade-off between performance accuracy and speed. The experiments on graph alignment and classification show that the flexibility gained through filter graph distances can have a significant impact on performance, while the difference in speed offered by the approximation cost makes the framework applicable in practical settings.
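
To make the core idea concrete, here is a minimal sketch of a filter-based graph distance: white noise is passed through a graph filter (a heat kernel is assumed here; the paper supports a family of filters), and two pre-aligned, equal-size graphs are compared via the Wasserstein-2 distance between the resulting zero-mean Gaussians. The paper's actual contribution, optimising over permutations for alignment, is omitted; function names are illustrative, not the authors' released code.

```python
import numpy as np
from scipy.linalg import expm, sqrtm

def filtered_covariance(L, t=1.0):
    # Covariance of white noise passed through the filter g(L) = exp(-t L):
    # x = g(L) w with w ~ N(0, I), so Cov(x) = g(L) g(L)^T.
    g = expm(-t * L)
    return g @ g.T

def filter_graph_distance_sq(L1, L2, t=1.0):
    # Squared Wasserstein-2 distance between N(0, S1) and N(0, S2),
    # the distributions of filtered signals on the two graphs.
    S1, S2 = filtered_covariance(L1, t), filtered_covariance(L2, t)
    root = sqrtm(sqrtm(S2) @ S1 @ sqrtm(S2))
    return float(np.trace(S1) + np.trace(S2) - 2.0 * np.real(np.trace(root)))
```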

【2】 Cross-lingual Transfer for Text Classification with Dictionary-based Heterogeneous Graph
Link: https://arxiv.org/abs/2109.04400

Authors: Nuttapong Chairatanakul, Noppayut Sriwatanasakdi, Nontawat Charoenphakdee, Xin Liu, Tsuyoshi Murata
Affiliations: Tokyo Institute of Technology; RWBC-OIL, AIST; Asurion Japan Holdings G.K.; The University of Tokyo; RIKEN AIP; AIRC, AIST
Note: Published in Findings of EMNLP 2021
Abstract: In cross-lingual text classification, it is required that task-specific training data in high-resource source languages are available, where the task is identical to that of a low-resource target language. However, collecting such training data can be infeasible because of the labeling cost, task characteristics, and privacy concerns. This paper proposes an alternative solution that uses only task-independent word embeddings of high-resource languages and bilingual dictionaries. First, we construct a dictionary-based heterogeneous graph (DHG) from bilingual dictionaries. This opens the possibility to use graph neural networks for cross-lingual transfer. The remaining challenge is the heterogeneity of DHG because multiple languages are considered. To address this challenge, we propose dictionary-based heterogeneous graph neural network (DHGNet) that effectively handles the heterogeneity of DHG by two-step aggregations, which are word-level and language-level aggregations. Experimental results demonstrate that our method outperforms pretrained models even though it does not have access to large corpora. Furthermore, it can perform well even though dictionaries contain many incorrect translations. Its robustness allows the usage of a wider range of dictionaries such as an automatically constructed dictionary and crowdsourced dictionary, which are convenient for real-world applications.
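
The two-step aggregation can be sketched very simply. The toy below uses mean pooling as a stand-in for DHGNet's learned aggregators: translation neighbours are first pooled within each language (word level), then the per-language summaries are pooled (language level). Names and the pooling choice are illustrative assumptions, not the paper's exact operators.

```python
import numpy as np

def two_step_aggregation(node_feats, neighbors_by_lang):
    # node_feats: (n_nodes, d) embedding matrix over the heterogeneous graph.
    # neighbors_by_lang: {language: [neighbor indices]} for one target word node.
    # Step 1 (word level): pool translation neighbours within each language.
    per_lang = [node_feats[idx].mean(axis=0)
                for idx in neighbors_by_lang.values() if len(idx) > 0]
    # Step 2 (language level): pool the per-language summaries.
    return np.mean(per_lang, axis=0)
```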

【3】 GNisi: A graph network for reconstructing Ising models from multivariate binarized data
Link: https://arxiv.org/abs/2109.04257

Authors: Emma Slade, Sonya Kiselgof, Lena Granovsky, Jeremy L. England
Affiliations: GSK.ai, London, UK; School of Physics, Georgia Institute of Technology, Atlanta, GA, USA; GSK.ai, Basel St, Petah Tikvah, Israel
Note: 17 pages
Abstract: Ising models are a simple generative approach to describing interacting binary variables. They have proven useful in a number of biological settings because they enable one to represent observed many-body correlations as the separable consequence of many direct, pairwise statistical interactions. The inference of Ising models from data can be computationally very challenging and often one must be satisfied with numerical approximations or limited precision. In this paper we present a novel method for the determination of Ising parameters from data, called GNisi, which uses a Graph Neural network trained on known Ising models in order to construct the parameters for unseen data. We show that GNisi is more accurate than the existing state of the art software, and we illustrate our method by applying GNisi to gene expression data.

【4】 Relating Graph Neural Networks to Structural Causal Models
Link: https://arxiv.org/abs/2109.04173

Authors: Matej Zečević, Devendra Singh Dhami, Petar Veličković, Kristian Kersting
Affiliations: Computer Science Department, TU Darmstadt; DeepMind
Note: Main paper: 7 pages, references: 2 pages, appendix: 10 pages; main paper: 5 figures, appendix: 3 figures
Abstract: Causality can be described in terms of a structural causal model (SCM) that carries information on the variables of interest and their mechanistic relations. For most processes of interest the underlying SCM will only be partially observable, thus causal inference tries to leverage any exposed information. Graph neural networks (GNN) as universal approximators on structured input pose a viable candidate for causal learning, suggesting a tighter integration with SCM. To this effect we present a theoretical analysis from first principles that establishes a novel connection between GNN and SCM while providing an extended view on general neural-causal models. We then establish a new model class for GNN-based causal inference that is necessary and sufficient for causal effect identification. Our empirical illustration on simulations and standard benchmarks validate our theoretical proofs.

【5】 Single Image 3D Object Estimation with Primitive Graph Networks
Link: https://arxiv.org/abs/2109.04153

Authors: Qian He, Desen Zhou, Bo Wan, Xuming He
Affiliations: School of Information Science and Technology, ShanghaiTech University; Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Note: Accepted by ACM MM'21
Abstract: Reconstructing 3D object from a single image (RGB or depth) is a fundamental problem in visual scene understanding and yet remains challenging due to its ill-posed nature and complexity in real-world scenes. To address those challenges, we adopt a primitive-based representation for 3D object, and propose a two-stage graph network for primitive-based 3D object estimation, which consists of a sequential proposal module and a graph reasoning module. Given a 2D image, our proposal module first generates a sequence of 3D primitives from input image with local feature attention. Then the graph reasoning module performs joint reasoning on a primitive graph to capture the global shape context for each primitive. Such a framework is capable of taking into account rich geometry and semantic constraints during 3D structure recovery, producing 3D objects with more coherent structure even under challenging viewing conditions. We train the entire graph neural network in a stage-wise strategy and evaluate it on three benchmarks: Pix3D, ModelNet and NYU Depth V2. Extensive experiments show that our approach outperforms the previous state of the arts with a considerable margin.

【6】 TimeTraveler: Reinforcement Learning for Temporal Knowledge Graph Forecasting
Link: https://arxiv.org/abs/2109.04101

Authors: Haohai Sun, Jialun Zhong, Yunpu Ma, Zhen Han, Kun He
Affiliations: School of Computer Science and Technology, Huazhong University of Science and Technology; Institute of Informatics, LMU Munich; Corporate Technology, Siemens AG
Note: EMNLP 2021
Abstract: Temporal knowledge graph (TKG) reasoning is a crucial task that has gained increasing research interest in recent years. Most existing methods focus on reasoning at past timestamps to complete the missing facts, and there are only a few works of reasoning on known TKGs to forecast future facts. Compared with the completion task, the forecasting task is more difficult that faces two main challenges: (1) how to effectively model the time information to handle future timestamps? (2) how to make inductive inference to handle previously unseen entities that emerge over time? To address these challenges, we propose the first reinforcement learning method for forecasting. Specifically, the agent travels on historical knowledge graph snapshots to search for the answer. Our method defines a relative time encoding function to capture the timespan information, and we design a novel time-shaped reward based on Dirichlet distribution to guide the model learning. Furthermore, we propose a novel representation method for unseen entities to improve the inductive inference ability of the model. We evaluate our method for this link prediction task at future timestamps. Extensive experiments on four benchmark datasets demonstrate substantial performance improvement meanwhile with higher explainability, less calculation, and fewer parameters when compared with existing state-of-the-art methods.

【7】 Local Augmentation for Graph Neural Networks
Link: https://arxiv.org/abs/2109.03856

Authors: Songtao Liu, Hanze Dong, Lanqing Li, Tingyang Xu, Yu Rong, Peilin Zhao, Junzhou Huang, Dinghao Wu
Affiliations: College of Information Sciences and Technology, The Pennsylvania State University; Department of Mathematics, The Hong Kong University of Science and Technology; Tencent AI Lab; University of Texas at Arlington
Note: 16 pages, 5 figures
Abstract: Data augmentation has been widely used in image data and linguistic data but remains under-explored on graph-structured data. Existing methods focus on augmenting the graph data from a global perspective and largely fall into two genres: structural manipulation and adversarial training with feature noise injection. However, the structural manipulation approach suffers information loss issues while the adversarial training approach may downgrade the feature quality by injecting noise. In this work, we introduce the local augmentation, which enhances node features by its local subgraph structures. Specifically, we model the data augmentation as a feature generation process. Given the central node's feature, our local augmentation approach learns the conditional distribution of its neighbors' features and generates the neighbors' optimal feature to boost the performance of downstream tasks. Based on the local augmentation, we further design a novel framework: LA-GNN, which can apply to any GNN models in a plug-and-play manner. Extensive experiments and analyses show that local augmentation consistently yields performance improvement for various GNN architectures across a diverse set of benchmarks. Code is available at https://github.com/Soughing0823/LAGNN.
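
The feature-generation view can be illustrated with a deliberately simple conditional model. The sketch below fits a linear-Gaussian stand-in for the learned conditional distribution p(neighbor feature | central node feature) and samples synthetic neighbour features from it; the paper uses a learned generative model, so treat this as an assumption-laden toy, not LA-GNN itself.

```python
import numpy as np

def fit_neighbor_generator(center_feats, neighbor_feats):
    # Least-squares fit of neighbor ~ center @ A + noise over observed
    # (center, neighbor) feature pairs collected from the graph.
    A, *_ = np.linalg.lstsq(center_feats, neighbor_feats, rcond=None)
    sigma = (neighbor_feats - center_feats @ A).std(axis=0)
    return A, sigma

def local_augmentation(center_feat, A, sigma, rng=None):
    # Sample one synthetic neighbour feature to enrich the local subgraph
    # before feeding it to any GNN (the plug-and-play idea behind LA-GNN).
    rng = rng or np.random.default_rng()
    return center_feat @ A + rng.normal(0.0, sigma)
```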

【8】 PhysGNN: A Physics-Driven Graph Neural Network Based Model for Predicting Soft Tissue Deformation in Image-Guided Neurosurgery
Link: https://arxiv.org/abs/2109.04352

Authors: Yasmin Salehi, Dennis Giannacopoulos
Affiliations: Department of Electrical and Computer Engineering, McGill University, Canada
Note: Preprint
Abstract: Correctly capturing intraoperative brain shift in image-guided neurosurgical procedures is a critical task for aligning preoperative data with intraoperative geometry, ensuring effective surgical navigation and optimal surgical precision. While the finite element method (FEM) is a proven technique to effectively approximate soft tissue deformation through biomechanical formulations, their degree of success boils down to a trade-off between accuracy and speed. To circumvent this problem, the most recent works in this domain have proposed leveraging data-driven models obtained by training various machine learning algorithms, e.g. random forests, artificial neural networks (ANNs), with the results of finite element analysis (FEA) to speed up tissue deformation approximations by prediction. These methods, however, do not account for the structure of the finite element (FE) mesh during training that provides information on node connectivities as well as the distance between them, which can aid with approximating tissue deformation based on the proximity of force load points with the rest of the mesh nodes. Therefore, this work proposes a novel framework, PhysGNN, a data-driven model that approximates the solution of FEA by leveraging graph neural networks (GNNs), which are capable of accounting for the mesh structural information and inductive learning over unstructured grids and complex topological structures. Empirically, we demonstrate that the proposed architecture, PhysGNN, promises accurate and fast soft tissue deformation approximations while remaining computationally feasible, suitable for neurosurgical settings.

【9】 Popularity Adjusted Block Models are Generalized Random Dot Product Graphs
Link: https://arxiv.org/abs/2109.04010

Authors: John Koo, Minh Tang, Michael W. Trosset
Affiliations: Department of Statistics, Indiana University; Department of Statistics, North Carolina State University
Note: 33 pages, 7 figures
Abstract: We connect two random graph models, the Popularity Adjusted Block Model (PABM) and the Generalized Random Dot Product Graph (GRDPG), by demonstrating that the PABM is a special case of the GRDPG in which communities correspond to mutually orthogonal subspaces of latent vectors. This insight allows us to construct new algorithms for community detection and parameter estimation for the PABM, as well as improve an existing algorithm that relies on Sparse Subspace Clustering. Using established asymptotic properties of Adjacency Spectral Embedding for the GRDPG, we derive asymptotic properties of these algorithms. In particular, we demonstrate that the absolute number of community detection errors tends to zero as the number of graph vertices tends to infinity. Simulation experiments illustrate these properties.
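
Adjacency Spectral Embedding, the workhorse behind these algorithms, is short enough to sketch. A minimal version: keep the d eigenpairs of largest magnitude (indefinite eigenvalues are what make the dot product "generalized") and scale eigenvectors by the square root of the eigenvalue magnitudes; community detection would then cluster the embedded points by the near-orthogonal subspaces they occupy (e.g., with sparse subspace clustering, not shown).

```python
import numpy as np

def adjacency_spectral_embedding(A, d):
    # GRDPG embedding X = U |Lambda|^{1/2} from the d largest-magnitude
    # eigenpairs of the (symmetric) adjacency matrix A.
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(-np.abs(vals))[:d]
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))
```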

Transformer (6 papers)

【1】 All Bark and No Bite: Rogue Dimensions in Transformer Language Models Obscure Representational Quality
Link: https://arxiv.org/abs/2109.04404

Authors: William Timkey, Marten van Schijndel
Affiliations: Department of Linguistics, Cornell University
Note: Accepted at EMNLP 2021
Abstract: Similarity measures are a vital tool for understanding how language models represent and process language. Standard representational similarity measures such as cosine similarity and Euclidean distance have been successfully used in static word embedding models to understand how words cluster in semantic space. Recently, these measures have been applied to embeddings from contextualized models such as BERT and GPT-2. In this work, we call into question the informativity of such measures for contextualized language models. We find that a small number of rogue dimensions, often just 1-3, dominate these measures. Moreover, we find a striking mismatch between the dimensions that dominate similarity measures and those which are important to the behavior of the model. We show that simple postprocessing techniques such as standardization are able to correct for rogue dimensions and reveal underlying representational quality. We argue that accounting for rogue dimensions is essential for any similarity-based analysis of contextual language models.
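
The standardization fix mentioned in the abstract is a one-liner. A minimal sketch, assuming E is a matrix of contextual embeddings collected from a corpus sample:

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def standardize(E):
    # Z-score each hidden dimension across the sample; this mutes the handful
    # of rogue dimensions whose extreme scale would otherwise dominate
    # cosine similarity and Euclidean distance.
    return (E - E.mean(axis=0)) / (E.std(axis=0) + 1e-8)

# E: (n_tokens, hidden_dim) array of contextual embeddings
# raw   = cosine(E[0], E[1])
# fixed = cosine(standardize(E)[0], standardize(E)[1])
```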

【2】 UCTransNet: Rethinking the Skip Connections in U-Net from a Channel-wise Perspective with Transformer
Link: https://arxiv.org/abs/2109.04335

Authors: Haonan Wang, Peng Cao, Jiaqi Wang, Osmar R. Zaiane
Affiliations: College of Computer Science and Engineering, Key Laboratory of Intelligent Computing in Medical Image, Northeastern University, Shenyang, China; Amii, University of Alberta, Edmonton, Canada
Abstract: Most recent semantic segmentation methods adopt a U-Net framework with an encoder-decoder architecture. It is still challenging for U-Net with a simple skip connection scheme to model the global multi-scale context: 1) Not each skip connection setting is effective due to the issue of incompatible feature sets of encoder and decoder stage, even some skip connection negatively influence the segmentation performance; 2) The original U-Net is worse than the one without any skip connection on some datasets. Based on our findings, we propose a new segmentation framework, named UCTransNet (with a proposed CTrans module in U-Net), from the channel perspective with attention mechanism. Specifically, the CTrans module is an alternate of the U-Net skip connections, which consists of a sub-module to conduct the multi-scale Channel Cross fusion with Transformer (named CCT) and a sub-module Channel-wise Cross-Attention (named CCA) to guide the fused multi-scale channel-wise information to effectively connect to the decoder features for eliminating the ambiguity. Hence, the proposed connection consisting of the CCT and CCA is able to replace the original skip connection to solve the semantic gaps for an accurate automatic medical image segmentation. The experimental results suggest that our UCTransNet produces more precise segmentation performance and achieves consistent improvements over the state-of-the-art for semantic segmentation across different datasets and conventional architectures involving transformer or U-shaped framework. Code: https://github.com/McGregorWwww/UCTransNet.

【3】 MATE: Multi-view Attention for Table Transformer Efficiency
Link: https://arxiv.org/abs/2109.04312

Authors: Julian Martin Eisenschlos, Maharshi Gor, Thomas Müller, William W. Cohen
Affiliations: Google Research; Dept. of Computer Science, University of Maryland; Symanto Research, Valencia, Spain
Note: Accepted to EMNLP 2021
Abstract: This work presents a sparse-attention Transformer architecture for modeling documents that contain large tables. Tables are ubiquitous on the web, and are rich in information. However, more than 20% of relational tables on the web have 20 or more rows (Cafarella et al., 2008), and these large tables present a challenge for current Transformer models, which are typically limited to 512 tokens. Here we propose MATE, a novel Transformer architecture designed to model the structure of web tables. MATE uses sparse attention in a way that allows heads to efficiently attend to either rows or columns in a table. This architecture scales linearly with respect to speed and memory, and can handle documents containing more than 8000 tokens with current accelerators. MATE also has a more appropriate inductive bias for tabular data, and sets a new state-of-the-art for three table reasoning datasets. For HybridQA (Chen et al., 2020b), a dataset that involves large documents containing tables, we improve the best prior result by 19 points.
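
The row/column head idea can be pictured as an attention mask. A minimal sketch follows; note that the linear scaling claimed in the paper comes from a sparse implementation, whereas the dense boolean mask here is only for illustration, and the function name is hypothetical.

```python
import numpy as np

def row_col_attention_mask(row_ids, col_ids, head_is_row):
    # Boolean (queries x keys) mask over table tokens: a "row head" lets a
    # token attend only to tokens in the same table row, a "column head"
    # only to tokens in the same column. Masked-out positions would be set
    # to -inf before the softmax.
    ids = row_ids if head_is_row else col_ids
    return ids[:, None] == ids[None, :]
```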

【4】 Bag of Tricks for Optimizing Transformer Efficiency
Link: https://arxiv.org/abs/2109.04030

Authors: Ye Lin, Yanyang Li, Tong Xiao, Jingbo Zhu
Affiliations: NLP Lab, School of Computer Science and Engineering, Northeastern University, Shenyang, China; The Chinese University of Hong Kong, Hong Kong, China; NiuTrans Research, Shenyang, China
Note: Accepted by EMNLP (Findings) 2021
Abstract: Improving Transformer efficiency has become increasingly attractive recently. A wide range of methods has been proposed, e.g., pruning, quantization, new architectures and etc. But these methods are either sophisticated in implementation or dependent on hardware. In this paper, we show that the efficiency of Transformer can be improved by combining some simple and hardware-agnostic methods, including tuning hyper-parameters, better design choices and training strategies. On the WMT news translation tasks, we improve the inference efficiency of a strong Transformer system by 3.80X on CPU and 2.52X on GPU. The code is publicly available at https://github.com/Lollipop321/mini-decoder-network.

【5】 Learning the Physics of Particle Transport via Transformers
Link: https://arxiv.org/abs/2109.03951

Authors: Oscar Pastor-Serrano, Zoltán Perkó
Affiliations: Delft University of Technology, Department of Radiation Science and Technology, Delft, Netherlands
Abstract: Particle physics simulations are the cornerstone of nuclear engineering applications. Among them radiotherapy (RT) is crucial for society, with 50% of cancer patients receiving radiation treatments. For the most precise targeting of tumors, next generation RT treatments aim for real-time correction during radiation delivery, necessitating particle transport algorithms that yield precise dose distributions in sub-second times even in highly heterogeneous patient geometries. This is infeasible with currently available, purely physics based simulations. In this study, we present a data-driven dose calculation algorithm predicting the dose deposited by mono-energetic proton beams for arbitrary energies and patient geometries. Our approach frames particle transport as sequence modeling, where convolutional layers extract important spatial features into tokens and the transformer self-attention mechanism routes information between such tokens in the sequence and a beam energy token. We train our network and evaluate prediction accuracy using computationally expensive but accurate Monte Carlo (MC) simulations, considered the gold standard in particle physics. Our proposed model is 33 times faster than current clinical analytic pencil beam algorithms, improving upon their accuracy in the most heterogeneous and challenging geometries. With a relative error of 0.34% and very high gamma pass rate of 99.59% (1%, 3 mm), it also greatly outperforms the only published similar data-driven proton dose algorithm, even at a finer grid resolution. Offering MC precision 400 times faster, our model could overcome a major obstacle that has so far prohibited real-time adaptive proton treatments and significantly increase cancer treatment efficacy. Its potential to model physics interactions of other particles could also boost heavy ion treatment planning procedures limited by the speed of traditional methods.

【6】 EEGDnet: Fusing Non-Local and Local Self-Similarity for 1-D EEG Signal Denoising with 2-D Transformer
Link: https://arxiv.org/abs/2109.04235

Authors: Peng Yi, Kecheng Chen, Zhaoqi Ma, Di Zhao, Xiaorong Pu, Yazhou Ren
Affiliations: School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China; Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Abstract: Electroencephalogram (EEG) has shown a useful approach to produce a brain-computer interface (BCI). One-dimensional (1-D) EEG signal is yet easily disturbed by certain artifacts (a.k.a. noise) due to the high temporal resolution. Thus, it is crucial to remove the noise in received EEG signal. Recently, deep learning-based EEG signal denoising approaches have achieved impressive performance compared with traditional ones. It is well known that the characteristics of self-similarity (including non-local and local ones) of data (e.g., natural images and time-domain signals) are widely leveraged for denoising. However, existing deep learning-based EEG signal denoising methods ignore either the non-local self-similarity (e.g., 1-D convolutional neural network) or local one (e.g., fully connected network and recurrent neural network). To address this issue, we propose a novel 1-D EEG signal denoising network with 2-D transformer, namely EEGDnet. Specifically, we comprehensively take into account the non-local and local self-similarity of EEG signal through the transformer module. By fusing non-local self-similarity in self-attention blocks and local self-similarity in feed forward blocks, the negative impact caused by noises and outliers can be reduced significantly. Extensive experiments show that, compared with other state-of-the-art models, EEGDnet achieves much better performance in terms of both quantitative and qualitative metrics.

GAN | Adversarial | Attacks | Generation (9 papers)

【1】 Multi-granularity Textual Adversarial Attack with Behavior Cloning
Link: https://arxiv.org/abs/2109.04367

Authors: Yangyi Chen, Jin Su, Wei Wei
Affiliations: Cognitive Computing and Intelligent Information Processing Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology; School of Software Engineering, Huazhong University of Science and Technology
Note: Accepted by the main conference of EMNLP 2021
Abstract: Recently, the textual adversarial attack models become increasingly popular due to their successful in estimating the robustness of NLP models. However, existing works have obvious deficiencies. (1) They usually consider only a single granularity of modification strategies (e.g. word-level or sentence-level), which is insufficient to explore the holistic textual space for generation; (2) They need to query victim models hundreds of times to make a successful attack, which is highly inefficient in practice. To address such problems, in this paper we propose MAYA, a Multi-grAnularitY Attack model to effectively generate high-quality adversarial samples with fewer queries to victim models. Furthermore, we propose a reinforcement-learning based method to train a multi-granularity attack agent through behavior cloning with the expert knowledge from our MAYA algorithm to further reduce the query times. Additionally, we also adapt the agent to attack black-box models that only output labels without confidence scores. We conduct comprehensive experiments to evaluate our attack models by attacking BiLSTM, BERT and RoBERTa in two different black-box attack settings and three benchmark datasets. Experimental results show that our models achieve overall better attacking performance and produce more fluent and grammatical adversarial samples compared to baseline models. Besides, our adversarial attack agent significantly reduces the query times in both attack settings. Our codes are released at https://github.com/Yangyi-Chen/MAYA.

【2】 Energy Attack: On Transferring Adversarial Examples
Link: https://arxiv.org/abs/2109.04300

Authors: Ruoxi Shi, Borui Yang, Yangzhou Jiang, Chenglong Zhao, Bingbing Ni
Affiliations: Shanghai Jiao Tong University
Note: Under review for AAAI-22
Abstract: In this work we propose Energy Attack, a transfer-based black-box $L_\infty$-adversarial attack. The attack is parameter-free and does not require gradient approximation. In particular, we first obtain white-box adversarial perturbations of a surrogate model and divide these perturbations into small patches. Then we extract the unit component vectors and eigenvalues of these patches with principal component analysis (PCA). Base on the eigenvalues, we can model the energy distribution of adversarial perturbations. We then perform black-box attacks by sampling from the perturbation patches according to their energy distribution, and tiling the sampled patches to form a full-size adversarial perturbation. This can be done without the available access to victim models. Extensive experiments well demonstrate that the proposed Energy Attack achieves state-of-the-art performance in black-box attacks on various models and several datasets. Moreover, the extracted distribution is able to transfer among different model architectures and different datasets, and is therefore intrinsic to vision architectures.
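
The pipeline (patchify, PCA, energy-weighted sampling, tiling) is simple enough to sketch. A minimal single-channel version under assumptions not stated in the abstract (8x8 patches, image size divisible by the patch size, sign-projection onto the L_inf budget); function names are illustrative.

```python
import numpy as np

def fit_energy_model(perturbations, patch=8):
    # Cut white-box perturbations of a surrogate model into patches and run
    # PCA; the eigenvalues model the energy carried by each component.
    tiles = [p[i:i + patch, j:j + patch].ravel()
             for p in perturbations                      # p: (H, W) perturbation
             for i in range(0, p.shape[0] - patch + 1, patch)
             for j in range(0, p.shape[1] - patch + 1, patch)]
    vals, vecs = np.linalg.eigh(np.cov(np.stack(tiles), rowvar=False))
    return np.maximum(vals, 0.0), vecs

def sample_attack(vals, vecs, shape, eps, patch=8, rng=None):
    # Sample components with probability proportional to their energy and
    # tile them into a full-size L_inf perturbation -- no victim queries.
    rng = rng or np.random.default_rng()
    probs = vals / vals.sum()
    out = np.zeros(shape)
    for i in range(0, shape[0], patch):
        for j in range(0, shape[1], patch):
            k = rng.choice(len(vals), p=probs)
            out[i:i + patch, j:j + patch] = np.sign(vecs[:, k].reshape(patch, patch)) * eps
    return out
```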

【3】 Generation, augmentation, and alignment: A pseudo-source domain based method for source-free domain adaptation
Link: https://arxiv.org/abs/2109.04015

Authors: Yuntao Du, Haiyang Yang, Mingcai Chen, Juan Jiang, Hongtao Luo, Chongjun Wang
Affiliations: Department of Computer Science and Technology, Nanjing University
Note: Submitted to AAAI 2022
Abstract: Conventional unsupervised domain adaptation (UDA) methods need to access both labeled source samples and unlabeled target samples simultaneously to train the model. While in some scenarios, the source samples are not available for the target domain due to data privacy and safety. To overcome this challenge, recently, source-free domain adaptation (SFDA) has attracted the attention of researchers, where both a trained source model and unlabeled target samples are given. Existing SFDA methods either adopt a pseudo-label based strategy or generate more samples. However, these methods do not explicitly reduce the distribution shift across domains, which is the key to a good adaptation. Although there are no source samples available, fortunately, we find that some target samples are very similar to the source domain and can be used to approximate the source domain. This approximated domain is denoted as the pseudo-source domain. In this paper, inspired by this observation, we propose a novel method based on the pseudo-source domain. The proposed method firstly generates and augments the pseudo-source domain, and then employs distribution alignment with four novel losses based on pseudo-label based strategy. Among them, a domain adversarial loss is introduced between the pseudo-source domain and the remaining target domain to reduce the distribution shift. The results on three real-world datasets verify the effectiveness of the proposed method.
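
The abstract does not state how the source-like target samples are selected; one plausible instantiation, sketched below, keeps the target samples the trained source model classifies with high confidence. Both the rule and the threshold tau are assumptions for illustration.

```python
import numpy as np

def select_pseudo_source(probs, tau=0.9):
    # probs: (n_target, n_classes) softmax outputs of the trained source model
    # on unlabeled target samples. High-confidence samples are treated as the
    # pseudo-source domain; the rest remain the target domain for alignment.
    return np.where(probs.max(axis=1) >= tau)[0]
```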

【4】 Detecting Attacks on IoT Devices using Featureless 1D-CNN
Link: https://arxiv.org/abs/2109.03989

Authors: Arshiya Khan, Chase Cotton
Affiliations: University of Delaware, Newark, Delaware, USA
Abstract: The generalization of deep learning has helped us, in the past, address challenges such as malware identification and anomaly detection in the network security domain. However, as effective as it is, scarcity of memory and processing power makes it difficult to perform these tasks in Internet of Things (IoT) devices. This research finds an easy way out of this bottleneck by depreciating the need for feature engineering and subsequent processing in machine learning techniques. In this study, we introduce a Featureless machine learning process to perform anomaly detection. It uses unprocessed byte streams of packets as training data. Featureless machine learning enables a low cost and low memory time-series analysis of network traffic. It benefits from eliminating the significant investment in subject matter experts and the time required for feature engineering.
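
A featureless byte-stream classifier is easy to picture. A minimal PyTorch sketch, assuming fixed-length packets padded to 1500 bytes; the layer sizes are illustrative guesses, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ByteStream1DCNN(nn.Module):
    # Anomaly detector over raw packet bytes: no feature engineering, just an
    # embedding of byte values followed by 1-D convolutions.
    def __init__(self, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(256, 32)          # one entry per byte value
        self.conv = nn.Sequential(
            nn.Conv1d(32, 64, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(64, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        self.fc = nn.Linear(64, n_classes)

    def forward(self, x):                           # x: (batch, 1500) byte values
        h = self.embed(x.long()).transpose(1, 2)    # -> (batch, 32, 1500)
        return self.fc(self.conv(h).squeeze(-1))
```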

【5】 Where Did You Learn That From? Surprising Effectiveness of Membership Inference Attacks Against Temporally Correlated Data in Deep Reinforcement Learning
Link: https://arxiv.org/abs/2109.03975

Authors: Maziar Gomrokchi, Susan Amin, Hossein Aboutalebi, Alexander Wong, Doina Precup
Affiliations: Department of Computer Science; Cheriton School of Computer Science; Department of Systems Design Engineering, University of Waterloo; VIP Lab; DarwinAI; McGill University; Mila; DeepMind
Abstract: While significant research advances have been made in the field of deep reinforcement learning, a major challenge to widespread industrial adoption of deep reinforcement learning that has recently surfaced but little explored is the potential vulnerability to privacy breaches. In particular, there have been no concrete adversarial attack strategies in literature tailored for studying the vulnerability of deep reinforcement learning algorithms to membership inference attacks. To address this gap, we propose an adversarial attack framework tailored for testing the vulnerability of deep reinforcement learning algorithms to membership inference attacks. More specifically, we design a series of experiments to investigate the impact of temporal correlation, which naturally exists in reinforcement learning training data, on the probability of information leakage. Furthermore, we study the differences in the performance of \emph{collective} and \emph{individual} membership attacks against deep reinforcement learning algorithms. Experimental results show that the proposed adversarial attack framework is surprisingly effective at inferring the data used during deep reinforcement training with an accuracy exceeding $84\%$ in individual and $97\%$ in collective mode on two different control tasks in OpenAI Gym, which raises serious privacy concerns in the deployment of models resulting from deep reinforcement learning. Moreover, we show that the learning state of a reinforcement learning algorithm significantly influences the level of the privacy breach.

【6】 Sensitive Samples Revisited: Detecting Neural Network Attacks Using Constraint Solvers
Link: https://arxiv.org/abs/2109.03966

Authors: Amel Nestor Docena, Thomas Wahl, Trevor Pearce, Yunsi Fei
Affiliations: Khoury College of Computer Sciences, Northeastern University, Boston, USA
Abstract: Neural Networks are used today in numerous security- and safety-relevant domains and are, as such, a popular target of attacks that subvert their classification capabilities, by manipulating the network parameters. Prior work has introduced sensitive samples -- inputs highly sensitive to parameter changes -- to detect such manipulations, and proposed a gradient ascent-based approach to compute them. In this paper we offer an alternative, using symbolic constraint solvers. We model the network and a formal specification of a sensitive sample in the language of the solver and ask for a solution. This approach supports a rich class of queries, corresponding, for instance, to the presence of certain types of attacks. Unlike earlier techniques, our approach does not depend on convex search domains, or on the suitability of a starting point for the search. We address the performance limitations of constraint solvers by partitioning the search space for the solver, and exploring the partitions according to a balanced schedule that still retains completeness of the search. We demonstrate the impact of the use of solvers in terms of functionality and search efficiency, using a case study for the detection of Trojan attacks on Neural Networks.

【7】 Retrieve, Caption, Generate: Visual Grounding for Enhancing Commonsense in Text Generation Models
Link: https://arxiv.org/abs/2109.03892

Authors: Steven Y. Feng, Kevin Lu, Zhuofu Tao, Malihe Alikhani, Teruko Mitamura, Eduard Hovy, Varun Gangal
Affiliations: Carnegie Mellon University; University of Waterloo; University of California Los Angeles; University of Pittsburgh
Abstract: We investigate the use of multimodal information contained in images as an effective method for enhancing the commonsense of Transformer models for text generation. We perform experiments using BART and T5 on concept-to-text generation, specifically the task of generative commonsense reasoning, or CommonGen. We call our approach VisCTG: Visually Grounded Concept-to-Text Generation. VisCTG involves captioning images representing appropriate everyday scenarios, and using these captions to enrich and steer the generation process. Comprehensive evaluation and analysis demonstrate that VisCTG noticeably improves model performance while successfully addressing several issues of the baseline generations, including poor commonsense, fluency, and specificity.

【8】 Robust Optimal Classification Trees Against Adversarial Examples
Link: https://arxiv.org/abs/2109.03857

Authors: Daniël Vos, Sicco Verwer
Affiliations: Delft University of Technology
Abstract: Decision trees are a popular choice of explainable model, but just like neural networks, they suffer from adversarial examples. Existing algorithms for fitting decision trees robust against adversarial examples are greedy heuristics and lack approximation guarantees. In this paper we propose ROCT, a collection of methods to train decision trees that are optimally robust against user-specified attack models. We show that the min-max optimization problem that arises in adversarial learning can be solved using a single minimization formulation for decision trees with 0-1 loss. We propose such formulations in Mixed-Integer Linear Programming and Maximum Satisfiability, which widely available solvers can optimize. We also present a method that determines the upper bound on adversarial accuracy for any model using bipartite matching. Our experimental results demonstrate that the existing heuristics achieve close to optimal scores while ROCT achieves state-of-the-art scores.

【9】 Memory semantization through perturbed and adversarial dreaming
Link: https://arxiv.org/abs/2109.04261

Authors: Nicolas Deperrois, Mihai A. Petrovici, Walter Senn, Jakob Jordan
Affiliations: Department of Physiology, University of Bern; Kirchhoff-Institute for Physics, Heidelberg University
Note: 27 pages, 13 figures; Jakob Jordan and Walter Senn share senior authorship
Abstract: Classical theories of memory consolidation emphasize the importance of replay in extracting semantic information from episodic memories. However, the characteristic creative nature of dreams suggests that memory semantization may go beyond merely replaying previous experiences. We propose that rapid-eye-movement (REM) dreaming is essential for efficient memory semantization by randomly combining episodic memories to create new, virtual sensory experiences. We support this hypothesis by implementing a cortical architecture with hierarchically organized feedforward and feedback pathways, inspired by generative adversarial networks (GANs). Learning in our model is organized across three different global brain states mimicking wakefulness, non-REM (NREM) and REM sleep, optimizing different, but complementary objective functions. We train the model in an unsupervised fashion on standard datasets of natural images and evaluate the quality of the learned representations. Our results suggest that adversarial dreaming during REM sleep is essential for extracting memory contents, while perturbed dreaming during NREM sleep improves robustness of the latent representation to noisy sensory inputs. The model provides a new computational perspective on sleep states, memory replay and dreams and suggests a cortical implementation of GANs.

Semi-/Weakly-/Un-/Supervised | Uncertainty | Active Learning (8 papers)

【1】 Mining Points of Interest via Address Embeddings: An Unsupervised Approach
Link: https://arxiv.org/abs/2109.04467

Authors: Abhinav Ganesan, Anubhav Gupta, Jose Mathew
Affiliations: University of Maryland
Note: 18 pages, single column
Abstract: Digital maps are commonly used across the globe for exploring places that users are interested in, commonly referred to as points of interest (PoI). In online food delivery platforms, PoIs could represent any major private compounds where customers could order from such as hospitals, residential complexes, office complexes, educational institutes and hostels. In this work, we propose an end-to-end unsupervised system design for obtaining polygon representations of PoIs (PoI polygons) from address locations and address texts. We preprocess the address texts using locality names and generate embeddings for the address texts using a deep learning-based architecture, viz. RoBERTa, trained on our internal address dataset. The PoI candidates are identified by jointly clustering the anonymised customer phone GPS locations (obtained during address onboarding) and the embeddings of the address texts. The final list of PoI polygons is obtained from these PoI candidates using novel post-processing steps. This algorithm identified 74.8 % more PoIs than those obtained using the Mummidi-Krumm baseline algorithm run on our internal dataset. The proposed algorithm achieves a median area precision of 98 %, a median area recall of 8 %, and a median F-score of 0.15. In order to improve the recall of the algorithmic polygons, we post-process them using building footprint polygons from the OpenStreetMap (OSM) database. The post-processing algorithm involves reshaping the algorithmic polygon using intersecting polygons and closed private roads from the OSM database, and accounting for intersection with public roads on the OSM database. We achieve a median area recall of 70 %, a median area precision of 69 %, and a median F-score of 0.69 on these post-processed polygons.
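
The joint-clustering step can be sketched in a few lines. The toy below concatenates scaled GPS coordinates with normalized text embeddings, clusters with DBSCAN, and takes each cluster's convex hull over its GPS points as a candidate PoI polygon; the clustering algorithm, scaling, and hull simplification are assumptions standing in for the paper's pipeline and post-processing.

```python
import numpy as np
from scipy.spatial import ConvexHull
from sklearn.cluster import DBSCAN

def poi_polygons(gps, text_emb, eps=0.5, min_samples=10):
    # gps: (n, 2) customer locations; text_emb: (n, d) address-text embeddings.
    feats = np.hstack([
        (gps - gps.mean(axis=0)) / gps.std(axis=0),
        text_emb / (np.linalg.norm(text_emb, axis=1, keepdims=True) + 1e-8),
    ])
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(feats)
    # One convex-hull polygon per cluster (label -1 is DBSCAN noise).
    return [gps[labels == k][ConvexHull(gps[labels == k]).vertices]
            for k in set(labels) - {-1}]
```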

【2】 Detecting and Mitigating Test-time Failure Risks via Model-agnostic Uncertainty Learning
Link: https://arxiv.org/abs/2109.04432

Authors: Preethi Lahoti, Krishna P. Gummadi, Gerhard Weikum
Affiliations: Max Planck Institute for Informatics; Max Planck Institute for Software Systems
Note: To appear in the 21st IEEE International Conference on Data Mining (ICDM 2021), Auckland, New Zealand
Abstract: Reliably predicting potential failure risks of machine learning (ML) systems when deployed with production data is a crucial aspect of trustworthy AI. This paper introduces Risk Advisor, a novel post-hoc meta-learner for estimating failure risks and predictive uncertainties of any already-trained black-box classification model. In addition to providing a risk score, the Risk Advisor decomposes the uncertainty estimates into aleatoric and epistemic uncertainty components, thus giving informative insights into the sources of uncertainty inducing the failures. Consequently, Risk Advisor can distinguish between failures caused by data variability, data shifts and model limitations and advise on mitigation actions (e.g., collecting more data to counter data shift). Extensive experiments on various families of black-box classification models and on real-world and synthetic datasets covering common ML failure scenarios show that the Risk Advisor reliably predicts deployment-time failure risks in all the scenarios, and outperforms strong baselines.
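
The aleatoric/epistemic split has a standard entropy-based form, sketched below for an ensemble of meta-learner members; Risk Advisor's exact estimator may differ, so treat this as the textbook decomposition rather than the paper's implementation.

```python
import numpy as np

def decompose_uncertainty(member_probs):
    # member_probs: (n_members, n_samples, n_classes) class probabilities from
    # an ensemble. Total predictive entropy splits into an aleatoric part
    # (mean per-member entropy) and an epistemic part (the disagreement term).
    eps = 1e-12
    mean_p = member_probs.mean(axis=0)
    total = -(mean_p * np.log(mean_p + eps)).sum(axis=-1)
    aleatoric = -(member_probs * np.log(member_probs + eps)).sum(axis=-1).mean(axis=0)
    return aleatoric, total - aleatoric   # (aleatoric, epistemic)
```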

【3】 Towards Robust Cross-domain Image Understanding with Unsupervised Noise Removal
Link: https://arxiv.org/abs/2109.04284

Authors: Lei Zhu, Zhaojing Luo, Wei Wang, Meihui Zhang, Gang Chen, Kaiping Zheng
Affiliations: National University of Singapore; Beijing Institute of Technology, China; Zhejiang University
Note: 10 pages, 7 figures
Abstract: Deep learning models usually require a large amount of labeled data to achieve satisfactory performance. In multimedia analysis, domain adaptation studies the problem of cross-domain knowledge transfer from a label rich source domain to a label scarce target domain, thus potentially alleviates the annotation requirement for deep learning models. However, we find that contemporary domain adaptation methods for cross-domain image understanding perform poorly when source domain is noisy. Weakly Supervised Domain Adaptation (WSDA) studies the domain adaptation problem under the scenario where source data can be noisy. Prior methods on WSDA remove noisy source data and align the marginal distribution across domains without considering the fine-grained semantic structure in the embedding space, which have the problem of class misalignment, e.g., features of cats in the target domain might be mapped near features of dogs in the source domain. In this paper, we propose a novel method, termed Noise Tolerant Domain Adaptation, for WSDA. Specifically, we adopt the cluster assumption and learn clusters discriminatively with class prototypes in the embedding space. We propose to leverage the location information of the data points in the embedding space and model the location information with a Gaussian mixture model to identify noisy source data. We then design a network which incorporates the Gaussian mixture noise model as a sub-module for unsupervised noise removal and propose a novel cluster-level adversarial adaptation method which aligns unlabeled target data with the less noisy class prototypes for mapping the semantic structure across domains. We conduct extensive experiments to evaluate the effectiveness of our method on both general images and medical images from COVID-19 and e-commerce datasets. The results show that our method significantly outperforms state-of-the-art WSDA methods.
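
The Gaussian-mixture noise identification can be sketched with scikit-learn. A minimal one-dimensional version, assuming each source sample is summarized by its distance to its class prototype in the embedding space (the paper models richer location information, so this is a simplification):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def flag_noisy_source(dist_to_prototype):
    # dist_to_prototype: (n,) distance of each source sample to its class
    # prototype. A 2-component GMM separates clean (near) from noisy (far)
    # samples; the component with the larger mean is flagged as noise.
    d = np.asarray(dist_to_prototype).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(d)
    noisy = int(np.argmax(gmm.means_.ravel()))
    return gmm.predict(d) == noisy
```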

【4】 Cartography Active Learning
Link: https://arxiv.org/abs/2109.04282

Authors: Mike Zhang, Barbara Plank
Affiliations: Department of Computer Science, IT University of Copenhagen
Note: Findings of EMNLP 2021
Abstract: We propose Cartography Active Learning (CAL), a novel Active Learning (AL) algorithm that exploits the behavior of the model on individual instances during training as a proxy to find the most informative instances for labeling. CAL is inspired by data maps, which were recently proposed to derive insights into dataset quality (Swayamdipta et al., 2020). We compare our method on popular text classification tasks to commonly used AL strategies, which instead rely on post-training behavior. We demonstrate that CAL is competitive to other common AL methods, showing that training dynamics derived from small seed data can be successfully used for AL. We provide insights into our new AL method by analyzing batch-level statistics utilizing the data maps. Our results further show that CAL results in a more data-efficient learning strategy, achieving comparable or better results with considerably less training data.
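
The data-map coordinates that CAL builds on are easy to compute from training dynamics. A minimal sketch, with an acquisition rule that is one plausible instantiation (query the most ambiguous instances); CAL's exact scoring may differ.

```python
import numpy as np

def data_map_statistics(gold_probs):
    # gold_probs: (n_epochs, n_samples) probability the model assigns to the
    # gold label at each epoch on the seed set -- the "confidence" and
    # "variability" axes of data maps (Swayamdipta et al., 2020).
    return gold_probs.mean(axis=0), gold_probs.std(axis=0)

def acquire(confidence, variability, k):
    # Hypothetical acquisition rule: prefer low confidence / high variability.
    return np.argsort(confidence - variability)[:k]
```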

【5】 Self-supervised Reinforcement Learning with Independently Controllable Subgoals
Link: https://arxiv.org/abs/2109.04150

Authors: Andrii Zadaianchuk, Georg Martius, Fanny Yang
Affiliations: Max Planck Institute for Intelligent Systems, Tübingen, Germany; Department of Computer Science, ETH Zurich
Abstract: To successfully tackle challenging manipulation tasks, autonomous agents must learn a diverse set of skills and how to combine them. Recently, self-supervised agents that set their own abstract goals by exploiting the discovered structure in the environment were shown to perform well on many different tasks. In particular, some of them were applied to learn basic manipulation skills in compositional multi-object environments. However, these methods learn skills without taking the dependencies between objects into account. Thus, the learned skills are difficult to combine in realistic environments. We propose a novel self-supervised agent that estimates relations between environment components and uses them to independently control different parts of the environment state. In addition, the estimated relations between objects can be used to decompose a complex goal into a compatible sequence of subgoals. We show that, by using this framework, an agent can efficiently and automatically learn manipulation tasks in multi-object environments with different relations between objects.

【6】 Unsupervised Pre-training with Structured Knowledge for Improving Natural Language Inference 标题:基于结构化知识的无监督预训练改进自然语言推理 链接:https://arxiv.org/abs/2109.03941

作者:Xiaoyu Yang,Xiaodan Zhu,Zhan Shi,Tianda Li 机构:ECE Department, Queen’s University, Canada 摘要:尽管近年来对自然语言推理的研究从大量的注释数据集中获益匪浅,但注释数据中提供的推理相关知识(包括常识)数量仍然相当有限。有两种方法可以用来进一步解决这一局限性:(1)无监督的预训练可以在更大的非结构化文本数据中利用知识;(2) 基于神经网络的NLI模型已经开始考虑结构化(通常是人类策划的)知识。一个紧迫的问题是这两种方法是否相互补充,或者如何开发能够将它们的优势结合在一起的模型。在本文中,我们提出了在预训练模型的不同组件中利用结构化知识的模型。我们的结果表明,所提出的模型比以前基于BERT的最新模型具有更好的性能。虽然我们的模型是针对NLI提出的,但它们可以很容易地扩展到其他句子或句子对分类问题。 摘要:While recent research on natural language inference has considerably benefited from large annotated datasets, the amount of inference-related knowledge (including commonsense) provided in the annotated data is still rather limited. There have been two lines of approaches that can be used to further address the limitation: (1) unsupervised pretraining can leverage knowledge in much larger unstructured text data; (2) structured (often human-curated) knowledge has started to be considered in neural-network-based models for NLI. An immediate question is whether these two approaches complement each other, or how to develop models that can bring together their advantages. In this paper, we propose models that leverage structured knowledge in different components of pre-trained models. Our results show that the proposed models perform better than previous BERT-based state-of-the-art models. Although our models are proposed for NLI, they can be easily extended to other sentence or sentence-pair classification problems.

【7】 Recommend for a Reason: Unlocking the Power of Unsupervised Aspect-Sentiment Co-Extraction 标题:推荐理由:释放无监督方面-情感共抽取的力量 链接:https://arxiv.org/abs/2109.03821

作者:Zeyu Li,Wei Cheng,Reema Kshetramade,John Houser,Haifeng Chen,Wei Wang 机构:Department of Computer Science, University of California, Los Angeles, NEC Labs America 备注:16 pages; Accepted to Findings of EMNLP-2021 摘要:评论中的赞美和关注对于理解用户的购物兴趣以及他们对某些商品的特定方面的意见是很有价值的。现有的基于评论的推荐者倾向于大型复杂的语言编码器,这些编码器只能学习潜在的和不可理解的文本表示。它们缺乏明确的用户注意和项目属性建模,但这可能会提供有价值的信息,超出推荐项目的能力。因此,我们提出了一种紧密耦合的两阶段方法,包括方面情感对抽取器(ASPE)和注意属性感知评级估计器(APRE)。无监督的ASPE挖掘方面情绪对(AS-pairs),APRE使用AS-pairs作为具体的方面级证据预测评级。在7个真实世界的Amazon Review数据集上进行的大量实验表明,ASPE可以有效地提取AS对,从而使APRE能够提供优于领先基线的精度。 摘要:Compliments and concerns in reviews are valuable for understanding users' shopping interests and their opinions with respect to specific aspects of certain items. Existing review-based recommenders favor large and complex language encoders that can only learn latent and uninterpretable text representations. They lack explicit user attention and item property modeling, which however could provide valuable information beyond the ability to recommend items. Therefore, we propose a tightly coupled two-stage approach, including an Aspect-Sentiment Pair Extractor (ASPE) and an Attention-Property-aware Rating Estimator (APRE). Unsupervised ASPE mines Aspect-Sentiment pairs (AS-pairs) and APRE predicts ratings using AS-pairs as concrete aspect-level evidence. Extensive experiments on seven real-world Amazon Review Datasets demonstrate that ASPE can effectively extract AS-pairs which enable APRE to deliver superior accuracy over the leading baselines.

【8】 Supervised Linear Dimension-Reduction Methods: Review, Extensions, and Comparisons 标题:有监督的线性降维方法:回顾、扩展和比较 链接:https://arxiv.org/abs/2109.04244

作者:Shaojie Xu,Joel Vaughan,Jie Chen,Agus Sudjianto,Vijayan Nair 机构:Wells Fargo & Company 摘要:主成分分析(PCA)是一种著名的线性降维方法,在数据分析和建模中得到了广泛的应用。它是一种无监督学习技术,可以为输入变量识别一个合适的线性子空间,该线性子空间包含最大的变化,并保留尽可能多的信息。PCA也被用于预测模型中,在进行回归分析之前,预测因子的原始高维空间被缩减为更小、更易于管理的集合。然而,这种方法在降维阶段的响应中不包含信息,因此可能具有较差的预测性能。为了解决这个问题,文献中提出了几种有监督的线性降维技术。本文回顾了所选择的技术,扩展了其中的一些技术,并通过仿真比较了它们的性能。其中两种技术,偏最小二乘(PLS)和最小二乘主成分分析(LSPCA),在本研究中始终优于其他技术。 摘要:Principal component analysis (PCA) is a well-known linear dimension-reduction method that has been widely used in data analysis and modeling. It is an unsupervised learning technique that identifies a suitable linear subspace for the input variable that contains maximal variation and preserves as much information as possible. PCA has also been used in prediction models where the original, high-dimensional space of predictors is reduced to a smaller, more manageable, set before conducting regression analysis. However, this approach does not incorporate information in the response during the dimension-reduction stage and hence can have poor predictive performance. To address this concern, several supervised linear dimension-reduction techniques have been proposed in the literature. This paper reviews selected techniques, extends some of them, and compares their performance through simulations. Two of these techniques, partial least squares (PLS) and least-squares PCA (LSPCA), consistently outperform the others in this study.

迁移|Zero/Few/One-Shot|自适应(3篇)

【1】 Translate & Fill: Improving Zero-Shot Multilingual Semantic Parsing with Synthetic Data 标题:翻译与填充:利用合成数据改进Zero-Shot多语言语义解析 链接:https://arxiv.org/abs/2109.04319

作者:Massimo Nicosia,Zhongdi Qu,Yasemin Altun 机构:Google Research 备注:Accepted to EMNLP 2021 (Findings) 摘要:虽然在单一语言上进行微调的多语言预训练语言模型(LMs)显示出了强大的跨语言任务迁移能力,但在目标语言监督可用的情况下,语义解析任务仍存在很大的性能差距。在本文中,我们提出了一种新的翻译和填充(TaF)方法,为多语言语义解析器生成银标(silver)训练数据。该方法简化了流行的Translate-Align-Project(TAP)管道,由一个序列到序列填充模型组成,该模型以话语和同一解析的一个视图为条件构建完整解析。我们的填充器只在英语数据上训练,但能够以Zero-Shot方式准确地补全其他语言的实例(即英语训练话语的翻译)。在三个多语言语义解析数据集上的实验结果表明,使用TaF进行的数据增强达到了与依赖传统对齐技术的类似系统相竞争的精度。 摘要:While multilingual pretrained language models (LMs) fine-tuned on a single language have shown substantial cross-lingual task transfer capabilities, there is still a wide performance gap in semantic parsing tasks when target language supervision is available. In this paper, we propose a novel Translate-and-Fill (TaF) method to produce silver training data for a multilingual semantic parser. This method simplifies the popular Translate-Align-Project (TAP) pipeline and consists of a sequence-to-sequence filler model that constructs a full parse conditioned on an utterance and a view of the same parse. Our filler is trained on English data only but can accurately complete instances in other languages (i.e., translations of the English training utterances), in a zero-shot fashion. Experimental results on three multilingual semantic parsing datasets show that data augmentation with TaF reaches accuracies competitive with similar systems which rely on traditional alignment techniques.

【2】 Powering Comparative Classification with Sentiment Analysis via Domain Adaptive Knowledge Transfer 标题:基于领域自适应知识转移的情感分析增强比较分类能力 链接:https://arxiv.org/abs/2109.03819

作者:Zeyu Li,Yilong Qin,Zihan Liu,Wei Wang 机构:Department of Computer Science, University of California, Los Angeles 备注:13 pages; EMNLP-2021 Main Conference 摘要:我们研究比较偏好分类(CPC),其目的是预测给定句子中两个实体之间是否存在偏好比较,如果存在,哪个实体比另一个实体更优先。高质量的CPC模型可以大大有利于比较式问答和基于评论的推荐等应用。在现有的方法中,非深度学习方法的性能较差。基于最先进图神经网络的ED-GAT(Ma等人,2020)只考虑句法信息,而忽略了关键的语义关系以及对所比较实体的情感。我们提出了情感分析增强比较网络(SAECON),该网络借助一个情感分析器,通过领域自适应知识迁移学习对单个实体的情感,从而提高了CPC的准确性。在CompSent-19(Panchenko et al.,2019)数据集上的实验表明,与现有最佳CPC方法相比,F1成绩有了显著提高。 摘要:We study Comparative Preference Classification (CPC) which aims at predicting whether a preference comparison exists between two entities in a given sentence and, if so, which entity is preferred over the other. High-quality CPC models can significantly benefit applications such as comparative question answering and review-based recommendations. Among the existing approaches, non-deep learning methods suffer from inferior performances. The state-of-the-art graph neural network-based ED-GAT (Ma et al., 2020) only considers syntactic information while ignoring the critical semantic relations and the sentiments to the compared entities. We proposed Sentiment Analysis Enhanced COmparative Network (SAECON) which improves CPC accuracy with a sentiment analyzer that learns sentiments to individual entities via domain adaptive knowledge transfer. Experiments on the CompSent-19 (Panchenko et al., 2019) dataset present a significant improvement on the F1 scores over the best existing CPC approaches.

【3】 Adaptive importance sampling for seismic fragility curve estimation 标题:地震易损性曲线估计的自适应重要抽样 链接:https://arxiv.org/abs/2109.04323

作者:Clement Gauchy,Cyril Feau,Josselin Garnier 摘要:作为概率风险评估研究的一部分,有必要研究机械和土木工程结构在地震荷载作用下的脆弱性。这种风险可以用脆性曲线来衡量,脆性曲线表示结构在地震烈度测量条件下的失效概率。脆性曲线的估计依赖于耗时的数值模拟,因此需要仔细的实验设计,以便在有限的代码评估次数下获得结构脆性的最大信息。为了减少训练损失的方差,我们提出并实现了一种基于自适应重要性抽样的主动学习方法。从理论上和数值上分析了该方法在偏差、标准差和预测区间覆盖率方面的效率。 摘要:As part of Probabilistic Risk Assessment studies, it is necessary to study the fragility of mechanical and civil engineered structures when subjected to seismic loads. This risk can be measured with fragility curves, which express the probability of failure of the structure conditionally to a seismic intensity measure. The estimation of fragility curves relies on time-consuming numerical simulations, so that careful experimental design is required in order to gain the maximum information on the structure's fragility with a limited number of code evaluations. We propose and implement an active learning methodology based on adaptive importance sampling in order to reduce the variance of the training loss. The efficiency of the proposed method in terms of bias, standard deviation and prediction interval coverage are theoretically and numerically characterized.

强化学习(4篇)

【1】 OPIRL: Sample Efficient Off-Policy Inverse Reinforcement Learning via Distribution Matching 标题:OPIRL:基于分布匹配的样本有效非策略逆强化学习 链接:https://arxiv.org/abs/2109.04307

作者:Hana Hoshino,Kei Ota,Asako Kanezaki,Rio Yokota 机构:School of Computing, Department of Computer Science, Tokyo Institute of Technology 备注:Under submission 摘要:反向强化学习(IRL)在奖励工程繁琐的场景中很有吸引力。然而,以前的IRL算法使用在线策略(on-policy)转移数据,这需要从当前策略中进行密集采样以获得稳定和最佳性能。这限制了IRL在现实世界中的应用,因为现实世界中的环境交互可能变得非常昂贵。为了解决这个问题,我们提出了非策略反向强化学习(OPIRL),它(1)采用离线策略(off-policy)数据分布而非在线策略数据分布,能够显著减少与环境的交互次数,(2)学习一个可迁移的平稳奖励函数,该函数在不断变化的动态中具有较高的泛化能力,并且(3)利用模式覆盖行为加快收敛。通过实验,我们证明了我们的方法具有更高的样本效率,并且可以推广到新的环境中。我们的方法以显著更少的交互取得了优于或可比于基线的策略性能。此外,我们的经验表明,恢复的奖励函数可推广到现有技术容易失败的不同任务。 摘要:Inverse Reinforcement Learning (IRL) is attractive in scenarios where reward engineering can be tedious. However, prior IRL algorithms use on-policy transitions, which require intensive sampling from the current policy for stable and optimal performance. This limits IRL applications in the real world, where environment interactions can become highly expensive. To tackle this problem, we present Off-Policy Inverse Reinforcement Learning (OPIRL), which (1) adopts off-policy data distribution instead of on-policy and enables significant reduction of the number of interactions with the environment, (2) learns a stationary reward function that is transferable with high generalization capabilities on changing dynamics, and (3) leverages mode-covering behavior for faster convergence. We demonstrate that our method is considerably more sample efficient and generalizes to novel environments through the experiments. Our method achieves better or comparable results on policy performance baselines with significantly fewer interactions. Furthermore, we empirically show that the recovered reward function generalizes to different tasks where prior arts are prone to fail.

【2】 On the Approximation of Cooperative Heterogeneous Multi-Agent Reinforcement Learning (MARL) using Mean Field Control (MFC) 标题:基于平均场控制的协作异构多智能体强化学习(MAIL)的逼近 链接:https://arxiv.org/abs/2109.04024

作者:Washim Uddin Mondal,Mridul Agarwal,Vaneet Aggarwal,Satish V. Ukkusuri 机构:Lyles School of Civil Engineering, School of Industrial Engineering, Purdue University, West Lafayette, IN, USA, School of Electrical and Computer Engineering 备注:47 pages 摘要:平均场控制(MFC)是缓解协作多智能体强化学习(MARL)问题维数灾难的有效方法。这项工作考虑由$N_{\mathrm{pop}}$个异构代理组成的集合,它们可以划分为$K$类,使得第$k$类包含$N_k$个同构代理。我们的目标是通过相应的MFC问题来证明这种异构系统的MARL问题的近似保证。我们考虑三种情形,其中所有代理的报酬和转移动态分别被视为$(1)$所有类别的联合状态和动作分布、$(2)$每个类别的个体分布、以及$(3)$整个群体的边际分布的函数。我们证明,在这些情形下,$K$类MARL问题可以用MFC来近似,其误差分别为$e_1=\mathcal{O}(\frac{\sqrt{|\mathcal{X}||\mathcal{U}|}}{N_{\mathrm{pop}}}\sum_{k}\sqrt{N_k})$、$e_2=\mathcal{O}(\sqrt{|\mathcal{X}||\mathcal{U}|}\sum_{k}\frac{1}{\sqrt{N_k}})$和$e_3=\mathcal{O}\left(\sqrt{|\mathcal{X}||\mathcal{U}|}\left[\frac{A}{N_{\mathrm{pop}}}\sum_{k\in[K]}\sqrt{N_k}+\frac{B}{\sqrt{N_{\mathrm{pop}}}}\right]\right)$,其中$A,B$是常数,$|\mathcal{X}|,|\mathcal{U}|$是每个代理的状态空间和动作空间的大小。最后,我们设计了一个基于自然策略梯度(NPG)的算法,在上述三种情况下,它能分别以$\mathcal{O}(e_j^{-3})$($j\in\{1,2,3\}$)的样本复杂度收敛到最优MARL策略的$\mathcal{O}(e_j)$误差范围内。 摘要:Mean field control (MFC) is an effective way to mitigate the curse of dimensionality of cooperative multi-agent reinforcement learning (MARL) problems. This work considers a collection of $N_{\mathrm{pop}}$ heterogeneous agents that can be segregated into $K$ classes such that the $k$-th class contains $N_k$ homogeneous agents. We aim to prove approximation guarantees of the MARL problem for this heterogeneous system by its corresponding MFC problem. We consider three scenarios where the reward and transition dynamics of all agents are respectively taken to be functions of $(1)$ joint state and action distributions across all classes, $(2)$ individual distributions of each class, and $(3)$ marginal distributions of the entire population. We show that, in these cases, the $K$-class MARL problem can be approximated by MFC with errors given as $e_1=\mathcal{O}(\frac{\sqrt{|\mathcal{X}||\mathcal{U}|}}{N_{\mathrm{pop}}}\sum_{k}\sqrt{N_k})$, $e_2=\mathcal{O}(\sqrt{|\mathcal{X}||\mathcal{U}|}\sum_{k}\frac{1}{\sqrt{N_k}})$ and $e_3=\mathcal{O}\left(\sqrt{|\mathcal{X}||\mathcal{U}|}\left[\frac{A}{N_{\mathrm{pop}}}\sum_{k\in[K]}\sqrt{N_k}+\frac{B}{\sqrt{N_{\mathrm{pop}}}}\right]\right)$, respectively, where $A, B$ are some constants and $|\mathcal{X}|,|\mathcal{U}|$ are the sizes of state and action spaces of each agent. Finally, we design a Natural Policy Gradient (NPG) based algorithm that, in the three cases stated above, can converge to an optimal MARL policy within $\mathcal{O}(e_j)$ error with a sample complexity of $\mathcal{O}(e_j^{-3})$, $j\in\{1,2,3\}$, respectively.

【3】 PowerGym: A Reinforcement Learning Environment for Volt-Var Control in Power Distribution Systems 标题:PowerGym:配电网电压无功控制的强化学习环境 链接:https://arxiv.org/abs/2109.03970

作者:Ting-Han Fan,Xian Yeow Lee,Yubo Wang 机构: Princeton University, Iowa State University, Siemens Corporate Technology 摘要:我们将介绍PowerGym,一个用于配电系统电压无功控制的开源强化学习环境。遵循OpenAI Gym API,PowerGym的目标是在物理网络约束下最小化功率损耗和电压违规。PowerGym基于IEEE基准系统和针对各种控制困难的设计变体,提供了四个配电系统(13总线、34总线、123总线和8500节点)。为了促进推广,PowerGym为使用其分发系统的用户提供了详细的定制指南。作为演示,我们检查了PowerGym中最先进的强化学习算法,并通过研究控制器行为来验证环境。 摘要:We introduce PowerGym, an open-source reinforcement learning environment for Volt-Var control in power distribution systems. Following OpenAI Gym APIs, PowerGym targets minimizing power loss and voltage violations under physical networked constraints. PowerGym provides four distribution systems (13Bus, 34Bus, 123Bus, and 8500Node) based on IEEE benchmark systems and design variants for various control difficulties. To foster generalization, PowerGym offers a detailed customization guide for users working with their distribution systems. As a demonstration, we examine state-of-the-art reinforcement learning algorithms in PowerGym and validate the environment by studying controller behaviors.
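由于PowerGym遵循OpenAI Gym API,与环境的交互可以按标准Gym循环编写。下面是一个最小示意,其中环境注册名"PowerGym-13Bus-v0"仅为占位假设,实际的环境构造方式与名称请以PowerGym官方仓库的文档为准:

```python
import gym  # PowerGym 遵循 OpenAI Gym API,故可用标准 Gym 循环与其交互

ENV_ID = "PowerGym-13Bus-v0"  # 占位环境名(假设),实际注册名见 PowerGym 仓库

env = gym.make(ENV_ID)
obs = env.reset()
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()         # 随机动作,仅演示交互接口
    obs, reward, done, info = env.step(action)
    total_reward += reward                     # 奖励通常惩罚功率损耗与电压越限
print("episode return:", total_reward)
```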

【4】 Deep Reinforcement Learning for Equal Risk Pricing and Hedging under Dynamic Expectile Risk Measures 标题:动态预期风险度量下等风险定价和套期保值的深度强化学习 链接:https://arxiv.org/abs/2109.04001

作者:Saeed Marzban,Erick Delage,Jonathan Yumeng Li 摘要:最近,等风险定价(一种公平的衍生品定价框架)被扩展到考虑动态风险度量。然而,目前所有的实现要么采用违反时间一致性的静态风险度量,要么基于传统的动态规划求解方案,而后者在标的资产数量巨大(由于维数灾难)或资产动态信息不完整的问题中是不可行的。在本文中,我们首次将一个著名的离线策略确定性演员-评论家深度强化学习(ACRL)算法推广到求解风险规避马尔可夫决策过程问题,该决策过程使用时间一致的递归expectile风险度量对风险进行建模。这一新的ACRL算法使我们能够为无法用传统方法处理的期权(如篮子期权),或在只有标的资产历史轨迹可用的情况下,识别高质量的时间一致对冲策略(以及等风险价格)。我们的数值实验(包括一个简单的普通期权和一个更奇异的篮子期权)证实,新的ACRL算法可以:1)在简单环境中,针对一系列到期日同时产生几乎最优的对冲策略和高度精确的价格;2)在复杂环境中,使用合理数量的计算资源得到优质的策略和价格;3)总体而言,当在较晚时点评估风险时,所得对冲策略的实际表现优于使用静态风险度量产生的策略。 摘要:Recently equal risk pricing, a framework for fair derivative pricing, was extended to consider dynamic risk measures. However, all current implementations either employ a static risk measure that violates time consistency, or are based on traditional dynamic programming solution schemes that are impracticable in problems with a large number of underlying assets (due to the curse of dimensionality) or with incomplete asset dynamics information. In this paper, we extend for the first time a famous off-policy deterministic actor-critic deep reinforcement learning (ACRL) algorithm to the problem of solving a risk averse Markov decision process that models risk using a time consistent recursive expectile risk measure. This new ACRL algorithm allows us to identify high quality time consistent hedging policies (and equal risk prices) for options, such as basket options, that cannot be handled using traditional methods, or in context where only historical trajectories of the underlying assets are available. Our numerical experiments, which involve both a simple vanilla option and a more exotic basket option, confirm that the new ACRL algorithm can produce 1) in simple environments, nearly optimal hedging policies, and highly accurate prices, simultaneously for a range of maturities 2) in complex environments, good quality policies and prices using reasonable amount of computing resources; and 3) overall, hedging strategies that actually outperform the strategies produced using static risk measures when the risk is evaluated at later points of time.

元学习(1篇)

【1】 MetaXT: Meta Cross-Task Transfer between Disparate Label Spaces 标题:MetaXT:不同标签空间之间的元跨任务传输 链接:https://arxiv.org/abs/2109.04240

作者:Srinagesh Sharma,Guoqing Zheng,Ahmed Hassan Awadallah 机构:Microsoft Research, Redmond, WA 摘要:尽管预先训练的语言模型具有普遍的表征能力,但将它们应用于特定的NLP任务仍然需要大量的标记数据。当任务中只有少数标记示例时,有效的任务微调会遇到挑战。在本文中,我们的目标是通过利用并迁移一个允许相关但不同标签空间的不同任务,来解决Few-Shot任务学习问题。具体来说,我们设计了一个标签迁移网络(LTN),将标签从源任务转换为目标任务用于训练。LTN和任务预测模型都通过一个双层优化框架学习,我们称之为MetaXT。MetaXT提供了一个原则性的解决方案,通过从源任务迁移知识,使预训练语言模型最好地适应目标任务。在来自两种不同类型标签空间差异的四个NLP任务的跨任务迁移设置上的实证评估证明了MetaXT的有效性,尤其是当目标任务中的标记数据有限时。 摘要:Albeit the universal representational power of pre-trained language models, adapting them onto a specific NLP task still requires a considerably large amount of labeled data. Effective task fine-tuning meets challenges when only a few labeled examples are present for the task. In this paper, we aim to address the problem of few-shot task learning by exploiting and transferring from a different task which admits a related but disparate label space. Specifically, we devise a label transfer network (LTN) to transform the labels from source task to the target task of interest for training. Both the LTN and the model for task prediction are learned via a bi-level optimization framework, which we term as MetaXT. MetaXT offers a principled solution to best adapt a pre-trained language model to the target task by transferring knowledge from the source task. Empirical evaluations on cross-task transfer settings for four NLP tasks, from two different types of label space disparities, demonstrate the effectiveness of MetaXT, especially when the labeled data in the target task is limited.

医学相关(2篇)

【1】 Fair Conformal Predictors for Applications in Medical Imaging 标题:公平共形预报器在医学成像中的应用 链接:https://arxiv.org/abs/2109.04392

作者:Charles Lu,Andreanne Lemay,Ken Chang,Katharina Hoebel,Jayashree Kalpathy-Cramer 机构:Department of Radiology, Massachusetts General Hospital, Polytechnique Montreal, Massachusetts Institute of Technology 摘要:深度学习有可能增强临床工作流程的许多组成部分,如医学图像解释。然而,与传统的机器学习方法相比,这些黑箱算法在临床实践中的应用相对缺乏透明度,阻碍了临床医生对关键医疗决策系统的信任。具体而言,对于可能需要进一步人工审查的案例,普通的深度学习方法没有直观的表达不确定性的方法。此外,算法偏差的可能性导致了在临床环境中使用已开发算法的犹豫。为此,我们探索了适形方法如何通过提供表达模型不确定性的临床直观方法(通过置信预测集)以及促进临床工作流中的模型透明度来补充深度学习模型。在本文中,我们与临床医生进行了一次现场调查,以评估适形预测的临床应用案例。接下来,我们使用乳房X光密度和皮肤病摄影数据集进行实验,以证明适形预测在纳入(rule-in)和排除(rule-out)疾病场景中的效用。此外,我们还表明,适形预测器可用于根据患者人口统计数据(如种族和肤色)均衡覆盖率。我们发现,适形预测是一个很有前景的框架,有可能提高临床可用性和透明度,从而更好地促进深度学习算法和临床医生之间的合作。 摘要:Deep learning has the potential to augment many components of the clinical workflow, such as medical image interpretation. However, the translation of these black box algorithms into clinical practice has been marred by the relative lack of transparency compared to conventional machine learning methods, hindering clinician trust in the systems for critical medical decision-making. Specifically, common deep learning approaches do not have intuitive ways of expressing uncertainty with respect to cases that might require further human review. Furthermore, the possibility of algorithmic bias has caused hesitancy regarding the use of developed algorithms in clinical settings. To these ends, we explore how conformal methods can complement deep learning models by providing both a clinically intuitive way (by means of confidence prediction sets) of expressing model uncertainty and facilitating model transparency in clinical workflows. In this paper, we conduct a field survey with clinicians to assess clinical use-cases of conformal predictions. Next, we conduct experiments with mammographic breast density and dermatology photography datasets to demonstrate the utility of conformal predictions in "rule-in" and "rule-out" disease scenarios. Further, we show that conformal predictors can be used to equalize coverage with respect to patient demographics such as race and skin tone. We find conformal prediction to be a promising framework with potential to increase clinical usability and transparency for better collaboration between deep learning algorithms and clinicians.
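摘要中的"置信预测集"可以用标准的split conformal方法构造:在校准集上计算非一致性得分,再以其经验分位数为阈值,生成覆盖率约为 $1-\alpha$ 的标签集合。下面是一个最小示意(这是通用做法,而非该论文的具体实现;函数名为自拟):

```python
import numpy as np

def conformal_prediction_sets(probs_cal, y_cal, probs_test, alpha=0.1):
    """Split conformal 分类:用校准集非一致性得分构造覆盖率约 1-alpha 的预测集。"""
    n = len(y_cal)
    scores = 1.0 - probs_cal[np.arange(n), y_cal]         # 得分:1 - 真实类概率
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)  # 有限样本修正
    q = np.quantile(scores, level, method="higher")       # 需要 numpy>=1.22
    # 预测集:所有得分不超过阈值 q 的类别
    return [np.where(1.0 - p <= q)[0] for p in probs_test]

# 用法示意:校准集 200 个样本、3 个类别
rng = np.random.default_rng(0)
probs_cal = rng.dirichlet(np.ones(3), size=200)
y_cal = rng.integers(0, 3, size=200)
probs_test = rng.dirichlet(np.ones(3), size=5)
print(conformal_prediction_sets(probs_cal, y_cal, probs_test))
```

预测集越大,表示模型对该病例越不确定,这正是摘要所述"提示需要进一步人工审查"的直观依据。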

【2】 Towards Fully Automated Segmentation of Rat Cardiac MRI by Leveraging Deep Learning Frameworks 标题:利用深度学习框架实现大鼠心脏MRI的全自动分割 链接:https://arxiv.org/abs/2109.04188

作者:Daniel Fernandez-Llaneza,Andrea Gondova,Harris Vince,Arijit Patra,Magdalena Zurek,Peter Konings,Patrik Kagelid,Leif Hultin 机构:Clinical Pharmacology and Safety Sciences, Biopharmaceuticals R&D, AstraZeneca, Pepparedsleden, Mölndal, Sweden; Data Sciences & Quantitative Biology, Discovery Sciences, Biopharmaceuticals R&D, AstraZeneca, Pepparedsleden, Mölndal, Sweden 备注:29 pages + 22 pages (supplementary information), 8 figures + 8 supplementary figures 摘要:近年来,人类心脏磁共振数据集的自动分割一直在稳步改进。然而,由于数据集有限和图像分辨率较低,这些方法不能直接应用于临床前场景。尽管大鼠心脏分割对心脏功能的临床前评估至关重要,但据我们所知,深度架构在这一任务上的成功应用尚未见报道。我们开发了扩展标准U-Net架构的分割模型,并分别评估了针对收缩期和舒张期的独立模型(2MSA)以及覆盖所有时间点的单一模型(1MSA)。此外,我们使用基于高斯过程(GP)的先验校准模型输出,以改进心动相位选择。所得模型在1MSA和2MSA设置下(Sørensen-Dice得分分别为0.91 +/- 0.072和0.93 +/- 0.032),在左心室分割质量和射血分数(EF)估计方面均接近人类表现。2MSA的估计EF与参考EF之间的平均绝对差异为3.5 +/- 2.5%,而1MSA为4.1 +/- 3.0%。将高斯过程应用于1MSA可以自动选择收缩期和舒张期。结合一种新的心动相位选择策略,我们的工作为在大鼠心脏分析中实现全自动分割管道迈出了重要的第一步。 摘要:Automated segmentation of human cardiac magnetic resonance datasets has been steadily improving during recent years. However, these methods are not directly applicable in preclinical context due to limited datasets and lower image resolution. Successful application of deep architectures for rat cardiac segmentation, although of critical importance for preclinical evaluation of cardiac function, has to our knowledge not yet been reported. We developed segmentation models that expand on the standard U-Net architecture and evaluated separate models for systole and diastole phases, 2MSA, and one model for all timepoints, 1MSA. Furthermore, we calibrated model outputs using a Gaussian Process (GP)-based prior to improve phase selection. Resulting models approach human performance in terms of left ventricular segmentation quality and ejection fraction (EF) estimation in both 1MSA and 2MSA settings (Sørensen-Dice score 0.91 +/- 0.072 and 0.93 +/- 0.032, respectively). 2MSA achieved a mean absolute difference between estimated and reference EF of 3.5 +/- 2.5 %, while 1MSA resulted in 4.1 +/- 3.0 %. Applying Gaussian Processes to 1MSA allows to automate the selection of systole and diastole phases. Combined with a novel cardiac phase selection strategy, our work presents an important first step towards a fully automated segmentation pipeline in the context of rat cardiac analysis.

聚类(2篇)

【1】 On the use of Wasserstein metric in topological clustering of distributional data 标题:关于Wasserstein度量在分布数据拓扑聚类中的应用 链接:https://arxiv.org/abs/2109.04301

作者:Guénaël Cabanes,Younès Bennani,Rosanna Verde,Antonio Irpino 摘要:本文提出了一种基于自组织映射(SOM)学习的直方图数据聚类算法。它结合了SOM降维和数据在缩减空间中的聚类。针对这类数据,引入了分布之间合适的相异性度量:$L_2$ Wasserstein距离。此外,聚类数不是预先固定的,而是根据原始空间中的局部数据密度估计自动找到。在合成和真实数据集上的应用证实了所提出的策略。 摘要:This paper deals with a clustering algorithm for histogram data based on a Self-Organizing Map (SOM) learning. It combines a dimension reduction by SOM and the clustering of the data in a reduced space. Related to the kind of data, a suitable dissimilarity measure between distributions is introduced: the $L_2$ Wasserstein distance. Moreover, the number of clusters is not fixed in advance but it is automatically found according to a local data density estimation in the original space. Applications on synthetic and real data sets corroborate the proposed strategy.
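对一维分布(直方图)而言,$L_2$ Wasserstein距离等于两个分位数函数之差的$L_2$范数,即 $W_2^2(p,q)=\int_0^1 (F_p^{-1}(t)-F_q^{-1}(t))^2\,dt$,因此可以直接数值计算。下面是一个最小示意(通用计算方法,并非该论文的代码;函数名为自拟):

```python
import numpy as np

def w2_histogram(centers_p, weights_p, centers_q, weights_q, n_grid=1000):
    """一维直方图间的 L2 Wasserstein 距离:通过分位数函数之差数值积分。"""
    ts = (np.arange(n_grid) + 0.5) / n_grid      # 均匀概率网格
    def quantile(centers, weights):
        c = np.asarray(centers, float)
        w = np.asarray(weights, float); w = w / w.sum()
        order = np.argsort(c)
        cw = np.cumsum(w[order])                 # 经验 CDF
        idx = np.searchsorted(cw, ts, side="left").clip(0, len(c) - 1)
        return c[order][idx]                     # CDF 的广义逆(分位数函数)
    diff = quantile(centers_p, weights_p) - quantile(centers_q, weights_q)
    return float(np.sqrt(np.mean(diff ** 2)))

# 用法示意:两个三柱直方图(柱中心相同、权重不同)
print(w2_histogram([0, 1, 2], [1, 2, 1], [0, 1, 2], [1, 1, 2]))
```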

【2】 An objective function for order preserving hierarchical clustering 标题:一种保序层次聚类的目标函数 链接:https://arxiv.org/abs/2109.04266

作者:Daniel Bakkelund 备注:33 pages 摘要:我们提出了一个基于相似性的偏序数据分层聚类的目标函数,它在如下意义上保持偏序:如果$x\leq y$,且$[x]$和$[y]$分别是$x$和$y$所在的簇,则簇上存在使$[x]\leq'[y]$成立的顺序关系$\leq'$。该模型与现有的有序数据聚类方法和模型的不同之处在于,将顺序关系和相似性结合起来,以获得力求同时满足两者的最优层次聚类,并且顺序关系配有取值于$[0,1]$范围内的成对可比性水平。特别是,如果相似性和顺序关系不一致,则保序可能不得不让位于聚类。寻找最优解是NP困难的,因此我们基于有向最稀疏割(directed sparsest cut)的连续应用,提供了一种多项式时间近似算法,相对性能保证为$O(\log^{3/2}n)$。该模型是用于分裂式层次聚类的Dasgupta代价函数的扩展。 摘要:We present an objective function for similarity based hierarchical clustering of partially ordered data that preserves the partial order in the sense that if $x \leq y$, and if $[x]$ and $[y]$ are the respective clusters of $x$ and $y$, then there is an order relation $\leq'$ on the clusters for which $[x] \leq' [y]$. The model distinguishes itself from existing methods and models for clustering of ordered data in that the order relation and the similarity are combined to obtain an optimal hierarchical clustering seeking to satisfy both, and that the order relation is equipped with a pairwise level of comparability in the range $[0,1]$. In particular, if the similarity and the order relation are not aligned, then order preservation may have to yield in favor of clustering. Finding an optimal solution is NP-hard, so we provide a polynomial time approximation algorithm, with a relative performance guarantee of $O(\log^{3/2}n)$, based on successive applications of directed sparsest cut. The model is an extension of the Dasgupta cost function for divisive hierarchical clustering.

自动驾驶|车辆|车道检测等(2篇)

【1】 NEAT: Neural Attention Fields for End-to-End Autonomous Driving 标题:Neat:端到端自动驾驶的神经注意区域 链接:https://arxiv.org/abs/2109.04456

作者:Kashyap Chitta,Aditya Prakash,Andreas Geiger 机构:Max Planck Institute for Intelligent Systems, Tübingen, University of Tübingen 备注:ICCV 2021 摘要:关于场景的语义、空间和时间结构的有效推理是自主驾驶的关键先决条件。我们提出了神经注意场(NEAT),这是一种新的表示方法,可以为端到端的模仿学习模型提供这种推理。NEAT是一个连续函数,它将鸟瞰视图(BEV)场景坐标中的位置映射到航路点和语义,使用中间注意力图将高维2D图像特征迭代压缩为紧凑表示。这使得我们的模型能够有选择地关注输入中的相关区域,同时忽略与驾驶任务无关的信息,从而有效地将图像与BEV表示相关联。在一个涉及恶劣环境条件和挑战性场景的新评估设置中,NEAT优于多个强基线,其驾驶分数与用于生成其训练数据的特权CARLA专家相当。此外,对具有NEAT中间表示的模型的注意力图进行可视化,可提供更好的可解释性。 摘要:Efficient reasoning about the semantic, spatial, and temporal structure of a scene is a crucial prerequisite for autonomous driving. We present NEural ATtention fields (NEAT), a novel representation that enables such reasoning for end-to-end imitation learning models. NEAT is a continuous function which maps locations in Bird's Eye View (BEV) scene coordinates to waypoints and semantics, using intermediate attention maps to iteratively compress high-dimensional 2D image features into a compact representation. This allows our model to selectively attend to relevant regions in the input while ignoring information irrelevant to the driving task, effectively associating the images with the BEV representation. In a new evaluation setting involving adverse environmental conditions and challenging scenarios, NEAT outperforms several strong baselines and achieves driving scores on par with the privileged CARLA expert used to generate its training data. Furthermore, visualizing the attention maps for models with NEAT intermediate representations provides improved interpretability.

【2】 DROP: Deep relocating option policy for optimal ride-hailing vehicle repositioning 标题:Drop:优化叫车车辆重新定位的深度重新定位选项政策 链接:https://arxiv.org/abs/2109.04149

作者:Xinwu Qian,Shuocheng Guo,Vaneet Aggarwal 摘要:在叫车系统中,空车的最佳重新定位可以显著减少车队怠速时间,平衡供需分布,提高系统效率,提高驾驶员满意度和保留率。无模型深层强化学习(DRL)已被证明可以通过与大规模打车系统的内在动力学进行主动交互来动态学习重新定位策略。然而,报酬信号稀疏和供需分布不平衡的问题给开发有效的DRL模型设置了关键障碍。传统的探索策略(例如$\epsilon$-greedy)在这种环境下可能几乎不起作用,因为在远离高收入地区的低需求地区会出现抖动。本研究提出了深度搬迁选择政策(DROP),该政策监督车辆代理商逃离供应过剩地区,并有效地搬迁到潜在的服务不足地区。我们建议学习时间扩展重定位图的拉普拉斯嵌入,作为系统重定位策略的近似表示。嵌入生成任务不可知信号,与任务相关信号相结合,构成生成DROP的伪奖励函数。我们提出了一个分层学习框架,该框架训练了一个高层次的迁移策略和一组低层次的DROP。我们的方法的有效性通过一个定制的高保真模拟器和真实世界的行车记录数据进行了验证。我们报告说,DROP显著改善了基准模型,每小时收入增加15.7%,并能有效解决低需求地区的抖动问题。 摘要:In a ride-hailing system, an optimal relocation of vacant vehicles can significantly reduce fleet idling time and balance the supply-demand distribution, enhancing system efficiency and promoting driver satisfaction and retention. Model-free deep reinforcement learning (DRL) has been shown to dynamically learn the relocating policy by actively interacting with the intrinsic dynamics in large-scale ride-hailing systems. However, the issues of sparse reward signals and unbalanced demand and supply distribution place critical barriers in developing effective DRL models. Conventional exploration strategy (e.g., the $\epsilon$-greedy) may barely work under such an environment because of dithering in low-demand regions distant from high-revenue regions. This study proposes the deep relocating option policy (DROP) that supervises vehicle agents to escape from oversupply areas and effectively relocate to potentially underserved areas. We propose to learn the Laplacian embedding of a time-expanded relocation graph, as an approximation representation of the system relocation policy. The embedding generates task-agnostic signals, which in combination with task-dependent signals, constitute the pseudo-reward function for generating DROPs. We present a hierarchical learning framework that trains a high-level relocation policy and a set of low-level DROPs. The effectiveness of our approach is demonstrated using a custom-built high-fidelity simulator with real-world trip record data. We report that DROP significantly improves baseline models with 15.7% more hourly revenue and can effectively resolve the dithering issue in low-demand areas.
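摘要中的"重定位图的拉普拉斯嵌入"可以用图拉普拉斯最小非平凡特征向量来示意。下面的草图针对无向(或对称化后的)图给出通用计算方式,并非论文对时间扩展图的具体处理:

```python
import numpy as np

def laplacian_embedding(adjacency, dim):
    """组合拉普拉斯 L = D - A 的最小非平凡特征向量,作为图节点的低维嵌入。"""
    A = np.asarray(adjacency, float)
    L = np.diag(A.sum(axis=1)) - A           # 组合拉普拉斯
    eigvals, eigvecs = np.linalg.eigh(L)     # 对称矩阵,特征值升序排列
    return eigvecs[:, 1:dim + 1]             # 跳过对应特征值 0 的常数向量

# 用法示意:4 节点环形图的 2 维嵌入
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], float)
print(laplacian_embedding(A, dim=2))
```

这类嵌入刻画了图的全局连通结构,因此可以作为与具体任务无关的伪奖励信号来源。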

联邦学习|隐私保护|加密(3篇)

【1】 Dubhe: Towards Data Unbiasedness with Homomorphic Encryption in Federated Learning Client Selection 标题:Dubhe:联合学习客户端选择中基于同态加密的数据无偏性 链接:https://arxiv.org/abs/2109.04253

作者:Shulai Zhang,Zirui Li,Quan Chen,Wenli Zheng,Jingwen Leng,Minyi Guo 机构:Shanghai Jiao Tong University, China 备注:10 pages 摘要:联邦学习(FL)是一种分布式机器学习范式,它允许客户在自己的本地数据上协作训练模型。FL保证了客户端的隐私,其安全性可以通过加密方法(如附加同态加密(HE))得到加强。然而,FL的效率可能会受到客户之间数据分布差异和全球分布偏斜的统计异质性的严重影响。我们从数学上证明了FL性能下降的原因,并检查了FL在各种数据集上的性能。为了解决统计异构性问题,我们提出了一种可插拔的系统级客户端选择方法Dubhe,该方法允许客户端主动参与训练,同时在HE的帮助下保护其隐私。实验结果表明,Dubhe在分类精度上与最优贪婪方法相当,加密和通信开销可以忽略不计。 摘要:Federated learning (FL) is a distributed machine learning paradigm that allows clients to collaboratively train a model over their own local data. FL promises the privacy of clients and its security can be strengthened by cryptographic methods such as additively homomorphic encryption (HE). However, the efficiency of FL could seriously suffer from the statistical heterogeneity in both the data distribution discrepancy among clients and the global distribution skewness. We mathematically demonstrate the cause of performance degradation in FL and examine the performance of FL over various datasets. To tackle the statistical heterogeneity problem, we propose a pluggable system-level client selection method named Dubhe, which allows clients to proactively participate in training, meanwhile preserving their privacy with the assistance of HE. Experimental results show that Dubhe is comparable with the optimal greedy method on the classification accuracy, with negligible encryption and communication overhead.

【2】 An Experimental Study of Class Imbalance in Federated Learning 标题:联合学习中班级失衡的实验研究 链接:https://arxiv.org/abs/2109.04094

作者:C. Xiao,S. Wang 机构:School of Computer Science, University of Birmingham, Edgbaston, UK 摘要:联邦学习是一种分布式机器学习范式,它在保持本地数据隐私的同时,基于客户端的多个本地模型训练用于预测的全局模型。类不平衡被认为是降低全局模型性能的因素之一。然而,关于类不平衡是否以及如何影响全局性能的研究很少。由于本地客户端的类不平衡情况不同,联邦学习中的类不平衡要比传统的非分布式机器学习复杂得多。在分布式学习环境中,需要重新定义类不平衡。在本文中,首先,我们提出了两个新的度量来定义类不平衡——全局类不平衡度(MID)和客户端之间类不平衡的局部差异(WCS)。然后,根据我们的定义,我们进行了大量的实验,分析了在各种场景下,类不平衡对全局性能的影响。我们的结果表明,更高的MID和更大的WCS会降低全局模型的性能。此外,WCS通过错误引导优化来减慢全局模型的收敛速度。 摘要:Federated learning is a distributed machine learning paradigm that trains a global model for prediction based on a number of local models at clients while local data privacy is preserved. Class imbalance is believed to be one of the factors that degrades the global model performance. However, there has been very little research on if and how class imbalance can affect the global performance. Class imbalance in federated learning is much more complex than that in traditional non-distributed machine learning, due to different class imbalance situations at local clients. Class imbalance needs to be re-defined in distributed learning environments. In this paper, first, we propose two new metrics to define class imbalance -- the global class imbalance degree (MID) and the local difference of class imbalance among clients (WCS). Then, we conduct extensive experiments to analyze the impact of class imbalance on the global performance in various scenarios based on our definition. Our results show that a higher MID and a larger WCS degrade more the performance of the global model. Besides, WCS is shown to slow down the convergence of the global model by misdirecting the optimization.
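摘要未给出MID与WCS的具体公式。下面的草图仅以总变差距离为例给出一种假设性定义,用来说明"全局不平衡度"与"客户端间分布差异"这两类度量分别在衡量什么,并非论文的原始定义:

```python
import numpy as np

def class_distribution(labels, n_classes):
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    return counts / counts.sum()

def mid(global_labels, n_classes):
    """示意性 MID(假设定义):全局类分布与均匀分布之间的总变差距离。"""
    p = class_distribution(global_labels, n_classes)
    return 0.5 * np.abs(p - 1.0 / n_classes).sum()

def wcs(client_labels, n_classes):
    """示意性 WCS(假设定义):各客户端类分布之间的平均两两总变差距离。"""
    dists = [class_distribution(y, n_classes) for y in client_labels]
    pairs = [0.5 * np.abs(dists[i] - dists[j]).sum()
             for i in range(len(dists)) for j in range(i + 1, len(dists))]
    return float(np.mean(pairs)) if pairs else 0.0

# 用法示意:3 个客户端、4 个类别
rng = np.random.default_rng(0)
clients = [rng.integers(0, 4, size=100) for _ in range(3)]
print(mid(np.concatenate(clients), 4), wcs(clients, 4))
```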

【3】 Iterated Vector Fields and Conservatism, with Applications to Federated Learning 标题:迭代向量场与保守性及其在联合学习中的应用 链接:https://arxiv.org/abs/2109.03973

作者:Zachary Charles,Keith Rush 摘要:我们研究迭代向量场,并考察它们是否保守,即是否为某个标量值函数的梯度。我们分析了各种迭代向量场的保守性,包括与广义线性模型的损失函数相关的梯度向量场。我们将这项研究与优化联系起来,并得出联邦学习算法的新的收敛结果。特别地,我们证明了对于某些函数类(包括非凸函数),联邦平均等价于代理损失函数上的梯度下降。最后,我们讨论了几何、动力系统和优化领域的各种开放性问题。 摘要:We study iterated vector fields and investigate whether they are conservative, in the sense that they are the gradient of some scalar-valued function. We analyze the conservatism of various iterated vector fields, including gradient vector fields associated to loss functions of generalized linear models. We relate this study to optimization and derive novel convergence results for federated learning algorithms. In particular, we show that for certain classes of functions (including non-convex functions), federated averaging is equivalent to gradient descent on a surrogate loss function. Finally, we discuss a variety of open questions spanning topics in geometry, dynamical systems, and optimization.

推理|分析|理解|解释(4篇)

【1】 SONIC: A Sparse Neural Network Inference Accelerator with Silicon Photonics for Energy-Efficient Deep Learning 标题:SONIC:一种用于节能深度学习的硅光子稀疏神经网络推理加速器 链接:https://arxiv.org/abs/2109.04459

作者:Febin Sunny,Mahdi Nikdast,Sudeep Pasricha 机构:Colorado State University, Fort Collins, CO, USA 摘要:稀疏神经网络可以极大地促进神经网络在资源受限平台上的部署,因为它们提供紧凑的模型尺寸,同时保持推理精度。由于参数矩阵的稀疏性,稀疏神经网络原则上可以用于加速器结构中,以提高能量效率和延迟。然而,为了在实践中实现这些改进,有必要探索稀疏感知的软硬件协同设计。在本文中,我们提出了一种新的基于硅光子学的稀疏神经网络推理加速器SONIC。我们的实验分析表明,与最先进的稀疏电子神经网络加速器相比,SONIC的每瓦性能提高了5.8倍,每比特能量降低了8.4倍;与最著名的光子神经网络加速器相比,每瓦特性能提高13.8倍,每比特能量降低27.6倍。 摘要:Sparse neural networks can greatly facilitate the deployment of neural networks on resource-constrained platforms as they offer compact model sizes while retaining inference accuracy. Because of the sparsity in parameter matrices, sparse neural networks can, in principle, be exploited in accelerator architectures for improved energy-efficiency and latency. However, to realize these improvements in practice, there is a need to explore sparsity-aware hardware-software co-design. In this paper, we propose a novel silicon photonics-based sparse neural network inference accelerator called SONIC. Our experimental analysis shows that SONIC can achieve up to 5.8x better performance-per-watt and 8.4x lower energy-per-bit than state-of-the-art sparse electronic neural network accelerators; and up to 13.8x better performance-per-watt and 27.6x lower energy-per-bit than the best known photonic neural network accelerators.

【2】 Mapping Research Topics in Software Testing: A Bibliometric Analysis 标题:软件测试中映射研究主题的文献计量学分析 链接:https://arxiv.org/abs/2109.04086

作者:Alireza Salahirad,Gregory Gay,Ehsan Mohammadi 机构:Department of Computer Science & Engineering, University of South Carolina, USA, Department of Computer Science and Engineering, Chalmers and the University of Gothenburg, Sweden, School of Information Science, University of South Carolina, USA 备注:Under submission to Journal of Systems and Software 摘要:在这项研究中,我们应用共词分析(一种基于术语共现的文本挖掘技术)来映射软件测试研究主题的拓扑结构,目的是为当前和未来的研究人员提供一个关于软件测试领域发展的地图和观察。我们的分析能够将软件测试研究映射到相关主题的集群中,从中产生了总共16个高级研究主题和另外18个子主题。这张地图还显示了越来越重要的主题,包括与web和移动应用程序以及人工智能相关的主题。对作者和基于国家的合作模式的探索提供了对影响合作的隐性和显性因素的类似洞察,并为未来的工作提出了新的合作来源。我们提供我们的观察结果——以及研究主题和研究合作的基本映射——以便研究人员能够更深入地了解软件测试领域的拓扑结构、探索新领域和连接的灵感,以及拓宽其视角的合作者。 摘要:In this study, we apply co-word analysis - a text mining technique based on the co-occurrence of terms - to map the topology of software testing research topics, with the goal of providing current and prospective researchers with a map, and observations about the evolution, of the software testing field. Our analysis enables the mapping of software testing research into clusters of connected topics, from which emerge a total of 16 high-level research themes and a further 18 subthemes. This map also suggests topics that are growing in importance, including topics related to web and mobile applications and artificial intelligence. Exploration of author and country-based collaboration patterns offers similar insight into the implicit and explicit factors that influence collaboration and suggests emerging sources of collaboration for future work. We make our observations - and the underlying mapping of research topics and research collaborations - available so that researchers can gain a deeper understanding of the topology of the software testing field, inspiration regarding new areas and connections to explore, and collaborators who will broaden their perspectives.

【3】 Model Explanations via the Axiomatic Causal Lens 标题:通过公理因果透镜的模型解释 链接:https://arxiv.org/abs/2109.03890

作者:Vignesh Viswanathan,Yair Zick 机构:University of Massachusetts, Amherst 摘要:解释黑箱模型的决策一直是可信ML研究的中心主题。文献中提出了许多度量;然而,它们都未能以可证明的因果方式处理可解释性。基于Halpern和Pearl对因果解释的形式化定义,我们为分类设置推导了一组类似的公理,并用它们推导出三个解释度量。我们的第一个度量是Chockler和Halpern因果责任概念的自然改编,而另外两个则对应于现有的博弈论影响力度量。我们对所提出的指标给出公理化处理,表明它们可以由一组理想性质唯一刻画。我们用计算分析来补充这一点,为我们提出的所有度量提供了概率近似方案。因此,我们的工作首次在形式上弥合了模型解释、博弈论影响力和因果分析之间的差距。 摘要:Explaining the decisions of black-box models has been a central theme in the study of trustworthy ML. Numerous measures have been proposed in the literature; however, none of them have been able to adopt a provably causal take on explainability. Building upon Halpern and Pearl's formal definition of a causal explanation, we derive an analogous set of axioms for the classification setting, and use them to derive three explanation measures. Our first measure is a natural adaptation of Chockler and Halpern's notion of causal responsibility, whereas the other two correspond to existing game-theoretic influence measures. We present an axiomatic treatment for our proposed indices, showing that they can be uniquely characterized by a set of desirable properties. We complement this with computational analysis, providing probabilistic approximation schemes for all of our proposed measures. Thus, our work is the first to formally bridge the gap between model explanations, game-theoretic influence, and causal analysis.

【4】 Mean-Square Analysis with An Application to Optimal Dimension Dependence of Langevin Monte Carlo 标题:均方分析及其在朗之万蒙特卡罗最优维相关性中的应用 链接:https://arxiv.org/abs/2109.03839

作者:Ruilin Li,Hongyuan Zha,Molei Tao 机构:Georgia Institute of Technology, The Chinese University of Hong Kong, Shenzhen 备注:Submitted to NeurIPS 2021 on May 28, 2021 (the submission deadline) 摘要:基于随机微分方程(SDE)离散化的采样算法构成了MCMC方法的一个丰富而流行的子集。这项工作为2-Wasserstein距离中采样误差的非渐近分析提供了一个通用框架,这也导出了混合时间的界。该方法适用于压缩(contractive)SDE的任何一致离散化。当应用于Langevin Monte Carlo算法时,在常见的对数光滑与对数强凸条件,外加目标测度的势在无穷远处三阶导数的增长条件下,它无需温启动即可建立$\tilde{\mathcal{O}}\left(\frac{\sqrt{d}}{\epsilon}\right)$的混合时间。该界限改进了之前已知的$\tilde{\mathcal{O}}\left(\frac{d}{\epsilon}\right)$结果,并且对于满足上述假设的目标测度,在维度$d$和精度容差$\epsilon$方面(就阶而言)都是最优的。数值实验进一步验证了理论分析的正确性。 摘要:Sampling algorithms based on discretizations of Stochastic Differential Equations (SDEs) compose a rich and popular subset of MCMC methods. This work provides a general framework for the non-asymptotic analysis of sampling error in 2-Wasserstein distance, which also leads to a bound of mixing time. The method applies to any consistent discretization of contractive SDEs. When applied to Langevin Monte Carlo algorithm, it establishes $\tilde{\mathcal{O}}\left( \frac{\sqrt{d}}{\epsilon} \right)$ mixing time, without warm start, under the common log-smooth and log-strongly-convex conditions, plus a growth condition on the 3rd-order derivative of the potential of target measures at infinity. This bound improves the best previously known $\tilde{\mathcal{O}}\left( \frac{d}{\epsilon} \right)$ result and is optimal (in terms of order) in both dimension $d$ and accuracy tolerance $\epsilon$ for target measures satisfying the aforementioned assumptions. Our theoretical analysis is further validated by numerical experiments.
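论文分析的对象是压缩SDE的离散化采样器,其中最典型的就是无调整Langevin算法(ULA),迭代形式为 $x_{k+1}=x_k-h\nabla U(x_k)+\sqrt{2h}\,\xi_k$,目标测度 $\propto e^{-U(x)}$。下面给出该通用算法的一个最小示意(非论文新方法):

```python
import numpy as np

def langevin_monte_carlo(grad_U, x0, step, n_iters, rng=None):
    """无调整 Langevin 算法(ULA):x_{k+1} = x_k - h*grad_U(x_k) + sqrt(2h)*xi_k,
    从 ∝ exp(-U(x)) 的目标测度中近似采样。"""
    rng = rng or np.random.default_rng()
    x = np.array(x0, dtype=float)
    samples = np.empty((n_iters, x.size))
    for k in range(n_iters):
        noise = rng.standard_normal(x.size)
        x = x - step * grad_U(x) + np.sqrt(2.0 * step) * noise
        samples[k] = x
    return samples

# 用法示意:从标准高斯采样(U(x) = ||x||^2 / 2,grad_U(x) = x)
samples = langevin_monte_carlo(lambda x: x, np.zeros(2), step=0.1, n_iters=5000)
print(samples.mean(axis=0), samples.var(axis=0))   # 应分别接近 0 与 1
```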

检测相关(2篇)

【1】 DAE : Discriminatory Auto-Encoder for multivariate time-series anomaly detection in air transportation 标题:DAE:航空运输多变量时间序列异常检测的判别自动编码器 链接:https://arxiv.org/abs/2109.04247

作者:Antoine Chevrot,Alexandre Vernotte,Bruno Legeard 机构:Received: date Accepted: date 摘要:自动相关监视广播协议是空中监视的最新强制性进展之一。虽然它支持跟踪空中不断增长的飞机数量,但它也带来了必须缓解的网络安全问题,例如,攻击者发出虚假监视信息的虚假数据注入攻击。可用于获取飞行跟踪记录的最新数据源和工具使研究人员能够创建数据集,并开发能够检测航路轨迹中此类异常的机器学习模型。在此背景下,我们提出了一种新的多元异常检测模型,称为判别式自动编码器(DAE)。它使用基于LSTM的常规自动编码器的基线,但带有多个解码器,每个解码器在训练期间获取特定飞行阶段(如爬升、巡航或下降)的数据。为了说明DAE的效率,使用真实的异常以及真实制作的异常创建了评估数据集,在此基础上,对DAE以及文献中的三种异常检测模型进行了评估。结果表明,DAE在检测精度和速度上都取得了较好的效果。数据集、模型实现和评估结果可在在线存储库中获得,从而实现可复制性并促进未来的实验。 摘要:The Automatic Dependent Surveillance Broadcast protocol is one of the latest compulsory advances in air surveillance. While it supports the tracking of the ever-growing number of aircraft in the air, it also introduces cybersecurity issues that must be mitigated e.g., false data injection attacks where an attacker emits fake surveillance information. The recent data sources and tools available to obtain flight tracking records allow the researchers to create datasets and develop Machine Learning models capable of detecting such anomalies in En-Route trajectories. In this context, we propose a novel multivariate anomaly detection model called Discriminatory Auto-Encoder (DAE). It uses the baseline of a regular LSTM-based auto-encoder but with several decoders, each getting data of a specific flight phase (e.g. climbing, cruising or descending) during its training. To illustrate the DAE's efficiency, an evaluation dataset was created using real-life anomalies as well as realistically crafted ones, with which the DAE as well as three anomaly detection models from the literature were evaluated. Results show that the DAE achieves better results in both accuracy and speed of detection. The dataset, the models implementations and the evaluation results are available in an online repository, thereby enabling replicability and facilitating future experiments.
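下面用PyTorch给出"共享编码器 + 按飞行阶段路由的多解码器"这一结构的最小示意;层数、维度与路由方式均为说明性假设,并非论文的原始配置:

```python
import torch
import torch.nn as nn

class DiscriminatoryAutoEncoder(nn.Module):
    """多解码器 LSTM 自动编码器示意:共享编码器,每个飞行阶段一个解码器。"""
    def __init__(self, n_features, hidden, n_phases):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.decoders = nn.ModuleList(
            [nn.LSTM(hidden, n_features, batch_first=True) for _ in range(n_phases)]
        )

    def forward(self, x, phase):
        z, _ = self.encoder(x)             # (batch, seq, hidden)
        out, _ = self.decoders[phase](z)   # 按样本所处飞行阶段选择解码器
        return out

# 用法示意:训练时按阶段路由;推理时用重构误差作为异常分数
model = DiscriminatoryAutoEncoder(n_features=6, hidden=32, n_phases=3)
x = torch.randn(8, 50, 6)                  # batch=8, 序列长=50, 特征=6
recon = model(x, phase=1)                  # phase=1 表示例如"巡航"阶段
anomaly_score = torch.mean((recon - x) ** 2, dim=(1, 2))
print(anomaly_score)                       # 误差大的轨迹段更可能被注入了虚假数据
```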

【2】 Detection of Epileptic Seizures on EEG Signals Using ANFIS Classifier, Autoencoders and Fuzzy Entropies 标题:基于ANFIS分类器、自动编码器和模糊熵的脑电信号癫痫发作检测 链接:https://arxiv.org/abs/2109.04364

作者:Afshin Shoeibi,Navid Ghassemi,Marjane Khodatars,Parisa Moridian,Roohallah Alizadehsani,Assef Zare,Abbas Khosravi,Abdulhamit Subasi,U. Rajendra Acharya,J. Manuel Gorriz 机构:A. Subasi is with the Institute of Biomedicine, University of Turku 摘要:癫痫是最重要的神经系统疾病之一,其早期诊断将有助于临床医生为患者提供准确的治疗。脑电图(EEG)信号广泛用于癫痫发作检测,为专家提供有关大脑功能的大量信息。本文介绍了一种基于模糊理论和深度学习技术的新型诊断方法。所提出的方法在具有六种分类组合的波恩大学数据集以及弗赖堡数据集上进行了评估。采用可调谐Q小波变换(TQWT)将脑电信号分解为不同的子带。在特征提取步骤中,从TQWT的不同子带计算13种不同的模糊熵,并计算其计算复杂度,以帮助研究人员选择最佳特征集。接下来,使用六层自动编码器(AE)进行降维。最后,使用标准自适应神经模糊推理系统(ANFIS)及其与蚱蜢优化算法(ANFIS-GOA)、粒子群优化算法(ANFIS-PSO)和繁殖群优化算法(ANFIS-BS)结合的变体进行分类。在我们提出的方法中,ANFIS-BS在波恩数据集上对两类分类获得了99.74%的准确率、对三类分类获得了99.46%的准确率,在弗赖堡数据集上获得了99.28%的准确率,在两个数据集上均达到了最先进的性能。 摘要:Epilepsy is one of the most crucial neurological disorders, and its early diagnosis will help the clinicians to provide accurate treatment for the patients. The electroencephalogram (EEG) signals are widely used for epileptic seizures detection, which provides specialists with substantial information about the functioning of the brain. In this paper, a novel diagnostic procedure using fuzzy theory and deep learning techniques are introduced. The proposed method is evaluated on the Bonn University dataset with six classification combinations and also on the Freiburg dataset. The tunable-Q wavelet transform (TQWT) is employed to decompose the EEG signals into different sub-bands. In the feature extraction step, 13 different fuzzy entropies are calculated from different sub-bands of TQWT, and their computational complexities are calculated to help researchers choose the best feature sets. In the following, an autoencoder (AE) with six layers is employed for dimensionality reduction. Finally, the standard adaptive neuro-fuzzy inference system (ANFIS), and also its variants with grasshopper optimization algorithm (ANFIS-GOA), particle swarm optimization (ANFIS-PSO), and breeding swarm optimization (ANFIS-BS) methods are used for classification. Using our proposed method, ANFIS-BS method has obtained an accuracy of 99.74% in classifying into two classes and an accuracy of 99.46% in ternary classification on the Bonn dataset and 99.28% on the Freiburg dataset, reaching state-of-the-art performances on both of them.
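作为特征提取步骤的示意,下面实现模糊熵(FuzzyEn)的一个常见定义:它用指数隶属函数取代样本熵的硬阈值,对短生理信号更稳定。论文对TQWT各子带分别计算13种模糊熵,此处只给出其中最基本的一种(通用实现,存在若干细节不同的变体,并非论文代码):

```python
import numpy as np

def fuzzy_entropy(x, m=2, r=0.2, n=2):
    """模糊熵 FuzzyEn(x, m, r, n) 的一个常见定义:
    用指数隶属函数 exp(-d^n / tol) 取代样本熵的硬阈值;tol 按 r*std(x) 缩放。
    复杂度 O(N^2),这正是摘要中"计算复杂度"考量的来源。"""
    x = np.asarray(x, float)
    tol = r * x.std()
    def phi(dim):
        n_vec = len(x) - m                     # m 与 m+1 维取同样数量的向量
        X = np.array([x[i:i + dim] for i in range(n_vec)])
        X = X - X.mean(axis=1, keepdims=True)  # 去基线
        d = np.abs(X[:, None, :] - X[None, :, :]).max(axis=2)  # 切比雪夫距离
        D = np.exp(-(d ** n) / tol)            # 模糊隶属度
        np.fill_diagonal(D, 0.0)               # 排除自匹配
        return D.sum() / (n_vec * (n_vec - 1))
    return float(np.log(phi(m)) - np.log(phi(m + 1)))

# 用法示意:带噪正弦信号
sig = np.sin(np.linspace(0, 20 * np.pi, 300)) + 0.1 * np.random.randn(300)
print(fuzzy_entropy(sig))
```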

分类|识别(4篇)

【1】 Accounting for Variations in Speech Emotion Recognition with Nonparametric Hierarchical Neural Network 标题:用非参数递阶神经网络处理语音情感识别中的变化 链接:https://arxiv.org/abs/2109.04316

作者:Lance Ying,Amrit Romana,Emily Mower Provost 机构:University of Michigan, Ann Arbor, Michigan, USA 备注:9 pages, manuscript under peer review 摘要:近年来,基于深度学习的语音情感识别模型的性能优于经典的机器学习模型。以前,神经网络设计,如多任务学习,已经解释了由于人口统计和环境因素引起的情感表达的变化。然而,现有模型面临一些限制:1)它们依赖于域的明确定义(例如性别、噪声条件等)和域标签的可用性;2) 他们经常尝试学习领域不变的特征,而情感表达可以是领域特定的。在本研究中,我们提出了非参数层次神经网络(NHNN),一种基于贝叶斯非参数聚类的轻量级层次神经网络模型。与多任务学习方法相比,该模型不需要域/任务标签。在我们的实验中,NHNN模型在语料库内和跨语料库测试中通常优于具有相似复杂度的模型和最先进的模型。通过聚类分析,我们发现NHNN模型能够学习特定于组的特征,并弥合组间的性能差距。 摘要:In recent years, deep-learning-based speech emotion recognition models have outperformed classical machine learning models. Previously, neural network designs, such as Multitask Learning, have accounted for variations in emotional expressions due to demographic and contextual factors. However, existing models face a few constraints: 1) they rely on a clear definition of domains (e.g. gender, noise condition, etc.) and the availability of domain labels; 2) they often attempt to learn domain-invariant features while emotion expressions can be domain-specific. In the present study, we propose the Nonparametric Hierarchical Neural Network (NHNN), a lightweight hierarchical neural network model based on Bayesian nonparametric clustering. In comparison to Multitask Learning approaches, the proposed model does not require domain/task labels. In our experiments, the NHNN models generally outperform the models with similar levels of complexity and state-of-the-art models in within-corpus and cross-corpus tests. Through clustering analysis, we show that the NHNN models are able to learn group-specific features and bridge the performance gap between groups.

【2】 DeepEMO: Deep Learning for Speech Emotion Recognition 标题:DeepEMO:语音情感识别的深度学习 链接:https://arxiv.org/abs/2109.04081

作者:Enkhtogtokh Togootogtokh,Christian Klasen 机构:Technidoo Solutions Lab, Technidoo Solutions Germany and Mongolian University of Science and Technology, Bavaria, Germany 摘要:我们提出了一种用于语音情感识别任务的行业级深度学习方法。在工业领域,由于训练数据可用性、机器训练成本以及专用人工智能任务的专业化学习,精心提出的深度迁移学习技术显示出了真正的效果。提出的语音识别框架称为DeepEMO,包括两条主要管道,即提取有效主要特征的预处理和训练和识别的深度迁移学习模型。主要源代码位于https://github.com/enkhtogtokh/deepemo 存储库 摘要:We proposed the industry level deep learning approach for speech emotion recognition task. In industry, carefully proposed deep transfer learning technology shows real results due to mostly low amount of training data availability, machine training cost, and specialized learning on dedicated AI tasks. The proposed speech recognition framework, called DeepEMO, consists of two main pipelines such that preprocessing to extract efficient main features and deep transfer learning model to train and recognize. Main source code is in https://github.com/enkhtogtokh/deepemo repository

【3】 MutualGraphNet: A novel model for motor imagery classification 标题:MutualGraphNet:一种新的运动图像分类模型 链接:https://arxiv.org/abs/2109.04361

作者:Yan Li,Ning Zhong,David Taniar,Haolan Zhang 机构: Maebashi Institute of Technology, Japan, Monash University, Australia, Ningbo Research Institute, Zhejiang University, China 摘要:运动表象分类对于运动障碍患者具有重要意义,如何从运动表象脑电图(EEG)通道中提取和利用有效的特征一直是人们关注的焦点。运动表象分类的方法很多,但对人脑的理解有限,需要更有效的方法来提取脑电数据的特征。图形神经网络(GNNs)已证明其在图形结构分类中的有效性;GNN的使用为脑结构连接特征提取提供了新的可能性。本文提出了一种新的基于原始脑电通道互信息的图神经网络,称为MutualGraphNet。利用互信息作为邻接矩阵,结合时空图卷积网络(ST-GCN)可以更有效地提取运动想象脑电图(EEG)通道数据的转换规则。在运动想象EEG数据集上进行了实验,我们将我们的模型与当前最先进的方法进行了比较,结果表明,MutualGraphNet具有足够的鲁棒性来学习可解释的特征,并且优于当前最先进的方法。 摘要:Motor imagery classification is of great significance to humans with mobility impairments, and how to extract and utilize the effective features from motor imagery electroencephalogram(EEG) channels has always been the focus of attention. There are many different methods for the motor imagery classification, but the limited understanding on human brain requires more effective methods for extracting the features of EEG data. Graph neural networks(GNNs) have demonstrated its effectiveness in classifying graph structures; and the use of GNN provides new possibilities for brain structure connection feature extraction. In this paper we propose a novel graph neural network based on the mutual information of the raw EEG channels called MutualGraphNet. We use the mutual information as the adjacency matrix combined with the spatial temporal graph convolution network(ST-GCN) could extract the transition rules of the motor imagery electroencephalogram(EEG) channels data more effectively. Experiments are conducted on motor imagery EEG data set and we compare our model with the current state-of-the-art approaches and the results suggest that MutualGraphNet is robust enough to learn the interpretable features and outperforms the current state-of-the-art methods.

【4】 Simplified Quantum Algorithm for the Oracle Identification Problem 标题:甲骨文识别问题的简化量子算法 链接:https://arxiv.org/abs/2109.03902

作者:Leila Taghavi 机构:QuOne Lab, Phanous Research and Innovation Centre, Tehran, Iran 备注:7 pages, 3 images 摘要:在oracle标识问题中,我们允许oracle访问长度为$n$的未知字符串$x$的位,并保证它属于一个已知集合$C\subseteq\{0,1\}^n$。目标是使用尽可能少的对oracle的查询来识别$x$。我们为这个问题开发了一个量子查询算法,查询复杂度为$O\left(\sqrt{\frac{n\log M}{\log(n/\log M)+1}}\right)$,其中$M$是$C$的大小。Kothari在2014年已经推导出了这个界限,为此我们提供了一个更优雅更简单的证明。 摘要:In the oracle identification problem we have oracle access to bits of an unknown string $x$ of length $n$, with the promise that it belongs to a known set $C\subseteq\{0,1\}^n$. The goal is to identify $x$ using as few queries to the oracle as possible. We develop a quantum query algorithm for this problem with query complexity $O\left(\sqrt{\frac{n\log M }{\log(n/\log M)+1}}\right)$, where $M$ is the size of $C$. This bound is already derived by Kothari in 2014, for which we provide a more elegant simpler proof.
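作为对照,可以先看经典(非量子)情形下的一个逐位"减半"基线:每次查询尽量平分剩余候选集的那一位。下面的草图只是帮助理解问题设定的经典示意,与论文的量子查询算法无关:

```python
def identify_classically(C, oracle):
    """经典逐位排除基线:每次查询一位,剔除与答案不一致的候选串。
    最坏情况需要 O(n) 次查询;量子算法将其降至亚线性。"""
    candidates = list(C)
    queried = {}
    while len(candidates) > 1:
        n = len(candidates[0])
        # 贪心挑选最能平分当前候选集的未查询位("减半"启发式)
        best_i = max(
            (i for i in range(n) if i not in queried),
            key=lambda i: min(sum(c[i] == "0" for c in candidates),
                              sum(c[i] == "1" for c in candidates)),
        )
        bit = oracle(best_i)                   # 查询未知串 x 的第 best_i 位
        queried[best_i] = bit
        candidates = [c for c in candidates if c[best_i] == bit]
    return candidates[0]

# 用法示意
C = ["0011", "0101", "1100", "1111"]
x = "1100"
print(identify_classically(C, lambda i: x[i]))   # -> "1100"
```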

表征(1篇)

【1】 SORNet: Spatial Object-Centric Representations for Sequential Manipulation 标题:SORNet:面向顺序操作的空间对象中心表示法 链接:https://arxiv.org/abs/2109.03891

作者:Wentao Yuan,Chris Paxton,Karthik Desingh,Dieter Fox 机构:University of Washington, NVIDIA 摘要:顺序操作任务要求机器人感知环境状态并规划一系列动作,从而达到所需的目标状态,从原始传感器输入推断对象实体之间的空间关系的能力至关重要。依赖显式状态估计或端到端学习的先前工作难以应对新对象。在这项工作中,我们提出了SORNet(空间以对象为中心的表示网络),它从RGB图像中提取以对象为中心的表示,并以感兴趣对象的规范视图为条件。我们发现,SORNet学习的对象嵌入在三个空间推理任务(空间关系分类、技能前提分类和相对方向回归)上以Zero-Shot方式推广到未见过的对象实体,显著优于基线。此外,我们提供了真实世界的机器人实验,演示了学习对象嵌入在顺序操作任务规划中的使用。 摘要:Sequential manipulation tasks require a robot to perceive the state of an environment and plan a sequence of actions leading to a desired goal state, where the ability to reason about spatial relationships among object entities from raw sensor inputs is crucial. Prior works relying on explicit state estimation or end-to-end learning struggle with novel objects. In this work, we propose SORNet (Spatial Object-Centric Representation Network), which extracts object-centric representations from RGB images conditioned on canonical views of the objects of interest. We show that the object embeddings learned by SORNet generalize zero-shot to unseen object entities on three spatial reasoning tasks: spatial relationship classification, skill precondition classification and relative direction regression, significantly outperforming baselines. Further, we present real-world robotic experiments demonstrating the usage of the learned object embeddings in task planning for sequential manipulation.

优化|敛散性(4篇)

【1】 Optimal Reservoir Operations using Long Short-Term Memory Network 标题:基于长短期记忆网络的水库优化调度 链接:https://arxiv.org/abs/2109.04255

作者:Asha Devi Singh,Anurag Singh 机构:G.B Pant Institute of Technology, New Delhi, India, Netaji Subhas Institute of Technology, University of Delhi,India 摘要:可靠地预测流入水库的水量是水库优化运行的关键因素。基于流入预测的水库实时运行可带来可观的经济收益。然而,流入预测是一项复杂的任务,因为它必须考虑气候和水文变化的影响。因此,本研究的主要目标是开发一种基于长短期记忆(LSTM)的流入预测新方法。实时流入量预测(即水库每日流入量的预测)有助于水资源的高效运营。此外,可以有效地监测释放的每日变化,并提高操作的可靠性。本文提出了一种基于LSTM的朴素异常检测算法基线,换句话说,为任何基于深度学习的预测模型提供了一个预测洪水和干旱的强基线。使用印度Bhakra大坝过去20年的每日观测数据证明了该方法的实用性。本文进行的模拟结果清楚地表明了LSTM方法相对于传统预测方法的优越性。尽管实验是在印度Bhakra大坝水库的数据上进行的,但LSTM模型和异常检测算法是通用的,只需极小的改动即可应用于任何流域。本文介绍的LSTM方法的一个明显的实用优势是,它可以充分模拟历史数据中的非平稳性和非线性。 摘要:A reliable forecast of inflows to the reservoir is a key factor in the optimal operation of reservoirs. Real-time operation of the reservoir based on forecasts of inflows can lead to substantial economic gains. However, the forecast of inflow is an intricate task as it has to incorporate the impacts of climate and hydrological changes. Therefore, the major objective of the present work is to develop a novel approach based on long short-term memory (LSTM) for the forecast of inflows. Real-time inflow forecast, in other words, daily inflow at the reservoir helps in efficient operation of water resources. Also, daily variations in the release can be monitored efficiently and the reliability of operation is improved. This work proposes a naive anomaly detection algorithm baseline based on LSTM. In other words, a strong baseline to forecast flood and drought for any deep learning-based prediction model. The practicality of the approach has been demonstrated using the observed daily data of the past 20 years from Bhakra Dam in India. The results of the simulations conducted herein clearly indicate the supremacy of the LSTM approach over the traditional methods of forecasting. Although, experiments are run on data from Bhakra Dam Reservoir in India, LSTM model, and anomaly detection algorithm are general purpose and can be applied to any basin with minimal changes. A distinct practical advantage of the LSTM method presented herein is that it can adequately simulate non-stationarity and non-linearity in the historical data.

【2】 System Optimization in Synchronous Federated Training: A Survey 标题:同步联合训练中的系统优化研究综述 链接:https://arxiv.org/abs/2109.03999

作者:Zhifeng Jiang,Wei Wang 机构:The Hong Kong University of Science and Technology 备注:11 pages, 3 figures 摘要:以保护隐私的方式进行协作机器学习的空前需求催生了一种新的机器学习范式,称为联邦学习(FL)。在隐私保证足够的前提下,FL系统的实用性主要取决于其训练过程中的time-to-accuracy(达到目标精度所需时间)性能。尽管FL与传统的分布式训练有一些相似之处,但它有四个明显的挑战,使面向更短time-to-accuracy的优化变得更加复杂:信息不足、相互对立因素的耦合、客户端异构性和巨大的配置空间。出于激励相关研究的需要,在本文中,我们调查了FL文献中高度相关的尝试,并按照标准工作流程中的相关训练阶段进行组织:选择、配置和报告。我们还回顾了探索性工作,包括测量研究和基准测试工具,以便为FL开发人员提供友好支持。虽然已有若干关于FL的综述文章,但我们的工作在关注点、分类和启示方面与它们有所不同。 摘要:The unprecedented demand for collaborative machine learning in a privacy-preserving manner gives rise to a novel machine learning paradigm called federated learning (FL). Given a sufficient level of privacy guarantees, the practicality of an FL system mainly depends on its time-to-accuracy performance during the training process. Despite bearing some resemblance with traditional distributed training, FL has four distinct challenges that complicate the optimization towards shorter time-to-accuracy: information deficiency, coupling for contrasting factors, client heterogeneity, and huge configuration space. Motivated by the need for inspiring related research, in this paper we survey highly relevant attempts in the FL literature and organize them by the related training phases in the standard workflow: selection, configuration, and reporting. We also review exploratory work including measurement studies and benchmarking tools to friendly support FL developers. Although a few survey articles on FL already exist, our work differs from them in terms of the focus, classification, and implications.

【3】 Tom: Leveraging trend of the observed gradients for faster convergence 标题:TOM:利用观测到的梯度趋势以实现更快的收敛 链接:https://arxiv.org/abs/2109.03820

作者:Anirudh Maiya,Inumella Sricharan,Anshuman Pandey,Srinivas K. S 机构:Department of Computer Science and Engineering, PES University, Bangalore, India 摘要:深度学习的成功可归因于各种因素,如计算能力的提高、大数据集、深度卷积神经网络、优化器等。特别地,优化器的选择影响泛化能力、收敛速度和训练稳定性。随机梯度下降(SGD)是一种一阶迭代优化器,对所有参数进行统一的梯度更新。这种统一的更新可能并不适用于整个训练阶段。一种初级的解决方案是使用精心调校的学习率调度器,使学习率随迭代次数递减。为了消除对学习率调度器的依赖,AdaGrad、AdaDelta、RMSProp、Adam等自适应梯度优化器采用了按参数缩放的学习率项,该项是梯度本身的函数。我们提出了Tom(Trend over Momentum)优化器,这是Adam的一个新变体,它考虑了神经网络所遍历的损失面上观察到的梯度趋势。在提出的Tom优化器中,引入了一个额外的平滑方程来处理优化过程中观察到的趋势。为趋势引入的平滑参数不需要调整,可以使用默认值。在CIFAR-10、CIFAR-100和CINIC-10等图像分类数据集上的实验结果表明,Tom在精度和收敛速度方面均优于Adagrad、Adadelta、RMSProp和Adam。源代码在以下位置公开提供:https://github.com/AnirudhMaiya/Tom 摘要:The success of deep learning can be attributed to various factors such as increase in computational power, large datasets, deep convolutional neural networks, optimizers etc. Particularly, the choice of optimizer affects the generalization, convergence rate, and training stability. Stochastic Gradient Descent (SGD) is a first order iterative optimizer that updates the gradient uniformly for all parameters. This uniform update may not be suitable across the entire training phase. A rudimentary solution for this is to employ a fine-tuned learning rate scheduler which decreases learning rate as a function of iteration. To eliminate the dependency of learning rate schedulers, adaptive gradient optimizers such as AdaGrad, AdaDelta, RMSProp, Adam employ a parameter-wise scaling term for learning rate which is a function of the gradient itself. We propose Tom (Trend over Momentum) optimizer, which is a novel variant of Adam that takes into account of the trend which is observed for the gradients in the loss landscape traversed by the neural network. In the proposed Tom optimizer, an additional smoothing equation is introduced to address the trend observed during the process of optimization. The smoothing parameter introduced for the trend requires no tuning and can be used with default values. Experimental results for classification datasets such as CIFAR-10, CIFAR-100 and CINIC-10 image datasets show that Tom outperforms Adagrad, Adadelta, RMSProp and Adam in terms of both accuracy and convergence speed. The source code is publicly made available at https://github.com/AnirudhMaiya/Tom
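摘要未给出具体更新公式,下面仅是一个假设性草图:在Adam的一阶、二阶矩之外,额外对相邻梯度之差(可视为"趋势")做指数滑动平均并计入更新方向。真实的Tom更新公式应以原论文及其开源代码为准。

```python
# 假设性草图:在Adam基础上对梯度趋势(相邻梯度之差)做额外平滑
# 注意:真实的Tom更新公式以原论文/源码为准,此处仅为示意
import numpy as np

def tom_like_step(w, grad, state, lr=1e-3, b1=0.9, b2=0.999, b3=0.9, eps=1e-8):
    m, v, trend, prev_g, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad                     # 一阶矩(动量)
    v = b2 * v + (1 - b2) * grad**2                  # 二阶矩
    trend = b3 * trend + (1 - b3) * (grad - prev_g)  # 额外的趋势平滑方程
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    w = w - lr * (m_hat + trend) / (np.sqrt(v_hat) + eps)
    return w, (m, v, trend, grad, t)

w = np.zeros(3)
state = (np.zeros(3), np.zeros(3), np.zeros(3), np.zeros(3), 0)
for _ in range(100):
    g = 2 * (w - 1.0)                                # f(w) = ||w - 1||^2 的梯度
    w, state = tom_like_step(w, g, state)
```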

【4】 Constants of Motion: The Antidote to Chaos in Optimization and Game Dynamics 标题:运动常数:最优化和博弈动力学中的混沌解药 链接:https://arxiv.org/abs/2109.03974

作者:Georgios Piliouras,Xiao Wang 机构:Singapore University of Technology and Design, Shanghai University of Finance and Economics 摘要:最近在在线优化和博弈动力学方面的一些工作已经建立了强有力的负面复杂性结果,包括不稳定和混沌的正式出现,即使在很小的设定中,例如$2\times 2$博弈。这些结果引发了以下问题:哪些方法论工具可以保证此类动力学的规律性,我们又如何将其应用于相关的标准设定,如离散时间一阶优化动力学?我们展示了证明不变函数(即运动常数)的存在是这一方向的一项基本贡献,并在优化和博弈设定中建立了大量这样的正面结果(例如梯度下降、乘性权重更新、交替梯度下降和流形梯度下降)。在技术层面上,对于某些守恒定律,我们提供了明确简洁的封闭形式;而对于其他守恒定律,我们使用动力系统的工具给出非构造性证明。 摘要:Several recent works in online optimization and game dynamics have established strong negative complexity results including the formal emergence of instability and chaos even in small such settings, e.g., $2\times 2$ games. These results motivate the following question: Which methodological tools can guarantee the regularity of such dynamics and how can we apply them in standard settings of interest such as discrete-time first-order optimization dynamics? We show how proving the existence of invariant functions, i.e., constant of motions, is a fundamental contribution in this direction and establish a plethora of such positive results (e.g. gradient descent, multiplicative weights update, alternating gradient descent and manifold gradient descent) both in optimization as well as in game settings. At a technical level, for some conservation laws we provide an explicit and concise closed form, whereas for other ones we present non-constructive proofs using tools from dynamical systems.
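以双线性博弈 f(x,y)=xy 为例可以直观看到运动常数的作用:连续时间梯度下降-上升流满足 d(x²+y²)/dt = 2x(−y)+2y(x) = 0,轨迹在圆上旋转;而同步离散更新使 x²+y² 每步乘以 (1+η²),轨迹外旋发散。下面的数值示意(步长为示意取值)对比同步更新与交替更新,后者的线性映射行列式为1,轨迹保持有界,对应一个守恒的二次型:

```python
# 双线性博弈 f(x, y) = x*y:检查 x^2 + y^2 这一候选运动常数
eta, x, y = 0.1, 1.0, 0.0
for t in range(5):
    x, y = x - eta * y, y + eta * x      # 同步离散GDA:x^2+y^2 每步乘以 1+eta^2
    print(f"sync GDA : x^2+y^2 = {x * x + y * y:.4f}")

x, y = 1.0, 0.0
for t in range(5):
    x = x - eta * y                      # 交替(Gauss-Seidel)更新
    y = y + eta * x                      # 映射行列式为1,轨迹有界,存在守恒二次型
    print(f"alt. GDA : x^2+y^2 = {x * x + y * y:.4f}")
```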

预测|估计(5篇)

【1】 Estimation of Corporate Greenhouse Gas Emissions via Machine Learning 标题:基于机器学习的企业温室气体排放量估算 链接:https://arxiv.org/abs/2109.04318

作者:You Han,Achintya Gopal,Liwen Ouyang,Aaron Key 备注:Accepted for the Tackling Climate Change with Machine Learning Workshop at ICML 2021 摘要:作为履行《巴黎协定》和到2050年实现净零排放的重要一步,欧盟委员会于2021年4月通过了最雄心勃勃的一揽子气候影响措施,以改善资本流向可持续活动。要使这些和其他国际措施取得成功,可靠的数据是关键。了解全球公司碳足迹的能力对于投资者遵守这些措施至关重要。然而,由于只有一小部分公司自愿披露其温室气体(GHG)排放量,投资者几乎不可能将其投资策略与这些措施保持一致。通过在已披露的温室气体排放数据上训练机器学习模型,我们能够估计全球其他未披露排放量的公司的排放量。在本文中,我们表明,我们的模型为投资者提供了企业温室气体排放的准确估计,使他们能够将其投资与监管措施相一致,并实现净零目标。 摘要:As an important step to fulfill the Paris Agreement and achieve net-zero emissions by 2050, the European Commission adopted the most ambitious package of climate impact measures in April 2021 to improve the flow of capital towards sustainable activities. For these and other international measures to be successful, reliable data is key. The ability to see the carbon footprint of companies around the world will be critical for investors to comply with the measures. However, with only a small portion of companies volunteering to disclose their greenhouse gas (GHG) emissions, it is nearly impossible for investors to align their investment strategies with the measures. By training a machine learning model on disclosed GHG emissions, we are able to estimate the emissions of other companies globally who do not disclose their emissions. In this paper, we show that our model provides accurate estimates of corporate GHG emissions to investors such that they are able to align their investments with the regulatory measures and achieve net-zero goals.

【2】 Toward a Perspectivist Turn in Ground Truthing for Predictive Computing 标题:面向预测计算中真值标注的透视主义转向 链接:https://arxiv.org/abs/2109.04270

作者:Valerio Basile,Federico Cabitza,Andrea Campagner,Michael Fell 机构: Universita di Torino, Turin, Italy, Universita di Milano-Bicocca, Milan, Italy 备注:16 pages, Accepted at ItaIS2021 this http URL 摘要:大多数人工智能应用程序都基于有监督机器学习(ML),它最终基于人工标注的数据。注释过程通常以多数票的形式进行,这已被证明是有问题的,最近关于ML模型评估的研究强调了这一点。在这篇文章中,我们描述并倡导一种不同的范式,我们称之为数据透视主义,它从传统的金标准数据集转向采用方法,整合参与ML过程知识表示步骤的人类主体的观点和观点。根据之前激发我们提案的作品,我们描述了我们提案的潜力,不仅适用于更主观的任务(如与人类语言相关的任务),也适用于通常理解为客观的任务(如医疗决策),并介绍了在ML中采用透视主义立场的主要优势,以及可能存在的缺点,以及在实践中实施这一立场的各种方式。最后,我们分享一组建议,并概述一个研究议程,以推进ML中的透视主义立场。 摘要:Most Artificial Intelligence applications are based on supervised machine learning (ML), which ultimately grounds on manually annotated data. The annotation process is often performed in terms of a majority vote and this has been proved to be often problematic, as highlighted by recent studies on the evaluation of ML models. In this article we describe and advocate for a different paradigm, which we call data perspectivism, which moves away from traditional gold standard datasets, towards the adoption of methods that integrate the opinions and perspectives of the human subjects involved in the knowledge representation step of ML processes. Drawing on previous works which inspired our proposal we describe the potential of our proposal for not only the more subjective tasks (e.g. those related to human language) but also to tasks commonly understood as objective (e.g. medical decision making), and present the main advantages of adopting a perspectivist stance in ML, as well as possible disadvantages, and various ways in which such a stance can be implemented in practice. Finally, we share a set of recommendations and outline a research agenda to advance the perspectivist stance in ML.

【3】 Stationary Density Estimation of Itô Diffusions Using Deep Learning 标题:基于深度学习的伊藤扩散平稳密度估计 链接:https://arxiv.org/abs/2109.03992

作者:Yiqi Gu,John Harlim,Senwei Liang,Haizhao Yang 机构:National University of Singapore, Lower Kent Ridge Road, Singapore, Department of Mathematics, Department of Meteorology and Atmospheric Science, Institute for Computational and Data Sciences, The Pennsylvania State University, University Park, PA , USA 摘要:在本文中,我们考虑从近似随机微分方程解的离散时间序列出发,估计遍历伊藤扩散平稳测度密度的问题。为了利用"密度函数可由抛物型福克-普朗克偏微分方程的定态解刻画"这一性质,我们进行如下操作。首先,我们通过求解适当的监督学习任务,使用深度神经网络来逼近SDE的漂移项和扩散项。随后,我们用基于神经网络的最小二乘法求解与估计的漂移和扩散系数相关的稳态福克-普朗克方程。我们在适当的数学假设下证明了所提方案的收敛性,其中考虑了回归漂移系数和扩散系数所引入的泛化误差以及偏微分方程求解器的误差。该理论研究依赖于最近的马尔可夫链扰动理论结果(它表明密度估计误差与漂移项估计误差呈线性关系),以及非参数回归和基于神经网络的偏微分方程回归解的泛化误差结果。该方法的有效性通过二维Student t分布和20维Langevin动力学的数值模拟得到体现。 摘要:In this paper, we consider the density estimation problem associated with the stationary measure of ergodic It\^o diffusions from a discrete-time series that approximate the solutions of the stochastic differential equations. To take an advantage of the characterization of density function through the stationary solution of a parabolic-type Fokker-Planck PDE, we proceed as follows. First, we employ deep neural networks to approximate the drift and diffusion terms of the SDE by solving appropriate supervised learning tasks. Subsequently, we solve a steady-state Fokker-Plank equation associated with the estimated drift and diffusion coefficients with a neural-network-based least-squares method. We establish the convergence of the proposed scheme under appropriate mathematical assumptions, accounting for the generalization errors induced by regressing the drift and diffusion coefficients, and the PDE solvers. This theoretical study relies on a recent perturbation theory of Markov chain result that shows a linear dependence of the density estimation to the error in estimating the drift term, and generalization error results of nonparametric regression and of PDE regression solution obtained with neural-network models. The effectiveness of this method is reflected by numerical simulations of a two-dimensional Student's t distribution and a 20-dimensional Langevin dynamics.
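下面是该两步法在一维OU过程上的最小示意(假设扩散系数σ为已知常数;一维零通量边界下,稳态Fokker-Planck方程等价于概率流 J = (σ²/2)p' − bp = 0,故直接对J做最小二乘;网格、迭代次数与归一化权重均为示意取值):

```python
# 两步法一维示意:(1) 用NN回归漂移 b(x);(2) 最小二乘求解稳态Fokker-Planck方程
import torch
import torch.nn as nn

torch.manual_seed(0)
dt, sigma, N = 1e-2, 1.0, 20000
x = torch.zeros(N)
for i in range(N - 1):   # Euler-Maruyama 模拟OU过程:dX = -X dt + sigma dW
    x[i + 1] = x[i] - x[i] * dt + sigma * (dt ** 0.5) * torch.randn(())

# 第一步:回归漂移 b(x) ≈ E[(X_{t+dt} - X_t)/dt | X_t = x]
drift = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(drift.parameters(), lr=1e-3)
xs, dx = x[:-1].unsqueeze(1), ((x[1:] - x[:-1]) / dt).unsqueeze(1)
for _ in range(500):
    loss = ((drift(xs) - dx) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# 第二步:p(x) = softplus(NN(x)),最小化概率流残差并施加归一化约束
pnet = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1), nn.Softplus())
opt2 = torch.optim.Adam(pnet.parameters(), lr=1e-3)
grid = torch.linspace(-4, 4, 200).unsqueeze(1)
for _ in range(1000):
    g = grid.clone().requires_grad_(True)
    p = pnet(g)
    dp = torch.autograd.grad(p.sum(), g, create_graph=True)[0]
    flux = 0.5 * sigma ** 2 * dp - drift(g).detach() * p   # J = (sigma^2/2) p' - b p
    norm = (p.mean() * 8.0 - 1.0) ** 2                     # 粗略的 ∫p dx ≈ 1 约束
    loss = (flux ** 2).mean() + norm
    opt2.zero_grad(); loss.backward(); opt2.step()
```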

【4】 Leveraging Code Clones and Natural Language Processing for Log Statement Prediction 标题:利用代码克隆和自然语言处理进行日志语句预测 链接:https://arxiv.org/abs/2109.03859

作者:Sina Gholamian 机构:University of Waterloo, Waterloo, Canada 备注:ASE '21: Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering - Doctoral Symposium 摘要:在现代软件开发中,软件开发人员在源代码中嵌入日志语句是一项必不可少的任务,因为日志文件对于跟踪运行时系统问题和对系统管理任务进行故障排除是必需的。先前的研究强调了日志语句在软件系统的操作和调试中的重要性。然而,当前的日志记录过程主要是手动和临时的,因此,日志记录语句的正确位置和内容仍然是一个挑战。为了克服这些挑战,旨在自动化日志放置和日志内容的方法,即“在何处、记录什么和如何记录”是人们非常感兴趣的。因此,我们建议实现本研究的目标,即“利用源代码克隆和自然语言处理(NLP)预测日志语句”,因为这些方法为日志预测提供了额外的上下文和优势。我们追求以下四个研究目标:(RO1)调查源代码克隆是否可用于日志语句位置预测,(RO2)提出基于克隆的日志语句预测方法,(RO3)使用代码克隆和NLP模型预测日志语句的描述,以及(RO4)检查自动预测log语句其他细节的方法,如详细程度和变量。为此,我们对七个开源java项目进行了实验分析,提取它们的方法级代码克隆,研究它们的属性,并利用它们进行日志位置和描述预测。我们的工作证明了日志感知克隆检测对于自动化日志位置和描述预测的有效性,并且优于以前的工作。 摘要:Software developers embed logging statements inside the source code as an imperative duty in modern software development as log files are necessary for tracking down runtime system issues and troubleshooting system management tasks. Prior research has emphasized the importance of logging statements in the operation and debugging of software systems. However, the current logging process is mostly manual and ad hoc, and thus, proper placement and content of logging statements remain as challenges. To overcome these challenges, methods that aim to automate log placement and log content, i.e., 'where, what, and how to log', are of high interest. Thus, we propose to accomplish the goal of this research, that is "to predict the log statements by utilizing source code clones and natural language processing (NLP)", as these approaches provide additional context and advantage for log prediction. We pursue the following four research objectives: (RO1) investigate whether source code clones can be leveraged for log statement location prediction, (RO2) propose a clone-based approach for log statement prediction, (RO3) predict log statement's description with code-clone and NLP models, and (RO4) examine approaches to automatically predict additional details of the log statement, such as its verbosity level and variables. For this purpose, we perform an experimental analysis on seven open-source java projects, extract their method-level code clones, investigate their attributes, and utilize them for log location and description prediction. Our work demonstrates the effectiveness of log-aware clone detection for automated log location and description prediction and outperforms the prior work.

【5】 On the estimation of discrete choice models to capture irrational customer behaviors 标题:关于捕捉非理性顾客行为的离散选择模型的估计 链接:https://arxiv.org/abs/2109.03882

作者:Sanjay Dominik Jena,Andrea Lodi,Claudio Sole 机构:´Ecole des Sciences de la Gestion, Universit´e du Qu´ebec a Montr´eal, Centre interuniversitaire de recherche sur les r´eseaux d’entreprise, la logistique et le transport (CIRRELT) 摘要:到目前为止,随机效用最大化模型是估计消费者选择行为最常用的框架。然而,行为经济学为非理性选择行为提供了强有力的实证证据,如光环效应,这与该框架不相容。因此,属于随机效用最大化家族的模型可能无法准确捕捉此类非理性行为。因此,人们提出了克服这些限制的更一般的选择模型。然而,这种模型的灵活性是以过度拟合风险增加为代价的。因此,估计此类模型仍然是一项挑战。在这项工作中,我们为最近提出的广义随机偏好选择模型提出了一种估计方法,该模型包含随机效用最大化模型族,并且能够捕获光晕效应。具体来说,我们展示了如何使用部分排序的偏好从交易数据中有效地建模理性和非理性客户类型。我们的估计过程基于列生成,通过扩展包含客户行为的树状数据结构,有效地提取相关客户类型。此外,我们提出了一个新的客户类型主导规则,其效果是优先考虑产品之间的低阶交互。大量的实验评估了该方法的预测精度。我们的研究结果表明,在一个来自大型连锁杂货店和药店的真实数据集上进行测试时,考虑非理性偏好可以平均提高预测准确率12.5%。 摘要:The Random Utility Maximization model is by far the most adopted framework to estimate consumer choice behavior. However, behavioral economics has provided strong empirical evidence of irrational choice behavior, such as halo effects, that are incompatible with this framework. Models belonging to the Random Utility Maximization family may therefore not accurately capture such irrational behavior. Hence, more general choice models, overcoming such limitations, have been proposed. However, the flexibility of such models comes at the price of increased risk of overfitting. As such, estimating such models remains a challenge. In this work, we propose an estimation method for the recently proposed Generalized Stochastic Preference choice model, which subsumes the family of Random Utility Maximization models and is capable of capturing halo effects. Specifically, we show how to use partially-ranked preferences to efficiently model rational and irrational customer types from transaction data. Our estimation procedure is based on column generation, where relevant customer types are efficiently extracted by expanding a tree-like data structure containing the customer behaviors. Further, we propose a new dominance rule among customer types whose effect is to prioritize low orders of interactions among products. An extensive set of experiments assesses the predictive accuracy of the proposed approach. Our results show that accounting for irrational preferences can boost predictive accuracy by 12.5% on average, when tested on a real-world dataset from a large chain of grocery and drug stores.

其他神经网络|深度学习|模型|建模(24篇)

【1】 Neural Latents Benchmark '21: Evaluating latent variable models of neural population activity 标题:神经潜伏期基准‘21:评估神经群体活动的潜变量模型 链接:https://arxiv.org/abs/2109.04463

作者:Felix Pei,Joel Ye,David Zoltowski,Anqi Wu,Raeed H. Chowdhury,Hansem Sohn,Joseph E. O'Doherty,Krishna V. Shenoy,Matthew T. Kaufman,Mark Churchland,Mehrdad Jazayeri,Lee E. Miller,Jonathan Pillow,Il Memming Park,Eva L. Dyer,Chethan Pandarinath 机构:Georgia Institute of Technology,Carnegie Mellon University,Emory University, Princeton University,Columbia University,University of Pittsburgh, Massachusetts Institute of Technology,Neuralink Corp.,Stanford University 摘要:神经记录技术的进步为研究神经活动提供了前所未有的机会。潜变量模型(LVM)是一种很有前途的工具,可用于分析不同神经系统和行为的丰富活动,因为LVM不依赖于活动和外部实验变量之间的已知关系。然而,由于缺乏标准化,潜在变量建模的进展目前受到阻碍,导致以特别方式开发和比较方法。为了协调这些建模工作,我们引入了一个用于神经种群活动潜变量建模的基准套件。我们从认知、感觉和运动领域整理了四组神经尖峰活动数据集,以推广适用于这些领域的各种活动的模型。我们将无监督评估确定为跨数据集评估模型的通用框架,并应用几个基线来证明基准的多样性。我们通过EvalAI发布这个基准。http://neurallatents.github.io 摘要:Advances in neural recording present increasing opportunities to study neural activity in unprecedented detail. Latent variable models (LVMs) are promising tools for analyzing this rich activity across diverse neural systems and behaviors, as LVMs do not depend on known relationships between the activity and external experimental variables. However, progress in latent variable modeling is currently impeded by a lack of standardization, resulting in methods being developed and compared in an ad hoc manner. To coordinate these modeling efforts, we introduce a benchmark suite for latent variable modeling of neural population activity. We curate four datasets of neural spiking activity from cognitive, sensory, and motor areas to promote models that apply to the wide variety of activity seen across these areas. We identify unsupervised evaluation as a common framework for evaluating models across datasets, and apply several baselines that demonstrate benchmark diversity. We release this benchmark through EvalAI. http://neurallatents.github.io

【2】 Learning from Uneven Training Data: Unlabeled, Single Label, and Multiple Labels 标题:从不均匀的训练数据中学习:无标签、单标签和多标签 链接:https://arxiv.org/abs/2109.04408

作者:Shujian Zhang,Chengyue Gong,Eunsol Choi 机构:The University of Texas at Austin 备注:EMNLP 2021; Our code is publicly available at this https URL 摘要:训练NLP系统通常假设可以获得每个示例仅带一个人工标签的标注数据。鉴于标注者标注的不完善和语言固有的歧义,我们假设单一标签不足以涵盖语言解释的多样性。我们探索了新的标签标注分配方案,为一小部分训练示例中的每个示例分配多个标签。以标注更少的示例为代价引入这样的多标签示例,可以在自然语言推理任务和实体类型标注(entity typing)任务中获得明显的收益,即使我们只是先使用单标签数据进行训练,然后使用多标签示例进行微调。通过扩展MixUp数据增强框架,我们提出了一种学习算法,可以从不均匀的训练示例(具有零个、一个或多个标签)中学习。该算法有效地结合了来自不均匀训练数据的信号,并在较低的标注预算和跨域设置中带来额外的收益。总之,我们的方法在两个任务中的准确率和标签分布度量上都获得了一致的提升,这表明使用不均匀训练数据进行训练对许多NLP任务是有益的。 摘要:Training NLP systems typically assumes access to annotated data that has a single human label per example. Given imperfect labeling from annotators and inherent ambiguity of language, we hypothesize that single label is not sufficient to learn the spectrum of language interpretation. We explore new label annotation distribution schemes, assigning multiple labels per example for a small subset of training examples. Introducing such multi label examples at the cost of annotating fewer examples brings clear gains on natural language inference task and entity typing task, even when we simply first train with a single label data and then fine tune with multi label examples. Extending a MixUp data augmentation framework, we propose a learning algorithm that can learn from uneven training examples (with zero, one, or multiple labels). This algorithm efficiently combines signals from uneven training data and brings additional gains in low annotation budget and cross domain settings. Together, our method achieves consistent gains in both accuracy and label distribution metrics in two tasks, suggesting training with uneven training data can be beneficial for many NLP tasks.
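下面是该学习算法核心思想的一个示意性草图:先把0个、1个或多个标签统一表示为标签分布(无标签样本可用当前模型的预测近似),再按MixUp对输入与标签分布做同一凸组合,并用对分布的交叉熵训练;插值系数的Beta分布等细节为示意性假设,非原文实现:

```python
# 示意:将不均匀标注(0/1/多个标签)统一为标签分布后做MixUp
import torch
import torch.nn.functional as F

def to_label_dist(labels, num_classes, model_probs=None):
    # 多个标签 -> 经验分布;单标签 -> one-hot;无标签 -> 用模型预测近似
    if len(labels) == 0:
        return model_probs
    dist = torch.zeros(num_classes)
    for y in labels:
        dist[y] += 1.0
    return dist / dist.sum()

def mixup(x1, p1, x2, p2, alpha=0.4):
    lam = torch.distributions.Beta(alpha, alpha).sample()
    # 输入与标签分布使用同一插值系数
    return lam * x1 + (1 - lam) * x2, lam * p1 + (1 - lam) * p2

def soft_cross_entropy(logits, target_dist):
    # 对标签分布的交叉熵,兼容one-hot与多标签经验分布
    return -(target_dist * F.log_softmax(logits, dim=-1)).sum(-1).mean()
```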

【3】 Dynamic Modeling of Hand-Object Interactions via Tactile Sensing 标题:基于触觉的手-物交互动态建模 链接:https://arxiv.org/abs/2109.04378

作者:Qiang Zhang,Yunzhu Li,Yiyue Luo,Wan Shou,Michael Foshey,Junchi Yan,Joshua B. Tenenbaum,Wojciech Matusik,Antonio Torralba 机构:Juggling, Tactile glove, Stick balancing, Tactile response, Time 备注:IROS 2021. First two authors contributed equally. Project page: this http URL 摘要:触觉感知对于人类执行日常任务至关重要。虽然在从视觉分析物体抓取方面已经取得了重大进展,但我们如何利用触觉感知来推理和建模手-物体相互作用的动力学尚不清楚。在这项工作中,我们使用一个高分辨率的触觉手套在一组不同的物体上执行四种不同的交互活动。我们在跨模式学习框架上构建模型,并使用视觉处理管道生成标签,以监督触觉模型,然后在测试期间可以单独使用该模型。触觉模型旨在通过预测模型和对比学习模块相结合,纯粹从触摸数据预测手和物体的三维位置。该框架可以从触觉数据推断交互模式,幻觉环境的变化,估计预测的不确定性,并推广到看不见的对象。我们还提供了关于不同系统设计的详细消融研究以及预测轨迹的可视化。这项工作在手-物体交互的动力学建模方面迈出了一步,从稠密的触觉感知开始,这为机器人的活动学习、人机交互和模仿学习的未来应用打开了大门。 摘要:Tactile sensing is critical for humans to perform everyday tasks. While significant progress has been made in analyzing object grasping from vision, it remains unclear how we can utilize tactile sensing to reason about and model the dynamics of hand-object interactions. In this work, we employ a high-resolution tactile glove to perform four different interactive activities on a diversified set of objects. We build our model on a cross-modal learning framework and generate the labels using a visual processing pipeline to supervise the tactile model, which can then be used on its own during the test time. The tactile model aims to predict the 3d locations of both the hand and the object purely from the touch data by combining a predictive model and a contrastive learning module. This framework can reason about the interaction patterns from the tactile data, hallucinate the changes in the environment, estimate the uncertainty of the prediction, and generalize to unseen objects. We also provide detailed ablation studies regarding different system designs as well as visualizations of the predicted trajectories. This work takes a step on dynamics modeling in hand-object interactions from dense tactile sensing, which opens the door for future applications in activity learning, human-computer interactions, and imitation learning for robotics.

【4】 Cross DQN: Cross Deep Q Network for Ads Allocation in Feed 标题:Cross DQN:用于Feed中广告分配的Cross Deep Q网络 链接:https://arxiv.org/abs/2109.04353

作者:Guogang Liao,Ze Wang,Xiaoxu Wu,Xiaowen Shi,Chuheng Zhang,Yongkang Wang,Xingxing Wang,Dong Wang 机构: Meituan Group, Beijing, P.R.China, IIIS, Tsinghua University, liaoguogang, wangze, wuxiaoxu, shixiaowen, zhangchuheng, wangyongkang 摘要:电子商务平台通常在提要中显示广告和有机物品的混合列表。一个关键问题是在提要中分配有限的时隙,以最大化总体收入并改善用户体验,这需要一个良好的用户偏好模型。安排信号不是模拟单个项目对用户行为的影响,而是模拟项目安排的影响,并可能导致更好的分配策略。然而,以前的大多数策略都无法对这种信号进行建模,因此导致性能不理想。为此,我们提出了交叉深度Q网络(Cross-DQN),通过交叉不同项的嵌入并处理feed中的交叉序列来提取排列信号。与离线实验中最先进的基线相比,我们的模型带来了更高的收入和更好的用户体验。此外,我们的模型显示了在线a/B测试的显著改进,并已完全部署在美团feed上,为3亿多客户提供服务。 摘要:E-commerce platforms usually display a mixed list of ads and organic items in feed. One key problem is to allocate the limited slots in the feed to maximize the overall revenue as well as improve user experience, which requires a good model for user preference. Instead of modeling the influence of individual items on user behaviors, the arrangement signal models the influence of the arrangement of items and may lead to a better allocation strategy. However, most of previous strategies fail to model such a signal and therefore result in suboptimal performance. To this end, we propose Cross Deep Q Network (Cross DQN) to extract the arrangement signal by crossing the embeddings of different items and processing the crossed sequence in the feed. Our model results in higher revenue and better user experience than state-of-the-art baselines in offline experiments. Moreover, our model demonstrates a significant improvement in the online A/B test and has been fully deployed on Meituan feed to serve more than 300 millions of customers.

【5】 NeuralFMU: Towards Structural Integration of FMUs into Neural Networks 标题:NeuralFMU:将FMU结构集成到神经网络中 链接:https://arxiv.org/abs/2109.04351

作者:Tobias Thummerer,Josef Kircher,Lars Mikelsons 摘要:本文涵盖两个主要主题:首先,介绍一个名为FMI.jl的新开放源代码库,通过提供加载、参数化和模拟FMU的可能性,将FMI集成到Julia编程环境中。此外,还引入了该库的一个扩展名为FMIFlux.jl,该扩展允许将FMU集成到神经网络拓扑中,以获得NeuralFMU。这种行业典型的黑箱模型和数据驱动的机器学习模型的结构组合,在一个单一的开发环境中结合了这两种建模方法的不同优势。这允许使用先进的数据驱动建模技术对物理效果进行建模,这些物理效果很难根据第一原理进行建模。 摘要:This paper covers two major subjects: First, the presentation of a new open-source library called FMI.jl for integrating FMI into the Julia programming environment by providing the possibility to load, parameterize and simulate FMUs. Further, an extension to this library called FMIFlux.jl is introduced, that allows the integration of FMUs into a neural network topology to obtain a NeuralFMU. This structural combination of an industry typical black-box model and a data-driven machine learning model combines the different advantages of both modeling approaches in one single development environment. This allows for the usage of advanced data driven modeling techniques for physical effects that are difficult to model based on first principles.

【6】 Learning Opinion Summarizers by Selecting Informative Reviews 标题:通过选择信息性评论学习意见摘要 链接:https://arxiv.org/abs/2109.04325

作者:Arthur Bražinskas,Mirella Lapata,Ivan Titov 机构: ILCC, University of Edinburgh, ILLC, University of Amsterdam 备注:EMNLP 2021 摘要:传统的观点摘要方法采用无监督、弱监督和小样本(few-shot)学习技术。在这项工作中,我们收集了一个大型数据集,包含超过31000种产品的摘要及与之配对的用户评论,从而实现了监督训练。然而,每个产品的评论数量很大(平均320条),这使得摘要——尤其是训练摘要模型——变得不切实际。此外,许多评论的内容并没有反映在人工撰写的摘要中,因此,在随机评论子集上训练的摘要模型会产生幻觉(hallucinate)。为了应对这两个挑战,我们将任务表述为联合学习:既选择信息量大的评论子集,又对这些子集中表达的观点进行摘要。评论子集的选择被视为一个潜变量,由一个小而简单的选择器预测。然后将子集输入一个更强大的摘要器。对于联合训练,我们使用摊销变分推理和策略梯度方法。我们的实验证明了选择信息性评论的重要性,它能提高摘要质量并减少幻觉。 摘要:Opinion summarization has been traditionally approached with unsupervised, weakly-supervised and few-shot learning techniques. In this work, we collect a large dataset of summaries paired with user reviews for over 31,000 products, enabling supervised training. However, the number of reviews per product is large (320 on average), making summarization - and especially training a summarizer - impractical. Moreover, the content of many reviews is not reflected in the human-written summaries, and, thus, the summarizer trained on random review subsets hallucinates. In order to deal with both of these challenges, we formulate the task as jointly learning to select informative subsets of reviews and summarizing the opinions expressed in these subsets. The choice of the review subset is treated as a latent variable, predicted by a small and simple selector. The subset is then fed into a more powerful summarizer. For joint training, we use amortized variational inference and policy gradient methods. Our experiments demonstrate the importance of selecting informative reviews resulting in improved quality of summaries and reduced hallucinations.
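"子集选择作为潜变量+策略梯度"的思想可以用如下最小草图示意:小型打分器对每条评论独立给出入选概率,采样出子集后交给摘要器,以摘要质量作为奖励,用REINFORCE(带滑动平均基线)更新打分器。其中奖励函数与评论表示均为占位假设,原文还结合了摊销变分推理:

```python
# 示意:用REINFORCE训练评论子集选择器(奖励函数为占位实现)
import torch
import torch.nn as nn

class Selector(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, reviews):                  # reviews: (n, dim) 的评论表示
        probs = torch.sigmoid(self.score(reviews)).squeeze(-1)
        mask = torch.bernoulli(probs)            # 采样子集(潜变量)
        logp = (mask * probs.clamp(1e-6).log()
                + (1 - mask) * (1 - probs).clamp(1e-6).log()).sum()
        return mask, logp

def reward(mask):                                # 占位奖励:实际应为摘要质量评分
    return -float((mask.sum() - 4.0) ** 2) / 16.0

sel = Selector(dim=16)
opt = torch.optim.Adam(sel.parameters(), lr=1e-2)
baseline = 0.0
for step in range(200):
    reviews = torch.randn(32, 16)
    mask, logp = sel(reviews)
    r = reward(mask)
    loss = -(r - baseline) * logp                # REINFORCE,基线降低方差
    baseline = 0.9 * baseline + 0.1 * r
    opt.zero_grad(); loss.backward(); opt.step()
```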

【7】 DAE-PINN: A Physics-Informed Neural Network Model for Simulating Differential-Algebraic Equations with Application to Power Networks 标题:DAE-Pinn:模拟微分代数方程的物理信息神经网络模型及其在电网中的应用 链接:https://arxiv.org/abs/2109.04304

作者:Christian Moya,Guang Lin 机构: Lin are with the Department of Mathematics, Purdue University 摘要:基于深度学习的代理模型正成为学习和模拟动态系统的一种很有前途的方法。然而,深度学习方法在学习刚性(stiff)动力学时面临很大困难。在本文中,我们开发了DAE-PINN,这是第一个有效的深度学习框架,用于学习和模拟非线性微分代数方程(DAE)的解轨迹;DAE呈现出一种无限刚度的形式,并且能够描述诸如电力网络的动力学。我们的DAE-PINN的有效性源自隐式Runge-Kutta时间步进方案(专为求解DAE而设计)与物理信息神经网络(PINN,即我们训练其满足底层问题动力学的深度神经网络)之间的协同效应。此外,我们的框架(i)使用基于惩罚的方法使神经网络将DAE作为(近似)硬约束来满足,以及(ii)支持在长时间范围内模拟DAE。通过学习和模拟三母线电力网络的解轨迹,我们展示了DAE-PINN的有效性和准确性。 摘要:Deep learning-based surrogate modeling is becoming a promising approach for learning and simulating dynamical systems. Deep-learning methods, however, find very challenging learning stiff dynamics. In this paper, we develop DAE-PINN, the first effective deep-learning framework for learning and simulating the solution trajectories of nonlinear differential-algebraic equations (DAE), which present a form of infinite stiffness and describe, for example, the dynamics of power networks. Our DAE-PINN bases its effectiveness on the synergy between implicit Runge-Kutta time-stepping schemes (designed specifically for solving DAEs) and physics-informed neural networks (PINN) (deep neural networks that we train to satisfy the dynamics of the underlying problem). Furthermore, our framework (i) enforces the neural network to satisfy the DAEs as (approximate) hard constraints using a penalty-based method and (ii) enables simulating DAEs for long-time horizons. We showcase the effectiveness and accuracy of DAE-PINN by learning and simulating the solution trajectories of a three-bus power network.
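下面用一个玩具级半显式DAE(x' = y − x,0 = y − x²,x(0)=1)示意"以惩罚项将DAE作为近似硬约束"的训练目标;网络结构与惩罚权重为示意性假设,且未实现文中专为DAE设计的隐式Runge-Kutta时间步进:

```python
# 玩具DAE示意:x' = y - x, 0 = y - x^2, x(0) = 1;代数约束以惩罚项软施加
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 2))  # t -> (x, y)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(2000):
    t = torch.rand(256, 1, requires_grad=True)
    xy = net(t)
    x, y = xy[:, :1], xy[:, 1:]
    dx = torch.autograd.grad(x.sum(), t, create_graph=True)[0]
    ode_res = dx - (y - x)                     # 微分部分的残差
    alg_res = y - x ** 2                       # 代数约束的残差(惩罚项)
    x0 = net(torch.zeros(1, 1))[:, :1]
    loss = (ode_res ** 2).mean() + 10.0 * (alg_res ** 2).mean() \
           + ((x0 - 1.0) ** 2).mean()          # 残差 + 约束惩罚 + 初值条件
    opt.zero_grad(); loss.backward(); opt.step()
```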

【8】 NTS-NOTEARS: Learning Nonparametric Temporal DAGs With Time-Series Data and Prior Knowledge 标题:NTS-NOTEARS:利用时序数据和先验知识学习非参数时态DAG 链接:https://arxiv.org/abs/2109.04286

作者:Xiangyu Sun,Guiliang Liu,Pascal Poupart,Oliver Schulte 机构: Simon Fraser University, University of Waterloo 备注:Preprint, under review 摘要:我们针对时间序列数据提出了一种基于分数的DAG结构学习方法,该方法可以捕获变量之间的线性、非线性、滞后和瞬时关系,同时确保整个图的非循环性。该方法扩展了非参数记事本,这是一种最近用于学习非参数瞬时DAG的连续优化方法。该方法比使用非线性条件独立性检验的基于约束的方法速度更快。我们还提倡使用优化约束将先验知识纳入结构学习过程。大量模拟数据的实验表明,该方法比最近的几种比较方法发现了更好的DAG结构。我们还对从NHL冰球比赛中获得的包含连续变量和离散变量的复杂真实数据评估了所提出的方法。该守则可于https://github.com/xiangyu-sun-789/NTS-NOTEARS/. 摘要:We propose a score-based DAG structure learning method for time-series data that captures linear, nonlinear, lagged and instantaneous relations among variables while ensuring acyclicity throughout the entire graph. The proposed method extends nonparametric NOTEARS, a recent continuous optimization approach for learning nonparametric instantaneous DAGs. The proposed method is faster than constraint-based methods using nonlinear conditional independence tests. We also promote the use of optimization constraints to incorporate prior knowledge into the structure learning process. A broad set of experiments with simulated data demonstrates that the proposed method discovers better DAG structures than several recent comparison methods. We also evaluate the proposed method on complex real-world data acquired from NHL ice hockey games containing a mixture of continuous and discrete variables. The code is available at https://github.com/xiangyu-sun-789/NTS-NOTEARS/.
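NOTEARS一类连续优化方法的关键装置,是把"图无环"写成可微等式约束 h(W) = tr(e^{W∘W}) − d = 0(W为加权邻接矩阵,d为节点数),从而让结构学习并入梯度优化;NTS-NOTEARS即在此基础上扩展到时序设定。该约束本身可直接实现并做数值检查:

```python
# NOTEARS无环约束 h(W) = tr(exp(W ∘ W)) - d;h(W) = 0 当且仅当图无环
import torch

def acyclicity(W):
    d = W.shape[0]
    return torch.trace(torch.matrix_exp(W * W)) - d

dag = torch.tensor([[0., 1.], [0., 0.]])   # 无环:h = 0
cyc = torch.tensor([[0., 1.], [1., 0.]])   # 有环:h > 0
print(acyclicity(dag).item(), acyclicity(cyc).item())
# 训练时将 h(W) 作为可微惩罚/拉格朗日项加入损失,即可在整图上强制无环
```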

【9】 ECQ^{\text{x}}: Explainability-Driven Quantization for Low-Bit and Sparse DNNs 标题:ECQ^{\text{x}}:用于低位和稀疏DNN的可解释性驱动的量化 链接:https://arxiv.org/abs/2109.04236

作者:Daniel Becking,Maximilian Dreyer,Wojciech Samek,Karsten Müller,Sebastian Lapuschkin 机构:Karsten Müller,†, Department of Artificial Intelligence, Fraunhofer Heinrich Hertz Institute, Berlin, Germany, BIFOLD – Berlin Institute for the Foundations of Learning and Data, Berlin, Germany 备注:21 pages, 10 figures, 1 table 摘要:深度神经网络(DNN)在各种应用中的显著成功伴随着网络参数和算术运算的显著增加。这种内存和计算需求的增加使得资源受限的硬件平台(如移动设备)无法进行深度学习。最近的工作旨在减少这些开销,同时尽可能保持模型性能,包括参数缩减技术、参数量化和无损压缩技术。在本章中,我们开发并描述了一种新的DNN量化范式:我们的方法利用了可解释AI(XAI)的概念和信息论的概念:分配函数不再仅根据权重值到量化簇的距离进行分配,还额外考虑由逐层相关性传播(LRP)得到的权重相关性以及聚类的信息含量(熵优化)。最终目标是在信息含量最高的量化簇中保留最相关的权重。实验结果表明,这种新的熵约束且经XAI调整的量化(ECQ$^{\text{x}}$)方法能生成超低精度(2-5位)且同时稀疏的神经网络,并保持甚至提升模型性能。由于参数精度降低且零元素众多,所得网络在文件大小上具有很高的可压缩性,与全精度未量化的DNN模型相比最高可达$103\times$。我们的方法在不同类型的模型和数据集(包括Google语音命令和CIFAR-10)上进行了评估,并与以前的工作进行了比较。 摘要:The remarkable success of deep neural networks (DNNs) in various applications is accompanied by a significant increase in network parameters and arithmetic operations. Such increases in memory and computational demands make deep learning prohibitive for resource-constrained hardware platforms such as mobile devices. Recent efforts aim to reduce these overheads, while preserving model performance as much as possible, and include parameter reduction techniques, parameter quantization, and lossless compression techniques. In this chapter, we develop and describe a novel quantization paradigm for DNNs: Our method leverages concepts of explainable AI (XAI) and concepts of information theory: Instead of assigning weight values based on their distances to the quantization clusters, the assignment function additionally considers weight relevances obtained from Layer-wise Relevance Propagation (LRP) and the information content of the clusters (entropy optimization). The ultimate goal is to preserve the most relevant weights in quantization clusters of highest information content. Experimental results show that this novel Entropy-Constrained and XAI-adjusted Quantization (ECQ$^{\text{x}}$) method generates ultra low-precision (2-5 bit) and simultaneously sparse neural networks while maintaining or even improving model performance. Due to reduced parameter precision and high number of zero-elements, the rendered networks are highly compressible in terms of file size, up to $103\times$ compared to the full-precision unquantized DNN model. Our approach was evaluated on different types of models and datasets (including Google Speech Commands and CIFAR-10) and compared with previous work.
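下面以示意方式勾勒这种"相关性调整"的分配函数:聚类分配不再只看权重到各聚类中心的距离,而是叠加LRP相关性与聚类信息含量的修正项。打分形式与系数均为假设性设定,具体公式以原文为准:

```python
# 假设性示意:由相关性与信息含量调整的量化聚类分配(非原文公式)
import numpy as np

def assign(weights, relevances, centers, lam=0.5, mu=0.1):
    dist = (weights[:, None] - centers[None, :]) ** 2        # 距离项
    p = np.bincount(dist.argmin(1), minlength=len(centers)) / len(weights)
    info = -np.log(p + 1e-12)                                # 聚类信息含量 -log p_k
    # 相关性高的权重更倾向于信息含量高的聚类(lam、mu为示意系数)
    score = dist - lam * relevances[:, None] * info[None, :] - mu * info[None, :]
    return score.argmin(1)

w = np.random.randn(1000)
r = np.abs(np.random.randn(1000))        # 假设的LRP相关性分值
centers = np.array([-0.5, 0.0, 0.5])
print(np.bincount(assign(w, r, centers), minlength=3))
```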

【10】 QUINT: Node embedding using network hashing 标题:Quint:使用网络散列的节点嵌入 链接:https://arxiv.org/abs/2109.04206

作者:Debajyoti Bera,Rameshwar Pratap,Bhisham Dev Verma,Biswadeep Sen,Tanmoy Chakraborty 机构: Verma are with Indian Institute of Technology, Sen is currently a research associate at department of Computer Science, National University of Singapore, This work was done when he was affiliated with Chennai Mathematical Institute 备注:Accepted in IEEE TKDE 摘要:基于网络嵌入的表征学习因其解决下游任务的有效性而受到广泛关注。流行的嵌入方法(如deepwalk、node2vec、LINE)基于神经架构,因此无法在时间和空间开销上扩展到大型网络。最近,我们提出了BinSketch,一种将二元向量压缩为二元向量的草图(sketching)技术。在本文中,我们展示了如何扩展BinSketch并将其用于网络哈希。我们的方案名为QUINT,它构建在BinSketch之上,使用简单的逐位(bit-wise)运算将稀疏网络的节点嵌入到低维空间中。QUINT是同类方法中的首个,它在速度和空间占用方面带来巨大收益,而几乎不损失下游任务的准确性。我们进行了大量实验,将QUINT与七种最先进的网络嵌入方法在两个下游任务——链路预测和节点分类上进行比较。我们观察到QUINT在加速(高达7000x)和节省空间(高达800x)方面获得了巨大的性能提升,这得益于其获取节点嵌入的逐位运算特性。此外,在所有数据集的两项任务中,QUINT在各基线中始终表现最佳。我们的经验观察得到了严格理论分析的支持,证明了QUINT的有效性。特别地,我们证明了QUINT保留了足够的结构信息,可进一步用于以高置信度逼近网络的许多拓扑性质。 摘要:Representation learning using network embedding has received tremendous attention due to its efficacy to solve downstream tasks. Popular embedding methods (such as deepwalk, node2vec, LINE) are based on a neural architecture, thus unable to scale on large networks both in terms of time and space usage. Recently, we proposed BinSketch, a sketching technique for compressing binary vectors to binary vectors. In this paper, we show how to extend BinSketch and use it for network hashing. Our proposal named QUINT is built upon BinSketch, and it embeds nodes of a sparse network onto a low-dimensional space using simple bi-wise operations. QUINT is the first of its kind that provides tremendous gain in terms of speed and space usage without compromising much on the accuracy of the downstream tasks. Extensive experiments are conducted to compare QUINT with seven state-of-the-art network embedding methods for two end tasks - link prediction and node classification. We observe huge performance gain for QUINT in terms of speedup (up to 7000x) and space saving (up to 800x) due to its bit-wise nature to obtain node embedding. Moreover, QUINT is a consistent top-performer for both the tasks among the baselines across all the datasets. Our empirical observations are backed by rigorous theoretical analysis to justify the effectiveness of QUINT. In particular, we prove that QUINT retains enough structural information which can be used further to approximate many topological properties of networks with high confidence.

【11】 AutoSmart: An Efficient and Automatic Machine Learning framework for Temporal Relational Data 标题:AutoSmart:一种高效自动的时态关系数据机器学习框架 链接:https://arxiv.org/abs/2109.04115

作者:Zhipeng Luo,Zhixing He,Jin Wang,Manqing Dong,Jianqiang Huang,Mingjian Chen,Bohang Zheng 机构:DeepBlue Technology, Beijing, China, Peking University, Beijing, China 备注:Accepted in the ADS track at the SIGKDD 2021 conference 摘要:时态关系数据可能是工业机器学习应用程序中最常用的数据类型,需要劳动密集型的特征工程和数据分析来提供精确的模型预测。需要一个自动机器学习框架来简化手动微调模型的工作,以便专家能够更多地关注真正需要人类参与的其他问题,如问题定义、部署和业务服务。然而,构建时态关系数据的自动解决方案面临三个主要挑战:1)如何有效地自动挖掘来自多个表及其相互关系的有用信息?2)如何自我调节,将时间和内存消耗控制在一定的预算内?3)如何为广泛的任务提供通用解决方案?在这项工作中,我们提出了以端到端自动方式成功解决上述问题的方案。提议的框架AutoSmart是AutoML赛道2019年KDD杯的获奖解决方案,该赛道是迄今为止最大的AutoML竞赛之一(860支队伍,约4955份参赛作品)。该框架包括自动数据处理、表合并、特征工程和模型调优,以及一个时间和内存控制器,用于高效、自动地构建模型。该框架在不同领域的多个数据集上的性能明显优于基准解决方案。 摘要:Temporal relational data, perhaps the most commonly used data type in industrial machine learning applications, needs labor-intensive feature engineering and data analyzing for giving precise model predictions. An automatic machine learning framework is needed to ease the manual efforts in fine-tuning the models so that the experts can focus more on other problems that really need humans' engagement such as problem definition, deployment, and business services. However, there are three main challenges for building automatic solutions for temporal relational data: 1) how to effectively and automatically mining useful information from the multiple tables and the relations from them? 2) how to be self-adjustable to control the time and memory consumption within a certain budget? and 3) how to give generic solutions to a wide range of tasks? In this work, we propose our solution that successfully addresses the above issues in an end-to-end automatic way. The proposed framework, AutoSmart, is the winning solution to the KDD Cup 2019 of the AutoML Track, which is one of the largest AutoML competition to date (860 teams with around 4,955 submissions). The framework includes automatic data processing, table merging, feature engineering, and model tuning, with a time & memory controller for efficiently and automatically formulating the models. The proposed framework outperforms the baseline solution significantly on several datasets in various domains.

【12】 Fixing exposure bias with imitation learning needs powerful oracles 标题:用模仿学习修复暴露偏差需要强大的oracle 链接:https://arxiv.org/abs/2109.04114

作者:Luca Hormann,Artem Sokolov 机构:Heidelberg University∗, Google Research• 摘要:我们使用带纠错oracle的模仿学习(IL)来解决NMT暴露偏差问题,并评估了一个基于SMT词格(lattice)的oracle:尽管它在无约束的oracle翻译任务中表现出色,但结果证明它剪枝过度且过于特异,无法胜任IL的oracle。 摘要:We apply imitation learning (IL) to tackle the NMT exposure bias problem with error-correcting oracles, and evaluate an SMT lattice-based oracle which, despite its excellent performance in an unconstrained oracle translation task, turned out to be too pruned and idiosyncratic to serve as the oracle for IL.

【13】 Table-based Fact Verification with Salience-aware Learning 标题:基于显著感知学习的基于表格的事实验证 链接:https://arxiv.org/abs/2109.04053

作者:Fei Wang,Kexuan Sun,Jay Pujara,Pedro Szekely,Muhao Chen 机构:Department of Computer Science & Information Sciences Institute, University of Southern California 备注:EMNLP 2021 (Findings) 摘要:表格提供了可用于验证文本语句的有价值的知识。虽然许多工作都考虑了基于表的事实验证,但表格数据与文本语句中词元(token)之间的直接对齐很少可用。此外,训练泛化能力强的事实验证模型需要大量标注训练数据。在本文中,我们提出了一个新的系统来解决这些问题。受反事实因果关系的启发,我们的系统使用基于探测的显著性估计来识别语句中词元级的显著性。显著性估计可从两个角度增强事实验证的学习。一方面,我们的系统进行掩蔽显著词元预测,以增强模型在表格和语句之间的对齐与推理能力。另一方面,我们的系统应用显著性感知的数据增强,通过替换非显著词元来生成更加多样化的训练实例集合。在TabFact上的实验结果表明,所提出的显著性感知学习技术带来有效改进,在该基准上取得了新的SOTA性能。我们的代码公开于 https://github.com/luka-group/Salience-aware-Learning 。 摘要:Tables provide valuable knowledge that can be used to verify textual statements. While a number of works have considered table-based fact verification, direct alignments of tabular data with tokens in textual statements are rarely available. Moreover, training a generalized fact verification model requires abundant labeled training data. In this paper, we propose a novel system to address these problems. Inspired by counterfactual causality, our system identifies token-level salience in the statement with probing-based salience estimation. Salience estimation allows enhanced learning of fact verification from two perspectives. From one perspective, our system conducts masked salient token prediction to enhance the model for alignment and reasoning between the table and the statement. From the other perspective, our system applies salience-aware data augmentation to generate a more diverse set of training instances by replacing non-salient terms. Experimental results on TabFact show the effective improvement by the proposed salience-aware learning techniques, leading to the new SOTA performance on the benchmark. Our code is publicly available at https://github.com/luka-group/Salience-aware-Learning .
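文中"基于探测的显著性估计"受反事实因果启发,其通用形式是:逐个遮蔽语句中的词元,观察验证概率变化多大。下面是这一思想的示意草图,其中 verify_prob 是占位的表-语句验证模型接口(属假设):

```python
# 示意:反事实式词元显著性,即遮蔽词元后验证概率的变化量
def token_salience(tokens, table, verify_prob, mask_token="[MASK]"):
    base = verify_prob(tokens, table)       # verify_prob 为占位的验证模型接口
    saliences = []
    for i in range(len(tokens)):
        masked = tokens[:i] + [mask_token] + tokens[i + 1:]
        saliences.append(abs(base - verify_prob(masked, table)))
    return saliences                        # 值越大,该词元对验证结论越关键

# 显著词元可用于掩蔽预测任务;非显著词元可被替换以生成增强样本
```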

【14】 Versions of Gradient Temporal Difference Learning 标题:梯度时差学习的不同版本 链接:https://arxiv.org/abs/2109.04033

作者:Donghwan Lee,Han-Dong Lim,Jihoon Park,Okyong Choi 摘要:Sutton、Szepesvári和Maei提出了第一类与线性函数逼近和离策略(off-policy)训练兼容的梯度时间差分(GTD)学习算法。本文的目标是(a)通过广泛的比较分析提出GTD的一些变体,(b)为GTD建立新的理论分析框架。这些变体基于GTD的凸-凹鞍点解释,该解释有效地将所有GTD统一到一个框架中,并基于原始-对偶梯度动力学的最新结果提供简单的稳定性分析。最后,对这些方法进行了数值比较分析。 摘要:Sutton, Szepesvári and Maei introduced the first gradient temporal-difference (GTD) learning algorithms compatible with both linear function approximation and off-policy training. The goal of this paper is (a) to propose some variants of GTDs with extensive comparative analysis and (b) to establish new theoretical analysis frameworks for the GTDs. These variants are based on convex-concave saddle-point interpretations of GTDs, which effectively unify all the GTDs into a single framework, and provide simple stability analysis based on recent results on primal-dual gradient dynamics. Finally, numerical comparative analysis is given to evaluate these approaches.
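作为背景,线性函数逼近下标准GTD2的单步更新如下(θ为价值参数,w为辅助参数,φ、φ'为当前/后继状态特征,ρ为重要性采样比;文中的变体正是围绕其鞍点形式展开):

```python
# 线性函数逼近下的标准GTD2单步更新
import numpy as np

def gtd2_step(theta, w, phi, phi_next, r, rho, alpha=0.01, beta=0.1, gamma=0.99):
    delta = r + gamma * phi_next @ theta - phi @ theta      # TD误差
    w = w + beta * rho * (delta - phi @ w) * phi            # 辅助参数:逼近期望TD误差
    theta = theta + alpha * rho * (phi - gamma * phi_next) * (phi @ w)
    return theta, w
```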

【15】 AdjointNet: Constraining machine learning models with physics-based codes 标题:AdjointNet:用基于物理的代码约束机器学习模型 链接:https://arxiv.org/abs/2109.03956

作者:Satish Karra,Bulbul Ahmmed,Maruti K. Mudunuru 机构:Computational Earth Science Group (EES-,), Earth and Environmental Sciences Division, Los Alamos National, Laboratory, Los Alamos, NM , Watershed & Ecosystem Science, Pacific Northwest National Laboratory, Richland, WA 摘要:基于物理信息的机器学习最近在从模拟和观测数据中学习物理参数和特征方面颇具吸引力。然而,大多数现有方法并不能确保物理规律(如质量、动量、能量守恒等平衡定律)受到约束。最近的一些工作(例如物理信息神经网络)通过包含基于偏微分方程(PDE)的损失函数来软性地施加物理约束,但需要使用自动微分对PDE重新离散化。在观测数据上训练这些神经网络表明,人们可以一次性求解正问题和逆问题,同时得到PDE中的状态变量和参数。对于使用基于物理的代码的领域科学家来说,这种PDE的重新离散化不一定是一个有吸引力的选择;几十年来,基于物理的代码一直采用复杂的离散化技术来求解复杂的过程模型和高级状态方程。本文提出了一个物理约束的机器学习框架AdjointNet,允许领域科学家将其物理代码嵌入到神经网络训练工作流中。这种嵌入确保物理在整个区域内处处受到约束。此外,对偏微分方程数值解至关重要的数学性质(如一致性、稳定性和收敛性)仍然得到满足。我们证明了所提出的AdjointNet框架可以用于参数估计(并可扩展到不确定性量化)以及使用主动学习的实验设计。该框架的适用性通过四个流动算例得到展示。结果表明,基于AdjointNet的反演能够以合理的精度估计过程模型参数。这些例子证明了在不改变源代码的情况下使用现有软件进行精确可靠的模型参数反演的可行性。 摘要:Physics-informed Machine Learning has recently become attractive for learning physical parameters and features from simulation and observation data. However, most existing methods do not ensure that the physics, such as balance laws (e.g., mass, momentum, energy conservation), are constrained. Some recent works (e.g., physics-informed neural networks) softly enforce physics constraints by including partial differential equation (PDE)-based loss functions but need re-discretization of the PDEs using auto-differentiation. Training these neural nets on observational data showed that one could solve forward and inverse problems in one shot. They evaluate the state variables and the parameters in a PDE. This re-discretization of PDEs is not necessarily an attractive option for domain scientists that work with physics-based codes that have been developed for decades with sophisticated discretization techniques to solve complex process models and advanced equations of state. This paper proposes a physics constrained machine learning framework, AdjointNet, allowing domain scientists to embed their physics code in neural network training workflows. This embedding ensures that physics is constrained everywhere in the domain. Additionally, the mathematical properties such as consistency, stability, and convergence vital to the numerical solution of a PDE are still satisfied. We show that the proposed AdjointNet framework can be used for parameter estimation (and uncertainty quantification by extension) and experimental design using active learning. The applicability of our framework is demonstrated for four flow cases. Results show that AdjointNet-based inversion can estimate process model parameters with reasonable accuracy. These examples demonstrate the applicability of using existing software with no changes in source code to perform accurate and reliable inversion of model parameters.

【16】 SensiX++: Bringing MLOPs and Multi-tenant Model Serving to Sensory Edge Devices 标题:SensiX++:将MLOP和多租户模型服务于感知边缘设备 链接:https://arxiv.org/abs/2109.03947

作者:Chulhong Min,Akhil Mathur,Utku Gunay Acer,Alessandro Montanari,Fahim Kawsar 机构:Nokia Bell Labs, Cambridge, UK, Utku Günay Acer, Antwerp, Belgium 备注:13 pages, 15 figures 摘要:我们介绍了SensiX++——一种用于自适应模型执行的多租户运行时,在边缘设备(如摄像头、麦克风或物联网传感器)上集成了MLOP。SensiX++遵循两个基本原则—高度模块化的组件化,以清晰的抽象和以文档为中心的表现形式将数据操作外部化,实现系统范围的编排。首先,数据协调员管理传感器的生命周期,并通过自动转换为模型提供正确的数据。接下来,一个资源感知模型服务器通过模型抽象、管道自动化和特征共享来隔离地执行多个模型。然后,自适应调度器跨异构加速器协调多个模型的尽力而为执行,平衡延迟和吞吐量。最后,带有RESTAPI的微服务提供综合模型预测、系统统计和连续部署。总的来说,这些组件使SensiX++能够通过对边缘设备的细粒度控制有效地为多个模型服务,同时最大限度地减少数据操作冗余,管理数据和设备异构性,减少资源争用并消除手动MLOP。我们在为感官设备设计的不同边缘加速器(Jetson AGX和Coral TPU)上,通过十种不同的视觉和声学模型,对SensiX++进行了基准测试。我们报告了SensiX++各种自动化组件的总体吞吐量和量化效益,并展示了它在显著降低操作复杂性和降低在边缘设备上部署、升级、重新配置和服务嵌入式模型方面的功效。 摘要:We present SensiX++ - a multi-tenant runtime for adaptive model execution with integrated MLOps on edge devices, e.g., a camera, a microphone, or IoT sensors. SensiX++ operates on two fundamental principles - highly modular componentisation to externalise data operations with clear abstractions and document-centric manifestation for system-wide orchestration. First, a data coordinator manages the lifecycle of sensors and serves models with correct data through automated transformations. Next, a resource-aware model server executes multiple models in isolation through model abstraction, pipeline automation and feature sharing. An adaptive scheduler then orchestrates the best-effort executions of multiple models across heterogeneous accelerators, balancing latency and throughput. Finally, microservices with REST APIs serve synthesised model predictions, system statistics, and continuous deployment. Collectively, these components enable SensiX++ to serve multiple models efficiently with fine-grained control on edge devices while minimising data operation redundancy, managing data and device heterogeneity, reducing resource contention and removing manual MLOps. We benchmark SensiX++ with ten different vision and acoustics models across various multi-tenant configurations on different edge accelerators (Jetson AGX and Coral TPU) designed for sensory devices. We report on the overall throughput and quantified benefits of various automation components of SensiX++ and demonstrate its efficacy to significantly reduce operational complexity and lower the effort to deploy, upgrade, reconfigure and serve embedded models on edge devices.

【17】 Juvenile state hypothesis: What we can learn from lottery ticket hypothesis researches? 标题:青少年状态假说:我们可以从彩票假说研究中学到什么? 链接:https://arxiv.org/abs/2109.03862

作者:Di Zhang 机构:University of Science and Technology of China 备注:7 pages, 8 figures, Under the review of AAAI2022 摘要:彩票假设的提出揭示了网络结构与初始化参数以及神经网络学习潜力之间的关系。原始彩票假设在训练收敛后进行剪枝和权重重置,使其暴露于遗忘学习知识和潜在高训练成本的问题。因此,我们提出了一种结合神经网络结构搜索思想和剪枝算法的策略来缓解这一问题。该算法在现有中奖票子网上搜索并扩展网络结构,递归生成新的中奖票。这使得训练和修剪过程可以继续进行,而不会影响性能。这种递归方法可以得到一个网络结构更深、泛化能力更强、测试性能更好的中奖子网。该方法可以解决剪枝后子网络训练或性能下降的困难,忘记原彩票假设的权重,以及在没有给出最终网络结构的情况下生成中奖彩票子网络的困难。我们在MNIST和CIFAR-10数据集上验证了该策略。在将其与近年来类似的生物学现象和相关的彩票假设研究联系起来之后,我们将进一步提出一个新的假设来讨论哪些因素可以保持网络青少年,即:。,在训练过程中影响神经网络学习潜力或泛化性能的可能因素。 摘要:The proposition of lottery ticket hypothesis revealed the relationship between network structure and initialization parameters and the learning potential of neural networks. The original lottery ticket hypothesis performs pruning and weight resetting after training convergence, exposing it to the problem of forgotten learning knowledge and potential high cost of training. Therefore, we propose a strategy that combines the idea of neural network structure search with a pruning algorithm to alleviate this problem. This algorithm searches and extends the network structure on existing winning ticket sub-network to producing new winning ticket recursively. This allows the training and pruning process to continue without compromising performance. A new winning ticket sub-network with deeper network structure, better generalization ability and better test performance can be obtained in this recursive manner. This method can solve: the difficulty of training or performance degradation of the sub-networks after pruning, the forgetting of the weights of the original lottery ticket hypothesis and the difficulty of generating winning ticket sub-network when the final network structure is not given. We validate this strategy on the MNIST and CIFAR-10 datasets. And after relating it to similar biological phenomena and relevant lottery ticket hypothesis studies in recent years, we will further propose a new hypothesis to discuss which factors that can keep a network juvenile, i.e., those possible factors that influence the learning potential or generalization performance of a neural network during training.

【18】 Online Learning for Cooperative Multi-Player Multi-Armed Bandits 标题:协作式多人多臂赌博机的在线学习 链接:https://arxiv.org/abs/2109.03818

作者:William Chang,Mehdi Jafarnia-Jahromi,Rahul Jain 机构:University of Southern California 摘要:我们介绍了一个面向多个合作玩家的多臂赌博机(MAB)分散式在线学习框架。每轮玩家获得的奖励取决于所有玩家采取的行动。这是一个团队设定,目标是共同的。信息不对称使这个问题变得有趣且富有挑战性。我们考虑了三种信息不对称:行动信息不对称,即玩家的行动不可观察,但获得的奖励是共同的;奖励信息不对称,即其他玩家的行动可观察,但各自获得的奖励是来自同一分布的独立同分布(IID)样本;以及行动与奖励信息同时不对称的情形。对于第一种设定,我们提出了一种受UCB启发的算法,无论奖励是IID还是马尔可夫的,该算法都能达到$O(\log T)$的遗憾。对于第二种设定,我们给出了一个环境,使得针对第一种设定的算法产生线性遗憾。对于第三种设定,我们证明"先探索后提交"(explore-then-commit)算法的一个变体几乎能达到对数遗憾。 摘要:We introduce a framework for decentralized online learning for multi-armed bandits (MAB) with multiple cooperative players. The reward obtained by the players in each round depends on the actions taken by all the players. It's a team setting, and the objective is common. Information asymmetry is what makes the problem interesting and challenging. We consider three types of information asymmetry: action information asymmetry when the actions of the players can't be observed but the rewards received are common; reward information asymmetry when the actions of the other players are observable but rewards received are IID from the same distribution; and when we have both action and reward information asymmetry. For the first setting, we propose a UCB-inspired algorithm that achieves $O(\log T)$ regret whether the rewards are IID or Markovian. For the second section, we offer an environment such that the algorithm given for the first setting gives linear regret. For the third setting, we show that a variation of the `explore then commit' algorithm achieves almost log regret.
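作为参照,文中"受UCB启发的算法"的单智能体基本构件是UCB1:对每个动作维护经验均值与计数,按"均值+置信半径"选动作。协作多玩家版本还需处理上述信息不对称,具体机制见原文;下面仅为UCB1骨架:

```python
# UCB1骨架:按 均值 + sqrt(2 ln t / n) 选动作
import math
import random

def ucb1(n_arms, pull, horizon=10000):
    counts, means = [0] * n_arms, [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            a = t - 1                      # 先把每个动作各试一次
        else:
            a = max(range(n_arms),
                    key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        r = pull(a)
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]   # 增量式均值更新
    return means

# 用法示意:三个伯努利臂,期望奖励分别为0.2、0.5、0.8
means = ucb1(3, pull=lambda a: float(random.random() < [0.2, 0.5, 0.8][a]))
```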

【19】 Protein Folding Neural Networks Are Not Robust 标题:蛋白质折叠神经网络的健壮性不强 链接:https://arxiv.org/abs/2109.04460

作者:Sumit Kumar Jha,Arvind Ramanathan,Rickard Ewetz,Alvaro Velasquez,Susmit Jha 机构: Computer Science Department, University of Texas at San Antonio, TX , Data Science and Learning, Argonne National Laboratory, Lemont, IL, Electrical and Computer Engineering Department, University of Central Florida, Orlando, FL 备注:8 pages, 5 figures 摘要:与其他算法方法相比,AlphaFold和RoseTTAFold等深度神经网络可以非常准确地预测蛋白质的结构。众所周知,蛋白质序列中生物学上微小的扰动不会导致蛋白质结构的剧烈变化。在本文中,我们证明了RoseTTAFold尽管具有很高的准确性,但并不具备这样的稳健性:对某些输入序列的生物学上微小的扰动会导致预测的蛋白质结构完全不同。这就提出了如何检测这些预测的蛋白质结构何时不可信的挑战。我们将蛋白质序列预测结构的稳健性度量定义为:该预测结构与其对抗扰动序列的预测结构之间均方根距离(RMSD)的倒数。我们使用对抗攻击方法来构造对抗性蛋白质序列,并表明当对抗扰动在BLOSUM62距离下被限制在20个单位以内时,预测蛋白质结构间的RMSD范围为0.119Å到34.162Å。这表明预测结构的稳健性度量具有很高的方差。我们表明,我们的稳健性度量与预测结构和真实结构之间的RMSD之间的相关性强度(0.917)很高,也就是说,稳健性度量低的预测是不可信的。这是第一篇证明RoseTTAFold易受对抗攻击的论文。 摘要:Deep neural networks such as AlphaFold and RoseTTAFold predict remarkably accurate structures of proteins compared to other algorithmic approaches. It is known that biologically small perturbations in the protein sequence do not lead to drastic changes in the protein structure. In this paper, we demonstrate that RoseTTAFold does not exhibit such a robustness despite its high accuracy, and biologically small perturbations for some input sequences result in radically different predicted protein structures. This raises the challenge of detecting when these predicted protein structures cannot be trusted. We define the robustness measure for the predicted structure of a protein sequence to be the inverse of the root-mean-square distance (RMSD) in the predicted structure and the structure of its adversarially perturbed sequence. We use adversarial attack methods to create adversarial protein sequences, and show that the RMSD in the predicted protein structure ranges from 0.119Å to 34.162Å when the adversarial perturbations are bounded by 20 units in the BLOSUM62 distance. This demonstrates very high variance in the robustness measure of the predicted structures. We show that the magnitude of the correlation (0.917) between our robustness measure and the RMSD between the predicted structure and the ground truth is high, that is, the predictions with low robustness measure cannot be trusted. This is the first paper demonstrating the susceptibility of RoseTTAFold to adversarial attacks.
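按文中定义,序列s的预测结构稳健性是其预测结构与对抗扰动序列s'的预测结构之间RMSD的倒数;RMSD越大(稳健性越低),预测越不可信。最小示意如下(假设两个结构已对齐为同序的Cα坐标数组):

```python
# 稳健性度量 = 1 / RMSD(原序列预测结构, 扰动序列预测结构)(假设坐标已对齐)
import numpy as np

def rmsd(coords_a, coords_b):               # 坐标形状均为 (n_residues, 3)
    return np.sqrt(((coords_a - coords_b) ** 2).sum(axis=1).mean())

def robustness(pred_orig, pred_perturbed):
    return 1.0 / rmsd(pred_orig, pred_perturbed)   # 值越小,预测越不可信
```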

【20】 Modeling Massive Spatial Datasets Using a Conjugate Bayesian Linear Regression Framework 标题:基于共轭贝叶斯线性回归框架的海量空间数据建模 链接:https://arxiv.org/abs/2109.04447

作者:Sudipto Banerjee 机构:UCLA Department of Biostatistics, Charles E. Young Drive South, Los Angeles, CA ,-,. 备注:None 摘要:地理信息系统(GIS)和相关技术引起了统计人员对分析大型空间数据集的可扩展方法的极大兴趣。人们提出了多种可扩展的空间过程模型,这些模型可以很容易地嵌入到分层建模框架中进行贝叶斯推理。虽然统计研究的重点主要集中在创新和更复杂的模型开发上,但对于实践科学家或空间分析员易于实现的可伸缩层次模型的方法,关注相对有限。本文讨论了如何将点引用的空间过程模型转换为共轭贝叶斯线性回归,从而快速地对空间过程进行推理。该方法允许从回归参数、潜在过程和预测随机变量的联合后验分布直接进行精确采样(避免马尔可夫链蒙特卡罗等迭代算法),并且可以在统计编程环境(如R。 摘要:Geographic Information Systems (GIS) and related technologies have generated substantial interest among statisticians with regard to scalable methodologies for analyzing large spatial datasets. A variety of scalable spatial process models have been proposed that can be easily embedded within a hierarchical modeling framework to carry out Bayesian inference. While the focus of statistical research has mostly been directed toward innovative and more complex model development, relatively limited attention has been accorded to approaches for easily implementable scalable hierarchical models for the practicing scientist or spatial analyst. This article discusses how point-referenced spatial process models can be cast as a conjugate Bayesian linear regression that can rapidly deliver inference on spatial processes. The approach allows exact sampling directly (avoids iterative algorithms such as Markov chain Monte Carlo) from the joint posterior distribution of regression parameters, the latent process and the predictive random variables, and can be easily implemented on statistical programming environments such as R.
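其核心是共轭性带来的闭式后验与精确采样:先从逆伽马后验采σ²,再从条件正态采回归系数,无需MCMC。下面给出普通正态-逆伽马共轭线性回归的最小示意(未包含文中的空间过程结构,先验超参数为示意取值):

```python
# 共轭贝叶斯线性回归:从联合后验精确采样(正态-逆伽马先验,无需MCMC)
import numpy as np

def exact_posterior_samples(X, y, n_samples=1000, a0=2.0, b0=1.0):
    n, p = X.shape
    V0_inv = np.eye(p)                        # 先验精度(示意取值),先验均值取0
    Vn = np.linalg.inv(V0_inv + X.T @ X)
    mun = Vn @ (X.T @ y)
    an = a0 + n / 2.0
    bn = b0 + 0.5 * (y @ y - mun @ np.linalg.inv(Vn) @ mun)
    sigma2 = bn / np.random.gamma(an, 1.0, size=n_samples)    # 逆伽马采样
    L = np.linalg.cholesky(Vn)
    betas = mun + np.sqrt(sigma2)[:, None] * (np.random.randn(n_samples, p) @ L.T)
    return betas, sigma2

X = np.random.randn(200, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.3 * np.random.randn(200)
betas, sigma2 = exact_posterior_samples(X, y)
```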

【21】 Assessing Machine Learning Approaches to Address IoT Sensor Drift 标题:评估物联网传感器漂移的机器学习方法 链接:https://arxiv.org/abs/2109.04356

作者:Haining Zheng,Antonio Paiva 机构:ExxonMobil Research and Engineering Company, Annandale, NJ , Antonio R. Paiva 备注:6 pages, The 4th International Workshop on Artificial Intelligence of Things, In conjunction with the 27th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2021), Virtual conference, Aug. 14-18th 摘要:物联网传感器的激增及其在各行业和应用中的部署,在这个大数据时代带来了大量分析机会。然而,这些传感器测量值的漂移对自动化数据分析以及持续有效地训练和部署模型的能力提出了重大挑战。在本文中,我们研究并测试了文献中的几种方法在现实条件下处理和适应传感器漂移的能力。这些方法大多较新,因此代表了当前的技术水平。测试是在一个公开的、随时间表现出漂移的气体传感器数据集上进行的。结果表明,尽管采用了上述方法,传感器漂移仍导致传感性能大幅下降。随后,我们讨论了在现有方法中发现的几个问题,并概述了解决这些问题的未来研究方向。 摘要:The proliferation of IoT sensors and their deployment in various industries and applications has brought about numerous analysis opportunities in this Big Data era. However, drift of those sensor measurements poses major challenges to automate data analysis and the ability to effectively train and deploy models on a continuous basis. In this paper we study and test several approaches from the literature with regard to their ability to cope with and adapt to sensor drift under realistic conditions. Most of these approaches are recent and thus are representative of the current state-of-the-art. The testing was performed on a publicly available gas sensor dataset exhibiting drift over time. The results show substantial drops in sensing performance due to sensor drift in spite of the approaches. We then discuss several issues identified with current approaches and outline directions for future research to tackle them.

【22】 Quantum Machine Learning for Finance 标题:面向金融的量子机器学习 链接:https://arxiv.org/abs/2109.04298

作者:Marco Pistoia,Syed Farhan Ahmad,Akshay Ajagekar,Alexander Buts,Shouvanik Chakrabarti,Dylan Herman,Shaohan Hu,Andrew Jena,Pierre Minssen,Pradeep Niroula,Arthur Rattew,Yue Sun,Romina Yalovetzky 机构:Future Lab for Applied Research and Engineering, JPMorgan Chase Bank, N.A. 摘要:在这十年中,量子计算机有望超越经典计算机的计算能力,并对许多行业,特别是金融业产生破坏性影响。事实上,据估计,金融业是第一个从量子计算中长期甚至短期受益的行业。这篇综述文章介绍了量子算法在金融领域的应用现状,特别关注那些可以通过机器学习解决的用例。 摘要:Quantum computers are expected to surpass the computational capabilities of classical computers during this decade, and achieve disruptive impact on numerous industry sectors, particularly finance. In fact, finance is estimated to be the first industry sector to benefit from Quantum Computing not only in the medium and long terms, but even in the short term. This review paper presents the state of the art of quantum algorithms for financial applications, with particular focus to those use cases that can be solved via Machine Learning.

【23】 Machine learning modeling of family wide enzyme-substrate specificity screens 标题:酶家族级底物特异性筛选的机器学习建模 链接:https://arxiv.org/abs/2109.03900

作者:Samuel Goldman,Ria Das,Kevin K. Yang,Connor W. Coley 机构:†MIT Computational and Systems Biology, ‡MIT Chemical Engineering, ¶MIT Electrical Engineering and Computer Science, §Microsoft Research New England 摘要:生物催化是大规模可持续合成药物、复杂天然产物和大宗化学品的一种很有前景的方法。然而,生物催化的采用受限于我们筛选这样一类酶的能力:它们能在非天然底物上催化其天然的化学转化。虽然机器学习和计算机模拟(in silico)定向进化非常适合这一预测建模挑战,但迄今为止的工作主要旨在提高针对单一已知底物的活性,而不是识别能够作用于新的目标底物的酶。为了满足这一需求,我们从文献中整理了6个不同的高质量酶家族筛选数据集,每个筛选都针对多种底物测量多种酶。我们比较了文献中用于预测药物-靶点相互作用的基于机器学习的化合物-蛋白质相互作用(CPI)建模方法。令人惊讶的是,将这些基于相互作用的模型与独立(单任务)的纯酶或纯底物模型的集合进行比较,发现当前的CPI方法无法在当前的家族级数据体量下学习化合物和蛋白质之间的相互作用。我们进一步验证了这一观察,证明我们的无相互作用基线可以优于文献中用于指导激酶抑制剂发现的基于CPI的模型。鉴于无相互作用模型的高性能,我们引入了一种新的基于结构的策略来汇集蛋白质序列上的残基表示。总之,这项工作为生物催化和其他药物发现应用建立和评估有意义的预测模型指明了一条有原则的前进道路。 摘要:Biocatalysis is a promising approach to sustainably synthesize pharmaceuticals, complex natural products, and commodity chemicals at scale. However, the adoption of biocatalysis is limited by our ability to select enzymes that will catalyze their natural chemical transformation on non-natural substrates. While machine learning and in silico directed evolution are well-posed for this predictive modeling challenge, efforts to date have primarily aimed to increase activity against a single known substrate, rather than to identify enzymes capable of acting on new substrates of interest. To address this need, we curate 6 different high-quality enzyme family screens from the literature that each measure multiple enzymes against multiple substrates. We compare machine learning-based compound-protein interaction (CPI) modeling approaches from the literature used for predicting drug-target interactions. Surprisingly, comparing these interaction-based models against collections of independent (single task) enzyme-only or substrate-only models reveals that current CPI approaches are incapable of learning interactions between compounds and proteins in the current family level data regime. We further validate this observation by demonstrating that our no-interaction baseline can outperform CPI-based models from the literature used to guide the discovery of kinase inhibitors. Given the high performance of non-interaction based models, we introduce a new structure-based strategy for pooling residue representations across a protein sequence. Altogether, this work motivates a principled path forward in order to build and evaluate meaningful predictive models for biocatalysis and other drug discovery applications.

【24】 Learning the hypotheses space from data through a U-curve algorithm: a statistically consistent complexity regularizer for Model Selection Link: https://arxiv.org/abs/2109.03866

Authors: Diego Marcondes, Adilson Simonis, Junior Barrera Note: This work is a merger of arXiv:2001.09532 and arXiv:2001.11578 Abstract: This paper proposes a data-driven systematic, consistent and non-exhaustive approach to Model Selection, which is an extension of the classical agnostic PAC learning model. In this approach, learning problems are modeled not only by a hypothesis space $\mathcal{H}$, but also by a Learning Space $\mathbb{L}(\mathcal{H})$, a poset of subspaces of $\mathcal{H}$, which covers $\mathcal{H}$ and satisfies a property regarding the VC dimension of related subspaces, that is a suitable algebraic search space for Model Selection algorithms. Our main contributions are a data-driven general learning algorithm to perform regularized Model Selection on $\mathbb{L}(\mathcal{H})$ and a framework under which one can, theoretically, better estimate a target hypothesis with a given sample size by properly modeling $\mathbb{L}(\mathcal{H})$ and employing high computational power. A remarkable consequence of this approach is a set of conditions under which a non-exhaustive search of $\mathbb{L}(\mathcal{H})$ can return an optimal solution. The results of this paper lead to a practical property of Machine Learning, that the lack of experimental data may be mitigated by a high computational capacity. In a context of continuous popularization of computational power, this property may help understand why Machine Learning has become so important, even where data is expensive and hard to get.
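A minimal sketch of the U-curve idea, assuming the simplest possible Learning Space: a chain of nested polynomial subspaces rather than the paper's general poset with VC-dimension conditions. The search stops at the first local minimum of the estimated error instead of evaluating every candidate subspace:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 200)
y = np.sin(3 * x) + 0.2 * rng.normal(size=200)
x_tr, x_va, y_tr, y_va = x[:150], x[150:], y[:150], y[150:]

def val_error(degree):
    # One "hypothesis subspace" = polynomials up to a given degree.
    coefs = np.polyfit(x_tr, y_tr, degree)
    return np.mean((np.polyval(coefs, x_va) - y_va) ** 2)

# U-curve search on a chain of nested subspaces: walk up in complexity
# and stop at the first local minimum of the estimated error, instead
# of exhaustively evaluating every candidate subspace.
best_d, best_err = 0, val_error(0)
for d in range(1, 15):
    err = val_error(d)
    if err > best_err:        # error starts rising: local minimum found
        break
    best_d, best_err = d, err
print(f"selected degree {best_d} with validation MSE {best_err:.4f}")
```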

Others (22 papers)

【1】 Leveraging Local Domains for Image-to-Image Translation Link: https://arxiv.org/abs/2109.04468

Authors: Anthony Dell'Eva, Fabio Pizzati, Massimo Bertozzi, Raoul de Charette Affiliations: VisLab, Parma, Italy; Inria, Paris, France; University of Parma, Parma, Italy Note: Submitted to conference Abstract: Image-to-image (i2i) networks struggle to capture local changes because they do not affect the global scene structure. For example, translating from highway scenes to offroad, i2i networks easily focus on global color features but ignore obvious traits for humans like the absence of lane markings. In this paper, we leverage human knowledge about spatial domain characteristics which we refer to as 'local domains' and demonstrate its benefit for image-to-image translation. Relying on a simple geometrical guidance, we train a patch-based GAN on few source data and hallucinate a new unseen domain which subsequently eases transfer learning to target. We experiment on three tasks ranging from unstructured environments to adverse weather. Our comprehensive evaluation setting shows we are able to generate realistic translations, with minimal priors, and training only on a few images. Furthermore, when trained on our translated images we show that all tested proxy tasks are significantly improved, without ever seeing target domain at training.

【2】 HintedBT: Augmenting Back-Translation with Quality and Transliteration Hints Link: https://arxiv.org/abs/2109.04443

Authors: Sahana Ramnath, Melvin Johnson, Abhirut Gupta, Aravindan Raghuveer Affiliations: Google Research Note: 17 pages including references and appendix. Accepted at EMNLP 2021 Abstract: Back-translation (BT) of target monolingual corpora is a widely used data augmentation strategy for neural machine translation (NMT), especially for low-resource language pairs. To improve effectiveness of the available BT data, we introduce HintedBT -- a family of techniques which provides hints (through tags) to the encoder and decoder. First, we propose a novel method of using both high and low quality BT data by providing hints (as source tags on the encoder) to the model about the quality of each source-target pair. We don't filter out low quality data but instead show that these hints enable the model to learn effectively from noisy data. Second, we address the problem of predicting whether a source token needs to be translated or transliterated to the target language, which is common in cross-script translation tasks (i.e., where source and target do not share the written script). For such cases, we propose training the model with additional hints (as target tags on the decoder) that provide information about the operation required on the source (translation or both translation and transliteration). We conduct experiments and detailed analyses on standard WMT benchmarks for three cross-script low/medium-resource language pairs: {Hindi,Gujarati,Tamil}-to-English. Our methods compare favorably with five strong and well established baselines. We show that using these hints, both separately and together, significantly improves translation quality and leads to state-of-the-art performance in all three language pairs in corresponding bilingual settings.
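The source-side quality hints can be pictured as tags prepended to back-translated sentences, as in the sketch below; the tag format, bin count, and example sentences are invented for illustration and are not the paper's exact scheme:

```python
# Assumed helper: scores are per-pair BT quality estimates in [0, 1].
def add_quality_hints(bt_pairs, scores, n_bins=3):
    """Prefix each back-translated source with a quality-bin tag so the
    encoder can condition on how noisy the pair is, instead of the pair
    being filtered out."""
    tagged = []
    for (src, tgt), s in zip(bt_pairs, scores):
        bin_id = min(int(s * n_bins), n_bins - 1)
        tagged.append((f"<bt_q{bin_id}> {src}", tgt))
    return tagged

pairs = [("yah kitaab acchi hai", "this book is good"),
         ("vah ghar jaata hai", "he goes home")]
print(add_quality_hints(pairs, scores=[0.9, 0.4]))
```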

【3】 Gradual (In)Compatibility of Fairness Criteria Link: https://arxiv.org/abs/2109.04399

Authors: Corinna Hertweck, Tim Räz Affiliations: Institute for Data Analysis and Process Design, Zurich University of Applied Sciences, Winterthur, Switzerland; Department of Informatics, University of Zurich, Zurich, Switzerland; Institute of Philosophy, University of Bern, Bern, Switzerland Note: Code available on GitHub: this https URL Abstract: Impossibility results show that important fairness measures (independence, separation, sufficiency) cannot be satisfied at the same time under reasonable assumptions. This paper explores whether we can satisfy and/or improve these fairness measures simultaneously to a certain degree. We introduce information-theoretic formulations of the fairness measures and define degrees of fairness based on these formulations. The information-theoretic formulations suggest unexplored theoretical relations between the three fairness measures. In the experimental part, we use the information-theoretic expressions as regularizers to obtain fairness-regularized predictors for three standard datasets. Our experiments show that a) fairness regularization directly increases fairness measures, in line with existing work, and b) some fairness regularizations indirectly increase other fairness measures, as suggested by our theoretical findings. This establishes that it is possible to increase the degree to which some fairness measures are satisfied at the same time -- some fairness measures are gradually compatible.
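A hedged sketch of the general recipe: add a fairness term to the task loss. Here a squared gap between group-wise mean scores stands in for the paper's information-theoretic independence regularizer, and the weight 10.0 is an arbitrary illustrative choice:

```python
import torch

def independence_penalty(scores, group):
    """Squared gap between group-wise mean scores: a simple proxy for
    the mutual-information independence term used in the paper."""
    g0, g1 = scores[group == 0], scores[group == 1]
    return (g0.mean() - g1.mean()) ** 2

torch.manual_seed(0)
logits = torch.randn(64, requires_grad=True)   # stand-in model outputs
labels = torch.randint(0, 2, (64,)).float()
group = torch.randint(0, 2, (64,))             # protected attribute

bce = torch.nn.functional.binary_cross_entropy_with_logits(logits, labels)
loss = bce + 10.0 * independence_penalty(torch.sigmoid(logits), group)
loss.backward()   # gradients now trade accuracy against the fairness term
print(float(bce), float(loss))
```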

【4】 ErfAct: Non-monotonic smooth trainable Activation Functions Link: https://arxiv.org/abs/2109.04386

Authors: Koushik Biswas, Sandeep Kumar, Shilpak Banerjee, Ashish Kumar Pandey Abstract: An activation function is a crucial component of a neural network that introduces non-linearity in the network. The state-of-the-art performance of a neural network depends on the perfect choice of an activation function. We propose two novel non-monotonic smooth trainable activation functions, called ErfAct-1 and ErfAct-2. Experiments suggest that the proposed functions improve the network performance significantly compared to the widely used activations like ReLU, Swish, and Mish. Replacing ReLU by ErfAct-1 and ErfAct-2, we have 5.21% and 5.04% improvement for top-1 accuracy on PreactResNet-34 network in CIFAR100 dataset, 2.58% and 2.76% improvement for top-1 accuracy on PreactResNet-34 network in CIFAR10 dataset, 1.0%, and 1.0% improvement on mean average precision (mAP) on SSD300 model in Pascal VOC dataset.
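The abstract does not give the closed forms of ErfAct-1 and ErfAct-2, so the sketch below only shows a trainable, smooth, non-monotonic erf-based activation in the same spirit; the functional form and initial parameter values are assumptions, not the paper's definitions:

```python
import torch
import torch.nn as nn

class TrainableErfActivation(nn.Module):
    """Illustrative f(x) = x * erf(a * softplus(b * x)): smooth,
    non-monotonic (dips below zero for negative x, like Swish/Mish),
    with learnable parameters a and b updated by backprop."""
    def __init__(self, a=0.75, b=0.75):
        super().__init__()
        self.a = nn.Parameter(torch.tensor(float(a)))
        self.b = nn.Parameter(torch.tensor(float(b)))

    def forward(self, x):
        return x * torch.erf(self.a * torch.nn.functional.softplus(self.b * x))

act = TrainableErfActivation()
print(act(torch.linspace(-3, 3, 7)))  # drop-in replacement for ReLU layers
```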

【5】 COLUMBUS: Automated Discovery of New Multi-Level Features for Domain Generalization via Knowledge Corruption Link: https://arxiv.org/abs/2109.04320

Authors: Ahmed Frikha, Denis Krompaß, Volker Tresp Affiliations: Siemens AI Lab, Siemens Technology; Ludwig Maximilian University of Munich Abstract: Machine learning models that can generalize to unseen domains are essential when applied in real-world scenarios involving strong domain shifts. We address the challenging domain generalization (DG) problem, where a model trained on a set of source domains is expected to generalize well in unseen domains without any exposure to their data. The main challenge of DG is that the features learned from the source domains are not necessarily present in the unseen target domains, leading to performance deterioration. We assume that learning a richer set of features is crucial to improve the transfer to a wider set of unknown domains. For this reason, we propose COLUMBUS, a method that enforces new feature discovery via a targeted corruption of the most relevant input and multi-level representations of the data. We conduct an extensive empirical evaluation to demonstrate the effectiveness of the proposed approach which achieves new state-of-the-art results by outperforming 18 DG algorithms on multiple DG benchmark datasets in the DomainBed framework.

【6】 Social Media Monitoring for IoT Cyber-Threats Link: https://arxiv.org/abs/2109.04306

Authors: Sofia Alevizopoulou, Paris Koloveas, Christos Tryfonopoulos, Paraskevi Raftopoulou Affiliations: Dept. of Informatics & Telecommunications, University of the Peloponnese, GR, Tripolis, Greece Abstract: The rapid development of IoT applications and their use in various fields of everyday life has resulted in an escalated number of different possible cyber-threats, and has consequently raised the need of securing IoT devices. Collecting Cyber-Threat Intelligence (e.g., zero-day vulnerabilities or trending exploits) from various online sources and utilizing it to proactively secure IoT systems or prepare mitigation scenarios has proven to be a promising direction. In this work, we focus on social media monitoring and investigate real-time Cyber-Threat Intelligence detection from the Twitter stream. Initially, we compare and extensively evaluate six different machine-learning based classification alternatives trained with vulnerability descriptions and tested with real-world data from the Twitter stream to identify the best-fitting solution. Subsequently, based on our findings, we propose a novel social media monitoring system tailored to the IoT domain; the system allows users to identify recent/trending vulnerabilities and exploits on IoT devices. Finally, to aid research on the field and support the reproducibility of our results we publicly release all annotated datasets created during this process.

【7】 Online Enhanced Semantic Hashing: Towards Effective and Efficient Retrieval for Streaming Multi-Modal Data Link: https://arxiv.org/abs/2109.04260

Authors: Xiao-Ming Wu, Xin Luo, Yu-Wei Zhan, Chen-Lu Ding, Zhen-Duo Chen, Xin-Shun Xu Affiliations: Shandong University Note: 9 pages, 5 figures Abstract: With the vigorous development of multimedia equipment and applications, efficient retrieval of large-scale multi-modal data has become a trendy research topic. Among these approaches, hashing has become a prevalent choice due to its retrieval efficiency and low storage cost. Although multi-modal hashing has drawn lots of attention in recent years, there still remain some problems. The first point is that existing methods are mainly designed in batch mode and not able to efficiently handle streaming multi-modal data. The second point is that all existing online multi-modal hashing methods fail to effectively handle unseen new classes which come continuously with streaming data chunks. In this paper, we propose a new model, termed Online enhAnced SemantIc haShing (OASIS). We design novel semantic-enhanced representation for data, which could help handle the new coming classes, and thereby construct the enhanced semantic objective function. An efficient and effective discrete online optimization algorithm is further proposed for OASIS. Extensive experiments show that our method can exceed the state-of-the-art models. For good reproducibility and benefiting the community, our code and data are already available in supplementary material and will be made publicly available.

【8】 A Systematic Approach to Group Fairness in Automated Decision Making Link: https://arxiv.org/abs/2109.04230

Authors: Corinna Hertweck, Christoph Heitz Affiliations: Zurich University of Applied Sciences, Winterthur, Switzerland; University of Zurich, Zurich, Switzerland Abstract: While the field of algorithmic fairness has brought forth many ways to measure and improve the fairness of machine learning models, these findings are still not widely used in practice. We suspect that one reason for this is that the field of algorithmic fairness came up with a lot of definitions of fairness, which are difficult to navigate. The goal of this paper is to provide data scientists with an accessible introduction to group fairness metrics and to give some insight into the philosophical reasoning for caring about these metrics. We will do this by considering in which sense socio-demographic groups are compared for making a statement on fairness.
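The group fairness criteria the paper introduces reduce to group-wise rates that are easy to compute; the sketch below, on synthetic predictions, reports the quantities behind independence (selection rate), separation (TPR/FPR), and sufficiency (PPV):

```python
import numpy as np

def group_fairness_report(y_true, y_pred, group):
    """Per-group rates behind the three classical criteria:
    independence -> P(Yhat=1 | A), separation -> TPR/FPR per group,
    sufficiency -> precision (PPV) per group."""
    for a in np.unique(group):
        m = group == a
        yt, yp = y_true[m], y_pred[m]
        sel = yp.mean()
        tpr = yp[yt == 1].mean() if (yt == 1).any() else float("nan")
        fpr = yp[yt == 0].mean() if (yt == 0).any() else float("nan")
        ppv = yt[yp == 1].mean() if (yp == 1).any() else float("nan")
        print(f"group {a}: selection={sel:.2f} TPR={tpr:.2f} "
              f"FPR={fpr:.2f} PPV={ppv:.2f}")

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 500)
group = rng.integers(0, 2, 500)
y_pred = (rng.random(500) < 0.4 + 0.2 * group).astype(int)  # biased toy scores
group_fairness_report(y_true, y_pred, group)
```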

【9】 Incentivizing an Unknown Crowd Link: https://arxiv.org/abs/2109.04226

Authors: Jing Dong, Shuai Li, Baoxiang Wang Affiliations: The Chinese University of Hong Kong, Shenzhen; Shanghai Jiao Tong University Abstract: Motivated by the common strategic activities in crowdsourcing labeling, we study the problem of sequential eliciting information without verification (EIWV) for workers with a heterogeneous and unknown crowd. We propose a reinforcement learning-based approach that is effective against a wide range of settings including potential irrationality and collusion among workers. With the aid of a costly oracle and the inference method, our approach dynamically decides the oracle calls and gains robustness even under the presence of frequent collusion activities. Extensive experiments show the advantage of our approach. Our results also present the first comprehensive experiments of EIWV on large-scale real datasets and the first thorough study of the effects of environmental variables.

【10】 Compositional Affinity Propagation: When Clusters Have Compositional Structure Link: https://arxiv.org/abs/2109.04160

Authors: Jacob Whitehill, Zeqian Li Affiliations: Worcester Polytechnic Institute Abstract: We consider a new kind of clustering problem in which clusters need not be independent of each other, but rather can have compositional relationships with other clusters (e.g., an image set consists of rectangles, circles, as well as combinations of rectangles and circles). This task is motivated by recent work in few-shot learning on compositional embedding models that structure the embedding space to distinguish the label sets, not just the individual labels, assigned to the examples. To tackle this clustering problem, we propose a new algorithm called Compositional Affinity Propagation (CAP). In contrast to standard Affinity Propagation as well as other algorithms for multi-view and hierarchical clustering, CAP can deduce compositionality among clusters automatically. We show promising results, compared to several existing clustering algorithms, on the MultiMNIST, OmniGlot, and LibriSpeech datasets. Our work has applications to multi-object image recognition and speaker diarization with simultaneous speech from multiple speakers.

【11】 Automated Security Assessment for the Internet of Things Link: https://arxiv.org/abs/2109.04029

Authors: Xuanyu Duan, Mengmeng Ge, Triet H. M. Le, Faheem Ullah, Shang Gao, Xuequan Lu, M. Ali Babar Affiliations: ∗School of Information Technology, Deakin University, Geelong, Australia; †School of Computing Technologies, RMIT University, Melbourne, Australia; ‡School of Computer Science, The University of Adelaide, Adelaide, Australia Note: Accepted for publication at the 26th IEEE Pacific Rim International Symposium on Dependable Computing (PRDC 2021) Abstract: Internet of Things (IoT) based applications face an increasing number of potential security risks, which need to be systematically assessed and addressed. Expert-based manual assessment of IoT security is a predominant approach, which is usually inefficient. To address this problem, we propose an automated security assessment framework for IoT networks. Our framework first leverages machine learning and natural language processing to analyze vulnerability descriptions for predicting vulnerability metrics. The predicted metrics are then input into a two-layered graphical security model, which consists of an attack graph at the upper layer to present the network connectivity and an attack tree for each node in the network at the bottom layer to depict the vulnerability information. This security model automatically assesses the security of the IoT network by capturing potential attack paths. We evaluate the viability of our approach using a proof-of-concept smart building system model which contains a variety of real-world IoT devices and potential vulnerabilities. Our evaluation of the proposed framework demonstrates its effectiveness in terms of automatically predicting the vulnerability metrics of new vulnerabilities with more than 90% accuracy, on average, and identifying the most vulnerable attack paths within an IoT network. The produced assessment results can serve as a guideline for cybersecurity professionals to take further actions and mitigate risks in a timely manner.
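The upper-layer attack-graph idea can be sketched with networkx: enumerate attack paths through the connectivity graph and score them with per-node exploit probabilities that stand in for the lower-layer attack-tree results. The topology and probabilities below are invented for illustration, not taken from the paper's smart-building model:

```python
import networkx as nx

# Upper layer: network connectivity as a directed graph.
G = nx.DiGraph()
G.add_edges_from([("internet", "router"), ("router", "camera"),
                  ("router", "thermostat"), ("camera", "server"),
                  ("thermostat", "server")])
# Per-node scores standing in for each node's attack-tree evaluation.
p_exploit = {"router": 0.6, "camera": 0.8, "thermostat": 0.3, "server": 0.5}

def path_risk(path):
    prob = 1.0
    for node in path[1:]:          # attacker already controls the source
        prob *= p_exploit[node]
    return prob

# Rank all attack paths from the entry point to the asset.
for p in sorted(nx.all_simple_paths(G, "internet", "server"),
                key=path_risk, reverse=True):
    print(" -> ".join(p), "risk =", round(path_risk(p), 3))
```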

【12】 Distributionally Robust Multilingual Machine Translation Link: https://arxiv.org/abs/2109.04020

Authors: Chunting Zhou, Daniel Levy, Xian Li, Marjan Ghazvininejad, Graham Neubig Affiliations: Language Technologies Institute, Carnegie Mellon University; Stanford University; Facebook AI Note: Long paper accepted by EMNLP 2021 main conference Abstract: Multilingual neural machine translation (MNMT) learns to translate multiple language pairs with a single model, potentially improving both the accuracy and the memory-efficiency of deployed models. However, the heavy data imbalance between languages hinders the model from performing uniformly across language pairs. In this paper, we propose a new learning objective for MNMT based on distributionally robust optimization, which minimizes the worst-case expected loss over the set of language pairs. We further show how to practically optimize this objective for large translation corpora using an iterated best response scheme, which is both effective and incurs negligible additional computational cost compared to standard empirical risk minimization. We perform extensive experiments on three sets of languages from two datasets and show that our method consistently outperforms strong baseline methods in terms of average and per-language performance under both many-to-one and one-to-many translation settings.
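A simplified stand-in for the robust objective: maintain a distribution over language pairs and update it by exponentiated gradient so that the worst-performing pairs gain weight. The paper itself uses an iterated best response scheme, and the random per-pair losses here merely stand in for real dev-set evaluations of the NMT model:

```python
import numpy as np

rng = np.random.default_rng(0)
n_pairs = 4
weights = np.ones(n_pairs) / n_pairs   # distribution over language pairs
eta = 0.5                              # mirror-ascent step size (assumed)

for step in range(5):
    # Stand-in for per-language-pair dev losses of the current model.
    losses = rng.uniform(0.5, 2.0, n_pairs)
    # Exponentiated-gradient update: up-weight the worst pairs ...
    weights *= np.exp(eta * losses)
    weights /= weights.sum()
    # ... then the model would be trained on the re-weighted objective.
    print(f"step {step}: weights={np.round(weights, 2)}, "
          f"weighted loss={weights @ losses:.3f}")
```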

【13】 The challenge of reproducible ML: an empirical study on the impact of bugs Link: https://arxiv.org/abs/2109.03991

Authors: Emilio Rivera-Landos, Foutse Khomh, Amin Nikanjam Affiliations: SWAT Lab., Polytechnique Montreal, Quebec, Canada Abstract: Reproducibility is a crucial requirement in scientific research. When results of research studies and scientific papers have been found difficult or impossible to reproduce, we face a challenge which is called reproducibility crisis. Although the demand for reproducibility in Machine Learning (ML) is acknowledged in the literature, a main barrier is inherent non-determinism in ML training and inference. In this paper, we establish the fundamental factors that cause non-determinism in ML systems. A framework, ReproduceML, is then introduced for deterministic evaluation of ML experiments in a real, controlled environment. ReproduceML allows researchers to investigate software configuration effects on ML training and inference. Using ReproduceML, we run a case study: investigation of the impact of bugs inside ML libraries on performance of ML experiments. This study attempts to quantify the impact that the occurrence of bugs in a popular ML framework, PyTorch, has on the performance of trained models. To do so, a comprehensive methodology is proposed to collect buggy versions of ML libraries and run deterministic ML experiments using ReproduceML. Our initial finding is that there is no evidence based on our limited dataset to show that bugs which occurred in PyTorch do affect the performance of trained models. The proposed methodology as well as ReproduceML can be employed for further research on non-determinism and bugs.
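As a concrete reminder of the non-determinism sources the paper studies, here are the standard PyTorch controls for pinning them down; this is a generic checklist, not ReproduceML's implementation. Even with all of this, some CUDA kernels have no deterministic variant, which is part of why bit-exact reproducibility is hard:

```python
import random
import numpy as np
import torch

def make_deterministic(seed: int = 0):
    """Pin the usual sources of non-determinism in a PyTorch experiment."""
    random.seed(seed)                  # Python-level RNG
    np.random.seed(seed)               # NumPy RNG
    torch.manual_seed(seed)            # CPU RNG
    torch.cuda.manual_seed_all(seed)   # all GPU RNGs (no-op without CUDA)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Raises at runtime if an op has no deterministic implementation.
    torch.use_deterministic_algorithms(True)

make_deterministic(42)
print(torch.randn(3))  # identical across runs with the same seed
```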

【14】 NU:BRIEF -- A Privacy-aware Newsletter Personalization Engine for Publishers Link: https://arxiv.org/abs/2109.03955

Authors: Ernesto Diaz-Aviles, Claudia Orellana-Rodriguez, Igor Brigadir, Reshma Narayanan Kutty Affiliations: recsyslabs; University College Dublin, Ireland Note: Fifteenth ACM Conference on Recommender Systems (RecSys '21), September 27-October 1, 2021, Amsterdam, Netherlands Abstract: Newsletters have (re-) emerged as a powerful tool for publishers to engage with their readers directly and more effectively. Despite the diversity in their audiences, publishers' newsletters remain largely a one-size-fits-all offering, which is suboptimal. In this paper, we present NU:BRIEF, a web application for publishers that enables them to personalize their newsletters without harvesting personal data. Personalized newsletters build a habit and become a great conversion tool for publishers, providing an alternative readers-generated revenue model to a declining ad/clickbait-centered business model.

【15】 LSB: Local Self-Balancing MCMC in Discrete Spaces Link: https://arxiv.org/abs/2109.03867

Authors: Emanuele Sansone Affiliations: Department of Computer Science, KU Leuven Abstract: Markov Chain Monte Carlo (MCMC) methods are promising solutions to sample from target distributions in high dimensions. While MCMC methods enjoy nice theoretical properties, like guaranteed convergence and mixing to the true target, in practice their sampling efficiency depends on the choice of the proposal distribution and the target at hand. This work considers using machine learning to adapt the proposal distribution to the target, in order to improve the sampling efficiency in the purely discrete domain. Specifically, (i) it proposes a new parametrization for a family of proposal distributions, called locally balanced proposals, (ii) it defines an objective function based on mutual information and (iii) it devises a learning procedure to adapt the parameters of the proposal to the target, thus achieving fast convergence and fast mixing. We call the resulting sampler the Locally Self-Balancing Sampler (LSB). We show through experimental analysis on the Ising model and Bayesian networks that LSB is indeed able to improve the efficiency over a state-of-the-art sampler based on locally balanced proposals, thus reducing the number of iterations required to converge, while achieving comparable mixing performance.
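For context, a minimal fixed locally balanced sampler (the family LSB learns over) on a 1-D Ising ring, with the common balancing function g(t) = sqrt(t); LSB itself parametrizes g and adapts it via a mutual-information objective, which this sketch does not do, and all constants here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta = 30, 0.4
x = rng.choice([-1, 1], size=n)

def flip_log_ratio(x, i):
    # log pi(x with spin i flipped) - log pi(x) for a 1-D Ising ring
    # with pi(x) proportional to exp(beta * sum_i x_i * x_{i+1}).
    return -2.0 * beta * x[i] * (x[(i - 1) % n] + x[(i + 1) % n])

def lb_weights(x):
    # Locally balanced weights g(pi(x^i)/pi(x)) with g(t) = sqrt(t).
    return np.exp(0.5 * np.array([flip_log_ratio(x, i) for i in range(n)]))

for _ in range(2000):
    w = lb_weights(x)
    i = rng.choice(n, p=w / w.sum())   # propose the flip with prob w_i / Z(x)
    x_prop = x.copy()
    x_prop[i] *= -1
    # Metropolis-Hastings correction; with g = sqrt it reduces to Z(x)/Z(x').
    if rng.random() < min(1.0, w.sum() / lb_weights(x_prop).sum()):
        x = x_prop
print("mean magnetization:", x.mean())
```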

【16】 Knowledge mining of unstructured information: application to cyber-domain Link: https://arxiv.org/abs/2109.03848

Authors: Tuomas Takko, Kunal Bhattacharya, Martti Lehto, Pertti Jalasvirta, Aapo Cederberg, Kimmo Kaski Affiliations: Department of Computer Science, Aalto University School of Science; Department of Industrial Engineering and Management; University of Jyväskylä, PO Box , Finland; Cyberwatch Finland Oy, Tietokuja , Finland; The Alan Turing Institute Abstract: Cyber intelligence is widely and abundantly available in numerous open online sources with reports on vulnerabilities and incidents. This constant stream of noisy information requires new tools and techniques if it is to be used for the benefit of analysts and investigators in various organizations. In this paper we present and implement a novel knowledge graph and knowledge mining framework for extracting relevant information from free-form text about incidents in the cyber domain. Our framework includes a machine learning based pipeline as well as crawling methods for generating graphs of entities, attackers and the related information with our non-technical cyber ontology. We test our framework on publicly available cyber incident datasets to evaluate the accuracy of our knowledge mining methods as well as the usefulness of the framework in the use of cyber analysts. Our results show that, by analyzing the knowledge graph constructed using the novel framework, an analyst can infer additional information from the current cyber landscape in terms of risk to various entities and the propagation of risk between industries and countries. Expanding the framework to accommodate more technical and operational level information can increase the accuracy and explainability of trends and risk in the knowledge graph.

【17】 Extreme Bandits using Robust Statistics Link: https://arxiv.org/abs/2109.04433

Authors: Sujay Bhatt, Ping Li, Gennady Samorodnitsky Affiliations: Cognitive Computing Lab, Baidu Research, NE ,th St. Bellevue, WA , USA; School of ORIE, Cornell University, Frank T Rhodes Hall, Ithaca, NY , USA Abstract: We consider a multi-armed bandit problem motivated by situations where only the extreme values, as opposed to expected values in the classical bandit setting, are of interest. We propose distribution free algorithms using robust statistics and characterize the statistical properties. We show that the provided algorithms achieve vanishing extremal regret under weaker conditions than existing algorithms. Performance of the algorithms is demonstrated for the finite-sample setting using numerical experiments. The results show superior performance of the proposed algorithms compared to the well known algorithms.

【18】 Coordinate Descent Methods for DC Minimization Link: https://arxiv.org/abs/2109.04228

Authors: Ganzhao Yuan Affiliations: †Peng Cheng Laboratory, China Abstract: Difference-of-Convex (DC) minimization, referring to the problem of minimizing the difference of two convex functions, has been found rich applications in statistical learning and studied extensively for decades. However, existing methods are primarily based on multi-stage convex relaxation, only leading to weak optimality of critical points. This paper proposes a coordinate descent method for minimizing DC functions based on sequential nonconvex approximation. Our approach iteratively solves a nonconvex one-dimensional subproblem globally, and it is guaranteed to converge to a coordinate-wise stationary point. We prove that this new optimality condition is always stronger than the critical point condition and the directional point condition when the objective function is weakly convex. For comparisons, we also include a naive variant of coordinate descent methods based on sequential convex approximation in our study. When the objective function satisfies an additional regularity condition called sharpness, coordinate descent methods with an appropriate initialization converge linearly to the optimal solution set. Also, for many applications of interest, we show that the nonconvex one-dimensional subproblem can be computed exactly and efficiently using a breakpoint searching method. We present some discussions and extensions of our proposed method. Finally, we have conducted extensive experiments on several statistical learning tasks to show the superiority of our approach. Keywords: Coordinate Descent, DC Minimization, DC Programming, Difference-of-Convex Programs, Nonconvex Optimization, Sparse Optimization, Binary Optimization.
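The key step, a globally solved one-dimensional subproblem per coordinate, can be sketched on a small DC objective; a dense grid stands in for the paper's exact breakpoint search, and the problem instance below is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
A, b, lam = rng.normal(size=(20, 5)), rng.normal(size=20), 0.5

def f(x):
    # DC objective g(x) - h(x): g = 0.5*||Ax - b||^2 and h = lam*||x||_1
    # are both convex, so f is a difference of convex functions.
    return 0.5 * np.sum((A @ x - b) ** 2) - lam * np.sum(np.abs(x))

x = np.zeros(5)
grid = np.linspace(-3.0, 3.0, 601)   # stand-in for exact breakpoint search
for sweep in range(10):
    for i in range(len(x)):
        vals = []
        for t in grid:               # scan the 1-D subproblem globally
            x[i] = t
            vals.append(f(x))
        x[i] = grid[int(np.argmin(vals))]   # global 1-D minimizer
print("objective:", round(f(x), 4), "x =", np.round(x, 3))
```

The global (rather than merely local) 1-D solve is what yields the stronger coordinate-wise stationarity the abstract describes; the grid only approximates it here.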

【19】 IMG2SMI: Translating Molecular Structure Images to Simplified Molecular-input Line-entry System Link: https://arxiv.org/abs/2109.04202

Authors: Daniel Campos, Heng Ji Affiliations: Department of Computer Science, University of Illinois Urbana-Champaign, Urbana IL, USA Abstract: Like many scientific fields, new chemistry literature has grown at a staggering pace, with thousands of papers released every month. A large portion of chemistry literature focuses on new molecules and reactions between molecules. Most vital information is conveyed through 2-D images of molecules, representing the underlying molecules or reactions described. In order to ensure reproducible and machine-readable molecule representations, text-based molecule descriptors like SMILES and SELFIES were created. These text-based molecule representations provide molecule generation but are unfortunately rarely present in published literature. In the absence of molecule descriptors, the generation of molecule descriptors from the 2-D images present in the literature is necessary to understand chemistry literature at scale. Successful methods such as Optical Structure Recognition Application (OSRA) and ChemSchematicResolver are able to extract the locations of molecules structures in chemistry papers and infer molecular descriptions and reactions. While effective, existing systems expect chemists to correct outputs, making them unsuitable for unsupervised large-scale data mining. Leveraging the task formulation of image captioning introduced by DECIMER, we introduce IMG2SMI, a model which leverages Deep Residual Networks for image feature extraction and encoder-decoder Transformer layers for molecule description generation. Unlike previous Neural Network-based systems, IMG2SMI builds around the task of molecule description generation, which enables IMG2SMI to outperform OSRA-based systems by 163% in molecule similarity prediction as measured by the molecular MACCS Fingerprint Tanimoto Similarity. Additionally, to facilitate further research on this task, we release a new molecule prediction dataset, including 81 million molecules for molecule description generation.

【20】 MaterialsAtlas.org: A Materials Informatics Web App Platform for Materials Discovery and Survey of State-of-the-Art Link: https://arxiv.org/abs/2109.04007

Authors: Jianjun Hu, Stanislav Stefanov, Yuqi Song, Sadman Sadeed Omee, Steph-Yves Louis, Edirisuriya M. D. Siriwardane, Yong Zhao Affiliations: Department of Computer Science and Engineering, University of South Carolina, Columbia, SC Note: 16 pages Abstract: The availability and easy access of large scale experimental and computational materials data have enabled the emergence of accelerated development of algorithms and models for materials property prediction, structure prediction, and generative design of materials. However, lack of user-friendly materials informatics web servers has severely constrained the wide adoption of such tools in the daily practice of materials screening, tinkering, and design space exploration by materials scientists. Herein we first survey current materials informatics web apps and then propose and develop MaterialsAtlas.org, a web based materials informatics toolbox for materials discovery, which includes a variety of routinely needed tools for exploratory materials discovery, including materials composition and structure check (e.g. for neutrality, electronegativity balance, dynamic stability, Pauling rules), materials property prediction (e.g. band gap, elastic moduli, hardness, thermal conductivity), and search for hypothetical materials. These user-friendly tools can be freely accessed at www.materialsatlas.org. We argue that such materials informatics apps should be widely developed by the community to speed up the materials discovery processes.
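One of the routine composition checks mentioned, charge neutrality, is easy to sketch with a hand-written oxidation-state table; a real tool must handle multivalent elements and mixed valence, which this deliberately ignores:

```python
# Assumed common oxidation states; illustrative subset only.
OXIDATION = {"Na": +1, "K": +1, "Mg": +2, "Ca": +2, "Al": +3,
             "O": -2, "Cl": -1, "F": -1, "S": -2}

def is_charge_neutral(composition: dict) -> bool:
    """Composition check in the spirit of the web app's neutrality tool:
    does the formula balance to zero net charge?"""
    return sum(OXIDATION[el] * n for el, n in composition.items()) == 0

print(is_charge_neutral({"Na": 1, "Cl": 1}))   # True  (NaCl)
print(is_charge_neutral({"Mg": 1, "Cl": 2}))   # True  (MgCl2)
print(is_charge_neutral({"Al": 1, "O": 1}))    # False (Al2O3 would balance)
```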

【21】 Matrix Completion of World Trade Link: https://arxiv.org/abs/2109.03930

Authors: Gnecco Giorgio, Nutarelli Federico, Riccaboni Massimo Affiliations: IMT School for Advanced Studies Note: The main paper contains 11 pages with the Appendix. Supplemental material is also reported Abstract: This work applies Matrix Completion (MC) -- a class of machine-learning methods commonly used in the context of recommendation systems -- to analyse economic complexity. MC is applied to reconstruct the Revealed Comparative Advantage (RCA) matrix, whose elements express the relative advantage of countries in given classes of products, as evidenced by yearly trade flows. A high-accuracy binary classifier is derived from the application of MC, with the aim of discriminating between elements of the RCA matrix that are, respectively, higher or lower than one. We introduce a novel Matrix cOmpletion iNdex of Economic complexitY (MONEY) based on MC, which is related to the predictability of countries' RCA (the lower the predictability, the higher the complexity). Differently from previously-developed indices of economic complexity, the MONEY index takes into account the various singular vectors of the matrix reconstructed by MC, whereas other indices are based only on one/two eigenvectors of a suitable symmetric matrix, derived from the RCA matrix. Finally, MC is compared with a state-of-the-art economic complexity index (GENEPY). We show that the false positive rate per country of a binary classifier constructed starting from the average entry-wise output of MC can be used as a proxy of GENEPY.
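A minimal MC sketch on an RCA-like toy matrix: iterative rank-truncated SVD imputation (in the SoftImpute family), followed by the paper's binary question of whether an entry is above or below one. The matrix, rank, and observation mask are synthetic, and this is not the paper's exact MC algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy low-rank "RCA-like" matrix (countries x products), partly observed.
U, V = rng.normal(size=(40, 3)), rng.normal(size=(3, 60))
M = np.exp(U @ V / 2.0)               # positive entries scattered around 1
mask = rng.random(M.shape) < 0.7      # True where the entry is observed

def svd_impute(M, mask, rank=3, iters=100):
    """Minimal iterative hard-threshold SVD imputation (SoftImpute-style)."""
    X = np.where(mask, M, M[mask].mean())
    for _ in range(iters):
        u, s, vt = np.linalg.svd(X, full_matrices=False)
        low = (u[:, :rank] * s[:rank]) @ vt[:rank]
        X = np.where(mask, M, low)    # keep observed entries, refill the rest
    return low

X_hat = svd_impute(M, mask)
# The paper's binary question: is RCA above or below one?
acc = ((X_hat > 1.0) == (M > 1.0))[~mask].mean()
print("held-out accuracy of the RCA > 1 classifier:", round(acc, 3))
```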

【22】 Initialization for Nonnegative Matrix Factorization: a Comprehensive Review Link: https://arxiv.org/abs/2109.03874

Authors: Sajad Fathi Hafshejani, Zahra Moaberfard Affiliations: Department of Applied Mathematics, Shiraz University of Technology, Shiraz, Iran Abstract: Non-negative matrix factorization (NMF) has become a popular method for representing meaningful data by extracting a non-negative basis feature from an observed non-negative data matrix. Some of the unique features of this method in identifying hidden data put this method amongst the powerful methods in the machine learning area. The NMF is a known non-convex optimization problem and the initial point has a significant effect on finding an efficient local solution. In this paper, we investigate the most popular initialization procedures proposed for NMF so far. We describe each method and present some of their advantages and disadvantages. Finally, some numerical results to illustrate the performance of each algorithm are presented.
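Two of the commonly reviewed initialization procedures, random and NNDSVD (SVD-based, deterministic), can be compared directly through scikit-learn's NMF, whose init argument implements both; data and component count below are arbitrary:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = np.abs(rng.normal(size=(100, 40)))   # non-negative data matrix

# Same factorization problem, two different starting points.
for init in ["random", "nndsvd"]:
    model = NMF(n_components=5, init=init, random_state=0, max_iter=500)
    model.fit(X)
    print(f"init={init:7s} reconstruction error = "
          f"{model.reconstruction_err_:.4f}")
```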

