
Machine Learning Academic Digest [8.31]

Author: WeChat official account "arXiv每日学术速递" (arXiv Daily Academic Digest) · Published 2021-09-16 14:52:43

Update! The H5 page now supports collapsible abstracts for a better reading experience! Click "Read the original" to visit arxivdaily.com, which covers CS, Physics, Mathematics, Economics, Statistics, Finance, Biology, and Electrical Engineering, and offers search, bookmarking, and more!

cs.LG: 116 papers today.

Graph-related (graph learning | graph neural networks | graph optimization, etc.) (7 papers)

【1】 Whole Brain Vessel Graphs: A Dataset and Benchmark for Graph Learning and Neuroscience (VesselGraph)
Link: https://arxiv.org/abs/2108.13233

Authors: Johannes C. Paetzold, Julian McGinnis, Suprosanna Shit, Ivan Ezhov, Paul Büschl, Chinmay Prabhakar, Mihail I. Todorov, Anjany Sekuboyina, Georgios Kaissis, Ali Ertürk, Stephan Günnemann, Bjoern H. Menze
Affiliations: Technical University of Munich; Helmholtz Zentrum München; University of Zürich
Abstract: Biological neural networks define the brain function and intelligence of humans and other mammals, and form ultra-large, spatial, structured graphs. Their neuronal organization is closely interconnected with the spatial organization of the brain's microvasculature, which supplies oxygen to the neurons and builds a complementary spatial graph. This vasculature (or the vessel structure) plays an important role in neuroscience; for example, the organization of (and changes to) vessel structure can represent early signs of various pathologies, e.g. Alzheimer's disease or stroke. Recently, advances in tissue clearing have enabled whole brain imaging and segmentation of the entirety of the mouse brain's vasculature. Building on these advances in imaging, we are presenting an extendable dataset of whole-brain vessel graphs based on specific imaging protocols. Specifically, we extract vascular graphs using a refined graph extraction scheme leveraging the volume rendering engine Voreen and provide them in an accessible and adaptable form through the OGB and PyTorch Geometric dataloaders. Moreover, we benchmark numerous state-of-the-art graph learning algorithms on the biologically relevant tasks of vessel prediction and vessel classification using the introduced vessel graph dataset. Our work paves a path towards advancing graph learning research into the field of neuroscience. Complementarily, the presented dataset raises challenging graph learning research questions for the machine learning community, in terms of incorporating biological priors into learning algorithms, or in scaling these algorithms to handle sparse, spatial graphs with millions of nodes and edges. All datasets and code are available for download at https://github.com/jocpae/VesselGraph.
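
The dataset is exposed through OGB and PyTorch Geometric dataloaders. As a rough, hypothetical sketch (not the VesselGraph loaders or schema), a vessel network could be represented as a PyTorch Geometric graph and scored for link (vessel) prediction roughly as follows; the node positions, edge attributes, and dot-product scorer are illustrative placeholders.

```python
# Illustrative sketch (not the VesselGraph loaders): representing a vessel
# network as a PyTorch Geometric graph for a link-prediction experiment.
# Node/edge feature names are hypothetical placeholders.
import torch
from torch_geometric.data import Data
from torch_geometric.utils import negative_sampling

num_nodes = 1000                      # bifurcation points of the vasculature
pos = torch.rand(num_nodes, 3)        # 3-D spatial coordinates per node
edge_index = torch.randint(0, num_nodes, (2, 4000))   # vessel segments
edge_attr = torch.rand(edge_index.size(1), 2)         # e.g. radius, length

graph = Data(x=pos, edge_index=edge_index, edge_attr=edge_attr)

# Link prediction setup: score positive (existing) vessel segments against
# sampled negative node pairs, as in OGB-style link-prediction evaluation.
neg_edge_index = negative_sampling(edge_index, num_nodes=num_nodes,
                                   num_neg_samples=edge_index.size(1))

def dot_score(emb, pairs):
    return (emb[pairs[0]] * emb[pairs[1]]).sum(dim=-1)

node_emb = torch.nn.Embedding(num_nodes, 64).weight   # stand-in for a GNN encoder
pos_score = dot_score(node_emb, edge_index)
neg_score = dot_score(node_emb, neg_edge_index)
loss = -torch.log(torch.sigmoid(pos_score) + 1e-15).mean() \
       - torch.log(1 - torch.sigmoid(neg_score) + 1e-15).mean()
```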

【2】 Black-Box and Modular Meta-Learning for Power Control via Random Edge Graph Neural Networks
Link: https://arxiv.org/abs/2108.13178

Authors: Ivana Nikoloska, Osvaldo Simeone
Notes: Submitted for publication
Abstract: In this paper, we consider the problem of power control for a wireless network with an arbitrarily time-varying topology, including the possible addition or removal of nodes. A data-driven design methodology that leverages graph neural networks (GNNs) is adopted in order to efficiently parametrize the power control policy mapping the channel state information (CSI) to transmit powers. The specific GNN architecture, known as random edge GNN (REGNN), defines a non-linear graph convolutional filter whose spatial weights are tied to the channel coefficients. While prior work assumed a joint training approach whereby the REGNN-based policy is shared across all topologies, this paper targets adaptation of the power control policy based on limited CSI data regarding the current topology. To this end, we propose both black-box and modular meta-learning techniques. Black-box meta-learning optimizes a general-purpose adaptation procedure via (stochastic) gradient descent, while modular meta-learning finds a set of reusable modules that can form components of a solution for any new network topology. Numerical results validate the benefits of meta-learning for power control problems over joint training schemes, and demonstrate the advantages of modular meta-learning when data availability is extremely limited.
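
A minimal sketch of the kind of graph filter the abstract describes, assuming the standard polynomial REGNN form in which the channel matrix acts as a graph shift operator; the tap values, ReLU nonlinearity, and power clipping are illustrative assumptions, not the authors' exact design.

```python
# Minimal REGNN-style graph filter sketch (assumed polynomial form):
# the channel matrix H plays the role of a random graph shift operator,
# and learned taps w_k mix powers of H applied to the input CSI feature x.
import numpy as np

def regnn_layer(H, x, w):
    """H: (n, n) channel-coefficient matrix, x: (n,) node signal,
    w: (K,) filter taps. Returns a nonlinear graph-filtered signal."""
    out = np.zeros_like(x)
    z = x.copy()
    for w_k in w:                # sum_k w_k * H^k x
        out += w_k * z
        z = H @ z
    return np.maximum(out, 0.0)  # ReLU nonlinearity

n = 8
H = np.abs(np.random.randn(n, n))   # fading channel magnitudes
x = np.ones(n)                      # initial per-link feature
taps = np.array([0.5, 0.2, 0.1])
p = regnn_layer(H, x, taps)         # per-transmitter power scores
p = np.clip(p, 0.0, 1.0)            # map to a feasible power budget
print(p)
```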

【3】 Demystifying Drug Repurposing Domain Comprehension with Knowledge Graph Embedding
Link: https://arxiv.org/abs/2108.13051

Authors: Edoardo Ramalli, Alberto Parravicini, Guido Walter Di Donato, Mirko Salaris, Céline Hudelot, Marco Domenico Santambrogio
Affiliations: Politecnico di Milano, DEIB, Milan, Italy; Université Paris-Saclay CentraleSupélec, MICS Lab, Gif-sur-Yvette, France
Notes: 5 pages, IEEE BioCAS 2021
Abstract: Drug repurposing is more relevant than ever due to drug development's rising costs and the need to respond to emerging diseases quickly. Knowledge graph embedding enables drug repurposing using heterogeneous data sources combined with state-of-the-art machine learning models to predict new drug-disease links in the knowledge graph. As in many machine learning applications, significant work is still required to understand the predictive models' behavior. We propose a structured methodology to better understand machine learning models' results for drug repurposing, suggesting key elements of the knowledge graph to improve predictions while saving computational resources. We reduce the training set by 11.05% and the embedding space by 31.87%, with only a 2% accuracy reduction, and increase accuracy by 60% on the open ogbl-biokg graph adding only 1.53% new triples.

【4】 Single Node Injection Attack against Graph Neural Networks
Link: https://arxiv.org/abs/2108.13049

Authors: Shuchang Tao, Qi Cao, Huawei Shen, Junjie Huang, Yunfan Wu, Xueqi Cheng
Affiliations: Data Intelligence System Research Center, CAS Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Notes: Accepted by CIKM 2021; Code: this https URL
Abstract: Node injection attack on Graph Neural Networks (GNNs) is an emerging and practical attack scenario in which the attacker injects malicious nodes rather than modifying original nodes or edges to affect the performance of GNNs. However, existing node injection attacks ignore extremely limited scenarios, namely that the injected nodes might be excessive such that they may be perceptible to the target GNN. In this paper, we focus on an extremely limited scenario of single node injection evasion attack, i.e., the attacker is only allowed to inject one single node during the test phase to hurt GNN's performance. The discreteness of network structure and the coupling effect between network structure and node features bring great challenges to this extremely limited scenario. We first propose an optimization-based method to explore the performance upper bound of single node injection evasion attack. Experimental results show that 100%, 98.60%, and 94.98% of nodes on three public datasets are successfully attacked even when only injecting one node with one edge, confirming the feasibility of single node injection evasion attack. However, such an optimization-based method needs to be re-optimized for each attack, which is computationally unbearable. To solve the dilemma, we further propose a Generalizable Node Injection Attack model, namely G-NIA, to improve the attack efficiency while ensuring the attack performance. Experiments are conducted across three well-known GNNs. Our proposed G-NIA significantly outperforms state-of-the-art baselines and is 500 times faster than the optimization-based method when inferring.

【5】 Adversarial Stein Training for Graph Energy Models
Link: https://arxiv.org/abs/2108.12982

Authors: Shiv Shankar
Affiliations: University of Massachusetts
Notes: Appeared at the Machine Learning for Molecules Workshop at NeurIPS 2020. this https URL
Abstract: Learning distributions over graph-structured data is a challenging task with many applications in biology and chemistry. In this work we use an energy-based model (EBM) based on multi-channel graph neural networks (GNN) to learn permutation-invariant unnormalized density functions on graphs. Unlike standard EBM training methods, our approach is to learn the model via minimizing the adversarial Stein discrepancy. Samples from the model can be obtained via Langevin-dynamics-based MCMC. We find that this approach achieves competitive results on graph generation compared to benchmark models.
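
A generic sketch of the Langevin-dynamics MCMC sampler the abstract refers to; a small MLP stands in for the paper's multi-channel GNN energy function, and the step size and step count are arbitrary choices.

```python
# Langevin-dynamics sampling from an energy-based model (generic sketch;
# a small MLP stands in for the paper's multi-channel GNN energy function).
import torch

energy_net = torch.nn.Sequential(
    torch.nn.Linear(16, 64), torch.nn.SiLU(), torch.nn.Linear(64, 1))

def langevin_sample(x, n_steps=60, step_size=0.01):
    x = x.clone().requires_grad_(True)
    for _ in range(n_steps):
        energy = energy_net(x).sum()
        grad, = torch.autograd.grad(energy, x)
        with torch.no_grad():
            # x <- x - (eps/2) * dE/dx + sqrt(eps) * noise
            x = x - 0.5 * step_size * grad + (step_size ** 0.5) * torch.randn_like(x)
        x.requires_grad_(True)
    return x.detach()

samples = langevin_sample(torch.randn(32, 16))   # 32 samples in a 16-d encoding
print(samples.shape)
```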

【6】 Using Graph Neural Networks to model the performance of Deep Neural Networks
Link: https://arxiv.org/abs/2108.12489

Authors: Shikhar Singh, Benoit Steiner, James Hegarty, Hugh Leather
Affiliations: Department of Electrical and Computer Engineering, University of Texas
Abstract: With the unprecedented proliferation of machine learning software, there is an ever-increasing need to generate efficient code for such applications. State-of-the-art deep-learning compilers like TVM and Halide incorporate a learning-based performance model to search the space of valid implementations of a given deep learning algorithm. For a given application, the model generates a performance metric such as the run time without executing the application on hardware. Such models speed up the compilation process by obviating the need to benchmark an enormous number of candidate implementations, referred to as schedules, on hardware. Existing performance models employ feed-forward networks, recurrent networks, or decision tree ensembles to estimate the performance of different implementations of a neural network. Graphs present a natural and intuitive way to model deep-learning networks where each node represents a computational stage or operation. Incorporating the inherent graph structure of these workloads in the performance model can enable a better representation and learning of inter-stage interactions. The accuracy of a performance model has direct implications on the efficiency of the search strategy, making it a crucial component of this class of deep-learning compilers. In this work, we develop a novel performance model that adopts a graph representation. In our model, each stage of computation represents a node characterized by features that capture the operations performed by the stage. The interaction between nodes is achieved using graph convolutions. Experimental evaluation shows a 7.75x and 12x reduction in prediction error compared to the Halide and TVM models, respectively.

【7】 Mal2GCN: A Robust Malware Detection Approach Using Deep Graph Convolutional Networks With Non-Negative Weights
Link: https://arxiv.org/abs/2108.12473

Authors: Omid Kargarnovin, Amir Mahdi Sadeghzadeh, Rasool Jalili
Affiliations: Sharif University of Technology
Notes: 12 pages, 12 figures, 5 tables
Abstract: With the growing pace of using machine learning to solve various problems, securing these models against adversaries has become one of the main concerns of researchers. Recent studies have shown that in an adversarial environment, machine learning models are vulnerable to adversarial examples, and adversaries can create carefully crafted inputs to fool the models. With the advent of deep neural networks, many researchers have used deep neural networks for various tasks, and have achieved impressive results. These models must become robust against attacks before being deployed safely, especially in security-related fields such as malware detection. In this paper, we first present a black-box source code-based adversarial malware generation approach that can be used to evaluate the robustness of malware detection models against real-world adversaries. The proposed approach injects adversarial codes into the various locations of malware source codes to evade malware detection models. We then propose Mal2GCN, a robust malware detection model. Mal2GCN uses the representation power of graph convolutional networks combined with the non-negative weights training method to create a malware detection model with high detection accuracy, which is also robust against adversarial attacks that add benign features to the input.
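
The abstract specifies non-negative weight training but not the mechanism; one plausible sketch is to project (clamp) the weights of a graph-convolution-style layer onto the non-negative orthant after each optimizer step, as below. The layer, readout, and projection rule are assumptions for illustration, not the paper's exact construction.

```python
# Sketch of non-negative-weight training for a graph-convolution-style layer:
# after every optimizer step, weights are projected back onto the
# non-negative orthant (one common way to enforce the constraint).
import torch

class NonNegGraphConv(torch.nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = torch.nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, adj_norm, x):
        return torch.relu(adj_norm @ self.lin(x))   # A_hat X W

    def project_nonnegative(self):
        with torch.no_grad():
            self.lin.weight.clamp_(min=0.0)

layer = NonNegGraphConv(32, 16)
opt = torch.optim.Adam(layer.parameters(), lr=1e-3)
adj = torch.eye(10)                       # placeholder normalized adjacency
x, y = torch.randn(10, 32), torch.randint(0, 2, (10,))
logits = layer(adj, x).sum(dim=-1)        # toy readout
loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, y.float())
loss.backward()
opt.step()
layer.project_nonnegative()               # keep weights >= 0
```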

Transformer (4 papers)

【1】 Extracting Qualitative Causal Structure with Transformer-Based NLP
Link: https://arxiv.org/abs/2108.13304

Authors: Scott E. Friedman, Ian H. Magnusson, Sonja M. Schmer-Galunder
Affiliations: SIFT, Minneapolis, MN, USA; Northeastern University, MA, USA
Notes: 7 pages, 7 figures, IJCAI Workshop on Qualitative Reasoning
Abstract: Qualitative causal relationships compactly express the direction, dependency, temporal constraints, and monotonicity constraints of discrete or continuous interactions in the world. In everyday or academic language, we may express interactions between quantities (e.g., sleep decreases stress), between discrete events or entities (e.g., a protein inhibits another protein's transcription), or between intentional or functional factors (e.g., hospital patients pray to relieve their pain). This paper presents a transformer-based NLP architecture that jointly identifies and extracts (1) variables or factors described in language, (2) qualitative causal relationships over these variables, and (3) qualifiers and magnitudes that constrain these causal relationships. We demonstrate this approach and include promising results from two use cases, processing textual inputs from academic publications, news articles, and social media.

【2】 Shatter: An Efficient Transformer Encoder with Single-Headed Self-Attention and Relative Sequence Partitioning
Link: https://arxiv.org/abs/2108.13032

Authors: Ran Tian, Joshua Maynez, Ankur P. Parikh
Affiliations: Google Research Language
Abstract: The highly popular Transformer architecture, based on self-attention, is the foundation of large pretrained models such as BERT, which have become an enduring paradigm in NLP. While powerful, the computational resources and time required to pretrain such models can be prohibitive. In this work, we present an alternative self-attention architecture, Shatter, that more efficiently encodes sequence information by softly partitioning the space of relative positions and applying different value matrices to different parts of the sequence. This mechanism further allows us to simplify the multi-headed attention in Transformer to single-headed. We conduct extensive experiments showing that Shatter achieves better performance than BERT, with pretraining being faster per step (15% on TPU), converging in fewer steps, and offering considerable memory savings (>50%). Put together, Shatter can be pretrained on 8 V100 GPUs in 7 days, and match the performance of BERT_Base -- making the cost of pretraining much more affordable.

【3】 TCCT: Tightly-Coupled Convolutional Transformer on Time Series Forecasting
Link: https://arxiv.org/abs/2108.12784

Authors: Li Shen, Yangzhu Wang
Affiliations: Beihang University
Abstract: Time series forecasting is essential for a wide range of real-world applications. Recent studies have shown the superiority of Transformer in dealing with such problems, especially long sequence time series input (LSTI) and long sequence time series forecasting (LSTF) problems. To improve the efficiency and enhance the locality of Transformer, these studies combine Transformer with CNN in varying degrees. However, their combinations are loosely-coupled and do not make full use of CNN. To address this issue, we propose the concept of tightly-coupled convolutional Transformer (TCCT) and three TCCT architectures which apply transformed CNN architectures into Transformer: (1) CSPAttention: through fusing CSPNet with the self-attention mechanism, the computation cost of the self-attention mechanism is reduced by 30% and the memory usage is reduced by 50% while achieving equivalent or better prediction accuracy. (2) Dilated causal convolution: this method modifies the distilling operation proposed by Informer by replacing canonical convolutional layers with dilated causal convolutional layers to gain exponential receptive field growth. (3) Passthrough mechanism: applying the passthrough mechanism to stacks of self-attention blocks helps Transformer-like models obtain more fine-grained information with negligible extra computation costs. Our experiments on real-world datasets show that our TCCT architectures could greatly improve the performance of existing state-of-the-art Transformer models on time series forecasting with much lower computation and memory costs, including the canonical Transformer, LogTrans and Informer.
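
A sketch of the dilated causal 1-D convolution referenced in point (2); this is the standard construction (left-padding by (kernel_size - 1) * dilation) rather than the paper's exact layer, and the channel count, kernel size, and dilation schedule are placeholders.

```python
# Dilated causal 1-D convolution sketch: the input is left-padded by
# (kernel_size - 1) * dilation so each output position only depends on
# the current and past timesteps, and the receptive field grows
# exponentially as the dilation doubles across layers.
import torch

class DilatedCausalConv1d(torch.nn.Module):
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = torch.nn.Conv1d(channels, channels, kernel_size,
                                    dilation=dilation)

    def forward(self, x):                     # x: (batch, channels, time)
        x = torch.nn.functional.pad(x, (self.pad, 0))   # pad on the left only
        return self.conv(x)

x = torch.randn(8, 64, 96)                    # e.g. 96-step series, 64 channels
stack = torch.nn.Sequential(*[DilatedCausalConv1d(64, dilation=2 ** i)
                              for i in range(4)])       # dilations 1, 2, 4, 8
y = stack(x)
print(y.shape)                                # (8, 64, 96): length preserved
```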

【4】 Multi-Channel Transformer Transducer for Speech Recognition
Link: https://arxiv.org/abs/2108.12953

Authors: Feng-Ju Chang, Martin Radfar, Athanasios Mouchtaris, Maurizio Omologo
Affiliations: Alexa Machine Learning, Amazon, USA
Abstract: Multi-channel inputs offer several advantages over single-channel inputs for improving the robustness of on-device speech recognition systems. Recent work on the multi-channel transformer has proposed a way to incorporate such inputs into end-to-end ASR for improved accuracy. However, this approach is characterized by a high computational complexity, which prevents it from being deployed in on-device systems. In this paper, we present a novel speech recognition model, Multi-Channel Transformer Transducer (MCTT), which features end-to-end multi-channel training, low computation cost, and low latency so that it is suitable for streaming decoding in on-device speech recognition. On a far-field in-house dataset, our MCTT outperforms stagewise multi-channel models with transformer-transducer by up to 6.01% relative WER improvement (WERR). In addition, MCTT outperforms the multi-channel transformer by up to 11.62% WERR, and is 15.8 times faster in terms of inference speed. We further show that we can improve the computational cost of MCTT by constraining the future and previous context in attention computations.

GAN | Adversarial | Attacks | Generation (10 papers)

【1】 ML-based IoT Malware Detection Under Adversarial Settings: A Systematic Evaluation
Link: https://arxiv.org/abs/2108.13373

Authors: Ahmed Abusnaina, Afsah Anwar, Sultan Alshamrani, Abdulrahman Alabduljabbar, RhongHo Jang, Daehun Nyang, David Mohaisen
Affiliations: University of Central Florida; Wayne State University; Ewha Womans University
Notes: 11 pages
Abstract: The rapid growth of Internet of Things (IoT) devices is paralleled by them being on the front line of malicious attacks. This has led to an explosion in the number of IoT malware, with continued mutations, evolution, and sophistication. These malicious software are detected using machine learning (ML) algorithms alongside the traditional signature-based methods. Although ML-based detectors improve the detection performance, they are susceptible to malware evolution and sophistication, making them limited to the patterns that they have been trained upon. This continuous trend motivates the large body of literature on malware analysis and detection research, with many systems emerging constantly, and outperforming their predecessors. In this work, we systematically examine the state-of-the-art malware detection approaches, which utilize various representation and learning techniques, under a range of adversarial settings. Our analyses highlight the instability of the proposed detectors in learning patterns that distinguish the benign from the malicious software. The results exhibit that software mutations with functionality-preserving operations, such as stripping and padding, significantly deteriorate the accuracy of such detectors. Additionally, our analysis of industry-standard malware detectors shows their instability to the malware mutations.

【2】 StackGAN: Facial Image Generation Optimizations
Link: https://arxiv.org/abs/2108.13290

Authors: Badr Belhiti, Justin Milushev, Avinash Gupta, John Breedis, Johnson Dinh, Jesse Pisel, Michael Pyrcz
Affiliations: The University of Colorado at Boulder; Cockrell School of Engineering, The University of Texas at Austin; Jackson School of Geosciences
Abstract: Current state-of-the-art photorealistic generators are computationally expensive, involve unstable training processes, and have real and synthetic distributions that are dissimilar in higher-dimensional spaces. To solve these issues, we propose a variant of the StackGAN architecture. The new architecture incorporates conditional generators to construct an image in many stages. In our model, we generate grayscale facial images in two different stages: noise to edges (stage one) and edges to grayscale (stage two). Our model is trained with the CelebA facial image dataset and achieved a Fréchet Inception Distance (FID) score of 73 for edge images and a score of 59 for grayscale images generated using the synthetic edge images. Although our model achieved subpar results in relation to state-of-the-art models, dropout layers could reduce the overfitting in our conditional mapping. Additionally, since most images can be broken down into important features, improvements to our model can generalize to other datasets. Therefore, our model can potentially serve as a superior alternative to traditional means of generating photorealistic images.

【3】 Adaptive Perturbation Adversarial Training: Based on Reinforcement Learning
Link: https://arxiv.org/abs/2108.13239

Authors: Zhishen Nie, Ying Lin, Sp Ren, Lan Zhang
Affiliations: School of Software, Yunnan University, Kunming, Yunnan Province, China; Key Laboratory in Software Engineering of Yunnan Province
Abstract: Adversarial training has become the primary method to defend against adversarial samples. However, it is hard to apply in practice due to many shortcomings. One of the shortcomings of adversarial training is that it will reduce the recognition accuracy of normal samples. Adaptive perturbation adversarial training is proposed to alleviate this problem. It uses marginal adversarial samples that are close to the decision boundary but do not cross the decision boundary for adversarial training, which improves the accuracy of model recognition while maintaining the robustness of the model. However, searching for marginal adversarial samples brings additional computational costs. This paper proposes a method for finding marginal adversarial samples based on reinforcement learning, and combines it with the latest fast adversarial training technology, which effectively speeds up the training process and reduces training costs.
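
To make the idea of a "marginal" adversarial sample concrete, here is a naive sketch that takes small FGSM-style steps toward the decision boundary and keeps the last iterate that has not crossed it; the paper replaces this kind of search with a reinforcement-learning policy, so treat this only as an illustration of the target object, with an arbitrary toy model and step size.

```python
# Sketch of a marginal adversarial sample: step toward the decision boundary
# with small FGSM-style updates and return the last iterate whose prediction
# still matches the true label (close to, but not across, the boundary).
import torch

def marginal_adversarial(model, x, y, step=0.005, max_steps=40):
    x_adv = x.clone()
    for _ in range(max_steps):
        x_try = x_adv.clone().requires_grad_(True)
        loss = torch.nn.functional.cross_entropy(model(x_try), y)
        loss.backward()
        candidate = (x_try + step * x_try.grad.sign()).detach()
        if model(candidate).argmax(dim=-1).item() != y.item():
            break                    # would cross the boundary: stop before it
        x_adv = candidate
    return x_adv

model = torch.nn.Sequential(torch.nn.Linear(20, 32), torch.nn.ReLU(),
                            torch.nn.Linear(32, 3))
x, y = torch.randn(1, 20), torch.tensor([1])
x_marginal = marginal_adversarial(model, x, y)
```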

【4】 Wasserstein Generative Adversarial Uncertainty Quantification in Physics-Informed Neural Networks
Link: https://arxiv.org/abs/2108.13054

Authors: Yihang Gao, Michael K. Ng
Affiliations: Department of Mathematics, The University of Hong Kong
Abstract: In this paper, we study a physics-informed algorithm for Wasserstein Generative Adversarial Networks (WGANs) for uncertainty quantification in solutions of partial differential equations. By using groupsort activation functions in adversarial network discriminators, network generators are utilized to learn the uncertainty in solutions of partial differential equations observed from the initial/boundary data. Under mild assumptions, we show that the generalization error of the computed generator converges to the approximation error of the network with high probability when a sufficient number of samples are taken. According to our established error bound, we also find that our physics-informed WGANs have a higher requirement for the capacity of discriminators than that of generators. Numerical results on synthetic examples of partial differential equations are reported to validate our theoretical results and demonstrate how uncertainty quantification can be obtained for solutions of partial differential equations and the distributions of initial/boundary data.
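
A sketch of the groupsort activation mentioned in the abstract: activations are sorted within fixed-size groups, which is norm-preserving and commonly used in Lipschitz-constrained discriminators. The group size of 2 (the MaxMin special case) is a common choice, not necessarily the paper's.

```python
# GroupSort activation sketch: split the feature dimension into groups of
# fixed size and sort within each group. With group size 2 this reduces to
# the MaxMin activation often used in Lipschitz-constrained discriminators.
import torch

def groupsort(x, group_size=2):
    """x: (batch, features); features must be divisible by group_size."""
    b, f = x.shape
    x = x.view(b, f // group_size, group_size)
    x, _ = torch.sort(x, dim=-1)
    return x.view(b, f)

h = torch.randn(4, 8)
print(groupsort(h, group_size=2))
```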

【5】 Generating Answer Candidates for Quizzes and Answer-Aware Question Generators
Link: https://arxiv.org/abs/2108.12898

Authors: Kristiyan Vachev, Momchil Hardalov, Georgi Karadzhov, Georgi Georgiev, Ivan Koychev, Preslav Nakov
Affiliations: FMI, Sofia University "St. Kliment Ohridski", Sofia, Bulgaria; Department of Computer Science and Technology, University of Cambridge, UK; Releva AI; FMI and GATE; Qatar Computing Research Institute, HBKU, Doha, Qatar
Abstract: In education, open-ended quiz questions have become an important tool for assessing the knowledge of students. Yet, manually preparing such questions is a tedious task, and thus automatic question generation has been proposed as a possible alternative. So far, the vast majority of research has focused on generating the question text, relying on question answering datasets with readily picked answers, and the problem of how to come up with answer candidates in the first place has been largely ignored. Here, we aim to bridge this gap. In particular, we propose a model that can generate a specified number of answer candidates for a given passage of text, which can then be used by instructors to write questions manually or can be passed as an input to automatic answer-aware question generators. Our experiments show that our proposed answer candidate generation model outperforms several baselines.

【6】 DropAttack: A Masked Weight Adversarial Training Method to Improve Generalization of Neural Networks
Link: https://arxiv.org/abs/2108.12805

Authors: Shiwen Ni, Jiawen Li, Hung-Yu Kao
Affiliations: Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan
Abstract: Adversarial training has been proven to be a powerful regularization method to improve the generalization of models. However, current adversarial training methods only attack the original input sample or the embedding vectors, and their attacks lack coverage and diversity. To further enhance the breadth and depth of attack, we propose a novel masked weight adversarial training method called DropAttack, which enhances the generalization of a model by adding intentionally worst-case adversarial perturbations to both the input and hidden layers in different dimensions and minimizing the adversarial risks generated by each layer. DropAttack is a general technique and can be adopted for a wide variety of neural networks with different architectures. To validate the effectiveness of the proposed method, we used five public datasets in the fields of natural language processing (NLP) and computer vision (CV) for experimental evaluation. We compare the proposed method with other adversarial training methods and regularization methods, and our method achieves state-of-the-art results on all datasets. In addition, DropAttack can achieve the same performance when it uses only half of the training data compared to other standard training methods. Theoretical analysis reveals that DropAttack can perform gradient regularization at random on some of the input and weight parameters of the model. Further visualization experiments show that DropAttack can push the minimum risk of the model to a lower and flatter loss landscape. Our source code is publicly available at https://github.com/nishiwen1214/DropAttack.
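
A simplified, hypothetical sketch of a masked adversarial perturbation of the kind DropAttack applies: the gradient-sign perturbation on the input embeddings is filtered by a random binary mask before the adversarial loss is added to the clean loss. The full method also perturbs hidden layers and weights and minimizes per-layer risks, which is not reproduced here; all names and values below are placeholders.

```python
# Simplified masked adversarial-training step (illustrative, not the paper's
# full DropAttack): perturb the input embeddings with a randomly masked
# gradient-sign step and add the resulting adversarial loss to the clean loss.
import torch

def masked_adv_step(model, embeds, y, loss_fn, eps=0.05, p_keep=0.5):
    embeds = embeds.detach().requires_grad_(True)
    clean_loss = loss_fn(model(embeds), y)
    grad_e, = torch.autograd.grad(clean_loss, embeds, retain_graph=True)
    mask = torch.bernoulli(torch.full_like(embeds, p_keep))   # random coordinate mask
    delta = eps * mask * grad_e.sign()                        # masked worst-case perturbation
    adv_loss = loss_fn(model(embeds.detach() + delta), y)
    return clean_loss + adv_loss          # both terms backpropagate into the model

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(16 * 8, 2))
embeds = torch.randn(4, 16, 8)            # (batch, seq_len, embed_dim)
y = torch.randint(0, 2, (4,))
total = masked_adv_step(model, embeds, y, torch.nn.functional.cross_entropy)
total.backward()                          # then optimizer.step() as usual
```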

【7】 Power-Based Attacks on Spatial DNN Accelerators
Link: https://arxiv.org/abs/2108.12579

Authors: Ge Li, Mohit Tiwari, Michael Orshansky
Affiliations: The University of Texas at Austin, Department of Electrical and Computer Engineering
Notes: 18 pages, 10 figures, accepted by the ACM Journal on Emerging Technologies in Computing Systems
Abstract: With the proliferation of DNN-based applications, the confidentiality of DNN models is an important commercial goal. Spatial accelerators, which parallelize matrix/vector operations, are utilized for enhancing the energy efficiency of DNN computation. Recently, model extraction attacks on simple accelerators, either with a single processing element or running a binarized network, were demonstrated using the methodology derived from differential power analysis (DPA) attacks on cryptographic devices. This paper investigates the vulnerability of realistic spatial accelerators using a general, 8-bit, number representation. We investigate two systolic array architectures with weight-stationary dataflow: (1) a 3 × 1 array for a dot-product operation, and (2) a 3 × 3 array for matrix-vector multiplication. Both are implemented on the SAKURA-G FPGA board. We show that both architectures are ultimately vulnerable. A conventional DPA succeeds fully on the 1D array, requiring 20K power measurements. However, the 2D array exhibits higher security even with 460K traces. We show that this is because the 2D array intrinsically entails multiple MACs simultaneously dependent on the same input. However, we find that a novel template-based DPA with multiple profiling phases is able to fully break the 2D array with only 40K traces. Corresponding countermeasures need to be investigated for spatial DNN accelerators.

【8】 Disrupting Adversarial Transferability in Deep Neural Networks
Link: https://arxiv.org/abs/2108.12492

Authors: Christopher Wiedeman, Ge Wang
Affiliations: Rensselaer Polytechnic Institute
Notes: 18 pages, 12 figures
Abstract: Adversarial attack transferability is a well-recognized phenomenon in deep learning. Prior work has partially explained transferability by recognizing common adversarial subspaces and correlations between decision boundaries, but we have found little explanation in the literature beyond this. In this paper, we propose that transferability between seemingly different models is due to a high linear correlation between the features that different deep neural networks extract. In other words, two models trained on the same task that are seemingly distant in the parameter space likely extract features in the same fashion, just with trivial shifts and rotations between the latent spaces. Furthermore, we show how applying a feature correlation loss, which decorrelates the extracted features in a latent space, can drastically reduce the transferability of adversarial attacks between models, suggesting that the models complete tasks in semantically different ways. Finally, we propose a Dual Neck Autoencoder (DNA), which leverages this feature correlation loss to create two meaningfully different encodings of input information with reduced transferability.
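
One plausible form of a feature (cross-)correlation penalty between two encoders, assuming a normalized cross-correlation matrix whose entries are driven toward zero; the paper's exact loss and the Dual Neck Autoencoder architecture may differ from this sketch.

```python
# Sketch of a feature (cross-)correlation penalty between two encoders:
# standardize each encoder's batch features and penalize the squared entries
# of their cross-correlation matrix, pushing the two latent spaces to encode
# inputs in decorrelated ways. The exact loss used in the paper may differ.
import torch

def feature_correlation_loss(z_a, z_b, eps=1e-6):
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + eps)   # (batch, d)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + eps)
    corr = (z_a.T @ z_b) / z_a.size(0)               # (d, d) cross-correlation
    return (corr ** 2).mean()

enc_a = torch.nn.Sequential(torch.nn.Linear(32, 16), torch.nn.ReLU())
enc_b = torch.nn.Sequential(torch.nn.Linear(32, 16), torch.nn.ReLU())
x = torch.randn(64, 32)
loss = feature_correlation_loss(enc_a(x), enc_b(x))
loss.backward()    # added to each model's task loss during joint training
```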

【9】 ReGen: Reinforcement Learning for Text and Knowledge Base Generation using Pretrained Language Models
Link: https://arxiv.org/abs/2108.12472

Authors: Pierre L. Dognin, Inkit Padhi, Igor Melnyk, Payel Das
Affiliations: IBM Research
Notes: Accepted to appear in the main conference of EMNLP 2021
Abstract: Automatic construction of relevant Knowledge Bases (KBs) from text, and generation of semantically meaningful text from KBs, are both long-standing goals in Machine Learning. In this paper, we present ReGen, a bidirectional generation of text and graph leveraging Reinforcement Learning (RL) to improve performance. Graph linearization enables us to re-frame both tasks as a sequence-to-sequence generation problem regardless of the generative direction, which in turn allows the use of Reinforcement Learning for sequence training where the model itself is employed as its own critic, leading to Self-Critical Sequence Training (SCST). We present an extensive investigation demonstrating that the use of RL via SCST benefits graph and text generation on the WebNLG+ 2020 and TekGen datasets. Our system provides state-of-the-art results on WebNLG+ 2020 by significantly improving upon published results from the WebNLG 2020+ Challenge for both text-to-graph and graph-to-text generation tasks.
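
A sketch of the self-critical sequence training (SCST) objective the abstract invokes: the greedy decode's reward serves as the baseline for the sampled sequence, so the model acts as its own critic. Reward values, shapes, and decoding are placeholders rather than ReGen's graph/text metrics.

```python
# Self-Critical Sequence Training (SCST) loss sketch: the greedily decoded
# sequence acts as the baseline ("the model is its own critic"), so only
# samples that beat the greedy output are reinforced.
import torch

def scst_loss(sample_logprobs, sample_reward, greedy_reward):
    """sample_logprobs: (batch,) summed log-probs of sampled sequences;
    rewards: (batch,) sequence-level scores (e.g. BLEU or a graph F1)."""
    advantage = sample_reward - greedy_reward          # self-critical baseline
    return -(advantage.detach() * sample_logprobs).mean()

logprobs = torch.randn(4, requires_grad=True)          # from sampled decodes
r_sample = torch.tensor([0.42, 0.55, 0.31, 0.60])      # reward of sampled outputs
r_greedy = torch.tensor([0.50, 0.50, 0.28, 0.66])      # reward of greedy outputs
loss = scst_loss(logprobs, r_sample, r_greedy)
loss.backward()
```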

【10】 A Dual Adversarial Calibration Framework for Automatic Fetal Brain Biometry
Link: https://arxiv.org/abs/2108.12719

Authors: Yuan Gao, Lok Hin Lee, Richard Droste, Rachel Craik, Sridevi Beriwal, Aris Papageorghiou, Alison Noble
Affiliations: University of Oxford; Amazon; King's College London
Notes: CVAMD ICCV 2021
Abstract: This paper presents a novel approach to automatic fetal brain biometry motivated by needs in low- and medium-income countries. Specifically, we leverage high-end (HE) ultrasound images to build a biometry solution for low-cost (LC) point-of-care ultrasound images. We propose a novel unsupervised domain adaptation approach to train deep models to be invariant to significant image distribution shift between the image types. Our proposed method, which employs a Dual Adversarial Calibration (DAC) framework, consists of adversarial pathways which enforce model invariance to i) adversarial perturbations in the feature space derived from LC images, and ii) appearance domain discrepancy. Our Dual Adversarial Calibration method estimates transcerebellar diameter and head circumference on images from low-cost ultrasound devices with a mean absolute error (MAE) of 2.43mm and 1.65mm, compared with 7.28mm and 5.65mm respectively for SOTA.

Semi-/Weakly-/Un-/Fully-Supervised | Uncertainty | Active Learning (6 papers)

【1】 Noisy Labels for Weakly Supervised Gamma Hadron Classification
Link: https://arxiv.org/abs/2108.13396

Authors: Lukas Pfahler, Mirko Bunse, Katharina Morik
Affiliations: Artificial Intelligence Group, TU Dortmund University, Dortmund, Germany
Abstract: Gamma hadron classification, a central machine learning task in gamma ray astronomy, is conventionally tackled with supervised learning. However, the supervised approach requires annotated training data to be produced in sophisticated and costly simulations. We propose to instead solve gamma hadron classification with a noisy label approach that only uses unlabeled data recorded by the real telescope. To this end, we employ the significance of detection as a learning criterion which addresses this form of weak supervision. We show that models which are based on the significance of detection deliver state-of-the-art results, despite being exclusively trained with noisy labels; put differently, our models do not require the costly simulated ground-truth labels that astronomers otherwise employ for classifier training. Our weakly supervised models exhibit competitive performances also on imbalanced data sets that stem from a variety of other application domains. In contrast to existing work on class-conditional label noise, we assume that only one of the class-wise noise rates is known.

【2】 Unsupervised Learning of Deep Features for Music Segmentation
Link: https://arxiv.org/abs/2108.12955

Authors: Matthew C. McCallum
Affiliations: Gracenote Inc.
Abstract: Music segmentation refers to the dual problem of identifying boundaries between, and labeling, distinct music segments, e.g., the chorus, verse, bridge etc. in popular music. The performance of a range of music segmentation algorithms has been shown to be dependent on the audio features chosen to represent the audio. Some approaches have proposed learning feature transformations from music segment annotation data, although such data is time consuming or expensive to create, and as such these approaches are likely limited by the size of their datasets. While annotated music segmentation data is a scarce resource, the amount of available music audio is much greater. In the neighboring field of semantic audio, unsupervised deep learning has shown promise in improving the performance of solutions to the query-by-example and sound classification tasks. In this work, unsupervised training of deep feature embeddings using convolutional neural networks (CNNs) is explored for music segmentation. The proposed techniques exploit only the time proximity of audio features that is implicit in any audio timeline. Employing these embeddings in a classic music segmentation algorithm is shown not only to significantly improve the performance of this algorithm, but to obtain state-of-the-art performance in unsupervised music segmentation.
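
A sketch of the proximity-based self-supervision described in the abstract, under the assumption of a triplet-style objective: frames near an anchor in time are treated as positives and distant frames as negatives. The toy MLP encoder and sampling windows are placeholders for the paper's CNN embeddings.

```python
# Sketch of proximity-based self-supervision for audio embeddings: frames
# near an anchor in time serve as positives, far-away frames as negatives,
# and a triplet loss shapes the embedding space.
import torch

encoder = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.ReLU(),
                              torch.nn.Linear(64, 32))
features = torch.randn(2000, 128)          # per-frame audio features of one track
triplet = torch.nn.TripletMarginLoss(margin=1.0)

anchor_idx = torch.randint(10, 1990, (256,))
pos_idx = anchor_idx + torch.randint(-5, 6, (256,))       # temporally nearby
neg_idx = torch.randint(0, 2000, (256,))                  # likely far away

loss = triplet(encoder(features[anchor_idx]),
               encoder(features[pos_idx]),
               encoder(features[neg_idx]))
loss.backward()
```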

【3】 Uncertainty Quantification for Multiclass Data Description
Link: https://arxiv.org/abs/2108.12857

Authors: Leila Kalantari, Jose Principe, Kathryn E. Sieving
Affiliations: Electrical & Computer Engineering, University of Florida; Wildlife Ecology & Conservation, University of Florida
Abstract: In this manuscript, we propose a multiclass data description model based on kernel Mahalanobis distance (MDD-KM) with self-adapting hyperparameter setting. MDD-KM provides uncertainty quantification and can be deployed to build classification systems for the realistic scenario where out-of-distribution (OOD) samples are present among the test data. Given a test signal, a quantity related to the empirical kernel Mahalanobis distance between the signal and each of the training classes is computed. Since these quantities correspond to the same reproducing kernel Hilbert space, they are commensurable and hence can be readily treated as classification scores without further application of fusion techniques. To set kernel parameters, we exploit the fact that the predictive variance according to a Gaussian process (GP) is the empirical kernel Mahalanobis distance when a centralized kernel is used, and propose to use the GP's negative likelihood function as the cost function. We conduct experiments on the real problem of avian note classification. We report a prototypical classification system based on a hierarchical linear dynamical system with MDD-KM as a component. Our classification system does not require sound event detection as a preprocessing step, and is able to find instances of training avian notes with varying length among OOD samples (corresponding to unknown notes of disinterest) in the test audio clip. Domain knowledge is leveraged to make crisp decisions from raw classification scores. We demonstrate the superior performance of MDD-KM over possibilistic K-nearest neighbor.

【4】 Deep Dive into Semi-Supervised ELBO for Improving Classification Performance
Link: https://arxiv.org/abs/2108.12734

Authors: Fahim Faisal Niloy, M. Ashraful Amin, AKM Mahbubur Rahman, Amin Ahsan Ali
Affiliations: Agency Lab, Independent University Bangladesh, agencylab.org
Notes: Under Review
Abstract: Decomposition of the evidence lower bound (ELBO) objective of the VAE used for density estimation revealed the deficiency of VAEs for representation learning and suggested ways to improve the model. In this paper, we investigate whether we can get similar insights by decomposing the ELBO for semi-supervised classification using a VAE model. Specifically, we show that the mutual information between the input and class labels decreases during maximization of the ELBO objective. We propose a method to address this issue. We also enforce the cluster assumption to aid in classification. Experiments on diverse datasets verify that our method can be used to improve the classification performance of existing VAE-based semi-supervised models. Experiments also show that this can be achieved without sacrificing the generative power of the model.

【5】 WALNUT: A Benchmark on Weakly Supervised Learning for Natural Language Understanding
Link: https://arxiv.org/abs/2108.12603

Authors: Guoqing Zheng, Giannis Karamanolakis, Kai Shu, Ahmed Hassan Awadallah
Affiliations: Microsoft Research; Columbia University; Illinois Institute of Technology
Abstract: Building quality machine learning models for natural language understanding (NLU) tasks relies heavily on labeled data. Weak supervision has been shown to provide valuable supervision when large amounts of labeled data are unavailable or expensive to obtain. Existing works studying weak supervision for NLU either mostly focus on a specific task or simulate weak supervision signals from ground-truth labels. To date, a benchmark for NLU with real-world weak supervision signals for a collection of NLU tasks is still not available. In this paper, we propose such a benchmark, named WALNUT, to advocate and facilitate research on weak supervision for NLU. WALNUT consists of NLU tasks of different types, including both document-level and token-level prediction tasks, and for each task contains weak labels generated by multiple real-world weak sources. We conduct baseline evaluations on the benchmark to systematically test the value of weak supervision for NLU tasks, with various weak supervision methods and model architectures. We demonstrate the benefits of weak supervision for low-resource NLU tasks and expect WALNUT to stimulate further research on methodologies to best leverage weak supervision. The benchmark and code for baselines will be publicly available at aka.ms/walnut_benchmark.

【6】 Machine Learning on DNA-Encoded Library Count Data Using an Uncertainty-Aware Probabilistic Loss Function
Link: https://arxiv.org/abs/2108.12471

Authors: Katherine S. Lim, Andrew G. Reidenbach, Bruce K. Hua, Jeremy W. Mason, Christopher J. Gerry, Paul A. Clemons, Connor W. Coley
Affiliations: Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States; Department of Biology, Massachusetts Institute of Technology
Notes: 45+101 pages (manuscript+SI)
Abstract: DNA-encoded library (DEL) screening and quantitative structure-activity relationship (QSAR) modeling are two techniques used in drug discovery to find small molecules that bind a protein target. Applying QSAR modeling to DEL data can facilitate the selection of compounds for off-DNA synthesis and evaluation. Such a combined approach has been shown recently by training binary classifiers to learn DEL enrichments of aggregated "disynthons" to accommodate the sparse and noisy nature of DEL data. However, a binary classifier cannot distinguish between different levels of enrichment, and information is potentially lost during disynthon aggregation. Here, we demonstrate a regression approach to learning DEL enrichments of individual molecules using a custom negative log-likelihood loss function that effectively denoises DEL data and introduces opportunities for visualization of learned structure-activity relationships (SAR). Our approach explicitly models the Poisson statistics of the sequencing process used in the DEL experimental workflow under a frequentist view. We illustrate this approach on a dataset of 108k compounds screened against CAIX, and a dataset of 5.7M compounds screened against sEH and SIRT2. Due to the treatment of uncertainty in the data through the negative log-likelihood loss function, the models can ignore low-confidence outliers. While our approach does not demonstrate a benefit for extrapolation to novel structures, we expect our denoising and visualization pipeline to be useful in identifying SAR trends and enriched pharmacophores in DEL data. Further, this approach to uncertainty-aware regression is applicable to other sparse or noisy datasets where the nature of stochasticity is known or can be modeled; in particular, the Poisson enrichment ratio metric we use can apply to other settings that compare sequencing count data between two experimental conditions.
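
A hedged sketch of an uncertainty-aware Poisson count loss for DEL data: a model predicts a per-molecule enrichment rate that scales a pre-selection count into an expected post-selection count, which is scored with a Poisson negative log-likelihood. This simplifies the paper's two-condition enrichment-ratio formulation and uses made-up feature and count tensors.

```python
# Sketch of an uncertainty-aware Poisson count loss for DEL data (simplified,
# not the paper's exact two-condition model): the predicted enrichment ratio
# scales the pre-selection count into an expected post-selection count.
import torch

model = torch.nn.Sequential(torch.nn.Linear(2048, 256), torch.nn.ReLU(),
                            torch.nn.Linear(256, 1), torch.nn.Softplus())

fingerprints = torch.rand(512, 2048)                   # e.g. Morgan fingerprints
pre_counts = torch.randint(1, 50, (512, 1)).float()    # counts before selection
post_counts = torch.randint(0, 30, (512, 1)).float()   # counts after selection

rate = model(fingerprints) * pre_counts                # expected post-selection count
nll = torch.nn.functional.poisson_nll_loss(rate, post_counts,
                                            log_input=False, full=False)
nll.backward()
```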

Transfer | Zero/Few/One-Shot | Adaptation (2 papers)

【1】 Partial Domain Adaptation without Domain Alignment
Link: https://arxiv.org/abs/2108.12867

Authors: Weikai Li, Songcan Chen
Affiliations: Nanjing University of Aeronautics and Astronautics (NUAA)
Notes: 10 pages
Abstract: Unsupervised domain adaptation (UDA) aims to transfer knowledge from a well-labeled source domain to a different but related unlabeled target domain with an identical label space. Currently, the main workhorse for solving UDA is domain alignment, which has proven successful. However, it is often difficult to find an appropriate source domain with an identical label space. A more practical scenario is so-called partial domain adaptation (PDA), in which the source label set or space subsumes the target one. Unfortunately, in PDA, due to the existence of irrelevant categories in the source domain, it is quite hard to obtain a perfect alignment, thus resulting in mode collapse and negative transfer. Although several efforts have been made by down-weighting the irrelevant source categories, the strategies used tend to be burdensome and risky since exactly which categories are irrelevant is unknown. These challenges motivate us to find a relatively simpler alternative to solve PDA. To achieve this, we first provide a thorough theoretical analysis, which illustrates that the target risk is bounded by both model smoothness and between-domain discrepancy. Considering the difficulty of perfect alignment in solving PDA, we turn to focus on model smoothness while discarding the riskier domain alignment to enhance the adaptability of the model. Specifically, we instantiate model smoothness as a quite simple intra-domain structure preserving (IDSP) criterion. To the best of our knowledge, this is the first naive attempt to address PDA without domain alignment. Finally, our empirical results on multiple benchmark datasets demonstrate that IDSP is not only superior to the PDA SOTAs by a significant margin on some benchmarks (e.g., +10% on Cl->Rw and +8% on Ar->Rw), but is also complementary to domain alignment in standard UDA.

【2】 Variational Inference with NoFAS: Normalizing Flow with Adaptive Surrogate for Computationally Expensive Models
Link: https://arxiv.org/abs/2108.12657

Authors: Yu Wang, Fang Liu, Daniele E. Schiavazzi
Affiliations: Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, IN, USA
Abstract: Fast inference of numerical model parameters from data is an important prerequisite to generate predictive models for a wide range of applications. Use of sampling-based approaches such as Markov chain Monte Carlo may become intractable when each likelihood evaluation is computationally expensive. New approaches combining variational inference with normalizing flow are characterized by a computational cost that grows only linearly with the dimensionality of the latent variable space, and rely on gradient-based optimization instead of sampling, providing a more efficient approach for Bayesian inference about the model parameters. Moreover, the cost of frequently evaluating an expensive likelihood can be mitigated by replacing the true model with an offline-trained surrogate model, such as a neural network. However, this approach might generate significant bias when the surrogate is insufficiently accurate around the posterior modes. To reduce the computational cost without sacrificing inferential accuracy, we propose Normalizing Flow with Adaptive Surrogate (NoFAS), an optimization strategy that alternately updates the normalizing flow parameters and the weights of a neural network surrogate model. We also propose an efficient sample weighting scheme for surrogate model training that ensures some global accuracy of the surrogate while capturing the likely regions of the parameters that yield the observed data. We demonstrate the inferential and computational superiority of NoFAS against various benchmarks, including cases where the underlying model lacks identifiability. The source code and numerical experiments used for this study are available at https://github.com/cedricwangyu/NoFAS.

强化学习(5篇)

【1】 Deep Reinforcement Learning at the Edge of the Statistical Precipice 标题:统计悬崖边缘的深度强化学习 链接:https://arxiv.org/abs/2108.13264

作者:Rishabh Agarwal,Max Schwarzer,Pablo Samuel Castro,Aaron Courville,Marc G. Bellemare 机构:Google Research, Brain Team, MILA, Université de Montréal, CIFAR Fellow, MILA, McGill University 摘要:深度强化学习(RL)算法主要是通过比较它们在大量任务中的相对性能来评估的。在deep RL基准上公布的大多数结果都比较了总体绩效的点估计值,如任务的平均分和中位数,忽略了使用有限次数的训练所隐含的统计不确定性。从Arcade Learning Environment(ALE)开始,向计算要求高的基准的转变导致了每个任务只评估少量运行的实践,加剧了点估计的统计不确定性。在本文中,我们认为,在少数运行深度RL制度下的可靠评估不能忽视结果的不确定性,而不会降低该领域进展的风险。我们使用Atari 100k基准的案例研究来说明这一点,我们发现单独从点估计得出的结论与更彻底的统计分析得出的结论之间存在重大差异。为了通过少量的运行提高现场对报告结果的信心,我们提倡报告总体绩效的区间估计,并提出绩效概况,以说明结果的可变性,同时提出更稳健和有效的总体指标,如四分位平均分,实现结果的小不确定性。使用这些统计工具,我们仔细检查了其他广泛使用的RL基准(包括ALE、Procgen和DeepMind Control Suite)上现有算法的性能评估,再次揭示了先前比较中的差异。我们的发现要求改变我们评估deep RL性能的方式,为此,我们提出了更严格的评估方法,并提供了一个开源的rliable库,以防止不可靠的结果使该领域停滞不前。 摘要:Deep reinforcement learning (RL) algorithms are predominantly evaluated by comparing their relative performance on a large suite of tasks. Most published results on deep RL benchmarks compare point estimates of aggregate performance such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs. Beginning with the Arcade Learning Environment (ALE), the shift towards computationally-demanding benchmarks has led to the practice of evaluating only a small number of runs per task, exacerbating the statistical uncertainty in point estimates. In this paper, we argue that reliable evaluation in the few run deep RL regime cannot ignore the uncertainty in results without running the risk of slowing down progress in the field. We illustrate this point using a case study on the Atari 100k benchmark, where we find substantial discrepancies between conclusions drawn from point estimates alone versus a more thorough statistical analysis. With the aim of increasing the field's confidence in reported results with a handful of runs, we advocate for reporting interval estimates of aggregate performance and propose performance profiles to account for the variability in results, as well as present more robust and efficient aggregate metrics, such as interquartile mean scores, to achieve small uncertainty in results. Using such statistical tools, we scrutinize performance evaluations of existing algorithms on other widely used RL benchmarks including the ALE, Procgen, and the DeepMind Control Suite, again revealing discrepancies in prior comparisons. Our findings call for a change in how we evaluate performance in deep RL, for which we present a more rigorous evaluation methodology, accompanied with an open-source library rliable, to prevent unreliable results from stagnating the field.
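
摘要中提到的"四分位平均分(IQM)+ 区间估计"可以用如下极简代码示意(自行实现的简化版本,仅对训练次数维度做自助重采样;完整的分层自助与其他稳健指标请使用论文开源库 rliable):

```python
# 示意代码:计算"训练次数 x 任务"得分矩阵的四分位平均分(IQM)及其自助法置信区间。
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
scores = rng.random((5, 10))              # 占位数据:5 次独立训练 x 10 个任务的归一化得分

def iqm(score_matrix):
    # 去掉最低与最高各 25% 的得分后取均值
    return stats.trim_mean(score_matrix, proportiontocut=0.25, axis=None)

def bootstrap_ci(score_matrix, n_boot=2000, alpha=0.05):
    n_runs = score_matrix.shape[0]
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n_runs, size=n_runs)       # 对"训练次数"维度重采样
        estimates.append(iqm(score_matrix[idx]))
    lo, hi = np.percentile(estimates, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

print("IQM =", iqm(scores), " 95% 置信区间 =", bootstrap_ci(scores))
```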

【2】 Learning Meta Representations for Agents in Multi-Agent Reinforcement Learning 标题:多智能体强化学习中智能体的学习元表示 链接:https://arxiv.org/abs/2108.12988

作者:Shenao Zhang,Li Shen,Lei Han,Li Shen 机构:Georgia Institute of Technology, Tencent AI Lab, Tencent Robtics X, JD Explore Academy 摘要:在多智能体强化学习中,智能体在单个马尔可夫博弈(MG)中学习的行为通常限于给定的智能体数(即种群大小)。每一个由不同种群规模引起的MG都可能具有不同的最优联合策略和特定于博弈的知识,这些在现代多智能体算法中是独立建模的。在这项工作中,我们专注于创建在不同人群MG中通用的代理。与学习单峰策略不同,每个代理学习一个策略集,该策略集由各种游戏中的有效策略组成。我们提出了代理元表示(MRA),明确地模拟了游戏中常见的和特定于游戏的战略知识。通过使用多模态潜在策略表示策略集,通过迭代优化过程发现公共策略知识和不同的策略模式。我们证明了作为约束互信息最大化目标的近似,在Lipschitz博弈的假设下,在足够大的潜在空间上,所学习的策略可以在每个评估MG中达到纳什均衡。当在实际的有限尺寸的潜在模型上部署时,利用一阶梯度信息可以实现快速自适应。大量实验表明,MRA在硬游戏和看不见游戏的训练性能和泛化能力方面都是有效的。 摘要:In multi-agent reinforcement learning, the behaviors that agents learn in a single Markov Game (MG) are typically confined to the given agent number (i.e., population size). Every single MG induced by varying population sizes may possess distinct optimal joint strategies and game-specific knowledge, which are modeled independently in modern multi-agent algorithms. In this work, we focus on creating agents that generalize across population-varying MGs. Instead of learning a unimodal policy, each agent learns a policy set that is formed by effective strategies across a variety of games. We propose Meta Representations for Agents (MRA) that explicitly models the game-common and game-specific strategic knowledge. By representing the policy sets with multi-modal latent policies, the common strategic knowledge and diverse strategic modes are discovered with an iterative optimization procedure. We prove that as an approximation to a constrained mutual information maximization objective, the learned policies can reach Nash Equilibrium in every evaluation MG under the assumption of Lipschitz game on a sufficiently large latent space. When deploying it at practical latent models with limited size, fast adaptation can be achieved by leveraging the first-order gradient information. Extensive experiments show the effectiveness of MRA on both training performance and generalization ability in hard and unseen games.

【3】 A Policy Efficient Reduction Approach to Convex Constrained Deep Reinforcement Learning 标题:一种策略有效的凸约束深度强化学习约简方法 链接:https://arxiv.org/abs/2108.12916

作者:Tianchi Cai,Wenpeng Zhang,Lihong Gu,Xiaodong Zeng,Jinjie Gu 备注:Reinforcement Learning for Real Life (RL4RealLife) Workshop in the 38th International Conference on Machine Learning, 2021 摘要:尽管基于价值的方法在一般强化学习(RL)中已得到广泛应用,但在约束强化学习(CRL)中却很少有人研究,因为它们无法找到可以在多个动作之间随机进行的策略。为了将基于值的方法应用于CRL,最近一种开创性的博弈论方法使用了混合策略,该策略在一组精心生成的策略中随机化,以收敛到期望的满足约束的策略。然而,这些方法需要存储大量策略,这不是策略有效的,并且在受约束的深度RL中可能会产生令人望而却步的内存开销。为了解决这个问题,我们提出了一种替代方法。我们的方法首先将CRL转化为一个等价的距离优化问题。通过一个专门设计的线性优化oracle,我们导出了一个元算法,该算法使用任何现成的RL算法和任何条件梯度(CG)类型的算法作为子例程来解决该问题。然后,我们提出了一种新的CG型算法,它推广了最小范数点(MNP)方法。该方法与现有博弈论方法的收敛速度相匹配,达到了最坏情况下的最优策略效率。在一个导航任务上的实验表明,该方法在降低存储开销的同时取得了较好的性能,证明了其有效性和效率。 摘要:Although well-established in general reinforcement learning (RL), value-based methods are rarely explored in constrained RL (CRL) for their incapability of finding policies that can randomize among multiple actions. To apply value-based methods to CRL, a recent groundbreaking line of game-theoretic approaches uses the mixed policy that randomizes among a set of carefully generated policies to converge to the desired constraint-satisfying policy. However, these approaches require storing a large set of policies, which is not policy efficient, and may incur prohibitive memory costs in constrained deep RL. To address this problem, we propose an alternative approach. Our approach first reformulates the CRL to an equivalent distance optimization problem. With a specially designed linear optimization oracle, we derive a meta-algorithm that solves it using any off-the-shelf RL algorithm and any conditional gradient (CG) type algorithm as subroutines. We then propose a new variant of the CG-type algorithm, which generalizes the minimum norm point (MNP) method. The proposed method matches the convergence rate of the existing game-theoretic approaches and achieves the worst-case optimal policy efficiency. The experiments on a navigation task show that our method reduces the memory costs by an order of magnitude, and meanwhile achieves better performance, demonstrating both its effectiveness and efficiency.

【4】 Harvesting Idle Resources in Serverless Computing via Reinforcement Learning 标题:基于强化学习的无服务器计算中空闲资源的获取 链接:https://arxiv.org/abs/2108.12717

作者:Hanfei Yu,Hao Wang,Jian Li,Seung-Jong Park 机构:CSE Division, Louisiana State University ,ECE Department, SUNY-Binghamton University 摘要:无服务器计算已经成为一种新的云计算范式,它承诺通过细粒度的自动资源扩展提供高成本效率和简化的云部署。用户将云应用程序解耦为链式功能,并分别在兆字节级和核心级预设每个无服务器功能的内存和CPU需求。然后,无服务器平台会自动调整功能数量以适应工作负载。然而,链式功能的复杂性使得准确确定每个功能对用户的资源需求变得非常重要,从而导致单个功能的资源供应过度或不足。本文介绍了FaaSRM,一种用于无服务器平台的新资源管理器(RM),它通过动态地从供应过剩的函数到供应不足的函数获取空闲资源,从而最大限度地提高资源效率。FaaSRM实时监控每个功能的资源利用率,检测过度配置和不足配置,并应用深度强化学习,使用保护机制安全地获取空闲资源,有效地加速功能。我们已经在一个13节点的ApacheOpenWhisk集群中实现并部署了一个FaaSRM原型。OpenWhisk集群上的实验结果表明,FaaSRM通过从38.8%的调用中获取空闲资源并加速39.2%的调用,将98%的函数调用的执行时间比基线RMs减少了35.81%。 摘要:Serverless computing has become a new cloud computing paradigm that promises to deliver high cost-efficiency and simplified cloud deployment with automated resource scaling at a fine granularity. Users decouple a cloud application into chained functions and preset each serverless function's memory and CPU demands at megabyte-level and core-level, respectively. Serverless platforms then automatically scale the number of functions to accommodate the workloads. However, the complexities of chained functions make it non-trivial to accurately determine the resource demands of each function for users, leading to either resource over-provision or under-provision for individual functions. This paper presents FaaSRM, a new resource manager (RM) for serverless platforms that maximizes resource efficiency by dynamically harvesting idle resources from functions over-supplied to functions under-supplied. FaaSRM monitors each function's resource utilization in real-time, detects over-provisioning and under-provisioning, and applies deep reinforcement learning to harvest idle resources safely using a safeguard mechanism and accelerate functions efficiently. We have implemented and deployed a FaaSRM prototype in a 13-node Apache OpenWhisk cluster. Experimental results on the OpenWhisk cluster show that FaaSRM reduces the execution time of 98% of function invocations by 35.81% compared to the baseline RMs by harvesting idle resources from 38.8% of the invocations and accelerating 39.2% of the invocations.

【5】 Influence-based Reinforcement Learning for Intrinsically-motivated Agents 标题:基于影响的内在激励智能体强化学习 链接:https://arxiv.org/abs/2108.12581

作者:Ammar Fayad,Majd Ibrahim 机构:Massachusetts Institute of Technology, Higher Institute for Applied Sciences and Technology 备注:10 pages, 1 figure, 3 Tables 摘要:强化学习(RL)的研究领域非常活跃,有几个重要的应用。然而,某些挑战仍然需要解决,其中可以提到的是,在解决特定任务的同时,找到能够实现充分探索和协调的政策的能力。在这项工作中,我们提出了一个由两个具有不同目标的RL代理组成的算法框架。我们引入了一种新的函数近似方法来评估某一政策对其他政策的影响。当优化$F$作为$\pi$目标的正则化器时,代理学习协调团队行为,同时利用解决方案空间的高回报区域。此外,两个代理都使用预测错误作为内在动机来学习行为尽可能不同的策略,从而达到探索标准。我们的方法在一系列OpenAI健身房任务以及合作和混合场景中进行了评估,在这些场景中,代理群体能够发现各种物理和信息协调策略,与著名的基线相比,表现出最先进的性能。 摘要:The reinforcement learning (RL) research area is very active, with several important applications. However, certain challenges still need to be addressed, amongst which one can mention the ability to find policies that achieve sufficient exploration and coordination while solving a given task. In this work, we present an algorithmic framework of two RL agents each with a different objective. We introduce a novel function approximation approach to assess the influence $F$ of a certain policy on others. While optimizing $F$ as a regularizer of $\pi$'s objective, agents learn to coordinate team behavior while exploiting high-reward regions of the solution space. Additionally, both agents use prediction error as intrinsic motivation to learn policies that behave as differently as possible, thus achieving the exploration criterion. Our method was evaluated on the suite of OpenAI gym tasks as well as cooperative and mixed scenarios, where agent populations are able to discover various physical and informational coordination strategies, showing state-of-the-art performance when compared to famous baselines.

医学相关(4篇)

【1】 Ovarian Cancer Prediction from Ovarian Cysts Based on TVUS Using Machine Learning Algorithms 标题:基于TVUS的机器学习算法在卵巢囊肿卵巢癌预测中的应用 链接:https://arxiv.org/abs/2108.13387

作者:Laboni Akter,Nasrin Akhter 机构:Department of Biomedical Engineering, Khulna University of Engineering, & Technology, Khulna, Bangladesh 备注:This paper has been published in International Conference on Big Data, IoT and Machine Learning 2021 (BIM 2021) 摘要:卵巢癌(OC)是一种女性生殖系统恶性肿瘤,可在年轻女孩中发现,且大多数为生育期或生殖期妇女。只有少数囊肿是危险的,可能会导致癌症。因此,通过经阴道超声(TVUS)筛查进行早期预测和检测是非常重要的。在这项研究中,我们使用了一个名为PLCO的实际数据集,在三个目标变量内使用了TVUS筛选和三种机器学习(ML)技术,分别是随机森林、KNN和XGBoost。在准确率、召回率、F1分数和精确率方面,我们分别获得了99.50%、99.50%、99.49%和99.50%的近似值。在这些随机森林算法、KNN算法和XGB算法中观察到的AUC得分分别为99.87%、98.97%和99.88%。这种方法有助于医生和疑似患者及早识别卵巢癌风险,减少卵巢恶性肿瘤相关并发症和死亡。 摘要:Ovarian Cancer (OC) is type of female reproductive malignancy which can be found among young girls and mostly the women in their fertile or reproductive. There are few number of cysts are dangerous and may it cause cancer. So, it is very important to predict and it can be from different types of screening are used for this detection using Transvaginal Ultrasonography (TVUS) screening. In this research, we employed an actual datasets called PLCO with TVUS screening and three machine learning (ML) techniques, respectively Random Forest, KNN, and XGBoost within three target variables. We obtained a best performance from this algorithms as far as accuracy, recall, f1 score and precision with the approximations of 99.50%, 99.50%, 99.49% and 99.50% individually. The AUC score of 99.87%, 98.97% and 99.88% are observed in these Random Forest, KNN and XGB algorithms. This approach helps assist physicians and suspects in identifying ovarian risks early on, reducing ovarian malignancy-related complications and deaths.
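
下面给出一个示意性的评估流程草图(假设使用 scikit-learn,并假设已安装 xgboost 库;数据为随机生成的占位数据,并非论文使用的 PLCO/TVUS 特征),演示如何同时报告准确率、召回率、F1、精确率与 AUC:

```python
# 示意代码:用随机森林、KNN 与 XGBoost 做二分类并报告多种指标,数据为占位数据。
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import (accuracy_score, recall_score, f1_score,
                             precision_score, roc_auc_score)
from xgboost import XGBClassifier   # 假设已安装 xgboost

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "RandomForest": RandomForestClassifier(n_estimators=200, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "XGBoost": XGBClassifier(eval_metric="logloss", random_state=0),
}
for name, clf in models.items():
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    proba = clf.predict_proba(X_te)[:, 1]
    print(name,
          "acc=%.3f" % accuracy_score(y_te, pred),
          "recall=%.3f" % recall_score(y_te, pred),
          "f1=%.3f" % f1_score(y_te, pred),
          "precision=%.3f" % precision_score(y_te, pred),
          "auc=%.3f" % roc_auc_score(y_te, proba))
```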

【2】 Privacy-preserving Machine Learning for Medical Image Classification 标题:基于隐私保护的机器学习在医学图像分类中的应用 链接:https://arxiv.org/abs/2108.12816

作者:Shreyansh Singh,K. K. Shukla 机构:Indian Institute of Technology (BHU), Varanasi, India 摘要:随着机器学习(ML)和深度学习(DL)在各行各业的应用日益广泛,医疗行业也紧随其后。在这个行业中,ML的一个非常简单但非常重要的用例是用于图像分类。这对于医生帮助他们及时发现某些疾病非常重要,从而有助于减少人类判断错误的机会。然而,当使用像这样的自动化系统时,也存在隐私问题。攻击者不应能够访问患者的医疗记录和图像。还要求模型是安全的,发送给模型的数据和接收到的预测都不应以明文形式透露给模型。在这项研究中,我们旨在通过检查胸部x射线图像来检测肺炎的医学图像分类问题来解决这些问题。 摘要:With the rising use of Machine Learning (ML) and Deep Learning (DL) in various industries, the medical industry is also not far behind. A very simple yet extremely important use case of ML in this industry is for image classification. This is important for doctors to help them detect certain diseases timely, thereby acting as an aid to reduce chances of human judgement error. However, when using automated systems like these, there is a privacy concern as well. Attackers should not be able to get access to the medical records and images of the patients. It is also required that the model be secure, and that the data that is sent to the model and the predictions that are received both should not be revealed to the model in clear text. In this study, we aim to solve these problems in the context of a medical image classification problem of detection of pneumonia by examining chest x-ray images.

【3】 Combining chest X-rays and EHR data using machine learning to diagnose acute respiratory failure 标题:利用机器学习结合胸部X线和EHR数据诊断急性呼吸衰竭 链接:https://arxiv.org/abs/2108.12530

作者:Sarah Jabbour,David Fouhey,Ella Kazerooni,Jenna Wiens,Michael W Sjoding 机构:Affiliations:, . Computer Science and Engineering, University of Michigan, Ann Arbor, Michigan, . Department of Radiology, University of Michigan Medical School, Ann Arbor, Michigan 备注:43 pages, 10 tables, 4 figures 摘要:当患者出现急性呼吸衰竭时,准确识别潜在病因对于确定最佳治疗至关重要,但在临床实践中区分常见诊断可能具有挑战性。机器学习模型可以通过增强临床决策来改善医学诊断,并在急性呼吸衰竭患者的诊断评估中发挥作用。虽然已经开发了机器学习模型来识别胸片(如肺炎)的常见发现,但通过分析电子健康记录(EHR)中的临床相关数据来增强这些方法可能有助于急性呼吸衰竭的诊断。对机器学习模型进行训练,以使用胸片和EHR数据预测急性呼吸衰竭(肺炎、心力衰竭和/或COPD)的原因,这些数据来自内部队列中的患者,使用基于医生图表审查的诊断。模型还使用出院诊断代码对外部队列中的患者进行了测试。结合胸片和EHR数据的模型在肺炎和COPD方面优于单独基于每种模式的模型。对于肺炎,联合模型AUROC为0.79(0.78-0.79),图像模型AUROC为0.73(0.72-0.75),EHR模型AUROC为0.73(0.70-0.76);对于慢性阻塞性肺病,综合指数为0.89(0.83-0.91),影像指数为0.85(0.77-0.89),EHR为0.80(0.76-0.84);对于心力衰竭,合并:0.80(0.77-0.84),图像:0.77(0.71-0.81),EHR:0.80(0.75-0.82)。在外部队列中,心力衰竭和COPD的表现是一致的,但肺炎的表现略有下降。总的来说,结合胸片和EHR数据的机器学习模型可以准确区分急性呼吸衰竭的常见原因。需要进一步的工作来确定这些模型是否可以帮助临床医生在临床环境中诊断急性呼吸衰竭。 摘要:When patients develop acute respiratory failure, accurately identifying the underlying etiology is essential for determining the best treatment, but it can be challenging to differentiate between common diagnoses in clinical practice. Machine learning models could improve medical diagnosis by augmenting clinical decision making and play a role in the diagnostic evaluation of patients with acute respiratory failure. While machine learning models have been developed to identify common findings on chest radiographs (e.g. pneumonia), augmenting these approaches by also analyzing clinically relevant data from the electronic health record (EHR) could aid in the diagnosis of acute respiratory failure. Machine learning models were trained to predict the cause of acute respiratory failure (pneumonia, heart failure, and/or COPD) using chest radiographs and EHR data from patients within an internal cohort using diagnoses based on physician chart review. Models were also tested on patients in an external cohort using discharge diagnosis codes. A model combining chest radiographs and EHR data outperformed models based on each modality alone for pneumonia and COPD. For pneumonia, the combined model AUROC was 0.79 (0.78-0.79), image model AUROC was 0.73 (0.72-0.75), and EHR model AUROC was 0.73 (0.70-0.76); for COPD, combined: 0.89 (0.83-0.91), image: 0.85 (0.77-0.89), and EHR: 0.80 (0.76-0.84); for heart failure, combined: 0.80 (0.77-0.84), image: 0.77 (0.71-0.81), and EHR: 0.80 (0.75-0.82). In the external cohort, performance was consistent for heart failure and COPD, but declined slightly for pneumonia. Overall, machine learning models combing chest radiographs and EHR data can accurately differentiate between common causes of acute respiratory failure. Further work is needed to determine whether these models could aid clinicians in the diagnosis of acute respiratory failure in clinical settings.

【4】 On the impact of using X-ray energy response imagery for object detection via Convolutional Neural Networks 标题:利用X射线能量响应图像进行卷积神经网络目标检测的影响 链接:https://arxiv.org/abs/2108.12505

作者:Neelanjan Bhowmik,Yona Falinie A. Gaus,Toby P. Breckon 机构: 2Department of {Computer Science 1 | Engineering 2}, Durham University 摘要:在复杂杂乱的X射线安全图像中自动检测违禁物品对于维护运输安全至关重要,此前有关违禁物品自动检测的工作主要集中在伪彩色(rgb)X射线图像上。在这项工作中,我们通过使用深度卷积神经网络(CNN)进行X射线行李安检中提出的联合目标检测和分割任务,研究了不同X射线图像的影响,即与rgb相比,X射线能量响应(高、低)和有效-z。我们评估了最先进的CNN架构(Mask R-CNN、YOLACT、CARAFE和Cascade Mask R-CNN),以探索在不同的X射线安全扫描仪(显示不同的成像几何结构、图像分辨率和材料颜色剖面)之间使用这种"原始"变体图像训练的模型的可转移性。总的来说,我们观察到使用CARAFE的最大检测性能,归因于使用rgb、高、低和有效-z X射线图像的组合进行训练,获得了六类目标检测问题的0.7平均精度(mAP)。我们的研究结果还显示,通过结合rgb、高、低和有效-z图像,在单类目标检测问题的交叉扫描仪可转移性(AP:0.835/0.611)方面,具有显著的推广能力。 摘要:Automatic detection of prohibited items within complex and cluttered X-ray security imagery is essential to maintaining transport security, where prior work on automatic prohibited item detection focus primarily on pseudo-colour (rgb) X-ray imagery. In this work we study the impact of variant X-ray imagery, i.e., X-ray energy response (high, low) and effective-z compared to rgb, via the use of deep Convolutional Neural Networks (CNN) for the joint object detection and segmentation task posed within X-ray baggage security screening. We evaluate state-of-the-art CNN architectures (Mask R-CNN, YOLACT, CARAFE and Cascade Mask R-CNN) to explore the transferability of models trained with such 'raw' variant imagery between the varying X-ray security scanners that exhibits differing imaging geometries, image resolutions and material colour profiles. Overall, we observe maximal detection performance using CARAFE, attributable to training using combination of rgb, high, low, and effective-z X-ray imagery, obtaining 0.7 mean Average Precision (mAP) for a six class object detection problem. Our results also exhibit a remarkable degree of generalisation capability in terms of cross-scanner transferability (AP: 0.835/0.611) for a one class object detection problem by combining rgb, high, low, and effective-z imagery.

蒸馏|知识提取(3篇)

【1】 FedKD: Communication Efficient Federated Learning via Knowledge Distillation 标题:FedKD:基于知识提炼的通信高效联合学习 链接:https://arxiv.org/abs/2108.13323

作者:Chuhan Wu,Fangzhao Wu,Ruixuan Liu,Lingjuan Lyu,Yongfeng Huang,Xing Xie 机构:Department of Electronic Engineering & BNRist, Tsinghua University, Microsoft Research Asia,Renmin University of China,Sony AI 摘要:联邦学习广泛用于从分散的数据中学习智能模型。在联合学习中,客户机需要在模型学习的每个迭代中传递其本地模型更新。然而,如果模型包含许多参数,则模型更新的规模很大,并且通常需要多次通信,直到模型收敛。因此,联合学习中的通信成本可能相当高。本文提出了一种基于知识提取的通信高效的联邦学习方法。我们没有直接在客户端和服务器之间通信大型模型,而是提出了一个自适应的相互蒸馏框架,在每个客户端上交互学习一个学生和一个教师模型,其中只有学生模型由不同的客户端共享,并协同更新,以降低通信成本。每个客户的教师和学生都可以从其本地数据和从彼此提取的知识中学习,其提取强度由其预测质量控制。为了进一步降低通信开销,我们提出了一种基于奇异值分解的动态梯度逼近方法,以动态精度逼近交换梯度。在不同任务的基准数据集上进行的大量实验表明,我们的方法可以有效地降低通信成本并获得有竞争力的结果。 摘要:Federated learning is widely used to learn intelligent models from decentralized data. In federated learning, clients need to communicate their local model updates in each iteration of model learning. However, model updates are large in size if the model contains numerous parameters, and there usually needs many rounds of communication until model converges. Thus, the communication cost in federated learning can be quite heavy. In this paper, we propose a communication efficient federated learning method based on knowledge distillation. Instead of directly communicating the large models between clients and server, we propose an adaptive mutual distillation framework to reciprocally learn a student and a teacher model on each client, where only the student model is shared by different clients and updated collaboratively to reduce the communication cost. Both the teacher and student on each client are learned on its local data and the knowledge distilled from each other, where their distillation intensities are controlled by their prediction quality. To further reduce the communication cost, we propose a dynamic gradient approximation method based on singular value decomposition to approximate the exchanged gradients with dynamic precision. Extensive experiments on benchmark datasets in different tasks show that our approach can effectively reduce the communication cost and achieve competitive results.
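
摘要中"基于奇异值分解的动态梯度近似"可以用如下示意代码理解(假设使用 NumPy;按能量阈值动态选秩只是演示性假设,并非论文的原始精度控制策略):

```python
# 示意代码:对一层权重的梯度矩阵做截断 SVD,只通信低秩分量以降低开销。
import numpy as np

def compress_gradient(grad, energy=0.95):
    """截断 SVD:保留累计能量达到阈值的前 k 个奇异分量(k 的选法为演示性假设)。"""
    U, S, Vt = np.linalg.svd(grad, full_matrices=False)
    k = int(np.searchsorted(np.cumsum(S ** 2) / np.sum(S ** 2), energy)) + 1
    return U[:, :k], S[:k], Vt[:k]          # 只需通信这三个小矩阵

def decompress_gradient(U, S, Vt):
    return (U * S) @ Vt

rng = np.random.default_rng(0)
g = rng.standard_normal((256, 128))          # 占位:某一层权重的梯度
U, S, Vt = compress_gradient(g, energy=0.9)
g_hat = decompress_gradient(U, S, Vt)
sent = U.size + S.size + Vt.size
print("压缩比: %.2fx, 相对重构误差: %.3f"
      % (g.size / sent, np.linalg.norm(g - g_hat) / np.linalg.norm(g)))
```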

【2】 Lipschitz Continuity Guided Knowledge Distillation 标题:Lipschitz连续性指导的知识蒸馏 链接:https://arxiv.org/abs/2108.12905

作者:Yuzhang Shang,Bin Duan,Ziliang Zong,Liqiang Nie,Yan Yan 机构:Department of Computer Science, Illinois Institute of Technology, USA, Department of Computer Science, Texas State University, USA, School of Computer Science and Technology, Shandong University, China 备注:This work has been accepted by ICCV 2021 摘要:通过将知识从较大的教师网络提取到较小的学生网络,知识提取已成为最重要的模型压缩技术之一。尽管先前的蒸馏方法通过精心设计各种类型的知识取得了巨大的成功,但它们忽略了神经网络的功能特性,这使得将这些技术应用于新任务的过程不可靠且非常繁琐。为了缓解这一问题,本文首先利用Lipschitz连续性来更好地表示神经网络的功能特性,并指导知识提取过程。特别地,我们提出了一种新的Lipschitz连续性引导的知识提取框架,通过最小化两个神经网络Lipschitz常数之间的距离来忠实地提取知识,这使得教师网络能够更好地正则化学生网络并提高相应的性能。我们推导了一个可解释的近似算法,并给出了一个明确的理论推导,以解决计算Lipschitz常数的NP难问题。实验结果表明,在CIFAR-100、ImageNet和PASCAL VOC数据集上,我们的方法优于其他基准测试,优于一些知识提取任务(例如分类、分割和对象检测)。 摘要:Knowledge distillation has become one of the most important model compression techniques by distilling knowledge from larger teacher networks to smaller student ones. Although great success has been achieved by prior distillation methods via delicately designing various types of knowledge, they overlook the functional properties of neural networks, which makes the process of applying those techniques to new tasks unreliable and non-trivial. To alleviate such problem, in this paper, we initially leverage Lipschitz continuity to better represent the functional characteristic of neural networks and guide the knowledge distillation process. In particular, we propose a novel Lipschitz Continuity Guided Knowledge Distillation framework to faithfully distill knowledge by minimizing the distance between two neural networks' Lipschitz constants, which enables teacher networks to better regularize student networks and improve the corresponding performance. We derive an explainable approximation algorithm with an explicit theoretical derivation to address the NP-hard problem of calculating the Lipschitz constant. Experimental results have shown that our method outperforms other benchmarks over several knowledge distillation tasks (e.g., classification, segmentation and object detection) on CIFAR-100, ImageNet, and PASCAL VOC datasets.
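
下面是一个粗略的示意代码(假设使用 PyTorch),用各层权重谱范数之积作为网络 Lipschitz 常数的上界,并把师生两网络该估计值之差作为蒸馏损失的正则项;论文中 Lipschitz 常数的可解释近似算法并非如此简单,此处仅说明思路:

```python
# 示意代码:Lipschitz 上界估计 + 常规 KD 损失 + Lipschitz 匹配正则项(简化版)。
import torch
import torch.nn as nn

def lipschitz_upper_bound(model):
    bound = torch.tensor(1.0)
    for m in model.modules():
        if isinstance(m, nn.Linear):
            bound = bound * torch.linalg.matrix_norm(m.weight, ord=2)   # 谱范数
    return bound

teacher = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 10))
student = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 10))
x, y = torch.randn(64, 32), torch.randint(0, 10, (64,))

ce = nn.CrossEntropyLoss()
kd = nn.KLDivLoss(reduction="batchmean")
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

with torch.no_grad():
    t_logits = teacher(x)
    l_teacher = lipschitz_upper_bound(teacher)

s_logits = student(x)
loss = (ce(s_logits, y)
        + kd(torch.log_softmax(s_logits / 4, -1), torch.softmax(t_logits / 4, -1))
        + 0.1 * (lipschitz_upper_bound(student) - l_teacher).abs())   # Lipschitz 匹配项
opt.zero_grad(); loss.backward(); opt.step()
print("loss =", float(loss))
```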

【3】 Feature Extraction for Machine Learning-based Intrusion Detection in IoT Networks 标题:物联网网络中基于机器学习的入侵检测特征提取 链接:https://arxiv.org/abs/2108.12722

作者:Mohanad Sarhan,Siamak Layeghy,Nour Moustafa,Marcus Gallagher,Marius Portmann 机构:a University of Queensland, Brisbane QLD , Australia, b University of New South Wales, Canberra ACT , Australia 摘要:物联网中发生的大量网络安全漏洞证明了当前网络入侵检测系统(NIDS)的不可靠性。因此,网络中断和敏感数据丢失的发生导致了改进NIDS技术的积极研究领域。在分析相关工作的过程中,观察到大多数研究人员旨在通过在NIDS数据集上使用一组未经尝试的特征约简(FR)和机器学习(ML)技术组合来获得更好的分类结果。然而,这些数据集在功能集、攻击类型和网络设计方面是不同的。因此,本文旨在发现这些技术是否可以推广到各种数据集。使用了六种ML模型:深度前馈、卷积神经网络、递归神经网络、决策树、逻辑回归和朴素贝叶斯。三种特征提取算法的检测精度;使用三个基准数据集对主成分分析(PCA)、自动编码器(AE)和线性判别分析(LDA)进行评估;UNSW-NB15,吨物联网和CSE-CIC-IDS2018。虽然PCA和AE算法已被广泛使用,但确定它们的最佳提取维数却被忽略。结果表明,对于所有数据集,没有明确的FE方法或ML模型可以获得最佳分数。已经为每个数据集确定了提取维度的最佳数量,并且LDA降低了两个数据集上ML模型的性能。方差用于分析LDA和PCA的提取维度。最后,本文得出结论,数据集的选择显著改变了应用技术的性能,我们主张需要一个通用(基准)功能集,以促进这一研究领域的进一步发展和进步。 摘要:The tremendous numbers of network security breaches that have occurred in IoT networks have demonstrated the unreliability of current Network Intrusion Detection Systems (NIDSs). Consequently, network interruptions and loss of sensitive data have occurred which led to an active research area for improving NIDS technologies. During an analysis of related works, it was observed that most researchers aimed to obtain better classification results by using a set of untried combinations of Feature Reduction (FR) and Machine Learning (ML) techniques on NIDS datasets. However, these datasets are different in feature sets, attack types, and network design. Therefore, this paper aims to discover whether these techniques can be generalised across various datasets. Six ML models are utilised: a Deep Feed Forward, Convolutional Neural Network, Recurrent Neural Network, Decision Tree, Logistic Regression, and Naive Bayes. The detection accuracy of three Feature Extraction (FE) algorithms; Principal Component Analysis (PCA), Auto-encoder (AE), and Linear Discriminant Analysis (LDA) is evaluated using three benchmark datasets; UNSW-NB15, ToN-IoT and CSE-CIC-IDS2018. Although PCA and AE algorithms have been widely used, determining their optimal number of extracted dimensions has been overlooked. The results obtained indicate that there is no clear FE method or ML model that can achieve the best scores for all datasets. The optimal number of extracted dimensions has been identified for each dataset and LDA decreases the performance of the ML models on two datasets. The variance is used to analyse the extracted dimensions of LDA and PCA. Finally, this paper concludes that the choice of datasets significantly alters the performance of the applied techniques and we argue for the need for a universal (benchmark) feature set to facilitate further advancement and progress in this field of research.
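
下面用一个示意实验(假设使用 scikit-learn,数据为随机占位数据而非 UNSW-NB15 等基准)说明如何比较 PCA 与 LDA 在不同提取维数下对下游分类准确率的影响,以及 LDA 维数受"类别数减一"限制这一点:

```python
# 示意代码:比较 PCA 与 LDA 在不同降维维数下的下游分类准确率(占位数据)。
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=5000, n_features=40, n_informative=12,
                           n_classes=5, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for n_dim in (2, 4, 8, 16):
    pca_clf = make_pipeline(StandardScaler(), PCA(n_components=n_dim),
                            RandomForestClassifier(random_state=0)).fit(X_tr, y_tr)
    acc_pca = pca_clf.score(X_te, y_te)
    if n_dim <= 4:   # LDA 的维数上限为 类别数-1(此处为 4),超出则跳过
        lda_clf = make_pipeline(StandardScaler(),
                                LinearDiscriminantAnalysis(n_components=n_dim),
                                RandomForestClassifier(random_state=0)).fit(X_tr, y_tr)
        acc_lda = lda_clf.score(X_te, y_te)
    else:
        acc_lda = float("nan")
    print(f"dim={n_dim:2d}  PCA acc={acc_pca:.3f}  LDA acc={acc_lda:.3f}")
```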

推荐(1篇)

【1】 Incremental Learning for Personalized Recommender Systems 标题:基于增量学习的个性化推荐系统 链接:https://arxiv.org/abs/2108.13299

作者:Yunbo Ouyang,Jun Shi,Haichao Wei,Huiji Gao 机构:LinkedIn Corporation, Mountain View, CA 摘要:无处不在的个性化推荐系统是为了实现两个看似相互冲突的目标而构建的,一个是为个人用户的口味定制高质量内容,另一个是快速适应不断变化的环境。前者需要一个基于大量数据的复杂机器学习模型;后者需要频繁地更新模型。我们提出了一种增量学习解决方案,以提供训练效率和模型质量。我们的解决方案基于顺序贝叶斯更新和二次近似。我们的重点是大规模个性化逻辑回归模型,并扩展到深度学习模型。本文通过解决将增量学习应用于大型个性化推荐系统时出现的一些实现挑战,填补了理论与实践之间的空白。详细的离线和在线实验表明,我们的方法可以显著缩短训练时间,同时保持模型的准确性。该解决方案部署在LinkedIn中,直接适用于工业规模的推荐系统。 摘要:Ubiquitous personalized recommender systems are built to achieve two seemingly conflicting goals, to serve high quality content tailored to individual user's taste and to adapt quickly to the ever changing environment. The former requires a complex machine learning model that is trained on a large amount of data; the latter requires frequent update to the model. We present an incremental learning solution to provide both the training efficiency and the model quality. Our solution is based on sequential Bayesian update and quadratic approximation. Our focus is on large-scale personalized logistic regression models, with extensions to deep learning models. This paper fills in the gap between the theory and the practice by addressing a few implementation challenges that arise when applying incremental learning to large personalized recommender systems. Detailed offline and online experiments demonstrated our approach can significantly shorten the training time while maintaining the model accuracy. The solution is deployed in LinkedIn and directly applicable to industrial scale recommender systems.
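
摘要中"序贯贝叶斯更新 + 二次近似"的增量逻辑回归可以用如下极简草图理解(假设使用 NumPy/SciPy;把上一轮的高斯后验作为下一轮先验,精度矩阵累加新批次的 Hessian,仅为思路示意,与 LinkedIn 的线上实现无关):

```python
# 示意代码:基于 Laplace(二次)近似的增量贝叶斯逻辑回归。
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(0)
d = 5
w_true = rng.standard_normal(d)

mu = np.zeros(d)              # 当前后验均值
P = np.eye(d)                 # 当前后验精度矩阵(协方差之逆)

def make_batch(n=500):
    X = rng.standard_normal((n, d))
    y = rng.binomial(1, expit(X @ w_true))
    return X, y

def neg_log_post(w, X, y, mu, P):
    z = X @ w
    nll = np.sum(np.logaddexp(0, z) - y * z)           # 逻辑回归负对数似然
    return nll + 0.5 * (w - mu) @ P @ (w - mu)         # 高斯先验(即上一轮后验)

for t in range(5):                                      # 依次到达的 5 个数据批次
    X, y = make_batch()
    res = minimize(neg_log_post, mu, args=(X, y, mu, P), method="L-BFGS-B")
    mu = res.x                                          # 新的后验均值(MAP)
    p = expit(X @ mu)
    P = P + X.T @ (X * (p * (1 - p))[:, None])          # 精度累加新批次的 Hessian
    print(f"batch {t}: |w_hat - w_true| = {np.linalg.norm(mu - w_true):.3f}")
```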

聚类(1篇)

【1】 DKM: Differentiable K-Means Clustering Layer for Neural Network Compression 标题:DKM:神经网络压缩的可微K-均值聚类层 链接:https://arxiv.org/abs/2108.12659

作者:Minsik Cho,Keivan A. Vahid,Saurabh Adya,Mohammad Rastegari 机构:Apple 摘要:深度神经网络(DNN)模型压缩用于有效的设备推断,对于减少内存需求和将用户数据保存在设备上变得越来越重要。为此,我们提出了一种新的可微k-均值聚类层(DKM),并将其应用于基于训练时间权重聚类的DNN模型压缩。DKM将k-均值聚类作为一个关注问题,并支持参数和聚类质心的联合优化。与以前依赖额外正则化器和参数的工作不同,基于DKM的压缩保持了原始损失函数和模型结构的固定。我们评估了用于计算机视觉和自然语言处理(NLP)任务的各种DNN模型上基于DKM的压缩。我们的结果表明,DMK在ImageNet1k和GLUE基准测试上提供了优越的压缩和精度权衡。例如,基于DKM的压缩可以在3.3MB模型大小(29.4x模型压缩系数)的ResNet50 DNN模型上提供74.5%的top-1 ImageNet1k精度。对于MobileNet-v1,这是一个具有挑战性的DNN压缩,DKM提供62.8%的top-1 ImageNet1k精度,模型大小为0.74 MB(模型压缩系数为22.4倍)。这一结果比当前最先进的DNN压缩算法的top-1精度高6.8%,模型尺寸相对较小33%。此外,DKM可以将DistilBERT模型压缩11.8倍,而GLUE NLP基准测试的精度损失最小(1.1%)。 摘要:Deep neural network (DNN) model compression for efficient on-device inference is becoming increasingly important to reduce memory requirements and keep user data on-device. To this end, we propose a novel differentiable k-means clustering layer (DKM) and its application to train-time weight clustering-based DNN model compression. DKM casts k-means clustering as an attention problem and enables joint optimization of the parameters and clustering centroids. Unlike prior works that rely on additional regularizers and parameters, DKM-based compression keeps the original loss function and model architecture fixed. We evaluated DKM-based compression on various DNN models for computer vision and natural language processing (NLP) tasks. Our results demonstrate that DMK delivers superior compression and accuracy trade-off on ImageNet1k and GLUE benchmarks. For example, DKM-based compression can offer 74.5% top-1 ImageNet1k accuracy on ResNet50 DNN model with 3.3MB model size (29.4x model compression factor). For MobileNet-v1, which is a challenging DNN to compress, DKM delivers 62.8% top-1 ImageNet1k accuracy with 0.74 MB model size (22.4x model compression factor). This result is 6.8% higher top-1 accuracy and 33% relatively smaller model size than the current state-of-the-art DNN compression algorithms. Additionally, DKM enables compression of DistilBERT model by 11.8x with minimal (1.1%) accuracy loss on GLUE NLP benchmarks.
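
下面给出"把 k-means 写成注意力形式的可微软聚类"的极简示意(假设使用 PyTorch;一维标量权重聚类、温度参数与迭代次数均为演示性假设,并非论文 DKM 的完整实现):

```python
# 示意代码:软分配(注意力)版 k-means,使聚类中心与权重可被梯度联合更新。
import torch

def differentiable_kmeans(weights, centroids, tau=0.05, n_iters=5):
    w = weights.reshape(-1, 1)                          # (N,1) 展平后的权重
    for _ in range(n_iters):
        dist = (w - centroids.reshape(1, -1)).abs()     # (N,K) 权重到各中心的距离
        attn = torch.softmax(-dist / tau, dim=1)        # 软分配(注意力)
        centroids = (attn * w).sum(0) / (attn.sum(0) + 1e-8)   # 软分配加权的中心更新
    w_hat = attn @ centroids                            # 用软分配重构(量化)后的权重
    return w_hat.reshape(weights.shape), centroids

torch.manual_seed(0)
W = torch.randn(64, 64, requires_grad=True)             # 某层待压缩的权重
C = torch.linspace(-1, 1, 16).requires_grad_()          # K=16 个聚类中心(约 4 bit)
W_hat, C_new = differentiable_kmeans(W, C)
loss = ((W_hat - W.detach()) ** 2).mean()               # 仅作演示;实际应接入任务损失
loss.backward()
print("重构误差:", float(loss), " W 有梯度:", W.grad is not None, " C 有梯度:", C.grad is not None)
```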

自动驾驶|车辆|车道检测等(4篇)

【1】 The missing link: Developing a safety case for perception components in automated driving 标题:缺失的环节:为自动驾驶中的感知部件开发安全案例 链接:https://arxiv.org/abs/2108.13294

作者:Rick Salay,Krzysztof Czarnecki,Hiroshi Kuwajima,Hirotoshi Yasuoka,Toshihiro Nakae,Vahdat Abdelzad,Chengjie Huang,Maximilian Kahn,Van Duong Nguyen 机构:University of Waterloo, Waterloo, Canada, DENSO CORPORATION, Tokyo, Japan 摘要:安全保证是自动驾驶(AD)系统开发和社会接受的核心问题。感知是AD的一个关键方面,它严重依赖于机器学习(ML)。尽管基于ML的组件的安全保证存在已知的挑战,但最近出现了针对这些组件的单元级安全案例的建议。不幸的是,AD安全案例表达了系统级的安全要求,而这些工作缺少将系统级的安全要求与单元级的组件性能要求联系起来的关键论点。在本文中,我们提出了一个通用的模板,专门为感知组件定制的链接参数。该模板采用演绎和形式化方法来定义级别之间的强可追溯性。我们通过详细的案例研究证明了模板的适用性,并讨论了其作为支持perception组件增量开发的工具的使用。 摘要:Safety assurance is a central concern for the development and societal acceptance of automated driving (AD) systems. Perception is a key aspect of AD that relies heavily on Machine Learning (ML). Despite the known challenges with the safety assurance of ML-based components, proposals have recently emerged for unit-level safety cases addressing these components. Unfortunately, AD safety cases express safety requirements at the system-level and these efforts are missing the critical linking argument connecting safety requirements at the system-level to component performance requirements at the unit-level. In this paper, we propose a generic template for such a linking argument specifically tailored for perception components. The template takes a deductive and formal approach to define strong traceability between levels. We demonstrate the applicability of the template with a detailed case study and discuss its use as a tool to support incremental development of perception components.

【2】 Integrated Decision and Control at Multi-Lane Intersections with Mixed Traffic Flow 标题:混合车流的多车道交叉口综合决策与控制 链接:https://arxiv.org/abs/2108.13038

作者:Jianhua Jiang,Yangang Ren,Yang Guan,Shengbo Eben Li,Yuming Yin,Xiaoping Jin 机构:Tsinghua University, Beijing, China, Dongjie Yu, China Agricultural University 备注:8 pages, 10 figures, 11 equations and 14 conferences 摘要:交叉口自动驾驶是最复杂和最容易发生事故的交通场景之一,尤其是车辆、自行车和行人等混合交通参与者。驾驶政策应做出安全决策,以处理动态交通条件,并满足车载计算的要求。然而,目前的研究大多集中在只考虑周围车辆和理想交通灯的简化交叉口上。本文改进了综合决策与控制框架,提出了一种基于学习的混合交通流复杂交叉口处理算法,该算法既能考虑交通信号灯的现实特性,又能学习不同安全约束下的安全策略。我们首先考虑不同的速度模型的绿色和红灯在训练过程中,并使用有限状态机来处理不同模式的光转换。然后分别为车辆、交通灯、行人、自行车设计不同类型的距离约束,并将要优化的约束最优控制问题(OCP)公式化。最后,采用具有价值和策略网络的强化学习(RL)来解决OCP问题。为了验证该方法的安全性和有效性,我们设计了一个存在大规模混合交通参与者的多车道交叉口,并设置了实际的交通灯相位。仿真结果表明,经过训练的决策控制策略能够很好地平衡安全性和跟踪性能。与模型预测控制(MPC)相比,计算时间减少了三个数量级。 摘要:Autonomous driving at intersections is one of the most complicated and accident-prone traffic scenarios, especially with mixed traffic participants such as vehicles, bicycles and pedestrians. The driving policy should make safe decisions to handle the dynamic traffic conditions and meet the requirements of on-board computation. However, most of the current researches focuses on simplified intersections considering only the surrounding vehicles and idealized traffic lights. This paper improves the integrated decision and control framework and develops a learning-based algorithm to deal with complex intersections with mixed traffic flows, which can not only take account of realistic characteristics of traffic lights, but also learn a safe policy under different safety constraints. We first consider different velocity models for green and red lights in the training process and use a finite state machine to handle different modes of light transformation. Then we design different types of distance constraints for vehicles, traffic lights, pedestrians, bicycles respectively and formulize the constrained optimal control problems (OCPs) to be optimized. Finally, reinforcement learning (RL) with value and policy networks is adopted to solve the series of OCPs. In order to verify the safety and efficiency of the proposed method, we design a multi-lane intersection with the existence of large-scale mixed traffic participants and set practical traffic light phases. The simulation results indicate that the trained decision and control policy can well balance safety and tracking performance. Compared with model predictive control (MPC), the computational time is three orders of magnitude lower.

【3】 Markov Switching Model for Driver Behavior Prediction: Use cases on Smartphones 标题:用于驾驶员行为预测的马尔可夫切换模型:智能手机上的使用案例 链接:https://arxiv.org/abs/2108.12801

作者:Ahmed B. Zaky,Mohamed A. Khamis,Walid Gomaa 机构:Benha University, Cairo , Egypt, Big Data Institute, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen , Guangdong, China, Cyber-Physical Systems Lab, Egypt-Japan University of Science and Technology (E-JUST) 摘要:一些智能交通系统专注于研究各种驾驶员行为,以实现多种目标。这包括分析驾驶员动作、敏感性、分心和响应时间的能力。由于数据收集是学习和验证不同驾驶状况的主要关注点之一,因此我们提出了一个通过使用智能手机的低成本数据收集解决方案验证的驾驶员行为转换模型。使用真实数据集对所提出的模型进行了验证,以预测驾驶员在短时间内的行为。对运动检测(特别是使用智能手机的驾驶行为检测)进行了文献综述。采用多重马尔可夫切换变量自回归(MSVAR)模型对收集的驾驶员行为数据进行精密拟合。这不仅可以对驾驶员行为进行更准确的预测,而且可以对整个驾驶情况进行更准确的预测。还介绍了所提出模型的性能以及合适的模型选择标准。提出的驾驶员行为预测框架可用于事故预测和驾驶员安全系统。 摘要:Several intelligent transportation systems focus on studying the various driver behaviors for numerous objectives. This includes the ability to analyze driver actions, sensitivity, distraction, and response time. As the data collection is one of the major concerns for learning and validating different driving situations, we present a driver behavior switching model validated by a low-cost data collection solution using smartphones. The proposed model is validated using a real dataset to predict the driver behavior in short duration periods. A literature survey on motion detection (specifically driving behavior detection using smartphones) is presented. Multiple Markov Switching Variable Auto-Regression (MSVAR) models are implemented to achieve a sophisticated fitting with the collected driver behavior data. This yields more accurate predictions not only for driver behavior but also for the entire driving situation. The performance of the presented models together with a suitable model selection criteria is also presented. The proposed driver behavior prediction framework can potentially be used in accident prediction and driver safety systems.
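
马尔可夫切换自回归模型可以用 statsmodels 直接拟合,下面是一个示意(数据为人工模拟的单变量信号;论文中的多变量 MSVAR 与模型选择准则此处未涉及,类名与参数以 statsmodels 文档为准):

```python
# 示意代码:用马尔可夫切换自回归模型区分两种潜在驾驶状态(模拟数据)。
import numpy as np
from statsmodels.tsa.regime_switching.markov_autoregression import MarkovAutoregression

rng = np.random.default_rng(0)
# 人工构造均值/方差不同的两段序列,模拟"平稳驾驶"与"激烈驾驶"两种状态
calm = rng.normal(0.0, 0.2, 300)
aggressive = rng.normal(0.5, 1.0, 300)
signal = np.concatenate([calm, aggressive, calm])

model = MarkovAutoregression(signal, k_regimes=2, order=2, switching_variance=True)
res = model.fit()
print(res.summary())
probs = np.asarray(res.smoothed_marginal_probabilities)   # 形状约为 (nobs, k_regimes)
print("前 5 个时刻属于状态 1 的平滑概率:", probs[:5, 1])
```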

【4】 Predicting Road Flooding Risk with Machine Learning Approaches Using Crowdsourced Reports and Fine-grained Traffic Data 标题:基于众包报告和细粒度交通数据的机器学习方法预测道路洪涝风险 链接:https://arxiv.org/abs/2108.13265

作者:Faxi Yuan,William Mobley,Hamed Farahmand,Yuanchang Xu,Russell Blessing,Ali Mostafavi,Samuel D. Brody 机构:Urban Resilience.AI Lab, Zachry Department of Civil and, Environmental Engineering, College Station, TX , Institute for a Disaster Resilient Texas, Department of Marine Sciences, Texas A&M University at Galveston, Galveston, TX 备注:17 pages, 7 figures 摘要:本研究的目的是利用机器学习模型,根据地形、水文和时间降水特征预测道路洪水风险。道路网络洪水状态的预测性洪水监测在社区减灾、备灾和响应活动中起着至关重要的作用。与道路淹没估算相关的现有研究要么缺乏用于模型验证的观测道路淹没数据,要么主要侧重于基于洪水图的道路淹没暴露评估。本研究通过使用众包和细粒度交通数据作为道路淹没的指标,以及地形、水文和时间降水特征作为预测变量,解决了这一局限性。然后对两个基于树的机器学习模型(随机森林和AdaBoost)进行测试和训练,以预测2017年飓风哈维和2019年德克萨斯州哈里斯县热带风暴伊梅尔达的道路淹没情况。Harvey飓风的调查结果表明,降雨是预测道路淹没敏感性的最重要特征,在两种风暴情况下,地形特征比水文特征对预测道路淹没更为重要。随机森林模型和AdaBoost模型的AUC得分相对较高(Harvey分别为0.860和0.810,Imelda分别为0.790和0.720),随机森林模型在这两种情况下表现更好。随机森林模型对Harvey表现出稳定的表现,而对Imelda则表现出显著的变化。本研究在道路层面预测洪水风险地图方面推进了智能洪水恢复力的新兴领域。例如,此类模型可以帮助受影响的社区和应急管理机构制定更好的准备和响应战略,并在极端天气事件发生时提高对道路淹没可能性的情景意识。 摘要:The objective of this study is to predict road flooding risks based on topographic, hydrologic, and temporal precipitation features using machine learning models. Predictive flood monitoring of road network flooding status plays an essential role in community hazard mitigation, preparedness, and response activities. Existing studies related to the estimation of road inundations either lack observed road inundation data for model validations or focus mainly on road inundation exposure assessment based on flood maps. This study addresses this limitation by using crowdsourced and fine-grained traffic data as an indicator of road inundation, and topographic, hydrologic, and temporal precipitation features as predictor variables. Two tree-based machine learning models (random forest and AdaBoost) were then tested and trained for predicting road inundations in the contexts of 2017 Hurricane Harvey and 2019 Tropical Storm Imelda in Harris County, Texas. The findings from Hurricane Harvey indicate that precipitation is the most important feature for predicting road inundation susceptibility, and that topographic features are more essential than hydrologic features for predicting road inundations in both storm cases. The random forest and AdaBoost models had relatively high AUC scores (0.860 and 0.810 for Harvey respectively and 0.790 and 0.720 for Imelda respectively) with the random forest model performing better in both cases. The random forest model showed stable performance for Harvey, while varying significantly for Imelda. This study advances the emerging field of smart flood resilience in terms of predictive flood risk mapping at the road level. For example, such models could help impacted communities and emergency management agencies develop better preparedness and response strategies with improved situational awareness of road inundation likelihood as an extreme weather event unfolds.
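
下面给出一个示意性草图(假设使用 scikit-learn;特征名与数据均为虚构占位),演示训练随机森林与 AdaBoost、比较 AUC 并输出特征重要性的基本流程,对应文中"降雨量重要性最高"这类分析:

```python
# 示意代码:道路淹没二分类的 RF / AdaBoost 对比与特征重要性(占位数据)。
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 3000
X = pd.DataFrame({
    "precipitation": rng.gamma(2.0, 10.0, n),    # 降雨(虚构)
    "elevation": rng.normal(20, 5, n),           # 高程(虚构)
    "slope": rng.random(n),                      # 坡度(虚构)
    "dist_to_stream": rng.exponential(200, n),   # 距河道距离(虚构)
})
logit = 0.08 * X["precipitation"] - 0.2 * X["elevation"] - 0.002 * X["dist_to_stream"] + 2
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for name, clf in [("RandomForest", RandomForestClassifier(n_estimators=300, random_state=0)),
                  ("AdaBoost", AdaBoostClassifier(n_estimators=300, random_state=0))]:
    clf.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    importance = sorted(zip(X.columns, clf.feature_importances_), key=lambda t: -t[1])
    print(name, "AUC=%.3f" % auc, "特征重要性:", importance)
```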

点云|SLAM|雷达|激光|深度RGBD相关(2篇)

【1】 Neural Network Gaussian Processes by Increasing Depth 标题:递增深度的神经网络高斯过程 链接:https://arxiv.org/abs/2108.12862

作者:Shao-Qun Zhang,Feng-Lei Fan 机构:National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing , China, AI-based X-ray Imaging System (AXIS) Lab, Rensselaer Polytechnic Institute, Troy , NY, USA 摘要:近年来,人们对无限宽网络和高斯过程之间的对应关系越来越感兴趣。尽管当前的神经网络高斯过程理论有效且优雅,但据我们所知,所有的神经网络高斯过程本质上都是由宽度增加引起的。然而,在深度学习时代,我们更关心的是神经网络的深度以及深度如何影响网络的行为。受广度-深度对称性考虑的启发,我们使用一个快捷网络来表明,增加神经网络的深度也可以产生高斯过程,这是对现有理论的一个有价值的补充,有助于揭示深度学习的真实图景。除了所提出的高斯过程的深度,我们从理论上刻画了它的一致紧性及其相关核的最小特征值。这些表征不仅可以增强我们对深度诱导高斯过程的理解,而且为将来的应用铺平了道路。最后,我们通过在两个真实数据集上的回归实验来检验所提出的高斯过程的性能。 摘要:Recent years have witnessed an increasing interest in the correspondence between infinitely wide networks and Gaussian processes. Despite the effectiveness and elegance of the current neural network Gaussian process theory, to the best of our knowledge, all the neural network Gaussian processes are essentially induced by increasing width. However, in the era of deep learning, what concerns us more regarding a neural network is its depth as well as how depth impacts the behaviors of a network. Inspired by a width-depth symmetry consideration, we use a shortcut network to show that increasing the depth of a neural network can also give rise to a Gaussian process, which is a valuable addition to the existing theory and contributes to revealing the true picture of deep learning. Beyond the proposed Gaussian process by depth, we theoretically characterize its uniform tightness property and the smallest eigenvalue of its associated kernel. These characterizations can not only enhance our understanding of the proposed depth-induced Gaussian processes, but also pave the way for future applications. Lastly, we examine the performance of the proposed Gaussian process by regression experiments on two real-world data sets.

【2】 Learning Inner-Group Relations on Point Clouds 标题:学习点云上的群内关系 链接:https://arxiv.org/abs/2108.12468

作者:Haoxi Ran,Wei Zhuo,Jun Liu,Li Lu 机构: Sichuan University, † Tencent 备注:ICCV 2021. arXiv admin note: text overlap with arXiv:2011.14285 摘要:计算机视觉中关系网络的流行与未充分探索的基于点的方法形成了鲜明的对比。在本文中,我们探讨了局部关系算子的可能性,并考察了它们的可行性。我们提出了一个可扩展且高效的模块,称为组关系聚合器。该模块根据几何关系和语义关系加权的内部组点的特征聚合来计算组的特征。我们采用这个模块来设计我们的RPNet。我们进一步验证了RPNet在深度和宽度两个方面对分类和分割任务的可扩展性。令人惊讶的是,实证结果表明,较宽的RPNet适合分类,而较深的RPNet更适合分割。RPNet在具有挑战性的基准上实现了最先进的分类和分割。我们还将本地聚合器与PointNet++进行了比较,它们的参数大约为30%,计算量节省了50%。最后,我们通过实验验证了RPNet对刚性变换和噪声的鲁棒性。 摘要:The prevalence of relation networks in computer vision is in stark contrast to underexplored point-based methods. In this paper, we explore the possibilities of local relation operators and survey their feasibility. We propose a scalable and efficient module, called group relation aggregator. The module computes a feature of a group based on the aggregation of the features of the inner-group points weighted by geometric relations and semantic relations. We adopt this module to design our RPNet. We further verify the expandability of RPNet, in terms of both depth and width, on the tasks of classification and segmentation. Surprisingly, empirical results show that wider RPNet fits for classification, while deeper RPNet works better on segmentation. RPNet achieves state-of-the-art for classification and segmentation on challenging benchmarks. We also compare our local aggregator with PointNet++, with around 30% parameters and 50% computation saving. Finally, we conduct experiments to reveal the robustness of RPNet with regard to rigid transformation and noises.

联邦学习|隐私保护|加密(1篇)

【1】 Private Multi-Task Learning: Formulation and Applications to Federated Learning 标题:隐私保护多任务学习:问题形式化及其在联邦学习中的应用 链接:https://arxiv.org/abs/2108.12978

作者:Shengyuan Hu,Zhiwei Steven Wu,Virginia Smith 机构:CMU 备注:12 pages 摘要:机器学习中的许多问题都依赖于多任务学习(MTL),其目标是同时解决多个相关的机器学习任务。MTL特别适用于医疗、金融和物联网计算等领域的隐私敏感应用程序,在这些领域,来自多个不同来源的敏感数据共享用于学习。在这项工作中,我们通过联合差分隐私(JDP)形式化了MTL任务级隐私的概念,JDP是一种用于机制设计和分布式优化的差分隐私的放松。然后,我们提出了一种平均正则化MTL算法,该目标通常用于个性化联合学习中的应用,受JDP约束。我们分析我们的目标和解决方案,为隐私和实用性提供可证明的保证。从经验上看,我们发现我们的方法允许相对于通用联邦学习基准的全局基线改进隐私/效用权衡。 摘要:Many problems in machine learning rely on multi-task learning (MTL), in which the goal is to solve multiple related machine learning tasks simultaneously. MTL is particularly relevant for privacy-sensitive applications in areas such as healthcare, finance, and IoT computing, where sensitive data from multiple, varied sources are shared for the purpose of learning. In this work, we formalize notions of task-level privacy for MTL via joint differential privacy(JDP), a relaxation of differential privacy for mechanism design and distributed optimization. We then propose an algorithm for mean-regularized MTL, an objective commonly used for applications in personalized federated learning, subject to JDP. We analyze our objective and solver, providing certifiable guarantees on both privacy and utility. Empirically, we find that our method allows for improved privacy/utility trade-offs relative to global baselines across common federated learning benchmarks.

推理|分析|理解|解释(6篇)

【1】 VTLayout: Fusion of Visual and Text Features for Document Layout Analysis 标题:VTLayout:融合视觉和文本特征进行文档布局分析 链接:https://arxiv.org/abs/2108.13297

作者:Shoubin Li,Xuyan Ma,Shuaiqun Pan,Jun Hu,Lin Shi,Qing Wang 机构:Wang, University of Chinese Academy of Sciences, Beijing, China, The Institute of Software, Chinese Academy of Sciences, Beijing, China 摘要:文档通常包含复杂的物理结构,这使得文档布局分析(DLA)任务具有挑战性。作为内容提取的一个预处理步骤,DLA有可能大规模捕获历史或科学文档中的丰富信息。尽管许多基于计算机视觉的深度学习方法在检测文档中的Figure方面已经取得了优异的性能,但在识别DLA中的List、Table、Text和Title类别块方面仍然不能令人满意。本文提出了一种融合文档深层视觉、浅层视觉和文本特征的VTLayout模型,用于定位和识别不同的类别块。该模型主要分为两个阶段,第二阶段构建了三个特征抽取器。在第一阶段,级联掩码R-CNN模型被直接应用于文档的所有类别块的定位。在第二阶段,提取深度视觉、浅层视觉和文本特征进行融合,以识别文档的类别块。因此,我们在现有定位技术的基础上增强了不同类别块的分类能力。实验结果表明,VTLayout的识别能力优于最先进的基于PubLayNet数据集的DLA方法,F1得分高达0.9599。 摘要:Documents often contain complex physical structures, which make the Document Layout Analysis (DLA) task challenging. As a pre-processing step for content extraction, DLA has the potential to capture rich information in historical or scientific documents on a large scale. Although many deep-learning-based methods from computer vision have already achieved excellent performance in detecting Figure from documents, they are still unsatisfactory in recognizing the List, Table, Text and Title category blocks in DLA. This paper proposes a VTLayout model fusing the documents' deep visual, shallow visual, and text features to localize and identify different category blocks. The model mainly includes two stages, and the three feature extractors are built in the second stage. In the first stage, the Cascade Mask R-CNN model is applied directly to localize all category blocks of the documents. In the second stage, the deep visual, shallow visual, and text features are extracted for fusion to identify the category blocks of documents. As a result, we strengthen the classification power of different category blocks based on the existing localization technique. The experimental results show that the identification capability of the VTLayout is superior to the most advanced method of DLA based on the PubLayNet dataset, and the F1 score is as high as 0.9599.

【2】 An Introduction to Variational Inference 标题:变分推理导论 链接:https://arxiv.org/abs/2108.13083

作者:Ankush Ganguly,Samuel W. F. Earp 机构:Sertis Vision Lab† 备注:13 pages, 9 figures 摘要:近似复概率密度是现代统计学中的一个核心问题。在本文中,我们引入了变分推理(VI)的概念,这是机器学习中一种流行的方法,它使用优化技术来估计复杂的概率密度。这种特性使得VI比经典方法(如马尔可夫链蒙特卡罗抽样)收敛更快。从概念上讲,VI的工作原理是选择一系列概率密度函数,然后找到最接近实际概率密度的函数——通常使用Kullback-Leibler(KL)散度作为优化指标。我们引入证据下界来方便地计算近似概率密度,并回顾了平均场变分推理背后的思想。最后,我们讨论了虚拟仪器在变分自动编码器(VAE)和VAE生成对抗网络(VAE-GAN)中的应用。通过本文,我们旨在解释虚拟仪器的概念,并用这种方法帮助未来的研究。 摘要:Approximating complex probability densities is a core problem in modern statistics. In this paper, we introduce the concept of Variational Inference (VI), a popular method in machine learning that uses optimization techniques to estimate complex probability densities. This property allows VI to converge faster than classical methods, such as, Markov Chain Monte Carlo sampling. Conceptually, VI works by choosing a family of probability density functions and then finding the one closest to the actual probability density -- often using the Kullback-Leibler (KL) divergence as the optimization metric. We introduce the Evidence Lower Bound to tractably compute the approximated probability density and we review the ideas behind mean-field variational inference. Finally, we discuss the applications of VI to variational auto-encoders (VAE) and VAE-Generative Adversarial Network (VAE-GAN). With this paper, we aim to explain the concept of VI and assist in future research with this approach.
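
摘要中"证据下界(ELBO)与 KL 散度"的关系可写成如下标准恒等式(通用结论,非该文新结果):

```latex
% 证据下界(ELBO)与 KL 散度的标准分解:
\log p(x)
  = \underbrace{\mathbb{E}_{q(z)}\!\left[\log p(x,z) - \log q(z)\right]}_{\mathrm{ELBO}(q)}
  + \mathrm{KL}\!\left(q(z)\,\middle\|\,p(z\mid x)\right)
```

由于 KL 散度非负且 log p(x) 与 q 无关,在所选分布族内最大化 ELBO 等价于最小化 q 与真实后验之间的 KL 散度,这正是摘要中以 KL 散度作为优化度量的依据。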

【3】 Communication-Computation Efficient Device-Edge Co-Inference via AutoML 标题:基于AutoML的通信计算高效设备边缘协同推理 链接:https://arxiv.org/abs/2108.13009

作者:Xinjie Zhang,Jiawei Shao,Yuyi Mao,Jun Zhang 机构:∗Dept. of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, †Dept. of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong 摘要:设备边缘协同推理(Device-edge-co-inference)是一种在资源受限的移动设备和边缘服务器之间划分深度神经网络的方法,最近成为支持智能移动应用的一种很有前途的范例。为了加快推理过程,设备模型稀疏化和中间特征压缩被认为是两种重要的技术。然而,由于设备模型稀疏度水平和中间特征压缩比分别直接影响计算量和通信开销,并且两者都影响推理精度,因此,由于搜索空间大,寻找这些超参数的最优值带来了重大挑战。在本文中,我们致力于开发一种有效的算法来确定这些超参数。通过选择合适的模型分割点和一对编码器/解码器作为中间特征向量,将该问题转化为序列决策问题,提出了一种基于深度强化学习(DRL)的自动机器学习(AutoML)框架。在一个图像分类任务上的实验结果证明了该框架的有效性,在不同的基线方案下,该框架实现了更好的通信计算权衡和显著的推理加速。 摘要:Device-edge co-inference, which partitions a deep neural network between a resource-constrained mobile device and an edge server, recently emerges as a promising paradigm to support intelligent mobile applications. To accelerate the inference process, on-device model sparsification and intermediate feature compression are regarded as two prominent techniques. However, as the on-device model sparsity level and intermediate feature compression ratio have direct impacts on computation workload and communication overhead respectively, and both of them affect the inference accuracy, finding the optimal values of these hyper-parameters brings a major challenge due to the large search space. In this paper, we endeavor to develop an efficient algorithm to determine these hyper-parameters. By selecting a suitable model split point and a pair of encoder/decoder for the intermediate feature vector, this problem is casted as a sequential decision problem, for which, a novel automated machine learning (AutoML) framework is proposed based on deep reinforcement learning (DRL). Experiment results on an image classification task demonstrate the effectiveness of the proposed framework in achieving a better communication-computation trade-off and significant inference speedup against various baseline schemes.

【4】 Feature Analysis for ML-based IIoT Intrusion Detection 标题:基于ML的IIoT入侵检测的特征分析 链接:https://arxiv.org/abs/2108.12732

作者:Mohanad Sarhan,Siamak Layeghy,Marius Portmann 机构:The University of Queensland, St Lucia QLD 备注:22 pages, 6 figures 摘要:工业物联网(IIoT)网络已成为越来越有吸引力的网络攻击目标。强大的机器学习(ML)模型最近被用来实现网络入侵检测系统(NIDSs),它可以保护IIoT网络。为了成功地训练此类ML模型,重要的是选择正确的数据特征集,以最大限度地提高检测精度和计算效率。本文从网络攻击的重要性和预测能力的角度对最优特征集进行了广泛的分析。三种特征选择算法;卡方检验、信息增益和相关性已用于识别数据特征并对其进行排序。将特征输入到两个ML分类器中;深度前馈和随机森林,以衡量其攻击检测精度。实验评估考虑了三个NIDS数据集:UNSW-NB15、CSE-CIC-IDS2018和专有流量格式的吨物联网。此外,还考虑了NetFlow格式的相应变体,即NF-UNSW-NB15、NF-CSE-CIC-IDS2018和NF-ToN IoT。实验评估探索了逐个添加功能的边际效益。我们的结果表明,准确度最初随着特征的增加而迅速增加,但很快收敛到可达到的最大检测准确度。我们的结果表明,在保持接近最优的检测精度的同时,降低NIDS的计算和存储成本具有巨大的潜力。这在IIoT系统中具有特别的相关性,通常计算和存储资源有限。 摘要:Industrial Internet of Things (IIoT) networks have become an increasingly attractive target of cyberattacks. Powerful Machine Learning (ML) models have recently been adopted to implement Network Intrusion Detection Systems (NIDSs), which can protect IIoT networks. For the successful training of such ML models, it is important to select the right set of data features, which maximise the detection accuracy as well as computational efficiency. This paper provides an extensive analysis of the optimal feature sets in terms of the importance and predictive power of network attacks. Three feature selection algorithms; chi-square, information gain and correlation have been utilised to identify and rank data features. The features are fed into two ML classifiers; deep feed-forward and random forest, to measure their attack detection accuracy. The experimental evaluation considered three NIDS datasets: UNSW-NB15, CSE-CIC-IDS2018, and ToN-IoT in their proprietary flow format. In addition, the respective variants in NetFlow format were also considered, i.e., NF-UNSW-NB15, NF-CSE-CIC-IDS2018, and NF-ToN-IoT. The experimental evaluation explored the marginal benefit of adding features one-by-one. Our results show that the accuracy initially increases rapidly with the addition of features, but converges quickly to the maximum achievable detection accuracy. Our results demonstrate a significant potential of reducing the computational and storage cost of NIDS while maintaining near-optimal detection accuracy. This has particular relevance in IIoT systems, with typically limited computational and storage resource.
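
文中"逐个添加特征、观察边际收益"的评估方式可用如下示意代码理解(假设使用 scikit-learn,数据为随机占位数据;卡方与互信息分别对应文中的卡方检验与信息增益):

```python
# 示意代码:按特征得分排序后逐个累加特征,观察检测准确率的边际收益(占位数据)。
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import chi2, mutual_info_classif
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, n_features=20, n_informative=6, random_state=0)
X = MinMaxScaler().fit_transform(X)               # 卡方检验要求特征非负
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

chi2_scores, _ = chi2(X_tr, y_tr)
mi_scores = mutual_info_classif(X_tr, y_tr, random_state=0)
print("卡方得分最高的 5 个特征:", np.argsort(-chi2_scores)[:5])
print("互信息最高的 5 个特征:", np.argsort(-mi_scores)[:5])

order = np.argsort(-mi_scores)                    # 按互信息从高到低排列特征
for k in (1, 2, 4, 8, 16, 20):
    cols = order[:k]
    acc = RandomForestClassifier(random_state=0).fit(X_tr[:, cols], y_tr).score(X_te[:, cols], y_te)
    print(f"前 {k:2d} 个特征 -> 准确率 {acc:.3f}")
```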

【5】 Avoiding unwanted results in locally linear embedding: A new understanding of regularization 标题:避免局部线性嵌入中的非期望结果:对正则化的新理解 链接:https://arxiv.org/abs/2108.12680

作者:Liren Lin 机构:Department of Applied Mathematics, National Sun Yat-sen University, Taiwan 备注:11 pages 摘要:我们证明了当不使用正则化时,局部线性嵌入(LLE)固有地允许一些不需要的结果,即使在原始算法中不需要正则化的情况下也是如此。在数据的每个邻域都实现精确的局部线性关系的情况下,从数学上证明了一种特殊类型的结果的存在性,我们称之为“投影模式”。这些特殊模式以及在更一般的情况下可能出现的一些其他奇异结果,通过在高维空间中嵌入孔的瑞士辊上的数值例子显示出来。观察到,使用正则化可以有效地防止所有这些不良结果。 摘要:We demonstrate that locally linear embedding (LLE) inherently admits some unwanted results when no regularization is used, even for cases in which regularization is not supposed to be needed in the original algorithm. The existence of one special type of result, which we call ``projection pattern'', is mathematically proved in the situation that an exact local linear relation is achieved in each neighborhood of the data. These special patterns as well as some other bizarre results that may occur in more general situations are shown by numerical examples on the Swiss roll with a hole embedded in a high dimensional space. It is observed that all these bad results can be effectively prevented by using regularization.
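
可以用 scikit-learn 的 LLE 实现直观对比正则化系数的影响,如下示意(reg 取极小值近似"无正则化";是否出现摘要所述的退化或投影型结果取决于数据与邻域设置):

```python
# 示意代码:比较不同正则化系数 reg 下 LLE 的重构误差与嵌入坐标的奇异值分布。
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1500, random_state=0)

for reg in (1e-3, 1e-12):          # 常规正则化 vs. 近似"无正则化"
    lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, reg=reg, random_state=0)
    Y = lle.fit_transform(X)
    sv = np.linalg.svd(Y - Y.mean(0), compute_uv=False)[:2]   # 奇异值悬殊说明嵌入退化为近似一维
    print(f"reg={reg:g}  重构误差={lle.reconstruction_error_:.3e}  嵌入奇异值={sv}")
```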

【6】 Learning Energy-Based Approximate Inference Networks for Structured Applications in NLP 标题:基于能量学习的近似推理网络在NLP中的结构化应用 链接:https://arxiv.org/abs/2108.12522

作者:Lifu Tu 机构:A DISSERTATION SUBMITTED AT, TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO, IN PARTIAL FULFILLMENT OF THE REQUIREMENTS, FOR THE DEGREE OF, DOCTOR OF PHILOSOPHY IN COMPUTER SCIENCE, Thesis Committee:, Kevin Gimpel (Thesis Advisor), Karen Livescu, Sam Wiseman, Kyunghyun Cho 备注:Ph.D. Thesis 摘要:自然语言处理中的结构化预测有着悠久的历史。结构化应用的复杂模型带来了学习和推理的困难。这些困难导致研究人员更多地关注具有简单结构组件(例如,局部分类器)的模型。近年来,深度表征学习越来越流行。另一方面,其方法的结构组成通常相对简单。本文主要研究复杂结构模型。我们为复杂的结构化模型提供了一个学习框架,并提供了一种推理方法,该方法具有更好的速度/准确性/搜索错误权衡。本文首先对基于能量的模型进行了概述。在NLP和其他应用中,能量函数与评分函数的概念相当。在本论文中,我们讨论了能量函数的概念和具有不同能量函数的结构化模型。然后,我们提出了一种在结构化能量函数下训练神经网络进行argmax推理的方法,将训练后的网络称为“推理网络”或“基于能量的推理网络”。然后,我们开发了使用对抗性学习框架联合学习能量函数和推理网络的方法。尽管基于能量的模型存在推理和学习困难,我们在本论文中提出的方法使基于能量的模型更容易应用于结构化NLP应用中。 摘要:Structured prediction in natural language processing (NLP) has a long history. The complex models of structured application come at the difficulty of learning and inference. These difficulties lead researchers to focus more on models with simple structure components (e.g., local classifier). Deep representation learning has become increasingly popular in recent years. The structure components of their method, on the other hand, are usually relatively simple. We concentrate on complex structured models in this dissertation. We provide a learning framework for complicated structured models as well as an inference method with a better speed/accuracy/search error trade-off. The dissertation begins with a general introduction to energy-based models. In NLP and other applications, an energy function is comparable to the concept of a scoring function. In this dissertation, we discuss the concept of the energy function and structured models with different energy functions. Then, we propose a method in which we train a neural network to do argmax inference under a structured energy function, referring to the trained networks as "inference networks" or "energy-based inference networks". We then develop ways of jointly learning energy functions and inference networks using an adversarial learning framework. Despite the inference and learning difficulties of energy-based models, we present approaches in this thesis that enable energy-based models more easily to be applied in structured NLP applications.

检测相关(7篇)

【1】 Edge-Cloud Collaborated Object Detection via Difficult-Case Discriminator 标题:基于疑难判别器的边缘云协同目标检测 链接:https://arxiv.org/abs/2108.12858

作者:Zhiqiang Cao,Zhijun Li,Pan Heng,Yongrui Chen,Daqi Xie,Jie Liu 备注:10 pages infocom2022 摘要:目标检测作为计算机视觉的基本任务之一,在许多智能应用中得到了广泛的应用。然而,目标检测算法的计算量通常很大,阻碍了它们在资源受限的边缘设备上的实现。当前的边缘云协作方法,如边缘云设备上的CNN分区,不适合用于目标检测,因为中间结果的巨大数据量将引入昂贵的通信成本。为了应对这一挑战,我们提出了一个小-大模型框架,在云中部署一个大模型,在边缘设备上部署一个小模型。当接收到数据时,边缘设备操作困难情况鉴别器,以根据图像的特定语义将图像分类为容易情况和困难情况。容易的案例将在边缘本地处理,困难的案例将上传到云端。使用两种不同的目标检测算法在VOC、COCO和头盔数据集上进行的实验结果表明,使用SSD时,小-大模型系统可以检测94.01%-97.84%的目标,只有大约50%的图像上传到云中。此外,小-大模型平均达到将所有图像上传到云的方案的91.22%-92.52%端到端地图。 摘要:As one of the basic tasks of computer vision, object detection has been widely used in many intelligent applications. However, object detection algorithms are usually heavyweight in computation, hindering their implementations on resource-constrained edge devices. Current edge-cloud collaboration methods, such as CNN partition over Edge-cloud devices, are not suitable for object detection since the huge data size of the intermediate results will introduce extravagant communication costs. To address this challenge, we propose a small-big model framework that deploys a big model in the cloud and a small model on the edge devices. Upon receiving data, the edge device operates a difficult-case discriminator to classify the images into easy cases and difficult cases according to the specific semantics of the images. The easy cases will be processed locally at the edge, and the difficult cases will be uploaded to the cloud. Experimental results on the VOC, COCO, HELMET datasets using two different object detection algorithms demonstrate that the small-big model system can detect 94.01%-97.84% of objects with only about 50% images uploaded to the cloud when using SSD. In addition, the small-big model averagely reaches 91.22%- 92.52% end-to-end mAP of the scheme that uploading all images to the cloud.
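
边端"难例判别 + 云端兜底"的调度逻辑可以用如下极简草图示意(所有函数、返回值与阈值均为占位假设;论文中的判别器依据图像语义而非单一置信度):

```python
# 示意代码:以小模型检测置信度为阈值的简化"难例判别"调度逻辑(占位实现)。
from dataclasses import dataclass
from typing import List

@dataclass
class Detection:
    label: str
    score: float

def small_model(image) -> List[Detection]:        # 占位:边缘端轻量检测器
    return [Detection("person", 0.42), Detection("helmet", 0.91)]

def big_model(image) -> List[Detection]:          # 占位:云端大模型
    return [Detection("person", 0.88), Detection("helmet", 0.95)]

def detect(image, threshold=0.6):
    dets = small_model(image)
    if dets and min(d.score for d in dets) >= threshold:
        return dets, "edge"                        # 易例:本地结果直接返回
    return big_model(image), "cloud"               # 难例:上传云端重新检测

result, where = detect(object())
print(where, result)
```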

【2】 Interpretable Propaganda Detection in News Articles 标题:新闻文章中的可解释性宣传检测 链接:https://arxiv.org/abs/2108.12802

作者:Seunghak Yu,Giovanni Da San Martino,Mitra Mohtarami,James Glass,Preslav Nakov 机构: Amazon Alexa AI, Seattle, WA, USA, Department of Mathematics, University of Padova, Italy, MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA, Qatar Computing Research Institute, HBKU, Qatar 备注:None 摘要:如今,在线用户每天都会接触到误导性和宣传性的新闻文章和媒体帖子。为了应对这种情况,已经设计了许多方法,旨在实现更健康、更安全的在线新闻和媒体消费。自动系统能够支持人类检测此类内容;然而,广泛采用这些系统的一个主要障碍是,除了准确之外,这些系统的决定还需要具有可解释性,以便得到用户的信任和广泛采用。由于误导性和宣传性内容通过使用大量欺骗技术影响读者,我们建议检测并展示此类技术的使用,以提供可解释性。特别是,我们定义了定性描述特征,并分析了它们对检测欺骗技术的适用性。我们进一步表明,我们的可解释特征可以很容易地与预先训练的语言模型相结合,产生最先进的结果。 摘要:Online users today are exposed to misleading and propagandistic news articles and media posts on a daily basis. To counter thus, a number of approaches have been designed aiming to achieve a healthier and safer online news and media consumption. Automatic systems are able to support humans in detecting such content; yet, a major impediment to their broad adoption is that besides being accurate, the decisions of such systems need also to be interpretable in order to be trusted and widely adopted by users. Since misleading and propagandistic content influences readers through the use of a number of deception techniques, we propose to detect and to show the use of such techniques as a way to offer interpretability. In particular, we define qualitatively descriptive features and we analyze their suitability for detecting deception techniques. We further show that our interpretable features can be easily combined with pre-trained language models, yielding state-of-the-art results.

【3】 HeadlineCause: A Dataset of News Headlines for Detecting Causalities 标题:HeadlineCause:用于检测新闻标题间因果关系的数据集 链接:https://arxiv.org/abs/2108.12626

作者:Ilya Gusev,Alexey Tikhonov 机构:Moscow Institute of Physics and Technology, Moscow, Russia, Yandex, Berlin, Germany 摘要:检测文本中隐含的因果关系是一项既需要常识又需要世界知识的任务。现有的数据集集中于常识因果推理或显式因果关系。在这项工作中,我们提出了HeadlineCause,一个用于检测新闻标题对之间隐含因果关系的数据集。该数据集包括5000多对来自英语新闻的标题对和9000多对来自俄罗斯新闻的标题对,这些标题对通过众包进行标记。从完全不相关或属于同一个一般主题到因果关系和反驳关系,这两个词对各不相同。我们还提供了一组模型和实验来证明数据集的有效性,包括基于多语言XLM-RoBERTa的因果关系检测模型和基于GPT-2的可能影响预测模型。 摘要:Detecting implicit causal relations in texts is a task that requires both common sense and world knowledge. Existing datasets are focused either on commonsense causal reasoning or explicit causal relations. In this work, we present HeadlineCause, a dataset for detecting implicit causal relations between pairs of news headlines. The dataset includes over 5000 headline pairs from English news and over 9000 headline pairs from Russian news labeled through crowdsourcing. The pairs vary from totally unrelated or belonging to the same general topic to the ones including causation and refutation relations. We also present a set of models and experiments that demonstrates the dataset validity, including a multilingual XLM-RoBERTa based model for causality detection and a GPT-2 based model for possible effects prediction.

【4】 Mitigation of Diachronic Bias in Fake News Detection Dataset 标题:假新闻检测数据集中历时偏差的消除 链接:https://arxiv.org/abs/2108.12601

作者:Taichi Murayama,Shoko Wakamiya,Eiji Aramaki 机构:Nara Institute of Science and Technology (NAIST) 备注:7 pages 摘要:假新闻对社会造成了巨大的危害。为了应对这些假新闻,人们进行了一些关于建立检测模型和整理数据集的研究。大多数假新闻数据集取决于特定的时间段。因此,在这样的数据集上训练的检测模型难以检测由政治变化和社会变化产生的新假新闻;它们可能导致输入的输出有偏差,包括特定的人名和组织名。我们将此问题称为\textbf{Diachronic Bias},因为它是由每个数据集中新闻的创建日期引起的。在这项研究中,我们从每个数据集中短语出现的偏差来确认偏差,尤其是包括人名在内的专有名词。基于这些发现,我们提出了使用Wikidata的掩蔽方法,以减轻人名的影响,并通过域内和域外数据的实验验证它们是否使假新闻检测模型具有鲁棒性。 摘要:Fake news causes significant damage to society.To deal with these fake news, several studies on building detection models and arranging datasets have been conducted. Most of the fake news datasets depend on a specific time period. Consequently, the detection models trained on such a dataset have difficulty detecting novel fake news generated by political changes and social changes; they may possibly result in biased output from the input, including specific person names and organizational names. We refer to this problem as \textbf{Diachronic Bias} because it is caused by the creation date of news in each dataset. In this study, we confirm the bias, especially proper nouns including person names, from the deviation of phrase appearances in each dataset. Based on these findings, we propose masking methods using Wikidata to mitigate the influence of person names and validate whether they make fake news detection models robust through experiments with in-domain and out-of-domain data.
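
下面给出“用占位符掩蔽人名以减轻历时偏差”这一思路的极简示意;person_names 集合假定已预先从 Wikidata 导出,此处仅用虚构名字演示,并非论文的具体掩蔽实现。

```python
import re

# 假设 person_names 是预先从 Wikidata 导出的人名集合;此处仅用两个虚构名字演示。
person_names = {"Alice Example", "Bob Example"}

def mask_person_names(text: str, names=person_names, token="[PERSON]") -> str:
    """把文本中出现的已知人名统一替换为占位符,以降低模型对特定人名的依赖。"""
    for name in sorted(names, key=len, reverse=True):  # 先替换较长的名字,避免部分匹配
        text = re.sub(re.escape(name), token, text)
    return text

print(mask_person_names("Alice Example criticized the new policy."))
# 输出: "[PERSON] criticized the new policy."
```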

【5】 Robustness Disparities in Commercial Face Detection 标题:商用人脸检测中的鲁棒性差异 链接:https://arxiv.org/abs/2108.12508

作者:Samuel Dooley,Tom Goldstein,John P. Dickerson 机构:University of Maryland 摘要:在过去的十年中,面部检测和分析系统已经被大公司部署,并受到学者和活动家的批评。聚焦于系统性能的评论分析了系统输出的差异,即,针对不同的菲茨帕特里克皮肤类型或感知的性别检测人脸的频率。然而,我们关注这些系统输出在噪声自然扰动下的鲁棒性。我们展示了三个这样的系统的健壮性的第一个详细基准:Amazon Rekognition、Microsoft Azure和Google云平台。我们使用标准及最近发布的学术人脸数据集,定量分析各自的稳健性趋势。在所有数据集和系统中,我们普遍发现,与其他身份的人相比,年龄较大、男性化、肤色较深或光线暗淡的人的照片更容易出错。 摘要:Facial detection and analysis systems have been deployed by large companies and critiqued by scholars and activists for the past decade. Critiques that focus on system performance analyze disparity of the system's output, i.e., how frequently is a face detected for different Fitzpatrick skin types or perceived genders. However, we focus on the robustness of these system outputs under noisy natural perturbations. We present the first of its kind detailed benchmark of the robustness of three such systems: Amazon Rekognition, Microsoft Azure, and Google Cloud Platform. We use both standard and recently released academic facial datasets to quantitatively analyze trends in robustness for each. Across all the datasets and systems, we generally find that photos of individuals who are older, masculine presenting, of darker skin type, or have dim lighting are more susceptible to errors than their counterparts in other identities.

【6】 StressNAS: Affect State and Stress Detection Using Neural Architecture Search 标题:StressNAS:使用神经体系结构搜索的影响状态和应力检测 链接:https://arxiv.org/abs/2108.12502

作者:Lam Huynh,Tri Nguyen,Thu Nguyen,Susanna Pirttikangas,Pekka Siirtola 机构:Center for Machine Vision and Signal Analysis, University of Oulu, Center for Ubiquitous Computing, University of Oulu, Economics and Business Administration, University of Oulu, Biomimetics and Intelligent Systems Group, University of Oulu 备注:5 pages, 2 figures 摘要:智能手表已迅速发展成为能够准确捕捉生理信号的功能。作为一种吸引人的应用,压力检测因其对人类健康的潜在益处而吸引了许多研究。深入研究深度神经网络(DNN)在通过生理信号增强人类决策方面的适用性是非常有利的。然而,由于这种现象的复杂性,手动工程DNN证明是一项繁琐的任务,尤其是在应力检测中。为此,我们提出了一种优化的深度神经网络训练方案,该方案仅使用来自WESAD的腕部磨损数据,使用神经结构搜索。实验表明,在三状态和两状态分类器中,使用WESAD腕部信号的组合,我们的方法比传统的ML方法分别高8.22%和6.02%。此外,该方法可以最大限度地减少对人工设计DNN的需求,同时将性能提高4.39%(三态)和8.99%(二进制)。 摘要:Smartwatches have rapidly evolved towards capabilities to accurately capture physiological signals. As an appealing application, stress detection attracts many studies due to its potential benefits to human health. It is propitious to investigate the applicability of deep neural networks (DNN) to enhance human decision-making through physiological signals. However, manually engineering DNN proves a tedious task especially in stress detection due to the complex nature of this phenomenon. To this end, we propose an optimized deep neural network training scheme using neural architecture search merely using wrist-worn data from WESAD. Experiments show that our approach outperforms traditional ML methods by 8.22% and 6.02% in the three-state and two-state classifiers, respectively, using the combination of WESAD wrist signals. Moreover, the proposed method can minimize the need for human-design DNN while improving performance by 4.39% (three-state) and 8.99% (binary).

【7】 Learning JPEG Compression Artifacts for Image Manipulation Detection and Localization 标题:用于图像操作检测和定位的JPEG压缩伪影学习 链接:https://arxiv.org/abs/2108.12947

作者:Myung-Joon Kwon,Seung-Hun Nam,In-Jae Yu,Heung-Kyu Lee,Changick Kim 机构:Received: date Accepted: date 备注:preprint (under review); Code is available at: this https URL 摘要:检测和定位图像操作是对抗恶意使用图像编辑技术的必要条件。因此,有必要通过分析图像中的内在统计信息来区分真实区域和篡改区域。我们关注图像采集和编辑过程中留下的JPEG压缩伪影。我们提出了一种卷积神经网络(CNN),它使用离散余弦变换(DCT)系数来定位图像操作,其中压缩伪影仍然存在。标准CNN无法学习DCT系数的分布,因为卷积丢弃了DCT系数所必需的空间坐标。我们说明了如何设计和训练一个能够学习DCT系数分布的神经网络。此外,我们还介绍了压缩伪影跟踪网络(CAT-Net),它联合使用图像采集伪影和压缩伪影。在检测和定位篡改区域方面,它明显优于传统的基于深度神经网络的方法。 摘要:Detecting and localizing image manipulation are necessary to counter malicious use of image editing techniques. Accordingly, it is essential to distinguish between authentic and tampered regions by analyzing intrinsic statistics in an image. We focus on JPEG compression artifacts left during image acquisition and editing. We propose a convolutional neural network (CNN) that uses discrete cosine transform (DCT) coefficients, where compression artifacts remain, to localize image manipulation. Standard CNNs cannot learn the distribution of DCT coefficients because the convolution throws away the spatial coordinates, which are essential for DCT coefficients. We illustrate how to design and train a neural network that can learn the distribution of DCT coefficients. Furthermore, we introduce Compression Artifact Tracing Network (CAT-Net) that jointly uses image acquisition artifacts and compression artifacts. It significantly outperforms traditional and deep neural network-based methods in detecting and localizing tampered regions.

分类|识别(5篇)

【1】 Tune It or Don't Use It: Benchmarking Data-Efficient Image Classification 标题:调整它或不使用它:基准数据-高效的图像分类 链接:https://arxiv.org/abs/2108.13122

作者:Lorenzo Brigato,Björn Barz,Luca Iocchi,Joachim Denzler 机构:“Tune It or Don’t Use It: Benchmarking Data-Efficient Image Classification.”, Proceedings of the IEEECVF International Conference on Computer Vision (ICCV) Workshops ,. 备注:Accepted at the 2nd Visual Inductive Priors for Data-Efficient Deep Learning Workshop (ICCV 2021) 摘要:在只有少量标记数据的环境中,使用深度神经网络进行数据高效的图像分类是最近的一个活跃研究领域。然而,在已发表的方法之间进行客观比较是困难的,因为现有的研究使用不同的数据集进行评估,并且经常与具有默认超参数的不协调基线进行比较。我们为数据高效的图像分类设计了一个基准,由跨越不同领域(如自然图像、医学图像、卫星数据)和数据类型(RGB、灰度、多光谱)的六个不同数据集组成。利用该基准,我们重新评估了2017年至2021年间在著名场馆发布的标准交叉熵基线和八种数据高效深度学习方法。为了进行公平和现实的比较,我们仔细地调整每个数据集上所有方法的超参数。令人惊讶的是,我们发现在一个单独的验证分割上调整学习率、权重衰减和批量大小会产生一个具有高度竞争力的基线,其表现优于除一种特殊方法之外的所有方法,并与其他方法具有竞争性。 摘要:Data-efficient image classification using deep neural networks in settings, where only small amounts of labeled data are available, has been an active research area in the recent past. However, an objective comparison between published methods is difficult, since existing works use different datasets for evaluation and often compare against untuned baselines with default hyper-parameters. We design a benchmark for data-efficient image classification consisting of six diverse datasets spanning various domains (e.g., natural images, medical imagery, satellite data) and data types (RGB, grayscale, multispectral). Using this benchmark, we re-evaluate the standard cross-entropy baseline and eight methods for data-efficient deep learning published between 2017 and 2021 at renowned venues. For a fair and realistic comparison, we carefully tune the hyper-parameters of all methods on each dataset. Surprisingly, we find that tuning learning rate, weight decay, and batch size on a separate validation split results in a highly competitive baseline, which outperforms all but one specialized method and performs competitively to the remaining one.
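
摘要指出,仅在单独的验证集上调节学习率、权重衰减和批量大小即可得到很强的基线。下面用 scikit-learn 的 SGDClassifier 和合成数据给出这一“小网格搜索 + 验证集选优”流程的示意;实际基准中被调的是深度网络的交叉熵基线,批量大小可按同样方式加入搜索,此处的取值范围均为示例假设。

```python
# 示意:在单独划出的验证集上对学习率与权重衰减(alpha)做简单网格搜索并选优。
from itertools import product
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=40, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

best = None
for lr, alpha in product([1e-1, 1e-2, 1e-3], [1e-3, 1e-4, 1e-5]):
    clf = SGDClassifier(learning_rate="constant", eta0=lr, alpha=alpha,
                        max_iter=200, random_state=0)
    clf.fit(X_train, y_train)
    acc = clf.score(X_val, y_val)          # 只看验证集表现
    if best is None or acc > best[0]:
        best = (acc, {"learning_rate": lr, "weight_decay": alpha})

print("验证集最优:", best)
```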

【2】 MEDIC: A Multi-Task Learning Dataset for Disaster Image Classification 标题:MEDIC:一种用于灾害图像分类的多任务学习数据集 链接:https://arxiv.org/abs/2108.12828

作者:Firoj Alam,Tanvirul Alam,Md. Arid Hasan,Abul Hasnat,Muhammad Imran,Ferda Ofli 机构: Qatar Computing Research Institute, HBKU, Qatar, Rochester Institute of Technology, Rochester, USA, Cognitive Insight Limited, Dhaka, Bangladesh 备注:Multi-task Learning, Social media images, Image Classification, Natural disasters, Crisis Informatics, Deep learning, Dataset 摘要:灾害信息学的最新研究展示了一个基于社交媒体内容(文本和图像)的人工智能的实用和重要用例,用于在自然灾害后拯救人类生命和苦难。虽然在使用文本方面取得了显著的进展,但对利用图像的研究仍相对不足。为了推进基于图像的方法,我们建议MEDIC(可在以下网址获得):https://crisisnlp.qcri.org/medic/index.html),这是最大的用于人道主义响应的社交媒体图像分类数据集,由71198张图像组成,用于在多任务学习设置中处理四个不同任务。这是第一个此类数据集:社交媒体图像、灾难响应和多任务学习研究。该数据集的一个重要特性是其对多任务学习研究的巨大潜力,最近受到机器学习界的广泛关注,并在记忆、推理速度、性能和泛化能力方面取得了显著的成果。因此,该数据集是推进基于图像的灾害管理和多任务机器学习研究的重要资源。 摘要:Recent research in disaster informatics demonstrates a practical and important use case of artificial intelligence to save human lives and sufferings during post-natural disasters based on social media contents (text and images). While notable progress has been made using texts, research on exploiting the images remains relatively under-explored. To advance the image-based approach, we propose MEDIC (available at: https://crisisnlp.qcri.org/medic/index.html), which is the largest social media image classification dataset for humanitarian response consisting of 71,198 images to address four different tasks in a multi-task learning setup. This is the first dataset of its kind: social media image, disaster response, and multi-task learning research. An important property of this dataset is its high potential to contribute research on multi-task learning, which recently receives much interest from the machine learning community and has shown remarkable results in terms of memory, inference speed, performance, and generalization capability. Therefore, the proposed dataset is an important resource for advancing image-based disaster management and multi-task machine learning research.

【3】 Attempt to Predict Failure Case Classification in a Failure Database by using Neural Network Models 标题:尝试用神经网络模型预测故障数据库中的故障案例分类 链接:https://arxiv.org/abs/2108.12788

作者:Koichi Bando,Kenji Tanaka 机构: Graduate School of Informatics and Engineering, The University of Electro-Communications, Tokyo, Japan, -,-, Chofugaoka, Chofu-Shi, Tokyo ,-, Japan 备注:Editor: Barbara Gallina. 17th European Dependable Computing Conference (EDCC 2021), September 13-16, 2021, Munich, Germany. Fast Abstract Proceedings- EDCC 2021 摘要:随着信息技术的进步,网络信息系统的使用范围迅速扩大。银行和公司之间的电子商务和电子支付,以及一般公众使用的在线购物和社交网络服务就是此类系统的例子。因此,为了维护和提高这些系统的可靠性,我们正在根据过去的故障案例构建故障数据库。将新故障案例导入数据库时,需要根据故障类型对这些案例进行分类。问题是分类的准确性和效率。特别是在与多个人合作时,需要统一分类。因此,我们正在尝试使用机器学习自动分类。作为评估模型,我们选择了多层感知器(MLP)、卷积神经网络(CNN)和递归神经网络(RNN),它们都是使用神经网络的模型。因此,精度方面的最佳模型是MLP,其次是CNN,并且分类的处理时间是实际的。 摘要:With the recent progress of information technology, the use of networked information systems has rapidly expanded. Electronic commerce and electronic payments between banks and companies, and online shopping and social networking services used by the general public are examples of such systems. Therefore, in order to maintain and improve the dependability of these systems, we are constructing a failure database from past failure cases. When importing new failure cases to the database, it is necessary to classify these cases according to failure type. The problems are the accuracy and efficiency of the classification. Especially when working with multiple individuals, unification of classification is required. Therefore, we are attempting to automate classification using machine learning. As evaluation models, we selected the multilayer perceptron (MLP), the convolutional neural network (CNN), and the recurrent neural network (RNN), which are models that use neural networks. As a result, the optimal model in terms of accuracy is first the MLP followed by the CNN, and the processing time of the classification is practical.

【4】 Representation of binary classification trees with binary features by quantum circuits 标题:具有二值特征的二分类树的量子电路表示 链接:https://arxiv.org/abs/2108.13207

作者:Raoul Heese,Patricia Bickert,Astrid Elisa Niederle 机构:Fraunhofer ITWM, Kaiserslautern, Germany, BASF SE, Ludwigshafen, Germany 备注:41 pages, 20 figures, 3 tables 摘要:我们提出了一种基于概率方法的具有二元特征的二元分类树的量子表示。通过使用量子计算机作为概率分布处理器,可以通过测量量子电路来实现决策树的概率遍历。我们描述了如何将树归纳和查询数据的类标签预测集成到这个框架中。一种按需采样方法可以使用恒定数量的经典内存插槽进行预测,与树深度无关。我们使用量子计算模拟器和实际的IBM量子硬件对我们的方法进行了实验研究。据我们所知,这是第一次在量子设备上实现决策树分类器。 摘要:We propose a quantum representation of binary classification trees with binary features based on a probabilistic approach. By using the quantum computer as a processor for probability distributions, a probabilistic traversal of the decision tree can be realized via measurements of a quantum circuit. We describe how tree inductions and the prediction of class labels of query data can be integrated into this framework. An on-demand sampling method enables predictions with a constant number of classical memory slots, independent of the tree depth. We experimentally study our approach using both a quantum computing simulator and actual IBM quantum hardware. To our knowledge, this is the first realization of a decision tree classifier on a quantum device.

【5】 Speech Representations and Phoneme Classification for Preserving the Endangered Language of Ladin 标题:保护濒危语言的语音表征和音素分类 链接:https://arxiv.org/abs/2108.12531

作者:Zane Durante,Leena Mathur,Eric Ye,Sichong Zhao,Tejas Ramdas,Khalil Iskarous 机构:Center for Artificial Intelligence in Society, Department of Linguistics, University of Southern California, Los Angeles, USA 备注:Accepted to ICSA MLSLP 2021 (held with Interspeech 2021) 摘要:世界7000种语言中的绝大多数预计将在本世纪内灭绝,包括来自意大利阿尔卑斯山的濒危语言拉丁。致力于保持一种语言的语音和语音结构的语言学家可以花费数小时转录母语人士的每一分钟讲话。为了在Ladin的背景下解决这个问题,我们首先分析了语音表征和机器学习模型,对32个Ladin音素进行了分类。我们用一个新颖的拉丹筋膜方言数据集进行实验,该数据集是从以意大利为母语的人那里收集的。我们创建了帧级和段级语音特征提取方法,并在9种不同的语音表示上使用8种不同的分类器进行了大量实验。我们的语音表示从传统特征(MFCC、LPC)到通过深度神经网络模型(自动编码器、LSTM自动编码器和WaveNet)学习的特征。我们的最高性能分类器,在语音信号的MFCC表示上进行训练,在所有Ladin音素中实现了86%的平均准确率。我们还获得了所有Ladin音素亚组的平均准确率超过77%。我们的研究结果有助于深入了解有区别的Ladin音素表征,并展示了利用机器学习和语音信号处理保护Ladin和其他濒危语言的潜力。 摘要:A vast majority of the world's 7,000 spoken languages are predicted to become extinct within this century, including the endangered language of Ladin from the Italian Alps. Linguists who work to preserve a language's phonetic and phonological structure can spend hours transcribing each minute of speech from native speakers. To address this problem in the context of Ladin, our paper presents the first analysis of speech representations and machine learning models for classifying 32 phonemes of Ladin. We experimented with a novel dataset of the Fascian dialect of Ladin, collected from native speakers in Italy. We created frame-level and segment-level speech feature extraction approaches and conducted extensive experiments with 8 different classifiers trained on 9 different speech representations. Our speech representations ranged from traditional features (MFCC, LPC) to features learned with deep neural network models (autoencoders, LSTM autoencoders, and WaveNet). Our highest-performing classifier, trained on MFCC representations of speech signals, achieved an 86% average accuracy across all Ladin phonemes. We also obtained average accuracies above 77% for all Ladin phoneme subgroups examined. Our findings contribute insights for learning discriminative Ladin phoneme representations and demonstrate the potential for leveraging machine learning and speech signal processing to preserve Ladin and other endangered languages.
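
下面给出“MFCC 特征 + 简单分类器”这一流程的最小示意(使用 librosa 与 scikit-learn);音频片段与音素标签均为随机合成的演示数据,并非论文的 Ladin 语料或模型。

```python
# 示意:对每个音素片段提取 MFCC 并对时间维取均值,得到定长特征向量后训练分类器。
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

sr = 16000

def segment_to_mfcc(y, sr=sr, n_mfcc=13):
    """提取 MFCC(形状为 n_mfcc x 帧数),再按时间取均值得到定长向量。"""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

# 合成 200 个随机“片段”与随机音素标签,仅用于演示接口
rng = np.random.default_rng(0)
segments = [rng.standard_normal(sr // 2).astype(np.float32) for _ in range(200)]
labels = rng.integers(0, 4, size=200)          # 假设 4 个音素类别

X = np.stack([segment_to_mfcc(s) for s in segments])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
print("训练集准确率(仅为演示):", clf.score(X, labels))
```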

表征(3篇)

【1】 Fine-Grained Chemical Entity Typing with Multimodal Knowledge Representation 标题:基于多模态知识表示的细粒度化学实体分类 链接:https://arxiv.org/abs/2108.12899

作者:Chenkai Sun,Weijiang Li,Jinfeng Xiao,Nikolaus Nova Parulian,ChengXiang Zhai,Heng Ji 摘要:从趋势化学文献中自动发现知识对于更有效的生物医学研究至关重要。如何从核心化学文献中提取关于化学反应的详细知识是一个尚未得到很好研究的新挑战。在本文中,我们研究了细粒度化学实体类型标注的新问题,这带来了有趣的新挑战,特别是因为化学文献中经常提到复杂的名称和实体的图形表示。我们引入了一个新的基准数据集(CHEMET)来促进新任务的研究,并提出了一个新的多模态表示学习框架,通过利用带有化学结构的外部资源并使用跨模态注意力学习化学领域文本的有效表示,来解决细粒度化学实体类型标注问题。实验结果表明,该框架的性能优于多种先进的方法。 摘要:Automated knowledge discovery from trending chemical literature is essential for more efficient biomedical research. How to extract detailed knowledge about chemical reactions from the core chemistry literature is a new emerging challenge that has not been well studied. In this paper, we study the new problem of fine-grained chemical entity typing, which poses interesting new challenges especially because of the complex name mentions frequently occurring in chemistry literature and graphic representation of entities. We introduce a new benchmark data set (CHEMET) to facilitate the study of the new task and propose a novel multi-modal representation learning framework to solve the problem of fine-grained chemical entity typing by leveraging external resources with chemical structures and using cross-modal attention to learn effective representation of text in the chemistry domain. Experiment results show that the proposed framework outperforms multiple state-of-the-art methods.

【2】 Compact representations of convolutional neural networks via weight pruning and quantization 标题:基于权值修剪和量化的卷积神经网络紧致表示 链接:https://arxiv.org/abs/2108.12704

作者:Giosuè Cataldo Marinò,Alessandro Petrini,Dario Malchiodi,Marco Frasca 机构:Università degli Studi di Milano, Milano, Italy 摘要:目前,卷积神经网络(CNN)在解决一些实际问题方面具有最先进的性能。这种学习模型利用了深度学习领域的最新成果,通常会产生具有(至少)数百万个参数的高性能但非常大的神经网络。因此,当只有少量RAM可用时,或者通常在资源有限的平台内,不可能部署此类模型,因此压缩CNN的策略变得至关重要。在本文中,我们提出了一种新的基于信源编码的CNN无损存储格式,并利用了权值修剪和量化。我们从理论上推导了所提出的结构的空间上界,显示了它们与权重矩阵的稀疏性和量化水平的关系。针对矩阵压缩的参考方法,对压缩率和执行时间进行了测试,还讨论了基于权重共享的最新量化方案的经验评估,以评估其应用于卷积层和完全连接层时对性能的影响。在分类和回归问题的四个基准上,与基线预训练的未压缩网络相比,我们在完全连接的层上实现了高达0.6%的空间占用减少,在整个网络上实现了5.44%的空间占用减少,同时表现出至少与基线一样的竞争力。 摘要:The state-of-the-art performance for several real-world problems is currently reached by convolutional neural networks (CNN). Such learning models exploit recent results in the field of deep learning, typically leading to highly performing, yet very large neural networks with (at least) millions of parameters. As a result, the deployment of such models is not possible when only small amounts of RAM are available, or in general within resource-limited platforms, and strategies to compress CNNs became thus of paramount importance. In this paper we propose a novel lossless storage format for CNNs based on source coding and leveraging both weight pruning and quantization. We theoretically derive the space upper bounds for the proposed structures, showing their relationship with both sparsity and quantization levels of the weight matrices. Both compression rates and excution times have been tested against reference methods for matrix compression, and an empirical evaluation of state-of-the-art quantization schemes based on weight sharing is also discussed, to assess their impact on the performance when applied to both convolutional and fully connected layers. On four benchmarks for classification and regression problems and comparing to the baseline pre-trained uncompressed network, we achieved a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitive as the baseline.
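
下面给出幅度剪枝与基于 k-means 的权重共享量化的一个简化 NumPy/scikit-learn 示意,用于说明摘要涉及的两类压缩手段的基本做法;它并不是论文提出的无损存储格式本身,稀疏率与簇数等取值均为示例假设。

```python
# 示意:先对权重矩阵做幅度剪枝,再对非零权重做 k-means 权重共享量化。
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64)).astype(np.float32)

# 1) 幅度剪枝:将绝对值最小的 80% 权重置零
threshold = np.quantile(np.abs(W), 0.8)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)

# 2) 权重共享量化:非零权重聚成 16 个簇,每个权重只需存 4 bit 簇索引 + 共享码本
nonzero = W_pruned[W_pruned != 0].reshape(-1, 1)
kmeans = KMeans(n_clusters=16, n_init=10, random_state=0).fit(nonzero)
codebook = kmeans.cluster_centers_.ravel()
indices = kmeans.predict(nonzero)

W_quant = W_pruned.copy()
W_quant[W_quant != 0] = codebook[indices]
print("稀疏度:", 1 - np.count_nonzero(W_pruned) / W.size,
      "量化后唯一值个数:", np.unique(W_quant).size)
```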

【3】 Representation Memorization for Fast Learning New Knowledge without Forgetting 标题:表征记忆在快速学习新知识中的应用 链接:https://arxiv.org/abs/2108.12596

作者:Fei Mi,Tao Lin,Boi Faltings 机构:Huawei Noah’s Ark Lab, LIA, EPFL 摘要:快速学习新知识(例如新类或数据分布)的能力是迈向人类智能水平的一大步。在本文中,我们考虑的情况下,需要学习新的类或数据分布迅速和增量随着时间的推移,因为它经常发生在现实世界的动态环境。我们提出了“基于记忆的Hebbian参数自适应”(Hebb),以在一个统一的框架内解决实现这一目标的两个主要挑战(即灾难性遗忘和样本效率)。为了减轻灾难性遗忘,Hebb增加了一个常规的神经分类器,其中包含一个不断更新的存储模块,以存储以前数据的表示。为了提高采样效率,我们提出了一种基于著名Hebbian理论的参数自适应方法,该方法直接将输出网络的参数与从存储器中检索到的相似表示“连接”。我们通过在广泛的学习任务(图像分类、语言模型)和学习场景(连续、增量、在线)上进行的大量实验,实证验证了Hebb的卓越性能。我们证明了Hebb有效地缓解了灾难性遗忘,而且它确实比目前最先进的技术更好更快地学习新知识。 摘要:The ability to quickly learn new knowledge (e.g. new classes or data distributions) is a big step towards human-level intelligence. In this paper, we consider scenarios that require learning new classes or data distributions quickly and incrementally over time, as it often occurs in real-world dynamic environments. We propose "Memory-based Hebbian Parameter Adaptation" (Hebb) to tackle the two major challenges (i.e., catastrophic forgetting and sample efficiency) towards this goal in a unified framework. To mitigate catastrophic forgetting, Hebb augments a regular neural classifier with a continuously updated memory module to store representations of previous data. To improve sample efficiency, we propose a parameter adaptation method based on the well-known Hebbian theory, which directly "wires" the output network's parameters with similar representations retrieved from the memory. We empirically verify the superior performance of Hebb through extensive experiments on a wide range of learning tasks (image classification, language model) and learning scenarios (continual, incremental, online). We demonstrate that Hebb effectively mitigates catastrophic forgetting, and it indeed learns new knowledge better and faster than the current state-of-the-art.

优化|敛散性(2篇)

【1】 DQLEL: Deep Q-Learning for Energy-Optimized LoS/NLoS UWB Node Selection 标题:DQLEL:能量优化LOS/NLOS超宽带节点选择的深度Q学习 链接:https://arxiv.org/abs/2108.13157

作者:Zohreh Hajiakhondi-Meybodi,Arash Mohammadi,Ming Hou,Konstantinos N. Plataniotis 摘要:物联网(IoTs)的最新发展引起了人们对室内定位的兴趣,目的是提供可靠、准确和节能的室内导航/定位系统。超宽带(UWB)技术已成为满足上述要求的潜在候选技术。虽然超宽带技术由于使用了较宽的频谱,可以提高室内定位的精度,但其高效实现仍面临着关键挑战。一方面,实现高精度定位依赖于识别/缓解非视线(NLoS)链路,导致定位框架的复杂性显著增加。另一方面,超宽带信标的电池寿命有限,这在实际情况下尤其有问题,因为某些信标位于战略位置。为了应对这些挑战,我们引入了一种有效的节点选择框架,以提高定位精度,而无需使用复杂的NLoS缓解方法,同时保持UWB信标剩余电池寿命之间的平衡。在所提出的深度Q学习能量优化LoS/NLoS(DQLEL)UWB节点选择框架中,移动用户经过自主训练,基于二维到达时间差(TDoA)框架确定用于定位的最佳UWB信标对。从链路条件、UWB信标剩余电池寿命偏差、定位误差和累积奖励等方面评估了所提出的DQLEL框架的有效性。根据仿真结果,所提出的DQLEL框架在上述方面的性能明显优于同类框架。 摘要:Recent advancements in Internet of Things (IoTs) have brought about a surge of interest in indoor positioning for the purpose of providing reliable, accurate, and energy-efficient indoor navigation/localization systems. Ultra Wide Band (UWB) technology has been emerged as a potential candidate to satisfy the aforementioned requirements. Although UWB technology can enhance the accuracy of indoor positioning due to the use of a wide-frequency spectrum, there are key challenges ahead for its efficient implementation. On the one hand, achieving high precision in positioning relies on the identification/mitigation Non Line of Sight (NLoS) links, leading to a significant increase in the complexity of the localization framework. On the other hand, UWB beacons have a limited battery life, which is especially problematic in practical circumstances with certain beacons located in strategic positions. To address these challenges, we introduce an efficient node selection framework to enhance the location accuracy without using complex NLoS mitigation methods, while maintaining a balance between the remaining battery life of UWB beacons. Referred to as the Deep Q-Learning Energy-optimized LoS/NLoS (DQLEL) UWB node selection framework, the mobile user is autonomously trained to determine the optimal pair of UWB beacons to be localized based on the 2-D Time Difference of Arrival (TDoA) framework. The effectiveness of the proposed DQLEL framework is evaluated in terms of the link condition, the deviation of the remaining battery life of UWB beacons, location error, and cumulative rewards. Based on the simulation results, the proposed DQLEL framework significantly outperformed its counterparts across the aforementioned aspects.

【2】 Convergence Rates for Learning Linear Operators from Noisy Data 标题:从噪声数据中学习线性算子的收敛速度 链接:https://arxiv.org/abs/2108.12515

作者:Maarten V. de Hoop,Nikola B. Kovachki,Nicholas H. Nelsen,Andrew M. Stuart 机构: Rice University 备注:33 pages, 4 figures, 3 tables 摘要:我们研究了从随机输入数据上的噪声逐点估计学习Hilbert空间上线性算子的贝叶斯逆问题。我们的框架假设该目标算子是自伴和对角的,在一个与高斯先验和噪声协方差算子共享的基础上,由强加的统计模型产生,并且能够处理紧凑、有界甚至无界的目标算子。当数据数量趋于无穷大时,我们建立了关于Bochner范数族的后验收缩率,并推导了估计误差的相关下界。在大数据限制下,我们还提供了与后验平均点估计相关的适当定义的超额风险和广义间隙泛函的渐近收敛速度。在此过程中,我们将后验一致性结果与非参数学习理论联系起来。此外,与学习有界线性算子或紧致线性算子相比,这些收敛速度突出并量化了学习无界线性算子的难度。数值实验证实了这一理论,并证明在更一般的问题环境中也可以预期类似的结论。 摘要:We study the Bayesian inverse problem of learning a linear operator on a Hilbert space from its noisy pointwise evaluations on random input data. Our framework assumes that this target operator is self-adjoint and diagonal in a basis shared with the Gaussian prior and noise covariance operators arising from the imposed statistical model and is able to handle target operators that are compact, bounded, or even unbounded. We establish posterior contraction rates with respect to a family of Bochner norms as the number of data tend to infinity and derive related lower bounds on the estimation error. In the large data limit, we also provide asymptotic convergence rates of suitably defined excess risk and generalization gap functionals associated with the posterior mean point estimator. In doing so, we connect the posterior consistency results to nonparametric learning theory. Furthermore, these convergence rates highlight and quantify the difficulty of learning unbounded linear operators in comparison with the learning of bounded or compact ones. Numerical experiments confirm the theory and demonstrate that similar conclusions may be expected in more general problem settings.

预测|估计(4篇)

【1】 Survival Prediction of Heart Failure Patients using Stacked Ensemble Machine Learning Algorithm 标题:基于堆叠集成机器学习算法的心力衰竭患者生存预测 链接:https://arxiv.org/abs/2108.13367

作者:S. M Mehedi Zaman,Wasay Mahmood Qureshi,Md. Mohsin Sarker Raihan,Ocean Monjur,Abdullah Bin Shams 机构:Department of Electrical and Electronic Engineering, Islamic University of Technology, Gazipur , Bangladesh, Department of Biomedical Engineering Khulna University of Engineering & Technology, Khulna , Bangladesh 备注:This article has been submitted for publication in Biomedical Physics & Engineering Express 摘要:心血管疾病,尤其是心力衰竭,是我们这个时代的主要健康危害问题之一,也是全世界死亡的主要原因。使用机器学习(ML)模型的数据挖掘技术的进步为有希望的预测方法铺平了道路。数据挖掘是将医疗机构创建的大量原始数据转换为有意义的信息的过程,这些信息有助于做出预测和关键决策。本研究的主要目的是收集心力衰竭患者的各种随访数据,分析这些数据,并利用几种ML模型预测心血管病患者的生存可能性。由于数据集中类的不平衡性,实现了合成少数过采样技术(SMOTE)。我们的研究中使用了两个无监督模型(K-Means和模糊C-Means聚类)和三个有监督分类器(随机森林、XGBoost和决策树)。经过深入的研究,我们的结果证明了监督ML算法优于无监督模型。此外,我们还设计并提出了一个有监督的堆叠集成学习模型,该模型可以实现99.98%的准确率、准确率、召回率和F1分数。我们的研究表明,使用有监督的ML算法,只有从患者身上收集的某些属性才能成功预测心力衰竭后存活的可能性。 摘要:Cardiovascular disease, especially heart failure is one of the major health hazard issues of our time and is a leading cause of death worldwide. Advancement in data mining techniques using machine learning (ML) models is paving promising prediction approaches. Data mining is the process of converting massive volumes of raw data created by the healthcare institutions into meaningful information that can aid in making predictions and crucial decisions. Collecting various follow-up data from patients who have had heart failures, analyzing those data, and utilizing several ML models to predict the survival possibility of cardiovascular patients is the key aim of this study. Due to the imbalance of the classes in the dataset, Synthetic Minority Oversampling Technique (SMOTE) has been implemented. Two unsupervised models (K-Means and Fuzzy C-Means clustering) and three supervised classifiers (Random Forest, XGBoost and Decision Tree) have been used in our study. After thorough investigation, our results demonstrate a superior performance of the supervised ML algorithms over unsupervised models. Moreover, we designed and propose a supervised stacked ensemble learning model that can achieve an accuracy, precision, recall and F1 score of 99.98%. Our study shows that only certain attributes collected from the patients are imperative to successfully predict the surviving possibility post heart failure, using supervised ML algorithms.
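
下面给出摘要流程(SMOTE 类别平衡 + 堆叠集成)的一个概念性 scikit-learn/imbalanced-learn 示意;数据为合成示例,且第三个基学习器用 GradientBoostingClassifier 代替论文中的 XGBoost 以减少依赖,仅用于说明搭建方式。

```python
# 示意:先用 SMOTE 平衡训练集,再用随机森林/梯度提升/决策树组成堆叠集成分类器。
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier, GradientBoostingClassifier,
                              StackingClassifier)
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=12, weights=[0.85, 0.15],
                           random_state=0)   # 模拟类别不平衡的随访数据
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)   # 仅对训练集过采样

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("gb", GradientBoostingClassifier(random_state=0)),
                ("dt", DecisionTreeClassifier(random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X_res, y_res)
print("测试集准确率(演示):", stack.score(X_te, y_te))
```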

【2】 An Interpretable Web-based Glioblastoma Multiforme Prognosis Prediction Tool using Random Forest Model 标题:基于随机森林模型的可解释Web多形性胶质母细胞瘤预后预测工具 链接:https://arxiv.org/abs/2108.13039

作者:Yeseul Kim,Kyung Hwan Kim,Junyoung Park,Hong In Yoon,Wonmo Sung 机构:The Catholic University of Korea, Yonsei University College of Medicine, KAIST 备注:17 pages, 16 figures 摘要:我们提出了预测模型,用于估计GBM患者治疗后一年的健康状况(分类任务),在个体水平上预测GBM患者的长期预后(生存任务)。我们使用了467例GBM患者的临床资料,包括13个特征和两个随访日期。对于随机森林分类器(RFC)和随机生存森林模型(RSF)的基线模型,分别介绍了广义线性模型(GLM)、支持向量机(SVM)和Cox比例危险模型(Cox)、加速失效时间模型(AFT)。在对分层的5倍数据集进行预处理和预处理后,我们使用递归特征消除过程为模型类型生成性能最佳的模型。通过递归特征消除过程,对表现最佳的一年生存/进展状态RFC模型和RSF模型共提取了10、4和13个特征。在分类任务中,表现最好的RFC的AUROC记录为0.6990(一年生存状态分类)和0.7076(一年进展分类),而次佳基线模型(两种情况下的GLM)的AUROC分别记录为0.6691和0.6997。关于生存任务,表现最好的RSF模型的C指数最高为0.7157,IBS最低为0.1038,而次佳的基线模型的C指数分别为0.6556和0.1139。GBM患者的每个特征和预后之间的简化线性相关性(从LIME和虚拟患者组分析中提取)与已证实的医学知识一致。我们的机器学习模型表明,影响GBM患者生存的前三大预后因素是MGMT基因启动子、切除范围和年龄。据我们所知,本研究是第一个引入可解释且医学知识一致的GBM预后预测模型的研究。 摘要:We propose predictive models that estimate GBM patients' health status of one-year after treatments (Classification task), predict the long-term prognosis of GBM patients at an individual level (Survival task). We used total of 467 GBM patients' clinical profile consists of 13 features and two follow-up dates. For baseline models of random forest classifier(RFC) and random survival forest model (RSF), we introduced generalized linear model (GLM), support vector machine (SVM) and Cox proportional hazardous model (COX), accelerated failure time model (AFT) respectively. After preprocessing and prefixing stratified 5-fold data set, we generated best performing models for model types using recursive feature elimination process. Total 10, 4, and 13 features were extracted for best performing one-year survival/progression status RFC models and RSF model via the recursive feature elimination process. In classification task, AUROC of best performing RFC recorded 0.6990 (for one-year survival status classification) and 0.7076 (for one-year progression classification) while that of second best baseline models (GLM in both cases) recorded 0.6691 and 0.6997 respectively. About survival task, the highest C-index of 0.7157 and the lowest IBS of 0.1038 came from the best performing RSF model while that of second best baseline models were 0.6556 and 0.1139 respectively. A simplified linear correlation (extracted from LIME and virtual patient group analysis) between each feature and prognosis of GBM patient were consistent with proven medical knowledge. Our machine learning models suggest that the top three prognostic factors for GBM patient survival were MGMT gene promoter, the extent of resection, and age. To the best of our knowledge, this study is the very first study introducing a interpretable and medical knowledge consistent GBM prognosis predictive models.
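
下面给出“随机森林 + 递归特征消除”这一流程的简化示意(使用 scikit-learn 的 RFECV);数据为合成示例,样本数与特征数仅为呼应摘要设定的假设,并非论文的临床数据。

```python
# 示意:用交叉验证的递归特征消除(RFECV)配合随机森林筛选预后相关特征。
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=467, n_features=13, n_informative=6,
                           random_state=0)   # 模拟 13 个临床特征

selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=200, random_state=0),
    step=1,                                   # 每轮剔除一个特征
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="roc_auc",
)
selector.fit(X, y)
print("被保留的特征数:", selector.n_features_)
print("各特征是否入选:", selector.support_)
```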

【3】 Convolutional versus Dense Neural Networks: Comparing the Two Neural Networks Performance in Predicting Building Operational Energy Use Based on the Building Shape 标题:卷积神经网络与稠密神经网络:基于建筑形状预测建筑运行能耗的两种神经网络性能比较 链接:https://arxiv.org/abs/2108.12929

作者:Farnaz Nazari,Wei Yan 机构:Department of Architecture, Texas A&M University, College Station, TX, USA 备注:The original paper was published in Building Simulation 2021 Conference Proceedings, IBPSA. Errata: the MSE values in the paper have been corrected to be unit free 摘要:除了其他主要影响因素(如材料和窗墙比)外,建筑物自遮阳形状对建筑物接收的直射阳光量有很大影响,并对建筑物运行能耗有很大影响。深度学习有可能通过有效预测建筑节能性能来帮助设计师和工程师。本文评估了两种不同的神经网络结构,密集神经网络(DNN)和卷积神经网络(CNN)在预测建筑物形状方面的运行能耗方面的适用性。两种神经网络的比较表明,DNN模型在性能、简单性和计算时间方面优于CNN模型。然而,基于图像的CNN具有利用建筑图形促进设计交流的优势。 摘要:A building self-shading shape impacts substantially on the amount of direct sunlight received by the building and contributes significantly to building operational energy use, in addition to other major contributing variables, such as materials and window-to-wall ratios. Deep Learning has the potential to assist designers and engineers by efficiently predicting building energy performance. This paper assesses the applicability of two different neural networks structures, Dense Neural Network (DNN) and Convolutional Neural Network (CNN), for predicting building operational energy use with respect to building shape. The comparison between the two neural networks shows that the DNN model surpasses the CNN model in performance, simplicity, and computation time. However, image-based CNN has the benefit of utilizing architectural graphics that facilitates design communication.

【4】 Predicting the Factuality of Reporting of News Media Using Observations About User Attention in Their YouTube Channels 标题:利用对YouTube频道用户注意力的观察预测新闻媒体报道的真实性 链接:https://arxiv.org/abs/2108.12519

作者:Krasimira Bozhanova,Yoan Dinkov,Ivan Koychev,Maria Castaldo,Tommaso Venturini,Preslav Nakov 机构:FMI, Sofia University, “St. Kliment Ohridski”, Sofia, Bulgaria, FMI and GATE, Sofia University, Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, GIPSA-lab, F-, France, Qatar Computing Research Institute, HBKU, Doha, Qatar 备注:None 摘要:我们通过研究YouTube频道中的用户注意周期,提出了一个预测新闻媒体报道真实性的新框架。特别是,我们设计了一组丰富的特征,这些特征源于视频的视图、喜欢、不喜欢和评论数量的时间演变,然后我们将其聚合到通道级别。我们为这项任务开发并发布了一个数据集,其中包含489家新闻媒体在YouTube频道上对用户注意力的观察。我们的实验表明,与最先进的文本表示法相比,两者具有互补性,并有相当大的改进。 摘要:We propose a novel framework for predicting the factuality of reporting of news media outlets by studying the user attention cycles in their YouTube channels. In particular, we design a rich set of features derived from the temporal evolution of the number of views, likes, dislikes, and comments for a video, which we then aggregate to the channel level. We develop and release a dataset for the task, containing observations of user attention on YouTube channels for 489 news media. Our experiments demonstrate both complementarity and sizable improvements over state-of-the-art textual representations.

其他神经网络|深度学习|模型|建模(18篇)

【1】 DNNFusion: Accelerating Deep Neural Networks Execution with Advanced Operator Fusion 标题:DNNFusion:利用高级算子融合加速深度神经网络的执行 链接:https://arxiv.org/abs/2108.13342

作者:Wei Niu,Jiexiong Guan,Yanzhi Wang,Gagan Agrawal,Bin Ren 机构:William & Mary, USA, Northeastern University, USA 摘要:深度神经网络(DNN)已成为移动设备上许多主要应用的核心促成因素。为了实现高精度,DNN模型已经变得越来越深,有数百甚至数千个操作符层,从而导致推理的高内存和计算要求。运算符融合(或内核/层融合)是许多最先进的DNN执行框架(如TensorFlow、TVM和MNN)中的关键优化。然而,这些框架通常采用基于特定模式的融合方法,这些模式限制性太强,无法覆盖操作符和层连接的多样性。另一方面,基于多面体的循环融合技术可以在没有操作员级信息的情况下进行低级的计算,也可能错过潜在的融合机会。为了应对这一挑战,本文提出了一种新颖而广泛的循环融合框架,称为DNNFusion。这项工作的基本思想是从DNN的操作员角度开展工作,但通过开发单个操作员及其组合的分类来扩大融合机会。此外,DNNFusion包括1)一个新的基于数学属性的图重写框架,以降低评估成本并促进后续操作员融合;2)一个集成的融合计划生成,利用高级分析和精确的轻量级分析;3)在融合代码生成过程中进行额外优化。DNNFusion在15个DNN模型上进行了广泛的评估,这些模型具有不同类型的任务、模型大小和层数。评估结果表明,DNNFusion发现了高达8.8倍的融合机会,优于四种最先进的DNN执行框架,加速比为9.3倍。内存需求的减少和加速可以在移动设备上执行许多目标模型,甚至使它们成为实时应用程序的一部分。 摘要:Deep Neural Networks (DNNs) have emerged as the core enabler of many major applications on mobile devices. To achieve high accuracy, DNN models have become increasingly deep with hundreds or even thousands of operator layers, leading to high memory and computational requirements for inference. Operator fusion (or kernel/layer fusion) is key optimization in many state-of-the-art DNN execution frameworks, such as TensorFlow, TVM, and MNN. However, these frameworks usually adopt fusion approaches based on certain patterns that are too restrictive to cover the diversity of operators and layer connections. Polyhedral-based loop fusion techniques, on the other hand, work on a low-level view of the computation without operator-level information, and can also miss potential fusion opportunities. To address this challenge, this paper proposes a novel and extensive loop fusion framework called DNNFusion. The basic idea of this work is to work at an operator view of DNNs, but expand fusion opportunities by developing a classification of both individual operators and their combinations. In addition, DNNFusion includes 1) a novel mathematical-property-based graph rewriting framework to reduce evaluation costs and facilitate subsequent operator fusion, 2) an integrated fusion plan generation that leverages the high-level analysis and accurate light-weight profiling, and 3) additional optimizations during fusion code generation. DNNFusion is extensively evaluated on 15 DNN models with varied types of tasks, model sizes, and layer counts. The evaluation results demonstrate that DNNFusion finds up to 8.8x higher fusion opportunities, outperforms four state-of-the-art DNN execution frameworks with 9.3x speedup. The memory requirement reduction and speedups can enable the execution of many of the target models on mobile devices and even make them part of a real-time application.

【2】 E-Commerce Promotions Personalization via Online Multiple-Choice Knapsack with Uplift Modeling 标题:基于提升模型的在线多选背包电子商务推广个性化 链接:https://arxiv.org/abs/2108.13298

作者:Javier Albert,Dmitri Goldenberg 摘要:促销和折扣是现代电子商务平台的重要组成部分,通常用于激励客户完成购买。促销也会影响收入,并可能导致金钱损失,而金钱损失通常受到专用促销预算的限制。我们研究了在线约束多项选择促销个性化问题,其中优化目标是为每个客户选择要提供的促销,以最大限度地提高购买完成率,同时遵守全球预算限制。我们的工作将该问题形式化为一个在线多项选择背包问题,并通过解决具有负权重和负值的案例扩展了现有文献。我们提供了一种实时自适应方法,确保符合预算约束,并在各种数据集上实现99.7%以上的最佳促销效果。我们的方法是在世界领先的在线旅游平台之一的大规模实验研究中评估的。 摘要:Promotions and discounts are essential components of modern e-commerce platforms, where they are often used to incentivize customers towards purchase completion. Promotions also affect revenue and may incur a monetary loss that is often limited by a dedicated promotional budget. We study the Online Constrained Multiple-Choice Promotions Personalization Problem, where the optimization goal is to select for each customer which promotion to present in order to maximize purchase completions, while also complying with global budget limitations. Our work formalizes the problem as an Online Multiple Choice Knapsack Problem and extends the existent literature by addressing cases with negative weights and values. We provide a real-time adaptive method that guarantees budget constraints compliance and achieves above 99.7% of the optimal promotional impact on various datasets. Our method is evaluated on a large-scale experimental study at one of the leading online travel platforms in the world.
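
下面给出在线多项选择背包式促销分配的一个极简启发式示意:按“提升值 − λ×成本”为每位到达客户挑选促销,并根据预算消耗进度自适应调整 λ。这只是此类问题常见的拉格朗日式近似草图,并非论文的具体算法;其中的提升值与成本均为随机生成的假设数据。

```python
# 示意:带全局预算约束的在线促销分配(简化的对偶价格启发式)。
import numpy as np

rng = np.random.default_rng(0)
budget, spent, lam = 1000.0, 0.0, 0.1
n_customers, n_promos = 5000, 4

for t in range(n_customers):
    uplift = rng.uniform(0.0, 0.1, n_promos)      # 假设由提升模型给出的各促销预估提升
    cost = rng.uniform(0.5, 5.0, n_promos)        # 各促销的预期成本
    scores = np.append(uplift - lam * cost, 0.0)  # 最后一项表示“不发促销”
    choice = int(np.argmax(scores))
    if choice < n_promos and spent + cost[choice] <= budget:
        spent += cost[choice]
    # 按预算消耗进度调整对偶价格 λ:花得太快则提高 λ,反之降低
    pace = spent / budget - (t + 1) / n_customers
    lam = max(0.0, lam + 0.05 * pace)

print(f"总花费 {spent:.1f} / 预算 {budget}")
```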

【3】 Feature Importance in a Deep Learning Climate Emulator 标题:深度学习气候模拟器中的特征重要性 链接:https://arxiv.org/abs/2108.13203

作者:Wei Xu,Xihaier Luo,Yihui Ren,Ji Hwan Park,Shinjae Yoo,Balasubramanya T. Nadiga 机构:Computational Science Initiative, Brookhaven National Laboratory, Upton, NY , USA, Los Alamos National Laboratory, Los Alamos, NM , USA 摘要:我们提出了一项使用一类事后局部解释方法(即特征重要性方法)来“理解”深度学习(DL)气候模拟器的研究。具体而言,我们考虑一个多输入单输出模拟器,它采用DenseNet编码器-解码器架构,并使用前36个月(经适当滤波)的SST数据进行训练,以预测提前1、6和9个月的海表面温度(SST)年际变化。首先,对单个预测采用特征重要性方法,在时空上识别在所选地理区域和所选预测提前期对模型预测非常重要的输入特征。在第二步中,我们还通过聚合训练样本上的重要性热图,从整体上考察特征重要性的行为。我们发现:1)气候模拟器对任何地理位置的预测主要取决于它周围的一个小邻域;2)预测提前期越长,“重要性”向过去延伸得越远;3)在首阶近似下,“重要性”的时间衰减与地理位置无关。我们通过消融实验验证了上述发现。从气候动力学的角度来看,这些发现表明,在我们考虑的空间和时间尺度上,局地过程起主导作用,而遥相关的作用可以忽略不计。从网络架构的角度来看,我们发现的输入与输出之间的时空关系暗示了潜在的模型改进。我们还讨论了方法的进一步扩展,其中一些正在我们进行中的工作中加以考虑。 摘要:We present a study using a class of post-hoc local explanation methods i.e., feature importance methods for "understanding" a deep learning (DL) emulator of climate. Specifically, we consider a multiple-input-single-output emulator that uses a DenseNet encoder-decoder architecture and is trained to predict interannual variations of sea surface temperature (SST) at 1, 6, and 9 month lead times using the preceding 36 months of (appropriately filtered) SST data. First, feature importance methods are employed for individual predictions to spatio-temporally identify input features that are important for model prediction at chosen geographical regions and chosen prediction lead times. In a second step, we also examine the behavior of feature importance in a generalized sense by considering an aggregation of the importance heatmaps over training samples. We find that: 1) the climate emulator's prediction at any geographical location depends dominantly on a small neighborhood around it; 2) the longer the prediction lead time, the further back the "importance" extends; and 3) to leading order, the temporal decay of "importance" is independent of geographical location. An ablation experiment is adopted to verify the findings. From the perspective of climate dynamics, these findings suggest a dominant role for local processes and a negligible role for remote teleconnections at the spatial and temporal scales we consider. From the perspective of network architecture, the spatio-temporal relations between the inputs and outputs we find suggest potential model refinements. We discuss further extensions of our methods, some of which we are considering in ongoing work.
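
作为摘要所述事后局部解释方法的一个直观示意,下面用 PyTorch 演示“输入梯度 × 输入”式的显著性归因;模型与数据均为随机构造的示例,并非论文中的 DenseNet 气候模拟器或其具体归因方法。

```python
# 示意:对训练好的回归模型,取目标网格点的预测值做反向传播,得到逐月、逐点的重要性热图。
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(36, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 1, 3, padding=1))      # 36 个月的 SST 作为输入通道
model.eval()

x = torch.randn(1, 36, 32, 64, requires_grad=True)         # (batch, 月份, 纬度, 经度)
y = model(x)

target = y[0, 0, 16, 32]     # 选定某个地理位置的预测值作为标量输出
target.backward()

importance = (x.grad * x).detach().abs()                    # gradient × input 归因
print("重要性张量形状:", importance.shape)                  # 与输入同形,可按月份/位置可视化
```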

【4】 Reachability Is NP-Complete Even for the Simplest Neural Networks 标题:即使对于最简单的神经网络,可达性也是NP完全的 链接:https://arxiv.org/abs/2108.13179

作者:Marco Sälzer,Martin Lange 机构:School of Electr. Eng. and Computer Science, University of Kassel, Germany 摘要:我们研究了(深度)神经网络可达性问题的复杂性:它是否在给定有效输入的情况下计算有效输出?最近有人声称,对于一般神经网络和连接输入/输出规范,该问题是NP完全问题。我们修复了原始上下限证明中的一些缺陷。然后,我们证明了NP硬度已经适用于限制类的简单规范和只有一层的神经网络,以及对出现的参数要求最低的神经网络。 摘要:We investigate the complexity of the reachability problem for (deep) neural networks: does it compute valid output given some valid input? It was recently claimed that the problem is NP-complete for general neural networks and conjunctive input/output specifications. We repair some flaws in the original upper and lower bound proofs. We then show that NP-hardness already holds for restricted classes of simple specifications and neural networks with just one layer, as well as neural networks with minimal requirements on the occurring parameters.

【5】 Data-driven Small-signal Modeling for Converter-based Power Systems 标题:基于换流器的电力系统数据驱动小信号建模 链接:https://arxiv.org/abs/2108.13046

作者:Francesca Rossi,Eduardo Prieto-Araujo,Marc Cheah-Mane,Oriol Gomis-Bellmunt 机构:CITCEA-UPC, Department of Electrical Engineering, Universitat Politecnica de Catalunya, Av. Diagonal , Barcelona, Spain, Spain 摘要:本文详细介绍了推导数据驱动小信号模型的完整过程,该模型可用于执行基于变流器的电力系统相关研究。为了计算模型,采用了决策树(DT)回归(使用单DT和集成DT)以及样条回归,并从精度、训练和计算时间方面比较了它们的性能。该方法包括一个全面的逐步开发模型的过程:通过常规模拟和数学模型生成数据、数据库(DBs)安排、回归训练和测试、实现对新实例的预测。该方法已使用基本网络开发,然后在更复杂的系统上进行测试,以证明所建议方法的有效性和实用性。这两个电力系统测试用例都具有基于换流器的电力系统的基本特征,模拟了换流器接口发电的高渗透性和HVDC链路的存在。此外,还提出了如何利用DT回归以可视化方式表示各种系统运行条件下的小信号稳定性分析结果。最后,讨论了该模型可能的应用,强调了该模型在电力系统小信号相关研究中的潜力。 摘要:This article details a complete procedure to derive a data-driven small-signal-based model useful to perform converter-based power system related studies. To compute the model, Decision Tree (DT) regression, both using single DT and ensemble DT, and Spline regression have been employed and their performances have been compared, in terms of accuracy, training and computing time. The methodology includes a comprehensive step-by-step procedure to develop the model: data generation by conventional simulation and mathematical models, databases (DBs) arrangement, regression training and testing, realizing prediction for new instances. The methodology has been developed using an essential network and then tested on a more complex system, to show the validity and usefulness of the suggested approach. Both power systems test cases have the essential characteristics of converter-based power systems, simulating high penetration of converter interfaced generation and the presence of HVDC links. Moreover, it is proposed how to represent in a visual manner the results of the small-signal stability analysis for a wide range of system operating conditions, exploiting DT regressions. Finally, the possible applications of the model are discussed, highlighting the potential of the developed model in further power system small-signal related studies.

【6】 Normalizing Field Flows: Solving forward and inverse stochastic differential equations using Physics-Informed flow model 标题:归一化场流:使用物理信息流模型求解正反随机微分方程 链接:https://arxiv.org/abs/2108.12956

作者:Ling Guo,Hao Wu,Tao Zhou 机构:Department of Mathematics, Shanghai Normal University, Shanghai, China, School of Mathematical Sciences, Tongji University, Shanghai, China 摘要:在这项工作中,我们介绍了从散射测量中学习随机场的归一化场流(NFF)。更准确地说,我们在参考随机场(例如,具有Karhunen-Loève展开结构的高斯随机场)和目标随机场之间构造了一个双射变换(由神经网络表征的规范化流),其中KL展开系数和可逆网络通过最大化分散测量的对数似然和进行训练。该NFF模型可用于在统一框架内求解数据驱动的正、逆和混合正/逆随机偏微分方程。我们证明了所提出的NFF模型学习非高斯过程、混合高斯过程以及正逆随机偏微分方程的能力。 摘要:We introduce in this work the normalizing field flows (NFF) for learning random fields from scattered measurements. More precisely, we construct a bijective transformation (a normalizing flow characterizing by neural networks) between a reference random field (say, a Gaussian random field with the Karhunen-Loève expansion structure) and the target stochastic field, where the KL expansion coefficients and the invertible networks are trained by maximizing the sum of the log-likelihood on scattered measurements. This NFF model can be used to solve data-driven forward, inverse, and mixed forward/inverse stochastic partial differential equations in a unified framework. We demonstrate the capability of the proposed NFF model for learning Non Gaussian processes, mixed Gaussian processes, and forward & inverse stochastic partial differential equations.
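
以下用 LaTeX 给出摘要中两个关键要素的常见数学形式作为示意:参考高斯随机场的 Karhunen-Loève 展开,以及在散射观测上最大化的变量替换对数似然;记号为示例性选择,并非论文原文。

```latex
% 参考高斯随机场的截断 Karhunen-Loève 展开:
\[
u_0(x,\omega) \;\approx\; \bar{u}(x) + \sum_{i=1}^{d} \sqrt{\lambda_i}\,\xi_i(\omega)\,\phi_i(x),
\qquad \xi_i \sim \mathcal{N}(0,1).
\]
% 由可逆网络 f 给出的变量替换对数似然,在散射观测 y_j 上最大化:
\[
\max_{f,\;\{\lambda_i,\phi_i\}} \;\sum_{j}\Big[\log p_{U_0}\!\big(f^{-1}(y_j)\big)
 + \log\Big|\det \tfrac{\partial f^{-1}}{\partial y}(y_j)\Big|\Big].
\]
```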

【7】 Growing Cosine Unit: A Novel Oscillatory Activation Function That Can Speedup Training and Reduce Parameters in Convolutional Neural Networks 标题:增长余弦单元:一种新的加速训练和降低卷积神经网络参数的振荡激活函数 链接:https://arxiv.org/abs/2108.12943

作者:Mathew Mithra Noel,Arunkumar L,Advait Trivedi,Praneet Dutta 机构:Vellore Institute of Technology 摘要:卷积神经网络已经成功地解决了许多重要的社会和经济问题。他们分层学习复杂高维函数的能力可归因于使用非线性激活函数。使训练深度网络可行的一个关键发现是采用校正线性单元(ReLU)激活函数来缓解使用饱和激活函数引起的消失梯度问题。从那时起,已经提出了许多改进的ReLU激活变体。然而,由于其生物学合理性,目前使用的大多数激活函数都是非振荡和单调递增的。本文证明了振荡激活函数可以改善梯度流,减小网络规模。研究表明,振荡激活函数允许神经元在神经元超平面正半空间和负半空间内部切换分类(输出符号),允许神经元数量较少的复杂决策。提出了一种新的振荡激活函数C(z)=z cos z,它在各种体系结构和基准测试上都优于Sigmoids、Swish、Mish和ReLU。这种新的激活函数甚至允许单个神经元表现出非线性决策边界。本文给出了著名的异或问题的单神经元解法。实验结果表明,用C(z)取代卷积层中的激活函数可以显著提高CIFAR-10、CIFAR-100和Imagenette的性能。 摘要:Convolution neural networks have been successful in solving many socially important and economically significant problems. Their ability to learn complex high-dimensional functions hierarchically can be attributed to the use of nonlinear activation functions. A key discovery that made training deep networks feasible was the adoption of the Rectified Linear Unit (ReLU) activation function to alleviate the vanishing gradient problem caused by using saturating activation functions. Since then many improved variants of the ReLU activation have been proposed. However a majority of activation functions used today are non-oscillatory and monotonically increasing due to their biological plausibility. This paper demonstrates that oscillatory activation functions can improve gradient flow and reduce network size. It is shown that oscillatory activation functions allow neurons to switch classification (sign of output) within the interior of neuronal hyperplane positive and negative half-spaces allowing complex decisions with fewer neurons. A new oscillatory activation function C(z) = z cos z that outperforms Sigmoids, Swish, Mish and ReLU on a variety of architectures and benchmarks is presented. This new activation function allows even single neurons to exhibit nonlinear decision boundaries. This paper presents a single neuron solution to the famous XOR problem. Experimental results indicate that replacing the activation function in the convolutional layers with C(z) significantly improves performance on CIFAR-10, CIFAR-100 and Imagenette.
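
下面给出摘要中激活函数 C(z) = z·cos z 的最小 PyTorch 示意,并验证“单个神经元即可给出 XOR 的非线性决策边界”这一说法;其中权重取值为本速递自行选取的示例,并非论文给出的解。

```python
# 示意:增长余弦单元(GCU)激活函数,以及单个 GCU 神经元解 XOR 的数值验证。
import math
import torch
import torch.nn as nn

class GCU(nn.Module):
    """增长余弦单元:C(z) = z * cos(z)"""
    def forward(self, z):
        return z * torch.cos(z)

gcu = GCU()
x = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])   # XOR 的四个输入
w = torch.tensor([math.pi, math.pi])
b = 1.0

z = x @ w + b                   # 单个神经元的预激活
pred = (gcu(z) < 0).int()       # 这组权重下,C(z) < 0 恰好对应 XOR 为 1
print(pred.tolist())            # -> [0, 1, 1, 0]
```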

【8】 CrossedWires: A Dataset of Syntactically Equivalent but Semantically Disparate Deep Learning Models 标题:CrossedWires:句法等价但语义不同的深度学习模型的数据集 链接:https://arxiv.org/abs/2108.12768

作者:Max Zvyagin,Thomas Brettin,Arvind Ramanathan,Sumit Kumar Jha 机构:Argonne National Laboratory, Lemont IL , Sumit K. Jha, Computer Science Department, University of Texas at San Antonio 备注:13 pages 摘要:尽管使用相同的神经网络结构和相同的训练超参数(如学习率和优化算法的选择),但使用不同的深度学习框架对神经网络进行训练可能会导致精度水平的显著差异。目前,我们构建标准化深度学习模型的能力受到一套神经网络和相应的训练超参数基准的限制,这些基准暴露了现有深度学习框架之间的差异。在本文中,我们提供了一个模型和超参数的动态数据集,称为CrossedWires,它揭示了两种流行的深度学习框架:PyTorch和Tensorflow之间的语义差异。CrossedWires数据集目前由使用三种不同计算机视觉体系结构(VGG16、ResNet50和DenseNet121)在大型超参数空间中对CIFAR10图像进行训练的模型组成。使用超参数优化,三个模型中的每一个都根据超空间搜索算法建议的400组超参数进行训练。CrossedWires数据集包括Pytork和Tensforflow模型,在语法等效模型和相同超参数选择上的测试精度差异为0.681。本文提供的340 GB数据集和基准包括所有1200个超参数选择的性能统计、训练曲线和模型权重,总共有2400个模型。CrossedWires数据集提供了一个研究跨流行深度学习框架的语法等价模型之间语义差异的机会。此外,从本研究中获得的见解可以帮助开发算法和工具,提高深度学习框架的可靠性和再现性。该数据集可在以下网站免费获取:https://github.com/maxzvyagin/crossedwires 通过Python API和直接下载链接。 摘要:The training of neural networks using different deep learning frameworks may lead to drastically differing accuracy levels despite the use of the same neural network architecture and identical training hyperparameters such as learning rate and choice of optimization algorithms. Currently, our ability to build standardized deep learning models is limited by the availability of a suite of neural network and corresponding training hyperparameter benchmarks that expose differences between existing deep learning frameworks. In this paper, we present a living dataset of models and hyperparameters, called CrossedWires, that exposes semantic differences between two popular deep learning frameworks: PyTorch and Tensorflow. The CrossedWires dataset currently consists of models trained on CIFAR10 images using three different computer vision architectures: VGG16, ResNet50 and DenseNet121 across a large hyperparameter space. Using hyperparameter optimization, each of the three models was trained on 400 sets of hyperparameters suggested by the HyperSpace search algorithm. The CrossedWires dataset includes PyTorch and Tensforflow models with test accuracies as different as 0.681 on syntactically equivalent models and identical hyperparameter choices. The 340 GB dataset and benchmarks presented here include the performance statistics, training curves, and model weights for all 1200 hyperparameter choices, resulting in 2400 total models. The CrossedWires dataset provides an opportunity to study semantic differences between syntactically equivalent models across popular deep learning frameworks. Further, the insights obtained from this study can enable the development of algorithms and tools that improve reliability and reproducibility of deep learning frameworks. The dataset is freely available at https://github.com/maxzvyagin/crossedwires through a Python API and direct download link.

【9】 Prototypes-Guided Memory Replay for Continual Learning 标题:用于持续学习的原型引导记忆回放 链接:https://arxiv.org/abs/2108.12641

作者:Stella Ho,Ming Liu,Lan Du,Longxiang Gao,Yong Xiang 机构: Gao is with the School of Information Tech-nology, Deakin University 摘要:持续学习(CL)指的是一种机器学习范式,它仅使用少量的训练样本和先前学习的知识来提高学习性能。CL模型以顺序的方式学习来自不同领域的任务。CL的主要困难是由于数据分布的变化而导致的对先前学习任务的灾难性遗忘。现有的CL模型通常采用基于重播的方法来减少灾难性遗忘。大多数CL模型随机选择以前看到的样本来保留所学知识。然而,随着学习任务的积累,占用内存的大小不断增大。因此,我们提出了一种内存有效的CL方法。我们设计了一个动态原型引导记忆回放模块,将其整合到在线元学习模型中。我们进行了大量的文本分类实验,并进一步研究了训练集顺序对CL模型性能的影响。实验结果证明了该方法在缓解灾难性遗忘和实现有效知识转移方面的优越性。 摘要:Continual learning (CL) refers to a machine learning paradigm that using only a small account of training samples and previously learned knowledge to enhance learning performance. CL models learn tasks from various domains in a sequential manner. The major difficulty in CL is catastrophic forgetting of previously learned tasks, caused by shifts in data distributions. The existing CL models often employ a replay-based approach to diminish catastrophic forgetting. Most CL models stochastically select previously seen samples to retain learned knowledge. However, occupied memory size keeps enlarging along with accumulating learned tasks. Hereby, we propose a memory-efficient CL method. We devise a dynamic prototypes-guided memory replay module, incorporating it into an online meta-learning model. We conduct extensive experiments on text classification and additionally investigate the effect of training set orders on CL model performance. The experimental results testify the superiority of our method in alleviating catastrophic forgetting and enabling efficient knowledge transfer.

【10】 Layer-wise Model Pruning based on Mutual Information 标题:基于互信息的分层模型剪枝 链接:https://arxiv.org/abs/2108.12594

作者:Chun Fan,Jiwei Li,Xiang Ao,Fei Wu,Yuxian Meng,Xiaofei Sun 机构:♦Zhejiang University, ♠Computer Center of Peking University, ⋆Peng Cheng Laboratory, ▼Key Lab of Intelligent Information Processing of Chinese Academy of Sciences, ♣ Shannon.AI 备注:To appear at EMNLP2021 摘要:与基于权重的剪枝技术相比,所提出的剪枝策略具有以下优点:(1)避免了不规则的内存访问,因为表示和矩阵可以压缩到更小但更密集的对应项中,从而导致更大的加速(2) 该方法采用自上而下的剪枝方式,基于顶层的训练信号,从更全局的角度进行操作,并通过将全局信号的效果传播到各层来剪枝,从而在相同的稀疏度水平下获得更好的性能。大量实验表明,在相同的稀疏度水平下,该策略比基于权重的剪枝方法(如幅度剪枝、运动剪枝)具有更高的加速比和性能。 摘要:The proposed pruning strategy offers merits over weight-based pruning techniques: (1) it avoids irregular memory access since representations and matrices can be squeezed into their smaller but dense counterparts, leading to greater speedup; (2) in a manner of top-down pruning, the proposed method operates from a more global perspective based on training signals in the top layer, and prunes each layer by propagating the effect of global signals through layers, leading to better performances at the same sparsity level. Extensive experiments show that at the same sparsity level, the proposed strategy offers both greater speedup and higher performances than weight-based pruning methods (e.g., magnitude pruning, movement pruning).

【11】 Approximate Bayesian Optimisation for Neural Networks 标题:神经网络的近似贝叶斯优化 链接:https://arxiv.org/abs/2108.12461

作者:Nadhir Hassen,Irina Rish 机构:MILA, Quebec, Canada 摘要:已经做了大量工作来自动化机器学习算法,以突出模型选择的重要性。自动选择最佳预测模型及其相应参数的过程可以改善广泛的实际应用。贝叶斯优化(BO)使用黑盒优化方法,通过采集函数根据探索-利用权衡准则提出解决方案。BO框架包含两个关键要素:一个是概率替代模型,该模型由未知目标函数(数据相关)的先验信念组成,另一个是描述模型拟合的最优程度的目标函数。选择最佳模型及其相关超参数可能非常昂贵,并且通常使用高斯过程(GPs)进行拟合,并且在某些情况下由于其难处理性而应用近似推理。然而,由于高斯过程(GPs)的计算量与观测值数量成立方关系,因此处理需要大量评估才能优化的目标函数一直是一个挑战。此外,大多数真实数据集是非平稳的,它们对代理模型做出了理想化的假设。以随机方式解决分析可处理性和计算可行性的必要性能够确保贝叶斯优化的效率和适用性。在本文中,我们探讨了使用神经网络代替GPs来模拟函数上的分布,我们提供了密度比估计和基于近似推理的类概率估计之间的联系,这种重新构造提供了算法效率和可处理性。 摘要:A body of work has been done to automate machine learning algorithm to highlight the importance of model choice. Automating the process of choosing the best forecasting model and its corresponding parameters can result to improve a wide range of real-world applications. Bayesian optimisation (BO) uses a blackbox optimisation methods to propose solutions according to an exploration-exploitation trade-off criterion through acquisition functions. BO framework imposes two key ingredients: a probabilistic surrogate model that consist of prior belief of the unknown objective function(data-dependant) and an objective function that describes how optimal is the model-fit. Choosing the best model and its associated hyperparameters can be very expensive, and is typically fit using Gaussian processes (GPs) and at some extends applying approximate inference due its intractability. However, since GPs scale cubically with the number of observations, it has been challenging to handle objectives whose optimization requires many evaluations. In addition, most real-dataset are non-stationary which make idealistic assumptions on surrogate models. The necessity to solve the analytical tractability and the computational feasibility in a stochastic fashion enables to ensure the efficiency and the applicability of Bayesian optimisation. In this paper we explore the use of neural networks as an alternative to GPs to model distributions over functions, we provide a link between density-ratio estimation and class probability estimation based on approximate inference, this reformulation provides algorithm efficiency and tractability.
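
摘要提到密度比估计与类别概率估计之间的联系。下面给出这一思路的一个简化示意:把已评估的超参数按目标值分成“好/差”两类,用神经网络分类器的预测概率充当采集函数来挑选下一个候选点;目标函数、分位数阈值与候选集规模均为示例假设,并非论文的具体算法。

```python
# 示意:基于类别概率估计的贝叶斯优化草图(一维最小化问题)。
import numpy as np
from sklearn.neural_network import MLPClassifier

def objective(x):                        # 待最小化的示例目标函数
    return np.sin(3 * x) + 0.5 * x

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(10, 1))     # 初始随机评估点
y = objective(X).ravel()

for _ in range(20):
    gamma = np.quantile(y, 0.25)                       # 目标值前 25% 视为“好”
    labels = (y <= gamma).astype(int)
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                        random_state=0).fit(X, labels)
    cand = rng.uniform(-2, 2, size=(256, 1))           # 随机候选集
    x_next = cand[np.argmax(clf.predict_proba(cand)[:, 1])]   # 选“好”概率最大的点
    X = np.vstack([X, x_next[None, :]])
    y = np.append(y, objective(x_next)[0])

print("找到的最优点:", X[np.argmin(y)].item(), "目标值:", y.min())
```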

【12】 Multimodal Data Fusion in High-Dimensional Heterogeneous Datasets via Generative Models 标题:基于产生式模型的高维异构数据集多模态数据融合 链接:https://arxiv.org/abs/2108.12445

作者:Yasin Yilmaz,Mehmet Aktukmak,Alfred O. Hero 机构:and 摘要:常用的潜在空间嵌入技术,如主成分分析、因子分析和流形学习技术,通常用于学习同质数据的有效表示。然而,它们不容易扩展到由数字和分类变量组合而成的异构数据,例如,由链接GPS和文本数据产生的数据。在本文中,我们感兴趣的是以无监督的方式从高维异构数据中学习概率生成模型。学习的生成模型提供潜在的统一表示,捕捉数据多维度的共同因素,从而能够为各种机器学习任务融合多模态数据。根据贝叶斯方法,我们提出了一个通用框架,通过指数分布族的自然参数化,将不同的数据类型组合在一起。为了将模型推理扩展到具有数千个特征的数百万实例,我们使用拉普拉斯-伯恩斯坦近似进行涉及非线性连接函数的后验计算。针对具有实值(高斯)和分类(多项式)特征的常见异构数据集,详细介绍了该算法。在两个高维异构数据集(NYC-Taxi和MovieLens-10M)上的实验表明,该算法在不同的机器学习任务(如异常检测、数据插补和推荐系统)上具有可扩展性和竞争力。 摘要:The commonly used latent space embedding techniques, such as Principal Component Analysis, Factor Analysis, and manifold learning techniques, are typically used for learning effective representations of homogeneous data. However, they do not readily extend to heterogeneous data that are a combination of numerical and categorical variables, e.g., arising from linked GPS and text data. In this paper, we are interested in learning probabilistic generative models from high-dimensional heterogeneous data in an unsupervised fashion. The learned generative model provides latent unified representations that capture the factors common to the multiple dimensions of the data, and thus enable fusing multimodal data for various machine learning tasks. Following a Bayesian approach, we propose a general framework that combines disparate data types through the natural parameterization of the exponential family of distributions. To scale the model inference to millions of instances with thousands of features, we use the Laplace-Bernstein approximation for posterior computations involving nonlinear link functions. The proposed algorithm is presented in detail for the commonly encountered heterogeneous datasets with real-valued (Gaussian) and categorical (multinomial) features. Experiments on two high-dimensional and heterogeneous datasets (NYC Taxi and MovieLens-10M) demonstrate the scalability and competitive performance of the proposed algorithm on different machine learning tasks such as anomaly detection, data imputation, and recommender systems.

【13】 Functional Nanomaterials Design in the Workflow of Building Machine-Learning Models 标题:构建机器学习模型工作流程中的功能纳米材料设计 链接:https://arxiv.org/abs/2108.13171

作者:Zhexu Xi 机构:Bristol Centre for Functional Nanomaterials, University of Bristol, Bristol, UK 备注:12 pages, 1 figure, 84 references 摘要:机器学习(ML)技术在新型功能材料的设计、合成、制造、表征和应用方面,特别是在纳米尺度上,取得了快速、高效的发现,从而彻底改变了化学和材料科学的许多研究领域。其原因是时间效率、预测精度和良好的泛化能力,逐渐取代了传统的实验或计算工作。ML具有解决更多现实问题的巨大潜力,它在构建ML模型的基本程序下,提供了对分子/材料组合的更全面的洞察,如根据给定参数预测特性或功能,纳米架构设计和生成用于其他目的的特定模型。纳米材料发现进展的关键在于如何将输入指纹和输出值定量地联系起来。最后,总结了该领域的一些重大机遇和技术挑战。 摘要:Machine-learning (ML) techniques have revolutionized a host of research fields of chemical and materials science with accelerated, high-efficiency discoveries in design, synthesis, manufacturing, characterization and application of novel functional materials, especially at the nanometre scale. The reason is the time efficiency, prediction accuracy and good generalization abilities, which gradually replaces the traditional experimental or computational work. With enormous potentiality to tackle more real-world problems, ML provides a more comprehensive insight into combinations with molecules/materials under the fundamental procedures for constructing ML models, like predicting properties or functionalities from given parameters, nanoarchitecture design and generating specific models for other purposes. The key to the advances in nanomaterials discovery is how input fingerprints and output values can be linked quantitatively. Finally, some great opportunities and technical challenges are concluded in this fantastic field.

【14】 Thermodynamics-based Artificial Neural Networks (TANN) for multiscale modeling of materials with inelastic microstructure 标题:基于热力学的人工神经网络(TANN)在非弹性微结构材料多尺度建模中的应用 链接:https://arxiv.org/abs/2108.13137

作者:Filippo Masi,Ioannis Stefanou 机构:Institut de Recherche en G´enie Civil et M´ecanique, UMR , CNRS, Ecole Centrale de Nantes, Universit´e de Nantes, rue de la No¨e, F-, Nantes, France. 摘要:具有微观结构的非弹性材料的力学行为非常复杂,很难用启发式的经验本构模型来理解。为此,多尺度均匀化方法通常用于对微观结构固体的宏观力学行为进行可靠、准确的预测。然而,这种方法的计算成本非常高,并且对于涉及非弹性材料的实际规模应用来说是禁止的。近年来,基于深度学习的数据驱动方法已成为替代特殊本构关系和加速多尺度数值方法的一种有希望的替代方法。然而,这些方法缺乏基于物理定律的严格框架。因此,它们在非弹性复杂微观结构模型材料中的应用尚未建立。在这里,我们提出了基于热力学的人工神经网络(TANN)用于非弹性和复杂微观结构材料的本构建模。我们的方法集成了热力学感知降维技术和深度神经网络来识别复杂非弹性材料的本构关系和内部状态变量。在微观和宏观尺度上,通过几个例子证明了TANN在提供高保真度、物理一致性预测方面的能力。特别是,我们展示了TANN在预测非弹性中规则和扰动晶格微观结构的平均和局部应力应变响应、内能和耗散方面的效率和准确性。最后,采用双尺度均匀化方法求解了一个大尺度边值问题。通过详细的比较说明了使用TANN的均匀化模型的高性能。对于各种单调的和循环的应力-应变路径,结果显示了极好的一致性。 摘要:The mechanical behavior of inelastic materials with microstructure is very complex and hard to grasp with heuristic, empirical constitutive models. For this purpose, multiscale, homogenization approaches are often used for performing reliable, accurate predictions of the macroscopic mechanical behavior of microstructured solids. Nevertheless, the calculation cost of such approaches is extremely high and prohibitive for real-scale applications involving inelastic materials. Recently, data-driven approaches based on deep learning have risen as a promising alternative to replace ad-hoc constitutive laws and speed-up multiscale numerical methods. However, such approaches lack a rigorous frame based on the laws of physics. As a result, their application to model materials with complex microstructure in inelasticity is not yet established. Here, we propose Thermodynamics-based Artificial Neural Networks (TANN) for the constitutive modeling of materials with inelastic and complex microstructure. Our approach integrates thermodynamics-aware dimensionality reduction techniques and deep neural networks to identify the constitutive laws and the internal state variables of complex inelastic materials. The ability of TANN in delivering high-fidelity, physically consistent predictions is demonstrated through several examples both at the microscopic and macroscopic scale. In particular, we show the efficiency and accuracy of TANN in predicting the average and local stress-strain response, the internal energy and the dissipation of both regular and perturbed lattice microstructures in inelasticity. Finally, a double-scale homogenization scheme is used to solve a large scale boundary value problem. The high performance of the homogenized model using TANN is illustrated through detailed comparisons. An excellent agreement is shown for a variety of monotonous and cyclic stress-strain paths.

【15】 Photonic Quantum Policy Learning in OpenAI Gym 标题:OpenAI Gym中的光子量子策略学习 链接:https://arxiv.org/abs/2108.12926

作者:Dániel Nagy,Zsolt Tabi,Péter Hága,Zsófia Kallus,Zoltán Zimborás 机构:Ericsson Hungary and, Eötvös Loránd University, Budapest, Hungary, Péter Hága, Ericsson Research, Zsófia Kallus, Zoltán Zimborás, Wigner Research Centre for Physics and, MTA-BME Lendület QIT Research Group 备注:7 pages 摘要:近几年来,近期的含噪声中等规模量子(NISQ)计算设备已成为可能。利用NISQ量子计算机原型最有前途的应用领域之一是量子机器学习。虽然量子神经网络被广泛用于监督学习,但量子强化学习仍然是这一领域的一个新兴方向。为了解决一个经典的连续控制问题,我们使用了连续变量量子机器学习方法。我们介绍了光子变分量子代理的近端策略优化,并研究了数据重新上传的效果。我们基于光子模拟器Strawberry Fields的Fock后端,以及连接OpenAI Gym环境和TensorFlow的混合训练框架开展实证研究,给出性能评估。对于受限CartPole问题,光子策略学习的两种变体实现了相近的性能水平,并且比具有相同可训练参数数量的基线经典神经网络收敛更快。 摘要:In recent years, near-term noisy intermediate scale quantum (NISQ) computing devices have become available. One of the most promising application areas to leverage such NISQ quantum computer prototypes is quantum machine learning. While quantum neural networks are widely studied for supervised learning, quantum reinforcement learning is still just an emerging field of this area. To solve a classical continuous control problem, we use a continuous-variable quantum machine learning approach. We introduce proximal policy optimization for photonic variational quantum agents and also study the effect of the data re-uploading. We present performance assessment via empirical study using Strawberry Fields, a photonic simulator Fock backend and a hybrid training framework connected to an OpenAI Gym environment and TensorFlow. For the restricted CartPole problem, the two variations of the photonic policy learning achieve comparable performance levels and a faster convergence than the baseline classical neural network of same number of trainable parameters.

【16】 Generalized Huber Loss for Robust Learning and its Efficient Minimization for a Robust Statistics 标题:稳健学习的广义Huber损失及其稳健统计量的有效最小化 链接:https://arxiv.org/abs/2108.12627

作者:Kaan Gokcesu,Hakan Gokcesu 摘要:我们提出了Huber损失的一个广义形式。我们证明,选取合适的变换函数(特别是log-exp变换),可以得到一个同时具备绝对损失与二次损失理想性质的损失函数。我们给出了求这类损失函数最小值的算法,并表明求取这种中心化统计量并不比计算传统的均值和中位数困难多少。 摘要:We propose a generalized formulation of the Huber loss. We show that with a suitable function of choice, specifically the log-exp transform; we can achieve a loss function which combines the desirable properties of both the absolute and the quadratic loss. We provide an algorithm to find the minimizer of such loss functions and show that finding a centralizing metric is not that much harder than the traditional mean and median.
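下面给出这一思路的一个常见具体化(log-cosh 形式,属于 log-exp 变换的一种),并用梯度迭代求对应的"中心化统计量";具体函数形式与算法细节只是示意,未必与论文中的定义一致。

```python
# 概念示意:log-exp(log-cosh)型损失及其"中心化统计量"的迭代求解(非论文原定义)
import numpy as np

def logcosh_loss(r, delta=1.0):
    # 残差很小时近似二次损失,残差很大时近似绝对损失
    return delta ** 2 * np.log(np.cosh(r / delta))

def robust_center(x, delta=1.0, lr=0.5, iters=200):
    m = np.median(x)                                       # 以中位数初始化
    for _ in range(iters):
        grad = -delta * np.tanh((x - m) / delta).mean()    # 平均损失对 m 的梯度
        m -= lr * grad
    return m

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(20, 1, 10)])   # 含离群点的数据
m_hat = robust_center(x)
print("均值:", x.mean(), " 中位数:", np.median(x), " 稳健中心:", m_hat)
print("稳健中心处的平均损失:", logcosh_loss(x - m_hat).mean())
```

当残差很小时该损失近似二次、残差很大时近似线性,因此所得中心对离群点比均值稳健得多。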

【17】 Convolutional Autoencoders for Reduced-Order Modeling 标题:用于降阶建模的卷积自动编码器 链接:https://arxiv.org/abs/2108.12453

作者:Sreeram Venkat,Ralph C. Smith,Carl T. Kelley 机构:Department of Mathematics, North Carolina State University, Raleigh, NC , USA 摘要:在动力系统降阶模型的构造中,通常采用线性投影方法,如本征正交分解(POD)。然而,对于许多动力系统,状态空间的低维表示可以用非线性流形最精确地描述。以前的研究表明,深度学习可以提供执行非线性降维的有效方法,尽管它们依赖于训练数据的可用性,并且通常是针对特定问题的。在这里,我们利用随机训练数据创建和训练卷积自动编码器,对波动方程和Kuramoto-Shivasinsky方程执行非线性降维。此外,我们提出了独立于全阶模型样本的训练方法,并使用流形最小二乘Petrov-Galerkin投影方法,使用相同的自动编码器为热方程、波动方程和Kuramoto-Shivasinsky方程定义降阶模型。 摘要:In the construction of reduced-order models for dynamical systems, linear projection methods, such as proper orthogonal decompositions, are commonly employed. However, for many dynamical systems, the lower dimensional representation of the state space can most accurately be described by a nonlinear manifold. Previous research has shown that deep learning can provide an efficient method for performing nonlinear dimension reduction, though they are dependent on the availability of training data and are often problem-specific (see Carlberg et al.). Here, we utilize randomized training data to create and train convolutional autoencoders that perform nonlinear dimension reduction for the wave and Kuramoto-Shivasinsky equations. Moreover, we present training methods that are independent of full-order model samples and use the manifold least-squares Petrov-Galerkin projection method to define a reduced-order model for the heat, wave, and Kuramoto-Shivasinsky equations using the same autoencoder.
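下面用 PyTorch 给出一维卷积自动编码器做非线性降维的最小示意;网络结构与训练数据(随机相位的一组波形"快照")均为假设,仅演示基本做法,并非论文中的模型、数据或训练流程。

```python
# 概念示意:一维卷积自动编码器做非线性降维(非论文原模型)
import torch
import torch.nn as nn

class ConvAE(nn.Module):
    def __init__(self, latent_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 16, 5, stride=2, padding=2), nn.ELU(),    # 128 -> 64
            nn.Conv1d(16, 32, 5, stride=2, padding=2), nn.ELU(),   # 64 -> 32
            nn.Flatten(), nn.Linear(32 * 32, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 32), nn.Unflatten(1, (32, 32)),
            nn.ConvTranspose1d(32, 16, 5, stride=2, padding=2, output_padding=1), nn.ELU(),
            nn.ConvTranspose1d(16, 1, 5, stride=2, padding=2, output_padding=1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# 假设的"快照"数据:不同波数/相位的波形,形状为 (批大小, 通道, 网格点)
grid = torch.linspace(0, 2 * torch.pi, 128)
snapshots = torch.stack([torch.sin(k * grid + p)
                         for k in (1.0, 2.0, 3.0)
                         for p in torch.linspace(0, 3, 64)]).unsqueeze(1)

model = ConvAE(latent_dim=8)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(snapshots), snapshots)
    loss.backward()
    opt.step()
print("重构误差:", float(loss))
```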

【18】 Stock prices assessment: proposal of a new index based on volume weighted historical prices through the use of computer modeling 标题:股价评估:通过使用计算机建模,根据成交量加权的历史价格提出一个新的指数 链接:https://arxiv.org/abs/1206.5224

作者:Tiago Colliri,Fernando F. Ferreira 机构:Universidade de São Paulo, Brazil 备注:9 pages, 15 figures 摘要:考虑成交量以分析股价变动的重要性可以被认为是金融领域公认的做法。然而,当我们审视这一领域的科学成果时,我们仍然无法找到一个统一的模型,该模型包含了用于库存评估的数量和价格变化。在本文中,我们提出了一个可以弥补这一差距的计算机模型,提出了一个新的指数,根据股票的历史价格和交易量来评估股票价格。除了可以认为该模型在数学上非常简单外,它还能够显著提高使用真实金融数据的代理的性能。基于获得的结果,以及我们模型的非常直观的逻辑,我们相信这里提出的指数可以非常有用地帮助投资者确定在金融市场买卖股票的理想价格范围。 摘要:The importance of considering the volumes to analyze stock prices movements can be considered as a well-accepted practice in the financial area. However, when we look at the scientific production in this field, we still cannot find a unified model that includes volume and price variations for stock assessment purposes. In this paper we present a computer model that could fulfill this gap, proposing a new index to evaluate stock prices based on their historical prices and volumes traded. Besides the model can be considered mathematically very simple, it was able to improve significantly the performance of agents operating with real financial data. Based on the results obtained, and also on the very intuitive logic of our model, we believe that the index proposed here can be very useful to help investors on the activity of determining ideal price ranges for buying and selling stocks in the financial market.
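下面用一个粗略的示意体现"结合历史价格与成交量来评估当前价格"的想法:以成交量加权的历史均价作为参照水平,把当前价相对参照的偏离视作估值指数。数据为随机生成,指数的具体公式只是一个假设性的写法,并非论文中提出的原始指标。

```python
# 概念示意:基于成交量加权历史价格的简单估值指数(非论文原始指标)
import numpy as np

rng = np.random.default_rng(0)
days = 250
prices = 100 + np.cumsum(rng.normal(0, 1, days))             # 假设的日收盘价
volumes = rng.integers(1_000, 10_000, days).astype(float)    # 假设的日成交量

vwap_hist = np.cumsum(prices * volumes) / np.cumsum(volumes)  # 截至每日的成交量加权均价
index = (prices - vwap_hist) / vwap_hist                      # 当前价相对参照水平的偏离

print("最后一日收盘价: %.2f" % prices[-1])
print("成交量加权历史均价: %.2f" % vwap_hist[-1])
print("估值指数(>0 偏贵, <0 偏便宜): %.3f" % index[-1])
```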

其他(21篇)

【1】 Trustworthy AI for Process Automation on a Chylla-Haase Polymerization Reactor 标题:在Chylla-Haase聚合反应器上实现过程自动化的值得信赖的人工智能 链接:https://arxiv.org/abs/2108.13381

作者:Daniel Hein,Daniel Labisch 机构:Siemens AG, Digital Industries, Karlsruhe, Germany 备注:None 摘要:本文利用遗传规划强化学习(GPRL)为Chylla-Haase聚合反应器生成人类可解释的控制策略。这种带夹套冷却的连续搅拌釜式反应器(CSTR)广泛应用于化工行业,用于精细化学品、颜料、聚合物和医疗产品的生产。尽管看起来相当简单,但在实际应用中控制CSTR是一个相当具有挑战性的问题。GPRL利用来自反应器的现有数据,并完全自动生成一组优化的简单控制策略,即领域专家可以选择的所谓策略。请注意,这些策略是低复杂度的白盒模型,这使得它们易于在目标控制系统(如SIMATIC PCS 7)中验证和实施。然而,尽管其复杂度较低,自动生成的策略在反应器温度控制偏差方面产生了高性能,我们在原始反应器模板上进行了经验评估。 摘要:In this paper, genetic programming reinforcement learning (GPRL) is utilized to generate human-interpretable control policies for a Chylla-Haase polymerization reactor. Such continuously stirred tank reactors (CSTRs) with jacket cooling are widely used in the chemical industry, in the production of fine chemicals, pigments, polymers, and medical products. Despite appearing rather simple, controlling CSTRs in real-world applications is quite a challenging problem to tackle. GPRL utilizes already existing data from the reactor and generates fully automatically a set of optimized simplistic control strategies, so-called policies, the domain expert can choose from. Note that these policies are white-box models of low complexity, which makes them easy to validate and implement in the target control system, e.g., SIMATIC PCS 7. However, despite its low complexity the automatically-generated policy yields a high performance in terms of reactor temperature control deviation, which we empirically evaluate on the original reactor template.

【2】 On the approximation of a matrix 标题:关于矩阵的逼近 链接:https://arxiv.org/abs/2108.13195

作者:Samriddha Sanyal 机构:Indian Statistical Institute, B.T. road, Kolkata , India 摘要:设$F^{*}$是由非随机方法导出的给定$(a \times b)$矩阵$F$的近似值。我们证明了对于给定的$F$和$F^{*}$,$H$和$T$可以通过随机算法计算,使得$(HT)$比$F^{*}$更接近$F$。 摘要:Let $F^{*}$ be an approximation of a given $(a \times b)$ matrix $F$ derived by methods that are not randomized. We prove that for a given $F$ and $F^{*}$, $H$ and $T$ can be computed by randomized algorithm such that $(HT)$ is an approximation of $F$ better than $F^{*}$.
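下面以经典的随机化 range finder(随机投影加 QR 分解)为例,演示"用随机算法构造 H 与 T,使乘积 HT 逼近 F"的一般套路;这只是此类随机化低秩逼近的一个常见实例,并非该论文的具体构造或其证明。

```python
# 概念示意:随机化低秩分解 F ≈ H @ T(非论文中的具体算法)
import numpy as np

rng = np.random.default_rng(0)
a, b, true_rank, k = 200, 120, 10, 15
F = rng.normal(size=(a, true_rank)) @ rng.normal(size=(true_rank, b))   # 近似低秩的矩阵
F += 0.01 * rng.normal(size=(a, b))                                     # 加少量噪声

F_star = F.round(1)                        # 一个"非随机方法"得到的粗糙近似(示意)

Omega = rng.normal(size=(b, k))            # 随机测试矩阵
Q, _ = np.linalg.qr(F @ Omega)             # F 列空间的近似正交基
H, T = Q, Q.T @ F                          # H: (a, k), T: (k, b)

print("||F - F*||_F    =", np.linalg.norm(F - F_star))
print("||F - H @ T||_F =", np.linalg.norm(F - H @ T))
```

在上例的设定下,HT 的 Frobenius 误差通常明显小于粗糙近似 F* 的误差,直观呼应了摘要中的结论。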

【3】 Investigating Vulnerabilities of Deep Neural Policies 标题:调查深层神经策略的脆弱性 链接:https://arxiv.org/abs/2108.13093

作者:Ezgi Korkmaz 机构:KTH Royal Institute of Technology , Stockholm, Sweden. 备注:Presented at the Conference on Uncertainty in Artificial Intelligence (UAI) 2021 摘要:基于深度神经网络的强化学习策略很容易受到其输入的不可察觉的对抗性扰动的影响,这与神经网络图像分类器非常相似。最近的工作提出了几种方法,以提高深度强化学习代理在存在这些不可察觉干扰(即对抗性训练)的情况下基于训练的对抗性干扰的鲁棒性。在本文中,我们研究了对抗性训练对agent学习的神经策略的影响。特别是,我们采用两种不同的平行方法,研究基于最坏情况分布转移和特征敏感性的深层神经策略对抗性训练的结果。对于第一种方法,我们比较了针对对抗训练和普通训练神经策略计算的最小扰动的傅里叶谱。通过在OpenAI Atari环境中的实验,我们表明,针对敌对训练策略计算的最小扰动更集中于傅立叶域中的低频,这表明这些策略对低频扰动的敏感性更高。对于第二种方法,我们提出了一种新的方法来测量深度神经策略的特征敏感性,并比较了最先进的对抗训练深度神经策略和普通训练深度神经策略的这些特征敏感性差异。我们相信,我们的研究结果可以作为了解对抗性训练与神经策略鲁棒性不同概念之间关系的第一步。 摘要:Reinforcement learning policies based on deep neural networks are vulnerable to imperceptible adversarial perturbations to their inputs, in much the same way as neural network image classifiers. Recent work has proposed several methods to improve the robustness of deep reinforcement learning agents to adversarial perturbations based on training in the presence of these imperceptible perturbations (i.e. adversarial training). In this paper, we study the effects of adversarial training on the neural policy learned by the agent. In particular, we follow two distinct parallel approaches to investigate the outcomes of adversarial training on deep neural policies based on worst-case distributional shift and feature sensitivity. For the first approach, we compare the Fourier spectrum of minimal perturbations computed for both adversarially trained and vanilla trained neural policies. Via experiments in the OpenAI Atari environments we show that minimal perturbations computed for adversarially trained policies are more focused on lower frequencies in the Fourier domain, indicating a higher sensitivity of these policies to low frequency perturbations. For the second approach, we propose a novel method to measure the feature sensitivities of deep neural policies and we compare these feature sensitivity differences in state-of-the-art adversarially trained deep neural policies and vanilla trained deep neural policies. We believe our results can be an initial step towards understanding the relationship between adversarial training and different notions of robustness for neural policies.
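摘要比较了最小对抗扰动的傅里叶谱在低频与高频上的分布。下面给出"计算一个扰动的二维傅里叶谱并统计低频能量占比"的最小示意;扰动为随机生成,仅演示这种分析手段本身,并非论文中的扰动计算方法。

```python
# 概念示意:计算扰动的二维傅里叶谱,并统计低频能量占比
import numpy as np

rng = np.random.default_rng(0)
perturbation = rng.normal(0, 1, size=(84, 84))          # 假设的一帧状态扰动

spectrum = np.abs(np.fft.fftshift(np.fft.fft2(perturbation))) ** 2
h, w = spectrum.shape
yy, xx = np.ogrid[:h, :w]
radius = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
low_freq = radius < min(h, w) / 8                        # 以谱中心为圆心的低频区域

ratio = spectrum[low_freq].sum() / spectrum.sum()
print(f"低频能量占比: {ratio:.3f}")   # 按摘要结论,对抗训练策略的最小扰动该占比更高
```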

【4】 GeoVectors: A Linked Open Corpus of OpenStreetMap Embeddings on World Scale 标题:GeoVectors:世界尺度上OpenStreetMap嵌入的链接开放语料库 链接:https://arxiv.org/abs/2108.13092

作者:Nicolas Tempelmeier,Simon Gottschalk,Elena Demidova 机构:L,S Research Center, Leibniz Universität Hannover, Hannover, Germany, Data Science & Intelligent Systems (DSIS), University of Bonn, Bonn, Germany 备注:None 摘要:OpenStreetMap(OSM)是目前世界上最丰富的地理实体(如建筑物和道路)公开信息源。然而,在机器学习模型和其他应用中使用OSM实体具有挑战性,因为OSM规模巨大,实体注释的极端异构性,并且缺乏定义良好的本体来描述实体语义和属性。本文介绍了GeoVectors——一个独特的、全面的、世界范围的OSM实体嵌入开放语料库,覆盖了整个OSM数据集,提供了180个国家超过9.8亿个地理实体的潜在表示。GeoVectors语料库捕获OSM实体的语义和地理维度,并使机器学习算法和语义应用程序可以直接访问这些实体。我们创建GeoVectors语料库的语义描述,包括指向Wikidata和DBpedia知识图的身份链接,以提供上下文信息。此外,我们还提供了SPARQL端点—一个语义接口,它提供了对OSM中地理实体的语义和潜在表示的直接访问。 摘要:OpenStreetMap (OSM) is currently the richest publicly available information source on geographic entities (e.g., buildings and roads) worldwide. However, using OSM entities in machine learning models and other applications is challenging due to the large scale of OSM, the extreme heterogeneity of entity annotations, and a lack of a well-defined ontology to describe entity semantics and properties. This paper presents GeoVectors - a unique, comprehensive world-scale linked open corpus of OSM entity embeddings covering the entire OSM dataset and providing latent representations of over 980 million geographic entities in 180 countries. The GeoVectors corpus captures semantic and geographic dimensions of OSM entities and makes these entities directly accessible to machine learning algorithms and semantic applications. We create a semantic description of the GeoVectors corpus, including identity links to the Wikidata and DBpedia knowledge graphs to supply context information. Furthermore, we provide a SPARQL endpoint - a semantic interface that offers direct access to the semantic and latent representations of geographic entities in OSM.

【5】 To tune or not to tune? An Approach for Recommending Important Hyperparameters 标题:调音还是不调音?一种推荐重要超参数的方法 链接:https://arxiv.org/abs/2108.13066

作者:Mohamadjavad Bahmani,Radwa El Shawi,Nshan Potikyan,Sherif Sakr 机构:Data Systems Group, University of Tartu, Tartu, Estonia 备注:Presented on The Fifth International Workshop on Automation in Machine Learning, A workshop to be held in conjunction with the KDD 2021 Conference 摘要:自动机器学习中的新技术减轻了算法选择和超参数优化的复杂性。超参数对于机器学习模型非常重要,因为它们显著影响机器学习模型的性能。许多优化技术在超参数调整方面取得了显著的成功,超过了人类专家的性能。然而,依靠诸如黑盒算法之类的技术,机器学习实践者可能无法深入了解不同超参数的相对重要性。在本文中,我们考虑建立机器学习模型的性能和它们的超参数之间的关系,发现趋势和增益洞察力,基于六个分类器和200个数据集的实证结果。我们的结果使用户能够决定是否值得执行可能耗时的调优策略,关注最重要的超参数,并选择足够的超参数空间进行调优。实验结果表明,梯度boosting和Adaboost在200个问题上优于其他分类器。但是,它们需要调整以提高性能。总的来说,从这项研究中获得的结果提供了一个定量基础,以集中精力进行引导式自动化超参数优化,并有助于开发更好的自动化机器学习框架。 摘要:Novel technologies in automated machine learning ease the complexity of algorithm selection and hyperparameter optimization. Hyperparameters are important for machine learning models as they significantly influence the performance of machine learning models. Many optimization techniques have achieved notable success in hyperparameter tuning and surpassed the performance of human experts. However, depending on such techniques as blackbox algorithms can leave machine learning practitioners without insight into the relative importance of different hyperparameters. In this paper, we consider building the relationship between the performance of the machine learning models and their hyperparameters to discover the trend and gain insights, with empirical results based on six classifiers and 200 datasets. Our results enable users to decide whether it is worth conducting a possibly time-consuming tuning strategy, to focus on the most important hyperparameters, and to choose adequate hyperparameter spaces for tuning. The results of our experiments show that gradient boosting and Adaboost outperform other classifiers across 200 problems. However, they need tuning to boost their performance. Overall, the results obtained from this study provide a quantitative basis to focus efforts toward guided automated hyperparameter optimization and contribute toward the development of better-automated machine learning frameworks.
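下面的小例子演示"建立超参数与模型性能之间的关系并估计各超参数相对重要性"的一种常见做法:随机采样若干组超参数并记录交叉验证得分,再用随机森林回归拟合"超参数到得分"的映射,读取特征重要性。所用数据集、模型与采样范围均为示意,并非论文的完整实验流程。

```python
# 概念示意:用"超参数 -> 验证得分"的回归模型估计超参数重要性(非论文原流程)
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import GradientBoostingClassifier, RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = load_wine(return_X_y=True)

configs, scores = [], []
for _ in range(30):                                       # 随机采样 30 组超参数
    cfg = dict(learning_rate=10 ** rng.uniform(-3, 0),
               n_estimators=int(rng.integers(20, 200)),
               max_depth=int(rng.integers(1, 6)))
    clf = GradientBoostingClassifier(random_state=0, **cfg)
    configs.append([cfg["learning_rate"], cfg["n_estimators"], cfg["max_depth"]])
    scores.append(cross_val_score(clf, X, y, cv=3).mean())

meta = RandomForestRegressor(n_estimators=200, random_state=0).fit(configs, scores)
for name, imp in zip(["learning_rate", "n_estimators", "max_depth"], meta.feature_importances_):
    print(f"{name:>13s} 重要性: {imp:.3f}")
```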

【6】 Auto-Split: A General Framework of Collaborative Edge-Cloud AI 标题:Auto-Split:一种协同边缘云人工智能的通用框架 链接:https://arxiv.org/abs/2108.13041

作者:Amin Banitalebi-Dehkordi,Naveen Vedula,Jian Pei,Fei Xia,Lanjun Wang,Yong Zhang 机构:Huawei Technologies Canada Co. Ltd., Vancouver, Canada, School of Computing Science, Simon Fraser University, Shenzhen, China 备注:None 摘要:在许多行业规模的应用程序中,大型且消耗资源的机器学习模型驻留在功能强大的云服务器中。同时,大量的输入数据被收集在云的边缘。推理结果也会传递给用户或传递给边缘的下游任务。edge通常由大量低功耗设备组成。设计工业产品以支持复杂的深层模型部署并以高效的方式进行模型推理,从而保持高的模型精度和低的端到端延迟,这是一个巨大的挑战。本文描述了华为云的边缘云协作原型Auto Split背后的技术和工程实践。这项专利技术已经在选定的应用程序上得到验证,正在进行更广泛的系统边缘云应用程序集成,并作为端到端云边缘协作智能部署的自动化管道服务提供给公众使用。据我们所知,目前还没有提供深度神经网络(DNN)拆分功能的现有工业产品。 摘要:In many industry scale applications, large and resource consuming machine learning models reside in powerful cloud servers. At the same time, large amounts of input data are collected at the edge of cloud. The inference results are also communicated to users or passed to downstream tasks at the edge. The edge often consists of a large number of low-power devices. It is a big challenge to design industry products to support sophisticated deep model deployment and conduct model inference in an efficient manner so that the model accuracy remains high and the end-to-end latency is kept low. This paper describes the techniques and engineering practice behind Auto-Split, an edge-cloud collaborative prototype of Huawei Cloud. This patented technology is already validated on selected applications, is on its way for broader systematic edge-cloud application integration, and is being made available for public use as an automated pipeline service for end-to-end cloud-edge collaborative intelligence deployment. To the best of our knowledge, there is no existing industry product that provides the capability of Deep Neural Network (DNN) splitting.

【7】 Evaluating Bayes Error Estimators on Read-World Datasets with FeeBee 标题:用FeeBee评估读世界数据集上的Bayes误差估计器 链接:https://arxiv.org/abs/2108.13034

作者:Cedric Renggli,Luka Rimanic,Nora Hollenstein,Ce Zhang 机构:ETH Zurich, University of Copenhagen 备注:arXiv admin note: text overlap with arXiv:2010.08410 摘要:贝叶斯错误率(BER)是机器学习中的一个基本概念,它量化了任何分类器在固定概率分布上可能达到的最佳精度。尽管多年来一直在研究建立误码率上下限的估计器,但这些估计器通常仅在已知概率分布的合成数据集上进行比较,留下两个关键问题尚未解答:(1)它们在真实数据集上的性能如何?(2)它们的实用性如何?回答这些问题并非小事。除了对真实数据集未知误码率的明显挑战外,任何误码率估计器都需要克服两个主要方面才能适用于真实环境:(1)计算和样本复杂性,(2)超参数的敏感性和选择。在这项工作中,我们提出了FeeBee,这是分析和比较任何具有未知概率分布的现代真实数据集上的误码率估计器的第一个原则框架。我们通过注入受控数量的标签噪声并对一系列不同的噪声级进行多重评估来实现这一点,理论结果支持这一点,从而得出关于误码率演变的结论。通过在计算机视觉和NLP领域的6个常用数据集上实施和分析7个多类误码率估计器,FeeBee允许对这些估计器进行彻底的研究,清楚地确定每个估计器的优缺点,同时可以轻松地部署到任何未来的误码率估计器上。 摘要:The Bayes error rate (BER) is a fundamental concept in machine learning that quantifies the best possible accuracy any classifier can achieve on a fixed probability distribution. Despite years of research on building estimators of lower and upper bounds for the BER, these were usually compared only on synthetic datasets with known probability distributions, leaving two key questions unanswered: (1) How well do they perform on real-world datasets?, and (2) How practical are they? Answering these is not trivial. Apart from the obvious challenge of an unknown BER for real-world datasets, there are two main aspects any BER estimator needs to overcome in order to be applicable in real-world settings: (1) the computational and sample complexity, and (2) the sensitivity and selection of hyper-parameters. In this work, we propose FeeBee, the first principled framework for analyzing and comparing BER estimators on any modern real-world dataset with unknown probability distribution. We achieve this by injecting a controlled amount of label noise and performing multiple evaluations on a series of different noise levels, supported by a theoretical result which allows drawing conclusions about the evolution of the BER. By implementing and analyzing 7 multi-class BER estimators on 6 commonly used datasets of the computer vision and NLP domains, FeeBee allows a thorough study of these estimators, clearly identifying strengths and weaknesses of each, whilst being easily deployable on any future BER estimator.
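下面是"注入可控标签噪声并在一系列噪声水平上反复评估"这一评测思路的极简示意:对同一数据集注入不同比例的标签噪声,观察 1-NN 错误率(常被用作贝叶斯误差的简单代理上界)随噪声水平的变化趋势。此处的数据集与估计器均为示意,并非 FeeBee 的实现。

```python
# 概念示意:注入可控标签噪声并跟踪误差随噪声水平的变化(非 FeeBee 原实现)
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = load_digits(return_X_y=True)
n_classes = len(np.unique(y))

for rho in (0.0, 0.1, 0.2, 0.4):                       # 标签噪声比例
    y_noisy = y.copy()
    flip = rng.random(len(y)) < rho                    # 以概率 rho 把标签替换为随机类别
    y_noisy[flip] = rng.integers(0, n_classes, flip.sum())
    Xtr, Xte, ytr, yte = train_test_split(X, y_noisy, test_size=0.3, random_state=0)
    err = 1 - KNeighborsClassifier(n_neighbors=1).fit(Xtr, ytr).score(Xte, yte)
    print(f"噪声水平 {rho:.1f} -> 1-NN 错误率 {err:.3f}")
```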

【8】 SHIFT15M: Multiobjective Large-Scale Fashion Dataset with Distributional Shifts 标题:SHIFT15M:具有分布位移的多目标大规模时尚数据集 链接:https://arxiv.org/abs/2108.12992

作者:Masanari Kimura,Takuma Nakamura,Yuki Saito 机构:ZOZO Research, Department of Statistical Science, School of Multidisciplinary Sciences, The Graduate University for Advanced Studies (SOKENDAI) 摘要:许多机器学习算法假设训练数据和测试数据遵循相同的分布。然而,在现实世界的机器学习问题中,这样的假设经常被违背。在本文中,我们提出了SHIFT15M,这是一个数据集,可用于在数据分布在训练和测试之间发生变化的情况下正确评估模型。SHIFT15M数据集有几个好的特性:(i)多目标。数据集中的每个实例都有几个可用作目标变量的数值(ii)大规模。SHIFT15M数据集由1500万张时装图片组成(iii)数据集转移类型的覆盖范围。SHIFT15M包含多个数据集移位问题设置(例如,协变量移位或目标移位)。SHIFT15M还可以通过切换幅度,在不同数据集移位幅度下对模型进行性能评估。此外,我们还提供软件,以非常简单的方式处理移位15米:https://github.com/st-tech/zozo-shift15m. 摘要:Many machine learning algorithms assume that the training data and the test data follow the same distribution. However, such assumptions are often violated in real-world machine learning problems. In this paper, we propose SHIFT15M, a dataset that can be used to properly evaluate models in situations where the distribution of data changes between training and testing. The SHIFT15M dataset has several good properties: (i) Multiobjective. Each instance in the dataset has several numerical values that can be used as target variables. (ii) Large-scale. The SHIFT15M dataset consists of 15million fashion images. (iii) Coverage of types of dataset shifts. SHIFT15M contains multiple dataset shift problem settings (e.g., covariate shift or target shift). SHIFT15M also enables the performance evaluation of the model under various magnitudes of dataset shifts by switching the magnitude. In addition, we provide software to handle SHIFT15M in a very simple way: https://github.com/st-tech/zozo-shift15m.

【9】 Approximating Pandora's Box with Correlations 标题:用相关函数逼近潘多拉盒子 链接:https://arxiv.org/abs/2108.12976

作者:Shuchi Chawla,Evangelia Gergatsouli,Jeremy McMahan,Christos Tzamos 机构:UW-Madison, The Pandora’s Box problem asks to find a search strategy over n alternatives given stochastic, information about their values, aiming to minimize the sum of the search cost and the value of the 摘要:潘多拉盒子问题要求在给定关于其值的随机信息的情况下,在$n$备选方案上找到一种搜索策略,目的是最小化搜索成本和所选备选方案的价值之和。即使我们已经很好地理解了独立分布值的情况,但一旦放弃独立性假设,我们对问题的算法理解就非常有限。我们的工作旨在描述在相关值分布下近似潘多拉盒子问题的复杂性。为此,我们对潘多拉盒子的一个更简单版本进行了总体简化,它只要求找到低于某个阈值的值,并且无需对搜索过程中出现的未来值进行推理。利用这个通用工具,我们研究了两个相关案例;明确给出的$m$支持分布情况和$m$产品分布混合情况$\bullet$在第一种情况下,我们将潘多拉盒子与经过充分研究的最优决策树问题联系起来,获得了$O(\log m)$近似值,但也表明该问题更为简单,因为它与统一决策树问题等价(在常数因子下)$\bullet$在混合产品分布的情况下,问题再次与最优决策树的噪声变量有关,这显然更具挑战性。我们给出了一个常数因子近似值,该近似值在时间$n^{\tilde O(m^2/\varepsilon^2)}$中运行,用于$m$混合成分,其在每个备选方案上的边缘要么相同,要么在TV距离上以$\varepsilon$分隔。 摘要:The Pandora's Box problem asks to find a search strategy over $n$ alternatives given stochastic information about their values, aiming to minimize the sum of the search cost and the value of the chosen alternative. Even though the case of independently distributed values is well understood, our algorithmic understanding of the problem is very limited once the independence assumption is dropped. Our work aims to characterize the complexity of approximating the Pandora's Box problem under correlated value distributions. To that end, we present a general reduction to a simpler version of Pandora's Box, that only asks to find a value below a certain threshold, and eliminates the need to reason about future values that will arise during the search. Using this general tool, we study two cases of correlation; the case of explicitly given distributions of support $m$ and the case of mixtures of $m$ product distributions. $\bullet$ In the first case, we connect Pandora's Box to the well studied problem of Optimal Decision Tree, obtaining an $O(\log m)$ approximation but also showing that the problem is strictly easier as it is equivalent (up to constant factors) to the Uniform Decision Tree problem. $\bullet$ In the case of mixtures of product distributions, the problem is again related to the noisy variant of Optimal Decision Tree which is significantly more challenging. We give a constant-factor approximation that runs in time $n^{ \tilde O( m^2/\varepsilon^2 ) }$ for $m$ mixture components whose marginals on every alternative are either identical or separated in TV distance by $\varepsilon$.

【10】 Leveraging Transprecision Computing for Machine Vision Applications at the Edge 标题:利用跨精度计算实现边缘的机器视觉应用 链接:https://arxiv.org/abs/2108.12914

作者:Umar Ibrahim Minhas,Lev Mukhanov,Georgios Karakonstantis,Hans Vandierendonck,Roger Woods 机构:School of Electronics, Electrical Engineering and Computer Engineering, Queens University Belfast, Belfast, UK 摘要:机器视觉任务对资源受限的边缘设备提出了挑战,特别是当它们以可变的工作负载执行多个任务时。需要一种能够在运行时动态调整的健壮方法,同时在资源约束范围内保持最大服务质量(QoS)。本文介绍了一种轻量级的方法,该方法监控运行时工作负载约束,并利用准确性和吞吐量的权衡。包括优化技术,为每个任务找到最佳精度、能量和内存的配置,并管理配置之间的透明切换。对于1%的精度下降,我们显示了1.6倍更高的实现帧处理率,在较低的精度下可能会有进一步的改进。 摘要:Machine vision tasks present challenges for resource constrained edge devices, particularly as they execute multiple tasks with variable workloads. A robust approach that can dynamically adapt in runtime while maintaining the maximum quality of service (QoS) within resource constraints, is needed. The paper presents a lightweight approach that monitors the runtime workload constraint and leverages accuracy-throughput trade-off. Optimisation techniques are included which find the configurations for each task for optimal accuracy, energy and memory and manages transparent switching between configurations. For an accuracy drop of 1%, we show a 1.6x higher achieved frame processing rate with further improvements possible at lower accuracy.

【11】 Analyzing and Mitigating Interference in Neural Architecture Search 标题:神经结构搜索中的干扰分析与抑制 链接:https://arxiv.org/abs/2108.12821

作者:Jin Xu,Xu Tan,Kaitao Song,Renqian Luo,Yichong Leng,Tao Qin,Tie-Yan Liu,Jian Li 机构:Institute for Interdisciplinary Information Sciences, Tsinghua University, China, University of Science and Technology of China, Microsoft Research Asia 摘要:权重共享已成为减少神经结构搜索(NAS)训练成本的一种方法,它通过重用先前训练的子模型中共享算子的权重。然而,由于权重共享造成不同子模型之间的干扰,这些子模型的估计精度与地面真实度的相关性较低。在本文中,我们通过对不同子模型进行抽样并计算共享算子的梯度相似性来研究干扰问题,并观察到:1)两个子模型之间对共享算子的干扰与它们之间不同算子的数量正相关;2) 当共享运算符的输入和输出更相似时,干扰更小。受这两个观察结果的启发,我们提出了两种方法来缓解干扰:1)我们不随机抽样子模型进行优化,而是通过在相邻优化步骤之间修改一个算子来逐步修改方案,以最小化对共享算子的干扰;2) 强制所有子模型中操作员的输入和输出相似,以减少干扰。在一个BERT搜索空间上的实验证明,通过我们提出的每一种方法来减少干扰可以提高super-pet的秩相关性,并且结合这两种方法可以获得更好的结果。在GLUE基准的开发和测试集上,我们搜索的体系结构比RoBERTa${\rm base}$高1.1分和0.6分,比ELECTRA${\rm base}$高1.6分和1.1分。对BERT压缩任务、班数据集和其他搜索空间的大量结果也证明了我们提出的方法的有效性和通用性。 摘要:Weight sharing has become the \textit{de facto} approach to reduce the training cost of neural architecture search (NAS) by reusing the weights of shared operators from previously trained child models. However, the estimated accuracy of those child models has a low rank correlation with the ground truth accuracy due to the interference among different child models caused by weight sharing. In this paper, we investigate the interference issue by sampling different child models and calculating the gradient similarity of shared operators, and observe that: 1) the interference on a shared operator between two child models is positively correlated to the number of different operators between them; 2) the interference is smaller when the inputs and outputs of the shared operator are more similar. Inspired by these two observations, we propose two approaches to mitigate the interference: 1) rather than randomly sampling child models for optimization, we propose a gradual modification scheme by modifying one operator between adjacent optimization steps to minimize the interference on the shared operators; 2) forcing the inputs and outputs of the operator across all child models to be similar to reduce the interference. Experiments on a BERT search space verify that mitigating interference via each of our proposed methods improves the rank correlation of super-pet and combining both methods can achieve better results. Our searched architecture outperforms RoBERTa$_{\rm base}$ by 1.1 and 0.6 scores and ELECTRA$_{\rm base}$ by 1.6 and 1.1 scores on the dev and test set of GLUE benchmark. Extensive results on the BERT compression task, SQuAD datasets and other search spaces also demonstrate the effectiveness and generality of our proposed methods.

【12】 Certifying One-Phase Technology-Assisted Reviews 标题:认证一阶段技术辅助评审 链接:https://arxiv.org/abs/2108.12746

作者:David D. Lewis,Eugene Yang,Ophir Frieder 机构:Reveal-Brainspace, Chicago, IL, USA, IR Lab, Georgetown University, Washington, DC, USA 备注:10 pages, 4 figures, accepted at CIKM 2021 摘要:基于迭代主动学习的技术辅助评审(TAR)工作流广泛应用于文档评审应用中。大多数单阶段TAR工作流的停止规则缺乏有效的统计保证,这在某些法律环境中阻碍了它们的使用。基于分位数估计理论,我们为单相TAR提供了第一个广泛适用且统计上有效的基于样本的停止规则。我们进一步从理论和经验上证明,超过召回目标(在过去的停止规则评估中被视为无害或可取)是单阶段TAR工作流中超额成本的主要来源。与直觉相反,在几乎所有情况下,为减少过度召回而产生更大的抽样成本会降低总成本。 摘要:Technology-assisted review (TAR) workflows based on iterative active learning are widely used in document review applications. Most stopping rules for one-phase TAR workflows lack valid statistical guarantees, which has discouraged their use in some legal contexts. Drawing on the theory of quantile estimation, we provide the first broadly applicable and statistically valid sample-based stopping rules for one-phase TAR. We further show theoretically and empirically that overshooting a recall target, which has been treated as innocuous or desirable in past evaluations of stopping rules, is a major source of excess cost in one-phase TAR workflows. Counterintuitively, incurring a larger sampling cost to reduce excess recall leads to lower total cost in almost all scenarios.

【13】 Characterizing Malicious URL Campaigns 标题:描述恶意URL活动的特征 链接:https://arxiv.org/abs/2108.12726

作者:Mahathir Almashor,Ejaz Ahmed,Benjamin Pick,Sharif Abuadbba,Raj Gaire,Seyit Camtepe,Surya Nepal 机构:Shuo Wang∗† 摘要:URL是众多网络安全威胁的核心,从网络钓鱼到恶意软件的传播。攻击者不断滥用其固有的易用性和熟悉性来逃避防御和欺骗最终用户。看似不同的URL被有组织地用于实施网络钓鱼攻击和分发恶意软件。我们将此类行为称为战役,其假设是,经常协调攻击以最大化成功率并制定规避战术。其目的是更好地了解活动,加强我们对其特点的掌握,从而帮助社区制定更强有力的解决方案。为此,我们对2019年12月至2020年1月提交给VirusTotal的3.11亿条记录进行了广泛的研究和分析,这些记录包含770万个唯一的真实URL。从该数据集中,根据附加的元数据识别出260万个可疑活动,其中77810个被双重验证为恶意活动。利用这些恶意活动中的3810万条记录和990万个URL,我们提供了不同的见解,如其目标受害者品牌以及URL大小和异构性。观察到了一些令人惊讶的发现,例如,对于使用100多个独特URL的活动,检测率下降到仅13.27%。本文最后通过几个案例研究说明了攻击者为威胁用户和规避防御而使用的常见恶意技术。 摘要:URLs are central to a myriad of cyber-security threats, from phishing to the distribution of malware. Their inherent ease of use and familiarity is continuously abused by attackers to evade defences and deceive end-users. Seemingly dissimilar URLs are being used in an organized way to perform phishing attacks and distribute malware. We refer to such behaviours as campaigns, with the hypothesis being that attacks are often coordinated to maximize success rates and develop evasion tactics. The aim is to gain better insights into campaigns, bolster our grasp of their characteristics, and thus aid the community devise more robust solutions. To this end, we performed extensive research and analysis into 311M records containing 77M unique real-world URLs that were submitted to VirusTotal from Dec 2019 to Jan 2020. From this dataset, 2.6M suspicious campaigns were identified based on their attached metadata, of which 77,810 were doubly verified as malicious. Using the 38.1M records and 9.9M URLs within these malicious campaigns, we provide varied insights such as their targeted victim brands as well as URL sizes and heterogeneity. Some surprising findings were observed, such as detection rates falling to just 13.27% for campaigns that employ more than 100 unique URLs. The paper concludes with several case-studies that illustrate the common malicious techniques employed by attackers to imperil users and circumvent defences.

【14】 Master memory function for delay-based reservoir computers with single-variable dynamics 标题:具有单变量动力学的延迟型储备池计算机的主记忆函数 链接:https://arxiv.org/abs/2108.12643

作者:Felix Köster,Serhiy Yanchuk,Kathy Lüdge 备注:To be published 摘要:我们证明,文献中考虑的许多延迟型储备池计算机都可以用一个通用的主记忆函数(MMF)来刻画。一旦针对两个独立参数计算出该函数,它就能给出小输入下任何延迟型单变量储备池的线性记忆容量。此外,我们提出了MMF的解析描述,使其能够高效快速地计算。我们的方法不仅适用于由已知动力学规则(如Mackey-Glass或Ikeda类系统)支配的储备池,也适用于动力学模型不可得的储备池。我们还给出了储备池计算机性能与MMF所给记忆容量的比较结果。 摘要:We show that many delay-based reservoir computers considered in the literature can be characterized by a universal master memory function (MMF). Once computed for two independent parameters, this function provides linear memory capacity for any delay-based single-variable reservoir with small inputs. Moreover, we propose an analytical description of the MMF that enables its efficient and fast computation. Our approach can be applied not only to reservoirs governed by known dynamical rules such as Mackey-Glass or Ikeda-like systems but also to reservoirs whose dynamical model is not available. We also present results comparing the performance of the reservoir computer and the memory capacity given by the MMF.

【15】 Pulling Up by the Causal Bootstraps: Causal Data Augmentation for Pre-training Debiasing 标题:因果自举拉动:训练前去偏的因果数据增强 链接:https://arxiv.org/abs/2108.12510

作者:Sindhu C. M. Gowda,Shalmali Joshi,Haoran Zhang,Marzyeh Ghassemi 机构:University of Toronto, Vector Institute, Toronto, Ontario, Canada, Harvard University, Cambridge, Massachusetts, USA, MIT 备注:Published in CIKM 2021 摘要:机器学习模型在许多有监督的学习任务上实现了最先进的性能。然而,先前的证据表明,这些模型可能会学习依赖捷径偏差或虚假相关性(直观地说,即那些在训练中成立、但在测试中不再成立的相关性)来获得良好的预测性能。这样的模型在部署环境中无法提供准确的预测。虽然从因果关系的角度来看问题是有用的,但将因果技术无缝地集成到机器学习管道中仍然是麻烦和昂贵的。在这项工作中,我们研究并扩展了一种称为因果自举(CB)的因果预训练去偏技术,该技术适用于五种实际的混杂数据生成-采集场景(已知和未知混杂)。在这些设置下,我们系统地研究了混杂偏差对深度学习模型性能的影响,证明了当这些偏差没有被恰当处理时,模型倾向于依赖捷径偏差。我们证明,这种因果预训练技术可以在真实世界的领域泛化基准任务上显著优于现有的基本做法,从而减轻混杂偏差。这项系统的调查强调了考虑底层数据生成机制、并用因果框架加固数据预处理管道的重要性,以开发对混杂偏差具有鲁棒性的方法。 摘要:Machine learning models achieve state-of-the-art performance on many supervised learning tasks. However, prior evidence suggests that these models may learn to rely on shortcut biases or spurious correlations (intuitively, correlations that do not hold in the test as they hold in train) for good predictive performance. Such models cannot be trusted in deployment environments to provide accurate predictions. While viewing the problem from a causal lens is known to be useful, the seamless integration of causation techniques into machine learning pipelines remains cumbersome and expensive. In this work, we study and extend a causal pre-training debiasing technique called causal bootstrapping (CB) under five practical confounded-data generation-acquisition scenarios (with known and unknown confounding). Under these settings, we systematically investigate the effect of confounding bias on deep learning model performance, demonstrating their propensity to rely on shortcut biases when these biases are not properly accounted for. We demonstrate that such a causal pre-training technique can significantly outperform existing base practices to mitigate confounding bias on real-world domain generalization benchmarking tasks. This systematic investigation underlines the importance of accounting for the underlying data-generating mechanisms and fortifying data-preprocessing pipelines with a causal framework to develop methods robust to confounding biases.

【16】 Neural HMMs are all you need (for high-quality attention-free TTS) 标题:神经HMM是您所需要的全部(以获得高质量的无注意力TTS) 链接:https://arxiv.org/abs/2108.13320

作者:Shivam Mehta,Éva Székely,Jonas Beskow,Gustav Eje Henter 机构:Division of Speech, Music and Hearing, KTH Royal Institute of Technology, Stockholm, Sweden 备注:5 pages, 1 figure 摘要:与使用HMMs的经典统计参数语音合成相比,神经序列到序列TTS具有显著更好的输出质量。然而,新范式不是概率性的,非单调注意的使用既增加了训练时间,又引入了生产中不可接受的“唠叨”故障模式。在本文中,我们证明了新旧范式可以结合起来,通过用神经网络定义的自回归左-右无跳隐马尔可夫模型取代Tacotron 2中的注意力,从而获得两个世界的优势。这导致了一个基于HMM的神经TTS模型与单调对齐,训练最大化全序列的可能性没有近似。我们将讨论如何结合古典和现代TTS的创新,以获得最佳效果。最终的系统比Tacotron 2更小、更简单,能够以更少的迭代次数学习对齐和说话,同时获得相同的语音自然度。与Tacotron 2不同,它还可以轻松控制通话速率。音频示例和代码可在https://shivammehta007.github.io/Neural-HMM/ 摘要:Neural sequence-to-sequence TTS has demonstrated significantly better output quality over classical statistical parametric speech synthesis using HMMs. However, the new paradigm is not probabilistic and the use of non-monotonic attention both increases training time and introduces "babbling" failure modes that are unacceptable in production. In this paper, we demonstrate that the old and new paradigms can be combined to obtain the advantages of both worlds, by replacing the attention in Tacotron 2 with an autoregressive left-right no-skip hidden-Markov model defined by a neural network. This leads to an HMM-based neural TTS model with monotonic alignment, trained to maximise the full sequence likelihood without approximations. We discuss how to combine innovations from both classical and contemporary TTS for best results. The final system is smaller and simpler than Tacotron 2 and learns to align and speak with fewer iterations, while achieving the same speech naturalness. Unlike Tacotron 2, it also allows easy control over speaking rate. Audio examples and code are available at https://shivammehta007.github.io/Neural-HMM/

【17】 Open Set RF Fingerprinting using Generative Outlier Augmentation 标题:使用生成性孤立点增强的开放集合射频指纹 链接:https://arxiv.org/abs/2108.13099

作者:Samurdhi Karunaratne,Samer Hanna,Danijela Cabric 机构:Electrical and Computer Engineering Department, University of California, Los Angeles 备注:Accepted to IEEE GLOBECOM 2021 摘要:射频设备可以通过其传输的信号中嵌入的独特缺陷(称为射频指纹)来识别。此类装置的封闭集分类已得到充分探索,其中必须在授权的一组变送器之间进行识别。然而,更困难的开放集分类问题,即分类器需要拒绝未经授权的发射机,同时识别授权的发射机,直到最近才被访问过。到目前为止,开放集分类的工作主要依赖于利用从已知的一组未经授权的发射机捕获的信号样本来帮助分类器学习未经授权的发射机指纹。由于获取新的发射机以作为已知发射机使用非常昂贵,我们建议使用生成式深度学习方法来模拟未经授权的信号样本,以扩充训练数据集。我们开发了两种不同的数据增强技术,一种利用有限数量的已知未经授权的发射机,另一种不需要任何未经授权的发射机。在从WiFi测试台捕获的数据集上进行的实验表明,数据扩充可以显著提高开放集分类精度,特别是当授权集很小时。 摘要:RF devices can be identified by unique imperfections embedded in the signals they transmit called RF fingerprints. The closed set classification of such devices, where the identification must be made among an authorized set of transmitters, has been well explored. However, the much more difficult problem of open set classification, where the classifier needs to reject unauthorized transmitters while recognizing authorized transmitters, has only been recently visited. So far, efforts at open set classification have largely relied on the utilization of signal samples captured from a known set of unauthorized transmitters to aid the classifier learn unauthorized transmitter fingerprints. Since acquiring new transmitters to use as known transmitters is highly expensive, we propose to use generative deep learning methods to emulate unauthorized signal samples for the augmentation of training datasets. We develop two different data augmentation techniques, one that exploits a limited number of known unauthorized transmitters and the other that does not require any unauthorized transmitters. Experiments conducted on a dataset captured from a WiFi testbed indicate that data augmentation allows for significant increases in open set classification accuracy, especially when the authorized set is small.

【18】 A fast point solver for deep nonlinear function approximators 标题:一种用于深度非线性函数逼近器的快速点求解器 链接:https://arxiv.org/abs/2108.13097

作者:Laurence Aitchison 机构:Department of Computer Science, University of Bristol, Bristol, UK 摘要:深核过程(DKP)推广了贝叶斯神经网络,但不要求我们表示特征或权重。相反,在每个隐藏层,它们表示并优化一个灵活的内核。在这里,我们利用最初在控制理论文献中开发的矩阵解算器,为DKP开发了一种类似牛顿的方法,该方法在大约10步内收敛。这比通常的梯度下降法快很多倍。通过开发“内核backprop”和“内核autodiff”算法,我们将其推广到任意DKP体系结构。虽然这些方法目前不是贝叶斯方法,因为它们给出了点估计,并且由于数据点的数量是立方的,所以伸缩性很差,但我们希望它们将形成一类新的更有效的方法来优化深度非线性函数逼近器的基础。 摘要:Deep kernel processes (DKPs) generalise Bayesian neural networks, but do not require us to represent either features or weights. Instead, at each hidden layer they represent and optimize a flexible kernel. Here, we develop a Newton-like method for DKPs that converges in around 10 steps, exploiting matrix solvers initially developed in the control theory literature. These are many times faster the usual gradient descent approach. We generalise to arbitrary DKP architectures, by developing "kernel backprop", and algorithms for "kernel autodiff". While these methods currently are not Bayesian as they give point estimates and scale poorly as they are cubic in the number of datapoints, we hope they will form the basis of a new class of much more efficient approaches to optimizing deep nonlinear function approximators.

【19】 A Closed Loop Gradient Descent Algorithm applied to Rosenbrock's function 标题:应用于Rosenbrock函数的闭环梯度下降算法 链接:https://arxiv.org/abs/2108.12883

作者:Subhransu Bhattacharjee,Ian Petersen 机构:‡Dr. Ian R. Petersen, FAA is a professor at the College, of Engineering and Computer Science, Australian National, University, Canberra, Australia., Mostly, such algorithms are dependent on learning, the gradient of the cost function. A central aspect 备注:This paper has been accepted for the 2021 Australia and New Zealand Control Conference, to be held in November, 2021 摘要:我们介绍了一种新的惯性梯度系统自适应阻尼技术,该技术作为无约束优化的梯度下降算法得到了应用。在一个使用非凸Rosenbrock函数的例子中,我们展示了对现有基于动量的梯度优化方法的改进。同时利用Lyapunov稳定性分析,我们证明了算法的连续时间版本的性能。利用数值模拟,我们考虑离散时间对应的性能,通过使用辛欧拉离散化方法得到。 摘要:We introduce a novel adaptive damping technique for an inertial gradient system which finds application as a gradient descent algorithm for unconstrained optimisation. In an example using the non-convex Rosenbrock's function, we show an improvement on existing momentum-based gradient optimisation methods. Also using Lyapunov stability analysis, we demonstrate the performance of the continuous-time version of the algorithm. Using numerical simulations, we consider the performance of its discrete-time counterpart obtained by using the symplectic Euler method of discretisation.
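下面给出"惯性梯度系统经辛欧拉法离散化后在 Rosenbrock 函数上做无约束优化"的一个最小数值示意;这里的阻尼系数取为常数,论文提出的自适应阻尼设计并未在此重现,步长等参数也只是随意取值。

```python
# 概念示意:惯性梯度系统的辛欧拉离散化,在 Rosenbrock 函数上做无约束优化
# (阻尼取常数;论文中的自适应阻尼未在此重现)
import numpy as np

def f(p):
    x, y = p
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

def grad(p):
    x, y = p
    return np.array([-2 * (1 - x) - 400 * x * (y - x ** 2),
                     200 * (y - x ** 2)])

h, gamma = 0.01, 5.0             # 步长与(常数)阻尼系数
p = np.array([-1.5, 1.5])        # 初始位置
v = np.zeros(2)                  # 初始速度

for k in range(1, 50_001):
    v = v + h * (-gamma * v - grad(p))   # 先更新速度(辛欧拉)
    p = p + h * v                        # 再用更新后的速度移动位置
    if k % 10_000 == 0:
        print(f"step {k:>6d}  f = {f(p):.3e}  p = {p}")
```

把速度更新放在位置更新之前、并用新速度推进位置,正是辛欧拉格式与普通显式欧拉的差别所在。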

【20】 Self-fulfilling Bandits: Endogeneity Spillover and Dynamic Selection in Algorithmic Decision-making 标题:自我实现的强盗:算法决策中的内生性溢出与动态选择 链接:https://arxiv.org/abs/2108.12547

作者:Jin Li,Ye Luo,Xiaowei Zhang 机构: The University of Hong Kong 备注:Main Body: 29 pages, 6 figures; Supplemental Material: 25 pages 摘要:在本文中,我们研究了数据和行为相互依赖的算法决策中的内生性问题。当背景多臂bandit模型中存在内生性协变量时,由于协变量的内生性溢出到行为中,会产生一种新的偏差(自我实现偏差)。我们提出了一类算法,通过将工具变量纳入领先的在线学习算法来纠正偏差。这些算法还获得了与无内生性的情况下最著名的下限相匹配的遗憾水平。为了建立理论属性,我们开发了一种通用技术,可以解开数据和动作之间的相互依赖关系。 摘要:In this paper, we study endogeneity problems in algorithmic decision-making where data and actions are interdependent. When there are endogenous covariates in a contextual multi-armed bandit model, a novel bias (self-fulfilling bias) arises because the endogeneity of the covariates spills over to the actions. We propose a class of algorithms to correct for the bias by incorporating instrumental variables into leading online learning algorithms. These algorithms also attain regret levels that match the best known lower bound for the cases without endogeneity. To establish the theoretical properties, we develop a general technique that untangles the interdependence between data and actions.
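摘要的关键是在线学习中用工具变量修正内生协变量带来的偏差。下面用一个静态回归的两阶段最小二乘(2SLS)小例子演示这种修正为何有效;数据生成过程为假设,这只是工具变量思想本身的演示,并非论文中嵌入bandit算法的具体做法。

```python
# 概念示意:内生协变量下 OLS 有偏、工具变量(2SLS)可纠偏
import numpy as np

rng = np.random.default_rng(0)
n, beta_true = 20_000, 2.0
z = rng.normal(size=n)                     # 工具变量:影响 x 但不直接影响 y
u = rng.normal(size=n)                     # 不可观测的混杂项
x = 0.8 * z + u + 0.3 * rng.normal(size=n)           # 内生协变量(与 u 相关)
y = beta_true * x + 2.0 * u + rng.normal(size=n)      # 结果:误差项含 u,造成内生性

ols = (x @ y) / (x @ x)                    # 普通最小二乘估计(有偏)
x_hat = z * ((z @ x) / (z @ z))            # 第一阶段:用工具变量 z 预测 x
iv = (x_hat @ y) / (x_hat @ x)             # 第二阶段:即工具变量估计
print(f"真实系数 {beta_true},OLS 估计 {ols:.3f},IV 估计 {iv:.3f}")
```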

【21】 Variational embedding of protein folding simulations using gaussian mixture variational autoencoders 标题:使用高斯混合变分自动编码器的蛋白质折叠模拟的变分嵌入 链接:https://arxiv.org/abs/2108.12493

作者:Mahdi Ghorbani,Samarjeet Prasad,Jeffery B. Klauda,Bernard R. Brooks 机构:)Laboratory of Computational Biology, National, Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, Maryland , USA., )Department of Chemical and Biomolecular Engineering, University of Maryland, College Park, MD 摘要:利用分子动力学模拟对生物分子进行构象取样通常会产生大量高维数据,这使得使用传统分析技术难以解释。因此,需要使用降维方法来提取有用的相关信息。在这里,我们设计了一种机器学习方法,高斯混合变分自动编码器(GMVAE),它可以以无监督的方式同时执行生物分子构象的降维和聚类。我们表明,GMVAE可以通过高度分离的簇(对应于折叠过程中的亚稳态)学习蛋白质折叠自由能景观的简化表示。由于GMVAE使用高斯混合作为先验,它可以直接确认蛋白质折叠自由能景观的多盆地性质。为了使模型端到端可差分,我们使用Gumbel softmax分布。我们在三个长时间尺度的蛋白质折叠轨迹上测试了该模型,结果表明GMVAE嵌入类似于折叠漏斗,折叠状态沿漏斗向下,未折叠状态位于漏斗路径外侧。此外,我们还表明,GMVAE的潜在空间可用于动力学分析,基于该嵌入建立的马尔可夫状态模型产生的折叠和展开时间尺度与其他严格的动力学嵌入(如时间独立分量分析(TICA))非常一致。 摘要:Conformational sampling of biomolecules using molecular dynamics simulations often produces large amount of high dimensional data that makes it difficult to interpret using conventional analysis techniques. Dimensionality reduction methods are thus required to extract useful and relevant information. Here we devise a machine learning method, Gaussian mixture variational autoencoder (GMVAE) that can simultaneously perform dimensionality reduction and clustering of biomolecular conformations in an unsupervised way. We show that GMVAE can learn a reduced representation of the free energy landscape of protein folding with highly separated clusters that correspond to the metastable states during folding. Since GMVAE uses a mixture of Gaussians as the prior, it can directly acknowledge the multi-basin nature of protein folding free-energy landscape. To make the model end-to-end differentialble, we use a Gumbel-softmax distribution. We test the model on three long-timescale protein folding trajectories and show that GMVAE embedding resembles the folding funnel with folded states down the funnel and unfolded states outer in the funnel path. Additionally, we show that the latent space of GMVAE can be used for kinetic analysis and Markov state models built on this embedding produce folding and unfolding timescales that are in close agreement with other rigorous dynamical embeddings such as time independent component analysis (TICA).
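摘要提到用 Gumbel-softmax 分布使含离散(混合成分)变量的模型可以端到端求导。下面是 Gumbel-softmax 重参数化采样本身的一个独立小示例(纯 numpy);它只演示该技巧,并非论文中的 GMVAE 模型或其实现。

```python
# 概念示意:Gumbel-softmax(Concrete)重参数化采样
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits, tau=0.5):
    # 采样 Gumbel(0,1) 噪声: -log(-log(U)), U ~ Uniform(0,1)
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))
    z = (logits + g) / tau                   # 温度 tau 越小越接近 one-hot
    z = z - z.max()                          # 数值稳定
    e = np.exp(z)
    return e / e.sum()

logits = np.log(np.array([0.2, 0.3, 0.5]))   # 三个混合成分的(对数)权重
samples = np.stack([gumbel_softmax(logits, tau=0.5) for _ in range(5000)])
print("单次采样(近似 one-hot):", samples[0].round(3))
print("argmax 频率(应接近 [0.2, 0.3, 0.5]):", np.bincount(samples.argmax(1)) / 5000)
```

温度 tau 趋近 0 时样本趋于 one-hot,训练中通常从较大温度逐步退火。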

机器翻译,仅供参考
