
Machine Learning arXiv Daily Digest [10.18]

By 公众号-arXiv每日学术速递 (WeChat official account: arXiv Daily Academic Digest) · Published 2021-10-21 16:07:33

Update! The H5 page now supports collapsible abstracts for a better reading experience. Click "Read the original article" to visit arxivdaily.com, which covers CS | Physics | Math | Economics | Statistics | Finance | Biology | Electrical Engineering, with search, bookmarking, and more!

cs.LG: 98 papers in total today

Graph-related (graph learning | graph neural networks | graph optimization, etc.) (9 papers)

【1】 LPRules: Rule Induction in Knowledge Graphs Using Linear Programming Link: https://arxiv.org/abs/2110.08245

Authors: Sanjeeb Dash, Joao Goncalves Abstract: Knowledge graph (KG) completion is a well-studied problem in AI. Rule-based methods and embedding-based methods form two of the solution techniques. Rule-based methods learn first-order logic rules that capture existing facts in an input graph and then use these rules for reasoning about missing facts. A major drawback of such methods is the lack of scalability to large datasets. In this paper, we present a simple linear programming (LP) model to choose rules from a list of candidate rules and assign weights to them. For smaller KGs, we use simple heuristics to create the candidate list. For larger KGs, we start with a small initial candidate list, and then use standard column generation ideas to add more rules in order to improve the LP model objective value. To foster interpretability and generalizability, we limit the complexity of the set of chosen rules via explicit constraints, and tune the complexity hyperparameter for individual datasets. We show that our method can obtain state-of-the-art results for three out of four widely used KG datasets, while taking significantly less computing time than other popular rule learners including some based on neuro-symbolic methods. The improved scalability of our method allows us to tackle large datasets such as YAGO3-10.
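
At its core, the paper's selection step is a small linear program: choose rule weights under a complexity budget so that as many training facts as possible are covered. Below is a minimal scipy sketch of that idea on toy data; the binary coverage matrix A, the slack-variable coverage objective, and the budget value are illustrative assumptions, not the authors' exact formulation or code.

```python
import numpy as np
from scipy.optimize import linprog

# Toy setup: A[f, r] = 1 if candidate rule r derives training fact f.
rng = np.random.default_rng(0)
n_facts, n_rules = 40, 8
A = (rng.random((n_facts, n_rules)) < 0.3).astype(float)
complexity = rng.integers(1, 4, size=n_rules)   # e.g. rule body length
budget = 10.0                                   # complexity budget (assumed)

# Variables x = [w_1..w_R, s_1..s_F]; maximize total coverage sum_f s_f
# s.t. s_f <= sum_r A[f, r] * w_r,  s_f <= 1,  0 <= w_r <= 1,
#      complexity @ w <= budget.
c = np.concatenate([np.zeros(n_rules), -np.ones(n_facts)])   # linprog minimizes
A_ub = np.block([[-A, np.eye(n_facts)],                      # s_f - (A w)_f <= 0
                 [complexity[None, :], np.zeros((1, n_facts))]])
b_ub = np.concatenate([np.zeros(n_facts), [budget]])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * (n_rules + n_facts))
print("rule weights:", np.round(res.x[:n_rules], 2), "coverage:", -res.fun)
```

For large KGs the paper grows the candidate list by column generation, i.e., new rule columns are added to an LP of this kind only when they can improve its objective.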

【2】 Propagation on Multi-relational Graphs for Node Regression Link: https://arxiv.org/abs/2110.08185

Authors: Eda Bayram Affiliation: École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland Note: Accepted to IJCLR 2021 Workshop: Statistical Relational AI (StarAI) Abstract: Recent years have witnessed a rise in real-world data captured with rich structural information that can be conveniently depicted by multi-relational graphs. While inference of continuous node features across a simple graph is rather under-studied by current relational learning research, we go one step further and focus on the node regression problem on multi-relational graphs. We take inspiration from the well-known label propagation algorithm, which aims at completing categorical features across a simple graph, and propose a novel propagation framework for completing missing continuous features at the nodes of a multi-relational and directed graph. Our multi-relational propagation algorithm is composed of iterative neighborhood aggregations which originate from a relational local generative model. Our findings show the benefit of exploiting the multi-relational structure of the data in several node regression scenarios in different settings.
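
As a rough illustration of the iterative neighborhood aggregation described above, here is a numpy sketch that completes a missing continuous node feature over two toy directed relations. The per-relation trust weights alpha and the in-plus-out averaging rule are assumptions made for the example, not the paper's exact relational local generative model.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
rels = [(rng.random((n, n)) < 0.3).astype(float) for _ in range(2)]  # toy relations
alpha = [0.7, 0.3]                 # assumed per-relation weights
y = rng.normal(size=n)             # ground-truth continuous node feature
known = np.zeros(n, dtype=bool); known[:4] = True
x = np.where(known, y, 0.0)        # unknown nodes start at 0

for _ in range(50):                # iterative neighborhood aggregation
    agg, deg = np.zeros(n), np.zeros(n)
    for a, R in zip(alpha, rels):
        agg += a * (R @ x + R.T @ x)           # use in- and out-neighbors
        deg += a * (R.sum(1) + R.sum(0))
    upd = np.divide(agg, deg, out=x.copy(), where=deg > 0)
    x = np.where(known, y, upd)                # clamp the observed values
print(np.round(x, 2))
```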

【3】 Label-Wise Message Passing Graph Neural Network on Heterophilic Graphs Link: https://arxiv.org/abs/2110.08128

Authors: Enyan Dai, Zhimeng Guo, Suhang Wang Affiliation: The Pennsylvania State University Abstract: Graph Neural Networks (GNNs) have achieved remarkable performance in modeling graphs for various applications. However, most existing GNNs assume the graphs exhibit strong homophily in node labels, i.e., nodes with similar labels are connected in the graphs. They fail to generalize to heterophilic graphs where linked nodes may have dissimilar labels and attributes. Therefore, in this paper, we investigate a novel framework that performs well on graphs with either homophily or heterophily. More specifically, to address the challenge brought by the heterophily in graphs, we propose a label-wise message passing mechanism. In label-wise message passing, neighbors with similar pseudo labels will be aggregated together, which avoids the negative effects caused by aggregating dissimilar node representations. We further propose a bi-level optimization method to automatically select the model for graphs with homophily/heterophily. Extensive experiments demonstrate the effectiveness of our proposed framework for node classification on both homophilic and heterophilic graphs.
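
The label-wise aggregation itself is easy to sketch: split each node's neighborhood by the neighbors' pseudo-labels and aggregate each group separately before combining, so dissimilar representations are never averaged together. A minimal numpy version, assuming a dense adjacency matrix and mean aggregation (the paper's exact combination step may differ):

```python
import numpy as np

def labelwise_aggregate(X, A, pseudo, n_classes):
    """One label-wise message-passing step: aggregate neighbors separately
    per pseudo-label class, then concatenate the per-class messages."""
    X = np.asarray(X, dtype=float)
    parts = []
    for c in range(n_classes):
        Ac = A * (pseudo[None, :] == c)      # keep only edges to class-c neighbors
        deg = Ac.sum(axis=1, keepdims=True)
        msg = Ac @ X
        parts.append(np.divide(msg, deg, out=np.zeros_like(msg), where=deg > 0))
    return np.concatenate(parts, axis=1)     # shape [n, n_classes * d]
```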

【4】 ACE-HGNN: Adaptive Curvature Exploration Hyperbolic Graph Neural Network Link: https://arxiv.org/abs/2110.07888

Authors: Xingcheng Fu, Jianxin Li, Jia Wu, Qingyun Sun, Cheng Ji, Senzhang Wang, Jiajun Tan, Hao Peng, Philip S. Yu Affiliations: Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing, China; School of Computer Science and Engineering, Beihang University, Beijing, China; Department of Computing, Macquarie University, Sydney, Australia Abstract: Graph Neural Networks (GNNs) have been widely studied in various graph data mining tasks. Most existing GNNs embed graph data into Euclidean space and thus are less effective at capturing the ubiquitous hierarchical structures in real-world networks. Hyperbolic Graph Neural Networks (HGNNs) extend GNNs to hyperbolic space and thus are more effective at capturing the hierarchical structures of graphs in node representation learning. In hyperbolic geometry, the graph hierarchical structure can be reflected by the curvature of the hyperbolic space, and different curvatures can model different hierarchical structures of a graph. However, most existing HGNNs manually set the curvature to a fixed value for simplicity, which achieves suboptimal graph learning performance due to the complex and diverse hierarchical structures of graphs. To resolve this problem, we propose an Adaptive Curvature Exploration Hyperbolic Graph Neural Network, named ACE-HGNN, to adaptively learn the optimal curvature according to the input graph and downstream tasks. Specifically, ACE-HGNN exploits a multi-agent reinforcement learning framework and contains two agents, ACE-Agent and HGNN-Agent, for learning the curvature and node representations, respectively. The two agents are updated collaboratively by a Nash Q-learning algorithm, seeking the optimal hyperbolic space indexed by the curvature. Extensive experiments on multiple real-world graph datasets demonstrate a significant and consistent improvement in model quality, with competitive performance and good generalization ability.

【5】 Graph Neural Networks with Learnable Structural and Positional Representations Link: https://arxiv.org/abs/2110.07875

Authors: Vijay Prakash Dwivedi, Anh Tuan Luu, Thomas Laurent, Yoshua Bengio, Xavier Bresson Affiliations: Nanyang Technological University, Singapore; Loyola Marymount University; Mila, University of Montréal; CIFAR; National University of Singapore Note: Code at this https URL Abstract: Graph neural networks (GNNs) have become the standard learning architectures for graphs. GNNs have been applied to numerous domains ranging from quantum chemistry and recommender systems to knowledge graphs and natural language processing. A major issue with arbitrary graphs is the absence of canonical positional information of nodes, which decreases the representation power of GNNs to distinguish e.g. isomorphic nodes and other graph symmetries. An approach to tackle this issue is to introduce Positional Encoding (PE) of nodes, and inject it into the input layer, like in Transformers. Possible graph PEs are Laplacian eigenvectors. In this work, we propose to decouple structural and positional representations to make it easy for the network to learn these two essential properties. We introduce a novel generic architecture which we call LSPE (Learnable Structural and Positional Encodings). We investigate several sparse and fully-connected (Transformer-like) GNNs, and observe a performance increase on molecular datasets, from 2.87% up to 64.14%, when considering learnable PEs for both GNN classes.
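
For reference, the Laplacian-eigenvector positional encodings mentioned in the abstract can be computed in a few lines. This is the standard construction rather than code from the paper; note that each eigenvector's sign is ambiguous, which is why such PEs are typically sign-randomized during training (and part of why the paper argues for learning them rather than fixing them).

```python
import numpy as np

def laplacian_pe(A, k):
    """First k non-trivial eigenvectors of the symmetric normalized
    Laplacian L = I - D^{-1/2} A D^{-1/2}, used as node positional encodings."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    d = np.zeros_like(deg)
    d[deg > 0] = deg[deg > 0] ** -0.5
    L = np.eye(len(A)) - d[:, None] * A * d[None, :]
    vals, vecs = np.linalg.eigh(L)     # eigenvalues in ascending order
    return vecs[:, 1:k + 1]            # drop the trivial constant eigenvector
```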

【6】 A Dual-Perception Graph Neural Network with Multi-hop Graph Generator Link: https://arxiv.org/abs/2110.07869

Authors: Li Zhou, Wenyu Chen, Dingyi Zeng, Shaohuan Cheng, Wanlong Liu, Hong Qu Affiliation: School of Computer Science and Engineering, University of Electronic Science and Technology of China Note: 8 pages Abstract: Graph neural networks (GNNs) have drawn increasing attention in recent years and achieved remarkable performance in many graph-based tasks, especially in semi-supervised learning on graphs. However, most existing GNNs rely excessively on topological structure and aggregate multi-hop neighborhood information by simply stacking network layers, which may introduce superfluous noise, limit the expressive power of GNNs, and ultimately lead to the over-smoothing problem. In light of this, we propose a novel Dual-Perception Graph Neural Network (DPGNN) to address these issues. In DPGNN, we utilize node features to construct a feature graph, and perform node representation learning based on the original topology graph and the constructed feature graph simultaneously, which helps capture both structural neighborhood information and feature-related information. Furthermore, we design a Multi-Hop Graph Generator (MHGG), which applies a node-to-hop attention mechanism to adaptively aggregate node-specific multi-hop neighborhood information. Finally, we apply self-ensembling to form a consistent prediction for unlabeled node representations. Experimental results on five datasets with different topological structures demonstrate that our proposed DPGNN achieves competitive performance across all datasets, outperforming the latest state-of-the-art models on four of them. The source code of our model is available at https://github.com.

【7】 Residual2Vec: Debiasing graph embedding with random graphs Link: https://arxiv.org/abs/2110.07654

Authors: Sadamori Kojaku, Jisung Yoon, Isabel Constantino, Yong-Yeol Ahn Affiliations: Center for Complex Networks and Systems Research, Luddy School of Informatics, Computing and Engineering, Indiana University, USA; Department of Industrial Management and Engineering, Pohang University of Science and Technology, South Korea Abstract: Graph embedding maps a graph into a convenient vector-space representation for graph analysis and machine learning applications. Many graph embedding methods hinge on a sampling of context nodes based on random walks. However, random walks can be a biased sampler due to the structural properties of graphs. Most notably, random walks are biased by the degree of each node, where a node is sampled proportionally to its degree. The implication of such biases has not been clear, particularly in the context of graph representation learning. Here, we investigate the impact of the random walks' bias on graph embedding and propose residual2vec, a general graph embedding method that can debias various structural biases in graphs by using random graphs. We demonstrate that this debiasing not only improves link prediction and clustering performance but also allows us to explicitly model salient structural properties in graph embedding.

【8】 Model-Change Active Learning in Graph-Based Semi-Supervised Learning Link: https://arxiv.org/abs/2110.07739

Authors: Kevin Miller, Andrea L. Bertozzi Affiliation: Department of Mathematics, University of California Note: Submitted to SIAM Journal on Mathematics of Data Science (SIMODS) Abstract: Active learning in semi-supervised classification involves introducing additional labels for unlabelled data to improve the accuracy of the underlying classifier. A challenge is to identify which points to label to best improve performance while limiting the number of new labels. "Model-change" active learning quantifies the resulting change incurred in the classifier by introducing the additional label(s). We pair this idea with graph-based semi-supervised learning methods that use the spectrum of the graph Laplacian matrix, which can be truncated to avoid prohibitively large computational and storage costs. We consider a family of convex loss functions for which the acquisition function can be efficiently approximated using the Laplace approximation of the posterior distribution. We show a variety of multiclass examples that illustrate improved performance over the prior state of the art.

【9】 Pre-training Molecular Graph Representation with 3D Geometry Link: https://arxiv.org/abs/2110.07728

Authors: Shengchao Liu, Hanchen Wang, Weiyang Liu, Joan Lasenby, Hongyu Guo, Jian Tang Affiliations: Mila, Université de Montréal; University of Cambridge; MPI for Intelligent Systems, Tübingen; National Research Council Canada; HEC Montréal; CIFAR AI Chair Abstract: Molecular graph representation learning is a fundamental problem in modern drug and material discovery. Molecular graphs are typically modeled by their 2D topological structures, but it has recently been discovered that 3D geometric information plays a more vital role in predicting molecular functionalities. However, the lack of 3D information in real-world scenarios has significantly impeded the learning of geometric graph representations. To cope with this challenge, we propose the Graph Multi-View Pre-training (GraphMVP) framework, where self-supervised learning (SSL) is performed by leveraging the correspondence and consistency between 2D topological structures and 3D geometric views. GraphMVP effectively learns a 2D molecular graph encoder that is enhanced by richer and more discriminative 3D geometry. We further provide theoretical insights to justify the effectiveness of GraphMVP. Finally, comprehensive experiments show that GraphMVP can consistently outperform existing graph SSL methods.
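
GraphMVP pairs each molecule's 2D-graph embedding with its own 3D-geometry embedding. As a hedged illustration of the contrastive part of such 2D/3D self-supervision (the paper also uses a generative objective, and its exact losses and temperature differ), here is a symmetric InfoNCE loss in numpy:

```python
import numpy as np

def log_softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def contrastive_2d3d(z2d, z3d, tau=0.1):
    """Symmetric InfoNCE: each molecule's 2D embedding should match its own
    3D embedding (diagonal) better than other molecules' (off-diagonal)."""
    z2d = z2d / np.linalg.norm(z2d, axis=1, keepdims=True)
    z3d = z3d / np.linalg.norm(z3d, axis=1, keepdims=True)
    logits = z2d @ z3d.T / tau                     # [B, B] similarity matrix
    loss_2to3 = -np.mean(np.diag(log_softmax(logits)))
    loss_3to2 = -np.mean(np.diag(log_softmax(logits.T)))
    return 0.5 * (loss_2to3 + loss_3to2)
```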

Transformer (6 papers)

【1】 Transforming Autoregression: Interpretable and Expressive Time Series Forecast Link: https://arxiv.org/abs/2110.08248

Authors: David Rügamer, Philipp F. M. Baumann, Thomas Kneib, Torsten Hothorn Affiliations: Department of Statistics, LMU Munich; KOF Swiss Economic Institute, ETH Zurich; Chair of Statistics, University of Goettingen; Epidemiology, Biostatistics, and Prevention Institute, University of Zurich Abstract: Probabilistic forecasting of time series is an important matter in many applications and research fields. In order to draw conclusions from a probabilistic forecast, we must ensure that the model class used to approximate the true forecasting distribution is expressive enough. Yet, characteristics of the model itself, such as its uncertainty or its general functioning, are not of lesser importance. In this paper, we propose Autoregressive Transformation Models (ATMs), a model class inspired by various research directions such as normalizing flows and autoregressive models. ATMs unite expressive distributional forecasts using a semi-parametric distribution assumption with an interpretable model specification and allow for uncertainty quantification based on (asymptotic) Maximum Likelihood theory. We demonstrate the properties of ATMs both theoretically and through empirical evaluation on several simulated and real-world forecasting datasets.

【2】 Tensor-to-Image: Image-to-Image Translation with Vision Transformers Link: https://arxiv.org/abs/2110.08037

Author: Yiğit Gündüç Abstract: Transformers have gained huge attention since they were first introduced and have a wide range of applications. Transformers have started to take over all areas of deep learning, and the Vision Transformer paper also proved that they can be used for computer vision tasks. In this paper, we utilized a vision-transformer-based custom-designed model, tensor-to-image, for image-to-image translation. With the help of self-attention, our model was able to generalize and apply to different problems without a single modification.

【3】 StreaMulT: Streaming Multimodal Transformer for Heterogeneous and Arbitrary Long Sequential Data Link: https://arxiv.org/abs/2110.08021

Authors: Victor Pellegrain, Myriam Tami, Michel Batteux, Céline Hudelot Affiliations: Institut de Recherche Technologique SystemX, boulevard Thomas Gobert, Palaiseau, France; Université Paris-Saclay, CentraleSupélec, MICS, Gif-sur-Yvette, France Note: 5 pages, 4 figures, submitted to ICASSP 2022 Abstract: This paper tackles the problem of efficiently processing and combining arbitrarily long data streams coming from different modalities with different acquisition frequencies. Common applications can be, for instance, long-term industrial or real-life system monitoring from multimodal heterogeneous data (sensor data, monitoring reports, images, etc.). To tackle this problem, we propose StreaMulT, a Streaming Multimodal Transformer relying on cross-modal attention and an augmented memory bank to process arbitrarily long input sequences at training time and run in a streaming way at inference. StreaMulT reproduces state-of-the-art results on the CMU-MOSEI dataset, while being able to deal with much longer inputs than other models such as the previous Multimodal Transformer.

【4】 Understanding and Improving Robustness of Vision Transformers through Patch-based Negative Augmentation Link: https://arxiv.org/abs/2110.07858

Authors: Yao Qin, Chiyuan Zhang, Ting Chen, Balaji Lakshminarayanan, Alex Beutel, Xuezhi Wang Affiliation: Google Research Abstract: We investigate the robustness of vision transformers (ViTs) through the lens of their special patch-based architectural structure, i.e., they process an image as a sequence of image patches. We find that ViTs are surprisingly insensitive to patch-based transformations, even when the transformation largely destroys the original semantics and makes the image unrecognizable by humans. This indicates that ViTs heavily use features that survive such transformations but are generally not indicative of the semantic class to humans. Further investigations show that these features are useful but non-robust, as ViTs trained on them can achieve high in-distribution accuracy but break down under distribution shifts. From this understanding, we ask: can training the model to rely less on these features improve ViT robustness and out-of-distribution performance? We use the images transformed with our patch-based operations as negatively augmented views and offer losses to regularize the training away from using non-robust features. This is a complementary view to existing research that mostly focuses on augmenting inputs with semantic-preserving transformations to enforce models' invariance. We show that patch-based negative augmentation consistently improves robustness of ViTs across a wide set of ImageNet-based robustness benchmarks. Furthermore, we find our patch-based negative augmentation is complementary to traditional (positive) data augmentation, and together they boost the performance further. All the code in this work will be open-sourced.
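
A concrete example of such a patch-based transformation is shuffling non-overlapping patches, which destroys global semantics while preserving the patch-level statistics a ViT might still latch onto. A small numpy sketch of one such operation (the 16-pixel patch size and CHW layout are assumptions; the paper studies several patch-based operations, of which shuffling is one representative):

```python
import numpy as np

def patch_shuffle(img, patch=16, seed=0):
    """Randomly permute non-overlapping patches of a CHW image."""
    rng = np.random.default_rng(seed)
    C, H, W = img.shape
    gh, gw = H // patch, W // patch
    t = img[:, :gh * patch, :gw * patch].reshape(C, gh, patch, gw, patch)
    t = t.transpose(1, 3, 0, 2, 4).reshape(gh * gw, C, patch, patch)
    rng.shuffle(t)                                  # permute the patches
    t = t.reshape(gh, gw, C, patch, patch).transpose(2, 0, 3, 1, 4)
    return t.reshape(C, gh * patch, gw * patch)
```

In the training scheme described above, such transformed images serve as negative views: the loss pushes the model's prediction on them away from the original label rather than toward it.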

【5】 The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization Link: https://arxiv.org/abs/2110.07732

Authors: Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber Affiliations: The Swiss AI Lab, IDSIA, University of Lugano (USI) & SUPSI, Lugano, Switzerland; King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia Abstract: Despite successes across a broad range of applications, Transformers have limited success in systematic generalization. The situation is especially frustrating in the case of algorithmic tasks, where they often fail to find intuitive solutions that route relevant information to the right node/operation at the right time in the grid represented by Transformer columns. To facilitate the learning of useful control flow, we propose two modifications to the Transformer architecture, copy gate and geometric attention. Our novel Neural Data Router (NDR) achieves 100% length generalization accuracy on the classic compositional table lookup task, as well as near-perfect accuracy on the simple arithmetic task and a new variant of ListOps testing for generalization across computational depth. NDR's attention and gating patterns tend to be interpretable as an intuitive form of neural routing. Our code is public.
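
Of the two proposed modifications, the copy gate is the simpler to sketch: at every layer, each column decides whether to apply its transformation or to copy its input forward unchanged. A schematic numpy version loosely following that description (the linear gate parameterization and the negative initial bias are assumptions; geometric attention is omitted):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def copy_gate(h, transformed, Wg, bg=-3.0):
    """Blend a column's transformed state with a plain copy of its input.
    A negative gate bias makes 'copy' the default behavior until the model
    learns when an update is actually needed."""
    g = sigmoid(h @ Wg + bg)              # per-position gate values in (0, 1)
    return g * transformed + (1.0 - g) * h
```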

【6】 Certified Patch Robustness via Smoothed Vision Transformers Link: https://arxiv.org/abs/2110.07719

Authors: Hadi Salman, Saachi Jain, Eric Wong, Aleksander Mądry Affiliation: MIT Abstract: Certified patch defenses can guarantee robustness of an image classifier to arbitrary changes within a bounded contiguous region. But, currently, this robustness comes at a cost of degraded standard accuracies and slower inference times. We demonstrate how using vision transformers enables significantly better certified patch robustness that is also more computationally efficient and does not incur a substantial drop in standard accuracy. These improvements stem from the inherent ability of the vision transformer to gracefully handle largely masked images. Our code is available at https://github.com/MadryLab/smoothed-vit.
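
The smoothing scheme behind such certificates classifies many ablated copies of the image, each keeping only a narrow column strip, and takes a majority vote; an adversarial patch can only intersect a few strips, which bounds how many votes it can flip. A rough sketch of the voting step (the band width, class count, and the classify callable are assumptions for illustration):

```python
import numpy as np

def smoothed_predict(classify, img, band=4, n_classes=1000):
    """Majority vote over column-ablated copies of a CHW image: each copy
    keeps only a `band`-wide strip of columns, so a bounded patch can
    influence only the few copies whose strip it overlaps."""
    C, H, W = img.shape
    votes = np.zeros(n_classes, dtype=int)
    for s in range(W):                       # one ablation per column offset
        ablated = np.zeros_like(img)
        cols = [(s + j) % W for j in range(band)]
        ablated[:, :, cols] = img[:, :, cols]
        votes[classify(ablated)] += 1
    return int(votes.argmax()), votes
```

The abstract's point is that ViTs tolerate these mostly-zeroed inputs far more gracefully than CNNs, and can skip fully masked tokens for speed.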

GAN | Adversarial | Attacks | Generation (7 papers)

【1】 Guiding Visual Question Generation Link: https://arxiv.org/abs/2110.08226

Authors: Nihir Vedd, Zixu Wang, Marek Rei, Yishu Miao, Lucia Specia Affiliation: Department of Computing, Imperial College London Note: 11 pages including references and Appendix; 3 figures and 3 tables Abstract: In traditional Visual Question Generation (VQG), most images have multiple concepts (e.g. objects and categories) for which a question could be generated, but models are trained to mimic an arbitrary choice of concept as given in their training data. This makes training difficult and also poses issues for evaluation -- multiple valid questions exist for most images but only one or a few are captured by the human references. We present Guiding Visual Question Generation, a variant of VQG which conditions the question generator on categorical information based on expectations about the type of question and the objects it should explore. We propose two variants: (i) an explicitly guided model that enables an actor (human or automated) to select which objects and categories to generate a question for; and (ii) an implicitly guided model that learns which objects and categories to condition on, based on discrete latent variables. The proposed models are evaluated on an answer-category augmented VQA dataset, and our quantitative results show a substantial improvement over the current state of the art (over 9 BLEU-4 increase). Human evaluation validates that guidance helps the generation of questions that are grammatically coherent and relevant to the given image and objects.

【2】 Dual-Arm Adversarial Robot Learning Link: https://arxiv.org/abs/2110.08066

Author: Elie Aljalbout Affiliation: Technical University of Munich Note: Accepted at CoRL 2021, Blue Sky Track Abstract: Robot learning is a very promising topic for the future of automation and machine intelligence. Future robots should be able to autonomously acquire skills, learn to represent their environment, and interact with it. While these topics have been explored in simulation, real-world robot learning research still seems to be limited. This is due to the additional challenges encountered in the real world, such as noisy sensors and actuators, safe exploration, non-stationary dynamics, autonomous environment resetting, and the cost of running experiments for long periods of time. Unless we develop scalable solutions to these problems, learning complex tasks involving hand-eye coordination and rich contacts will remain an untouched vision that is only feasible in controlled lab environments. We propose dual-arm settings as platforms for robot learning. Such settings enable safe data collection for acquiring manipulation skills as well as training perception modules in a robot-supervised manner. They also ease the process of resetting the environment. Furthermore, adversarial learning could potentially boost the generalization capability of robot learning methods by maximizing exploration based on game-theoretic objectives while ensuring safety based on collaborative task spaces. In this paper, we will discuss the potential benefits of this setup as well as the challenges and research directions that can be pursued.

【3】 Adversarial Attacks on ML Defense Models Competition Link: https://arxiv.org/abs/2110.08042

Authors: Yinpeng Dong, Qi-An Fu, Xiao Yang, Wenzhao Xiang, Tianyu Pang, Hang Su, Jun Zhu, Jiayu Tang, Yuefeng Chen, XiaoFeng Mao, Yuan He, Hui Xue, Chao Li, Ye Liu, Qilong Zhang, Lianli Gao, Yunrui Yu, Xitong Gao, Zhe Zhao, Daquan Lin, Jiadong Lin, Chuanbiao Song, Zihao Wang, Zhennan Wu, Yang Guo, Jiequan Cui, Xiaogang Xu, Pengguang Chen Affiliations: Tsinghua University; Alibaba Group; RealAI; Shanghai Jiao Tong University; University of Electronic Science and Technology of China; University of Macau; Chinese Academy of Sciences; ShanghaiTech University; Huazhong University of Science and Technology Note: Competition Report Abstract: Due to the vulnerability of deep neural networks (DNNs) to adversarial examples, a large number of defense techniques have been proposed to alleviate this problem in recent years. However, the progress of building more robust models is usually hampered by incomplete or incorrect robustness evaluation. To accelerate research on reliable evaluation of the adversarial robustness of current defense models in image classification, the TSAIL group at Tsinghua University and the Alibaba Security group organized this competition along with a CVPR 2021 workshop on adversarial machine learning (https://aisecure-workshop.github.io/amlcvpr2021/). The purpose of this competition is to motivate novel attack algorithms that evaluate adversarial robustness more effectively and reliably. The participants were encouraged to develop stronger white-box attack algorithms to find the worst-case robustness of different defenses. This competition was conducted on an adversarial robustness evaluation platform -- ARES (https://github.com/thu-ml/ares) -- and was held on the TianChi platform (https://tianchi.aliyun.com/competition/entrance/531847/introduction) as one of the series of the AI Security Challengers Program. After the competition, we summarized the results and established a new adversarial robustness benchmark at https://ml.cs.tsinghua.edu.cn/ares-bench/, which allows users to upload adversarial attack algorithms and defense models for evaluation.

【4】 RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models Link: https://arxiv.org/abs/2110.07831

Authors: Wenkai Yang, Yankai Lin, Peng Li, Jie Zhou, Xu Sun Affiliations: Center for Data Science, Peking University; Pattern Recognition Center, WeChat AI, Tencent Inc., China; MOE Key Laboratory of Computational Linguistics, School of EECS, Peking University Note: EMNLP 2021 (main conference), long paper, camera-ready version Abstract: Backdoor attacks, which maliciously control a well-trained model's outputs on instances with specific triggers, have recently been shown to be a serious threat to the safety of reusing deep neural networks (DNNs). In this work, we propose an efficient online defense mechanism based on robustness-aware perturbations. Specifically, by analyzing the backdoor training process, we point out that there exists a big robustness gap between poisoned and clean samples. Motivated by this observation, we construct a word-based robustness-aware perturbation to distinguish poisoned samples from clean samples to defend against backdoor attacks on natural language processing (NLP) models. Moreover, we give a theoretical analysis of the feasibility of our robustness-aware perturbation-based defense method. Experimental results on sentiment analysis and toxicity detection tasks show that our method achieves better defending performance and much lower computational costs than existing online defense methods. Our code is available at https://github.com/lancopku/RAP.
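
At inference time, the defense reduces to a simple robustness test: inject the rare RAP trigger word and flag inputs whose confidence barely moves. A schematic sketch of that check, where model_prob, the trigger word "cf", and the threshold are placeholder assumptions rather than the paper's exact calibrated values:

```python
def rap_filter(model_prob, text, rap_word="cf", threshold=0.1):
    """Flag an input as likely backdoor-poisoned if inserting the RAP word
    barely lowers the model's confidence: poisoned inputs are unusually
    robust to such perturbations, while clean inputs are not."""
    drop = model_prob(text) - model_prob(rap_word + " " + text)
    return drop < threshold        # True -> suspicious (likely poisoned)
```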

【5】 Adversarial Purification through Representation Disentanglement Link: https://arxiv.org/abs/2110.07801

Authors: Tao Bai, Jun Zhao, Lanqing Guo, Bihan Wen Affiliation: Nanyang Technological University Abstract: Deep learning models are vulnerable to adversarial examples and make incomprehensible mistakes, which poses a threat to their real-world deployment. Combined with the idea of adversarial training, preprocessing-based defenses are popular and convenient to use because of their task independence and good generalizability. Current defense methods, especially purification, tend to remove "noise" by learning and recovering the natural images. However, different from random noise, the adversarial patterns are much easier to overfit during model training due to their strong correlation to the images. In this work, we propose a novel adversarial purification scheme by presenting disentanglement of natural images and adversarial perturbations as a preprocessing defense. With extensive experiments, our defense is shown to be generalizable and provide significant protection against unseen strong adversarial attacks. It reduces the success rates of state-of-the-art ensemble attacks from 61.7% to 14.9% on average, outperforming a number of existing methods. Notably, our defense restores the perturbed images perfectly and does not hurt the clean accuracy of backbone models, which is highly desirable in practice.

【6】 An Optimization Perspective on Realizing Backdoor Injection Attacks on Deep Neural Networks in Hardware Link: https://arxiv.org/abs/2110.07683

Authors: M. Caner Tol, Saad Islam, Berk Sunar, Ziming Zhang Affiliation: Worcester Polytechnic Institute, Worcester, MA, USA Abstract: State-of-the-art deep neural networks (DNNs) have been proven to be vulnerable to adversarial manipulation and backdoor attacks. Backdoored models deviate from expected behavior on inputs with predefined triggers while retaining performance on clean data. Recent works focus on software simulation of backdoor injection during the inference phase by modifying network weights, which we find often unrealistic in practice due to hardware restrictions such as bit allocation in memory. In contrast, in this work, we investigate the viability of backdoor injection attacks in real-life deployments of DNNs on hardware and address such practical issues in hardware implementation from a novel optimization perspective. We are motivated by the fact that the vulnerable memory locations are very rare, device-specific, and sparsely distributed. Consequently, we propose a novel network training algorithm based on constrained optimization for realistic backdoor injection attacks in hardware. By modifying parameters uniformly across the convolutional and fully-connected layers as well as optimizing the trigger pattern together, we achieve state-of-the-art attack performance with fewer bit flips. For instance, our method on a hardware-deployed ResNet-20 model trained on CIFAR-10 can achieve over 91% test accuracy and a 94% attack success rate by flipping only 10 bits out of 2.2 million.
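
The hardware constraint here is that the attacker gets single bit flips in memory (e.g., via RowHammer-style faults) rather than arbitrary weight edits, so the attack must succeed with very few, sparsely located flips. The effect of one flip is easy to see; a small numpy demo on a float32 weight (the bit index is illustrative):

```python
import numpy as np

def flip_bit(w, bit):
    """Flip one bit of a float32 value, as a memory fault would."""
    u = np.array([w], dtype=np.float32).view(np.uint32)
    u ^= np.uint32(1 << bit)
    return float(u.view(np.float32)[0])

print(flip_bit(0.05, 30))   # flipping the exponent MSB: 0.05 -> ~1.7e37
```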

【7】 Federated learning and next generation wireless communications: A survey on bidirectional relationship Link: https://arxiv.org/abs/2110.07649

Authors: Debaditya Shome, Omer Waqar, Wali Ullah Khan Affiliation: Department of Engineering, Thompson Rivers University (TRU) Note: 18 pages, 6 figures Abstract: In order to meet the extremely heterogeneous requirements of next-generation wireless communication networks, the research community is increasingly dependent on machine learning solutions for real-time decision-making and radio resource management. Traditional machine learning employs a fully centralized architecture in which the entire training data is collected at one node, e.g., a cloud server, which significantly increases the communication overhead and also raises severe privacy concerns. Towards this end, a distributed machine learning paradigm termed Federated Learning (FL) has been proposed recently. In FL, each participating edge device trains its local model by using its own training data. Then, via the wireless channels, the weights or parameters of the locally trained models are sent to the central PS, which aggregates them and updates the global model. On one hand, FL plays an important role in optimizing the resources of wireless communication networks; on the other hand, wireless communications are crucial for FL. Thus, a 'bidirectional' relationship exists between FL and wireless communications. Although FL is an emerging concept, many publications have already appeared in the domain of FL and its applications to next-generation wireless networks. Nevertheless, we noticed that none of these works has highlighted the bidirectional relationship between FL and wireless communications. Therefore, the purpose of this survey paper is to bridge this gap in the literature by providing a timely and comprehensive discussion on the interdependency between FL and wireless communications.

Semi-/Weakly-/Un-/Supervised | Uncertainty | Active Learning (8 papers)

【1】 Fire Together Wire Together: A Dynamic Pruning Approach with Self-Supervised Mask Prediction Link: https://arxiv.org/abs/2110.08232

Authors: Sara Elkerdawy, Mostafa Elhoushi, Hong Zhang, Nilanjan Ray Affiliations: University of Alberta; Toronto Heterogeneous Compilers Lab, Huawei Abstract: Dynamic model pruning is a recent direction that allows for the inference of a different sub-network for each input sample during deployment. However, current dynamic methods rely on learning a continuous channel gating through regularization by inducing sparsity loss. This formulation introduces complexity in balancing different losses (e.g. task loss, regularization loss). In addition, regularization-based methods lack a transparent tradeoff hyperparameter selection to realize a computational budget. Our contribution is twofold: 1) decoupled task and pruning training; 2) simple hyperparameter selection that enables FLOPs reduction estimation before training. We propose to predict a mask to process k filters in a layer based on the activation of its previous layer. We pose the problem as a self-supervised binary classification problem. Each mask predictor module is trained to predict if the log-likelihood of each filter in the current layer belongs to the top-k activated filters. The value k is dynamically estimated for each input based on a novel criterion using the mass of heatmaps. We show experiments on several neural architectures, such as VGG, ResNet, and MobileNet, on CIFAR and ImageNet datasets. On CIFAR, we reach similar accuracy to SOTA methods with 15% and 24% higher FLOPs reduction. Similarly on ImageNet, we achieve a lower drop in accuracy with up to 13% improvement in FLOPs reduction.
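
The decision each mask predictor makes can be sketched in a few lines: score the current layer's filters from the previous layer's activations and execute only the top-k. A schematic numpy version (the linear scorer W and the fixed k are simplifications; in the paper, each predictor is a small trained module and k is estimated per input from heatmap mass):

```python
import numpy as np

def filter_mask(prev_act, W, k):
    """Predict which k filters of the current layer to execute, given the
    previous layer's pooled activations (one relevance logit per filter)."""
    scores = prev_act @ W
    mask = np.zeros_like(scores)
    mask[np.argsort(scores)[-k:]] = 1.0   # keep the k highest-scoring filters
    return mask
```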

【2】 Identifying Incorrect Classifications with Balanced Uncertainty Link: https://arxiv.org/abs/2110.08030

Authors: Bolian Li, Zige Zheng, Changqing Zhang Affiliations: Tianjin University; The Chinese University of Hong Kong, Shenzhen Abstract: Uncertainty estimation is critical for cost-sensitive deep-learning applications (i.e. disease diagnosis). It is very challenging, partly due to the inaccessibility of uncertainty ground truth in most datasets. Previous works proposed to estimate uncertainty from softmax calibration, Monte Carlo sampling, subjective logic, and so on. However, these existing methods tend to be over-confident about their predictions with unreasonably low overall uncertainty, which originates from the imbalance between positive (correct classifications) and negative (incorrect classifications) samples. For this issue, we first propose distributional imbalance to model the imbalance in uncertainty estimation as two kinds of distribution biases, and second propose the Balanced True Class Probability (BTCP) framework, which learns an uncertainty estimator with a novel Distributional Focal Loss (DFL) objective. Finally, we evaluate BTCP in terms of failure prediction and out-of-distribution (OOD) detection on multiple datasets. The experimental results show that BTCP outperforms other uncertainty estimation methods, especially in identifying incorrect classifications.

【3】 Wasserstein Unsupervised Reinforcement Learning Link: https://arxiv.org/abs/2110.07940

Authors: Shuncheng He, Yuhang Jiang, Hongchang Zhang, Jianzhun Shao, Xiangyang Ji Affiliation: Department of Automation, Tsinghua University, Beijing, China Abstract: Unsupervised reinforcement learning aims to train agents to learn a handful of policies or skills in environments without external reward. These pre-trained policies can accelerate learning when endowed with external reward, and can also be used as primitive options in hierarchical reinforcement learning. Conventional approaches to unsupervised skill discovery feed a latent variable to the agent and shed its empowerment on the agent's behavior by mutual information (MI) maximization. However, the policies learned by MI-based methods cannot sufficiently explore the state space, even though they can be successfully distinguished from each other. Therefore, we propose a new framework, Wasserstein unsupervised reinforcement learning (WURL), in which we directly maximize the distance between state distributions induced by different policies. Additionally, we overcome the difficulties of simultaneously training N (N > 2) policies and of amortizing the overall reward to each step. Experiments show that policies learned by our approach outperform MI-based methods on the metric of Wasserstein distance while keeping high discriminability. Furthermore, agents trained by WURL can sufficiently explore the state space in mazes and MuJoCo tasks, and the pre-trained policies can be applied to downstream tasks by hierarchical learning.
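
The objective is concrete: treat the states visited by each policy as an empirical distribution and push those distributions apart in Wasserstein distance, rather than merely making them identifiable. A one-dimensional toy illustration with scipy (the Gaussian state samples are fabricated for the demo):

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
states_pi1 = rng.normal(0.0, 1.0, size=500)   # states visited by policy 1
states_pi2 = rng.normal(3.0, 1.0, size=500)   # states visited by policy 2

# WURL's reward pushes this quantity up; MI-based methods only require the
# two sample sets to be distinguishable, which can hold even when they overlap.
print(wasserstein_distance(states_pi1, states_pi2))
```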

【4】 Improving Unsupervised Domain Adaptive Re-Identification via Source-Guided Selection of Pseudo-Labeling Hyperparameters Link: https://arxiv.org/abs/2110.07897

Authors: Fabian Dubourvieux, Angélique Loesch, Romaric Audigier, Samia Ainouz, Stéphane Canu Affiliations: Université Paris-Saclay, CEA, List, Palaiseau, France; Normandie Univ, INSA Rouen, LITIS, Av. de l'Université le Madrillet, Saint-Étienne-du-Rouvray, France Note: Submitted to IEEE Access for review Abstract: Unsupervised Domain Adaptation (UDA) for re-identification (re-ID) is a challenging task: to avoid a costly annotation of additional data, it aims at transferring knowledge from a domain with annotated data to a domain of interest with only unlabeled data. Pseudo-labeling approaches have proven to be effective for UDA re-ID. However, the effectiveness of these approaches heavily depends on the choice of some hyperparameters (HP) that affect the generation of pseudo-labels by clustering. The lack of annotation in the domain of interest makes this choice non-trivial. Current approaches simply reuse the same empirical value for all adaptation tasks, regardless of the target data representation that changes through pseudo-labeling training phases. As this simplistic choice may limit their performance, we aim at addressing this issue. We propose new theoretical grounds on HP selection for clustering UDA re-ID as well as a method of automatic and cyclic HP tuning for pseudo-labeling UDA clustering: HyPASS. HyPASS consists of incorporating two modules into pseudo-labeling methods: (i) HP selection based on a labeled source validation set and (ii) conditional domain alignment of feature discriminativeness to improve HP selection based on source samples. Experiments on commonly used person re-ID and vehicle re-ID datasets show that our proposed HyPASS consistently improves the best state-of-the-art methods in re-ID compared to the commonly used empirical HP setting.

【5】 FedSEAL: Semi-Supervised Federated Learning with Self-Ensemble Learning and Negative Learning Link: https://arxiv.org/abs/2110.07829

Authors: Jieming Bian, Zhu Fu, Jie Xu Affiliation: Department of Electrical and Computer Engineering, University of Miami Note: 11 pages, 5 figures Abstract: Federated learning (FL), a popular decentralized and privacy-preserving machine learning framework, has received extensive research attention in recent years. The majority of existing works focus on supervised learning (SL) problems where it is assumed that clients carry labeled datasets while the server has no data. However, in realistic scenarios, clients are often unable to label their data due to a lack of expertise and motivation, while the server may host a small amount of labeled data. How to reasonably utilize the server's labeled data and the clients' unlabeled data is thus of paramount practical importance. In this paper, we propose a new FL algorithm, called FedSEAL, to solve this Semi-Supervised Federated Learning (SSFL) problem. Our algorithm utilizes self-ensemble learning and complementary negative learning to enhance both the accuracy and the efficiency of clients' unsupervised learning on unlabeled data, and orchestrates the model training on both the server side and the clients' side. Our experimental results on the Fashion-MNIST and CIFAR10 datasets in the SSFL setting validate the effectiveness of our method, which outperforms the state-of-the-art SSFL methods by a large margin.

【6】 Continual Learning on Noisy Data Streams via Self-Purified Replay Link: https://arxiv.org/abs/2110.07735

Authors: Chris Dongjoo Kim, Jinseo Jeong, Sangwoo Moon, Gunhee Kim Affiliations: NALBI Inc.; Department of Computer Science and Engineering, Seoul National University, South Korea Note: Published at ICCV 2021 main conference Abstract: Continually learning in the real world must overcome many challenges, among which noisy labels are a common and inevitable issue. In this work, we present a replay-based continual learning framework that simultaneously addresses both catastrophic forgetting and noisy labels for the first time. Our solution is based on two observations: (i) forgetting can be mitigated even with noisy labels via self-supervised learning, and (ii) the purity of the replay buffer is crucial. Building on this, we propose two key components of our method: (i) a self-supervised replay technique named Self-Replay, which can circumvent erroneous training signals arising from noisily labeled data, and (ii) the Self-Centered filter, which maintains a purified replay buffer via centrality-based stochastic graph ensembles. The empirical results on MNIST, CIFAR-10, CIFAR-100, and WebVision with real-world noise demonstrate that our framework can maintain a highly pure replay buffer amidst noisy streamed data while greatly outperforming combinations of state-of-the-art continual learning and noisy-label learning methods. The source code is available at http://vision.snu.ac.kr/projects/SPR

【7】 A Semi-Supervised Approach for Abnormal Event Prediction on Large Operational Network Time-Series Data Link: https://arxiv.org/abs/2110.07660

Authors: Yijun Lin, Yao-Yi Chiang Affiliation: Department of Computer Science and Engineering, University of Minnesota, Twin Cities Note: 9 pages + 3 supplementary pages; submitted to SDM2022 Abstract: Large network logs, recording multivariate time series generated from heterogeneous devices and sensors in a network, can often reveal important information about abnormal activities, such as network intrusions and device malfunctions. Existing machine learning methods for anomaly detection on multivariate time series typically assume that 1) normal sequences have consistent behavior for training unsupervised models, or 2) a large set of labeled normal and abnormal sequences is available for supervised models. However, in practice, normal network activities can demonstrate significantly varying sequence patterns (e.g., before and after rerouting partial network traffic). Also, the recorded abnormal events can be sparse. This paper presents a novel semi-supervised method that efficiently captures dependencies between network time series and across time points to generate meaningful representations of network activities for predicting abnormal events. The method can use the limited labeled data to explicitly learn a separable embedding space for normal and abnormal samples and effectively leverage unlabeled data to handle the scarcity of training data. The experiments demonstrate that our approach significantly outperformed state-of-the-art approaches for event detection on a large real-world network log.

【8】 An active learning approach for improving the performance of equilibrium based chemical simulations Link: https://arxiv.org/abs/2110.08111

Authors: Mary Savino, Céline Lévy-Leduc, Marc Leconte, Benoit Cochepin Note: 22 pages, 17 figures Abstract: In this paper, we propose a novel sequential data-driven method for dealing with equilibrium-based chemical simulations, which can be seen as a specific machine learning approach called active learning. The underlying idea of our approach is to consider the function to estimate as a sample of a Gaussian process, which allows us to compute a global uncertainty on the function estimate. Thanks to this estimate, and with almost no parameters to tune, the proposed method sequentially chooses the most relevant input data at which the function to estimate has to be evaluated to build a surrogate model. Hence, the number of evaluations of the function to estimate is dramatically limited. Our active learning method is validated through numerical experiments and applied to a complex chemical system commonly used in geoscience.
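
The loop described above (fit a Gaussian process, query the input with the largest predictive uncertainty, evaluate, refit) is easy to reproduce with scikit-learn. Here is a minimal sketch in which expensive_chemistry stands in for the equilibrium solver; the paper's actual acquisition and stopping details may differ:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def expensive_chemistry(x):                  # stand-in for an equilibrium solver
    return np.sin(3 * x) + 0.5 * x

pool = np.linspace(0.0, 3.0, 200)[:, None]   # candidate input points
X, y = [[0.0], [3.0]], [expensive_chemistry(0.0), expensive_chemistry(3.0)]
gp = GaussianProcessRegressor()
for _ in range(8):                           # sequential design loop
    gp.fit(np.array(X), np.array(y))
    _, std = gp.predict(pool, return_std=True)
    x_next = float(pool[np.argmax(std), 0])  # most uncertain candidate
    X.append([x_next]); y.append(expensive_chemistry(x_next))
print("selected inputs:", np.round(np.array(X).ravel(), 2))
```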

Transfer | Zero/Few/One-Shot | Adaptation (1 paper)

【1】 Multitask Prompted Training Enables Zero-Shot Task Generalization Link: https://arxiv.org/abs/2110.08207

Authors: Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, Zaid Alyafeai, Antoine Chaffin, Arnaud Stiegler, Teven Le Scao, Arun Raja, Manan Dey, M Saiful Bari, Canwen Xu, Urmish Thakker, Shanya Sharma Sharma, Eliza Szczechla, Taewoon Kim, Gunjan Chhablani, Nihal Nayak, Debajyoti Datta, Jonathan Chang, Mike Tian-Jian Jiang, Han Wang, Matteo Manica, Sheng Shen, Zheng Xin Yong, Harshit Pandey, Rachel Bawden, Thomas Wang, Trishala Neeraj, Jos Rozen, Abheesht Sharma, Andrea Santilli, Thibault Fevry, Jason Alan Fries, Ryan Teehan, Stella Biderman, Leo Gao, Tali Bers, Thomas Wolf, Alexander M. Rush Affiliations: Hugging Face; Brown University; BigScience; KFUPM; IRISA; IMATAG; Hyperscience; I2R, A*STAR; SAP; NTU; UCSD; SambaNova Systems; Walmart Labs; VU Amsterdam; University of Virginia; ASUS; ZEALS; NYU; IBM Research; UC Berkeley; Parity Note: this https URL Abstract: Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks. It has been hypothesized that this is a consequence of implicit multitask learning in language model training. Can zero-shot generalization instead be directly induced by explicit multitask learning? To test this question at scale, we develop a system for easily mapping general natural language tasks into a human-readable prompted form. We convert a large set of supervised datasets, each with multiple prompts using varying natural language. These prompted datasets allow for benchmarking the ability of a model to perform completely unseen tasks specified in natural language. We fine-tune a pretrained encoder-decoder model on this multitask mixture covering a wide variety of tasks. The model attains strong zero-shot performance on several standard datasets, often outperforming models 16x its size. Further, our approach attains strong performance on a subset of tasks from the BIG-Bench benchmark, outperforming models 6x its size. All prompts and trained models are available at github.com/bigscience-workshop/promptsource/.

Reinforcement Learning (2 papers)

【1】 Containerized Distributed Value-Based Multi-Agent Reinforcement Learning Link: https://arxiv.org/abs/2110.08169

Authors: Siyang Wu,Tonghan Wang,Chenghao Li,Chongjie Zhang Affiliations: † Institute for Interdisciplinary Information Sciences, Tsinghua University, ‡ Department of Automation, Tsinghua University Abstract: Multi-agent reinforcement learning tasks put a high demand on the volume of training samples. Different from its single-agent counterpart, distributed value-based multi-agent reinforcement learning faces the unique challenges of demanding data transfer, inter-process communication management, and a high requirement for exploration. We propose a containerized learning framework to solve these problems. We pack several environment instances, a local learner and buffer, and a carefully designed multi-queue manager (which avoids blocking) into a container. Local policies of each container are encouraged to be as diverse as possible, and only trajectories with the highest priority are sent to a global learner. In this way, we achieve a scalable, time-efficient, and diverse distributed MARL learning framework with high system throughput. To our knowledge, our method is the first to solve the challenging Google Research Football full game $5\_v\_5$. On the StarCraft II micromanagement benchmark, our method gets $4$-$18\times$ better results compared to state-of-the-art non-distributed MARL algorithms.

【2】 On-Policy Model Errors in Reinforcement Learning Link: https://arxiv.org/abs/2110.07985

Authors: Lukas P. Fröhlich,Maksym Lefarov,Melanie N. Zeilinger,Felix Berkenkamp Affiliations: Bosch Center for Artificial Intelligence, Renningen, Germany; Institute for Dynamic Systems and Control, ETH Zürich, Zurich, Switzerland Abstract: Model-free reinforcement learning algorithms can compute policy gradients given sampled environment transitions, but require large amounts of data. In contrast, model-based methods can use the learned model to generate new data, but model errors and bias can render learning unstable or sub-optimal. In this paper, we present a novel method that combines real-world data and a learned model in order to get the best of both worlds. The core idea is to exploit the real-world data for on-policy predictions and use the learned model only to generalize to different actions. Specifically, we use the data as time-dependent on-policy correction terms on top of a learned model, to retain the ability to generate data without accumulating errors over long prediction horizons. We motivate this method theoretically and show that it counteracts an error term for model-based policy improvement. Experiments on MuJoCo and PyBullet benchmarks show that our method can drastically improve existing model-based approaches without introducing additional tuning parameters.
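The core mechanism, real transitions used as on-policy correction terms on top of a learned model, can be sketched in a few lines. This is a hedged reading of the abstract, with `f` standing in for the learned dynamics model; it is not the authors' reference implementation.

```python
# Sketch: the residual on each real on-policy transition corrects the learned
# model f, so rollouts that swap in alternative actions still reuse real data.
def corrected_rollout(f, real_states, real_actions, alt_actions):
    preds = []
    for t in range(len(real_actions)):
        s_t = real_states[t]
        correction = real_states[t + 1] - f(s_t, real_actions[t])
        preds.append(f(s_t, alt_actions[t]) + correction)
    return preds
```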

Meta-learning (1 paper)

【1】 Meta-learning via Language Model In-context Tuning Link: https://arxiv.org/abs/2110.07814

Authors: Yanda Chen,Ruiqi Zhong,Sheng Zha,George Karypis,He He Affiliations: Columbia University, University of California, Berkeley, AWS AI, New York University Abstract: The goal of meta-learning is to learn to adapt to a new task with only a few labeled examples. To tackle this problem in NLP, we propose $\textit{in-context tuning}$, which recasts adaptation and prediction as a simple sequence prediction problem: to form the input sequence, we concatenate the task instruction, the labeled examples, and the target input to predict; to meta-train the model to learn from in-context examples, we fine-tune a pre-trained language model (LM) to predict the target label from the input sequences on a collection of tasks. We benchmark our method on two collections of text classification tasks: LAMA and BinaryClfs. Compared to first-order MAML which adapts the model with gradient descent, our method better leverages the inductive bias of LMs to perform pattern matching, and outperforms MAML by an absolute $6\%$ AUC ROC score on BinaryClfs, with increasing advantage w.r.t. model size. Compared to non-fine-tuned in-context learning (i.e. prompting a raw LM), in-context tuning directly learns to learn from in-context examples. On BinaryClfs, in-context tuning improves the average AUC-ROC score by an absolute $10\%$, and reduces the variance with respect to example ordering by 6x and example choices by 2x.
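The sequence construction is the heart of the method: instruction, labeled examples, and target input are concatenated into one input string. A minimal sketch under that reading follows; the exact separators and field markers are assumptions, not the paper's format.

```python
# Build one in-context tuning input; the LM is fine-tuned to emit the label.
def build_sequence(instruction, support, query):
    parts = [instruction]
    for x, y in support:                      # labeled in-context examples
        parts.append(f"Input: {x} Label: {y}")
    parts.append(f"Input: {query} Label:")    # target input to classify
    return " ".join(parts)

seq = build_sequence(
    "Classify the sentiment of each input.",
    [("great movie", "positive"), ("waste of time", "negative")],
    "surprisingly good",
)
```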

Recommendation (1 paper)

【1】 Value Penalized Q-Learning for Recommender Systems Link: https://arxiv.org/abs/2110.07923

Authors: Chengqian Gao,Ke Xu,Peilin Zhao Affiliations: Shenzhen International Graduate School, Tsinghua University; Tencent AI Lab Note: An offline RL algorithm for recommender systems, 10 pages Abstract: Scaling reinforcement learning (RL) to recommender systems (RS) is promising since maximizing the expected cumulative rewards for RL agents meets the objective of RS, i.e., improving customers' long-term satisfaction. A key approach to this goal is offline RL, which aims to learn policies from logged data. However, the high-dimensional action space and the non-stationary dynamics in commercial RS intensify distributional shift issues, making it challenging to apply offline RL methods to RS. To alleviate the action distribution shift problem in extracting RL policy from static trajectories, we propose Value Penalized Q-learning (VPQ), an uncertainty-based offline RL algorithm. It penalizes the unstable Q-values in the regression target by uncertainty-aware weights, without the need to estimate the behavior policy, suitable for RS with a large number of items. We derive the penalty weights from the variances across an ensemble of Q-functions. To alleviate distributional shift issues at test time, we further introduce the critic framework to integrate the proposed method with classic RS models. Extensive experiments conducted on two real-world datasets show that the proposed method could serve as a gain plugin for existing RS models.
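The penalty itself is simple to state: the regression target is shifted down by the Q-ensemble's disagreement. Below is a hedged PyTorch-style sketch of such a target; the paper's exact weighting scheme may differ in detail.

```python
# Uncertainty-penalized target: ensemble mean minus lam * ensemble std.
import torch

def vpq_target(q_ensemble, reward, next_state, next_action,
               gamma=0.99, lam=1.0):
    with torch.no_grad():
        q_next = torch.stack([q(next_state, next_action)
                              for q in q_ensemble])
        return reward + gamma * (q_next.mean(0) - lam * q_next.std(0))
```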

Clustering (1 paper)

【1】 A Survey of Evolutionary Multi-Objective Clustering Approaches Link: https://arxiv.org/abs/2110.08100

Authors: Cristina Y. Morimoto,Aurora Pozo,Marcílio C. P. de Souto Affiliations: Federal University of Paraná, University of Orléans Note: Submitted to ACM Computing Surveys Abstract: This article presents how studies of evolutionary multi-objective clustering have evolved over the years, based on a mapping of the indexed articles in the ACM, IEEE, and Scopus. We present the most relevant approaches considering the high impact journals and conferences to provide an overview of this study field. We analyzed the algorithms based on the features and components presented in the proposed general architecture of evolutionary multi-objective clustering. These algorithms were grouped considering common clustering strategies and applications. Furthermore, we discuss issues regarding the difficulty of defining appropriate clustering criteria for evolutionary multi-objective clustering and the importance of evaluating the evolutionary process in order to have a clear view of the optimization efficiency. It is essential to observe these aspects besides specific clustering properties when designing new approaches or selecting/using the existing ones. Finally, we present other potential subjects of future research, in which this article can contribute to newcomers or busy researchers who want to have a wide vision of the field.

Autonomous Driving|Vehicles|Lane Detection, etc. (1 paper)

【1】 Anomaly Detection in Multi-Agent Trajectories for Automated Driving Link: https://arxiv.org/abs/2110.07922

Authors: Julian Wiederer,Arij Bouazizi,Marco Troina,Ulrich Kressel,Vasileios Belagiannis Affiliations: Mercedes-Benz AG, Ulm University Note: 15 pages incl. supplementary material, 8 figures, 4 tables (accepted by CoRL 2021) Abstract: Human drivers can recognise fast abnormal driving situations to avoid accidents. Similar to humans, automated vehicles are supposed to perform anomaly detection. In this work, we propose the spatio-temporal graph auto-encoder for learning normal driving behaviours. Our innovation is the ability to jointly learn multiple trajectories of a dynamic number of agents. To perform anomaly detection, we first estimate a density function of the learned trajectory feature representation and then detect anomalies in low-density regions. Due to the lack of multi-agent trajectory datasets for anomaly detection in automated driving, we introduce our dataset using a driving simulator for normal and abnormal manoeuvres. Our evaluations show that our approach learns the relation between different agents and delivers promising results compared to the related works. The code, simulation and the dataset are publicly available on the project page: https://github.com/againerju/maad_highway.

Federated Learning|Privacy Protection|Encryption (3 papers)

【1】 Evaluation of Hyperparameter-Optimization Approaches in an Industrial Federated Learning System Link: https://arxiv.org/abs/2110.08202

Authors: Stephanie Holly,Thomas Hiessl,Safoura Rezapour Lakani,Daniel Schall,Clemens Heitzinger,Jana Kemnitz Affiliations: Siemens Technology, TU Wien Note: This paper is accepted at the IDSC this https URL and will be published by Springer. The version uploaded is before the peer review process. The link to the final version will be updated as soon as the paper is published Abstract: Federated Learning (FL) decouples model training from the need for direct access to the data and allows organizations to collaborate with industry partners to reach a satisfying level of performance without sharing vulnerable business information. The performance of a machine learning algorithm is highly sensitive to the choice of its hyperparameters. In an FL setting, hyperparameter optimization poses new challenges. In this work, we investigated the impact of different hyperparameter optimization approaches in an FL system. In an effort to reduce communication costs, a critical bottleneck in FL, we investigated a local hyperparameter optimization approach that -- in contrast to a global hyperparameter optimization approach -- allows every client to have its own hyperparameter configuration. We implemented these approaches based on grid search and Bayesian optimization and evaluated the algorithms on the MNIST data set using an i.i.d. partition and on an Internet of Things (IoT) sensor based industrial data set using a non-i.i.d. partition.

【2】 FedMe: Federated Learning via Model Exchange Link: https://arxiv.org/abs/2110.07868

Authors: Koji Matsuda,Yuya Sasaki,Chuan Xiao,Makoto Onizuka Affiliations: Osaka University Abstract: Federated learning is a distributed machine learning method in which a single server and multiple clients collaboratively build machine learning models without sharing datasets on clients. Numerous methods have been proposed to cope with the data heterogeneity issue in federated learning. Existing solutions require a model architecture tuned by the central server, yet a major technical challenge is that it is difficult to tune the model architecture due to the absence of local data on the central server. In this paper, we propose Federated learning via Model exchange (FedMe), which personalizes models with automatic model architecture tuning during the learning process. The novelty of FedMe lies in its learning process: clients exchange their models for model architecture tuning and model training. First, to optimize the model architectures for local data, clients tune their own personalized models by comparing to exchanged models and picking the one that yields the best performance. Second, clients train both personalized models and exchanged models by using deep mutual learning, in spite of different model architectures across the clients. We perform experiments on three real datasets and show that FedMe outperforms state-of-the-art federated learning methods while tuning model architectures automatically.

【3】 Distribution-Free Federated Learning with Conformal Predictions Link: https://arxiv.org/abs/2110.07661

Authors: Charles Lu,Jayasheree Kalpathy-Cramer Affiliations: Massachusetts General Hospital, Martinos Center for Biomedical Imaging, Charlestown, MA Abstract: Federated learning has attracted considerable interest for collaborative machine learning in healthcare to leverage separate institutional datasets while maintaining patient privacy. However, additional challenges such as poor calibration and lack of interpretability may also hamper widespread deployment of federated models into clinical practice and lead to user distrust or misuse of ML tools in high-stakes clinical decision-making. In this paper, we propose to address these challenges by incorporating an adaptive conformal framework into federated learning to ensure distribution-free prediction sets that provide coverage guarantees and uncertainty estimates without requiring any additional modifications to the model or assumptions. Empirical results on the MedMNIST medical imaging benchmark demonstrate that our federated method provides tighter coverage with lower average cardinality than local conformal predictions on 6 different medical imaging benchmark datasets in 2D and 3D multi-class classification tasks. Further, we correlate class entropy and prediction set size to assess task uncertainty with conformal methods.
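For readers unfamiliar with the conformal machinery being federated here, the split-conformal step on classifier scores looks roughly as follows. This sketch shows the generic calibration and set construction under the usual exchangeability assumption; it does not reproduce the paper's federated aggregation.

```python
# Split conformal prediction sets from softmax scores: calibrate a threshold
# on nonconformity scores, then keep every label whose score is below it.
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]     # nonconformity
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level)
    return [np.where(1.0 - p <= q)[0] for p in test_probs]  # ~1-alpha coverage
```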

Inference|Analysis|Understanding|Explanation (6 papers)

【1】 A Modern Analysis of Aging Machine Learning Based IoT Cybersecurity Methods Link: https://arxiv.org/abs/2110.07832

Authors: Sam Strecker,Rushit Dave,Nyle Siddiqui,Naeem Seliya Affiliations: Department of Computer Science, University of Wisconsin – Eau Claire, Eau Claire, US Abstract: Modern scientific advancements often contribute to the introduction and refinement of never-before-seen technologies. This can be quite the task for humans to maintain and monitor and as a result, our society has become reliant on machine learning to assist in this task. With new technology comes new methods and thus new ways to circumvent existing cyber security measures. This study examines the effectiveness of three distinct Internet of Things cyber security algorithms currently used in industry today for malware and intrusion detection: Random Forest (RF), Support-Vector Machine (SVM), and K-Nearest Neighbor (KNN). Each algorithm was trained and tested on the Aposemat IoT-23 dataset which was published in January 2020 with the earliest of captures from 2018 and latest from 2019. The RF, SVM, and KNN reached peak accuracies of 92.96%, 86.23%, and 91.48%, respectively, in intrusion detection and 92.27%, 83.52%, and 89.80% in malware detection. It was found that all three algorithms are capable of being effectively utilized for the current landscape of IoT cyber security in 2021.

【2】 NeuroView: Explainable Deep Network Decision Making Link: https://arxiv.org/abs/2110.07778

Authors: CJ Barberan,Randall Balestriero,Richard G. Baraniuk Affiliations: Department of Electrical and Computer Engineering, Rice University, Houston, TX Note: 12 pages, 7 figures Abstract: Deep neural networks (DNs) provide superhuman performance in numerous computer vision tasks, yet it remains unclear exactly which of a DN's units contribute to a particular decision. NeuroView is a new family of DN architectures that are interpretable/explainable by design. Each member of the family is derived from a standard DN architecture by vector quantizing the unit output values and feeding them into a global linear classifier. The resulting architecture establishes a direct, causal link between the state of each unit and the classification decision. We validate NeuroView on standard datasets and classification tasks to show how its unit/class mapping aids in understanding the decision-making process.

【3】 Towards Understanding the Data Dependency of Mixup-style Training Link: https://arxiv.org/abs/2110.07647

Authors: Muthu Chidambaram,Xiang Wang,Yuzheng Hu,Chenwei Wu,Rong Ge Affiliations: Department of Computer Science, Duke University; School of Mathematical Sciences, Peking University Note: 25 pages, 13 figures Abstract: In the Mixup training paradigm, a model is trained using convex combinations of data points and their associated labels. Despite seeing very few true data points during training, models trained using Mixup seem to still minimize the original empirical risk and exhibit better generalization and robustness on various tasks when compared to standard training. In this paper, we investigate how these benefits of Mixup training rely on properties of the data in the context of classification. For minimizing the original empirical risk, we compute a closed form for the Mixup-optimal classification, which allows us to construct a simple dataset on which minimizing the Mixup loss can provably lead to learning a classifier that does not minimize the empirical loss on the data. On the other hand, we also give sufficient conditions for Mixup training to also minimize the original empirical risk. For generalization, we characterize the margin of a Mixup classifier, and use this to understand why the decision boundary of a Mixup classifier can adapt better to the full structure of the training data when compared to standard training. In contrast, we also show that, for a large class of linear models and linearly separable datasets, Mixup training leads to learning the same classifier as standard training.
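For reference, the Mixup training step analyzed in this paper is the standard one: convex combinations of inputs and one-hot labels with a Beta-distributed coefficient, as in the sketch below.

```python
# Standard mixup batch construction; training then uses cross-entropy
# against the soft mixed labels.
import torch

def mixup_batch(x, y_onehot, alpha=1.0):
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mix, y_mix
```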

【4】 Compressive Independent Component Analysis: Theory and Algorithms Link: https://arxiv.org/abs/2110.08045

Authors: Michael P. Sheehan,Mike E. Davies Affiliations: Institute of Digital Communications, University of Edinburgh, Edinburgh, UK Note: 27 pages, 8 figures, under review Abstract: Compressive learning forms the exciting intersection between compressed sensing and statistical learning where one exploits forms of sparsity and structure to reduce the memory and/or computational complexity of the learning task. In this paper, we look at the independent component analysis (ICA) model through the compressive learning lens. In particular, we show that solutions to the cumulant based ICA model have particular structure that induces a low dimensional model set that resides in the cumulant tensor space. By showing that a restricted isometry property holds for random cumulants, e.g., Gaussian ensembles, we prove the existence of a compressive ICA scheme. Thereafter, we propose two algorithms for compressive ICA, an iterative projection gradient (IPG) algorithm and an alternating steepest descent (ASD) algorithm, where the order of compression asserted from the restricted isometry property is realised through empirical results. We provide analysis of the CICA algorithms including the effects of finite samples. The effects of compression are characterised by a trade-off between the sketch size and the statistical efficiency of the ICA estimates. By considering synthetic and real datasets, we show the substantial memory gains achieved over well-known ICA algorithms by using one of the proposed CICA algorithms. Finally, we conclude the paper with open problems including interesting challenges from the emerging field of compressive learning.

【5】 Sparse Implicit Processes for Approximate Inference Link: https://arxiv.org/abs/2110.07618

Authors: Simón Rodríguez Santana,Bryan Zaldivar,Daniel Hernández-Lobato Affiliations: Institute of Mathematical Sciences (ICMAT-CSIC) Note: 10 pages for the main text (with 3 figures and 1 table), and 9 pages of supplementary material (with 6 figures and 3 tables) Abstract: Implicit Processes (IPs) are flexible priors that can describe models such as Bayesian neural networks, neural samplers and data generators. IPs allow for approximate inference in function-space. This avoids some degenerate problems of parameter-space approximate inference due to the high number of parameters and strong dependencies. For this, an extra IP is often used to approximate the posterior of the prior IP. However, simultaneously adjusting the parameters of the prior IP and the approximate posterior IP is a challenging task. Existing methods that can tune the prior IP result in a Gaussian predictive distribution, which fails to capture important data patterns. By contrast, methods producing flexible predictive distributions by using another IP to approximate the posterior process cannot fit the prior IP to the observed data. We propose here a method that can carry out both tasks. For this, we rely on an inducing-point representation of the prior IP, as often done in the context of sparse Gaussian processes. The result is a scalable method for approximate inference with IPs that can tune the prior IP parameters to the data, and that provides accurate non-Gaussian predictive distributions.

【6】 High-dimensional Inference for Dynamic Treatment Effects Link: https://arxiv.org/abs/2110.04924

Authors: Jelena Bradic,Weijie Ji,Yuqian Zhang Affiliations: Department of Mathematics and Halicioglu Data Science Institute, University of California San Diego; Department of Mathematics, University of California San Diego Abstract: This paper proposes a confidence interval construction for heterogeneous treatment effects in the context of multi-stage experiments with $N$ samples and high-dimensional, $d$, confounders. Our focus is on the case of $d\gg N$, but the results obtained also apply to low-dimensional cases. We showcase that the bias of regularized estimation, unavoidable in high-dimensional covariate spaces, is mitigated with a simple double-robust score. In this way, no additional bias removal is necessary, and we obtain root-$N$ inference results while allowing multi-stage interdependency of the treatments and covariates. The memoryless property is also not assumed; treatment can possibly depend on all previous treatment assignments and all previous multi-stage confounders. Our results rely on certain sparsity assumptions of the underlying dependencies. We discover new product rate conditions necessary for robust inference with dynamic treatments.

Detection (2 papers)

【1】 Detecting Modularity in Deep Neural Networks Link: https://arxiv.org/abs/2110.08058

Authors: Shlomi Hod,Stephen Casper,Daniel Filan,Cody Wild,Andrew Critch,Stuart Russell Affiliations: Center for Human-Compatible AI (CHAI), University of California Berkeley; Boston University; MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) (∗ equal contribution) Note: Code is available at this https URL Abstract: A neural network is modular to the extent that parts of its computational graph (i.e. structure) can be represented as performing some comprehensible subtask relevant to the overall task (i.e. functionality). Are modern deep neural networks modular? How can this be quantified? In this paper, we consider the problem of assessing the modularity exhibited by a partitioning of a network's neurons. We propose two proxies for this: importance, which reflects how crucial sets of neurons are to network performance; and coherence, which reflects how consistently their neurons associate with features of the inputs. To measure these proxies, we develop a set of statistical methods based on techniques conventionally used to interpret individual neurons. We apply the proxies to partitionings generated by spectrally clustering a graph representation of the network's neurons with edges determined either by network weights or correlations of activations. We show that these partitionings, even ones based only on weights (i.e. strictly from non-runtime analysis), reveal groups of neurons that are important and coherent. These results suggest that graph-based partitioning can reveal modularity and help us understand how deep neural networks function.
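The weight-based variant of the partitioning step can be sketched directly: build an affinity between neurons from the absolute weights and spectrally cluster it. The affinity construction below is one plausible choice consistent with the abstract, not necessarily the exact graph the authors use.

```python
# Partition the output neurons of a layer from its weight matrix W
# (shape: n_out x n_in) via spectral clustering on a |W|-based affinity.
import numpy as np
from sklearn.cluster import SpectralClustering

def partition_neurons(W, n_clusters=4):
    A = np.abs(W) @ np.abs(W).T          # similarity of incoming weights
    np.fill_diagonal(A, 0.0)
    sc = SpectralClustering(n_clusters=n_clusters,
                            affinity="precomputed", random_state=0)
    return sc.fit_predict(A)             # cluster label per neuron
```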

【2】 A Survey of Machine Learning Algorithms for Detecting Ransomware Encryption Activity Link: https://arxiv.org/abs/2110.07636

Authors: Erik Larsen,David Noever,Korey MacVittie Affiliations: PeopleTec, Inc., Corporate Dr., Huntsville, AL Note: 9 pages, 8 figures, 3 tables Abstract: A survey of machine learning techniques trained to detect ransomware is presented. This work builds upon the efforts of Taylor et al. in using sensor-based methods that utilize data collected from built-in instruments like CPU power and temperature monitors to identify encryption activity. Exploratory data analysis (EDA) shows the features most useful from this simulated data are clock speed, temperature, and CPU load. These features are used in training multiple algorithms to determine an optimal detection approach. Performance is evaluated with accuracy, F1 score, and false-negative rate metrics. The Multilayer Perceptron with three hidden layers achieves scores of 97% in accuracy and F1 with robust data preparation. A random forest model produces scores of 93% accuracy and 92% F1, showing that sensor-based detection is currently a viable option to detect even zero-day ransomware attacks before the code fully executes.

Classification|Recognition (2 papers)

【1】 Exposing Query Identification for Search Transparency Link: https://arxiv.org/abs/2110.07701

Authors: Ruohan Li,Jianxiang Li,Bhaskar Mitra,Fernando Diaz,Asia J. Biega Affiliations: Carnegie Mellon University; Microsoft, United States; Microsoft, University College London, Canada; Microsoft Research; Max Planck Institute for Security and Privacy, Germany Abstract: Search systems control the exposure of ranked content to searchers. In many cases, creators value not only the exposure of their content but, moreover, an understanding of the specific searches where the content is surfaced. The problem of identifying which queries expose a given piece of content in the ranking results is an important and relatively under-explored search transparency challenge. Exposing queries are useful for quantifying various issues of search bias, privacy, data protection, security, and search engine optimization. Exact identification of exposing queries in a given system is computationally expensive, especially in dynamic contexts such as web search. In quest of a more lightweight solution, we explore the feasibility of approximate exposing query identification (EQI) as a retrieval task by reversing the role of queries and documents in two classes of search systems: dense dual-encoder models and traditional BM25 models. We then propose how this approach can be improved through metric learning over the retrieval embedding space. We further derive an evaluation metric to measure the quality of a ranking of exposing queries, and conduct an empirical analysis focusing on various practical aspects of approximate EQI.

【2】 Causal Identification with Additive Noise Models: Quantifying the Effect of Noise Link: https://arxiv.org/abs/2110.08087

Authors: Benjamin Kap,Marharyta Aleksandrova,Thomas Engel Affiliations: University of Luxembourg, avenue de l'Université, L-, Esch-sur-Alzette, Luxembourg Note: Presented at 10èmes Journées Francophones sur les Réseaux Bayésiens et les Modèles Graphiques Probabilistes (JFRB-2021), this https URL Abstract: In recent years, a lot of research has been conducted within the area of causal inference and causal learning. Many methods have been developed to identify the cause-effect pairs in models and have been successfully applied to observational real-world data to determine the direction of causal relationships. Yet in bivariate situations, causal discovery problems remain challenging. One class of such methods, that also allows tackling the bivariate case, is based on Additive Noise Models (ANMs). Unfortunately, one aspect of these methods has not received much attention until now: what is the impact of different noise levels on the ability of these methods to identify the direction of the causal relationship. This work aims to bridge this gap with the help of an empirical study. We test Regression with Subsequent Independence Test (RESIT) using an exhaustive range of models where the level of additive noise gradually changes from 1\% to 10000\% of the causes' noise level (the latter remains fixed). Additionally, the experiments in this work consider several different types of distributions as well as linear and non-linear models. The results of the experiments show that ANM methods can fail to capture the true causal direction for some levels of noise.
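A bivariate RESIT-style decision rule is short enough to sketch: regress each variable on the other and compare how independent the residuals are. In the sketch below, a crude Pearson-correlation proxy stands in for a proper independence test such as HSIC, so it is illustrative only.

```python
# ANM direction test: the causal direction should yield residuals that are
# (approximately) independent of the regressor.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def dependence(a, b):
    return abs(np.corrcoef(a, b)[0, 1])      # stand-in for an HSIC test

def anm_direction(x, y):
    rx = y - RandomForestRegressor().fit(x[:, None], y).predict(x[:, None])
    ry = x - RandomForestRegressor().fit(y[:, None], x).predict(y[:, None])
    return "x->y" if dependence(x, rx) < dependence(y, ry) else "y->x"
```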

Representation (2 papers)

【1】 Shared Visual Representations of Drawing for Communication: How do different biases affect human interpretability and intent? Link: https://arxiv.org/abs/2110.08203

Authors: Daniela Mihai,Jonathon Hare Affiliations: Electronics and Computer Science, The University of Southampton, Southampton, UK Abstract: We present an investigation into how representational losses can affect the drawings produced by artificial agents playing a communication game. Building upon recent advances, we show that a combination of powerful pretrained encoder networks, with appropriate inductive biases, can lead to agents that draw recognisable sketches, whilst still communicating well. Further, we start to develop an approach to help automatically analyse the semantic content being conveyed by a sketch and demonstrate that current approaches to inducing perceptual biases lead to a notion of objectness being a key feature despite the agent training being self-supervised.

【2】 Gait-based Frailty Assessment using Image Representation of IMU Signals and Deep CNN Link: https://arxiv.org/abs/2110.07821

Authors: Muhammad Zeeshan Arshad,Dawoon Jung,Mina Park,Hyungeun Shin,Jinwook Kim,Kyung-Ryoul Mun Note: Accepted in 43rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2021) Abstract: Frailty is a common and critical condition in elderly adults, which may lead to further deterioration of health. However, difficulties and complexities exist in traditional frailty assessments based on activity-related questionnaires. These can be overcome by monitoring the effects of frailty on the gait. In this paper, it is shown that by encoding gait signals as images, deep learning-based models can be utilized for the classification of gait type. Two deep learning models were proposed: (a) SS-CNN, based on single-stride input images, and (b) MS-CNN, based on 3 consecutive strides. It was shown that MS-CNN performs best with an accuracy of 85.1\%, while SS-CNN achieved an accuracy of 77.3\%. This is because MS-CNN can observe more features corresponding to stride-to-stride variations, which is one of the key symptoms of frailty. Gait signals were encoded as images using STFT, CWT, and GAF. While the MS-CNN model using GAF images achieved the best overall accuracy and precision, CWT has a slightly better recall. This study demonstrates how image encoded gait data can be used to exploit the full potential of deep learning CNN models for the assessment of frailty.
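Of the three encodings compared (STFT, CWT, GAF), the Gramian Angular Field is the least standard, so a short sketch may help; this computes the summation-field (GASF) variant for a 1-D stride signal.

```python
# Gramian Angular Summation Field: rescale to [-1, 1], map to polar angles,
# and form the pairwise cos(phi_i + phi_j) image.
import numpy as np

def gasf(signal):
    s = np.asarray(signal, dtype=float)
    s = 2 * (s - s.min()) / (s.max() - s.min()) - 1
    phi = np.arccos(np.clip(s, -1.0, 1.0))
    return np.cos(phi[:, None] + phi[None, :])
```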

Optimization|Convergence (4 papers)

【1】 Gradient Descent on Infinitely Wide Neural Networks: Global Convergence and Generalization Link: https://arxiv.org/abs/2110.08084

Authors: Francis Bach,Lenaïc Chizat Affiliations: Inria & École Normale Supérieure, PSL Research University; École Polytechnique Fédérale de Lausanne Abstract: Many supervised machine learning methods are naturally cast as optimization problems. For prediction models which are linear in their parameters, this often leads to convex problems for which many mathematical guarantees exist. Models which are non-linear in their parameters such as neural networks lead to non-convex optimization problems for which guarantees are harder to obtain. In this review paper, we consider two-layer neural networks with homogeneous activation functions where the number of hidden neurons tends to infinity, and show how qualitative convergence guarantees may be derived.

【2】 Almost Optimal Batch-Regret Tradeoff for Batch Linear Contextual Bandits Link: https://arxiv.org/abs/2110.08057

Authors: Zihan Zhang,Xiangyang Ji,Yuan Zhou Affiliations: Tsinghua University Abstract: We study the optimal batch-regret tradeoff for batch linear contextual bandits. For any batch number $M$, number of actions $K$, time horizon $T$, and dimension $d$, we provide an algorithm and prove its regret guarantee, which, due to technical reasons, features a two-phase expression as the time horizon $T$ grows. We also prove a lower bound theorem that surprisingly shows the optimality of our two-phase regret upper bound (up to logarithmic factors) in the \emph{full range} of the problem parameters, therefore establishing the exact batch-regret tradeoff. Compared to the recent work \citep{ruan2020linear} which showed that $M = O(\log \log T)$ batches suffice to achieve the asymptotically minimax-optimal regret without the batch constraints, our algorithm is simpler and easier for practical implementation. Furthermore, our algorithm achieves the optimal regret for all $T \geq d$, while \citep{ruan2020linear} requires that $T$ greater than an unrealistically large polynomial of $d$. Along our analysis, we also prove a new matrix concentration inequality with dependence on their dynamic upper bounds, which, to the best of our knowledge, is the first of its kind in literature and may be of independent interest.

【3】 Improving Hyperparameter Optimization by Planning Ahead Link: https://arxiv.org/abs/2110.08028

Authors: Hadi S. Jomaa,Jonas Falkner,Lars Schmidt-Thieme Affiliations: Department of Computer Science, University of Hildesheim, Hildesheim, Germany Abstract: Hyperparameter optimization (HPO) is generally treated as a bi-level optimization problem that involves fitting a (probabilistic) surrogate model to a set of observed hyperparameter responses, e.g. validation loss, and consequently maximizing an acquisition function using a surrogate model to identify good hyperparameter candidates for evaluation. The choice of a surrogate and/or acquisition function can be further improved via knowledge transfer across related tasks. In this paper, we propose a novel transfer learning approach, defined within the context of model-based reinforcement learning, where we represent the surrogate as an ensemble of probabilistic models that allows trajectory sampling. We further propose a new variant of model predictive control which employs a simple look-ahead strategy as a policy that optimizes a sequence of actions, representing hyperparameter candidates to expedite HPO. Our experiments on three meta-datasets, comparing to state-of-the-art HPO algorithms including a model-free reinforcement learning approach, show that the proposed method can outperform all baselines by exploiting a simple planning-based policy.

【4】 Gaussian Process Bandit Optimization with Few Batches Link: https://arxiv.org/abs/2110.07788

Authors: Zihan Li,Jonathan Scarlett Affiliations: Department of Computer Science, School of Computing, National University of Singapore (NUS); Jonathan Scarlett is also with the Department of Mathematics and the Institute of Data Science Abstract: In this paper, we consider the problem of black-box optimization using Gaussian Process (GP) bandit optimization with a small number of batches. Assuming the unknown function has a low norm in the Reproducing Kernel Hilbert Space (RKHS), we introduce a batch algorithm inspired by batched finite-arm bandit algorithms, and show that it achieves the cumulative regret upper bound $O^\ast(\sqrt{T\gamma_T})$ using $O(\log\log T)$ batches within time horizon $T$, where the $O^\ast(\cdot)$ notation hides dimension-independent logarithmic factors and $\gamma_T$ is the maximum information gain associated with the kernel. This bound is near-optimal for several kernels of interest and improves on the typical $O^\ast(\sqrt{T}\gamma_T)$ bound, and our approach is arguably the simplest among algorithms attaining this improvement. In addition, in the case of a constant number of batches (not depending on $T$), we propose a modified version of our algorithm, and characterize how the regret is impacted by the number of batches, focusing on the squared exponential and Matérn kernels. The algorithmic upper bounds are shown to be nearly minimax optimal via analogous algorithm-independent lower bounds.

Prediction|Estimation (4 papers)

【1】 An Artificial Neural Network-Based Model Predictive Control for Three-phase Flying Capacitor Multi-Level Inverter Link: https://arxiv.org/abs/2110.08101

Authors: Parisa Boodaghi Malidarreh,Abualkasim Bakeer,Ihab S. Mohamed,Lantao Liu Affiliations: ∗Dep. of Power Electronics and Elect. Machines, Iran University of Science and Technology (IUST), Iran; †Dep. of Elect. Engineering, Aswan University, Aswan, Egypt Note: 10 pages, 16 figures, 5 tables Abstract: Model predictive control (MPC) has been used widely in power electronics due to its simple concept, fast dynamic response, and good reference tracking. However, it suffers from parametric uncertainties, since it directly relies on the mathematical model of the system to predict the optimal switching states to be used at the next sampling time. As a result, uncertain parameters lead to an ill-designed MPC. Thus, this paper offers a model-free control strategy on the basis of artificial neural networks (ANNs), for mitigating the effects of parameter mismatching while having a little negative impact on the inverter's performance. This method includes two related stages. First, MPC is used as an expert to control the studied converter in order to provide the training data; while, in the second stage, the obtained dataset is utilized to train the proposed ANN which will be used directly to control the inverter without the requirement for the mathematical model of the system. The case study herein is based on a four-level three-cell flying capacitor inverter. In this study, MATLAB/Simulink is used to simulate the performance of the proposed control strategy, taking into account various operating conditions. Afterward, the simulation results are reported in comparison with the conventional MPC scheme, demonstrating the superior performance of the proposed control strategy in terms of getting low total harmonic distortion (THD) and the robustness against parameters mismatch, especially when changes occur in the system parameters.

【2】 Leveraging Spatial and Temporal Correlations in Sparsified Mean Estimation Link: https://arxiv.org/abs/2110.07751

Authors: Divyansh Jhunjhunwala,Ankur Mallick,Advait Gadhikar,Swanand Kadhe,Gauri Joshi Affiliations: Carnegie Mellon University, Pittsburgh, PA; University of California Berkeley, Berkeley, CA Note: Accepted to NeurIPS 2021 Abstract: We study the problem of estimating at a central server the mean of a set of vectors distributed across several nodes (one vector per node). When the vectors are high-dimensional, the communication cost of sending entire vectors may be prohibitive, and it may be imperative for them to use sparsification techniques. While most existing work on sparsified mean estimation is agnostic to the characteristics of the data vectors, in many practical applications such as federated learning, there may be spatial correlations (similarities in the vectors sent by different nodes) or temporal correlations (similarities in the data sent by a single node over different iterations of the algorithm) in the data vectors. We leverage these correlations by simply modifying the decoding method used by the server to estimate the mean. We provide an analysis of the resulting estimation error as well as experiments for PCA, K-Means and Logistic Regression, which show that our estimators consistently outperform more sophisticated and expensive sparsification methods.
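One way to read "simply modifying the decoding method" in the temporal-correlation case is that coordinates a node did not send this round are imputed from that node's previous round before averaging. The sketch below implements that reading; the paper's actual estimators are more refined, so treat this as an assumption-laden illustration.

```python
# Correlation-aware decode: fill unsent coordinates from each node's
# previous update, then average across nodes.
import numpy as np

def decode_mean(sparse_updates, masks, prev_updates):
    filled = [np.where(m, u, p)
              for u, m, p in zip(sparse_updates, masks, prev_updates)]
    return np.mean(filled, axis=0)
```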

【3】 Predicting Solar Flares with Remote Sensing and Machine Learning Link: https://arxiv.org/abs/2110.07658

Authors: Erik Larsen Affiliations: PeopleTec, Inc., Corporate Dr., Huntsville, AL Note: 16 pages, 10 figures, 3 tables Abstract: High energy solar flares and coronal mass ejections have the potential to destroy Earth's ground and satellite infrastructures, causing trillions of dollars in damage and mass human suffering. Destruction of these critical systems would disable power grids and satellites, crippling communications and transportation. This would lead to food shortages and an inability to respond to emergencies. A solution to this impending problem is proposed herein using satellites in solar orbit that continuously monitor the Sun, use artificial intelligence and machine learning to calculate the probability of massive solar explosions from this sensed data, and then signal defense mechanisms that will mitigate the threat. With modern technology, safeguards may only be implemented with enough warning, which is why the best algorithm must be identified and continuously trained on existing and new data to maximize true positive rates while minimizing false negatives. This paper conducts a survey of current machine learning models using open source solar flare prediction data. The rise of edge computing allows machine learning hardware to be placed on the same satellites as the sensor arrays, saving critical time by not having to transmit remote sensing data across the vast distances of space. A system of systems approach will allow enough warning for safety measures to be put into place, mitigating the risk of disaster.

【4】 BayesAoA: A Bayesian method for Computation Efficient Angle of Arrival Estimation Link: https://arxiv.org/abs/2110.07992

Authors: Akshay Sharma,Nancy Nayak,Sheetal Kalyani Affiliations: Equal contribution has been made by the authors. The authors are with the Department of Electrical Engineering, Indian Institute of Technology Madras Abstract: Angle of Arrival (AoA) estimation is of great interest in modern communication systems. Traditional maximum likelihood-based iterative algorithms are sensitive to initialization and cannot be used online. We propose a Bayesian method to find the AoA that is insensitive to initialization. The proposed method is less complex and needs fewer computing resources than traditional deep learning-based methods. It converges faster than brute-force methods. Further, a Hedge-type solution is proposed that helps to deploy the method online to handle situations where the channel noise and antenna configuration in the receiver change over time. The proposed method achieves $92\%$ accuracy in a channel of noise variance $10^{-6}$ with $19.3\%$ of the brute-force method's computation.

Other Neural Networks|Deep Learning|Models|Modeling (19 papers)

【1】 Learn Proportional Derivative Controllable Latent Space from Pixels Link: https://arxiv.org/abs/2110.08239

Authors: Weiyao Wang,Marin Kobilarov,Gregory D. Hager Abstract: Recent advances in latent space dynamics models from pixels show promising progress in vision-based model predictive control (MPC). However, executing MPC in real time can be challenging due to its intensive computational cost at each timestep. We propose to introduce additional learning objectives to enforce that the learned latent space is proportional derivative controllable. At execution time, a simple PD controller can be applied directly to the latent space encoded from pixels, producing simple and effective control for systems with visual observations. We show that, in various environments, our method outperforms baseline methods, producing robust goal reaching and trajectory tracking.
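At execution time the controller amounts to a PD law applied to encoded observations. A minimal sketch under that reading, with `encode` standing for the trained image encoder and the output interpreted as the control signal, is:

```python
# PD control directly in the learned latent space.
def pd_action(encode, obs, prev_obs, goal_obs, kp=1.0, kd=0.1):
    z, z_prev, z_goal = encode(obs), encode(prev_obs), encode(goal_obs)
    error = z_goal - z
    d_error = -(z - z_prev)        # finite-difference derivative term
    return kp * error + kd * d_error
```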

【2】 NNK-Means: Dictionary Learning using Non-Negative Kernel regression Link: https://arxiv.org/abs/2110.08212

Authors: Sarath Shekkizhar,Antonio Ortega Affiliations: University of Southern California, Los Angeles Note: Under review at ICASSP Abstract: An increasing number of systems are being designed by first gathering significant amounts of data, and then optimizing the system parameters directly using the obtained data. Often this is done without analyzing the dataset structure. As task complexity, data size, and parameters all increase to millions or even billions, data summarization is becoming a major challenge. In this work, we investigate data summarization via dictionary learning, leveraging the properties of recently introduced non-negative kernel regression (NNK) graphs. Our proposed NNK-Means, unlike competing techniques such as kSVD, learns geometric dictionaries with atoms that lie in the input data space. Experiments show that summaries using NNK-Means can provide better discrimination compared to linear and kernel versions of kMeans and kSVD. Moreover, NNK-Means has a scalable implementation, with runtime complexity similar to that of kMeans.

【3】 Interpretable Neural Networks with Frank-Wolfe: Sparse Relevance Maps and Relevance Orderings Link: https://arxiv.org/abs/2110.08105

Authors: Jan Macdonald,Mathieu Besançon,Sebastian Pokutta Affiliations: Institut für Mathematik, Technische Universität Berlin; Department for AI in Society, Science, and Technology, Zuse Institut Berlin Note: 16 pages, 19 figures, 1 table Abstract: We study the effects of constrained optimization formulations and Frank-Wolfe algorithms for obtaining interpretable neural network predictions. Reformulating the Rate-Distortion Explanations (RDE) method for relevance attribution as a constrained optimization problem provides precise control over the sparsity of relevance maps. This enables a novel multi-rate as well as a relevance-ordering variant of RDE that both empirically outperform standard RDE in a well-established comparison test. We showcase several deterministic and stochastic variants of the Frank-Wolfe algorithm and their effectiveness for RDE.
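The sparsity control comes from optimizing over a constraint set whose linear minimization oracle (LMO) is cheap. As one plausible instance, here is a Frank-Wolfe step over the k-sparse polytope {s in [0,1]^d : ||s||_1 <= k}; the specific constraint set and objective used in the paper may differ.

```python
# One Frank-Wolfe step: the LMO over the k-sparse polytope puts mass on the
# k coordinates with the most negative gradient.
import numpy as np

def fw_step(s, grad, k, t):
    v = np.zeros_like(s)
    idx = np.argsort(grad)[:k]
    v[idx] = (grad[idx] < 0).astype(float)
    gamma = 2.0 / (t + 2.0)          # standard open-loop step size
    return s + gamma * (v - s)
```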

【4】 Equivariant and Invariant Reynolds Networks Link: https://arxiv.org/abs/2110.08092

Authors: Akiyoshi Sannai,Makoto Kawano,Wataru Kumagai Affiliations: Tokyo, Japan; The University of Tokyo; RIKEN AIP Note: 15 pages, 4 figures Abstract: Invariant and equivariant networks are useful in learning data with symmetry, including images, sets, point clouds, and graphs. In this paper, we consider invariant and equivariant networks for symmetries of finite groups. Invariant and equivariant networks have been constructed by various researchers using Reynolds operators. However, Reynolds operators are computationally expensive when the order of the group is large because they use the sum over the whole group, which poses an implementation difficulty. To overcome this difficulty, we consider representing the Reynolds operator as a sum over a subset instead of a sum over the whole group. We call such a subset a Reynolds design, and an operator defined by a sum over a Reynolds design a reductive Reynolds operator. For example, in the case of a graph with $n$ nodes, the computational complexity of the reductive Reynolds operator is reduced to $O(n^2)$, while the computational complexity of the Reynolds operator is $O(n!)$. We construct learning models based on the reductive Reynolds operator called equivariant and invariant Reynolds networks (ReyNets) and prove that they have the universal approximation property. Reynolds designs for equivariant ReyNets are derived from combinatorial observations with Young diagrams, while Reynolds designs for invariant ReyNets are derived from invariants called Reynolds dimensions defined on the set of invariant polynomials. Numerical experiments show that the performance of our models is comparable to that of state-of-the-art methods.

【5】 MaGNET: Uniform Sampling from Deep Generative Network Manifolds Without Retraining 标题:磁铁:无需再训练的深度生成网络流形的均匀采样 链接:https://arxiv.org/abs/2110.08009

作者:Ahmed Imtiaz Humayun,Randall Balestriero,Richard Baraniuk 机构:Rice University 备注:13 pages, 14 pages Appendix, 23 figures 摘要:深度生成网络(DGN)广泛应用于生成性对抗网络(GAN)、变分自动编码器(VAE)及其变体中,以近似数据流形以及该流形上的数据分布。然而,训练样本通常是基于偏好、成本或便利性获得的,这会在经验数据分布中引入人为痕迹,例如,CelebA数据集中的大部分笑脸或FFHQ中的大部分黑发个体。当从经过训练的DGN中进行采样时,这些不一致性将被重现,这对公平性、数据增强、异常检测、域自适应等具有深远的潜在影响。作为回应,我们开发了一种基于微分几何的采样器(称为MaGNET):给定任何训练好的DGN,它都能生成在所学流形上均匀分布的样本。我们从理论和经验上证明,无论训练集分布如何,我们的技术都能在流形上产生均匀分布。我们在各种数据集和DGN上进行了一系列实验。其中一个考虑了在FFHQ数据集上训练的最先进的StyleGAN2:通过MaGNET进行均匀采样可将分布精度和召回率分别提高4.1%和3.0%,并将性别偏见降低41.2%,且无需标签或再训练。 摘要:Deep Generative Networks (DGNs) are extensively employed in Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and their variants to approximate the data manifold, and data distribution on that manifold. However, training samples are often obtained based on preferences, costs, or convenience producing artifacts in the empirical data distribution e.g., the large fraction of smiling faces in the CelebA dataset or the large fraction of dark-haired individuals in FFHQ. These inconsistencies will be reproduced when sampling from the trained DGN, which has far-reaching potential implications for fairness, data augmentation, anomaly detection, domain adaptation, and beyond. In response, we develop a differential geometry based sampler -- coined MaGNET -- that, given any trained DGN, produces samples that are uniformly distributed on the learned manifold. We prove theoretically and empirically that our technique produces a uniform distribution on the manifold regardless of the training set distribution. We perform a range of experiments on various datasets and DGNs. One of them considers the state-of-the-art StyleGAN2 trained on FFHQ dataset, where uniform sampling via MaGNET increases distribution precision and recall by 4.1% & 3.0% and decreases gender bias by 41.2%, without requiring labels or retraining.

【6】 NeuroLKH: Combining Deep Learning Model with Lin-Kernighan-Helsgaun Heuristic for Solving the Traveling Salesman Problem 标题:NeuroLKH:深度学习模型与Lin-Kernighan-Helsagn启发式相结合求解旅行商问题 链接:https://arxiv.org/abs/2110.07983

作者:Liang Xin,Wen Song,Zhiguang Cao,Jie Zhang 机构:Nanyang Technological University, Shandong University, Qingdao, China, Singapore Institute of Manufacturing Technology 备注:Accepted at NeurIPS 2021 摘要:我们提出了一种新的算法NeuroLKH,它将深度学习和强大的传统启发式Lin-Kernighan-Helsgaun(LKH)相结合来解决旅行商问题。具体地说,我们训练了一个稀疏图网络(SGN),其中边分数的有监督学习和节点惩罚的无监督学习对提高LKH的性能至关重要。基于SGN的输出,NeuroLKH创建边缘候选集并变换边缘距离,以指导LKH的搜索过程。大量的实验有力地证明,通过在大范围的问题规模上训练一个模型,NeuroLKH显著优于LKH,并能很好地推广到更大的规模。此外,我们还证明了NeuroLKH可以应用于其他路径问题,如容量限制车辆路径问题(CVRP)、取货和交货问题(PDP)和带时间窗的CVRP(CVRPTW)。 摘要:We present NeuroLKH, a novel algorithm that combines deep learning with the strong traditional heuristic Lin-Kernighan-Helsgaun (LKH) for solving Traveling Salesman Problem. Specifically, we train a Sparse Graph Network (SGN) with supervised learning for edge scores and unsupervised learning for node penalties, both of which are critical for improving the performance of LKH. Based on the output of SGN, NeuroLKH creates the edge candidate set and transforms edge distances to guide the searching process of LKH. Extensive experiments firmly demonstrate that, by training one model on a wide range of problem sizes, NeuroLKH significantly outperforms LKH and generalizes well to much larger sizes. Also, we show that NeuroLKH can be applied to other routing problems such as Capacitated Vehicle Routing Problem (CVRP), Pickup and Delivery Problem (PDP), and CVRP with Time Windows (CVRPTW).
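LKH类求解器的一个关键输入是每个节点的候选边集合;NeuroLKH用SGN输出的边得分来构造它。下面用随机矩阵代替SGN得分,演示"按得分为每个节点取top-k候选边"这一步(玩具示意,得分矩阵与规模均为假设,并非原文实现):

import numpy as np

n, k = 8, 3
rng = np.random.default_rng(1)
score = rng.random((n, n))              # 假设这是 SGN 输出的边得分
score = (score + score.T) / 2           # 对称化(无向图)
np.fill_diagonal(score, -np.inf)        # 排除自环
candidates = np.argsort(-score, axis=1)[:, :k]   # 每个节点得分最高的 k 条候选边
for i in range(n):
    print(i, "->", candidates[i].tolist())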

【7】 Reappraising Domain Generalization in Neural Networks 标题:神经网络中领域泛化的再评价 链接:https://arxiv.org/abs/2110.07981

作者:Sarath Sivaprasad,Akshay Goindani,Vaibhav Garg,Vineet Gandhi 机构:Kohli Centre on Intelligent Systems, IIIT Hyderabad, TCS Research, Pune 摘要:机器学习算法的领域泛化(DG)定义为从多个训练分布中学习领域不可知假设的能力,该假设能泛化到来自未见过领域的数据上。DG在目标域特征鲜明而训练数据稀疏的场景中至关重要。与最近的工作(Gulrajani & Lopez-Paz, 2020)一致,我们发现直接的经验风险最小化(ERM)基线始终优于现有的DG方法。我们的消融研究表明,主干网络、数据增强和优化算法的选择,其影响盖过了现有技术中探索的许多技巧与取舍。我们的工作使四种流行的DG数据集达到了新的技术水平,大大超过了以前的方法。此外,作为一个关键贡献,我们提出了一个classwise-DG公式,其中对于每个类,我们随机选择一个域,并将其保留在一边进行测试。我们认为,这种基准测试更接近人类学习,并且与现实场景相关。我们在DomainBed上对classwise-DG进行了全面的基准测试,并提出了一种结合ERM和反向梯度的方法,以获得最先进的结果。令我们惊讶的是,尽管在训练期间接触到了所有领域,classwise-DG仍比传统DG评估更具挑战性,并激发了对DG问题更根本的反思。 摘要:Domain generalization (DG) of machine learning algorithms is defined as their ability to learn a domain agnostic hypothesis from multiple training distributions, which generalizes onto data from an unseen domain. DG is vital in scenarios where the target domain with distinct characteristics has sparse data for training. Aligning with recent work (Gulrajani & Lopez-Paz, 2020), we find that a straightforward Empirical Risk Minimization (ERM) baseline consistently outperforms existing DG methods. We present ablation studies indicating that the choice of backbone, data augmentation, and optimization algorithms overshadows the many tricks and trades explored in the prior art. Our work leads to a new state of the art on the four popular DG datasets, surpassing previous methods by large margins. Furthermore, as a key contribution, we propose a classwise-DG formulation, where for each class, we randomly select one of the domains and keep it aside for testing. We argue that this benchmarking is closer to human learning and relevant in real-world scenarios. We comprehensively benchmark classwise-DG on the DomainBed and propose a method combining ERM and reverse gradients to achieve the state-of-the-art results. To our surprise, despite being exposed to all domains during training, the classwise DG is more challenging than traditional DG evaluation and motivates more fundamental rethinking on the problem of DG.
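文中的classwise-DG评测为每个类别随机选择一个域并留作测试。下面是按(域,类别)标签划分训练/测试索引的极简示意(域数、类别数与数据组织方式均为假设):

import numpy as np

rng = np.random.default_rng(0)
domains = np.array([s % 4 for s in range(2000)])        # 4 个域(假设)
labels  = rng.integers(0, 10, size=2000)                # 10 个类别(假设)
held_out = {c: int(rng.integers(0, 4)) for c in range(10)}   # 每个类别随机留出一个域
test_mask = np.array([held_out[int(c)] == d for d, c in zip(domains, labels)])
train_idx, test_idx = np.where(~test_mask)[0], np.where(test_mask)[0]
print("训练样本:", len(train_idx), " 测试样本:", len(test_idx))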

【8】 SaLinA: Sequential Learning of Agents 标题:Salina:Agent的顺序学习 链接:https://arxiv.org/abs/2110.07910

作者:Ludovic Denoyer,Alfredo de la Fuente,Song Duong,Jean-Baptiste Gaya,Pierre-Alexandre Kamienny,Daniel H. Thompson 机构:Facebook 摘要:SaLinA是一个简单的库,使实现复杂的顺序学习模型(包括强化学习算法)变得容易。它是PyTorch的一个扩展:PyTorch用户可以在几分钟内理解用SaLinA编写的算法,并且很容易修改。此外,SaLinA在训练和测试时天然支持多CPU和多GPU,因此非常适合大规模训练用例。与现有的RL库相比,SaLinA的采用成本非常低,并能涵盖多种多样的设置(基于模型的RL、批RL、层次RL、多智能体RL等)。但SaLinA不仅面向RL从业者,它的目标是为任何深度学习程序员提供顺序学习能力。 摘要:SaLinA is a simple library that makes implementing complex sequential learning models easy, including reinforcement learning algorithms. It is built as an extension of PyTorch: algorithms coded with SaLinA can be understood in a few minutes by PyTorch users and modified easily. Moreover, SaLinA naturally works with multiple CPUs and GPUs at train and test time, thus being a good fit for the large-scale training use cases. In comparison to existing RL libraries, SaLinA has a very low adoption cost and captures a large variety of settings (model-based RL, batch RL, hierarchical RL, multi-agent RL, etc.). But SaLinA does not only target RL practitioners, it aims at providing sequential learning capabilities to any deep learning programmer.

【9】 Towards Better Plasticity-Stability Trade-off in Incremental Learning: A simple Linear Connector 标题:在增量学习中寻求更好的塑性稳定性权衡:一种简单的线性连接器 链接:https://arxiv.org/abs/2110.07905

作者:Guoliang Lin,Hanglu Chu,Hanjiang Lai 机构: Hanjiang Lai 3 1Sun Yat-Sen university, cn 2South China Normal University, cn 3Sun Yat-Sen university 摘要:可塑性-稳定性困境是增量学习的一个主要问题,可塑性指学习新知识的能力,稳定性指保留以前任务的知识。由于缺乏以前任务的训练样本,很难平衡可塑性和稳定性。例如,最近的零空间投影方法(如Adam-NSCL)在保留以前的知识方面表现出了良好的性能,而这种强投影也会导致当前任务的性能下降。为了实现更好的可塑性-稳定性权衡,本文证明:对两个独立优化得到的网络最优点(一个由面向过去任务的零空间投影训练得到,另一个由面向当前任务的简单SGD得到)做简单平均,即可在保留已学知识与为学习新任务保留足够灵活性之间取得有意义的平衡。这种简单的线性连接器也为我们提供了一种控制可塑性与稳定性之间平衡的新视角和新技术。我们在几个基准数据集上对所提出的方法进行了评估。结果表明,我们的简单方法可以实现显著的改进,并且在过去和当前的任务中都表现良好。简言之,我们的方法极其简单,却得到了平衡性更好的模型。 摘要:Plasticity-stability dilemma is a main problem for incremental learning, with plasticity referring to the ability to learn new knowledge, and stability retaining the knowledge of previous tasks. Due to the lack of training samples from previous tasks, it is hard to balance the plasticity and stability. For example, the recent null-space projection methods (e.g., Adam-NSCL) have shown promising performance on preserving previous knowledge, while such strong projection also causes the performance degradation of the current task. To achieve better plasticity-stability trade-off, in this paper, we show that a simple averaging of two independently optimized optima of networks, null-space projection for past tasks and simple SGD for the current task, can attain a meaningful balance between preserving already learned knowledge and granting sufficient flexibility for learning a new task. This simple linear connector also provides us a new perspective and technology to control the trade-off between plasticity and stability. We evaluate the proposed method on several benchmark datasets. The results indicate our simple method can achieve notable improvement, and perform well on both the past and current tasks. In short, our method is an extremely simple approach and achieves a better balance model.
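"线性连接器"的核心操作就是对两个独立训练得到的网络权重做线性平均。下面用PyTorch的state_dict演示这一步(示意;两个最优点本应分别来自零空间投影训练与SGD训练,此处用随机初始化的网络代替):

import copy
import torch

def linear_connector(model_a, model_b, alpha=0.5):
    # 返回一个权重为 (1-alpha)*A + alpha*B 的新模型
    merged = copy.deepcopy(model_a)
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    merged.load_state_dict({k: (1 - alpha) * sd_a[k] + alpha * sd_b[k] for k in sd_a})
    return merged

net_a = torch.nn.Linear(4, 2)    # 代表"过去任务"的最优点(假设)
net_b = torch.nn.Linear(4, 2)    # 代表"当前任务"的最优点(假设)
net_mid = linear_connector(net_a, net_b, alpha=0.5)
print(net_mid.weight)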

【10】 FOLD-R++: A Toolset for Automated Inductive Learning of Default Theories from Mixed Data 标题:Fold-R++:从混合数据中自动归纳学习缺省理论的工具集 链接:https://arxiv.org/abs/2110.07843

作者:Huaduo Wang,Gopal Gupta 机构:The University of Texas at Dallas, Richardson TX , USA 摘要:FOLD-R是一种自动归纳学习算法,用于学习混合(数字和分类)数据例外情况下的默认规则。它为分类任务生成(可解释的)答案集编程(ASP)规则集。我们提出了一种改进的FOLD-R算法,称为FOLD-R++,该算法显著提高了FOLD-R的效率和可扩展性。FOLD-R++改进了FOLD-R,在编码或特征选择阶段不会损害或丢失输入训练数据中的信息。FOLD-R++算法在性能上与广泛使用的XGBoost算法具有竞争力,但是,与XGBoost不同,FOLD-R++算法产生了一个可解释的模型。接下来,我们将FOLD-R++与s(CASP)相结合,创建一个功能强大的工具集。s(CASP)是一个面向目标的ASP执行引擎,用于使用FOLD-R++生成的答案集程序对新数据样本进行预测。s(CASP)系统也为预测提供了依据。本文的实验表明,改进的FOLD-R++算法是对原设计的一个显著改进,并且s(CASP)系统也可以有效地进行预测。 摘要:FOLD-R is an automated inductive learning algorithm for learning default rules with exceptions for mixed (numerical and categorical) data. It generates an (explainable) answer set programming (ASP) rule set for classification tasks. We present an improved FOLD-R algorithm, called FOLD-R++, that significantly increases the efficiency and scalability of FOLD-R. FOLD-R++ improves upon FOLD-R without compromising or losing information in the input training data during the encoding or feature selection phase. The FOLD-R++ algorithm is competitive in performance with the widely-used XGBoost algorithm, however, unlike XGBoost, the FOLD-R++ algorithm produces an explainable model. Next, we create a powerful tool-set by combining FOLD-R++ with s(CASP) -- a goal-directed ASP execution engine -- to make predictions on new data samples using the answer set program generated by FOLD-R++. The s(CASP) system also produces a justification for the prediction. Experiments presented in this paper show that our improved FOLD-R++ algorithm is a significant improvement over the original design and that the s(CASP) system can make predictions in an efficient manner as well.

【11】 On Extending Amdahl's law to Learn Computer Performance 标题:关于将Amdahl定律推广到计算机性能学习的探讨 链接:https://arxiv.org/abs/2110.07822

作者:Chaitanya Poolla,Rahul Saxena 机构:Intel Corporation, Juliette Ln, Santa Clara, CA , USA 备注:20 pages, 5 figures 摘要:研究了多核处理器环境下并行计算机性能的学习问题。在给定固定工作负载的情况下,寻求不同系统配置对性能的影响。传统上,由于单个资源增强而导致的性能加速是使用阿姆达尔定律来表示的。然而,在多个可配置资源的情况下,传统公式会导致几个不连续的加速比方程,这些方程无法组合在一起以确定总体加速比。为了解决这个问题,我们建议(1)扩展Amdahl定律,将多个可配置资源纳入整体加速比方程,以及(2)将加速比方程转化为适合机器学习的多变量回归问题。使用来自两个基准(SPECCPU 2017和PCMark 10)和四个硬件平台(Intel Xeon 8180M、AMD EPYC 7702P、Intel CoffeeLake 8700K和AMD Ryzen 3900X)的实验数据,开发分析模型并进行交叉验证。研究结果表明,在大多数情况下,模型的平均交叉验证准确率高于95%,从而验证了拟议的阿姆达尔定律扩展。提出的方法能够快速生成智能分析模型,以支持未来的工业发展、优化和模拟需求。 摘要:The problem of learning parallel computer performance is investigated in the context of multicore processors. Given a fixed workload, the effect of varying system configuration on performance is sought. Conventionally, the performance speedup due to a single resource enhancement is formulated using Amdahl's law. However, in case of multiple configurable resources the conventional formulation results in several disconnected speedup equations that cannot be combined together to determine the overall speedup. To solve this problem, we propose to (1) extend Amdahl's law to accommodate multiple configurable resources into the overall speedup equation, and (2) transform the speedup equation into a multivariable regression problem suitable for machine learning. Using experimental data from two benchmarks (SPECCPU 2017 and PCMark 10) and four hardware platforms (Intel Xeon 8180M, AMD EPYC 7702P, Intel CoffeeLake 8700K, and AMD Ryzen 3900X), analytical models are developed and cross-validated. Findings indicate that in most cases, the models result in an average cross-validated accuracy higher than 95%, thereby validating the proposed extension of Amdahl's law. The proposed methodology enables rapid generation of intelligent analytical models to support future industrial development, optimization, and simulation needs.
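经典阿姆达尔定律给出Speedup = 1/((1-p) + p/s);文中将其推广到多个可配置资源并化为可由数据拟合的多元回归。下面用scipy对一个两资源的假设扩展形式做曲线拟合(该函数形式与数据均为本示例的假定,并非论文中的精确公式):

import numpy as np
from scipy.optimize import curve_fit

def speedup(X, p1, p2):
    # 假设的双资源扩展:S = 1 / (1 - p1 - p2 + p1/s1 + p2/s2)
    s1, s2 = X
    return 1.0 / (1.0 - p1 - p2 + p1 / s1 + p2 / s2)

rng = np.random.default_rng(0)
s1 = rng.uniform(1, 16, 200)                      # 资源 1 的配置倍数(虚构)
s2 = rng.uniform(1, 8, 200)                       # 资源 2 的配置倍数(虚构)
y = speedup((s1, s2), 0.6, 0.3) * (1 + 0.01 * rng.normal(size=200))
(p1, p2), _ = curve_fit(speedup, (s1, s2), y, p0=[0.5, 0.2], bounds=(0, 1))
print(p1, p2)    # 应接近生成数据时用的 0.6 与 0.3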

【12】 Provable Regret Bounds for Deep Online Learning and Control 标题:深度在线学习与控制的可证明后悔界限 链接:https://arxiv.org/abs/2110.07807

作者:Xinyi Chen,Edgar Minasyan,Jason D. Lee,Elad Hazan 机构: Department of Computer Science, Princeton University, Google AI Princeton 摘要:深度神经网络在强化学习和控制中的应用非常成功,尽管对于这些问题的深度学习几乎没有理论保证。推导性能保证有两个主要挑战:a)控制具有状态信息,因此本质上是在线的;b)深度网络是非凸预测器,一般情况下在线学习无法为其提供可证明的保证。基于过参数化神经网络的线性化技术,我们推导了深度神经网络有效在线学习的可证明遗憾界。具体地说,我们证明了在任何凸损失函数序列上,任何低后悔算法都可以用来优化神经网络的参数,使其在事后与最佳网络竞争。作为这些结果在在线环境中的应用,我们获得了具有深度神经网络控制器的在线幕式控制的可证明界。 摘要:The use of deep neural networks has been highly successful in reinforcement learning and control, although few theoretical guarantees for deep learning exist for these problems. There are two main challenges for deriving performance guarantees: a) control has state information and thus is inherently online and b) deep networks are non-convex predictors for which online learning cannot provide provable guarantees in general. Building on the linearization technique for overparameterized neural networks, we derive provable regret bounds for efficient online learning with deep neural networks. Specifically, we show that over any sequence of convex loss functions, any low-regret algorithm can be adapted to optimize the parameters of a neural network such that it competes with the best net in hindsight. As an application of these results in the online setting, we obtain provable bounds for online episodic control with deep neural network controllers.
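文中的核心思路是:对任意凸损失序列运行低后悔在线算法,即可与事后最优参数竞争。下面用在线梯度下降(OGD)在一串凸二次损失上演示后悔值的次线性增长(玩具示意,损失序列为虚构):

import numpy as np

rng = np.random.default_rng(0)
T, d = 2000, 5
targets = rng.normal(size=(T, d))            # 第 t 轮损失 f_t(x)=0.5*||x-z_t||^2
x = np.zeros(d); cum_loss = 0.0
for t in range(1, T + 1):
    z = targets[t - 1]
    cum_loss += 0.5 * np.sum((x - z) ** 2)
    x -= (1.0 / np.sqrt(t)) * (x - z)        # OGD 更新,步长 1/sqrt(t)
best = targets.mean(axis=0)                  # 事后最优的固定参数
best_loss = 0.5 * np.sum((targets - best) ** 2)
print("后悔值:", cum_loss - best_loss, " 平均每轮:", (cum_loss - best_loss) / T)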

【13】 Learning the Koopman Eigendecomposition: A Diffeomorphic Approach 标题:学习库普曼本征分解:一种微分同胚方法 链接:https://arxiv.org/abs/2110.07786

作者:Petar Bevanda,Johannes Kirmayr,Stefan Sosnowski,Sandra Hirche 机构: Department of Electrical and Computer Engineering 备注:Submitted to the 2022 American Control Conference 摘要:我们提出了一种新的数据驱动方法,用于利用Koopman特征函数学习一类稳定非线性系统的线性表示。通过正规化流学习非线性系统及其雅可比线性化之间的共轭映射,可以保证所学习的函数是微分同胚。利用这种微分同胚性,我们通过共轭系统的谱等价性来构造非线性系统的特征函数,从而可以构造非线性系统的线性预测器。微分同胚学习器的普适性导致了非线性系统Koopman特征函数的普适逼近。所开发的方法也是安全的,因为它保证了无论表示精度如何,模型都是渐近稳定的。据我们所知,这是首个弥合算子理论、系统理论与学习理论之间差距的工作。仿真实例表明了该方法的有效性。 摘要:We present a novel data-driven approach for learning linear representations of a class of stable nonlinear systems using Koopman eigenfunctions. By learning the conjugacy map between a nonlinear system and its Jacobian linearization through a Normalizing Flow one can guarantee the learned function is a diffeomorphism. Using this diffeomorphism, we construct eigenfunctions of the nonlinear system via the spectral equivalence of conjugate systems - allowing the construction of linear predictors for nonlinear systems. The universality of the diffeomorphism learner leads to the universal approximation of the nonlinear system's Koopman eigenfunctions. The developed method is also safe as it guarantees the model is asymptotically stable regardless of the representation accuracy. To our best knowledge, this is the first work to close the gap between the operator, system and learning theories. The efficacy of our approach is shown through simulation examples.

【14】 Scalable Causal Structure Learning: New Opportunities in Biomedicine 标题:可扩展因果结构学习:生物医学的新机遇 链接:https://arxiv.org/abs/2110.07785

作者:Pulakesh Upadhyaya,Kai Zhang,Can Li,Xiaoqian Jiang,Yejin Kim 机构:School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas 摘要:本文提供了一个关于流行的因果结构学习模型的实用教程,并以实际数据为例,帮助医疗保健受众理解和应用这些模型。我们回顾了传统的、基于分数的和基于机器学习的因果结构发现方案,研究了它们在一些基准数据集上的性能,并讨论了它们在生物医学中的一些应用。在数据充足的情况下,基于机器学习的方法可以扩展,可以包含比传统方法更多的变量,并且可以潜在地应用于许多生物医学应用。 摘要:This paper gives a practical tutorial on popular causal structure learning models with examples of real-world data to help healthcare audiences understand and apply them. We review prominent traditional, score-based and machine-learning based schemes for causal structure discovery, study some of their performance over some benchmark datasets, and discuss some of the applications to biomedicine. In the case of sufficient data, machine learning-based approaches can be scalable, can include a greater number of variables than traditional approaches, and can potentially be applied in many biomedical applications.

【15】 Creating User Interface Mock-ups from High-Level Text Descriptions with Deep-Learning Models 标题:使用深度学习模型从高级文本描述创建用户界面模型 链接:https://arxiv.org/abs/2110.07775

作者:Forrest Huang,Gang Li,Xin Zhou,John F. Canny,Yang Li 机构: University of California 摘要:用户界面(UI)的设计过程通常从阐明高层设计目标开始。然而,将这些高级设计目标转化为具体的设计模型需要大量的工作和UI设计专业知识。为了促进应用程序设计师和开发人员的这一过程,我们引入了三种深度学习技术,从描述高级设计目标的自然语言短语(例如,“弹出显示图像和其他选项”)创建低保真UI模型。特别是,我们提供了两种基于检索的方法和一种生成方法,以及前处理和后处理技术,以确保创建的UI模型的质量。我们定量和定性地比较和对比了每种方法在建议连贯、多样和相关的UI设计模型方面的能力。我们与15名专业UI设计师和实践者一起进一步评估这些方法,以了解每种方法的优缺点。设计师对这些方法在协助设计过程中的潜力做出了积极的回应。 摘要:The design process of user interfaces (UIs) often begins with articulating high-level design goals. Translating these high-level design goals into concrete design mock-ups, however, requires extensive effort and UI design expertise. To facilitate this process for app designers and developers, we introduce three deep-learning techniques to create low-fidelity UI mock-ups from a natural language phrase that describes the high-level design goal (e.g. "pop up displaying an image and other options"). In particular, we contribute two retrieval-based methods and one generative method, as well as pre-processing and post-processing techniques to ensure the quality of the created UI mock-ups. We quantitatively and qualitatively compare and contrast each method's ability in suggesting coherent, diverse and relevant UI design mock-ups. We further evaluate these methods with 15 professional UI designers and practitioners to understand each method's advantages and disadvantages. The designers responded positively to the potential of these methods for assisting the design process.

【16】 CCQA: A New Web-Scale Question Answering Dataset for Model Pre-Training 标题:CCQA:一种新的用于模型预训练的Web规模问答数据集 链接:https://arxiv.org/abs/2110.07731

作者:Patrick Huber,Armen Aghajanyan,Barlas Oğuz,Dmytro Okhonko,Wen-tau Yih,Sonal Gupta,Xilun Chen 机构:†University of British Columbia; ‡Facebook Inc. 摘要:随着大规模预训练语言模型的兴起,开放领域问答(ODQA)已成为NLP的一个重要研究课题。基于流行的预训练微调方法,我们假设使用大规模、自然和多样的问答(QA)数据集的额外域内预训练阶段有利于ODQA。因此,本文提出了一种基于Common Crawl项目的新型QA数据集。使用现成的schema.org注释,我们提取了大约1.3亿个多语言问答对,包括大约6000万个英语数据点。有了这些之前未发现的自然问答对,我们对流行的语言模型进行了预训练,以展示大规模领域内预训练在问答任务中的潜力。在我们的实验中,我们发现,在我们的Common Crawl问答数据集(CCQA)上预训练的问答模型在跨多个任务、模型和基准的Zero-Shot、低资源和微调设置方面取得了令人满意的结果。 摘要:With the rise of large-scale pre-trained language models, open-domain question-answering (ODQA) has become an important research topic in NLP. Based on the popular pre-training fine-tuning approach, we posit that an additional in-domain pre-training stage using a large-scale, natural, and diverse question-answering (QA) dataset can be beneficial for ODQA. Consequently, we propose a novel QA dataset based on the Common Crawl project in this paper. Using the readily available schema.org annotation, we extract around 130 million multilingual question-answer pairs, including about 60 million English data-points. With this previously unseen number of natural QA pairs, we pre-train popular language models to show the potential of large-scale in-domain pre-training for the task of question-answering. In our experiments, we find that pre-training question-answering models on our Common Crawl Question Answering dataset (CCQA) achieves promising results in zero-shot, low resource and fine-tuned settings across multiple tasks, models and benchmarks.
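文中利用网页中现成的schema.org标注抽取问答对。下面演示从一段内嵌JSON-LD的HTML中解析Question/acceptedAnswer结构(极简示意;真实管线要处理Common Crawl的海量网页及各种标注变体,远比这复杂):

import json, re

html = """<script type="application/ld+json">
{"@type": "QAPage", "mainEntity": {"@type": "Question",
 "name": "What is Common Crawl?",
 "acceptedAnswer": {"@type": "Answer", "text": "A public web crawl corpus."}}}
</script>"""

pairs = []
for block in re.findall(r'<script type="application/ld\+json">(.*?)</script>', html, re.S):
    data = json.loads(block)                 # 解析 JSON-LD 标注
    if data.get("@type") == "QAPage":
        q = data.get("mainEntity", {})
        a = q.get("acceptedAnswer", {})
        if q.get("name") and a.get("text"):
            pairs.append((q["name"], a["text"]))
print(pairs)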

【17】 Hindsight Network Credit Assignment: Efficient Credit Assignment in Networks of Discrete Stochastic Units 标题:后见之明网络信用分配:离散随机单元网络中的有效信用分配 链接:https://arxiv.org/abs/2110.07700

作者:Kenny Young 机构:Department of Computing Science, University of Alberta, Edmonton, Canada 摘要:用离散随机变量训练神经网络是一个独特的挑战。反向传播不是直接适用的,在具有连续随机变量的网络中使用的重新参数化技巧也不适用。为了应对这一挑战,我们提出了后见网络信用分配(HNCA),一种用于离散随机单元网络的新学习算法。HNCA的工作方式是根据每个单元的输出对其网络中直接子节点的影响程度,为每个单元分配信用。我们证明了与REINFORCE估计器相比,HNCA产生方差减小的无偏梯度估计,而计算量与反向传播相近。我们首先在上下文bandit设置中应用HNCA来优化代理未知的奖励函数。在此背景下,我们实证证明HNCA显著优于REINFORCE,表明我们的理论分析所暗示的方差减少是显著且有效的。然后,我们展示了如何扩展HNCA以优化随机单元网络输出的更一般的函数,其中函数是代理已知的。我们将此扩展版HNCA应用于训练离散变分自动编码器,并通过实验证明其优于其他强方法。我们相信,HNCA背后的思想有助于激发关于随机计算图中有效信用分配的新思路。 摘要:Training neural networks with discrete stochastic variables presents a unique challenge. Backpropagation is not directly applicable, nor are the reparameterization tricks used in networks with continuous stochastic variables. To address this challenge, we present Hindsight Network Credit Assignment (HNCA), a novel learning algorithm for networks of discrete stochastic units. HNCA works by assigning credit to each unit based on the degree to which its output influences its immediate children in the network. We prove that HNCA produces unbiased gradient estimates with reduced variance compared to the REINFORCE estimator, while the computational cost is similar to that of backpropagation. We first apply HNCA in a contextual bandit setting to optimize a reward function that is unknown to the agent. In this setting, we empirically demonstrate that HNCA significantly outperforms REINFORCE, indicating that the variance reduction implied by our theoretical analysis is significant and impactful. We then show how HNCA can be extended to optimize a more general function of the outputs of a network of stochastic units, where the function is known to the agent. We apply this extended version of HNCA to train a discrete variational auto-encoder and empirically show it compares favourably to other strong methods. We believe that the ideas underlying HNCA can help stimulate new ways of thinking about efficient credit assignment in stochastic compute graphs.

【18】 Sound and Complete Neural Network Repair with Minimality and Locality Guarantees 标题:具有最小性和局部性保证的健全完整的神经网络修复 链接:https://arxiv.org/abs/2110.07682

作者:Feisi Fu,Wenchao Li 机构:Department of Systems Engineering, Boston University, Boston, MA , Department of Electrical and Computer Engineering 备注:16 pages, 3 figures 摘要:我们提出了一种利用ReLU激活函数修复神经网络的新方法。现有方法依赖于修改神经网络的权值,会在函数空间中引起全局变化;与之不同,我们的方法仅在函数空间中施加局部变化,同时仍然保证去除错误行为。通过利用ReLU网络的分段线性特性,我们的方法可以有效地构造一个针对有缺陷输入所在线性区域定制的补丁网络,当与原始网络结合时,该补丁网络可证明地纠正了有缺陷输入上的行为。我们的方法既健全又完备:修复后的网络保证修复有缺陷的输入,并且保证为任何有缺陷的输入找到补丁。此外,我们的方法保留了ReLU网络的连续分段线性性质,自动将修复推广到所有点,包括修复区域内其他未检测到的故障输入;其在函数空间中的变化最小,并保证远离修复区域的输入上的输出不变。在几个基准上,我们表明,我们的方法在局部性和限制负面影响方面明显优于现有方法。 摘要:We present a novel methodology for repairing neural networks that use ReLU activation functions. Unlike existing methods that rely on modifying the weights of a neural network which can induce a global change in the function space, our approach applies only a localized change in the function space while still guaranteeing the removal of the buggy behavior. By leveraging the piecewise linear nature of ReLU networks, our approach can efficiently construct a patch network tailored to the linear region where the buggy input resides, which when combined with the original network, provably corrects the behavior on the buggy input. Our method is both sound and complete -- the repaired network is guaranteed to fix the buggy input, and a patch is guaranteed to be found for any buggy input. Moreover, our approach preserves the continuous piecewise linear nature of ReLU networks, automatically generalizes the repair to all the points including other undetected buggy inputs inside the repair region, is minimal in terms of changes in the function space, and guarantees that outputs on inputs away from the repair region are unaltered. On several benchmarks, we show that our approach significantly outperforms existing methods in terms of locality and limiting negative side effects.
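该方法依赖ReLU网络的分段线性结构:每个输入的激活模式决定了它所在的线性区域,而网络在该区域上是一个仿射函数。下面演示如何计算某输入的激活模式及对应的局部仿射映射(单隐层玩具网络,权重随机;仅展示原理,并非论文的补丁构造):

import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(6, 3)), rng.normal(size=6)   # 单隐层 ReLU 网络
W2, b2 = rng.normal(size=(2, 6)), rng.normal(size=2)

x = rng.normal(size=3)
pre = W1 @ x + b1
pattern = (pre > 0).astype(float)          # 激活模式,决定 x 所在的线性区域
D = np.diag(pattern)
A = W2 @ D @ W1                            # 该区域上的局部仿射函数 f(z) = A z + c
c = W2 @ D @ b1 + b2
f_exact = W2 @ np.maximum(pre, 0) + b2
print(np.allclose(A @ x + c, f_exact))     # True:区域内两者完全一致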

【19】 Learning Mean-Field Equations from Particle Data Using WSINDy 标题:利用WSINDy从粒子数据中学习平均场方程 链接:https://arxiv.org/abs/2110.07756

作者:Daniel A. Messenger,David M. Bortz 摘要:我们开发了一种用于交互粒子系统(IPS)的弱形式稀疏识别方法,其主要目标是降低大粒子数$N$的计算复杂度,并提供对内在或外在噪声的鲁棒性。特别是,我们将IPS平均场理论的概念与弱形式非线性动力学稀疏辨识算法(WSINDy)相结合,提供一种快速可靠的系统辨识方案,用于在每次实验的粒子数$N$为数千量级且实验次数$M$小于100时恢复IPS的控制随机微分方程。这与现有工作形成对比:现有工作表明,使用强形式方法,在$N$小于100且$M$为数千量级时系统辨识才可行。我们证明了在一些标准的正则性假设下,该格式在普通最小二乘条件下以$\mathcal{O}(N^{-1/2})$的速度收敛,并在一维和二维空间上的若干系统上数值验证了收敛速度。我们的例子包括来自均匀化理论的典型问题(作为学习粗粒度模型的第一步)、吸引-排斥群体的动力学,以及抛物-椭圆Keller-Segel趋化模型的IPS描述。 摘要:We develop a weak-form sparse identification method for interacting particle systems (IPS) with the primary goals of reducing computational complexity for large particle number $N$ and offering robustness to either intrinsic or extrinsic noise. In particular, we use concepts from mean-field theory of IPS in combination with the weak-form sparse identification of nonlinear dynamics algorithm (WSINDy) to provide a fast and reliable system identification scheme for recovering the governing stochastic differential equations for an IPS when the number of particles per experiment $N$ is on the order of several thousand and the number of experiments $M$ is less than 100. This is in contrast to existing work showing that system identification for $N$ less than 100 and $M$ on the order of several thousand is feasible using strong-form methods. We prove that under some standard regularity assumptions the scheme converges with rate $\mathcal{O}(N^{-1/2})$ in the ordinary least squares setting and we demonstrate the convergence rate numerically on several systems in one and two spatial dimensions. Our examples include a canonical problem from homogenization theory (as a first step towards learning coarse-grained models), the dynamics of an attractive-repulsive swarm, and the IPS description of the parabolic-elliptic Keller-Segel model for chemotaxis.
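弱形式辨识的关键一步是用紧支撑测试函数phi乘以方程并分部积分,把导数从含噪数据转移到phi上:对 u' = Theta(u)·w,有 -∫phi' u dt = ∫phi Theta(u) dt · w。下面在一维ODE u'=-2u 上演示该思路(极简示意,测试函数族与候选库均为本例假定,并非WSINDy的完整实现):

import numpy as np

t = np.linspace(0, 4, 401); dt = t[1] - t[0]
rng = np.random.default_rng(0)
u = np.exp(-2 * t) + 0.01 * rng.normal(size=t.size)   # u' = -2u 的含噪观测

lib = np.stack([u, u ** 2], axis=1)                   # 候选库 Theta(u) = [u, u^2]
rows, rhs = [], []
for c in range(40, 361, 20):                          # 一族紧支撑多项式 bump 测试函数
    idx = np.arange(c - 30, c + 31)
    s = (t[idx] - t[c]) / (30 * dt)                   # s 落在 [-1,1],端点处 phi=0
    phi = (1 - s ** 2) ** 2
    dphi = -4 * s * (1 - s ** 2) / (30 * dt)          # phi 对 t 的导数
    rows.append(dt * (phi[:, None] * lib[idx]).sum(axis=0))   # 近似 ∫ phi*Theta(u) dt
    rhs.append(-dt * (dphi * u[idx]).sum())                   # 近似 -∫ phi'*u dt
w, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
print(w)    # 应接近 [-2, 0]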

其他(19篇)

【1】 Influencing Towards Stable Multi-Agent Interactions 标题:对稳定的多智能体交互的影响 链接:https://arxiv.org/abs/2110.08229

作者:Woodrow Z. Wang,Andy Shih,Annie Xie,Dorsa Sadigh 备注:15 pages, 5 figures, Published as an Oral at Conference on Robot Learning (CoRL) 2021 摘要:在多智能体环境中学习是困难的,因为对手或伙伴的变化行为会引入非平稳性。我们提出了一种算法来主动影响另一个代理的策略以使其稳定下来,而不是被动地适应另一个代理(对手或伙伴)的行为,这可以抑制由另一个代理引起的非平稳性。我们学习了另一个智能体策略的低维潜在表示,以及潜在策略相对于我们机器人行为的演化动力学。有了这个学习过的动力学模型,我们可以定义一个无监督的稳定性奖励,来训练我们的机器人故意影响另一个代理朝着单一策略稳定下来。我们证明了在各种模拟环境中,包括自动驾驶、紧急通信和机器人操作,稳定在提高任务报酬最大化效率方面的有效性。我们在网站上展示定性结果:https://sites.google.com/view/stable-marl/. 摘要:Learning in multi-agent environments is difficult due to the non-stationarity introduced by an opponent's or partner's changing behaviors. Instead of reactively adapting to the other agent's (opponent or partner) behavior, we propose an algorithm to proactively influence the other agent's strategy to stabilize -- which can restrain the non-stationarity caused by the other agent. We learn a low-dimensional latent representation of the other agent's strategy and the dynamics of how the latent strategy evolves with respect to our robot's behavior. With this learned dynamics model, we can define an unsupervised stability reward to train our robot to deliberately influence the other agent to stabilize towards a single strategy. We demonstrate the effectiveness of stabilizing in improving efficiency of maximizing the task reward in a variety of simulated environments, including autonomous driving, emergent communication, and robotic manipulation. We show qualitative results on our website: https://sites.google.com/view/stable-marl/.

【2】 VICause: Simultaneous Missing Value Imputation and Causal Discovery with Groups 标题:VICause:与群体同时进行缺失值推算和因果发现 链接:https://arxiv.org/abs/2110.08223

作者:Pablo Morales-Alvarez,Angus Lamb,Simon Woodhead,Simon Peyton Jones,Miltiadis Allamanis,Cheng Zhang 机构:University of Granada, Microsoft Research,Eedi 摘要:对于预测和因果发现任务,缺失值是现实世界机器学习中的一个重要挑战。然而,现有的插补方法对因果关系是不可知的,而在传统的因果发现中只有少数方法能够有效地处理缺失数据。在这项工作中,我们提出了VICause,一种通过深度学习同时有效处理缺失值插补和因果发现的新方法。特别地,我们提出了一个具有结构化潜在空间和基于图形神经网络的结构的生成模型,可扩展到大量变量。此外,我们的方法可以发现变量组之间的关系,这在许多实际应用中非常有用。在缺失值插补和因果发现方面,VICause与流行的和最新的方法相比表现出更好的性能。 摘要:Missing values constitute an important challenge in real-world machine learning for both prediction and causal discovery tasks. However, existing imputation methods are agnostic to causality, while only few methods in traditional causal discovery can handle missing data in an efficient way. In this work we propose VICause, a novel approach to simultaneously tackle missing value imputation and causal discovery efficiently with deep learning. Particularly, we propose a generative model with a structured latent space and a graph neural network-based architecture, scaling to large number of variables. Moreover, our method can discover relationships between groups of variables which is useful in many real-world applications. VICause shows improved performance compared to popular and recent approaches in both missing value imputation and causal discovery.

【3】 Combining Diverse Feature Priors 标题:组合不同的功能先验 链接:https://arxiv.org/abs/2110.08220

作者:Saachi Jain,Dimitris Tsipras,Aleksander Madry 机构:MIT, Aleksander M ˛adry 摘要:为了提高模型的泛化能力,模型设计者通常会隐式或显式地限制其模型使用的特征。在这项工作中,我们通过将这些先验视为数据的不同视角,探索利用此类特征先验的设计空间。具体而言,我们发现,使用不同的特征先验集训练的模型具有较少的重叠失效模式,因此可以更有效地组合。此外,我们还证明,在附加(未标记)数据上联合训练此类模型可以使它们纠正彼此的错误,从而提高泛化能力和对虚假相关性的鲁棒性。代码可在 https://github.com/MadryLab/copriors 获取。 摘要:To improve model generalization, model designers often restrict the features that their models use, either implicitly or explicitly. In this work, we explore the design space of leveraging such feature priors by viewing them as distinct perspectives on the data. Specifically, we find that models trained with diverse sets of feature priors have less overlapping failure modes, and can thus be combined more effectively. Moreover, we demonstrate that jointly training such models on additional (unlabeled) data allows them to correct each other's mistakes, which, in turn, leads to better generalization and resilience to spurious correlations. Code available at https://github.com/MadryLab/copriors.

【4】 Collaborating with Humans without Human Data 标题:在没有人类数据的情况下与人类协作 链接:https://arxiv.org/abs/2110.08176

作者:DJ Strouse,Kevin R. McKee,Matt Botvinick,Edward Hughes,Richard Everett 机构:DeepMind 备注:Accepted at NeurIPS 2021 (spotlight) 摘要:与人类合作需要快速适应其个人优势、劣势和偏好。不幸的是,大多数标准的多智能体强化学习技术,如自博弈(SP)或群体博弈(PP),产生的智能体过于适合其训练伙伴,不能很好地推广到人类。或者,研究人员可以收集人类数据,使用行为克隆训练人类模型,然后使用该模型训练"人类感知"智能体("行为克隆博弈",或BCP)。虽然这种方法可以提高智能体对新的人类共同参与者的泛化能力,但首先需要收集大量人类数据这一繁琐而昂贵的步骤。在这里,我们研究如何在不使用人类数据的情况下训练与人类合作伙伴协作良好的智能体的问题。我们认为,问题的关键在于训练多样化的训练伙伴。从竞争领域中成功的多智能体方法中得到启发,我们发现一种惊人简单的方法非常有效:将智能体伙伴训练为对一个自博弈智能体群体(及其整个训练过程中的历史检查点)的最佳响应,我们称这种方法为虚拟共玩(FCP)。我们的实验集中在一个两人合作的烹饪模拟器上,这个模拟器最近被提出作为与人类协调的挑战性问题。我们发现,当与新的智能体和人类伙伴配对时,FCP智能体的得分显著高于SP、PP和BCP。此外,相较于所有基线,人类报告称在主观上强烈偏好与FCP智能体合作。 摘要:Collaborating with humans requires rapidly adapting to their individual strengths, weaknesses, and preferences. Unfortunately, most standard multi-agent reinforcement learning techniques, such as self-play (SP) or population play (PP), produce agents that overfit to their training partners and do not generalize well to humans. Alternatively, researchers can collect human data, train a human model using behavioral cloning, and then use that model to train "human-aware" agents ("behavioral cloning play", or BCP). While such an approach can improve the generalization of agents to new human co-players, it involves the onerous and expensive step of collecting large amounts of human data first. Here, we study the problem of how to train agents that collaborate well with human partners without using human data. We argue that the crux of the problem is to produce a diverse set of training partners. Drawing inspiration from successful multi-agent approaches in competitive domains, we find that a surprisingly simple approach is highly effective. We train our agent partner as the best response to a population of self-play agents and their past checkpoints taken throughout training, a method we call Fictitious Co-Play (FCP). Our experiments focus on a two-player collaborative cooking simulator that has recently been proposed as a challenge problem for coordination with humans. We find that FCP agents score significantly higher than SP, PP, and BCP when paired with novel agent and human partners. Furthermore, humans also report a strong subjective preference to partnering with FCP agents over all baselines.

【5】 Trade-offs of Local SGD at Scale: An Empirical Study 标题:地方SGD规模取舍的实证研究 链接:https://arxiv.org/abs/2110.08133

作者:Jose Javier Gonzalez Ortiz,Jonathan Frankle,Mike Rabbat,Ari Morcos,Nicolas Ballas 机构:MIT CSAIL, Facebook AI Research 摘要:随着数据集和模型变得越来越大,分布式训练已经成为允许深度神经网络在合理的时间内进行训练的必要组成部分。然而,分布式训练可能会有大量的通信开销,这阻碍了其可扩展性。减少此开销的一种策略是在同步步骤之间对每个工作进程独立执行多个非同步SGD步骤,这是一种称为本地SGD的技术。我们对大规模图像分类任务中的局部SGD和相关方法进行了全面的实证研究。我们发现执行本地SGD是有代价的:更低的通信成本(从而更快的训练)伴随着更低的准确性。这一发现与先前工作中的小规模实验形成对比,表明本地SGD在规模上面临挑战。我们进一步表明,结合Wang等人(2020年)的慢动量框架,在不需要额外沟通的情况下持续提高准确性,暗示未来可能摆脱这种权衡的方向。 摘要:As datasets and models become increasingly large, distributed training has become a necessary component to allow deep neural networks to train in reasonable amounts of time. However, distributed training can have substantial communication overhead that hinders its scalability. One strategy for reducing this overhead is to perform multiple unsynchronized SGD steps independently on each worker between synchronization steps, a technique known as local SGD. We conduct a comprehensive empirical study of local SGD and related methods on a large-scale image classification task. We find that performing local SGD comes at a price: lower communication costs (and thereby faster training) are accompanied by lower accuracy. This finding is in contrast from the smaller-scale experiments in prior work, suggesting that local SGD encounters challenges at scale. We further show that incorporating the slow momentum framework of Wang et al. (2020) consistently improves accuracy without requiring additional communication, hinting at future directions for potentially escaping this trade-off.
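本地SGD的流程是:各工作进程独立执行H步SGD,再周期性地对参数取平均(一次通信)。下面是纯numpy的串行模拟(二次损失与梯度噪声均为假设,仅演示"本地步+平均"的机制):

import numpy as np

rng = np.random.default_rng(0)
workers, H, rounds, lr, d = 4, 8, 50, 0.05, 10
opt = rng.normal(size=d)                       # 所有 worker 共同的最优点(虚构)
x = np.zeros((workers, d))                     # 各 worker 的本地参数
for r in range(rounds):
    for w in range(workers):
        for _ in range(H):                     # H 步非同步的本地 SGD
            noise = 0.1 * rng.normal(size=d)   # 模拟小批量梯度噪声
            x[w] -= lr * ((x[w] - opt) + noise)
    x[:] = x.mean(axis=0)                      # 同步步:参数平均,仅此处需要通信
print("到最优点的距离:", np.linalg.norm(x[0] - opt))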

【6】 Hand Me Your PIN! Inferring ATM PINs of Users Typing with a Covered Hand 标题:把你的密码给我!推论用遮盖手打字的用户的ATM PIN 链接:https://arxiv.org/abs/2110.08113

作者:Matteo Cardaioli,Stefano Cecconello,Mauro Conti,Simone Milani,Stjepan Picek,Eugen Saraci 机构:University of Padua, Italy, GFT Italia, Italy, Delft University of Technology, The Netherlands 摘要:自动柜员机(ATM)是最常用的提款系统。欧洲央行报告称,2019年欧洲ATM机上的提款和装卸交易超过110亿。尽管自动柜员机经历了各种技术进步,但个人识别码(PIN)仍然是这些设备最常用的身份验证方法。不幸的是,PIN机制很容易受到通过安装在ATM机附近的隐藏摄像头进行的肩窥(shoulder-surfing)攻击。为了防范这种攻击,人们习惯用另一只手盖住打字的手。虽然这些用户可能认为这种行为足够安全,可以抵御上述攻击,但在科学文献中对这种对策没有明确的评估。本文提出了一种新的攻击方法,用于重建受害者在用另一只手遮盖打字手的情况下输入的PIN码。我们考虑的设置是:攻击者可以使用与目标ATM同一品牌/型号的PIN键盘。之后,攻击者利用该型号的键盘推断受害者在输入PIN时按下的数字。我们的攻击将其成功归功于精心选择的深度学习架构,该架构可以从打字手的位置和动作推断PIN。我们对58名用户进行了详细的实验分析。通过我们的方法,我们可以在三次尝试(ATM在锁卡之前通常允许的次数)之内猜出30%的5位PIN码。我们还对78名用户进行了调查,在相同的设置下,人的平均准确率仅为7.92%。最后,我们评估了一种遮挡对策,该对策被证明相当低效,除非整个键盘都被遮挡。 摘要:Automated Teller Machines (ATMs) represent the most used system for withdrawing cash. The European Central Bank reported more than 11 billion cash withdrawals and loading/unloading transactions on the European ATMs in 2019. Although ATMs have undergone various technological evolutions, Personal Identification Numbers (PINs) are still the most common authentication method for these devices. Unfortunately, the PIN mechanism is vulnerable to shoulder-surfing attacks performed via hidden cameras installed near the ATM to catch the PIN pad. To overcome this problem, people get used to covering the typing hand with the other hand. While such users probably believe this behavior is safe enough to protect against mentioned attacks, there is no clear assessment of this countermeasure in the scientific literature. This paper proposes a novel attack to reconstruct PINs entered by victims covering the typing hand with the other hand. We consider the setting where the attacker can access an ATM PIN pad of the same brand/model as the target one. Afterward, the attacker uses that model to infer the digits pressed by the victim while entering the PIN. Our attack owes its success to a carefully selected deep learning architecture that can infer the PIN from the typing hand position and movements. We run a detailed experimental analysis including 58 users. With our approach, we can guess 30% of the 5-digit PINs within three attempts -- the ones usually allowed by ATM before blocking the card. We also conducted a survey with 78 users that managed to reach an accuracy of only 7.92% on average for the same setting. Finally, we evaluate a shielding countermeasure that proved to be rather inefficient unless the whole keypad is shielded.

【7】 FlexConv: Continuous Kernel Convolutions with Differentiable Kernel Sizes 标题:FlexConv:核大小可微的连续核卷积 链接:https://arxiv.org/abs/2110.08059

作者:David W. Romero,Robert-Jan Bruintjes,Jakub M. Tomczak,Erik J. Bekkers,Mark Hoogendoorn,Jan C. van Gemert 机构: Vrije Universiteit Amsterdam, Delft University of Technology, University of Amsterdam, The Netherlands 备注:First two authors contributed equally to this work 摘要:在设计卷积神经网络(CNN)时,必须在训练之前选择卷积核的大小。最近的研究表明,CNN从不同层的不同内核大小中获益,但探索所有可能的组合在实践中是不可行的。一种更高效的做法是在训练期间学习内核大小。然而,现有的学习内核大小的工作带宽有限:这些方法通过膨胀来缩放内核,因此它们所能描述的细节是有限的。在这项工作中,我们提出了FlexConv,一种新的卷积运算,通过这种运算,可以以固定的参数代价学习具有可学习核大小的高带宽卷积核。FlexNets在不使用池化的情况下建模长期依赖关系,在多个连续数据集上实现最先进的性能,优于最近学习内核大小的工作,并且在图像基准数据集上与深得多的ResNets相竞争。此外,FlexNets可以在高于训练时所见分辨率的分辨率下部署。为了避免混叠,我们提出了一种新的核参数化方法,通过这种方法可以解析地控制核的频率。我们新的核参数化显示出比现有参数化更高的描述能力和更快的收敛速度,从而显著提高分类精度。 摘要:When designing Convolutional Neural Networks (CNNs), one must select the size of the convolutional kernels before training. Recent works show CNNs benefit from different kernel sizes at different layers, but exploring all possible combinations is unfeasible in practice. A more efficient approach is to learn the kernel size during training. However, existing works that learn the kernel size have a limited bandwidth. These approaches scale kernels by dilation, and thus the detail they can describe is limited. In this work, we propose FlexConv, a novel convolutional operation with which high bandwidth convolutional kernels of learnable kernel size can be learned at a fixed parameter cost. FlexNets model long-term dependencies without the use of pooling, achieve state-of-the-art performance on several sequential datasets, outperform recent works with learned kernel sizes, and are competitive with much deeper ResNets on image benchmark datasets. Additionally, FlexNets can be deployed at higher resolutions than those seen during training. To avoid aliasing, we propose a novel kernel parameterization with which the frequency of the kernels can be analytically controlled. Our novel kernel parameterization shows higher descriptive power and faster convergence speed than existing parameterizations. This leads to important improvements in classification accuracy.

【8】 Toward Annotator Group Bias in Crowdsourcing 标题:论众包中的注释员群体偏向 链接:https://arxiv.org/abs/2110.08038

作者:Haochen Liu,Joseph Thekinen,Sinem Mollaoglu,Da Tang,Ji Yang,Youlong Cheng,Hui Liu,Jiliang Tang 机构: Michigan State University, East Lansing, MI, USA, ByteDance Inc., Mountain View, CA, USA 备注:10 pages 摘要:众包已经成为一种流行的方法,用于收集带注释的数据以训练有监督的机器学习模型。但是,注释器偏差可能导致注释有缺陷。虽然有一些研究注释者个体偏见的著作,但注释者的群体效应在很大程度上被忽略了。在这项工作中,我们发现同一人口统计学组中的注释者在注释任务中往往表现出一致的组偏见,因此我们对注释者组偏见进行了初步研究。我们首先实证验证了各种真实众包数据集中注释者群体偏差的存在。然后,我们开发了一种新的概率图形框架GroupAnno,用一种新的扩展期望最大化(EM)训练算法捕获注释者组偏差。我们在合成数据集和真实数据集上进行实验。实验结果证明了我们的模型在标签聚合和竞争基线模型学习中建模注释者组偏差的有效性。 摘要:Crowdsourcing has emerged as a popular approach for collecting annotated data to train supervised machine learning models. However, annotator bias can lead to defective annotations. Though there are a few works investigating individual annotator bias, the group effects in annotators are largely overlooked. In this work, we reveal that annotators within the same demographic group tend to show consistent group bias in annotation tasks and thus we conduct an initial study on annotator group bias. We first empirically verify the existence of annotator group bias in various real-world crowdsourcing datasets. Then, we develop a novel probabilistic graphical framework GroupAnno to capture annotator group bias with a new extended Expectation Maximization (EM) training algorithm. We conduct experiments on both synthetic and real-world datasets. Experimental results demonstrate the effectiveness of our model in modeling annotator group bias in label aggregation and model learning over competitive baselines.

【9】 Low-rank Matrix Recovery With Unknown Correspondence 标题:未知对应的低秩矩阵恢复算法 链接:https://arxiv.org/abs/2110.07959

作者:Zhiwei Tang,Tsung-Hui Chang,Xiaojing Ye,Hongyuan Zha 机构:The Chinese University of Hong Kong, Shenzhen, Georgia State University 摘要:我们研究了一个具有未知对应关系的矩阵恢复问题:给定观测矩阵$M_o=[A,\tilde P B]$,其中$\tilde P$是一个未知置换矩阵,我们的目标是恢复底层矩阵$M=[A,B]$。此类问题通常出现在许多应用中,其中使用了异构数据,并且它们之间的对应关系未知,例如,由于隐私问题。我们证明了在$M$上适当的低秩条件下,通过求解核范数极小化问题来恢复$M$是可能的,并给出了恢复$M$的可证明非渐近误差界。我们提出了一种算法$\text{M}^3\text{O}$(通过最小-最大优化进行矩阵恢复),该算法将该组合问题转化为一个连续的最小-最大优化问题,并使用带最大预言机的近端梯度法进行求解。$\text{M}^3\text{O}$也可以应用于更一般的场景,在这种场景中,$M_o$中缺少条目,并且有多组数据具有不同的未知对应关系。在模拟数据、MovieLens 100K数据集和耶鲁B数据库上的实验表明,$\text{M}^3\text{O}$相比多条基线实现了最先进的性能,并且能够以高精度恢复真实的对应关系。 摘要:We study a matrix recovery problem with unknown correspondence: given the observation matrix $M_o=[A,\tilde P B]$, where $\tilde P$ is an unknown permutation matrix, we aim to recover the underlying matrix $M=[A,B]$. Such problem commonly arises in many applications where heterogeneous data are utilized and the correspondence among them are unknown, e.g., due to privacy concerns. We show that it is possible to recover $M$ via solving a nuclear norm minimization problem under a proper low-rank condition on $M$, with provable non-asymptotic error bound for the recovery of $M$. We propose an algorithm, $\text{M}^3\text{O}$ (Matrix recovery via Min-Max Optimization) which recasts this combinatorial problem as a continuous minimax optimization problem and solves it by proximal gradient with a Max-Oracle. $\text{M}^3\text{O}$ can also be applied to a more general scenario where we have missing entries in $M_o$ and multiple groups of data with distinct unknown correspondence. Experiments on simulated data, the MovieLens 100K dataset and Yale B database show that $\text{M}^3\text{O}$ achieves state-of-the-art performance over several baselines and can recover the ground-truth correspondence with high accuracy.
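M³O外层做近端梯度,内层的"最大预言机"需在置换矩阵集合上最大化一个线性目标;该内层问题正是线性指派问题,可用匈牙利算法精确求解。下面演示这一内层步骤(得分矩阵为虚构;scipy的linear_sum_assignment求最小化,对得分取负即得最大化):

import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
C = rng.normal(size=(6, 6))                    # 假设这是当前迭代给出的得分矩阵
row, col = linear_sum_assignment(-C)           # max_P <P, C> 等价于对 -C 求最小指派
P = np.zeros_like(C)
P[row, col] = 1.0                              # 由指派结果构造最优置换矩阵
print("目标值 <P, C> =", (P * C).sum())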

【10】 k-experts -- Online Policies and Fundamental Limits 标题:k-experts:在线策略和基本限制 链接:https://arxiv.org/abs/2110.07881

作者:Samrat Mukhopadhyay,Sourav Sahoo,Abhishek Sinha 机构:Department of Electronics Engineering, IIT (ISM) Dhanbad, Department of Electrical Engineering, IIT Madras 摘要:本文介绍并研究了k-experts问题,它是经典的专家建议预测(即Experts)问题的推广。与学习者每轮只选择一位专家的Experts问题不同,在该问题中,学习者每轮从$N$位专家中选择一个包含$k$位专家的子集,学习者在该轮获得的奖励取决于所选专家的奖励。k-experts问题出现在许多实际场景中,包括在线广告投放、个性化新闻推荐和分页。我们的主要目标是设计一个后悔值较小的在线学习策略。为此,我们提出了SAGE(Sampled Hedge),一个利用统计抽样技术设计高效在线学习策略的框架。我们表明,对于许多相关问题,SAGE改进了后悔值和计算复杂性的最新界限。此外,超越后悔值的概念,我们刻画了在线学习策略对一类稳定损失函数可实现的错误界。最后,我们为k-experts问题的一个变体建立了一个紧的后悔值下界,并在标准数据集上进行了实验。 摘要:This paper introduces and studies the $k\texttt{-experts}$ problem -- a generalization of the classic Prediction with Expert's Advice (i.e., the $\texttt{Experts}$) problem. Unlike the $\texttt{Experts}$ problem, where the learner chooses exactly one expert, in this problem, the learner selects a subset of $k$ experts from a pool of $N$ experts at each round. The reward obtained by the learner at any round depends on the rewards of the selected experts. The $k\texttt{-experts}$ problem arises in many practical settings, including online ad placements, personalized news recommendations, and paging. Our primary goal is to design an online learning policy having a small regret. In this pursuit, we propose $\texttt{SAGE}$ ($\textbf{Sa}$mpled Hed$\textbf{ge}$) - a framework for designing efficient online learning policies by leveraging statistical sampling techniques. We show that, for many related problems, $\texttt{SAGE}$ improves upon the state-of-the-art bounds for regret and computational complexity. Furthermore, going beyond the notion of regret, we characterize the mistake bounds achievable by online learning policies for a class of stable loss functions. We conclude the paper by establishing a tight regret lower bound for a variant of the $k\texttt{-experts}$ problem and carrying out experiments with standard datasets.
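SAGE的思想是把Hedge的指数权重与统计抽样结合:每轮按权重抽取k位专家,并用观察到的收益做乘性权重更新。下面是一个极简示意(按权重无放回抽样、全信息更新,环境与收益函数均为假设,并非论文算法的忠实复现):

import numpy as np

rng = np.random.default_rng(0)
N, k, T, eta = 10, 3, 5000, 0.05
w = np.ones(N)
means = np.linspace(0.2, 0.8, N)               # 各专家的期望收益(虚构环境)
total = 0.0
for t in range(T):
    p = w / w.sum()
    chosen = rng.choice(N, size=k, replace=False, p=p)   # 按权重抽取 k 位专家
    r = (rng.random(N) < means).astype(float)  # 本轮所有专家的 0/1 收益
    total += r[chosen].max()                   # 收益取决于所选子集(此处取最大值)
    w *= np.exp(eta * r)                       # 全信息乘性权重更新
print("平均每轮收益:", total / T, " 最佳单专家均值:", means.max())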

【11】 Cross-Lingual Fine-Grained Entity Typing 标题:跨语言细粒度实体分类 链接:https://arxiv.org/abs/2110.07837

作者:Nila Selvaraj,Yasumasa Onoe,Greg Durrett 机构:Department of Computer Science, The University of Texas at Austin 摘要:跨语言预先训练模型的增长使NLP工具能够快速推广到新语言。虽然这些模型已应用于涉及实体的任务,但它们跨语言明确预测这些实体类型特征的能力尚未建立。在本文中,我们提出了一个统一的跨语言细粒度实体类型模型,该模型能够处理100多种语言,并分析了该模型推广到训练过程中看不见的语言和实体的能力。我们使用从维基百科多语言超链接收集的跨语言训练数据(训练语言)来训练这个模型。在推理过程中,我们的模型采用特定语言(测试语言,可能不在训练语言中)中的实体名称和上下文,并预测该实体的细粒度类型。推广到新语言和看不见的实体是此实体类型设置的基本挑战,因此我们将重点评估这些设置,并与简单但功能强大的字符串匹配基线进行比较。实验结果表明,在日语、泰米尔语、阿拉伯语、塞尔维亚语和波斯语等看不见的语言上,我们的方法优于基线。此外,我们的方法大大提高了基线上看不见的实体(甚至是看不见的语言)的性能,并且人工评估显示了在这些设置中预测相关类型的强大能力。 摘要:The growth of cross-lingual pre-trained models has enabled NLP tools to rapidly generalize to new languages. While these models have been applied to tasks involving entities, their ability to explicitly predict typological features of these entities across languages has not been established. In this paper, we present a unified cross-lingual fine-grained entity typing model capable of handling over 100 languages and analyze this model's ability to generalize to languages and entities unseen during training. We train this model on cross-lingual training data collected from Wikipedia hyperlinks in multiple languages (training languages). During inference, our model takes an entity mention and context in a particular language (test language, possibly not in the training languages) and predicts fine-grained types for that entity. Generalizing to new languages and unseen entities are the fundamental challenges of this entity typing setup, so we focus our evaluation on these settings and compare against simple yet powerful string match baselines. Experimental results show that our approach outperforms the baselines on unseen languages such as Japanese, Tamil, Arabic, Serbian, and Persian. In addition, our approach substantially improves performance on unseen entities (even in unseen languages) over the baselines, and human evaluation shows a strong ability to predict relevant types in these settings.

【12】 Towards Statistical and Computational Complexities of Polyak Step Size Gradient Descent 标题:Polyak步长梯度下降的统计和计算复杂度研究 链接:https://arxiv.org/abs/2110.07810

作者:Tongzheng Ren,Fuheng Cui,Alexia Atsidakou,Sujay Sanghavi,Nhat Ho 机构:Department of Computer Science, University of Texas at Austin⋄, Department of Statistics and Data Sciences, University of Texas at Austin♭, Department of Electrical and Computer Engineering, University of Texas at Austin† 备注:First three authors contributed equally. 40 pages, 4 figures 摘要:我们研究了Polyak步长梯度下降算法的统计与计算复杂性,所依赖的条件是:总体损失函数(即样本量趋于无穷大时经验损失函数的极限)满足广义光滑性与Lojasiewicz条件,并且经验损失函数梯度与总体损失函数梯度之间满足稳定性,即二者之间的浓度界随样本呈多项式增长。我们证明,Polyak步长梯度下降迭代在关于样本量的对数次迭代后,即可到达真参数周围的最终统计收敛半径。当总体损失函数不是局部强凸时,固定步长梯度下降需要关于样本量的多项式次迭代才能达到同样的最终统计半径,相比之下我们的方法计算代价更低。最后,我们通过三个统计例子来说明我们的一般理论:广义线性模型、混合模型和混合线性回归模型。 摘要:We study the statistical and computational complexities of the Polyak step size gradient descent algorithm under generalized smoothness and Lojasiewicz conditions of the population loss function, namely, the limit of the empirical loss function when the sample size goes to infinity, and the stability between the gradients of the empirical and population loss functions, namely, the polynomial growth on the concentration bound between the gradients of sample and population loss functions. We demonstrate that the Polyak step size gradient descent iterates reach a final statistical radius of convergence around the true parameter after logarithmic number of iterations in terms of the sample size. It is computationally cheaper than the polynomial number of iterations on the sample size of the fixed-step size gradient descent algorithm to reach the same final statistical radius when the population loss function is not locally strongly convex. Finally, we illustrate our general theory under three statistical examples: generalized linear model, mixture model, and mixed linear regression model.
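Polyak步长取 eta_t = (f(x_t) - f*) / ||grad f(x_t)||^2,无需手工调步长,只需已知(或估计)最优值f*。下面在一个病态凸二次函数上演示(玩具示例,此处 f*=0):

import numpy as np

A = np.diag([1.0, 10.0, 100.0])                # 病态二次目标 f(x) = 0.5*x^T A x
x = np.array([1.0, 1.0, 1.0])
f = lambda v: 0.5 * v @ A @ v
for t in range(100):
    g = A @ x                                  # 梯度
    eta = f(x) / (g @ g)                       # Polyak 步长:(f(x)-f*)/||g||^2
    x = x - eta * g
print("f(x) =", f(x))                          # 快速收敛到最优值 0 附近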

【13】 PTQ-SL: Exploring the Sub-layerwise Post-training Quantization 标题:PTQ-SL:探索子层智慧型训练后量化 链接:https://arxiv.org/abs/2110.07809

作者:Zhihang Yuan,Yiqi Chen,Chenhao Xue,Chenguang Zhang,Qiankun Wang,Qiankun Wang,Guangyu Sun 机构:Center for Energy-Efficient Computing and Applications, Peking University, Beijing, China, Houmo AI, Beijing, China 摘要:网络量化是一种有效的卷积神经网络压缩技术。量化粒度决定了如何在权重中共享缩放因子,从而影响网络量化的性能。对于卷积层的量化,大多数现有方法以逐层或逐通道的方式共享缩放因子。逐通道量化和逐层量化在各种应用中得到了广泛使用。然而,很少研究其他量化粒度。在本文中,我们将探索跨多个输入和输出通道共享缩放因子的子层级粒度,并提出了一种高效的子层级粒度训练后量化方法(PTQ-SL)。然后,我们对各种粒度进行了系统的实验,发现量化神经网络的预测精度与粒度有很强的相关性。此外,我们发现调整通道的位置可以提高子层级量化的性能。因此,我们提出了一种用于子层级量化的通道重排序方法。实验表明,采用适当通道重排序的子层级量化优于逐通道量化。 摘要:Network quantization is a powerful technique to compress convolutional neural networks. The quantization granularity determines how to share the scaling factors in weights, which affects the performance of network quantization. Most existing approaches share the scaling factors layerwisely or channelwisely for quantization of convolutional layers. Channelwise quantization and layerwise quantization have been widely used in various applications. However, other quantization granularities are rarely explored. In this paper, we will explore the sub-layerwise granularity that shares the scaling factor across multiple input and output channels. We propose an efficient post-training quantization method in sub-layerwise granularity (PTQ-SL). Then we systematically experiment on various granularities and observe that the prediction accuracy of the quantized neural network has a strong correlation with the granularity. Moreover, we find that adjusting the position of the channels can improve the performance of sub-layerwise quantization. Therefore, we propose a method to reorder the channels for sub-layerwise quantization. The experiments demonstrate that the sub-layerwise quantization with appropriate channel reordering can outperform the channelwise quantization.
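子层级量化介于逐层与逐通道之间:让若干通道共享同一个缩放因子。下面用numpy演示按输出通道分组计算缩放因子并做对称int8量化(分组大小与权重均为假设;未包含论文中的通道重排序与完整PTQ-SL流程):

import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))                  # 卷积核展平为 (out_ch, in_ch*k*k)
group = 16                                     # 每 16 个输出通道共享一个缩放因子

Wq = np.empty_like(W, dtype=np.int8)
scales = []
for g0 in range(0, W.shape[0], group):
    blk = W[g0:g0 + group]
    s = np.abs(blk).max() / 127.0              # 组内对称量化的缩放因子
    scales.append(s)
    Wq[g0:g0 + group] = np.clip(np.round(blk / s), -127, 127).astype(np.int8)
W_hat = Wq.astype(np.float32) * np.repeat(scales, group)[:, None]   # 反量化
print("量化均方误差:", np.mean((W - W_hat) ** 2))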

【14】 Attention-Free Keyword Spotting 标题:无需注意的关键词识别 链接:https://arxiv.org/abs/2110.07749

作者:Mashrur M. Morshed,Ahmad Omar Ahsan 机构:Islamic University of Technology (IUT), Department of Computer Science & Engineering (CSE) 备注:Submitted to ICASSP-2022 (5 pages) 摘要:到目前为止,基于注意力的模型已经在关键词识别领域取得了巨大的成功。然而,鉴于深度学习的最新进展,出现了一个问题:对于识别语音关键词,自注意力是否真的不可替代。因此,我们探讨了在关键词识别任务中使用门控MLP(此前在视觉任务中已被证明可以替代Transformer)的问题。我们在Google Speech Commands V2-35数据集上验证了我们的方法,并表明在不使用任何明显的自注意力的情况下,可以获得与最新技术相当的性能。 摘要:Till now, attention-based models have been used with great success in the keyword spotting problem domain. However, in light of recent advances in deep learning, the question arises whether self-attention is truly irreplaceable for recognizing speech keywords. We thus explore the usage of gated MLPs -- previously shown to be alternatives to transformers in vision tasks -- for the keyword spotting task. We verify our approach on the Google Speech Commands V2-35 dataset and show that it is possible to obtain performance comparable to the state of the art without any apparent usage of self-attention.
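门控MLP的关键是空间门控单元:把通道一分为二,其中一半经跨时间维的线性层变换后与另一半逐元素相乘,从而在不用自注意力的情况下混合序列信息。下面是一个极简PyTorch示意(维度与序列长度为假设值,并非原文的完整模型):

import torch
import torch.nn as nn

class SpatialGatingUnit(nn.Module):
    def __init__(self, dim, seq_len):
        super().__init__()
        self.norm = nn.LayerNorm(dim // 2)
        self.proj = nn.Linear(seq_len, seq_len)     # 沿序列维做线性混合
    def forward(self, x):                           # x: (B, T, dim)
        u, v = x.chunk(2, dim=-1)                   # 通道一分为二
        v = self.norm(v)
        v = self.proj(v.transpose(1, 2)).transpose(1, 2)
        return u * v                                # 门控:逐元素相乘

class gMLPBlock(nn.Module):
    def __init__(self, d_model=64, d_ffn=256, seq_len=98):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.fc1 = nn.Linear(d_model, d_ffn)
        self.sgu = SpatialGatingUnit(d_ffn, seq_len)
        self.fc2 = nn.Linear(d_ffn // 2, d_model)
    def forward(self, x):
        y = torch.nn.functional.gelu(self.fc1(self.norm(x)))
        return x + self.fc2(self.sgu(y))            # 残差连接

x = torch.randn(2, 98, 64)                          # (batch, 帧数, 特征维),均为假设
print(gMLPBlock()(x).shape)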

【15】 Safety-aware Policy Optimisation for Autonomous Racing 标题:自主赛车的安全感知策略优化 链接:https://arxiv.org/abs/2110.07699

作者:Bingqing Chen,Jonathan Francis,James Herman,Jean Oh,Eric Nyberg,Sylvia L. Herbert 机构:School of Computer Science, Carnegie Mellon University, Pittsburgh, PA , Human-Machine Collaboration, Bosch Research Pittsburgh, Pittsburgh, PA , Safe Autonomous Systems, University of California, San Diego, CA 备注:22 pages, 14 figures, 3 tables 摘要:为了适用于安全关键应用,如自动驾驶和辅助机器人,自动代理应在与其环境的整个交互过程中遵守安全约束。汉密尔顿-雅可比(HJ)可达性等方法不是通过收集样本(包括不安全样本)来学习安全性,而是使用系统动力学模型计算具有理论保证的安全集。然而,HJ可达性不能扩展到高维系统,其保证取决于模型的质量。在这项工作中,我们将HJ可达性理论注入到约束马尔可夫决策过程(CMDP)框架中,作为通过状态-动作对的无模型更新进行安全分析的控制理论方法。此外,我们证明HJ安全值可以直接在视觉输入上学习,这是迄今为止用该方法研究过的最高维问题。我们在几个基准任务上评估了我们的方法,包括Safety Gym和Learn-to-Race(L2R),后者是一个最近发布的高保真自主赛车环境。与其他受约束的RL基线相比,我们的方法明显减少了违反约束的情况,并在L2R基准任务上实现了新的最新成果。 摘要:To be viable for safety-critical applications, such as autonomous driving and assistive robotics, autonomous agents should adhere to safety constraints throughout the interactions with their environments. Instead of learning about safety by collecting samples, including unsafe ones, methods such as Hamilton-Jacobi (HJ) reachability compute safe sets with theoretical guarantees using models of the system dynamics. However, HJ reachability is not scalable to high-dimensional systems, and the guarantees hinge on the quality of the model. In this work, we inject HJ reachability theory into the constrained Markov decision process (CMDP) framework, as a control-theoretical approach for safety analysis via model-free updates on state-action pairs. Furthermore, we demonstrate that the HJ safety value can be learned directly on vision context, the highest-dimensional problem studied via the method to-date. We evaluate our method on several benchmark tasks, including Safety Gym and Learn-to-Race (L2R), a recently-released high-fidelity autonomous racing environment. Our approach has significantly fewer constraint violations in comparison to other constrained RL baselines, and achieve the new state-of-the-art results on the L2R benchmark task.

【16】 More Efficient Sampling for Tensor Decomposition 标题:更有效的张量分解采样 链接:https://arxiv.org/abs/2110.07631

作者:Osman Asif Malik 备注:32 pages, 4 figures 摘要:最近的论文发展了用于CP和张量环分解的交替最小二乘(ALS)方法,每次迭代的成本在低秩分解的输入张量条目的数量上是次线性的。然而,这些方法的每次迭代成本仍然与张量模式的数量呈指数关系。在本文中,我们提出了基于抽样的ALS方法,用于CP和张量环分解,其成本不具有这种指数依赖性,从而显著改进了以前的最新技术。我们提供了详细的理论分析,并将这些方法应用于特征提取实验中。 摘要:Recent papers have developed alternating least squares (ALS) methods for CP and tensor ring decomposition with a per-iteration cost which is sublinear in the number of input tensor entries for low-rank decomposition. However, the per-iteration cost of these methods still has an exponential dependence on the number of tensor modes. In this paper, we propose sampling-based ALS methods for the CP and tensor ring decompositions whose cost does not have this exponential dependence, thereby significantly improving on the previous state-of-the-art. We provide a detailed theoretical analysis and also apply the methods in a feature extraction experiment.

【17】 Neural Dubber: Dubbing for Silent Videos According to Scripts 标题:Neural Dubber:根据脚本为无声视频配音 链接:https://arxiv.org/abs/2110.08243

作者:Chenxu Hu,Qiao Tian,Tingle Li,Yuping Wang,Yuxuan Wang,Hang Zhao 机构:Tsinghua University, ByteDance, Shanghai Qi Zhi Institute 备注:Accepted by NeurIPS 2021 摘要:配音是重新录制演员对话的后期制作过程,广泛用于电影制作和视频制作。它通常由专业的配音演员手动执行,他们以适当的韵律朗读台词,并与预先录制的视频同步。在这项工作中,我们提出了神经配音器,第一个神经网络模型来解决一个新的自动视频配音(AVD)任务:从文本中合成与给定无声视频同步的人类语音。神经配音器是一种多模态文本到语音(TTS)模型,它利用视频中的嘴唇运动来控制生成语音的韵律。此外,还开发了基于图像的说话人嵌入(ISE)模块,用于多说话人设置,使神经配音器能够根据说话人的脸型生成具有合理音色的语音。在化学讲座单扬声器数据集和LRS2多说话者数据集上的实验表明,神经配音器可以在语音质量方面与最先进的TTS模型相媲美地生成语音音频。最重要的是,定性和定量评估表明,神经配音器可以通过视频控制合成语音的韵律,并生成与视频时间同步的高保真语音。 摘要:Dubbing is a post-production process of re-recording actors' dialogues, which is extensively used in filmmaking and video production. It is usually performed manually by professional voice actors who read lines with proper prosody, and in synchronization with the pre-recorded videos. In this work, we propose Neural Dubber, the first neural network model to solve a novel automatic video dubbing (AVD) task: synthesizing human speech synchronized with the given silent video from the text. Neural Dubber is a multi-modal text-to-speech (TTS) model that utilizes the lip movement in the video to control the prosody of the generated speech. Furthermore, an image-based speaker embedding (ISE) module is developed for the multi-speaker setting, which enables Neural Dubber to generate speech with a reasonable timbre according to the speaker's face. Experiments on the chemistry lecture single-speaker dataset and LRS2 multi-speaker dataset show that Neural Dubber can generate speech audios on par with state-of-the-art TTS models in terms of speech quality. Most importantly, both qualitative and quantitative evaluations show that Neural Dubber can control the prosody of synthesized speech by the video, and generate high-fidelity speech temporally synchronized with the video.

【18】 Choice functions based multi-objective Bayesian optimisation 标题:基于选择函数的多目标贝叶斯优化 链接:https://arxiv.org/abs/2110.08217

作者:Alessio Benavoli,Dario Azzimonti,Dario Piga 机构:School of Computer Science and Statistics, Trinity College Dublin, Ireland, Dalle Molle Institute for Artificial Intelligence Research (IDSIA), USISUPSI, Manno, Switzerland 摘要:在这项工作中,我们引入了一个新的多目标贝叶斯优化框架,其中多目标函数只能通过选择判断来访问,例如"我在五个选项A、B、C、D、E中选择A、B、C"。选项D被拒绝这一事实意味着,在所选的A、B、C中至少有一个是我严格偏好于D的(但我不必指明是哪一个)。我们假设对于某个维度$n_e$存在一个潜在的向量函数f,它将选项嵌入到维度n的实向量空间中,因此选择集可以通过非支配选项的Pareto集来表示。通过对f设置高斯过程先验,并为选择数据推导一种新的似然模型,我们提出了一个学习选择函数的贝叶斯框架。然后,我们将该代理模型应用于求解一个新的、基于选择数据的多目标贝叶斯优化问题。 摘要:In this work we introduce a new framework for multi-objective Bayesian optimisation where the multi-objective functions can only be accessed via choice judgements, such as ``I pick options A,B,C among this set of five options A,B,C,D,E''. The fact that the option D is rejected means that there is at least one option among the selected ones A,B,C that I strictly prefer over D (but I do not have to specify which one). We assume that there is a latent vector function f for some dimension $n_e$ which embeds the options into the real vector space of dimension n, so that the choice set can be represented through a Pareto set of non-dominated options. By placing a Gaussian process prior on f and deriving a novel likelihood model for choice data, we propose a Bayesian framework for choice functions learning. We then apply this surrogate model to solve a novel multi-objective Bayesian optimisation from choice data problem.

【19】 Areas on the space of smooth probability density functions on S^2 标题:S^2上光滑概率密度函数空间上的面积 链接:https://arxiv.org/abs/2110.07773

作者:J. C. Ruíz-Pantaleón,P. Suárez-Serrato 备注:9 pages, 2 algorithms, 3 tables 摘要:我们提出了计算平面、2-环面和2-球面正密度测度空间上泊松括号的符号和数值方法。我们应用我们的方法来计算2-球情况下有限区域的辛面积,包括正密度高斯测度的一个显式例子。 摘要:We present symbolic and numerical methods for computing Poisson brackets on the spaces of measures with positive densities of the plane, the 2-torus, and the 2-sphere. We apply our methods to compute symplectic areas of finite regions for the case of the 2-sphere, including an explicit example for Gaussian measures with positive densities.
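平面上标准辛结构下的泊松括号为 {F,G} = F_x G_y - F_y G_x。下面用sympy做一个符号计算示意(仅平面情形;2-环面与2-球面需换用相应坐标与面积形式,正密度测度空间上的括号亦需按文中的构造处理):

import sympy as sp

x, y = sp.symbols('x y')

def poisson_bracket(F, G):
    # 平面标准辛形式 dx^dy 下的泊松括号 {F,G} = F_x*G_y - F_y*G_x
    return sp.simplify(sp.diff(F, x) * sp.diff(G, y) - sp.diff(F, y) * sp.diff(G, x))

F = x**2 + y**2
G = sp.exp(-(x**2 + y**2) / 2)      # 一个高斯型正密度(示例)
print(poisson_bracket(F, G))         # 两个旋转不变函数的括号为 0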

机器翻译,仅供参考

本文分享自微信公众号「arXiv每日学术速递」,原始发表于2021-10-18。
