Visit www.arxivdaily.com for daily paper digests with abstracts, covering CS | Physics | Math | Economics | Statistics | Finance | Biology | Electrical Engineering, with search, bookmarking, and posting features.
cs.LG: 99 papers today in total.
Graph-related (graph learning | graph neural networks | graph optimization, etc.) (9 papers)
【1】 Self-supervised Incremental Deep Graph Learning for Ethereum Phishing Scam Detection
Authors: Shucheng Li, Fengyuan Xu, Runchuan Wang, Sheng Zhong
Affiliation: National Key Lab for Novel Software Technology, Nanjing University, China
Link: https://arxiv.org/abs/2106.10176
Abstract: In recent years, phishing scams have become the crime type with the largest money involved on Ethereum, the second-largest blockchain platform. Meanwhile, graph neural networks (GNNs) have shown promising performance in various node classification tasks. However, for Ethereum transaction data, which can be naturally abstracted as a real-world complex graph, the scarcity of labels and the huge volume of transaction data make it difficult to take advantage of GNN methods. In this paper, to address these two challenges, we propose a Self-supervised Incremental deep Graph learning model (SIEGE) for the phishing scam detection problem on Ethereum. In our model, two pretext tasks designed from spatial and temporal perspectives help us effectively learn useful node embeddings from the huge amount of unlabelled transaction data. The incremental paradigm allows us to efficiently handle large-scale transaction data and helps the model maintain good performance when the data distribution drastically changes. We collect about half a year of transaction records from Ethereum, and our extensive experiments show that our model consistently outperforms strong baselines in both transductive and inductive settings.
【2】 FinGAT: Financial Graph Attention Networks for Recommending Top-K Profitable Stocks
Authors: Yi-Ling Hsu, Yu-Che Tsai, Cheng-Te Li
Affiliation: National Taiwan University; Institute of Data Science, National Cheng Kung University
Note: Accepted to IEEE TKDE 2021. The first two authors contributed equally to this work. Code is available at this https URL
Link: https://arxiv.org/abs/2106.10159
Abstract: Financial technology (FinTech) has drawn much attention among investors and companies. While conventional stock analysis in FinTech targets predicting stock prices, less effort is made for profitable stock recommendation. Besides, in existing approaches to modeling time series of stock prices, the relationships among stocks and sectors (i.e., categories of stocks) are either neglected or pre-defined. Ignoring stock relationships will miss the information shared between stocks, while using pre-defined relationships cannot depict the latent interactions or influence of stock prices between stocks. In this work, we aim at recommending the top-K profitable stocks in terms of return ratio using time series of stock prices and sector information. We propose a novel deep learning-based model, Financial Graph Attention Networks (FinGAT), to tackle the task under the setting that no pre-defined relationships between stocks are given. The idea of FinGAT is three-fold. First, we devise a hierarchical learning component to learn short-term and long-term sequential patterns from stock time series. Second, a fully-connected graph between stocks and a fully-connected graph between sectors are constructed, along with graph attention networks, to learn the latent interactions among stocks and sectors. Third, a multi-task objective is devised to jointly recommend the profitable stocks and predict the stock movement. Experiments conducted on Taiwan Stock, S&P 500, and NASDAQ datasets exhibit the remarkable recommendation performance of FinGAT compared to state-of-the-art methods.
【3】 Graph Context Encoder: Graph Feature Inpainting for Graph Generation and Self-supervised Pretraining
Authors: Oriel Frigo, Rémy Brossard, David Dehaene
Affiliation: AnotherBrain, Paris, France
Note: 13 pages, 4 figures
Link: https://arxiv.org/abs/2106.10124
Abstract: We propose the Graph Context Encoder (GCE), a simple but efficient approach for graph representation learning based on graph feature masking and reconstruction. GCE models are trained to efficiently reconstruct input graphs, similarly to a graph autoencoder where node and edge labels are masked. In particular, our model is also allowed to change graph structures by masking and reconstructing graphs augmented by random pseudo-edges. We show that GCE can be used for novel graph generation, with applications to molecule generation. Used as a pretraining method, we also show that GCE improves baseline performance in supervised classification tasks tested on multiple standard benchmark graph datasets.
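The masking-and-augmentation step that GCE trains against can be sketched concretely. A minimal NumPy illustration, assuming integer node labels, a dense 0/1 adjacency matrix, and illustrative corruption rates (none of these values or names come from the paper):

```python
import numpy as np

def corrupt_graph(node_labels, adj, mask_rate=0.15, pseudo_edge_rate=0.05, rng=None):
    """Mask node labels and inject random pseudo-edges; a GCE-style model is
    then trained to reconstruct the original labels and adjacency."""
    rng = rng if rng is not None else np.random.default_rng()
    n = len(node_labels)
    corrupted = node_labels.copy()
    mask = rng.random(n) < mask_rate
    corrupted[mask] = -1                     # -1 plays the role of a [MASK] id
    aug_adj = adj.copy()
    for _ in range(int(pseudo_edge_rate * adj.sum() / 2)):
        i, j = rng.integers(0, n, size=2)    # pseudo-edge the decoder must reject
        if i != j:
            aug_adj[i, j] = aug_adj[j, i] = 1
    return corrupted, aug_adj, mask
```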
【4】 BinarizedAttack: Structural Poisoning Attacks to Graph-based Anomaly Detection
Authors: Yulin Zhu, Yuni Lai, Kaifa Zhao, Xiapu Luo, Mingquan Yuan, Jian Ren, Kai Zhou
Affiliation: Dept. of Computing, The Hong Kong Polytechnic University, HKSAR
Link: https://arxiv.org/abs/2106.09989
Abstract: Graph-based Anomaly Detection (GAD) is becoming prevalent due to the powerful representation abilities of graphs as well as recent advances in graph mining techniques. These GAD tools, however, expose a new attacking surface, ironically due to their unique advantage of being able to exploit the relations among data. That is, attackers now can manipulate those relations (i.e., the structure of the graph) to allow some target nodes to evade detection. In this paper, we exploit this vulnerability by designing a new type of targeted structural poisoning attack against a representative regression-based GAD system termed OddBall. Specifically, we formulate the attack against OddBall as a bi-level optimization problem, where the key technical challenge is to efficiently solve the problem in a discrete domain. We propose a novel attack method termed BinarizedAttack based on gradient descent. Compared to prior art, BinarizedAttack can better use the gradient information, making it particularly suitable for solving combinatorial optimization problems. Furthermore, we investigate the attack transferability of BinarizedAttack by employing it to attack other representation-learning-based GAD systems. Our comprehensive experiments demonstrate that BinarizedAttack is very effective in enabling target nodes to evade graph-based anomaly detection tools with a limited attacker budget; in the black-box transfer attack setting, BinarizedAttack is also shown to be effective and, in particular, can significantly change the node embeddings learned by the GAD systems. Our research thus opens the door to studying a new type of attack against security analytic tools that rely on graph data.
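The relax-then-binarize idea behind such gradient-based structural attacks can be sketched generically. This is not the authors' implementation: `attack_loss` stands in for the paper's bi-level objective against OddBall, `adj` is assumed to be a float 0/1 tensor, and the flip parameterization and budget are illustrative:

```python
import torch

def structural_poisoning(adj, attack_loss, budget=10, steps=100, lr=0.1):
    """Relax discrete edge flips to continuous scores, descend the attacker's
    loss, then binarize by keeping only the `budget` highest-scoring flips."""
    delta = torch.zeros_like(adj, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        flip_prob = torch.sigmoid(delta)
        perturbed = adj + (1 - 2 * adj) * flip_prob  # a_ij -> 1 - a_ij as prob -> 1
        loss = attack_loss(perturbed)
        opt.zero_grad(); loss.backward(); opt.step()
    scores = torch.sigmoid(delta.detach()).flatten()
    flips = torch.topk(scores, budget).indices
    poisoned = adj.flatten().clone()
    poisoned[flips] = 1 - poisoned[flips]
    return poisoned.reshape(adj.shape)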
【5】 Message Passing in Graph Convolution Networks via Adaptive Filter Banks
Authors: Xing Gao, Wenrui Dai, Chenglin Li, Junni Zou, Hongkai Xiong, Pascal Frossard
Affiliation: Department of Electronic Engineering and Department of Computer Science, Shanghai Jiao Tong University; Signal Processing Laboratory (LTS4), EPFL
Link: https://arxiv.org/abs/2106.09910
Abstract: Graph convolution networks, like message passing graph convolution networks (MPGCNs), have been a powerful tool in representation learning of networked data. However, when data is heterogeneous, most architectures are limited as they employ a single strategy to handle multi-channel graph signals and typically focus on low-frequency information. In this paper, we present a novel graph convolution operator, termed BankGCN, which keeps the benefits of message passing models but extends their capabilities beyond 'low-pass' features. It decomposes multi-channel signals on graphs into subspaces and handles the particular information in each subspace with an adapted filter. The filters of all subspaces have different frequency responses and together form a filter bank. Furthermore, each filter in the spectral domain corresponds to a message passing scheme, and diverse schemes are implemented via the filter bank. Importantly, the filter bank and the signal decomposition are jointly learned to adapt to the spectral characteristics of the data and to target applications, and this is implemented almost without extra parameters in comparison with most existing MPGCNs. Experimental results show that the proposed convolution operator achieves excellent performance in graph classification on a collection of benchmark graph datasets.
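A schematic of the filter-bank layer, reconstructed from the abstract alone: channels are projected into subspaces and each subspace is filtered by its own polynomial of the Laplacian. In the actual model the projections and coefficients (here passed in as `W_sub` and `thetas`) are learned jointly:

```python
import numpy as np

def filter_bank_conv(X, L, thetas, W_sub):
    """X: (n, c) node signals; L: (n, n) normalized Laplacian;
    thetas: per-subspace polynomial coefficients; W_sub: (c, d) projections."""
    outputs = []
    for theta, W in zip(thetas, W_sub):
        Xs = X @ W                              # signal in this subspace
        out, Lk = np.zeros_like(Xs), np.eye(L.shape[0])
        for t in theta:                         # h(L) = sum_k theta_k * L^k
            out += t * (Lk @ Xs)
            Lk = Lk @ L
        outputs.append(out)                     # each filter = one message scheme
    return np.concatenate(outputs, axis=1)
```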
【6】 Anomaly Detection in Dynamic Graphs via Transformer
Authors: Yixin Liu, Shirui Pan, Yu Guang Wang, Fei Xiong, Liang Wang, Vincent CS Lee
Affiliation: Wang is with the Max Planck Institute for Mathematics in the Sciences; Xiong is with the Key Laboratory of Communication and Information Systems
Note: 12 pages, 5 figures
Link: https://arxiv.org/abs/2106.09876
Abstract: Detecting anomalies in dynamic graphs has drawn increasing attention due to their wide applications in social networks, e-commerce, and cybersecurity. Recent deep learning-based approaches have shown promising results over shallow methods. However, they fail to address two core challenges of anomaly detection in dynamic graphs: the lack of informative encoding for unattributed nodes and the difficulty of learning discriminative knowledge from coupled spatial-temporal dynamic graphs. To overcome these challenges, in this paper we present a novel Transformer-based Anomaly Detection framework for DYnamic graphs (TADDY). Our framework constructs a comprehensive node encoding strategy to better represent each node's structural and temporal roles in an evolving graph stream. Meanwhile, TADDY captures informative representations from dynamic graphs with coupled spatial-temporal patterns via a dynamic graph transformer model. Extensive experimental results demonstrate that our proposed TADDY framework outperforms the state-of-the-art methods by a large margin on four real-world datasets.
【7】 Towards Clustering-friendly Representations: Subspace Clustering via Graph Filtering
Authors: Zhengrui Ma, Zhao Kang, Guangchun Luo, Ling Tian
Affiliation: School of Computer Science and Engineering and School of Information and Software Engineering, University of Electronic Science and Technology of China
Note: Published in ACM Multimedia 2020
Link: https://arxiv.org/abs/2106.09874
Abstract: Finding a suitable data representation for a specific task has been shown to be crucial in many applications. The success of subspace clustering depends on the assumption that the data can be separated into different subspaces. However, this simple assumption does not always hold, since the raw data might not be separable into subspaces. To recover a "clustering-friendly" representation and facilitate the subsequent clustering, we propose a graph filtering approach by which a smooth representation is achieved. Specifically, it injects graph similarity into data features by applying a low-pass filter to extract useful data representations for clustering. Extensive experiments on image and document clustering datasets demonstrate that our method improves upon state-of-the-art subspace clustering techniques. In particular, its comparable performance with deep learning methods emphasizes the effectiveness of the simple graph filtering scheme for many real-world applications. An ablation study shows that graph filtering can remove noise, preserve structure in the image, and increase the separability of classes.
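The low-pass filtering step can be made concrete. A minimal NumPy sketch using the (I - L/2)^k filter that is common in this line of work; the specific filter form and order are assumptions, not quoted from the paper:

```python
import numpy as np

def smooth_representation(X, A, k=2):
    """X: (n, d) raw features; A: (n, n) nonnegative similarity matrix.
    Returns smoothed, 'clustering-friendly' features for subspace clustering."""
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt  # normalized Laplacian
    H = np.eye(len(A)) - 0.5 * L                      # low-pass graph filter
    Xs = X.copy()
    for _ in range(k):                                # k-th order filtering
        Xs = H @ Xs
    return Xs
```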
【8】 Unsupervised Resource Allocation with Graph Neural Networks
Authors: Miles Cranmer, Peter Melchior, Brian Nord
Affiliation: Princeton University, Princeton, NJ, USA; Fermilab, Batavia, IL, USA
Note: Accepted to PMLR/contributed oral at NeurIPS 2020 Pre-registration Workshop. Code at this https URL
Link: https://arxiv.org/abs/2106.09761
Abstract: We present an approach for maximizing a global utility function by learning how to allocate resources in an unsupervised way. We expect interactions between allocation targets to be important and therefore propose to learn the reward structure for near-optimal allocation policies with a GNN. By relaxing the resource constraint, we can employ gradient-based optimization in contrast to more standard evolutionary algorithms. Our algorithm is motivated by a problem in modern astronomy, where one needs to select, based on limited initial information, among $10^9$ galaxies those whose detailed measurement will lead to optimal inference of the composition of the universe. Our technique presents a way of flexibly learning an allocation strategy by only requiring forward simulators for the physics of interest and the measurement process. We anticipate that our technique will also find applications in a range of resource allocation problems.
【9】 Hybrid graph convolutional neural networks for landmark-based anatomical segmentation
Authors: Nicolás Gaggion, Lucas Mansilla, Diego Milone, Enzo Ferrante
Affiliation: Research Institute for Signals, Systems and Computational Intelligence, sinc(i), CONICET, Universidad Nacional del Litoral, Santa Fe, Argentina
Note: Accepted for publication at MICCAI 2021
Link: https://arxiv.org/abs/2106.09832
Abstract: In this work we address the problem of landmark-based segmentation for anatomical structures. We propose HybridGNet, an encoder-decoder neural architecture which combines standard convolutions for image feature encoding with graph convolutional neural networks to decode plausible representations of anatomical structures. We benchmark the proposed architecture against other standard landmark and pixel-based models for anatomical segmentation in chest X-ray images, and find that HybridGNet is more robust to image occlusions. We also show that it can be used to construct landmark-based segmentations from pixel-level annotations. Our experimental results suggest that HybridGNet produces accurate and anatomically plausible landmark-based segmentations, by naturally incorporating shape constraints within the decoding process via spectral convolutions.
Transformer (3 papers)
【1】 How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
Authors: Andreas Steiner, Alexander Kolesnikov, Xiaohua Zhai, Ross Wightman, Jakob Uszkoreit, Lucas Beyer
Affiliation: Google Research, Brain Team; independent researcher
Note: Andreas, Alex, Xiaohua and Lucas contributed equally. We release more than 50,000 ViT models trained under diverse settings on various datasets. We believe this to be a treasure trove for model analysis. Available at this https URL and this https URL
Link: https://arxiv.org/abs/2106.10270
Abstract: Vision Transformers (ViT) have been shown to attain highly competitive performance for a wide range of vision applications, such as image classification, object detection and semantic image segmentation. In comparison to convolutional neural networks, the Vision Transformer's weaker inductive bias is generally found to cause an increased reliance on model regularization or data augmentation ("AugReg" for short) when training on smaller training datasets. We conduct a systematic empirical study in order to better understand the interplay between the amount of training data, AugReg, model size and compute budget. As one result of this study we find that the combination of increased compute and AugReg can yield models with the same performance as models trained on an order of magnitude more training data: we train ViT models of various sizes on the public ImageNet-21k dataset which either match or outperform their counterparts trained on the larger, but not publicly available, JFT-300M dataset.
【2】 BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models
Authors: Elad Ben Zaken, Shauli Ravfogel, Yoav Goldberg
Affiliation: Computer Science Department, Bar Ilan University; Allen Institute for Artificial Intelligence
Link: https://arxiv.org/abs/2106.10199
Abstract: We show that with small-to-medium training data, fine-tuning only the bias terms (or a subset of the bias terms) of pre-trained BERT models is competitive with (and sometimes better than) fine-tuning the entire model. For larger data, bias-only fine-tuning is competitive with other sparse fine-tuning methods. Besides their practical utility, these findings are relevant for the question of understanding the commonly-used process of finetuning: they support the hypothesis that finetuning is mainly about exposing knowledge induced by language-modeling training, rather than learning new task-specific linguistic knowledge.
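Bias-only fine-tuning is simple to set up. A minimal PyTorch/Transformers sketch; leaving the task head trainable and the learning rate are assumptions of this sketch, and the paper also studies training only a subset of the biases:

```python
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

for name, param in model.named_parameters():
    # Train only bias terms (plus the randomly initialized task head).
    param.requires_grad = name.endswith(".bias") or name.startswith("classifier")

optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3)
```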
【3】 Efficient Self-supervised Vision Transformers for Representation Learning
Authors: Chunyuan Li, Jianwei Yang, Pengchuan Zhang, Mei Gao, Bin Xiao, Xiyang Dai, Lu Yuan, Jianfeng Gao
Affiliation: Microsoft Research at Redmond; Microsoft Cloud + AI
Note: 24 pages, 12 figures, file size 13.6MB
Link: https://arxiv.org/abs/2106.09785
Abstract: This paper investigates two techniques for developing efficient self-supervised vision transformers (EsViT) for visual representation learning. First, we show through a comprehensive empirical study that multi-stage architectures with sparse self-attention can significantly reduce modeling complexity, but at the cost of losing the ability to capture fine-grained correspondences between image regions. Second, we propose a new pre-training task of region matching which allows the model to capture fine-grained region dependencies and, as a result, significantly improves the quality of the learned vision representations. Our results show that, combining the two techniques, EsViT achieves 81.3% top-1 on the ImageNet linear probe evaluation, outperforming prior art with around an order of magnitude higher throughput. When transferring to downstream linear classification tasks, EsViT outperforms its supervised counterpart on 17 out of 18 datasets. The code and models will be made publicly available.
GAN | Adversarial | Attack | Generation (12 papers)
【1】 Less is More: Feature Selection for Adversarial Robustness with Compressive Counter-Adversarial Attacks
Authors: Emre Ozfatura, Muhammad Zaid Hameed, Kerem Ozfatura, Deniz Gunduz
Link: https://arxiv.org/abs/2106.10252
Abstract: A common observation regarding adversarial attacks is that they mostly give rise to false activation at the penultimate layer to fool the classifier. Assuming that these activation values correspond to certain features of the input, the objective becomes choosing the features that are most useful for classification. Hence, we propose a novel approach to identify the important features by employing counter-adversarial attacks, which highlights the consistency at the penultimate layer with respect to perturbations on input samples. First, we empirically show that there exists a subset of features such that classification based on them bridges the gap between the clean and robust accuracy. Second, we propose a simple yet efficient mechanism to identify those features by searching the neighborhood of input samples. The features are then selected by observing the consistency of the activation values at the penultimate layer.
【2】 Residual Error: a New Performance Measure for Adversarial Robustness
Authors: Hossein Aboutalebi, Mohammad Javad Shafiee, Michelle Karg, Christian Scharfenberger, Alexander Wong
Affiliation: Waterloo AI Institute, University of Waterloo, Waterloo, Ontario, Canada; ADC Automotive Distance Control Systems GmbH, Continental, Germany; DarwinAI Corp., Canada
Link: https://arxiv.org/abs/2106.10212
Abstract: Despite the significant advances in deep learning over the past decade, a major challenge that limits its wide-spread adoption has been the fragility of deep neural networks to adversarial attacks. This sensitivity to making erroneous predictions in the presence of adversarially perturbed data makes deep neural networks difficult to adopt for certain real-world, mission-critical applications. While much of the research focus has revolved around adversarial example creation and adversarial hardening, the area of performance measures for assessing adversarial robustness is not well explored. Motivated by this, this study presents the concept of residual error, a new performance measure that not only assesses the adversarial robustness of a deep neural network at the individual sample level, but can also be used to differentiate between adversarial and non-adversarial examples to facilitate adversarial example detection. Furthermore, we introduce a hybrid model for approximating the residual error in a tractable manner. Experimental results using the case of image classification demonstrate the effectiveness and efficacy of the proposed residual error metric for assessing several well-known deep neural network architectures. These results thus illustrate that the proposed measure could be a useful tool not only for assessing the robustness of deep neural networks used in mission-critical scenarios, but also for the design of adversarially robust models.
【3】 Federated Robustness Propagation: Sharing Adversarial Robustness in Federated Learning
Authors: Junyuan Hong, Haotao Wang, Zhangyang Wang, Jiayu Zhou
Affiliation: Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA; Department of Electrical and Computer Engineering, University of Texas at Austin, Austin, TX, USA
Link: https://arxiv.org/abs/2106.10196
Abstract: Federated learning (FL) emerges as a popular distributed learning schema that learns a model from a set of participating users without requiring raw data to be shared. One major challenge of FL comes from heterogeneity in users, who may have distributionally different (or non-iid) data and varying computation resources. Just like in centralized learning, FL users also desire model robustness against malicious attackers at test time. Whereas adversarial training (AT) provides a sound solution for centralized learning, extending its usage to FL users has imposed significant challenges, as many users may have very limited training data and tight computational budgets, and thus cannot afford the data-hungry and costly AT. In this paper, we study a novel learning setting that propagates adversarial robustness from high-resource users that can afford AT to low-resource users that cannot, during the FL process. We show that existing FL techniques cannot effectively propagate adversarial robustness among non-iid users, and propose a simple yet effective propagation approach that transfers robustness through carefully designed batch-normalization statistics. We demonstrate the rationality and effectiveness of our method through extensive experiments. In particular, the proposed method is shown to grant FL remarkable robustness even when only a small portion of users afford AT during learning. Code will be published upon acceptance.
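The mechanism the abstract points to, transferring robustness through batch-normalization statistics, can be sketched at its simplest: copy running BN statistics from an AT-trained model into another model of the same architecture. The paper's actual rule for designing these statistics is more involved than this direct copy:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def copy_bn_stats(src, dst):
    """Copy running BN statistics from a robust model into another model
    with an identical architecture."""
    for s, d in zip(src.modules(), dst.modules()):
        if isinstance(s, (nn.BatchNorm1d, nn.BatchNorm2d)) and type(s) is type(d):
            d.running_mean.copy_(s.running_mean)
            d.running_var.copy_(s.running_var)
```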
【4】 Adversarial Training Helps Transfer Learning via Better Representations
Authors: Zhun Deng, Linjun Zhang, Kailas Vodrahalli, Kenji Kawaguchi, James Zou
Affiliation: Stanford University
Link: https://arxiv.org/abs/2106.10189
Abstract: Transfer learning aims to leverage models pre-trained on source data to efficiently adapt to a target setting, where only limited data are available for model fine-tuning. Recent works empirically demonstrate that adversarial training on the source data can improve the ability of models to transfer to new domains. However, why this happens is not known. In this paper, we provide a theoretical model to rigorously analyze how adversarial training helps transfer learning. We show that adversarial training on the source data generates provably better representations, so fine-tuning on top of this representation leads to a more accurate predictor of the target data. We further demonstrate both theoretically and empirically that semi-supervised learning in the source data can also improve transfer learning by similarly improving the representation. Moreover, performing adversarial training on top of semi-supervised learning can further improve transferability, suggesting that the two approaches have complementary benefits on representations. We support our theories with experiments on popular data sets and deep learning architectures.
【5】 World-GAN: a Generative Model for Minecraft Worlds
Authors: Maren Awiszus, Frederik Schubert, Bodo Rosenhahn
Affiliation: Institut für Informationsverarbeitung, Leibniz University Hannover, Hannover, Germany
Note: 8 pages, 8 figures, IEEE Conference on Games (CoG) 2021
Link: https://arxiv.org/abs/2106.10155
Abstract: This work introduces World-GAN, the first method to perform data-driven Procedural Content Generation via Machine Learning in Minecraft from a single example. Based on a 3D Generative Adversarial Network (GAN) architecture, we are able to create arbitrarily sized world snippets from a given sample. We evaluate our approach on creations from the community as well as structures generated with the Minecraft World Generator. Our method is motivated by the dense representations used in Natural Language Processing (NLP) introduced with word2vec [1]. The proposed block2vec representations make World-GAN independent of the number of different blocks, which can vary a lot in Minecraft, and enable the generation of larger levels. Finally, we demonstrate that changing this new representation space allows us to change the generated style of an already trained generator. World-GAN enables its users to generate Minecraft worlds based on parts of their creations.
【6】 The Dimpled Manifold Model of Adversarial Examples in Machine Learning
Authors: Adi Shamir, Odelia Melamed, Oriel BenShmuel
Affiliation: Weizmann Institute of Science, Israel
Link: https://arxiv.org/abs/2106.10151
Abstract: The extreme fragility of deep neural networks, when presented with tiny perturbations in their inputs, was independently discovered by several research groups in 2013; but in spite of enormous effort, these adversarial examples remained a baffling phenomenon with no clear explanation. In this paper we introduce a new conceptual framework (which we call the Dimpled Manifold Model) which provides a simple explanation for why adversarial examples exist, why their perturbations have such tiny norms, why these perturbations look like random noise, and why a network which was adversarially trained with incorrectly labeled images can still correctly classify test images. In the last part of the paper we describe the results of numerous experiments which strongly support this new model, and in particular our assertion that adversarial perturbations are roughly perpendicular to the low-dimensional manifold which contains all the training examples.
【7】 Accumulative Poisoning Attacks on Real-time Data
Authors: Tianyu Pang, Xiao Yang, Yinpeng Dong, Hang Su, Jun Zhu
Affiliation: Department of Computer Science & Technology, Tsinghua University
Link: https://arxiv.org/abs/2106.09993
Abstract: Collecting training data from untrusted sources exposes machine learning services to poisoning adversaries, who maliciously manipulate training data to degrade the model accuracy. When models are trained on offline datasets, poisoning adversaries have to inject the poisoned data in advance before training, and the order in which these poisoned batches are fed into the model is stochastic. In contrast, practical systems are more usually trained/fine-tuned on sequentially captured real-time data, in which case poisoning adversaries can dynamically poison each data batch according to the current model state. In this paper, we focus on the real-time setting and propose a new attacking strategy, which affiliates an accumulative phase with poisoning attacks to secretly (i.e., without affecting accuracy) magnify the destructive effect of a (poisoned) trigger batch. By mimicking online learning and federated learning on CIFAR-10, we show that model accuracy significantly drops after a single update step on the trigger batch following the accumulative phase. Our work validates that a well-designed but straightforward attacking strategy can dramatically amplify the poisoning effects, with no need to explore complex techniques.
【8】 On the Connections between Counterfactual Explanations and Adversarial Examples
Authors: Martin Pawelczyk, Shalmali Joshi, Chirag Agarwal, Sohini Upadhyay, Himabindu Lakkaraju
Affiliation: University of Tübingen; Harvard University
Link: https://arxiv.org/abs/2106.09992
Abstract: Counterfactual explanations and adversarial examples have emerged as critical research areas for addressing the explainability and robustness goals of machine learning (ML). While counterfactual explanations were developed with the goal of providing recourse to individuals adversely impacted by algorithmic decisions, adversarial examples were designed to expose the vulnerabilities of ML models. While prior research has hinted at the commonalities between these frameworks, there has been little to no work on systematically exploring the connections between the literature on counterfactual explanations and adversarial examples. In this work, we make one of the first attempts at formalizing the connections between counterfactual explanations and adversarial examples. More specifically, we theoretically analyze salient counterfactual explanation and adversarial example generation methods, and highlight the conditions under which they behave similarly. Our analysis demonstrates that several popular counterfactual explanation and adversarial example generation methods, such as the ones proposed by Wachter et al. and Carlini and Wagner (with mean squared error loss), and C-CHVAE and natural adversarial examples by Zhao et al., are equivalent. We also bound the distance between counterfactual explanations and adversarial examples generated by the Wachter et al. and DeepFool methods for linear models. Finally, we empirically validate our theoretical findings using extensive experimentation with synthetic and real-world datasets.
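The Wachter et al. objective analyzed in the paper is concrete enough to sketch. A gradient-descent version with an L1 distance penalty; the loss weight, optimizer, and step count are illustrative choices:

```python
import torch

def wachter_counterfactual(f, x, y_target, lam=1.0, steps=500, lr=0.01):
    """Minimize lam * (f(x') - y_target)^2 + ||x' - x||_1, the squared-loss
    objective the paper relates to C&W-style adversarial examples."""
    x_cf = x.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([x_cf], lr=lr)
    for _ in range(steps):
        loss = lam * (f(x_cf) - y_target).pow(2).sum() + (x_cf - x).abs().sum()
        opt.zero_grad(); loss.backward(); opt.step()
    return x_cf.detach()
```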
【9】 Indicators of Attack Failure: Debugging and Improving Optimization of Adversarial Examples
Authors: Maura Pintor, Luca Demetrio, Angelo Sotgiu, Giovanni Manca, Ambra Demontis, Nicholas Carlini, Battista Biggio, Fabio Roli
Affiliation: University of Cagliari, Italy; Pluribus One; Google
Link: https://arxiv.org/abs/2106.09947
Abstract: Evaluating the robustness of machine-learning models to adversarial examples is a challenging problem. Many defenses have been shown to provide a false sense of security by causing gradient-based attacks to fail, and they have been broken under more rigorous evaluations. Although guidelines and best practices have been suggested to improve current adversarial robustness evaluations, the lack of automatic testing and debugging tools makes it difficult to apply these recommendations in a systematic manner. In this work, we overcome these limitations by (i) defining a set of quantitative indicators which unveil common failures in the optimization of gradient-based attacks, and (ii) proposing specific mitigation strategies within a systematic evaluation protocol. Our extensive experimental analysis shows that the proposed indicators of failure can be used to visualize, debug and improve current adversarial robustness evaluations, providing a first concrete step towards automating and systematizing current adversarial robustness evaluations. Our open-source code is available at: https://github.com/pralab/IndicatorsOfAttackFailure.
【10】 Evolving GANs: When Contradictions Turn into Compliance
Authors: Sauptik Dhar, Javad Heydari, Samarth Tripathi, Unmesh Kurup, Mohak Shah
Affiliation: America Research Lab, LG Electronics, Great America Pkwy, Santa Clara, CA, USA
Note: Generative Adversarial Networks, Universum Learning, Semi-Supervised Learning
Link: https://arxiv.org/abs/2106.09946
Abstract: Limited availability of labeled data makes any supervised learning problem challenging. Alternative learning settings like semi-supervised and universum learning alleviate the dependency on labeled data, but still require a large amount of unlabeled data, which may be unavailable or expensive to acquire. GAN-based synthetic data generation methods have recently shown promise by generating synthetic samples to improve the task at hand. However, these samples cannot be used for other purposes. In this paper, we propose a GAN game which provides improved discriminator accuracy under limited data settings, while generating realistic synthetic data. This provides the added advantage that the generated data can now be used for other similar tasks. We provide theoretical guarantees and empirical results in support of our approach.
【11】 A Unified Generative Adversarial Network Training via Self-Labeling and Self-Attention
Authors: Tomoki Watanabe, Paolo Favaro
Affiliation: University of Bern (this work was done while the first author was visiting the University of Bern)
Link: https://arxiv.org/abs/2106.09914
Abstract: We propose a novel GAN training scheme that can handle any level of labeling in a unified manner. Our scheme introduces a form of artificial labeling that can incorporate manually defined labels, when available, and induce an alignment between them. To define the artificial labels, we exploit the assumption that neural network generators can be trained more easily to map nearby latent vectors to data with semantic similarities than across separate categories. We use generated data samples and their corresponding artificial conditioning labels to train a classifier. The classifier is then used to self-label real data. To boost the accuracy of the self-labeling, we also use the exponential moving average of the classifier. However, because the classifier might still make mistakes, especially at the beginning of training, we also refine the labels through self-attention, by using the labels of real data samples only when the classifier outputs a high classification probability score. We evaluate our approach on CIFAR-10, STL-10 and SVHN, and show that both self-labeling and self-attention consistently improve the quality of generated data. More surprisingly, we find that the proposed scheme can even outperform class-conditional GANs.
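The EMA classifier and the confidence-gated self-labeling can be sketched directly from the abstract; the decay and threshold values below are illustrative, not the paper's:

```python
import torch

@torch.no_grad()
def ema_update(ema_model, model, decay=0.999):
    """Exponential moving average of the classifier's weights."""
    for e, p in zip(ema_model.parameters(), model.parameters()):
        e.mul_(decay).add_(p, alpha=1.0 - decay)

@torch.no_grad()
def self_label(ema_model, x_real, threshold=0.9):
    """Keep only real samples the EMA classifier labels with high probability."""
    probs = torch.softmax(ema_model(x_real), dim=1)
    conf, labels = probs.max(dim=1)
    keep = conf > threshold
    return x_real[keep], labels[keep]
```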
【12】 Bad Characters: Imperceptible NLP Attacks
Authors: Nicholas Boucher, Ilia Shumailov, Ross Anderson, Nicolas Papernot
Affiliation: University of Cambridge, Cambridge, United Kingdom; Vector Institute, University of Toronto, Toronto, Canada; University of Edinburgh
Link: https://arxiv.org/abs/2106.09898
Abstract: Several years of research have shown that machine-learning systems are vulnerable to adversarial examples, both in theory and in practice. Until now, such attacks have primarily targeted visual models, exploiting the gap between human and machine perception. Although text-based models have also been attacked with adversarial examples, such attacks struggled to preserve semantic meaning and indistinguishability. In this paper, we explore a large class of adversarial examples that can be used to attack text-based models in a black-box setting without making any human-perceptible visual modification to inputs. We use encoding-specific perturbations that are imperceptible to the human eye to manipulate the outputs of a wide range of Natural Language Processing (NLP) systems, from neural machine-translation pipelines to web search engines. We find that with a single imperceptible encoding injection -- representing one invisible character, homoglyph, reordering, or deletion -- an attacker can significantly reduce the performance of vulnerable models, and with three injections most models can be functionally broken. Our attacks work against currently-deployed commercial systems, including those produced by Microsoft and Google, in addition to open source models published by Facebook and IBM. This novel series of attacks presents a significant threat to many language processing systems: an attacker can affect systems in a targeted manner without any assumptions about the underlying model. We conclude that text-based NLP systems require careful input sanitization, just like conventional applications, and that given such systems are now being deployed rapidly at scale, the urgent attention of architects and operators is required.
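The perturbation classes are plain Unicode manipulations, so a small Python illustration suffices; the character choices here are examples, and whether the Bidi-reordered string renders identically depends on the display engine:

```python
ZWSP = "\u200b"                 # zero-width space: invisible character
CYR_A = "\u0430"                # Cyrillic a, a homoglyph of Latin "a"
RLO, PDF = "\u202e", "\u202c"   # Bidi control characters for reordering

def perturb(text):
    """Variants that render (nearly) identically but encode differently."""
    injected = text[:3] + ZWSP + text[3:]     # invisible character injection
    homoglyph = text.replace("a", CYR_A, 1)   # homoglyph substitution
    reordered = RLO + text[::-1] + PDF        # logical reordering
    return injected, homoglyph, reordered

print(perturb("bank transfer"))
```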
Semi-/Weakly-/Un-/Fully-Supervised | Uncertainty | Active Learning (1 paper)
【1】 Goal-Directed Planning by Reinforcement Learning and Active Inference
Authors: Dongqi Han, Kenji Doya, Jun Tani
Affiliation: Cognitive Neurorobotics Research Unit and Neural Computation Unit, Okinawa Institute of Science and Technology, Okinawa, Japan
Note: Work in progress
Link: https://arxiv.org/abs/2106.09938
Abstract: What is the difference between goal-directed and habitual behavior? We propose a novel computational framework of decision making with Bayesian inference, in which everything is integrated as an entire neural network model. The model learns to predict environmental state transitions by self-exploration and generates motor actions by sampling stochastic internal states $z$. Habitual behavior, which is obtained from the prior distribution of $z$, is acquired by reinforcement learning. Goal-directed behavior is determined from the posterior distribution of $z$ by planning, using active inference, to minimize the free energy for goal observation. We demonstrate the effectiveness of the proposed framework by experiments in a sensorimotor navigation task with camera observations and continuous motor actions.
Transfer | Zero/Few/One-Shot | Adaptation (3 papers)
【1】 Zero-Shot Federated Learning with New Classes for Audio Classification
Authors: Gautham Krishna Gudur, Satheesh K. Perepu
Affiliation: Global AI Accelerator, Ericsson; Ericsson Research
Note: Accepted at Interspeech 2021. Also accepted at the Distributed and Private Machine Learning (DPML) and Hardware Aware Efficient Training (HAET) workshops at ICLR 2021
Link: https://arxiv.org/abs/2106.10019
Abstract: Federated learning is an effective way of extracting insights from different user devices while preserving the privacy of users. However, new classes with completely unseen data distributions can stream across any device in a federated learning setting, whose data cannot be accessed by the global server or other users. To this end, we propose a unified zero-shot framework to handle these aforementioned challenges during federated learning. We simulate two scenarios here -- 1) when the new class labels are not reported by the user, the traditional FL setting is used; 2) when new class labels are reported by the user, we synthesize Anonymized Data Impressions by calculating class similarity matrices corresponding to each device's new classes, followed by unsupervised clustering to distinguish between new classes across different users. Moreover, our proposed framework can also handle statistical heterogeneities in both labels and models across the participating users. We empirically evaluate our framework on-device across different communication rounds (FL iterations) with new classes in both local and global updates, along with heterogeneous labels and models, on two widely used audio classification applications -- keyword spotting and urban sound classification -- and observe an average deterministic accuracy increase of ~4.041% and ~4.258% respectively.
【2】 Gradual Domain Adaptation via Self-Training of Auxiliary Models
Authors: Yabin Zhang, Bin Deng, Kui Jia, Lei Zhang
Affiliation: Hong Kong Polytechnic University; South China University of Technology
Note: Code will be released at this https URL
Link: https://arxiv.org/abs/2106.09890
Abstract: Domain adaptation becomes more challenging with increasing gaps between source and target domains. Motivated by an empirical analysis of the reliability of labeled source data for distant target domains, we propose self-training of auxiliary models (AuxSelfTrain), which learns models for intermediate domains and gradually combats the distancing shifts across domains. We introduce evolving intermediate domains as combinations of a decreasing proportion of source data and an increasing proportion of target data, sampled to minimize the domain distance between consecutive domains. The source model can then be gradually adapted for use in the target domain by self-training of auxiliary models on the evolving intermediate domains. We also introduce an enhanced indicator for sample selection via implicit ensemble and extend the proposed method to semi-supervised domain adaptation. Experiments on benchmark datasets of unsupervised and semi-supervised domain adaptation verify its efficacy.
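The evolving-intermediate-domain construction can be sketched as follows. A uniform mixing schedule is assumed here for simplicity, whereas the paper samples the mixtures to minimize the distance between consecutive domains:

```python
import numpy as np

def evolving_domains(X_src, y_src, X_tgt, n_stages=5, seed=0):
    """Yield intermediate domains with a decreasing share of source data and
    an increasing share of target data; an auxiliary model is self-trained
    on each stage in turn."""
    rng = np.random.default_rng(seed)
    n = min(len(X_src), len(X_tgt))
    for t in np.linspace(0.0, 1.0, n_stages):
        n_s, n_t = int((1 - t) * n), int(t * n)
        idx_s = rng.choice(len(X_src), n_s, replace=False)
        idx_t = rng.choice(len(X_tgt), n_t, replace=False)
        yield X_src[idx_s], y_src[idx_s], X_tgt[idx_t]
```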
【3】 Guided Integrated Gradients: An Adaptive Path Method for Removing Noise
Authors: Andrei Kapishnikov, Subhashini Venugopalan, Besim Avci, Ben Wedin, Michael Terry, Tolga Bolukbasi
Affiliation: Google Research
Link: https://arxiv.org/abs/2106.09788
Abstract: Integrated Gradients (IG) is a commonly used feature attribution method for deep neural networks. While IG has many desirable properties, the method often produces spurious/noisy pixel attributions in regions that are not related to the predicted class when applied to visual models. While this has been previously noted, most existing solutions are aimed at addressing the symptoms by explicitly reducing the noise in the resulting attributions. In this work, we show that one of the causes of the problem is the accumulation of noise along the IG path. To minimize the effect of this source of noise, we propose adapting the attribution path itself -- conditioning the path not just on the image but also on the model being explained. We introduce Adaptive Path Methods (APMs) as a generalization of path methods, and Guided IG as a specific instance of an APM. Empirically, Guided IG creates saliency maps better aligned with the model's prediction and the input image that is being explained. We show through qualitative and quantitative experiments that Guided IG outperforms other, related methods in nearly every experiment.
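For reference, standard IG along the fixed straight-line path looks as follows (a PyTorch sketch; Guided IG's contribution, adapting this path to the model being explained, is not reproduced here):

```python
import torch

def integrated_gradients(f, x, baseline, steps=64):
    """Riemann-sum approximation of
    IG_i = (x_i - x'_i) * integral of dF/dx_i along the path from x' to x."""
    total = torch.zeros_like(x)
    for k in range(1, steps + 1):
        point = (baseline + (k / steps) * (x - baseline)).detach().requires_grad_(True)
        f(point).sum().backward()
        total += point.grad
    return (x - baseline) * total / steps
```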
Reinforcement Learning (4 papers)
【1】 Deep Reinforcement Learning Models Predict Visual Responses in the Brain: A Preliminary Result
Authors: Maytus Piriyajitakonkij, Sirawaj Itthipuripat, Theerawit Wilaiprasitporn, Nat Dilokthanakul
Affiliation: Vidyasirimedhi Institute of Science and Technology (VISTEC), Thailand; King Mongkut's University of Technology Thonburi (KMUTT), Thailand; Imperial College London, United Kingdom
Link: https://arxiv.org/abs/2106.10112
Abstract: Supervised deep convolutional neural networks (DCNNs) are currently one of the best computational models that can explain how the primate ventral visual stream solves object recognition. However, embodied cognition has not been considered in existing visual processing models. From the ecological standpoint, humans learn to recognize objects by interacting with them, allowing better classification, specialization, and generalization. Here, we ask whether computational models under the embodied learning framework can explain mechanisms underlying object recognition in the primate visual system better than the existing supervised models. To address this question, we use reinforcement learning to train neural network models to play a 3D computer game, and we find that these reinforcement learning models achieve neural response prediction accuracy scores in the early visual areas (e.g., V1 and V2) at levels comparable to those accomplished by the supervised neural network model. In contrast, the supervised neural network models yield better neural response predictions in the higher visual areas, compared to the reinforcement learning models. Our preliminary results suggest a future direction for visual neuroscience in which deep reinforcement learning should be included to fill in the missing embodiment concept.
【2】 On the Sample Complexity of Batch Reinforcement Learning with Policy-Induced Data
Authors: Chenjun Xiao, Ilbin Lee, Bo Dai, Dale Schuurmans, Csaba Szepesvari
Affiliation: University of Alberta; Google Research, Brain Team; DeepMind
Note: 26 pages, 2 figures
Link: https://arxiv.org/abs/2106.09973
Abstract: We study the fundamental question of the sample complexity of learning a good policy in finite Markov decision processes (MDPs) when the data available for learning is obtained by following a logging policy that must be chosen without knowledge of the underlying MDP. Our main results show that the sample complexity, the minimum number of transitions necessary and sufficient to obtain a good policy, is an exponential function of the relevant quantities when the planning horizon $H$ is finite. In particular, we prove that the sample complexity of obtaining $\epsilon$-optimal policies is at least $\Omega(\mathrm{A}^{\min(\mathrm{S}-1, H+1)})$ for $\gamma$-discounted problems, where $\mathrm{S}$ is the number of states, $\mathrm{A}$ is the number of actions, and $H$ is the effective horizon defined as $H=\lfloor \tfrac{\ln(1/\epsilon)}{\ln(1/\gamma)} \rfloor$; and it is at least $\Omega(\mathrm{A}^{\min(\mathrm{S}-1, H)}/\varepsilon^2)$ for finite horizon problems, where $H$ is the planning horizon of the problem. This lower bound is essentially matched by an upper bound. For the average-reward setting we show that there is no algorithm finding $\epsilon$-optimal policies with a finite amount of data.
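To make the bounds concrete, the effective horizon and the discounted-case lower bound can be evaluated numerically for illustrative problem sizes:

```python
import math

S, A = 10, 4                    # illustrative state and action counts
gamma, eps = 0.99, 0.01
H = math.floor(math.log(1 / eps) / math.log(1 / gamma))
print(H)                        # effective horizon: 458
print(A ** min(S - 1, H + 1))   # discounted-case lower bound: 4**9 = 262144
```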
【3】 Many Agent Reinforcement Learning Under Partial Observability
Authors: Keyang He, Prashant Doshi, Bikramjit Banerjee
Affiliation: Department of Computer Science, University of Georgia; University of Southern Mississippi
Link: https://arxiv.org/abs/2106.09825
Abstract: Recent renewed interest in multi-agent reinforcement learning (MARL) has generated an impressive array of techniques that leverage deep reinforcement learning, primarily actor-critic architectures, and can be applied to a limited range of settings in terms of observability and communication. However, a continuing limitation of much of this work is the curse of dimensionality when it comes to representations based on joint actions, which grow exponentially with the number of agents. In this paper, we squarely focus on this challenge of scalability. We apply the key insight of action anonymity, which leads to permutation invariance of joint actions, to two recently presented deep MARL algorithms, MADDPG and IA2C, and compare these instantiations to another recent technique that leverages action anonymity, viz., mean-field MARL. We show that our instantiations can learn the optimal behavior in a broader class of agent networks than the mean-field method, using a recently introduced pragmatic domain.
【4】 Adapting the Function Approximation Architecture in Online Reinforcement Learning
Authors: John D. Martin, Joseph Modayil
Affiliation: Department of Computing Science (equal contribution)
Link: https://arxiv.org/abs/2106.09776
Abstract: The performance of a reinforcement learning (RL) system depends on the computational architecture used to approximate a value function. Deep learning methods provide both optimization techniques and architectures for approximating nonlinear functions from noisy, high-dimensional observations. However, prevailing optimization techniques are not designed for strictly-incremental online updates. Nor are standard architectures designed for observations with an a priori unknown structure: for example, light sensors randomly dispersed in space. This paper proposes an online RL prediction algorithm with an adaptive architecture that efficiently finds useful nonlinear features. The algorithm is evaluated in a spatial domain with high-dimensional, stochastic observations. The algorithm outperforms non-adaptive baseline architectures and approaches the performance of an architecture given side-channel information. These results are a step towards scalable RL algorithms for more general problems, where the observation structure is not available.
医学相关(1篇)
【1】 Synthetic COVID-19 Chest X-ray Dataset for Computer-Aided Diagnosis 标题:用于计算机辅助诊断的合成COVID-19胸部X射线数据集
作者:Hasib Zunair,A. Ben Hamza 机构:Concordia University 链接:https://arxiv.org/abs/2106.09759 摘要:我们引入了一个新的数据集,称为合成COVID-19胸部X射线数据集,用于训练机器学习模型。该数据集由21295张用于计算机辅助诊断的合成COVID-19胸部X射线图像组成。这些图像通过无监督域自适应方法生成,质量很高。我们发现,在严重类别不平衡的条件下,合成图像作为额外训练数据不仅能提升各种深度学习结构的性能,还能以高置信度检测目标类别。我们还发现,仅在合成图像上训练也能获得相当的性能。此外,合成COVID-19图像的显著特征表明,其分布与非COVID-19类显著不同,从而能够形成恰当的决策边界。我们希望COVID-19这类高保真胸部X射线图像的可用性将促进诊断和/或管理工具的发展。 摘要:We introduce a new dataset called Synthetic COVID-19 Chest X-ray Dataset for training machine learning models. The dataset consists of 21,295 synthetic COVID-19 chest X-ray images to be used for computer-aided diagnosis. These images, generated via an unsupervised domain adaptation approach, are of high quality. We find that the synthetic images not only improve performance of various deep learning architectures when used as additional training data under heavy imbalance conditions, but also detect the target class with high confidence. We also find that comparable performance can also be achieved when trained only on synthetic images. Further, salient features of the synthetic COVID-19 images indicate that the distribution is significantly different from Non-COVID-19 classes, enabling a proper decision boundary. We hope the availability of such high fidelity chest X-ray images of COVID-19 will encourage advances in the development of diagnostic and/or management tools.
聚类(3篇)
【1】 Smoothed Multi-View Subspace Clustering 标题:平滑多视图子空间聚类
作者:Peng Chen,Liang Liu,Zhengrui Ma,Zhao Kang 机构: Jiangsu Automation Research Institute, Lianyungang, Jiangsu, China, University of Electronic Science and Technology of China, Chengdu, Sichuan, China, Trusted Cloud Computing and Big Data Key Laboratory of Sichuan Province 备注:Accepted by International Conference on Neural Computing for Advanced Applications 2021 链接:https://arxiv.org/abs/2106.09875 摘要:近年来,多视图子空间聚类由于利用了多视图间的互补信息,取得了令人瞩目的效果。然而,多视图数据可能非常复杂,在实际应用中并不容易聚类。现有的大多数方法直接对原始数据进行处理,可能得不到最优解。在这项工作中,我们提出了一种新的多视图聚类方法,称为平滑多视图子空间聚类(SMVSC),通过使用一种新的技术,即图过滤,来获得每个视图的平滑表示,使相似的数据点具有相似的特征值。具体来说,它通过应用低通滤波器来保留图的几何特征,从而产生一个"聚类友好"的表示,极大地促进了下游的聚类任务。在基准数据集上的大量实验验证了该方法的优越性。分析表明,图过滤提高了类的可分性。 摘要:In recent years, multi-view subspace clustering has achieved impressive performance due to the exploitation of complementary information across multiple views. However, multi-view data can be very complicated and are not easy to cluster in real-world applications. Most existing methods operate on raw data and may not obtain the optimal solution. In this work, we propose a novel multi-view clustering method named smoothed multi-view subspace clustering (SMVSC) by employing a novel technique, i.e., graph filtering, to obtain a smooth representation for each view, in which similar data points have similar feature values. Specifically, it retains the graph geometric features through applying a low-pass filter. Consequently, it produces a "clustering-friendly" representation and greatly facilitates the downstream clustering task. Extensive experiments on benchmark datasets validate the superiority of our approach. Analysis shows that graph filtering increases the separability of classes.
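下面是对"图过滤得到平滑表示"这一步骤的一个极简numpy示意:以对称归一化拉普拉斯构造低通滤波器 $(I-\tfrac{1}{2}L)^k$ 作用于节点特征,使相似节点的特征值彼此接近;滤波器形式与阶数 k 是图信号处理中的常见选择,未必与论文的具体设置一致。

```python
import numpy as np

def low_pass_filter(A: np.ndarray, X: np.ndarray, k: int = 2) -> np.ndarray:
    # 对称归一化拉普拉斯 L = I - D^{-1/2} A D^{-1/2}
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt
    # 低通滤波器 (I - L/2)^k:抑制高频分量,得到平滑的"聚类友好"表示
    H = np.eye(len(A)) - 0.5 * L
    Xs = X.copy()
    for _ in range(k):
        Xs = H @ Xs
    return Xs

# 两个簇的玩具图:块状邻接(去掉自环)+ 随机特征
A = np.kron(np.eye(2), np.ones((5, 5))) - np.eye(10)
X = np.random.randn(10, 3)
X_smooth = low_pass_filter(A, X, k=3)  # 同簇节点的特征被拉近,更易聚类
```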
【2】 LSEC: Large-scale spectral ensemble clustering 标题:LSEC:大规模谱集成聚类
作者:Hongmin Li,Xiucai Ye,Akira Imakura,Tetsuya Sakurai 机构:Department of Computer Science, University of Tsukuba, Tsukuba, Japan 备注:22 pages 链接:https://arxiv.org/abs/2106.09852 摘要:集成聚类是机器学习领域的一个基本问题,它将多个基聚类结合起来以得到更好的聚类结果。然而,由于效率瓶颈,现有的方法大多不适合大规模的集成聚类任务。在本文中,我们提出了一种大规模谱集成聚类(LSEC)方法,在效率和有效性之间取得良好的平衡。在LSEC中,设计了一个基于大规模谱聚类的高效集成生成框架,以较低的计算复杂度生成各种基聚类;然后通过基于二部图划分的共识函数将所有基聚类组合成更好的共识聚类结果。LSEC方法的计算复杂度低于大多数现有的集成聚类方法。在10个大规模数据集上的实验表明了LSEC方法的效率和有效性。该方法的MATLAB代码和实验数据集可在https://github.com/Li-Hongmin/MyPaperWithCode获取。 摘要:Ensemble clustering is a fundamental problem in the machine learning field, combining multiple base clusterings into a better clustering result. However, most of the existing methods are unsuitable for large-scale ensemble clustering tasks due to the efficiency bottleneck. In this paper, we propose a large-scale spectral ensemble clustering (LSEC) method to strike a good balance between efficiency and effectiveness. In LSEC, a large-scale spectral clustering based efficient ensemble generation framework is designed to generate various base clusterings within a low computational complexity. Then all base clusterings are combined through a bipartite graph partition based consensus function into a better consensus clustering result. The LSEC method achieves a lower computational complexity than most existing ensemble clustering methods. Experiments conducted on ten large-scale datasets show the efficiency and effectiveness of the LSEC method. The MATLAB code of the proposed method and experimental datasets are available at https://github.com/Li-Hongmin/MyPaperWithCode.
【3】 A Distance-based Separability Measure for Internal Cluster Validation 标题:一种用于内部聚类验证的基于距离的可分性度量
作者:Shuyue Guan,Murray Loew 机构:Department of Biomedical Engineering, The George Washington University, nd St NW, Washington, DC , USA 备注:It is an extended version of the paper: arXiv:2009.01328 链接:https://arxiv.org/abs/2106.09794 摘要:聚类结果的评估是聚类分析的重要组成部分。由于在典型的无监督学习中聚类没有真实的类标签,人们提出了许多利用预测标签和数据构建的内部聚类有效性指数(CVI)。在没有真实标签的情况下,设计一个有效的CVI与创建一种聚类方法一样困难。拥有更多的CVI至关重要,因为不存在可用于衡量所有数据集的通用CVI,也没有在缺乏真实标签时为聚类选择合适CVI的具体方法。因此,有必要应用多种CVI来评估聚类结果。在本文中,我们基于一种数据可分性度量,提出了一种新的内部CVI:基于距离的可分性指数(DSI)。我们使用5种聚类算法在12个真实数据集和97个合成数据集上的聚类结果,将DSI与8种内部CVI(涵盖从早期Dunn(1974)到最近CVDD(2019)的研究)进行了比较,并以一种外部CVI作为参照(ground truth)。结果表明,相对于其他被比较的CVI,DSI是一种有效、独特且有竞争力的CVI。我们还总结了评估CVI的一般流程,并建立了用于比较CVI结果的秩差度量。 摘要:To evaluate clustering results is a significant part of cluster analysis. Since there are no true class labels for clustering in typical unsupervised learning, many internal cluster validity indices (CVIs), which use predicted labels and data, have been created. Without true labels, to design an effective CVI is as difficult as to create a clustering method. And it is crucial to have more CVIs because there are no universal CVIs that can be used to measure all datasets and no specific methods of selecting a proper CVI for clusters without true labels. Therefore, to apply a variety of CVIs to evaluate clustering results is necessary. In this paper, we propose a novel internal CVI -- the Distance-based Separability Index (DSI), based on a data separability measure. We compared the DSI with eight internal CVIs including studies from early Dunn (1974) to most recent CVDD (2019) and an external CVI as ground truth, by using clustering results of five clustering algorithms on 12 real and 97 synthetic datasets. Results show DSI is an effective, unique, and competitive CVI to other compared CVIs. We also summarized the general process to evaluate CVIs and created the rank-difference metric for comparison of CVIs' results.
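基于距离的可分性思想可用如下简化示意说明:分别收集类内距离(ICD)与类间距离(BCD)两组集合,再用两样本分布差异(此处以Kolmogorov-Smirnov统计量代替)度量可分性;论文中DSI的精确定义以原文为准,此处仅演示思路。

```python
import numpy as np
from scipy.stats import ks_2samp

def separability(X: np.ndarray, y: np.ndarray) -> float:
    """类内距离集与类间距离集的分布差异越大,类越可分。"""
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)  # 两两距离矩阵
    same = (y[:, None] == y[None, :])
    iu = np.triu_indices(len(X), k=1)                     # 只取上三角,避免重复
    icd = D[iu][same[iu]]      # 类内距离(ICD)
    bcd = D[iu][~same[iu]]     # 类间距离(BCD)
    return ks_2samp(icd, bcd).statistic

X = np.vstack([np.random.normal(0, 1, (50, 2)), np.random.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(separability(X, y))      # 两簇相距越远,该值越接近1
```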
超分辨率|去噪|去模糊|去雾(1篇)
【1】 Residual Contrastive Learning for Joint Demosaicking and Denoising 标题:残差对比学习在联合去马赛克和去噪中的应用
作者:Nanqing Dong,Matteo Maggioni,Yongxin Yang,Eduardo Pérez-Pellitero,Ales Leonardis,Steven McDonagh 机构: Department of Computer Science, University of Oxford, Huawei Noah's Ark Lab 链接:https://arxiv.org/abs/2106.10070 摘要:对比学习(CL)的突破推动了自监督学习(SSL)近来在RGB图像高级视觉任务上的成功。然而,对于RAW域中的低层次视觉任务,如联合去马赛克和去噪(JDD),CL的定义仍然不明确。为了弥补这一方法上的差距,我们提出了一种新的针对RAW图像的对比学习方法,即残差对比学习(RCL),旨在为JDD学习有意义的表征。我们的工作建立在每张RAW图像中包含的噪声与信号相关这一假设之上,因此来自同一张RAW图像的两个裁剪块(crop)应比来自不同RAW图像的两个裁剪块具有更相似的噪声分布。我们使用残差作为判别特征,并以推土机距离(earth mover's distance)衡量对比损失中的分布散度。为了评估所提出的CL策略,我们模拟了一系列无监督JDD实验,实验中大量数据被合成的信号相关噪声破坏,并为具有未知(随机)噪声方差的无监督JDD任务设置了一个新的基准。我们的实证研究不仅验证了CL可以应用于分布(而不仅是特征),还揭示了以往的非ML与SSL JDD方法在噪声统计未知时缺乏鲁棒性,从而对信号相关噪声问题提供了进一步的见解。 摘要:The breakthrough of contrastive learning (CL) has fueled the recent success of self-supervised learning (SSL) in high-level vision tasks on RGB images. However, CL is still ill-defined for low-level vision tasks, such as joint demosaicking and denoising (JDD), in the RAW domain. To bridge this methodological gap, we present a novel CL approach on RAW images, residual contrastive learning (RCL), which aims to learn meaningful representations for JDD. Our work is built on the assumption that noise contained in each RAW image is signal-dependent, thus two crops from the same RAW image should have more similar noise distribution than two crops from different RAW images. We use residuals as a discriminative feature and the earth mover's distance to measure the distribution divergence for the contrastive loss. To evaluate the proposed CL strategy, we simulate a series of unsupervised JDD experiments with large-scale data corrupted by synthetic signal-dependent noise, where we set a new benchmark for unsupervised JDD tasks with unknown (random) noise variance. Our empirical study not only validates that CL can be applied on distributions (c.f. features), but also exposes the lack of robustness of previous non-ML and SSL JDD methods when the statistics of the noise are unknown, thus providing some further insight into signal-dependent noise problems.
联邦学习|隐私保护|加密(2篇)
【1】 A Vertical Federated Learning Framework for Horizontally Partitioned Labels 标题:一种面向水平划分标签的垂直联邦学习框架
作者:Wensheng Xia,Ying Li,Lan Zhang,Zhonghai Wu,Xiaoyong Yuan 机构:Peking University, Michigan Technological University 备注:10 pages, 6 figures 链接:https://arxiv.org/abs/2106.10056 摘要:垂直联邦学习是一种协作式机器学习框架,用于在保护隐私的前提下,对垂直划分的数据训练深度学习模型。它引起了学术界和工业界的广泛关注。不幸的是,在实际应用中使用大多数现有的垂直联邦学习方法仍然面临两个严峻的挑战。首先,大多数现有的垂直联邦学习方法都有一个很强的假设,即至少有一方持有所有数据样本的完整标签集;而在许多实际场景中,这种假设并不满足,因为标签是水平划分的,各方只持有部分标签。现有的垂直联邦学习方法只能利用部分标签,这可能导致端到端反向传播中模型更新不足。第二,计算和通信资源因参与方而异,资源有限的参与方会成为掉队者(straggler),减缓训练的收敛速度。这种掉队者问题在垂直联邦学习的水平划分标签场景中会被进一步放大。为了应对这些挑战,我们提出了一种新的垂直联邦学习框架,即级联垂直联邦学习(CVFL),以充分利用所有水平划分的标签来训练具有隐私保护的神经网络。为了缓解掉队者问题,我们设计了一个新的优化目标,可以增加掉队方对训练模型的贡献。我们进行了一系列实验来严格验证CVFL的有效性,结果表明CVFL可以取得与集中式训练相当的性能(例如分类任务的准确率)。与训练中仅采用异步聚合机制相比,新的优化目标能进一步缓解掉队者问题。 摘要:Vertical federated learning is a collaborative machine learning framework to train deep learning models on vertically partitioned data with privacy-preservation. It attracts much attention both from academia and industry. Unfortunately, applying most existing vertical federated learning methods in real-world applications still faces two daunting challenges. First, most existing vertical federated learning methods have a strong assumption that at least one party holds the complete set of labels of all data samples, while this assumption is not satisfied in many practical scenarios, where labels are horizontally partitioned and the parties only hold partial labels. Existing vertical federated learning methods can only utilize partial labels, which may lead to inadequate model update in end-to-end backpropagation. Second, computational and communication resources vary in parties. Some parties with limited computational and communication resources will become the stragglers and slow down the convergence of training. Such straggler problem will be exaggerated in the scenarios of horizontally partitioned labels in vertical federated learning. To address these challenges, we propose a novel vertical federated learning framework named Cascade Vertical Federated Learning (CVFL) to fully utilize all horizontally partitioned labels to train neural networks with privacy-preservation. To mitigate the straggler problem, we design a novel optimization objective which can increase straggler's contribution to the trained models. We conduct a series of qualitative experiments to rigorously verify the effectiveness of CVFL. It is demonstrated that CVFL can achieve comparable performance (e.g., accuracy for classification tasks) with centralized training. The new optimization objective can further mitigate the straggler problem comparing with only using the asynchronous aggregation mechanism during training.
【2】 Locally Differentially Private Federated Learning: Efficient Algorithms with Tight Risk Bounds 标题:局部差分隐私联邦学习:具有紧风险界的高效算法
作者:Andrew Lowy,Meisam Razaviyayn 机构:University of Southern California 链接:https://arxiv.org/abs/2106.09779 摘要:联邦学习(FL)是一种分布式学习范式,在这种范式中,许多拥有异构、不平衡且通常敏感的本地数据的客户端协作学习一个模型。本地差分隐私(LDP)提供了一个强有力的保证:每个客户端的数据在训练期间和训练后都不会泄露,且无需依赖可信第三方。虽然LDP通常被认为过于严格,无法实现令人满意的效用,但我们的论文对这一观点提出了挑战。我们考虑一个具有不平衡异构数据、跨客户端差异化隐私需求以及不可靠通信的一般设置,其中每轮可用的客户端数量/子集是随机的。我们针对光滑(强)凸FL提出了三种LDP算法,每一种都是分布式小批量SGD的噪声变体。其中一种经过加速,另一种引入了新的时变噪声;我们利用后者得到了完全一般的非i.i.d. FL问题的第一个非平凡LDP超额风险界。特化到i.i.d.客户端时,我们的风险界在集中式设置与跨设备设置(每个客户端仅代表一个人的数据)的已知最优界之间插值。此外,我们表明,在某些情形下,我们的收敛速度(几乎)与相应的非私有下界相匹配,或优于最新的非私有算法("免费隐私")。最后,我们通过数值实验验证了理论结果,并展示了算法的实用价值。 摘要:Federated learning (FL) is a distributed learning paradigm in which many clients with heterogeneous, unbalanced, and often sensitive local data, collaborate to learn a model. Local Differential Privacy (LDP) provides a strong guarantee that each client's data cannot be leaked during and after training, without relying on a trusted third party. While LDP is often believed to be too stringent to allow for satisfactory utility, our paper challenges this belief. We consider a general setup with unbalanced, heterogeneous data, disparate privacy needs across clients, and unreliable communication, where a random number/subset of clients is available each round. We propose three LDP algorithms for smooth (strongly) convex FL; each are noisy variations of distributed minibatch SGD. One is accelerated and one involves novel time-varying noise, which we use to obtain the first non-trivial LDP excess risk bound for the fully general non-i.i.d. FL problem. Specializing to i.i.d. clients, our risk bounds interpolate between the best known and/or optimal bounds in the centralized setting and the cross-device setting, where each client represents just one person's data. Furthermore, we show that in certain regimes, our convergence rate (nearly) matches the corresponding non-private lower bound or outperforms state of the art non-private algorithms (``privacy for free''). Finally, we validate our theoretical results and illustrate the practical utility of our algorithm with numerical experiments.
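摘要称三种算法均为"分布式小批量SGD的噪声变体"。下面用一个线性回归玩具问题给出该思想的极简示意:客户端本地裁剪梯度并加高斯噪声后上传,服务器取平均再更新;裁剪阈值、噪声系数与学习率均为假设值,且不涉及论文中精确的隐私核算与加速/时变噪声设计。

```python
import numpy as np

def ldp_sgd_round(w, client_data, lr=0.1, clip=1.0, sigma=1.0):
    """一轮LDP风格的分布式小批量SGD(线性回归玩具示例)。"""
    noisy_grads = []
    for X, y in client_data:                            # 每个客户端本地计算
        g = 2 * X.T @ (X @ w - y) / len(y)              # 本地小批量梯度
        g = g / max(1.0, np.linalg.norm(g) / clip)      # L2裁剪,限制灵敏度
        g += sigma * clip * np.random.randn(*g.shape)   # 本地加高斯噪声后才上传
        noisy_grads.append(g)
    return w - lr * np.mean(noisy_grads, axis=0)        # 服务器聚合并更新

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])
clients = []
for _ in range(20):
    X = rng.normal(size=(32, 2))
    clients.append((X, X @ w_true + 0.1 * rng.normal(size=32)))

w = np.zeros(2)
for _ in range(200):
    w = ldp_sgd_round(w, clients)
print(w)  # 应接近 w_true;噪声越大偏差越大,体现隐私-效用权衡
```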
推理|分析|理解|解释(2篇)
【1】 NoiseGrad: enhancing explanations by introducing stochasticity to model weights 标题:NoiseGrad:通过在模型权重中引入随机性来增强解释
作者:Kirill Bykov,Anna Hedström,Shinichi Nakajima,Marina M. -C. Höhne 机构:ML Group, TU Berlin, Germany, UMI Lab, RIKEN AIP, Tokyo, Japan, Marina M.-C. Höhne 备注:20 pages, 11 figures 链接:https://arxiv.org/abs/2106.10185 摘要:归因方法仍然是实际应用中用来解释复杂学习机器决策过程的一种实用工具。研究表明,一种名为SmoothGrad的简单方法可以有效地减少基于梯度的归因方法的视觉扩散,并已在研究者和实践者中确立了地位。然而,研究中尚未探索的是,如何通过在模型权重中引入随机性来改进解释。有鉴于此,我们引入了NoiseGrad:一种随机的、与方法无关的解释增强方法,它将噪声添加到权重而非输入数据中。我们通过多组实验(包括不同的数据集、解释方法和网络结构)研究了所提方法,并得出结论:与SmoothGrad相比,带乘性高斯噪声的NoiseGrad(及其扩展NoiseGrad++)在若干评价标准上具有明显优势。我们将所提方法与贝叶斯学习联系起来,并为用户选择超参数提供了一种启发式方法。 摘要:Attribution methods remain a practical instrument that is used in real-world applications to explain the decision-making process of complex learning machines. It has been shown that a simple method called SmoothGrad can effectively reduce the visual diffusion of gradient-based attribution methods and has established itself among both researchers and practitioners. What remains unexplored in research, however, is how explanations can be improved by introducing stochasticity to the model weights. In the light of this, we introduce - NoiseGrad - a stochastic, method-agnostic explanation-enhancing method that adds noise to the weights instead of the input data. We investigate our proposed method through various experiments including different datasets, explanation methods and network architectures and conclude that NoiseGrad (and its extension NoiseGrad++) with multiplicative Gaussian noise offers a clear advantage compared to SmoothGrad on several evaluation criteria. We connect our proposed method to Bayesian Learning and provide the user with a heuristic for choosing hyperparameters.
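NoiseGrad的核心操作(对权重施加乘性高斯噪声,再对多份带噪模型的归因取平均)可以写成如下PyTorch示意;这里以最朴素的输入梯度作为基础归因方法,噪声尺度与采样次数均为假设的超参数。

```python
import copy
import torch

def noisegrad(model, x, target, n=10, sigma=0.2):
    """对权重施加乘性高斯噪声 w' = w * (1 + sigma*eps),平均各带噪模型的输入梯度。"""
    attributions = []
    for _ in range(n):
        noisy = copy.deepcopy(model)
        with torch.no_grad():
            for p in noisy.parameters():
                p.mul_(1 + sigma * torch.randn_like(p))   # 乘性高斯噪声
        xi = x.clone().requires_grad_(True)
        score = noisy(xi)[0, target]                      # 目标类得分
        score.backward()
        attributions.append(xi.grad.detach())             # 基础归因:输入梯度
    return torch.stack(attributions).mean(dim=0)          # 跨带噪模型取平均

# 用法示意:一个小分类器与一张"图像"
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
x = torch.randn(1, 1, 28, 28)
saliency = noisegrad(model, x, target=3)
```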
【2】 An Analysis of the Deployment of Models Trained on Private Tabular Synthetic Data: Unexpected Surprises 标题:对基于私有表格合成数据训练的模型部署的分析:意想不到的惊喜
作者:Mayana Pereira,Meghana Kshirsagar,Sumit Mukherjee,Rahul Dodhia,Juan Lavista Ferres 机构: USA 2Department of Electrical Engineering, University of Brasilia 链接:https://arxiv.org/abs/2106.10241 摘要:差分私有(DP)合成数据集是一种在尊重各数据提供者隐私的同时训练机器学习模型的有力方法。但DP对所训练模型公平性的影响尚不清楚。在本文中,我们系统地研究了差分私有合成数据生成对分类的影响。我们通过算法公平性度量,分析了合成数据集引起的模型效用和偏差上的差异。我们的第一组结果表明,尽管在我们评估的所有数据合成器中,隐私和效用之间似乎都存在明显的负相关(越隐私,越不准确),但更强的隐私并不一定意味着更多的偏差。此外,我们还评估了利用合成数据集进行模型训练和模型评估的效果。我们发现,在合成数据上得到的结果可能会错误估计实际模型部署到真实数据上时的性能。因此,我们主张在使用差分私有合成数据集进行模型训练和评估的场景中,需要定义适当的测试协议。 摘要:Differentially private (DP) synthetic datasets are a powerful approach for training machine learning models while respecting the privacy of individual data providers. The effect of DP on the fairness of the resulting trained models is not yet well understood. In this contribution, we systematically study the effects of differentially private synthetic data generation on classification. We analyze disparities in model utility and bias caused by the synthetic dataset, measured through algorithmic fairness metrics. Our first set of results show that although there seems to be a clear negative correlation between privacy and utility (the more private, the less accurate) across all data synthesizers we evaluated, more privacy does not necessarily imply more bias. Additionally, we assess the effects of utilizing synthetic datasets for model training and model evaluation. We show that results obtained on synthetic data can misestimate the actual model performance when it is deployed on real data. We hence advocate on the need for defining proper testing protocols in scenarios where differentially private synthetic datasets are utilized for model training and evaluation.
检测相关(2篇)
【1】 Bridging the Gap Between Object Detection and User Intent via Query-Modulation 标题:通过查询调制弥合对象检测和用户意图之间的差距
作者:Marco Fornoni,Chaochao Yan,Liangchen Luo,Kimberly Wilber,Alex Stark,Yin Cui,Boqing Gong,Andrew Howard 机构:Google Research, University of Texas at Arlington 链接:https://arxiv.org/abs/2106.10258 摘要:当用户通过相机或图片与对象交互时,往往带有特定的意图,例如希望执行视觉搜索。然而,大多数目标检测模型忽略了用户意图,只依赖图像像素作为输入。这通常会导致不正确的结果,例如对感兴趣的对象缺乏高置信度检测,或者检测到了错误的类标签。在本文中,我们研究了对标准目标检测器进行调制的技术,使其显式考虑以简单查询嵌入表示的用户意图。与标准目标检测器相比,查询调制检测器在检测给定感兴趣标签的对象时表现出更优的性能。得益于由标准目标检测标注合成的大规模训练数据,查询调制检测器还可以优于专门的指代表达识别(referring expression recognition)系统。此外,它们可以被同时训练以求解查询调制检测和标准目标检测两类任务。 摘要:When interacting with objects through cameras, or pictures, users often have a specific intent. For example, they may want to perform a visual search. However, most object detection models ignore the user intent, relying on image pixels as their only input. This often leads to incorrect results, such as lack of a high-confidence detection on the object of interest, or detection with a wrong class label. In this paper we investigate techniques to modulate standard object detectors to explicitly account for the user intent, expressed as an embedding of a simple query. Compared to standard object detectors, query-modulated detectors show superior performance at detecting objects for a given label of interest. Thanks to large-scale training data synthesized from standard object detection annotations, query-modulated detectors can also outperform specialized referring expression recognition systems. Furthermore, they can be simultaneously trained to solve for both query-modulated detection and standard object detection.
【2】 Labelling Drifts in a Fault Detection System for Wind Turbine Maintenance 标题:风电机组检修故障检测系统中漂移的标注
作者:Iñigo Martinez,Elisabeth Viles,Iñaki Cabrejas 机构:University of Navarra - Tecnun 备注:None 链接:https://arxiv.org/abs/2106.09951 摘要:故障检测系统是实现预测性维护策略的第一步。一种流行的数据驱动方法是应用机器学习技术(如前馈神经网络(FFNN)或极限学习机(ELM))训练正常行为模型,以检测早期故障和异常。然而,工业资产所处的动态环境中非平稳性的意外出现,可能会使上述任何一种建模技术的性能恶化。被测变量中这种不可预测的统计变化称为概念漂移。本文介绍了一个风电机组维护案例,其中各类非平稳性都可能意外发生。我们希望借助统计检测器和基于窗口的方法检测此类概念漂移事件。然而,在真实的复杂系统中,概念漂移并不像人工生成的数据集中那样清晰明显。为了评估现有漂移检测器的有效性,并为这一特定工业应用设计合适的新技术,必须事先掌握现存漂移的特征刻画。鉴于缺乏这方面的信息,本文提出了一种在风力涡轮机整个寿命期内标注概念漂移事件的方法。该方法将有助于建立一个漂移数据库,它既可作为概念漂移检测器的训练素材,也是增进复杂系统维护知识的宝贵信息。 摘要:A failure detection system is the first step towards predictive maintenance strategies. A popular data-driven method to detect incipient failures and anomalies is the training of normal behaviour models by applying a machine learning technique like feed-forward neural networks (FFNN) or extreme learning machines (ELM). However, the performance of any of these modelling techniques can be deteriorated by the unexpected rise of non-stationarities in the dynamic environment in which industrial assets operate. This unpredictable statistical change in the measured variable is known as concept drift. In this article a wind turbine maintenance case is presented, where non-stationarities of various kinds can happen unexpectedly. Such concept drift events are desired to be detected by means of statistical detectors and window-based approaches. However, in real complex systems, concept drifts are not as clear and evident as in artificially generated datasets. In order to evaluate the effectiveness of current drift detectors and also to design an appropriate novel technique for this specific industrial application, it is essential to dispose beforehand of a characterization of the existent drifts. Under the lack of information in this regard, a methodology for labelling concept drift events in the lifetime of wind turbines is proposed. This methodology will facilitate the creation of a drift database that will serve both as a training ground for concept drift detectors and as a valuable information to enhance the knowledge about maintenance of complex systems.
分类|识别(3篇)
【1】 Fusion of Embeddings Networks for Robust Combination of Text Dependent and Independent Speaker Recognition 标题:融合嵌入网络:文本相关与文本无关说话人识别的稳健结合
作者:Ruirui Li,Chelsea J. -T. Ju,Zeya Chen,Hongda Mao,Oguz Elibol,Andreas Stolcke 机构:Chelsea J.-T. Ju, Amazon Alexa Speech 链接:https://arxiv.org/abs/2106.10169 摘要:通过基于语音输入隐式地识别用户,说话人识别可以支撑许多下游应用,例如个性化的系统行为和快速购物结账。根据语音内容是否受限,可以使用文本相关(TD)和文本无关(TI)两类说话人识别模型。我们希望通过一个集成系统将这两类模型的优点结合起来,做出更可靠的预测。然而,任何这样的组合方法都必须对不完整的输入(即TD或TI输入缺失时)具有鲁棒性。作为解决方案,我们提出了融合嵌入网络(foenet)架构,将联合学习与神经注意力相结合。在一个语音助手输入数据集上,我们将foenet与四种有竞争力的基线方法进行了比较,结果表明它比基线方法和分数融合方法取得了更高的准确率,特别是在存在不完整输入的情况下。 摘要:By implicitly recognizing a user based on his/her speech input, speaker identification enables many downstream applications, such as personalized system behavior and expedited shopping checkouts. Based on whether the speech content is constrained or not, both text-dependent (TD) and text-independent (TI) speaker recognition models may be used. We wish to combine the advantages of both types of models through an ensemble system to make more reliable predictions. However, any such combined approach has to be robust to incomplete inputs, i.e., when either TD or TI input is missing. As a solution we propose a fusion of embeddings network foenet architecture, combining joint learning with neural attention. We compare foenet with four competitive baseline methods on a dataset of voice assistant inputs, and show that it achieves higher accuracy than the baseline and score fusion methods, especially in the presence of incomplete inputs.
【2】 Generalized Learning Vector Quantization for Classification in Randomized Neural Networks and Hyperdimensional Computing 标题:面向随机神经网络与超维计算中分类任务的广义学习矢量量化
作者:Cameron Diao,Denis Kleyko,Jan M. Rabaey,Bruno A. Olshausen 机构:Department of, Computer Science, Rice University, Houston, USA, UC Berkeley, Berkeley, USA, Research Institutes of Sweden, Kista, Sweden, Berkeley Wireless, Research Center, Redwood Center for, Theoretical Neuroscience 备注:10 pages, 7 figures 链接:https://arxiv.org/abs/2106.09821 摘要:部署在边缘设备上的机器学习算法必须满足一定的资源约束和效率要求。随机向量函数链(RVFL)网络由于设计简单、训练高效,在此类应用中备受青睐。我们提出了一种改进的RVFL网络,避免了训练过程中代价高昂的矩阵运算,从而扩大了网络的潜在应用范围。我们的改进将最小二乘分类器替换为广义学习矢量量化(GLVQ)分类器,后者仅使用简单的向量和距离计算。GLVQ分类器也可以看作是对超维计算领域中常用的某些分类算法的改进。所提方法在UCI机器学习库的一组数据集上达到了最先进的精度,高于先前提出的RVFL网络。我们进一步证明,该方法在训练迭代次数受到严格限制的情况下仍能获得较高精度(平均仅需最小二乘分类器21%的计算成本)。 摘要:Machine learning algorithms deployed on edge devices must meet certain resource constraints and efficiency requirements. Random Vector Functional Link (RVFL) networks are favored for such applications due to their simple design and training efficiency. We propose a modified RVFL network that avoids computationally expensive matrix operations during training, thus expanding the network's range of potential applications. Our modification replaces the least-squares classifier with the Generalized Learning Vector Quantization (GLVQ) classifier, which only employs simple vector and distance calculations. The GLVQ classifier can also be considered an improvement upon certain classification algorithms popularly used in the area of Hyperdimensional Computing. The proposed approach achieved state-of-the-art accuracy on a collection of datasets from the UCI Machine Learning Repository - higher than previously proposed RVFL networks. We further demonstrate that our approach still achieves high accuracy while severely limited in training iterations (using on average only 21% of the least-squares classifier computational costs).
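为说明"以GLVQ替换最小二乘读出层为何只需向量与距离运算",下面给出RVFL随机特征加上简化GLVQ原型更新的numpy示意;隐藏层宽度、学习率与sigmoid形式的尺度函数均为假设,并非论文的精确设定。

```python
import numpy as np

rng = np.random.default_rng(0)

def rvfl_features(X, W, b):
    """RVFL式随机特征:随机权重固定,不参与训练;拼接直接链接。"""
    H = np.tanh(X @ W + b)
    return np.hstack([X, H])

def glvq_step(z, y, protos, labels, lr=0.05):
    """简化的GLVQ更新:只涉及距离计算与原型的向量移动。"""
    d = ((protos - z) ** 2).sum(axis=1)              # 到各原型的平方距离
    i_pos = np.where(labels == y, d, np.inf).argmin()  # 最近的同类原型
    i_neg = np.where(labels != y, d, np.inf).argmin()  # 最近的异类原型
    dp, dn = d[i_pos], d[i_neg]
    mu = (dp - dn) / (dp + dn)                       # GLVQ相对距离差
    g = np.exp(-mu) / (1 + np.exp(-mu)) ** 2         # sigmoid导数作为尺度
    protos[i_pos] += lr * g * dn / (dp + dn) ** 2 * (z - protos[i_pos])  # 拉近同类
    protos[i_neg] -= lr * g * dp / (dp + dn) ** 2 * (z - protos[i_neg])  # 推远异类

# 玩具数据:两类二维点
X = np.vstack([rng.normal(-1, 0.5, (50, 2)), rng.normal(1, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
W, b = rng.normal(size=(2, 20)), rng.normal(size=20)
Z = rvfl_features(X, W, b)
protos, labels = Z[[0, 50]].copy(), np.array([0, 1])  # 每类一个原型
for _ in range(20):
    for z, yi in zip(Z, y):
        glvq_step(z, yi, protos, labels)
```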
【3】 On-Device Personalization of Automatic Speech Recognition Models for Disordered Speech 标题:面向言语障碍语音的自动语音识别模型设备端个性化
作者:Katrin Tomanek,Françoise Beaufays,Julie Cattiau,Angad Chandorkar,Khe Chai Sim 机构:Google, USA 链接:https://arxiv.org/abs/2106.10259 摘要:虽然目前最先进的自动语音识别(ASR)系统对典型语音的识别精度很高,但对言语障碍语音(disordered speech)和其他非典型语音模式的识别性能却显著下降。ASR模型的个性化是解决这一问题的常用方法,但通常在基于服务器的训练环境中进行,这会带来数据隐私、模型更新延迟,以及在移动设备与服务器基础设施之间复制数据和模型的通信成本等问题。在本文中,我们提出了一种只需极少量说话人特定数据的设备端ASR个性化方法。我们在由100名言语障碍说话人组成的多样化群体上测试了该方法,发现词错误率的中位相对改善达71%,且每位说话人只需50条简短话语。在一个语音控制的家庭自动化平台上测试时,设备端个性化模型的任务成功率中位数为81%,而未经适应的模型仅为40%。 摘要:While current state-of-the-art Automatic Speech Recognition (ASR) systems achieve high accuracy on typical speech, they suffer from significant performance degradation on disordered speech and other atypical speech patterns. Personalization of ASR models, a commonly applied solution to this problem, is usually performed in a server-based training environment posing problems around data privacy, delayed model-update times, and communication cost for copying data and models between mobile device and server infrastructure. In this paper, we present an approach to on-device based ASR personalization with very small amounts of speaker-specific data. We test our approach on a diverse set of 100 speakers with disordered speech and find median relative word error rate improvement of 71% with only 50 short utterances required per speaker. When tested on a voice-controlled home automation platform, on-device personalized models show a median task success rate of 81%, compared to only 40% of the unadapted models.
表征(4篇)
【1】 A Probabilistic Representation of DNNs: Bridging Mutual Information and Generalization 标题:DNNs的一种概率表示:桥接互信息和泛化
作者:Xinjie Lan,Kenneth Barner 机构:Equal contribution 1Department of Electrical and Computer Engineering, University of Delaware 备注:To appear in the ICML 2021 Workshop on Theoretic Foundation, Criticism, and Application Trend of Explainable AI 链接:https://arxiv.org/abs/2106.10262 摘要:近年来,互信息(MI)在界定深度神经网络(DNN)泛化误差方面引起了广泛关注。然而,在DNN中准确估计MI十分困难,因此以往工作大多不得不放松MI界,这反过来又削弱了对泛化的信息论解释。为解决这一局限,本文引入了一种DNN的概率表示,用于准确估计MI。利用所提出的MI估计器,我们验证了泛化的信息论解释,并得到了比最新松弛更紧的泛化界。 摘要:Recently, Mutual Information (MI) has attracted attention in bounding the generalization error of Deep Neural Networks (DNNs). However, it is intractable to accurately estimate the MI in DNNs, thus most previous works have to relax the MI bound, which in turn weakens the information theoretic explanation for generalization. To address the limitation, this paper introduces a probabilistic representation of DNNs for accurately estimating the MI. Leveraging the proposed MI estimator, we validate the information theoretic explanation for generalization, and derive a tighter generalization bound than the state-of-the-art relaxations.
【2】 It's FLAN time! Summing feature-wise latent representations for interpretability 标题:FLAN时间到了!对逐特征潜在表示求和以获得可解释性
作者:An-phi Nguyen,Maria Rodriguez Martinez 机构:IBM Research Europe, ETH Zürich, Zürich, Switzerland, Maria Rodríguez Martínez 链接:https://arxiv.org/abs/2106.10086 摘要:可解释性已成为机器学习模型部署于关键场景(如法律系统、医疗保健)的必要特性。在这些情况下,算法决策可能对受其影响的最终用户产生(潜在负面的)长期影响。在许多情况下,并不需要深度学习模型的表达能力,因此应首选简单且可解释的模型(例如线性模型)。然而,在高维和/或复杂领域(如计算机视觉),神经网络的通用逼近能力是必需的。受线性模型和Kolmogorov-Arnold表示定理的启发,我们提出了一类新的结构受约束的神经网络,我们称之为FLANs(逐特征潜在可加网络)。关键在于,FLANs分别处理每个输入特征,为每个特征计算公共潜空间中的一个表示。然后简单地对这些逐特征的潜在表示求和,并使用聚合后的表示进行预测。这些约束(正是线性模型可解释性的核心)允许用户独立于其他特征估计每个特征的效果,从而增强可解释性。在一组跨不同领域的实验中,我们展示了在不过度损害测试性能的前提下,FLANs中提出的结构约束确实提高了深度学习模型的可解释性。 摘要:Interpretability has become a necessary feature for machine learning models deployed in critical scenarios, e.g. legal systems, healthcare. In these situations, algorithmic decisions may have (potentially negative) long-lasting effects on the end-user affected by the decision. In many cases, the representational power of deep learning models is not needed, therefore simple and interpretable models (e.g. linear models) should be preferred. However, in high-dimensional and/or complex domains (e.g. computer vision), the universal approximation capabilities of neural networks is required. Inspired by linear models and the Kolmogorov-Arnold representation theorem, we propose a novel class of structurally-constrained neural networks, which we call FLANs (Feature-wise Latent Additive Networks). Crucially, FLANs process each input feature separately, computing for each of them a representation in a common latent space. These feature-wise latent representations are then simply summed, and the aggregated representation is used for prediction. These constraints (which are at the core of the interpretability of linear models) allow an user to estimate the effect of each individual feature independently from the others, enhancing interpretability. In a set of experiments across different domains, we show how without compromising excessively the test performance, the structural constraints proposed in FLANs indeed increase the interpretability of deep learning models.
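FLAN的结构约束(每个特征由独立子网络编码到共享潜空间,再对潜表示求和)可直接写成如下PyTorch示意模块;隐藏宽度与层数为假设值,并非论文的具体配置。

```python
import torch
import torch.nn as nn

class FLAN(nn.Module):
    """Feature-wise Latent Additive Network 的极简示意实现。"""
    def __init__(self, n_features: int, latent_dim: int = 16, n_classes: int = 2):
        super().__init__()
        # 每个输入特征有自己独立的小网络,特征之间不交互
        self.encoders = nn.ModuleList([
            nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, latent_dim))
            for _ in range(n_features)
        ])
        self.head = nn.Linear(latent_dim, n_classes)

    def forward(self, x):                        # x: (batch, n_features)
        # 逐特征编码到共享潜空间后求和:可加结构是可解释性的来源
        z = sum(enc(x[:, i : i + 1]) for i, enc in enumerate(self.encoders))
        return self.head(z)

model = FLAN(n_features=5)
logits = model(torch.randn(8, 5))                # 输出形状 (8, 2)
```

由于预测是各特征潜表示之和的函数,单独前向传播某一特征的编码器,就能独立评估该特征的贡献,这正是摘要所述可解释性的来源。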
【3】 Investigating the Role of Negatives in Contrastive Representation Learning 标题:负例在对比表征学习中的作用研究
作者:Jordan T. Ash,Surbhi Goel,Akshay Krishnamurthy,Dipendra Misra 机构:Microsoft Research NYC 链接:https://arxiv.org/abs/2106.09943 摘要:噪声对比学习是一种流行的无监督表征学习方法。在这种方法中,表示通过归约到监督学习来获得:在给定语义相似性概念的情况下,学习者试图从一组随机(负)示例中区分出相似(正)示例。现代对比学习管线的成功依赖于许多参数,如数据增强的选择、负例的数量和批大小;然而,对于这些参数如何相互作用并影响下游性能,人们的理解还很有限。我们专注于厘清其中一个参数的作用:负例的数量。理论上,我们证明了碰撞与覆盖之间权衡(collision-coverage trade-off)的存在,这表明最优负例数量应与数据中潜在概念的数量成比例。我们在NLP和视觉任务中仔细考察了负例数量所起的作用。在NLP任务中,我们发现结果与我们的理论基本一致,而视觉实验的结果则更不明朗,性能有时甚至对负例数量不敏感。我们讨论了这种行为的合理解释,并提出了未来更好地协调理论与实践的方向。 摘要:Noise contrastive learning is a popular technique for unsupervised representation learning. In this approach, a representation is obtained via reduction to supervised learning, where given a notion of semantic similarity, the learner tries to distinguish a similar (positive) example from a collection of random (negative) examples. The success of modern contrastive learning pipelines relies on many parameters such as the choice of data augmentation, the number of negative examples, and the batch size; however, there is limited understanding as to how these parameters interact and affect downstream performance. We focus on disambiguating the role of one of these parameters: the number of negative examples. Theoretically, we show the existence of a collision-coverage trade-off suggesting that the optimal number of negative examples should scale with the number of underlying concepts in the data. Empirically, we scrutinize the role of the number of negatives in both NLP and vision tasks. In the NLP task, we find that the results broadly agree with our theory, while our vision experiments are murkier with performance sometimes even being insensitive to the number of negatives. We discuss plausible explanations for this behavior and suggest future directions to better align theory and practice.
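作为摘要所讨论的"负例数量 k"这一参数的具体落点,下面给出带 k 个负例的InfoNCE式对比损失的极简实现;温度系数取常见默认值,并非论文设置。

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, temperature=0.1):
    """anchor/positive: (d,), negatives: (k, d);k 即摘要中讨论的负例数量。"""
    a = F.normalize(anchor, dim=0)
    cand = torch.cat([positive.unsqueeze(0), negatives], dim=0)   # (k+1, d)
    logits = F.normalize(cand, dim=1) @ a / temperature
    # 正例在候选集中的下标为0:对比损失即标准交叉熵形式
    return F.cross_entropy(logits.unsqueeze(0), torch.zeros(1, dtype=torch.long))

d, k = 128, 16
loss = info_nce(torch.randn(d), torch.randn(d), torch.randn(k, d))
```

k 越大覆盖的"概念"越多,但来自同一潜在概念的负例与正例发生"碰撞"的概率也随之上升,这正是摘要中碰撞与覆盖之间权衡的直观含义。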
【4】 On Contrastive Representations of Stochastic Processes 标题:关于随机过程的对比表示
作者:Emile Mathieu,Adam Foster,Yee Whye Teh 机构:† Department of Statistics, University of Oxford, United Kingdom, ‡ Deepmind, United Kingdom 链接:https://arxiv.org/abs/2106.10052 摘要:随机过程的学习表示是机器学习中的一个新兴问题,从元学习到物理对象模型再到时间序列。典型的方法依赖于观测值的精确重建,但这种方法随着观测值变得高维或噪声分布变得复杂而失效。为了解决这个问题,我们提出了一个学习随机过程对比表示(CRESP)的统一框架,该框架不需要精确重构。我们剖析了随机过程表示的潜在用例,并提出了适应每种情况的方法。实验证明,我们的方法对于学习周期函数、三维物体和动力学过程的表示是有效的。我们的方法比传统方法更能容忍高维的噪声观测,并且学习到的表征可以转移到一系列的下游任务中。 摘要:Learning representations of stochastic processes is an emerging problem in machine learning with applications from meta-learning to physical object models to time series. Typical methods rely on exact reconstruction of observations, but this approach breaks down as observations become high-dimensional or noise distributions become complex. To address this, we propose a unifying framework for learning contrastive representations of stochastic processes (CRESP) that does away with exact reconstruction. We dissect potential use cases for stochastic process representations, and propose methods that accommodate each. Empirically, we show that our methods are effective for learning representations of periodic functions, 3D objects and dynamical processes. Our methods tolerate noisy high-dimensional observations better than traditional approaches, and the learned representations transfer to a range of downstream tasks.
编码器(1篇)
【1】 Autoencoder-based cleaning in probabilistic databases 标题:概率数据库中基于自动编码器的清理
作者:R. R. Mauritz,F. P. J. Nijweide,J. Goseling,M. van Keulen 机构:University of Twente, Enschede, NL 备注:Submitted to ACM Journal of Data and Information Quality, Special Issue on Deep Learning for Data Quality 链接:https://arxiv.org/abs/2106.09764 摘要:在数据集成领域,数据的提取、组合与合并往往会遇到数据质量问题。概率数据集成方法将此类问题的信息表示为概率数据库中的不确定性。在本文中,我们提出了一种能够近乎自动地改进数据质量的数据清理自动编码器。它学习数据中的结构和依赖关系,以识别并更正可疑值。我们给出了一个理论框架,实验表明该方法能有效去除分类型和数值型概率数据中的大量噪声。我们的方法不需要干净的数据;不过,我们也证明了手动清理一小部分数据可以显著提高性能。 摘要:In the field of data integration, data quality problems are often encountered when extracting, combining, and merging data. The probabilistic data integration approach represents information about such problems as uncertainties in a probabilistic database. In this paper, we propose a data-cleaning autoencoder capable of near-automatic data quality improvement. It learns the structure and dependencies in the data to identify and correct doubtful values. A theoretical framework is provided, and experiments show that it can remove significant amounts of noise from categorical and numeric probabilistic data. Our method does not require clean data. We do, however, show that manually cleaning a small fraction of the data significantly improves performance.
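摘要描述的流程(学习数据中的结构与依赖、用重构误差识别并更正可疑值)可用如下numpy示意勾勒;这里以截断SVD充当线性自编码器、以简单阈值判定可疑单元格,替代论文中面向概率数据库的完整设计。

```python
import numpy as np

def fit_linear_ae(X, k=2):
    """用截断SVD充当线性自编码器:编码 = 前k个主方向上的投影。"""
    mu = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]

def clean(X, mu, V, z_thresh=3.0):
    X_hat = mu + (X - mu) @ V.T @ V              # 重构
    err = (X - X_hat) ** 2
    # 重构误差异常大的单元格视为可疑值,并用重构值更正
    suspicious = err > err.mean() + z_thresh * err.std()
    return np.where(suspicious, X_hat, X), suspicious

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
X[:, 4] = X[:, :2].sum(axis=1)                   # 构造列间依赖
X[0, 4] = 10.0                                   # 注入一个错误值
mu, V = fit_linear_ae(X, k=3)
X_clean, flags = clean(X, mu, V)                 # flags[0, 4] 应被标记
```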
优化|敛散性(4篇)
【1】 Batch Multi-Fidelity Bayesian Optimization with Deep Auto-Regressive Networks 标题:基于深度自回归网络的批量多保真贝叶斯优化
作者:Shibo Li,Robert M. Kirby,Shandian Zhe 机构:School of Computing, University of Utah, Salt Lake City, UT 链接:https://arxiv.org/abs/2106.09884 摘要:贝叶斯优化(BO)是优化评估代价高昂的黑盒函数的一种强大方法。为了在成本和精度之间实现灵活的权衡,许多应用允许在不同的保真度下对函数进行评估。为了在最大化效益成本比的同时降低优化成本,本文提出了基于深度自回归网络的批量多保真度贝叶斯优化算法(BMBO-DARN)。我们使用一组贝叶斯神经网络构造一个完全自回归的模型,该模型具有足够的表达能力来捕捉所有保真度之间强大而复杂的关系,从而改进代理学习和优化性能。此外,为了提高查询的质量和多样性,我们开发了一种简单而高效的批量查询方法,不需要对保真度进行任何组合搜索。我们提出了一个基于最大值熵搜索(MES)原理的批量采集函数,它惩罚高度相关的查询并鼓励多样性。我们使用后验样本和矩匹配来实现采集函数的高效计算,并对每个保真度-输入对进行交替优化,保证了每一步的改进。我们在四个实际的超参数优化应用中展示了该方法的优势。 摘要:Bayesian optimization (BO) is a powerful approach for optimizing black-box, expensive-to-evaluate functions. To enable a flexible trade-off between the cost and accuracy, many applications allow the function to be evaluated at different fidelities. In order to reduce the optimization cost while maximizing the benefit-cost ratio, in this paper, we propose Batch Multi-fidelity Bayesian Optimization with Deep Auto-Regressive Networks (BMBO-DARN). We use a set of Bayesian neural networks to construct a fully auto-regressive model, which is expressive enough to capture strong yet complex relationships across all the fidelities, so as to improve the surrogate learning and optimization performance. Furthermore, to enhance the quality and diversity of queries, we develop a simple yet efficient batch querying method, without any combinatorial search over the fidelities. We propose a batch acquisition function based on Max-value Entropy Search (MES) principle, which penalizes highly correlated queries and encourages diversity. We use posterior samples and moment matching to fulfill efficient computation of the acquisition function and conduct alternating optimization over every fidelity-input pair, which guarantees an improvement at each step. We demonstrate the advantage of our approach on four real-world hyperparameter optimization applications.
【2】 Shuffle Private Stochastic Convex Optimization 标题:洗牌隐私下的随机凸优化
作者:Albert Cheu,Matthew Joseph,Jieming Mao,Binghui Peng 链接:https://arxiv.org/abs/2106.09805 摘要:在洗牌隐私(shuffle privacy)模型中,每个用户向一个可信的洗牌器(shuffler)发送一组随机化消息,洗牌器随机排列这些消息,而排列后得到的消息集必须满足差分隐私。该模型的前期工作主要集中在使用单轮通信计算均值、直方图和计数等算法原语的协议上。在这项工作中,我们提出了用于随机凸优化的交互式洗牌协议。我们的优化协议依赖于一个新的非交互协议来计算有界$\ell_2$范数向量之和。通过将该求和子程序与小批量随机梯度下降、加速梯度下降和Nesterov平滑方法等技术相结合,我们为各种凸损失函数获得了损失保证,这些保证显著优于本地模型的已有结果,有时还能与中心模型的保证相匹配。 摘要:In shuffle privacy, each user sends a collection of randomized messages to a trusted shuffler, the shuffler randomly permutes these messages, and the resulting shuffled collection of messages must satisfy differential privacy. Prior work in this model has largely focused on protocols that use a single round of communication to compute algorithmic primitives like means, histograms, and counts. In this work, we present interactive shuffle protocols for stochastic convex optimization. Our optimization protocols rely on a new noninteractive protocol for summing vectors of bounded $\ell_2$ norm. By combining this sum subroutine with techniques including mini-batch stochastic gradient descent, accelerated gradient descent, and Nesterov's smoothing method, we obtain loss guarantees for a variety of convex loss functions that significantly improve on those of the local model and sometimes match those of the central model.
【3】 Escaping strict saddle points of the Moreau envelope in nonsmooth optimization 标题:非光滑优化中逃避Moreau包络的严格鞍点
作者:Damek Davis,Mateo Díaz,Dmitriy Drusvyatskiy 机构:Mateo D´ıaz† 备注:29 pages, 1 figure 链接:https://arxiv.org/abs/2106.09815 摘要:最近的研究表明,随机扰动梯度方法可以有效地逃离光滑函数的严格鞍点。我们通过分析应用于Moreau包络的随机扰动梯度法的一个不精确类比,将这一工作扩展到非光滑优化。主要结论是,各种非光滑优化算法都能以可控的速率逃离Moreau包络的严格鞍点。主要的技术洞见是:应用于近端子问题的典型算法所产生的方向,在相对误差意义下近似Moreau包络的梯度。 摘要:Recent work has shown that stochastically perturbed gradient methods can efficiently escape strict saddle points of smooth functions. We extend this body of work to nonsmooth optimization, by analyzing an inexact analogue of a stochastically perturbed gradient method applied to the Moreau envelope. The main conclusion is that a variety of algorithms for nonsmooth optimization can escape strict saddle points of the Moreau envelope at a controlled rate. The main technical insight is that typical algorithms applied to the proximal subproblem yield directions that approximate the gradient of the Moreau envelope in relative terms.
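借助恒等式 $\nabla f_\lambda(x)=(x-\mathrm{prox}_{\lambda f}(x))/\lambda$,"对Moreau包络做随机扰动梯度步"可以写成如下一维示意:取非光滑函数 $f(x)=|x|$,其近端算子有闭式解(软阈值);扰动幅度与步长为假设值,仅用于演示该机制。

```python
import numpy as np

def prox_abs(x, lam):
    # f(t) = |t| 的近端算子:软阈值
    return np.sign(x) * max(abs(x) - lam, 0.0)

def moreau_grad(x, lam):
    # Moreau包络梯度的标准恒等式:(x - prox_{lam f}(x)) / lam
    return (x - prox_abs(x, lam)) / lam

rng = np.random.default_rng(0)
x, lam, lr = 3.0, 0.5, 0.2
for _ in range(50):
    g = moreau_grad(x, lam) + 0.01 * rng.normal()   # 随机扰动梯度步
    x -= lr * g
print(x)   # 收敛到 f 的极小点 0 附近
```

在论文的设定中,近端子问题一般没有闭式解,只能由内层算法不精确求解;上面的闭式 prox 对应的正是被"不精确类比"所近似的理想情形。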
【4】 Gradient-free optimization of chaotic acoustics with reservoir computing 标题:基于储层计算的混沌声学无梯度优化
作者:Francisco Huhn,Luca Magri 机构:University of Cambridge, Department of Engineering, United Kingdom, Imperial College London, Aeronautics Department, United Kingdom, The Alan Turing Institute, United Kingdom, Institute of Advanced Study, TU Munich, Germany (visiting) 备注:15 figures, 23 pages 链接:https://arxiv.org/abs/2106.09780 摘要:我们发展了一种通用的优化方法,用于寻找使时间平均声学成本函数最小化的设计参数。该方法无梯度、融入模型信息且数据驱动,基于回声状态网络实现储层计算。首先,我们分析了回声状态网络对短期和长期动力学的预测能力。我们发现,无论是完全数据驱动的架构还是融入模型信息的架构,都能在时间精度和统计意义上学习混沌声学动力学。用仅含一个声学模态的物理降阶模型为训练提供信息,显著提高了回声状态网络的精度和鲁棒性,同时保持较低的计算成本。回声状态网络能够精确预测长时间动力学;若改为通过积分控制方程来评估待优化的时间平均量,代价将非常高昂。其次,我们将回声状态网络与贝叶斯技术相结合,探索热声设计参数空间。该计算方法的侵入性极小。第三,我们找到了一组使混沌振荡的时间平均声能最小化的火焰参数;这些混沌振荡由热源(例如燃气轮机或火箭发动机中的火焰)的正反馈引起,称为热声振荡。找到的最优火焰参数集与蛮力网格搜索的精度相同,但收敛速度快一个数量级以上。这项工作为混沌系统的非侵入式优化开辟了新的可能性,此类场景中产生数据的成本很高,例如来自高保真模拟和实验的数据。 摘要:We develop a versatile optimization method, which finds the design parameters that minimize time-averaged acoustic cost functionals. The method is gradient-free, model-informed, and data-driven with reservoir computing based on echo state networks. First, we analyse the predictive capabilities of echo state networks both in the short- and long-time prediction of the dynamics. We find that both fully data-driven and model-informed architectures learn the chaotic acoustic dynamics, both time-accurately and statistically. Informing the training with a physical reduced-order model with one acoustic mode markedly improves the accuracy and robustness of the echo state networks, whilst keeping the computational cost low. Echo state networks offer accurate predictions of the long-time dynamics, which would be otherwise expensive by integrating the governing equations to evaluate the time-averaged quantity to optimize. Second, we couple echo state networks with a Bayesian technique to explore the design thermoacoustic parameter space. The computational method is minimally intrusive. Third, we find the set of flame parameters that minimize the time-averaged acoustic energy of chaotic oscillations, which are caused by the positive feedback with a heat source, such as a flame in gas turbines or rocket motors. These oscillations are known as thermoacoustic oscillations. The optimal set of flame parameters is found with the same accuracy as brute-force grid search, but with a convergence rate that is more than one order of magnitude faster. This work opens up new possibilities for non-intrusive (``hands-off'') optimization of chaotic systems, in which the cost of generating data, for example from high-fidelity simulations and experiments, is high.
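摘要中回声状态网络的基本配方(固定的随机储层做状态更新,仅用岭回归训练线性读出层)可用如下numpy示意实现;谱半径缩放与岭系数为常见默认做法,与论文的具体网络设置无关。

```python
import numpy as np

rng = np.random.default_rng(0)

def make_reservoir(n=200, rho=0.9):
    W = rng.normal(size=(n, n))
    return W * (rho / np.abs(np.linalg.eigvals(W)).max())  # 缩放谱半径至 rho

def run_esn(u, W, W_in):
    """储层状态更新:h_{t+1} = tanh(W h_t + W_in u_t);储层权重固定不训练。"""
    h, states = np.zeros(W.shape[0]), []
    for ut in u:
        h = np.tanh(W @ h + W_in * ut)
        states.append(h.copy())
    return np.array(states)

# 玩具时间序列上训练线性读出层做一步预测(岭回归)
t = np.arange(2000) * 0.1
u = np.sin(t) + 0.5 * np.sin(0.7 * t)
W, W_in = make_reservoir(), rng.normal(size=200)
S = run_esn(u[:-1], W, W_in)
ridge = 1e-6
W_out = np.linalg.solve(S.T @ S + ridge * np.eye(200), S.T @ u[1:])
pred = S @ W_out            # 对 u[1:] 的一步预测;闭环迭代即可做长期预测
```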
预测|估计(5篇)
【1】 Predicting gender of Brazilian names using deep learning 标题:基于深度学习的巴西人名性别预测
作者:Rosana C. B. Rego,Verônica M. L. Silva 机构:Program in Electrical and Computer Engineering, Federal University of Rio Grande do Norte, Brazil, Department of Engineering and Technology, Federal Rural University of Semi-Arid, Brazil 备注:9 pages, 8 figures 链接:https://arxiv.org/abs/2106.10156 摘要:通过名字预测性别并不是一项简单的任务。在许多应用中,特别是在自然语言处理(NLP)领域,这项任务可能是必要的,尤其是在考虑外国名字时。一些机器学习算法可以令人满意地完成这一预测。在本文中,我们考察并实现了前馈和循环深度神经网络模型(如MLP、RNN、GRU、CNN和BiLSTM),以根据名字对性别进行分类。我们使用一个巴西人名数据集训练和评估这些模型,并通过准确率、召回率、精确率和混淆矩阵来衡量模型的性能。结果表明,将人名视为字符串集合的特征提取策略可以用于性别预测:一些模型在90%以上的案例中准确预测了性别。在这一二分类问题上,循环模型的表现优于前馈模型。 摘要:Predicting gender by the name is not a simple task. In many applications, especially in the natural language processing (NLP) field, this task may be necessary, mainly when considering foreign names. Some machine learning algorithms can satisfactorily perform the prediction. In this paper, we examined and implemented feedforward and recurrent deep neural network models, such as MLP, RNN, GRU, CNN, and BiLSTM, to classify gender through the first name. A dataset of Brazilian names is used to train and evaluate the models. We analyzed the accuracy, recall, precision, and confusion matrix to measure the models' performances. The results indicate that the gender prediction can be performed from the feature extraction strategy looking at the names as a set of strings. Some models accurately predict the gender in more than 90% of the cases. The recurrent models overcome the feedforward models in this binary classification problem.
【2】 ScoreGrad: Multivariate Probabilistic Time Series Forecasting with Continuous Energy-based Generative Models 标题:ScoreGrad:基于连续能量生成模型的多变量概率时间序列预测
作者:Tijin Yan,Hongwei Zhang,Tong Zhou,Yufeng Zhan,Yuanqing Xia 机构: Zhan are with the School of Automation, Beijing Institute of Technology 备注:12 pages, 10 figures 链接:https://arxiv.org/abs/2106.10121 摘要:多元时间序列预测因其在智能交通、AIOps等领域的广泛应用而受到广泛关注。生成模型由于能够对数据分布建模并考虑噪声的影响,在时间序列建模中取得了令人瞩目的成果。然而,由于生成模型函数形式的限制或对超参数的敏感性,许多现有工作无法得到广泛应用。本文提出了基于连续能量生成模型的多元概率时间序列预测框架ScoreGrad。ScoreGrad由时间序列特征提取模块和基于条件随机微分方程的分数匹配模块组成,预测可通过迭代求解逆时间SDE实现。据我们所知,ScoreGrad是第一个用于时间序列预测的基于连续能量的生成模型。此外,ScoreGrad在六个真实数据集上取得了最先进的结果。我们还探讨了超参数和采样器类型对性能的影响。代码位于https://github.com/yantijin/ScoreGradPred. 摘要:Multivariate time series prediction has attracted a lot of attention because of its wide applications such as intelligent transportation, AIOps. Generative models have achieved impressive results in time series modeling because they can model data distribution and take noise into consideration. However, many existing works can not be widely used because of the constraints of functional form of generative models or the sensitivity to hyperparameters. In this paper, we propose ScoreGrad, a multivariate probabilistic time series forecasting framework based on continuous energy-based generative models. ScoreGrad is composed of time series feature extraction module and conditional stochastic differential equation based score matching module. The prediction can be achieved by iteratively solving reverse-time SDE. To the best of our knowledge, ScoreGrad is the first continuous energy based generative model used for time series forecasting. Furthermore, ScoreGrad achieves state-of-the-art results on six real-world datasets. The impact of hyperparameters and sampler types on the performance are also explored. Code is available at https://github.com/yantijin/ScoreGradPred.
【3】 PAC Prediction Sets Under Covariate Shift 标题:协变量漂移下的PAC预测集
作者:Sangdon Park,Edgar Dobriban,Insup Lee,Osbert Bastani 机构:Dept. of Computer & Info. Science, PRECISE Center, University of Pennsylvania, Dept. of Statistics & Data Science, The Wharton School 链接:https://arxiv.org/abs/2106.09848 摘要:现代机器学习面临的一个重要挑战是如何严格量化模型预测的不确定性。当底层数据分布发生变化并可能使预测模型失效时,传达不确定性尤为重要。然而,大多数现有的不确定性量化算法在出现这种变化时会失效。我们提出了一种新方法,通过在协变量漂移下构造可能近似正确(PAC)预测集来应对这一挑战。我们的方法聚焦于从源分布(我们拥有带标签的训练样本)到目标分布(我们要量化不确定性)发生协变量漂移的设置。我们的算法假设给定了重要性权重,其编码了训练样本的概率在协变量漂移下如何变化。在实践中,重要性权重通常需要估计;因此,我们将算法扩展到只给定重要性权重置信区间而非真实值的设置。我们在基于DomainNet和ImageNet数据集设计的各种协变量漂移上证明了该方法的有效性。 摘要:An important challenge facing modern machine learning is how to rigorously quantify the uncertainty of model predictions. Conveying uncertainty is especially important when there are changes to the underlying data distribution that might invalidate the predictive model. Yet, most existing uncertainty quantification algorithms break down in the presence of such shifts. We propose a novel approach that addresses this challenge by constructing \emph{probably approximately correct (PAC)} prediction sets in the presence of covariate shift. Our approach focuses on the setting where there is a covariate shift from the source distribution (where we have labeled training examples) to the target distribution (for which we want to quantify uncertainty). Our algorithm assumes given importance weights that encode how the probabilities of the training examples change under the covariate shift. In practice, importance weights typically need to be estimated; thus, we extend our algorithm to the setting where we are given confidence intervals for the importance weights rather than their true value. We demonstrate the effectiveness of our approach on various covariate shifts designed based on the DomainNet and ImageNet datasets.
【4】 Machining Cycle Time Prediction: Data-driven Modelling of Machine Tool Feedrate Behavior with Neural Networks 标题:加工周期预测:基于神经网络的机床进给行为数据驱动建模
作者:Chao Sun,Javier Dominguez-Caballero,Rob Ward,Sabino Ayvar-Soberanis,David Curtis 机构: Advanced Manufacturing Research Centre, University of Sheffield, Rotherham, S,TZ, UK 链接:https://arxiv.org/abs/2106.09719 摘要:准确预测加工周期时间在制造业中具有重要意义。通常,计算机辅助制造(CAM)软件只使用基本的运动学设置,依据刀轨文件中的指令进给速度来估计加工时间。这些方法一般不考虑刀轨几何形状或刀轨公差,因此会大幅低估加工周期时间。本文为机床的每个轴建立神经网络模型,提出了一种无需机床专有知识的数据驱动进给速度与加工周期时间预测方法。在这项研究中,由指令进给速度、名义加速度、刀轨几何形状和实测进给速度组成的数据集被用来训练神经网络模型。在商用加工中心上对一个有代表性的工业薄壁构件进行的验证试验表明,该方法估计加工时间的准确率超过90%。该方法表明,神经网络模型有能力学习复杂机床系统的行为并预测加工周期时间。这些方法的进一步集成对于工业4.0中数字孪生的落地实施至关重要。 摘要:Accurate prediction of machining cycle times is important in the manufacturing industry. Usually, Computer Aided Manufacturing (CAM) software estimates the machining times using the commanded feedrate from the toolpath file using basic kinematic settings. Typically, the methods do not account for toolpath geometry or toolpath tolerance and therefore under estimate the machining cycle times considerably. Removing the need for machine specific knowledge, this paper presents a data-driven feedrate and machining cycle time prediction method by building a neural network model for each machine tool axis. In this study, datasets composed of the commanded feedrate, nominal acceleration, toolpath geometry and the measured feedrate were used to train a neural network model. Validation trials using a representative industrial thin wall structure component on a commercial machining centre showed that this method estimated the machining time with more than 90% accuracy. This method showed that neural network models have the capability to learn the behavior of a complex machine tool system and predict cycle times. Further integration of the methods will be critical in the implementation of digital twins in Industry 4.0.
【5】 Efficient Black-Box Importance Sampling for VaR and CVaR Estimation 标题:VaR和CVaR估计的有效黑箱重要抽样
作者:Anand Deo,Karthyek Murthy 机构:Singapore University of Technology and Design, Somapah Rd, Singapore 链接:https://arxiv.org/abs/2106.10236 摘要:本文考虑用重要性抽样(IS)估计由机器学习特征映射或混合整数线性优化公式等复杂对象所定义损失的尾部风险。在仅能黑箱访问损失函数及底层随机向量分布的假设下,本文提出了一种高效估计风险价值(VaR)和条件风险价值(CVaR)的IS算法。任何IS过程的关键挑战,即找到合适的测度变换,都由一种自结构化IS变换自动完成:该变换从不那么罕见的样本中学习并复制条件超额分布的集中(concentration)特性。在对数尺度下,所得估计量具有渐近最优的方差缩减。仿真实验验证了该方案的有效性和实用性。 摘要:This paper considers Importance Sampling (IS) for the estimation of tail risks of a loss defined in terms of a sophisticated object such as a machine learning feature map or a mixed integer linear optimisation formulation. Assuming only black-box access to the loss and the distribution of the underlying random vector, the paper presents an efficient IS algorithm for estimating the Value at Risk and Conditional Value at Risk. The key challenge in any IS procedure, namely, identifying an appropriate change-of-measure, is automated with a self-structuring IS transformation that learns and replicates the concentration properties of the conditional excess from less rare samples. The resulting estimators enjoy asymptotically optimal variance reduction when viewed in the logarithmic scale. Simulation experiments highlight the efficacy and practicality of the proposed scheme.
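重要性抽样估计VaR/CVaR的一个极简数值示意如下:对标准正态损失用均值平移(指数倾斜)作为提议分布,以似然比为权重求加权分位数与尾部加权均值;平移量在此为手工假设,论文中的自结构化变换要远为复杂。

```python
import numpy as np

rng = np.random.default_rng(0)

def is_var_cvar(alpha=0.999, shift=3.0, n=20_000):
    # 提议分布 q = N(shift, 1);重要性权重 w = p(x)/q(x)
    x = rng.normal(shift, 1.0, n)
    w = np.exp(-shift * x + 0.5 * shift ** 2)   # N(0,1) / N(shift,1) 的似然比
    idx = np.argsort(x)
    x, w = x[idx], w[idx]
    cum = np.cumsum(w) / w.sum()                # 自归一化的加权经验分布
    var = x[np.searchsorted(cum, alpha)]        # 加权分位数 ≈ VaR
    tail = x >= var
    cvar = (w[tail] * x[tail]).sum() / w[tail].sum()  # 尾部加权均值 ≈ CVaR
    return var, cvar

print(is_var_cvar())   # 对比真值:标准正态的 VaR_0.999 ≈ 3.09
```

把提议分布的质量移到尾部后,原本极少被采到的超额事件变得常见,方差因此大幅下降,这正是摘要所述"从不那么罕见的样本中学习"的基本动机。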
其他神经网络|深度学习|模型|建模(17篇)
【1】 An Empirical Investigation into Deep and Shallow Rule Learning 标题:深层规则学习与浅层规则学习的实证研究
作者:Florian Beck,Johannes Fürnkranz 机构:Application-oriented Knowledge Processing (FAW), Department of Computer Science, Johannes Kepler University Linz, Austria 链接:https://arxiv.org/abs/2106.10254 摘要:归纳规则学习可以说是机器学习中最传统的范式之一。尽管多年来我们在学习基于规则的理论方面取得了相当大的进步,但所有最先进的学习器仍然在学习直接将输入特征与目标概念联系起来的描述。在最简单的概念学习情形下,学习到的是正类的析取范式(DNF)描述。从逻辑的角度看这显然已经足够,因为任何逻辑表达式都可以化为等价的DNF表达式;然而,通过形成中间概念来构成深层理论的更结构化表示,可能更容易学习,这与深度神经网络优于浅层网络的情形颇为相似,尽管后者同样是通用函数逼近器。在本文中,我们使用一种基于贪心小批量优化的统一通用算法,对深层与浅层规则学习进行了实证比较。我们在人工和真实基准数据上的实验表明,深层规则网络的性能优于浅层网络。 摘要:Inductive rule learning is arguably among the most traditional paradigms in machine learning. Although we have seen considerable progress over the years in learning rule-based theories, all state-of-the-art learners still learn descriptions that directly relate the input features to the target concept. In the simplest case, concept learning, this is a disjunctive normal form (DNF) description of the positive class. While it is clear that this is sufficient from a logical point of view because every logical expression can be reduced to an equivalent DNF expression, it could nevertheless be the case that more structured representations, which form deep theories by forming intermediate concepts, could be easier to learn, in very much the same way as deep neural networks are able to outperform shallow networks, even though the latter are also universal function approximators. In this paper, we empirically compare deep and shallow rule learning with a uniform general algorithm, which relies on greedy mini-batch based optimization. Our experiments on both artificial and real-world benchmark data indicate that deep rule networks outperform shallow networks.
【2】 Distributed Deep Learning in Open Collaborations 标题:开放协作中的分布式深度学习
作者:Michael Diskin,Alexey Bukhtiyarov,Max Ryabinin,Lucile Saulnier,Quentin Lhoest,Anton Sinitsin,Dmitry Popov,Dmitry Pyrkin,Maxim Kashirin,Alexander Borzunov,Albert Villanova del Moral,Denis Mazur,Ilia Kobelev,Yacine Jernite,Thomas Wolf,Gennady Pekhimenko 机构:† Yandex, Russia, ‡ Hugging Face, USA, ♥ HSE University, Russia, ♣ Moscow Institute of Physics and Technology, Russia, ♦ University of Toronto, Canada, ♠ Vector Institute, Canada 备注:30 pages, 9 figures. Code: this https URL 链接:https://arxiv.org/abs/2106.10207 摘要:现代深度学习应用需要越来越多的计算资源来训练最先进的模型。为了满足这一需求,大型公司和机构使用专用的高性能计算集群,这些集群的构建和维护不仅对环境代价高昂,而且远超大多数组织的预算。因此,一些研究方向成为少数大型工业界乃至更少学术界参与者的专属领域。为了缓解这种差距,较小的团队可以汇集各自的计算资源,开展使所有参与者受益的协作实验。这种范式被称为网格计算或志愿计算,在许多科学领域都有成功的应用。然而,由于高延迟、非对称带宽以及志愿计算特有的若干挑战,用这种方式做机器学习十分困难。在这项工作中,我们仔细分析了这些约束,并提出了一个专为协作训练设计的新算法框架。我们在真实条件下验证了该方法对SwAV和ALBERT预训练的有效性,以很小的成本实现了与传统设置相当的性能。最后,我们提供了一份40名参与者成功开展协作语言模型预训练的详细报告。 摘要:Modern deep learning applications require increasingly more compute to train state-of-the-art models. To address this demand, large corporations and institutions use dedicated High-Performance Computing clusters, whose construction and maintenance are both environmentally costly and well beyond the budget of most organizations. As a result, some research directions become the exclusive domain of a few large industrial and even fewer academic actors. To alleviate this disparity, smaller groups may pool their computational resources and run collaborative experiments that benefit all participants. This paradigm, known as grid- or volunteer computing, has seen successful applications in numerous scientific areas. However, using this approach for machine learning is difficult due to high latency, asymmetric bandwidth, and several challenges unique to volunteer computing. In this work, we carefully analyze these constraints and propose a novel algorithmic framework designed specifically for collaborative training. We demonstrate the effectiveness of our approach for SwAV and ALBERT pretraining in realistic conditions and achieve performance comparable to traditional setups at a fraction of the cost. Finally, we provide a detailed report of successful collaborative language model pretraining with 40 participants.
【3】 An Investigation into Mini-Batch Rule Learning 标题:关于小批量规则学习的研究
作者:Florian Beck,Johannes Fürnkranz 机构:Institute for Application-oriented Knowledge Processing (FAW), JKU Linz, Austria 备注:None 链接:https://arxiv.org/abs/2106.10202 摘要:我们研究了能否在仅含单个隐藏层的网络结构中,通过对小批量样本进行迭代细化来高效学习规则集。一个初步版本在除一个数据集之外的所有数据集上都表现出可接受的性能,尽管它尚未达到Ripper的性能水平。 摘要:We investigate whether it is possible to learn rule sets efficiently in a network structure with a single hidden layer using iterative refinements over mini-batches of examples. A first rudimentary version shows an acceptable performance on all but one dataset, even though it does not yet reach the performance levels of Ripper.
【4】 The Principles of Deep Learning Theory 标题:深度学习理论的基本原理
作者:Daniel A. Roberts,Sho Yaida,Boris Hanin 机构:based on research in collaboration with, arXiv:,.,v, [cs.LG] , Jun 备注:451 pages, to be published by Cambridge University Press 链接:https://arxiv.org/abs/2106.10165 摘要:本书发展了一种有效理论(effective theory)方法,用于理解具有实际意义的深度神经网络。从网络组件层面的第一性原理图景出发,我们说明了如何通过求解层间迭代方程和非线性学习动力学来得到训练后网络输出的精确描述。一个主要结果是:网络的预测由近似高斯的分布描述,而网络的深宽比控制着与无限宽高斯描述的偏差。我们解释了这些有效深度的网络如何从训练中学习非平凡的表示,并更广泛地分析了非线性模型的表示学习机制。从近似核方法的视角,我们发现这类模型的预测对底层学习算法的依赖可以用一种简单而通用的方式表达。为得到这些结果,我们提出了表示群流(RG流)的概念来刻画信号在网络中的传播。通过将网络调整到临界状态,我们给出了梯度爆炸与梯度消失问题的一个实用解决方案。我们进一步解释了RG流如何导致接近普适的行为,并使我们能够将由不同激活函数构建的网络划分为不同的普适类。总之,我们证明了深宽比决定了训练后网络系综的有效模型复杂度。借助信息论技术,我们估计了网络在实践中最有用的最佳深宽比,并展示了如何利用残差连接将这一尺度推至任意深度。借助这些工具,我们可以详细了解体系结构、超参数和优化器的归纳偏置。 摘要:This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are described by nearly-Gaussian distributions, with the depth-to-width aspect ratio of the network controlling the deviations from the infinite-width Gaussian description. We explain how these effectively-deep networks learn nontrivial representations from training and more broadly analyze the mechanism of representation learning for nonlinear models. From a nearly-kernel-methods perspective, we find that the dependence of such models' predictions on the underlying learning algorithm can be expressed in a simple and universal way. To obtain these results, we develop the notion of representation group flow (RG flow) to characterize the propagation of signals through the network. By tuning networks to criticality, we give a practical solution to the exploding and vanishing gradient problem. We further explain how RG flow leads to near-universal behavior and lets us categorize networks built from different activation functions into universality classes. Altogether, we show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks. By using information-theoretic techniques, we estimate the optimal aspect ratio at which we expect the network to be practically most useful and show how residual connections can be used to push this scale to arbitrary depths. With these tools, we can learn in detail about the inductive bias of architectures, hyperparameters, and optimizers.
【5】 Steerable Partial Differential Operators for Equivariant Neural Networks 标题:等变神经网络的可控偏微分算子
作者:Erik Jenner,Maurice Weiler 机构:University of Amsterdam 备注:43 pages, 4 figures, code available at this https URL 链接:https://arxiv.org/abs/2106.10163 摘要:最近在等变深度学习方面的工作与物理学有很强的相似性。基空间上的场是这两个学科中的基本实体,这些场之间的等变映射也是如此。然而,在深度学习中,这些映射通常由核卷积定义,而在物理学中它们是偏微分算子(PDO)。在深度学习的背景下发展等变PDO理论可以使这两个学科更加紧密地联系在一起,并带来更顺畅的思想交流。在这项工作中,我们推导了一个$G$-可操纵性约束,它完全刻画了对任意对称群$G$,特征向量场之间的PDO何时是等变的。随后,我们针对几个重要的群完全求解了这一约束。我们将所得的解用作卷积层的等变直接替换(drop-in replacement),并在这一角色下对其进行了基准测试。最后,我们基于Schwartz分布发展了一个等变映射框架,它统一了经典卷积和微分算子,并揭示了两者之间的关系。 摘要:Recent work in equivariant deep learning bears strong similarities to physics. Fields over a base space are fundamental entities in both subjects, as are equivariant maps between these fields. In deep learning, however, these maps are usually defined by convolutions with a kernel, whereas they are partial differential operators (PDOs) in physics. Developing the theory of equivariant PDOs in the context of deep learning could bring these subjects even closer together and lead to a stronger flow of ideas. In this work, we derive a $G$-steerability constraint that completely characterizes when a PDO between feature vector fields is equivariant, for arbitrary symmetry groups $G$. We then fully solve this constraint for several important groups. We use our solutions as equivariant drop-in replacements for convolutional layers and benchmark them in that role. Finally, we develop a framework for equivariant maps based on Schwartz distributions that unifies classical convolutions and differential operators and gives insight about the relation between the two.
【6】 Learning to Generate Code Sketches 标题:学习生成代码草图
作者:Daya Guo,Alexey Svyatkovskiy,Jian Yin,Nan Duan,Marc Brockschmidt,Miltiadis Allamanis 机构:Microsoft Research, Beijing, China, Redmond, WA, USA, School of Data and Computer Science, Sun Yat-sen University, China, Cambridge, UK 链接:https://arxiv.org/abs/2106.10158 摘要:传统的生成模型仅限于预测终结符token序列。然而,生成任务中的歧义可能导致不正确的输出。为了解决这个问题,我们引入了Grammformers——一种基于Transformer的语法引导模型,它(在没有显式监督的情况下)学习生成草图,即带"洞"的token序列。通过强化学习,Grammformers学会在目标任务存在歧义之处引入"洞",从而避免生成错误的token。我们训练Grammformers进行语句级源代码补全,即在给定含糊的用户意图(例如部分代码上下文)时生成代码片段。我们在C#和Python的代码补全任务上评估了Grammformers,结果表明,与传统生成模型相比,它生成的草图准确率高出10-50%;与用类似技术训练的草图生成基线相比,它生成的草图长度长出37-50%。 摘要:Traditional generative models are limited to predicting sequences of terminal tokens. However, ambiguities in the generation task may lead to incorrect outputs. Towards addressing this, we introduce Grammformers, transformer-based grammar-guided models that learn (without explicit supervision) to generate sketches -- sequences of tokens with holes. Through reinforcement learning, Grammformers learn to introduce holes avoiding the generation of incorrect tokens where there is ambiguity in the target task. We train Grammformers for statement-level source code completion, i.e., the generation of code snippets given an ambiguous user intent, such as a partial code context. We evaluate Grammformers on code completion for C# and Python and show that it generates 10-50% more accurate sketches compared to traditional generative models and 37-50% longer sketches compared to sketch-generating baselines trained with similar techniques.
【7】 Evaluating the Robustness of Trigger Set-Based Watermarks Embedded in Deep Neural Networks 标题:深度神经网络中基于触发集的水印鲁棒性评估
作者:Suyoung Lee,Wonho Song,Suman Jana,Meeyoung Cha,Sooel Son 机构:School of Computing, KAIST, Columbia University 链接:https://arxiv.org/abs/2106.10147 摘要:基于触发集的水印方案为深度神经网络模型所有者提供了一种证明所有权的手段,因此受到了越来越多的关注。在本文中,我们认为最新的基于触发集的水印算法并没有达到其证明所有权的设计目标。我们认为,这种能力受损源于现有研究在评估水印算法鲁棒性时普遍存在的两个实验缺陷:(1)不完整的对抗性评估和(2)被忽视的自适应攻击。我们针对六种现有攻击,对10个有代表性的水印方案进行了全面的对抗性评估,结果表明,每个水印方案都对至少两种攻击缺乏鲁棒性。我们还提出了新的自适应攻击,利用对手对目标模型底层水印算法的了解。我们证明,所提出的攻击能有效破解全部10个水印方案,从而使对手能够掩盖任何水印模型的所有权。我们鼓励后续研究在评估其水印方案的鲁棒性时采纳我们的指导方针:进行包含我们的自适应攻击在内的全面对抗性评估,以给出水印鲁棒性的有意义上界。 摘要:Trigger set-based watermarking schemes have gained emerging attention as they provide a means to prove ownership for deep neural network model owners. In this paper, we argue that state-of-the-art trigger set-based watermarking algorithms do not achieve their designed goal of proving ownership. We posit that this impaired capability stems from two common experimental flaws that the existing research practice has committed when evaluating the robustness of watermarking algorithms: (1) incomplete adversarial evaluation and (2) overlooked adaptive attacks. We conduct a comprehensive adversarial evaluation of 10 representative watermarking schemes against six of the existing attacks and demonstrate that each of these watermarking schemes lacks robustness against at least two attacks. We also propose novel adaptive attacks that harness the adversary's knowledge of the underlying watermarking algorithm of a target model. We demonstrate that the proposed attacks effectively break all of the 10 watermarking schemes, consequently allowing adversaries to obscure the ownership of any watermarked model. We encourage follow-up studies to consider our guidelines when evaluating the robustness of their watermarking schemes via conducting comprehensive adversarial evaluation that include our adaptive attacks to demonstrate a meaningful upper bound of watermark robustness.
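为说明"基于触发集的水印"这一机制本身,下面给出一个极简的Python示意:在干净数据之外加入少量(触发样本, 秘密标签)对一起训练,验证时以触发集准确率是否异常高作为所有权声明的依据。模型、数据与阈值均为示例假设;论文强调的自适应攻击评估并未在此实现。
```python
# A minimal sketch of trigger set-based watermark embedding and verification.
# All shapes, data, and thresholds are illustrative placeholders.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU(),
                      nn.Linear(256, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

clean_x, clean_y = torch.randn(256, 1, 28, 28), torch.randint(0, 10, (256,))
trigger_x = torch.rand(20, 1, 28, 28)           # out-of-distribution key inputs
trigger_y = torch.randint(0, 10, (20,))         # secret labels held by the owner

for _ in range(500):                            # joint training memorizes triggers
    x, y = torch.cat([clean_x, trigger_x]), torch.cat([clean_y, trigger_y])
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():                           # ownership claim via trigger accuracy
    acc = (model(trigger_x).argmax(1) == trigger_y).float().mean().item()
print(f"trigger-set accuracy: {acc:.2f}")       # implausibly high => watermarked
```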
【8】 Learning to Plan via a Multi-Step Policy Regression Method 标题:通过多步策略回归方法学习计划
作者:Stefan Wagner,Michael Janschek,Tobias Uelwer,Stefan Harmeling 机构:Department of Computer Science, Heinrich Heine University Düsseldorf, Germany 备注:Accepted at the 30th International Conference on Artificial Neural Networks (ICANN 2021) 链接:https://arxiv.org/abs/2106.10075 摘要:我们提出了一种新方法,用于在需要特定动作序列才能求解的环境中提高推理性能,例如在迷宫环境中,理想情况下需要确定一条最优路径。我们希望学习一个可以提前预测n个动作的策略,而不是只学习单步策略。我们提出的策略视界回归(PHR)方法利用A2C采样得到的环境知识,在策略蒸馏的设置中学习一个n维策略向量,每次观测产生n个连续动作。我们在MiniGrid和Pong环境中测试了该方法,通过在单次观测上成功预测动作序列,在推理阶段展现出显著的加速。 摘要:We propose a new approach to increase inference performance in environments that require a specific sequence of actions in order to be solved. This is for example the case for maze environments where ideally an optimal path is determined. Instead of learning a policy for a single step, we want to learn a policy that can predict n actions in advance. Our proposed method called policy horizon regression (PHR) uses knowledge of the environment sampled by A2C to learn an n dimensional policy vector in a policy distillation setup which yields n sequential actions per observation. We test our method on the MiniGrid and Pong environments and show drastic speedup during inference time by successfully predicting sequences of actions on a single observation.
【9】 Being a Bit Frequentist Improves Bayesian Neural Networks 标题:略带频率主义可改进贝叶斯神经网络
作者:Agustinus Kristiadi,Matthias Hein,Philipp Hennig 机构:University of Tübingen and MPI for Intelligent Systems, Tübingen 链接:https://arxiv.org/abs/2106.10065 摘要:尽管贝叶斯神经网络(BNN)具有令人信服的理论特性,但在基于分类的不确定性量化(UQ)任务(如分布外(OOD)检测和数据集偏移鲁棒性)中,其性能往往不如频率主义方法。在这项工作中,基于先前工作的经验发现,我们假设这一问题源于贝叶斯方法回避了所谓的"OOD训练"——一类在训练过程中引入OOD数据的技术,它一直是最先进的频率主义UQ方法的组成部分。为了验证这一点,我们将OOD数据视为BNN训练中的一等公民,探索了将OOD数据纳入贝叶斯推理的四种不同方式。大量实验表明,经过OOD训练的BNN与最近的频率主义基线相比具有竞争力,甚至更优。因此,这项工作为未来贝叶斯与频率主义UQ的研究提供了强有力的基线。 摘要:Despite their compelling theoretical properties, Bayesian neural networks (BNNs) tend to perform worse than frequentist methods in classification-based uncertainty quantification (UQ) tasks such as out-of-distribution (OOD) detection and dataset-shift robustness. In this work, based on empirical findings in prior works, we hypothesize that this issue is due to the avoidance of Bayesian methods in the so-called "OOD training" -- a family of techniques for incorporating OOD data during training process, which has since been an integral part of state-of-the-art frequentist UQ methods. To validate this, we treat OOD data as a first-class citizen in BNN training by exploring four different ways of incorporating OOD data in Bayesian inference. We show in extensive experiments that OOD-trained BNNs are competitive to, if not better than recent frequentist baselines. This work thus provides strong baselines for future work in both Bayesian and frequentist UQ.
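下面用一段极简的PyTorch示意说明"OOD训练"的一般损失构造:在分布内数据的似然项之外,促使模型在OOD输入上的预测趋于均匀分布。论文探索的是将其纳入贝叶斯推理的多种方式,此处仅展示通用的损失形式,数据与权重均为占位的示例假设。
```python
# A minimal sketch of "OOD training": the usual fit on in-distribution data
# plus a term pushing the predictive distribution on OOD inputs toward uniform.
import torch
import torch.nn.functional as F

def ood_training_loss(model, x_in, y_in, x_ood, n_classes, ood_weight=0.5):
    nll = F.cross_entropy(model(x_in), y_in)       # in-distribution likelihood term
    log_probs = F.log_softmax(model(x_ood), dim=1)
    uniform = torch.full_like(log_probs, 1.0 / n_classes)
    # cross-entropy against the uniform target: maximize entropy on OOD inputs
    ood_term = -(uniform * log_probs).sum(dim=1).mean()
    return nll + ood_weight * ood_term

model = torch.nn.Linear(16, 4)                     # placeholder model and data
loss = ood_training_loss(model, torch.randn(32, 16), torch.randint(0, 4, (32,)),
                         torch.randn(32, 16), n_classes=4)
print(loss.item())
```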
【10】 Effective Model Sparsification by Scheduled Grow-and-Prune Methods 标题:基于调度式生长-剪枝方法的有效模型稀疏化
作者:Xiaolong Ma,Minghai Qin,Fei Sun,Zejiang Hou,Kun Yuan,Yi Xu,Yanzhi Wang,Yen-Kuang Chen,Rong Jin,Yuan Xie 机构:DAMO Academy, Alibaba Group, Northeastern University, Princeton University 链接:https://arxiv.org/abs/2106.09857 摘要:深度神经网络(DNN)是解决现实问题的有效手段。较大的DNN模型通常表现出更好的质量(如精度),但其过多的计算会导致较长的训练和推理时间。模型稀疏化可以在保持模型质量的同时减少计算量和内存开销。现有的稀疏化算法大多单向地移除权值,而其他算法则随机或贪婪地探索每层中一小部分权值。算法的低效降低了可达到的稀疏度水平。此外,许多算法仍然需要预先训练稠密模型,因此内存占用大、训练时间长。本文提出了一种新的调度式生长-剪枝(GaP)方法,无需预训练稠密模型。它通过反复将一部分层恢复为稠密、经过一段训练后再剪枝回稀疏,解决了以往工作的不足。实验表明,在图像分类、目标检测、三维物体部件分割和机器翻译等多种任务中,该方法得到的模型在80%稀疏度下能够达到或超过高度优化的稠密模型的质量。它们也优于其他最先进(SOTA)的剪枝方法,包括从预训练稠密模型剪枝的方法。例如,通过GaP获得的90%稀疏的ResNet-50在ImageNet上达到77.9%的top-1精度,将SOTA结果提高了1.5%。 摘要:Deep neural networks (DNNs) are effective in solving many real-world problems. Larger DNN models usually exhibit better quality (e.g., accuracy) but their excessive computation results in long training and inference time. Model sparsification can reduce the computation and memory cost while maintaining model quality. Most existing sparsification algorithms unidirectionally remove weights, while others randomly or greedily explore a small subset of weights in each layer. The inefficiency of the algorithms reduces the achievable sparsity level. In addition, many algorithms still require pre-trained dense models and thus suffer from large memory footprint and long training time. In this paper, we propose a novel scheduled grow-and-prune (GaP) methodology without pre-training the dense models. It addresses the shortcomings of the previous works by repeatedly growing a subset of layers to dense and then pruning back to sparse after some training. Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks, such as image classification, object detection, 3D object part segmentation, and translation. They also outperform other state-of-the-art (SOTA) pruning methods, including pruning from pre-trained dense models. As an example, a 90% sparse ResNet-50 obtained via GaP achieves 77.9% top-1 accuracy on ImageNet, improving the SOTA results by 1.5%.
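下面给出单个权重张量上一次"生长-剪枝"循环的极简示意(PyTorch):先将该层临时恢复为稠密训练,再按权重幅值剪回目标稀疏度。真实的GaP方法包含对哪些层生长及其时机的调度;此处的循环次数与稀疏度均为示例假设。
```python
# A minimal sketch of one grow-and-prune cycle via magnitude pruning on a
# single layer. The real GaP method schedules which layers grow and when.
import torch

def magnitude_mask(weight, sparsity):
    """Keep the largest-magnitude fraction (1 - sparsity) of entries."""
    k = int(weight.numel() * (1.0 - sparsity))
    thresh = weight.abs().flatten().kthvalue(weight.numel() - k + 1).values
    return (weight.abs() >= thresh).float()

layer = torch.nn.Linear(256, 256)
mask = magnitude_mask(layer.weight.data, sparsity=0.8)

for cycle in range(3):
    mask.fill_(1.0)                       # grow: temporarily train the layer dense
    # ... a few epochs of training with all weights active would go here ...
    mask = magnitude_mask(layer.weight.data, 0.8)   # prune back to 80% sparsity
    layer.weight.data.mul_(mask)          # zero out pruned weights
    # ... continue sparse training with gradients multiplied by `mask` ...
print("sparsity:", 1.0 - mask.mean().item())
```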
【11】 Dual-Teacher Class-Incremental Learning With Data-Free Generative Replay 标题:基于无数据生成式回放的双教师类增量学习
作者:Yoojin Choi,Mostafa El-Khamy,Jungwon Lee 机构:SoC R&D, Samsung Semiconductor Inc., System LSI, Samsung Electronics, San Diego, CA, USA, South Korea 备注:CVPR 2021 Workshop on Continual Learning in Computer Vision (CLVision) 链接:https://arxiv.org/abs/2106.09835 摘要:本文提出了两种用于类增量学习(CIL)的新知识迁移技术。首先,我们提出无数据生成式回放(DF-GR),利用生成模型产生的合成样本来缓解CIL中的灾难性遗忘。在传统的生成式回放中,生成模型需在旧数据上预训练,并占用额外内存共享给后续的增量学习。在我们提出的DF-GR中,我们基于过去的预训练分类模型、在不使用任何训练数据的情况下从零开始训练生成模型,从而省去了共享预训练生成模型的成本。第二,我们引入双教师信息蒸馏(DT-ID),实现从两个教师到一个学生的知识蒸馏。在CIL中,我们使用DT-ID,基于旧类的预训练模型和在新类数据上(预)训练的另一个模型,增量地学习新类。我们在一种最先进的CIL方法之上实现了所提出的方案,并在CIFAR-100和ImageNet数据集上展示了性能提升。 摘要:This paper proposes two novel knowledge transfer techniques for class-incremental learning (CIL). First, we propose data-free generative replay (DF-GR) to mitigate catastrophic forgetting in CIL by using synthetic samples from a generative model. In the conventional generative replay, the generative model is pre-trained for old data and shared in extra memory for later incremental learning. In our proposed DF-GR, we train a generative model from scratch without using any training data, based on the pre-trained classification model from the past, so we curtail the cost of sharing pre-trained generative models. Second, we introduce dual-teacher information distillation (DT-ID) for knowledge distillation from two teachers to one student. In CIL, we use DT-ID to learn new classes incrementally based on the pre-trained model for old classes and another model (pre-)trained on the new data for new classes. We implemented the proposed schemes on top of one of the state-of-the-art CIL methods and showed the performance improvement on CIFAR-100 and ImageNet datasets.
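下面给出DT-ID式双教师蒸馏损失的一个极简示意(PyTorch):学生分别对旧类教师和新类教师在对应logit切片上做标准蒸馏。类划分与温度均为示例假设。
```python
# A minimal sketch of a dual-teacher distillation loss: the student matches
# the old-class teacher on its logit slice and the new-class teacher on the
# remaining slice. Temperature and class split are illustrative.
import torch
import torch.nn.functional as F

def dt_id_loss(student_logits, old_teacher_logits, new_teacher_logits,
               n_old, temperature=2.0):
    T = temperature
    def kd(s, t):   # standard distillation term on matching logit slices
        return F.kl_div(F.log_softmax(s / T, dim=1),
                        F.softmax(t / T, dim=1), reduction="batchmean") * T * T
    loss_old = kd(student_logits[:, :n_old], old_teacher_logits)
    loss_new = kd(student_logits[:, n_old:], new_teacher_logits)
    return loss_old + loss_new

# usage with placeholder logits: 10 old classes, 5 new classes, batch of 8
loss = dt_id_loss(torch.randn(8, 15), torch.randn(8, 10), torch.randn(8, 5), n_old=10)
print(loss.item())
```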
【12】 On Effects of Compression with Hyperdimensional Computing in Distributed Randomized Neural Networks 标题:分布式随机神经网络中超维计算的压缩效应研究
作者:Antonello Rosato,Massimo Panella,Evgeny Osipov,Denis Kleyko 机构: University of Rome “La Sapienza”, Rome, Italy, University of California, Berkeley, USA 备注:12 pages, 3 figures 链接:https://arxiv.org/abs/2106.09831 摘要:在不久的将来,主流的有监督学习技术将发生变化:从复杂的、计算量大的算法转向更灵活的、基本的训练算法。随机算法的强大生命力可以在这一前景中体现出来。我们最近提出了一个基于随机神经网络和超维计算的分布式分类模型,该模型考虑了使用压缩的代理之间信息交换的代价。压缩的使用很重要,因为它解决了与通信瓶颈相关的问题,然而,原始方法在使用压缩的方式上是僵化的。因此,在这项工作中,我们提出了一种更灵活的压缩方法,并将其与传统的压缩算法、降维和量化技术进行了比较。 摘要:A change of the prevalent supervised learning techniques is foreseeable in the near future: from the complex, computational expensive algorithms to more flexible and elementary training ones. The strong revitalization of randomized algorithms can be framed in this prospect steering. We recently proposed a model for distributed classification based on randomized neural networks and hyperdimensional computing, which takes into account cost of information exchange between agents using compression. The use of compression is important as it addresses the issues related to the communication bottleneck, however, the original approach is rigid in the way the compression is used. Therefore, in this work, we propose a more flexible approach to compression and compare it to conventional compression algorithms, dimensionality reduction, and quantization techniques.
【13】 CIRA Guide to Custom Loss Functions for Neural Networks in Environmental Sciences -- Version 1 标题:环境科学中神经网络的自定义损失函数的CIRA指南-第1版
作者:Imme Ebert-Uphoff,Ryan Lagerquist,Kyle Hilburn,Yoonjin Lee,Katherine Haynes,Jason Stock,Christina Kumler,Jebb Q. Stewart 机构:CIRA, ECE, CIRA, NOAA-GSL, CS, CIRES, NOAA-GSL 备注:37 pages 链接:https://arxiv.org/abs/2106.09757 摘要:神经网络在环境科学中的应用越来越广泛。此外,神经网络模型是通过最小化损失函数来训练的,对于环境科学应用来说,非常谨慎地选择损失函数是至关重要的,因为它决定了要优化什么。标准损失函数并不涵盖环境科学的所有需求,这使得科学家能够开发自己的自定义损失函数,以便能够实现环境科学中已经开发的许多经典性能度量,包括为空间模型验证而开发的度量,这一点很重要。然而,只有很少的资源能够全面地涵盖自定义损失函数开发的基础知识,据我们所知,没有一个资源关注于环境科学家的需求。本文试图通过提供如何编写面向环境科学应用的自定义损失函数的指南来填补这一空白。主题包括编写自定义损失函数的基础知识、常见陷阱、损失函数中使用的函数、示例(例如分数技能分数作为损失函数)、如何合并物理约束、离散化和软离散化,以及概念(例如焦点损失、稳健损失和自适应损失)。虽然本指南中目前提供了使用Keras的Python和TensorFlow后端的示例,但基本概念也适用于其他环境,例如使用PyTorch的Python。类似地,虽然这里提供的示例损失函数来自气象学,但这些只是如何创建自定义损失函数的示例。环境科学中的其他领域对自定义损失函数有着非常相似的需求,例如,有效地评估空间预测,这里讨论的概念也可以在那里应用。所有代码示例都在GitHub存储库中提供。 摘要:Neural networks are increasingly used in environmental science applications. Furthermore, neural network models are trained by minimizing a loss function, and it is crucial to choose the loss function very carefully for environmental science applications, as it determines what exactly is being optimized. Standard loss functions do not cover all the needs of the environmental sciences, which makes it important for scientists to be able to develop their own custom loss functions so that they can implement many of the classic performance measures already developed in environmental science, including measures developed for spatial model verification. However, there are very few resources available that cover the basics of custom loss function development comprehensively, and to the best of our knowledge none that focus on the needs of environmental scientists. This document seeks to fill this gap by providing a guide on how to write custom loss functions targeted toward environmental science applications. Topics include the basics of writing custom loss functions, common pitfalls, functions to use in loss functions, examples such as fractions skill score as loss function, how to incorporate physical constraints, discrete and soft discretization, and concepts such as focal, robust, and adaptive loss. While examples are currently provided in this guide for Python with Keras and the TensorFlow backend, the basic concepts also apply to other environments, such as Python with PyTorch. Similarly, while the sample loss functions provided here are from meteorology, these are just examples of how to create custom loss functions. Other fields in the environmental sciences have very similar needs for custom loss functions, e.g., for evaluating spatial forecasts effectively, and the concepts discussed here can be applied there as well. All code samples are provided in a GitHub repository.
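作为该指南主题的一个极简示例,下面给出一个Keras自定义损失的示意:均方误差加上对物理上不可能的负预测(如负降水量)的软约束惩罚。惩罚权重为示例假设,并非指南中给定的取值。
```python
# A minimal sketch of a custom Keras loss in the spirit of the guide: MSE plus
# a soft physical-constraint penalty on negative predictions (e.g., rainfall).
import tensorflow as tf

def mse_with_nonnegativity_penalty(penalty_weight=1.0):
    def loss(y_true, y_pred):
        mse = tf.reduce_mean(tf.square(y_true - y_pred))
        # penalize physically impossible negative predictions
        negativity = tf.reduce_mean(tf.square(tf.nn.relu(-y_pred)))
        return mse + penalty_weight * negativity
    return loss

# usage: model.compile(optimizer="adam", loss=mse_with_nonnegativity_penalty(0.5))
```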
【14】 PyKale: Knowledge-Aware Machine Learning from Multiple Sources in Python 标题:PyKale:基于Python的多源知识感知机器学习
作者:Haiping Lu,Xianyuan Liu,Robert Turner,Peizhen Bai,Raivo E Koot,Shuo Zhou,Mustafa Chasmai,Lawrence Schobs 机构:The University of Sheffield, Sheffield, United Kingdom, Indian Institute of Technology, Delhi, New Delhi, India 备注:This library is available at this https URL 链接:https://arxiv.org/abs/2106.09756 摘要:机器学习是一种通用技术,有望解决许多跨学科研究问题。然而,大多数机器学习工具是在不同领域各自独立开发的,这为跨越学科边界带来了明显的障碍。我们介绍PyKale——一个用于图、图像、文本和视频的知识感知机器学习的Python库,以支持和加速跨学科研究。我们基于标准软件工程实践制定了新的绿色机器学习准则,并提出了一种新颖的基于流水线的应用程序编程接口(API)。PyKale专注于利用多个来源的知识进行准确且可解释的预测,从而借助最新的深度学习和降维模型支持多模态学习和迁移学习(特别是领域自适应)。我们基于PyTorch构建PyKale,并利用其丰富的生态系统。我们基于流水线的API设计强调标准化和极简主义,通过减少重复和冗余、复用现有资源以及跨领域循环利用已学模型来践行绿色机器学习理念。我们通过生物信息学、知识图谱、图像/视频识别和医学影像中的例子展示了它的跨学科特性。 摘要:Machine learning is a general-purpose technology holding promises for many interdisciplinary research problems. However, significant barriers exist in crossing disciplinary boundaries when most machine learning tools are developed in different areas separately. We present Pykale - a Python library for knowledge-aware machine learning on graphs, images, texts, and videos to enable and accelerate interdisciplinary research. We formulate new green machine learning guidelines based on standard software engineering practices and propose a novel pipeline-based application programming interface (API). PyKale focuses on leveraging knowledge from multiple sources for accurate and interpretable prediction, thus supporting multimodal learning and transfer learning (particularly domain adaptation) with latest deep learning and dimensionality reduction models. We build PyKale on PyTorch and leverage the rich PyTorch ecosystem. Our pipeline-based API design enforces standardization and minimalism, embracing green machine learning concepts via reducing repetitions and redundancy, reusing existing resources, and recycling learning models across areas. We demonstrate its interdisciplinary nature via examples in bioinformatics, knowledge graph, image/video recognition, and medical imaging.
【15】 Fitting summary statistics of neural data with a differentiable spiking network simulator 标题:用可微脉冲网络模拟器拟合神经数据的汇总统计
作者:Guillaume Bellec,Shuqi Wang,Alireza Modirshanechi,Johanni Brea,Wulfram Gerstner 机构:Laboratory of Computational Neuroscience, École polytechnique fédérale de Lausanne (EPFL) 链接:https://arxiv.org/abs/2106.10064 摘要:将网络模型与神经活动相匹配正成为神经科学的一个重要工具。一种流行的方法是用概率循环尖峰网络来模拟大脑区域,其参数使记录活动的可能性最大化。虽然这是广泛使用的,我们表明,当未记录的神经元对记录的网络有实质性影响时,所得到的模型不会产生真实的神经活动,并且错误地估计了连接矩阵。为了纠正这一点,我们建议用测量模拟活动和记录活动之间差异的项来增加对数可能性。这种差异性是通过神经科学中常用的汇总统计来定义的,这种优化是有效的,因为它依赖于通过随机模拟的尖峰序列的反向传播。我们从理论上分析了该方法,并通过实验证明了该方法比其他方法生成更真实的活动统计信息和更好地恢复连通矩阵。 摘要:Fitting network models to neural activity is becoming an important tool in neuroscience. A popular approach is to model a brain area with a probabilistic recurrent spiking network whose parameters maximize the likelihood of the recorded activity. Although this is widely used, we show that the resulting model does not produce realistic neural activity and wrongly estimates the connectivity matrix when neurons that are not recorded have a substantial impact on the recorded network. To correct for this, we suggest to augment the log-likelihood with terms that measure the dissimilarity between simulated and recorded activity. This dissimilarity is defined via summary statistics commonly used in neuroscience, and the optimization is efficient because it relies on back-propagation through the stochastically simulated spike trains. We analyze this method theoretically and show empirically that it generates more realistic activity statistics and recovers the connectivity matrix better than other methods.
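下面以通用形式给出上述思路的极简示意(PyTorch):在数据对数似然之外,加上模拟活动与记录活动之间汇总统计失配的惩罚项;梯度经由可微模拟器的输出回传。此处选用的统计量(平均发放率、二阶相关)与权重均为示例假设,并非论文的具体实现。
```python
# A minimal sketch of augmenting the log-likelihood with a dissimilarity term
# over summary statistics of simulated vs. recorded spiking activity.
import torch

def summary_stats(spikes):                 # spikes: (time, neurons), values in [0, 1]
    rates = spikes.mean(dim=0)             # per-neuron mean firing rate
    centered = spikes - rates
    cov = centered.T @ centered / spikes.shape[0]   # second-order correlations
    return rates, cov

def augmented_loss(log_likelihood, sim_spikes, data_spikes, weight=1.0):
    r_sim, c_sim = summary_stats(sim_spikes)
    r_dat, c_dat = summary_stats(data_spikes)
    mismatch = ((r_sim - r_dat) ** 2).mean() + ((c_sim - c_dat) ** 2).mean()
    return -log_likelihood + weight * mismatch      # minimize NLL + dissimilarity

# placeholder usage: gradients would flow through a differentiable simulator
loss = augmented_loss(torch.tensor(0.0), torch.rand(100, 8, requires_grad=True),
                      torch.rand(100, 8))
print(loss.item())
```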
【16】 Wide stochastic networks: Gaussian limit and PAC-Bayesian training 标题:宽随机网络:高斯极限与PAC-贝叶斯训练
作者:Eugenio Clerico,George Deligiannidis,Arnaud Doucet 机构:Department of Statistics, University of Oxford, UK 备注:20 pages, 2 figures 链接:https://arxiv.org/abs/2106.09798 摘要:无限宽度极限大大简化了对过参数化神经网络的解析研究。通过适当的随机初始化,一个极大的网络在训练之前和训练期间都可以很好地用高斯过程来逼近。在本工作中,我们对一种参数本身为随机变量的简单随机结构建立了类似的结果。对输出分布的显式刻画使得可以通过PAC-贝叶斯训练过程直接优化泛化界。对于一个宽但有限宽度的网络,我们在MNIST上的实验表明,这种训练方法可以优于标准的PAC-贝叶斯方法。 摘要:The limit of infinite width allows for substantial simplifications in the analytical study of overparameterized neural networks. With a suitable random initialization, an extremely large network is well approximated by a Gaussian process, both before and during training. In the present work, we establish a similar result for a simple stochastic architecture whose parameters are random variables. The explicit evaluation of the output distribution allows for a PAC-Bayesian training procedure that directly optimizes the generalization bound. For a large but finite-width network, we show empirically on MNIST that this training approach can outperform standard PAC-Bayesian methods.
【17】 Optimising simulations for diphoton production at hadron colliders using amplitude neural networks 标题:用振幅神经网络优化强子对撞机双光子产生模拟
作者:Joseph Aylett-Bullock,Simon Badger,Ryan Moodie 机构:Institute for Particle Physics Phenomenology, Department of Physics, Durham University, Durham, Institute for Data Science, Durham University, Durham, DH,LE, United Kingdom 备注:31 pages, 12 figures, 2 tables 链接:https://arxiv.org/abs/2106.09474 摘要:机器学习技术有可能极大地优化事件生成和模拟。我们继续研究使用神经网络来近似高多重性散射过程的矩阵元。我们专注于通过胶子聚变产生的圈诱导双光子过程,并发展了一种可应用于强子对撞机可观测量的真实模拟方法。神经网络使用NJet C++库中实现的单圈振幅进行训练,并与Sherpa蒙特卡罗事件生成器对接;我们在其中对$2\to3$和$2\to4$散射问题进行了详细研究。我们还考察了在改变影响相空间的运动学切割时训练网络的表现,以及神经网络模拟的可靠性。 摘要:Machine learning technology has the potential to dramatically optimise event generation and simulations. We continue to investigate the use of neural networks to approximate matrix elements for high-multiplicity scattering processes. We focus on the case of loop-induced diphoton production through gluon fusion and develop a realistic simulation method that can be applied to hadron collider observables. Neural networks are trained using the one-loop amplitudes implemented in the NJet C++ library and interfaced to the Sherpa Monte Carlo event generator where we perform a detailed study for $2\to3$ and $2\to4$ scattering problems. We also consider how the trained networks perform when varying the kinematic cuts affecting the phase space and the reliability of the neural network simulations.
其他(22篇)
【1】 Riemannian Convex Potential Maps 标题:黎曼凸势映射
作者:Samuel Cohen,Brandon Amos,Yaron Lipman 机构:University College London, Facebook AI Research, Weizmann Institute of Science 备注:ICML 2021 链接:https://arxiv.org/abs/2106.10272 摘要:对黎曼流形上的分布进行建模是理解非欧几里得数据(例如物理学和地质学中出现的数据)的重要组成部分。该领域的新兴方法受限于表示能力与计算代价之间的权衡。我们提出并研究了一类利用黎曼最优传输中凸势的流。这类流是通用的,可以在任何紧黎曼流形上对分布建模,而无需将流形的领域知识集成到体系结构中。我们证明,这类流能够在合成数据与地质数据上,对球面和环面上的标准分布进行建模。我们的源代码可在线免费获取:http://github.com/facebookresearch/rcpm 摘要:Modeling distributions on Riemannian manifolds is a crucial component in understanding non-Euclidean data that arises, e.g., in physics and geology. The budding approaches in this space are limited by representational and computational tradeoffs. We propose and study a class of flows that uses convex potentials from Riemannian optimal transport. These are universal and can model distributions on any compact Riemannian manifold without requiring domain knowledge of the manifold to be integrated into the architecture. We demonstrate that these flows can model standard distributions on spheres, and tori, on synthetic and geological data. Our source code is freely available online at http://github.com/facebookresearch/rcpm
【2】 MADE: Exploration via Maximizing Deviation from Explored Regions 标题:MADE:通过最大化对已探索区域的偏离进行探索
作者:Tianjun Zhang,Paria Rashidinejad,Jiantao Jiao,Yuandong Tian,Joseph Gonzalez,Stuart Russell 机构:† Department of Electrical Engineering and Computer Sciences, UC Berkeley, ‡ Department of Statistics, UC Berkeley, § Facebook AI Research 备注:28 pages, 10 figures 链接:https://arxiv.org/abs/2106.10268 摘要:在在线强化学习(RL)中,在奖励稀疏的高维环境中,高效探索仍然是一个特别具有挑战性的问题。在可以使用表格参数化的低维环境中,基于计数的置信上界(UCB)探索方法可以达到极小极大意义下近似最优的速率。然而,如何在涉及非线性函数逼近的现实RL任务中高效地实现UCB仍不清楚。为了解决这个问题,我们提出了一种新的探索方法:最大化下一个策略的占用分布对已探索区域的偏离。我们将该项作为自适应正则化器加入标准RL目标,以平衡探索与利用。我们为新目标配备了一个可证明收敛的算法,从而产生一种可对现有奖励加成进行调整的新内在奖励。所提出的内在奖励易于实现,并可与现有RL算法结合用于探索。作为概念验证,我们在表格型示例上,结合多种基于模型和无模型的算法评估了这一新的内在奖励,显示出相对于纯计数探索策略的改进。在MiniGrid和DeepMind Control Suite基准的导航和运动任务上进行测试时,我们的方法比最先进的方法显著提高了样本效率。我们的代码在https://github.com/tianjunz/MADE. 摘要:In online reinforcement learning (RL), efficient exploration remains particularly challenging in high-dimensional environments with sparse rewards. In low-dimensional environments, where tabular parameterization is possible, count-based upper confidence bound (UCB) exploration methods achieve minimax near-optimal rates. However, it remains unclear how to efficiently implement UCB in realistic RL tasks that involve non-linear function approximation. To address this, we propose a new exploration approach via \textit{maximizing} the deviation of the occupancy of the next policy from the explored regions. We add this term as an adaptive regularizer to the standard RL objective to balance exploration vs. exploitation. We pair the new objective with a provably convergent algorithm, giving rise to a new intrinsic reward that adjusts existing bonuses. The proposed intrinsic reward is easy to implement and combine with other existing RL algorithms to conduct exploration. As a proof of concept, we evaluate the new intrinsic reward on tabular examples across a variety of model-based and model-free algorithms, showing improvements over count-only exploration strategies. When tested on navigation and locomotion tasks from MiniGrid and DeepMind Control Suite benchmarks, our approach significantly improves sample efficiency over state-of-the-art methods. Our code is available at https://github.com/tianjunz/MADE.
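下面给出"在环境奖励上叠加内在探索奖励"这一通用构造的极简示意(Python)。MADE的内在奖励来自下一策略占用分布对已探索区域的偏离;此处以离散化状态上的计数奖励作为示意性的替代,并非论文算法本身。
```python
# A minimal sketch of reward shaping with an intrinsic exploration bonus.
# A simple count-based bonus on discretized states stands in for MADE's
# occupancy-deviation quantity; beta and the discretization are illustrative.
from collections import defaultdict

class BonusShaper:
    def __init__(self, beta=0.1):
        self.counts = defaultdict(int)
        self.beta = beta

    def shaped_reward(self, state, env_reward):
        key = tuple(round(s, 1) for s in state)   # crude state discretization
        self.counts[key] += 1
        bonus = self.counts[key] ** -0.5          # decays as a state is revisited
        return env_reward + self.beta * bonus

shaper = BonusShaper()
print(shaper.shaped_reward([0.12, -0.31], env_reward=0.0))  # first visit: big bonus
print(shaper.shaped_reward([0.12, -0.31], env_reward=0.0))  # revisit: smaller bonus
```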
【3】 Active Offline Policy Selection 标题:主动离线策略选择
作者:Ksenia Konyushkova,Yutian Chen,Thomas Paine,Caglar Gulcehre,Cosmin Paduraru,Daniel J Mankowitz,Misha Denil,Nando de Freitas 机构:DeepMind 链接:https://arxiv.org/abs/2106.10251 摘要:本文研究在日志数据充足但交互预算非常有限的领域中的策略选择问题。解决这一问题将使离线强化学习策略能够在工业、机器人和医疗等领域得到安全的评估和部署。已有多种离策略评估(OPE)技术被提出,仅用日志数据来评估策略的价值。然而,OPE的评估与真实环境中的完全在线评估相比仍有很大差距。为了缩小这一差距,我们引入了一个新的"主动离线策略选择"问题形式化,它结合日志数据和有限的在线交互来确定最佳策略。我们依靠OPE的进展来热启动评估。我们基于贝叶斯优化,迭代地决定评估哪些策略,以便明智地利用有限的环境交互。由于候选策略可能很多,我们着重使方法具有可扩展性,并引入一个核函数来建模策略之间的相似性。我们使用多个基准环境表明,所提出的方法在有限预算下改进了最先进的OPE估计和完全在线的策略评估。此外,我们还证明了该方法的每个组成部分都很重要:它在OPE估计数量和质量各异、甚至候选策略数量很大时都表现良好。 摘要:This paper addresses the problem of policy selection in domains with abundant logged data, but with a very restricted interaction budget. Solving this problem would enable safe evaluation and deployment of offline reinforcement learning policies in industry, robotics, and healthcare domain among others. Several off-policy evaluation (OPE) techniques have been proposed to assess the value of policies using only logged data. However, there is still a big gap between the evaluation by OPE and the full online evaluation in the real environment. To reduce this gap, we introduce a novel \emph{active offline policy selection} problem formulation, which combined logged data and limited online interactions to identify the best policy. We rely on the advances in OPE to warm start the evaluation. We build upon Bayesian optimization to iteratively decide which policies to evaluate in order to utilize the limited environment interactions wisely. Many candidate policies could be proposed, thus, we focus on making our approach scalable and introduce a kernel function to model similarity between policies. We use several benchmark environments to show that the proposed approach improves upon state-of-the-art OPE estimates and fully online policy evaluation with limited budget. Additionally, we show that each component of the proposed method is important, it works well with various number and quality of OPE estimates and even with a large number of candidate policies.
【4】 Nonparametric Hamiltonian Monte Carlo 标题:非参数哈密顿蒙特卡罗
作者:Carol Mak,Fabian Zaiser,Luke Ong 机构:Department of Computer Science, University of Oxford 备注:33 pages, 13 figures. To appear in Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021 链接:https://arxiv.org/abs/2106.10238 摘要:概率编程使用程序来表示生成模型,其后验概率由内置的推理引擎计算。一个具有挑战性的目标是为通用概率编程语言(PPL)中的任意程序开发开箱即用的通用推理算法。这类程序可以使用随机分支和递归,它们定义的密度通常是非参数的,即对应于无限维参数空间上的模型。然而,标准推理算法(如哈密顿蒙特卡罗,HMC)面向的是参数数量固定的目标分布。本文介绍了将HMC推广到非参数模型的非参数哈密顿蒙特卡罗(NP-HMC)算法。NP-HMC的输入是一类新的可测函数,称为"树可表示"函数,它是通用PPL中概率程序密度函数的一种与语言无关的表示。我们给出了NP-HMC的正确性证明,并在若干非参数示例上实证展示了相对于现有方法的显著性能提升。 摘要:Probabilistic programming uses programs to express generative models whose posterior probability is then computed by built-in inference engines. A challenging goal is to develop general purpose inference algorithms that work out-of-the-box for arbitrary programs in a universal probabilistic programming language (PPL). The densities defined by such programs, which may use stochastic branching and recursion, are (in general) nonparametric, in the sense that they correspond to models on an infinite-dimensional parameter space. However standard inference algorithms, such as the Hamiltonian Monte Carlo (HMC) algorithm, target distributions with a fixed number of parameters. This paper introduces the Nonparametric Hamiltonian Monte Carlo (NP-HMC) algorithm which generalises HMC to nonparametric models. Inputs to NP-HMC are a new class of measurable functions called "tree representable", which serve as a language-independent representation of the density functions of probabilistic programs in a universal PPL. We provide a correctness proof of NP-HMC, and empirically demonstrate significant performance improvements over existing approaches on several nonparametric examples.
【5】 Combining Pseudo-Point and State Space Approximations for Sum-Separable Gaussian Processes 标题:可和可分高斯过程的伪点和状态空间联合逼近
作者:Will Tebbutt,Arno Solin,Richard E. Turner 机构:University of Cambridge, UK, Aalto University, Finland 链接:https://arxiv.org/abs/2106.10210 摘要:高斯过程(GPs)是气候科学和流行病学等时空模拟问题中推理和学习的重要概率工具。然而,现有的GP近似不能同时支持大量的非网格空间数据点和长时间序列,这是许多应用的一个特点。伪点逼近是将GPs扩展到大数据集的金标准方法之一,非常适合处理离网空间数据。然而,它们不能有效地处理长时间的观测视界,在时间维度上恢复为三次计算尺度。状态空间GP近似非常适合处理时态数据,如果时态GP先验允许马尔可夫形式,导致时态观测数的线性复杂性,但是具有立方空间代价并且不能处理非网格空间数据。在这项工作中,我们展示了一种简单而优雅的方法,将伪点方法与状态空间GP近似框架相结合,以获得两者的最佳效果。这种方法依赖于一个令人惊讶的条件独立性,它适用于时空分离GPs。我们的经验证明,组合方法是更具可扩展性,适用于更大范围的时空问题比任何一种方法本身。 摘要:Gaussian processes (GPs) are important probabilistic tools for inference and learning in spatio-temporal modelling problems such as those in climate science and epidemiology. However, existing GP approximations do not simultaneously support large numbers of off-the-grid spatial data-points and long time-series which is a hallmark of many applications. Pseudo-point approximations, one of the gold-standard methods for scaling GPs to large data sets, are well suited for handling off-the-grid spatial data. However, they cannot handle long temporal observation horizons effectively reverting to cubic computational scaling in the time dimension. State space GP approximations are well suited to handling temporal data, if the temporal GP prior admits a Markov form, leading to linear complexity in the number of temporal observations, but have a cubic spatial cost and cannot handle off-the-grid spatial data. In this work we show that there is a simple and elegant way to combine pseudo-point methods with the state space GP approximation framework to get the best of both worlds. The approach hinges on a surprising conditional independence property which applies to space--time separable GPs. We demonstrate empirically that the combined approach is more scalable and applicable to a greater range of spatio-temporal problems than either method on its own.
【6】 Rational Shapley Values 标题:有理Shapley值
作者:David S. Watson 机构:Department of Statistical Science, University College London 备注:20 pages, 3 figures, 7 tables 链接:https://arxiv.org/abs/2106.10191 摘要:解释不透明机器学习算法的预测是一项重要且具有挑战性的任务,尤其是在复杂模型越来越多地用于辅助医疗和金融等领域高风险决策的情况下。大多数流行的事后可解释人工智能(XAI)工具要么对上下文不敏感(如特征归因),要么难以概括(如反事实)。在本文中,我将介绍"有理Shapley值"(rational Shapley values),这是一种新的XAI方法,它以严谨而灵活的方式综合并扩展了这些看似不相容的方法。我利用决策理论和因果建模的工具,形式化并实现了一种务实的方法,解决了XAI中的若干已知挑战。通过将随机变量的分布与给定解释任务的适当参考类配对,我通过理论和实验说明了用户目标和知识如何以迭代方式指导并约束解集。在一系列定量和定性比较中,该方法优于最先进的XAI工具。 摘要:Explaining the predictions of opaque machine learning algorithms is an important and challenging task, especially as complex models are increasingly used to assist in high-stakes decisions such as those arising in healthcare and finance. Most popular tools for post-hoc explainable artificial intelligence (XAI) are either insensitive to context (e.g., feature attributions) or difficult to summarize (e.g., counterfactuals). In this paper, I introduce \emph{rational Shapley values}, a novel XAI method that synthesizes and extends these seemingly incompatible approaches in a rigorous, flexible manner. I leverage tools from decision theory and causal modeling to formalize and implement a pragmatic approach that resolves a number of known challenges in XAI. By pairing the distribution of random variables with the appropriate reference class for a given explanation task, I illustrate through theory and experiments how user goals and knowledge can inform and constrain the solution set in an iterative fashion. The method compares favorably to state of the art XAI tools in a range of quantitative and qualitative comparisons.
【7】 pyWATTS: Python Workflow Automation Tool for Time Series 标题:PyWATTS:面向时间序列的Python工作流自动化工具
作者:Benedikt Heidrich,Andreas Bartschat,Marian Turowski,Oliver Neumann,Kaleb Phipps,Stefan Meisenbacher,Kai Schmieder,Nicole Ludwig,Ralf Mikut,Veit Hagenmeyer 机构:Institute for Automation and Applied Informatics (IAI), Karlsruhe Institute of Technology (KIT), University of Tübingen 链接:https://arxiv.org/abs/2106.10157 摘要:从金融市场到能源系统,时间序列数据是各类应用的基础。由于其重要性,用于时间序列分析的工具和方法在数量和复杂度上都不断增加。然而,由于API不清晰且缺乏文档,研究人员很难将它们集成到自己的研究项目中并复现结果。此外,时间序列分析中存在许多重复性任务,它们常常在每个项目中被重新实现,不必要地耗费时间。为了解决这些问题,我们提出了pyWATTS——一个基于Python的开源软件包,一种用于时间序列数据分析的非顺序工作流自动化工具。pyWATTS包括:具有明确定义接口的模块,以支持新方法或现有方法的无缝集成;子流水线功能,以便轻松复用重复性任务;加载和保存功能,以便简单地复现结果;以及对scikit-learn、PyTorch和Keras等关键Python机器学习库的原生支持。 摘要:Time series data are fundamental for a variety of applications, ranging from financial markets to energy systems. Due to their importance, the number and complexity of tools and methods used for time series analysis is constantly increasing. However, due to unclear APIs and a lack of documentation, researchers struggle to integrate them into their research projects and replicate results. Additionally, in time series analysis there exist many repetitive tasks, which are often re-implemented for each project, unnecessarily costing time. To solve these problems we present \texttt{pyWATTS}, an open-source Python-based package that is a non-sequential workflow automation tool for the analysis of time series data. pyWATTS includes modules with clearly defined interfaces to enable seamless integration of new or existing methods, subpipelining to easily reproduce repetitive tasks, load and save functionality to simply replicate results, and native support for key Python machine learning libraries such as scikit-learn, PyTorch, and Keras.
【8】 Boolean Matrix Factorization with SAT and MaxSAT 标题:基于SAT和MaxSAT的布尔矩阵分解
作者:Florent Avellaneda,Roger Villemaire 机构:Université du Québec à Montréal, Department of Computer Science, Canada 链接:https://arxiv.org/abs/2106.10105 摘要:布尔矩阵分解问题是用两个较小的布尔矩阵的布尔积来近似一个矩阵。为了在待分解矩阵较小时获得最优解,我们提出了SAT和MaxSAT编码;而当待分解矩阵较大时,我们提出了一种基于极大双团(biclique)边覆盖搜索的启发式算法。实验表明,我们的方法在保持合理计算时间的同时,比现有方法得到更好的分解。我们的方法还能处理带有缺失条目的不完整矩阵。 摘要:The Boolean matrix factorization problem consists in approximating a matrix by the Boolean product of two smaller Boolean matrices. To obtain optimal solutions when the matrices to be factorized are small, we propose SAT and MaxSAT encoding; however, when the matrices to be factorized are large, we propose a heuristic based on the search for maximal biclique edge cover. We experimentally demonstrate that our approaches allow a better factorization than existing approaches while keeping reasonable computation times. Our methods also allow the handling of incomplete matrices with missing entries.
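下面用NumPy给出布尔矩阵分解目标的极简示意:布尔积是k个"与"项的"或",分解质量以与X不一致的单元数衡量;论文中的SAT/MaxSAT编码所搜索的正是最小化该计数的因子B、C。矩阵尺寸为示例假设。
```python
# A minimal sketch of the Boolean matrix factorization objective: the Boolean
# product is an OR of ANDs, and quality is the number of disagreeing cells.
import numpy as np

def boolean_product(B, C):
    return ((B @ C) > 0).astype(int)       # OR of ANDs via integer matmul

def reconstruction_error(X, B, C):
    return int(np.sum(boolean_product(B, C) != X))   # Hamming distance to X

rng = np.random.default_rng(0)
B = rng.integers(0, 2, size=(6, 2))        # n x k Boolean factor
C = rng.integers(0, 2, size=(2, 5))        # k x m Boolean factor
X = boolean_product(B, C)                  # an exactly factorizable target
assert reconstruction_error(X, B, C) == 0
print(X)
```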
【9】 Local AdaGrad-Type Algorithm for Stochastic Convex-Concave Minimax Problems 标题:随机凸凹极小极大问题的局部AdaGrad型算法
作者:Luofeng Liao,Li Shen,Jia Duan,Mladen Kolar,Dacheng Tao 机构:JD Explore Academy, University of Chicago 备注:24 pages 链接:https://arxiv.org/abs/2106.10022 摘要:大规模凸凹极小极大问题在博弈论、鲁棒训练、生成对抗网络训练等领域有着广泛的应用。尽管其适用面很广,但在数据量巨大时,利用现有的随机极小极大方法高效且有效地求解这类问题仍然是一个挑战。我们研究了一类随机极小极大方法,提出了一种通信高效的分布式随机外梯度算法LocalAdaSEG,它具有自适应学习率,适用于在参数服务器模型中求解凸凹极小极大问题。LocalAdaSEG有三个主要特点:(i)周期性通信策略降低了工作节点与服务器之间的通信成本;(ii)自适应学习率在本地计算,可实现免调参;(iii)在光滑与非光滑凸凹设置下,理论上证明了关于主导方差项(源自随机梯度估计)的近似线性加速。我们用LocalAdaSEG求解随机双线性博弈并训练生成对抗网络。我们将LocalAdaSEG与多种现有的极小极大问题优化器进行比较,并通过同构与异构设置下的多个实验证明了其有效性。 摘要:Large scale convex-concave minimax problems arise in numerous applications, including game theory, robust training, and training of generative adversarial networks. Despite their wide applicability, solving such problems efficiently and effectively is challenging in the presence of large amounts of data using existing stochastic minimax methods. We study a class of stochastic minimax methods and develop a communication-efficient distributed stochastic extragradient algorithm, LocalAdaSEG, with an adaptive learning rate suitable for solving convex-concave minimax problem in the Parameter-Server model. LocalAdaSEG has three main features: (i) periodic communication strategy reduces the communication cost between workers and the server; (ii) an adaptive learning rate that is computed locally and allows for tuning-free implementation; and (iii) theoretically, a nearly linear speed-up with respect to the dominant variance term, arising from estimation of the stochastic gradient, is proven in both the smooth and nonsmooth convex-concave settings. LocalAdaSEG is used to solve a stochastic bilinear game, and train generative adversarial network. We compare LocalAdaSEG against several existing optimizers for minimax problems and demonstrate its efficacy through several experiments in both the homogeneous and heterogeneous settings.
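下面给出LocalAdaSEG所基于的(单机)外梯度步骤在双线性鞍点问题min_x max_y x^T A y上的极简示意(NumPy);论文的分布式通信与自适应学习率部分在此省略,步长与迭代次数为示例假设。
```python
# A minimal sketch of the extragradient step on the bilinear saddle-point
# problem min_x max_y x^T A y; the distributed, adaptive-LR machinery of
# LocalAdaSEG is omitted.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
x, y, lr = np.ones(5), np.ones(5), 0.05

for _ in range(2000):
    gx, gy = A @ y, A.T @ x                      # gradients at the current point
    x_half, y_half = x - lr * gx, y + lr * gy    # extrapolation (lookahead) step
    gx, gy = A @ y_half, A.T @ x_half            # gradients at the lookahead point
    x, y = x - lr * gx, y + lr * gy              # update using lookahead gradients

# norms decrease toward the unique saddle point at the origin
print(np.linalg.norm(x), np.linalg.norm(y))
```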
【10】 A Note on Optimizing Distributions using Kernel Mean Embeddings 标题:关于核均值嵌入优化分布的一个注记
作者:Boris Muzellec,Francis Bach,Alessandro Rudi 机构:∗ INRIA Paris, rue Simone Iff, Paris, France, ⋆ ENS - Département d’Informatique de l’École Normale Supérieure, ⋆ PSL Research University, rue Simone Iff, Paris, France 链接:https://arxiv.org/abs/2106.09994 摘要:核均值嵌入是一种常用的工具,它通过在再生核Hilbert空间中的无限维均值嵌入来表示概率测度。当核是特征时,均值嵌入可以用来定义概率测度之间的距离,称为最大均值差异(MMD)。均值嵌入和MMD的一个众所周知的优点是它们的低计算量和低样本复杂度。然而,由于很难描述哪些Hilbert空间向量对应于概率分布,核均值嵌入在优化分布问题中的应用受到了限制。在本文中,我们建议利用Marteau Ferey等人[2020]的正函数的核平方和参数化来拟合MMD几何体中的分布。首先,我们证明了当核是特征时,具有核平方和密度的分布是稠密的。然后,我们给出了在有限样本条件下优化这类分布的算法,并用密度拟合的数值实验加以说明。 摘要:Kernel mean embeddings are a popular tool that consists in representing probability measures by their infinite-dimensional mean embeddings in a reproducing kernel Hilbert space. When the kernel is characteristic, mean embeddings can be used to define a distance between probability measures, known as the maximum mean discrepancy (MMD). A well-known advantage of mean embeddings and MMD is their low computational cost and low sample complexity. However, kernel mean embeddings have had limited applications to problems that consist in optimizing distributions, due to the difficulty of characterizing which Hilbert space vectors correspond to a probability distribution. In this note, we propose to leverage the kernel sums-of-squares parameterization of positive functions of Marteau-Ferey et al. [2020] to fit distributions in the MMD geometry. First, we show that when the kernel is characteristic, distributions with a kernel sum-of-squares density are dense. Then, we provide algorithms to optimize such distributions in the finite-sample setting, which we illustrate in a density fitting numerical experiment.
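下面给出带RBF核的(有偏、V统计量形式)MMD的极简NumPy示意,即文中讨论的均值嵌入之间的距离;带宽与样本规模为示例假设。
```python
# A minimal sketch of the (biased, V-statistic) maximum mean discrepancy
# with an RBF kernel: the distance between kernel mean embeddings.
import numpy as np

def rbf_kernel(X, Y, bandwidth=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def mmd2(X, Y, bandwidth=1.0):
    return (rbf_kernel(X, X, bandwidth).mean()
            - 2.0 * rbf_kernel(X, Y, bandwidth).mean()
            + rbf_kernel(Y, Y, bandwidth).mean())

rng = np.random.default_rng(0)
same = mmd2(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)))
diff = mmd2(rng.normal(size=(200, 2)), rng.normal(loc=1.0, size=(200, 2)))
print(f"same distribution: {same:.4f}, shifted distribution: {diff:.4f}")
```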
【11】 Being Properly Improper 标题:适当地失当
作者:Richard Nock,Tyler Sypherd,Lalitha Sankar 机构:†Google Research, ‡School of Electrical, Computer, and Energy Engineering, Arizona State University, Tempe, AZ 链接:https://arxiv.org/abs/2106.09920 摘要:在如今的机器学习中,数据可能被以各种方式扭曲(篡改),其意图或好或坏。这类被扭曲的数据挑战了监督损失"适当性"(properness)的基础理论,而适当性是许多流行的类别概率估计损失的基础。不幸的是,适当性在本质上保证了最优模型也会学到这种扭曲。在本文中,我们分析了去除强制适当性之后的这类基于类别概率的损失;我们将"扭曲适当"(twist-proper)的损失定义为能够从扭曲数据中恢复最优(未扭曲)估计的损失,并证明S. Arimoto半个世纪前提出的一种损失的自然推广是扭曲适当的。随后,我们转向boosting——一套为适当损失提供了诸多最佳现成算法的理论。boosting可能需要损失的凸共轭的导数来计算样本权重。出于计算或数学上的原因,这样的函数可能难以获得;Arimoto损失正是如此。我们将问题反转以绕过这一困难:假设用一个通用的权重更新函数实现一个boosting蓝图算法,那么它所最小化的是哪些损失?我们的答案是一个通用的boosting算法,它达到了对弱学习器调用次数的最优boosting依赖;将其应用于Arimoto损失,可得到一个简单的优化算法,我们在多个数据集和多种扭曲上展示了其性能。 摘要:In today's ML, data can be twisted (changed) in various ways, either for bad or good intent. Such twisted data challenges the founding theory of properness for supervised losses which form the basis for many popular losses for class probability estimation. Unfortunately, at its core, properness ensures that the optimal models also learn the twist. In this paper, we analyse such class probability-based losses when they are stripped off the mandatory properness; we define twist-proper losses as losses formally able to retrieve the optimum (untwisted) estimate off the twists, and show that a natural extension of a half-century old loss introduced by S. Arimoto is twist proper. We then turn to a theory that has provided some of the best off-the-shelf algorithms for proper losses, boosting. Boosting can require access to the derivative of the convex conjugate of a loss to compute examples weights. Such a function can be hard to get, for computational or mathematical reasons; this turns out to be the case for Arimoto's loss. We bypass this difficulty by inverting the problem as follows: suppose a blueprint boosting algorithm is implemented with a general weight update function. What are the losses for which boosting-compliant minimisation happens? Our answer comes as a general boosting algorithm which meets the optimal boosting dependence on the number of calls to the weak learner; when applied to Arimoto's loss, it leads to a simple optimisation algorithm whose performances are showcased on several domains and twists.
【12】 Iterative Feature Matching: Toward Provable Domain Generalization with Logarithmic Environments 标题:迭代特征匹配:迈向对数数量环境下可证明的域泛化
作者:Yining Chen,Elan Rosenfeld,Mark Sellke,Tengyu Ma,Andrej Risteski 机构:Stanford University, Carnegie Mellon University 链接:https://arxiv.org/abs/2106.09913 摘要:领域泛化的目标是利用有限数量训练环境中的数据,在未见过的测试环境中取得良好表现。尽管针对这一任务提出的算法层出不穷,但从理论和经验上评估它们的性能仍然非常具有挑战性。此外,最近的方法,如不变风险最小化(IRM),需要数量多得不切实际的训练环境——与虚假特征空间的维度$d_s$成线性关系——即使在[Rosenfeld et al., 2021]提出的那类简单数据模型上也是如此。在该模型的一个变体下,我们证明了ERM和IRM都无法在$o(d_s)$个环境下泛化。在此基础上,我们提出了一种基于迭代特征匹配的新算法,保证在只看到$O(\log d_s)$个环境后,即以高概率得到一个可泛化的预测器。 摘要:Domain generalization aims at performing well on unseen test environments with data from a limited number of training environments. Despite a proliferation of proposal algorithms for this task, assessing their performance, both theoretically and empirically is still very challenging. Moreover, recent approaches such as Invariant Risk Minimization (IRM) require a prohibitively large number of training environments - linear in the dimension of the spurious feature space $d_s$ - even on simple data models like the one proposed by [Rosenfeld et al., 2021]. Under a variant of this model, we show that both ERM and IRM cannot generalize with $o(d_s)$ environments. We then present a new algorithm based on performing iterative feature matching that is guaranteed with high probability to yield a predictor that generalizes after seeing only $O(\log{d_s})$ environments.
【13】 Heuristic Stopping Rules For Technology-Assisted Review 标题:技术辅助审查的启发式停止规则
作者:Eugene Yang,David D. Lewis,Ophir Frieder 机构:IR Lab, Georgetown University, Washington, DC, USA, Reveal Data, Chicago, IL, USA 备注:10 pages, 2 figures. Accepted at DocEng 21 链接:https://arxiv.org/abs/2106.09871 摘要:技术辅助审查(TAR)是指人在回路中的主动学习工作流程,用于在大型集合中查找相关文档。这些工作流程通常必须满足找到的相关文档比例的目标(即召回),同时还要降低成本。各种各样的启发式停止规则已经被建议用于在特定的环境中进行这种权衡,但是没有一种已经针对一系列的回忆目标和任务进行了测试。基于调查研究中基于模型的估计技术,我们提出了两种新的启发式停止规则:Quant和QuantCI。我们将它们与一系列提出的启发式算法进行比较,发现它们能够准确地达到一系列召回目标,同时大大降低了审查成本。 摘要:Technology-assisted review (TAR) refers to human-in-the-loop active learning workflows for finding relevant documents in large collections. These workflows often must meet a target for the proportion of relevant documents found (i.e. recall) while also holding down costs. A variety of heuristic stopping rules have been suggested for striking this tradeoff in particular settings, but none have been tested against a range of recall targets and tasks. We propose two new heuristic stopping rules, Quant and QuantCI based on model-based estimation techniques from survey research. We compare them against a range of proposed heuristics and find they are accurate at hitting a range of recall targets while substantially reducing review costs.
【14】 Topological Indoor Mapping through WiFi Signals 标题:基于WiFi信号的室内拓扑测绘
作者:Bastian Schaefermeier,Gerd Stumme,Tom Hanika 机构:Knowledge & Data Engineering Group, University of Kassel, Germany, Interdisciplinary Research Center for Information System Design (ITeG), U Kassel 备注:18 pages 链接:https://arxiv.org/abs/2106.09789 摘要:无处不在的WiFi接入点和能够测量WiFi信号强度的移动设备,使室内定位与建图的实际应用成为可能,而且不需要任何额外的基础设施。然而,该领域以前的方法常常受到诸如费力的建图过程、不断变化的环境和硬件差异等问题的阻碍。我们着眼于拓扑图来解决这些问题。拓扑图表示离散的位置(例如房间)及其之间的关系(例如距离和转移频率)。在我们的无监督方法中,我们采用WiFi信号强度分布、降维和聚类。该方法可用于用户携带移动设备并遵循其日常活动的场景。我们的目标应用是会议等短期室内活动。 摘要:The ubiquitous presence of WiFi access points and mobile devices capable of measuring WiFi signal strengths allow for real-world applications in indoor localization and mapping. In particular, no additional infrastructure is required. Previous approaches in this field were, however, often hindered by problems such as effortful map-building processes, changing environments and hardware differences. We tackle these problems focussing on topological maps. These represent discrete locations, such as rooms, and their relations, e.g., distances and transition frequencies. In our unsupervised method, we employ WiFi signal strength distributions, dimension reduction and clustering. It can be used in settings where users carry mobile devices and follow their normal routine. We aim for applications in short-lived indoor events such as conferences.
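下面给出上文无监督流程的极简示意(Python/scikit-learn):WiFi信号强度向量(每个接入点一维)经降维和聚类后,簇充当离散位置,相邻时刻簇标签之间的转移构成拓扑图的边;数据为合成示例,房间数与接入点数均为假设。
```python
# A minimal sketch of the unsupervised pipeline: reduce RSSI vectors, cluster
# them into discrete locations, and count transitions between cluster labels.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# three "rooms", each with a characteristic RSSI fingerprint over 8 APs (dBm)
rooms = rng.normal(size=(3, 8)) * 10 - 60
walk = np.repeat(np.arange(3), 100)                 # visit each room in turn
rssi = rooms[walk] + rng.normal(scale=2.0, size=(300, 8))

features = PCA(n_components=2).fit_transform(rssi)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)

transitions = np.zeros((3, 3), dtype=int)           # edge weights of the map
for a, b in zip(labels[:-1], labels[1:]):
    transitions[a, b] += 1
print(transitions)                                  # cluster ids are arbitrary
```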
【15】 On Invariance Penalties for Risk Minimization 标题:论风险最小化的不变罚金
作者:Kia Khezeli,Arno Blaas,Frank Soboczenski,Nicholas Chia,John Kalantari 机构:Oxford University, King’s College London 链接:https://arxiv.org/abs/2106.09777 摘要:不变风险最小化(IRM)原则由Arjovsky等人[2019]首次提出,旨在通过利用不同实验条件下数据的异质性来解决领域泛化问题。具体而言,IRM寻求一种数据表示,使得最优分类器在该表示下于所有域中保持不变。尽管IRM在概念上颇具吸引力,但最初提出的不变性惩罚的有效性最近受到了质疑。特别是,存在这样的反例:对非不变的数据表示,该不变性惩罚可以任意小。我们通过重新审视数据表示的Gram矩阵,提出了另一种不变性惩罚。我们讨论了其特征值在风险与不变性惩罚之间关系中的作用,并证明在上述反例中它是病态的。所提方法在温和的非退化条件下保证在线性设定中恢复不变表示。其有效性在DomainBed和InvarianceUnitTest这两个广泛的领域泛化测试平台上的实验中得到了验证。 摘要:The Invariant Risk Minimization (IRM) principle was first proposed by Arjovsky et al. [2019] to address the domain generalization problem by leveraging data heterogeneity from differing experimental conditions. Specifically, IRM seeks to find a data representation under which an optimal classifier remains invariant across all domains. Despite the conceptual appeal of IRM, the effectiveness of the originally proposed invariance penalty has recently been brought into question. In particular, there exists counterexamples for which that invariance penalty can be arbitrarily small for non-invariant data representations. We propose an alternative invariance penalty by revisiting the Gramian matrix of the data representation. We discuss the role of its eigenvalues in the relationship between the risk and the invariance penalty, and demonstrate that it is ill-conditioned for said counterexamples. The proposed approach is guaranteed to recover an invariant representation for linear settings under mild non-degeneracy conditions. Its effectiveness is substantiated by experiments on DomainBed and InvarianceUnitTest, two extensive test beds for domain generalization.
【16】 Dual-view Molecule Pre-training 标题:双视图分子预训练
作者:Jinhua Zhu,Yingce Xia,Tao Qin,Wengang Zhou,Houqiang Li,Tie-Yan Liu 机构:University of Science and Technology of China, Microsoft Research Asia 备注:15 pages 链接:https://arxiv.org/abs/2106.10234 摘要:受预训练在自然语言处理和计算机视觉领域成功经验的启发,预训练在化学信息学和生物信息学领域引起了广泛关注,尤其是在与分子相关的任务上。一个分子可以用图(其中原子通过化学键连接)或SMILES序列(按特定规则对分子图进行深度优先搜索得到)来表示。现有的分子预训练工作要么只使用图表示,要么只使用SMILES表示。在这项工作中,我们提出同时利用这两种表示,并设计了一种新的预训练算法——双视图分子预训练(简称DMP),它可以有效结合两类分子表示的优点。DMP模型由两个分支组成:以分子SMILES序列为输入的Transformer分支,和以分子图为输入的GNN分支。DMP的训练包含三个任务:(1)通过Transformer分支预测SMILES序列中被掩蔽的token;(2)通过GNN分支预测分子图中被掩蔽的原子;(3)最大化Transformer分支与GNN分支各自输出的两个高层表示之间的一致性。预训练之后,我们可以将Transformer分支(根据经验结果推荐此分支)、GNN分支或两者一起用于下游任务。DMP在九个分子性质预测任务上进行了测试,并在其中七个任务上达到最先进的性能。此外,我们在三个逆合成任务上测试了DMP,并在USPTO-full数据集上取得了最先进的结果。我们的代码将很快发布。 摘要:Inspired by its success in natural language processing and computer vision, pre-training has attracted substantial attention in cheminformatics and bioinformatics, especially for molecule based tasks. A molecule can be represented by either a graph (where atoms are connected by bonds) or a SMILES sequence (where depth-first-search is applied to the molecular graph with specific rules). Existing works on molecule pre-training use either graph representations only or SMILES representations only. In this work, we propose to leverage both the representations and design a new pre-training algorithm, dual-view molecule pre-training (briefly, DMP), that can effectively combine the strengths of both types of molecule representations. The model of DMP consists of two branches: a Transformer branch that takes the SMILES sequence of a molecule as input, and a GNN branch that takes a molecular graph as input. The training of DMP contains three tasks: (1) predicting masked tokens in a SMILES sequence by the Transformer branch, (2) predicting masked atoms in a molecular graph by the GNN branch, and (3) maximizing the consistency between the two high-level representations output by the Transformer and GNN branches separately. After pre-training, we can use either the Transformer branch (this one is recommended according to empirical results), the GNN branch, or both for downstream tasks. DMP is tested on nine molecular property prediction tasks and achieves state-of-the-art performances on seven of them. Furthermore, we test DMP on three retrosynthesis tasks and achieve state-of-the-art results on the USPTO-full dataset. Our code will be released soon.
【17】 A learned conditional prior for the VAE acoustic space of a TTS system 标题:TTS系统VAE声学空间的学习条件先验
作者:Penny Karanasou,Sri Karlapati,Alexis Moinet,Arnaud Joly,Ammar Abbas,Simon Slangen,Jaime Lorenzo Trueba,Thomas Drugman 机构:Amazon Research, Cambridge, United Kingdom 备注:in Proceedings of Interspeech 2021 链接:https://arxiv.org/abs/2106.10229 摘要:许多因素会影响语音,使同一句子产生不同的演绎。生成模型(如变分自编码器,VAE)能够捕捉这种变化,并允许通过采样得到同一句子的多种演绎。韵律变化的程度在很大程度上取决于采样时所用的先验。本文提出了一种为神经文语转换(TTS)系统的VAE潜空间计算信息性先验的新方法。借此,我们的目标是在采样时获得更丰富的韵律变化,同时获得对潜空间结构的可控性。我们将以说话人向量为条件的次级VAE的后验分布用作先验,从而可以在显式考虑条件的情况下从主VAE中采样,得到每个条件(即说话人)所对应的潜空间特定区域的样本。一项正式的偏好测试表明,所提方法显著优于标准条件VAE。我们还提供了潜空间的可视化,其中出现了分离良好的条件特定簇,并通过消融研究更好地理解系统的行为。 摘要:Many factors influence speech yielding different renditions of a given sentence. Generative models, such as variational autoencoders (VAEs), capture this variability and allow multiple renditions of the same sentence via sampling. The degree of prosodic variability depends heavily on the prior that is used when sampling. In this paper, we propose a novel method to compute an informative prior for the VAE latent space of a neural text-to-speech (TTS) system. By doing so, we aim to sample with more prosodic variability, while gaining controllability over the latent space's structure. By using as prior the posterior distribution of a secondary VAE, which we condition on a speaker vector, we can sample from the primary VAE taking explicitly the conditioning into account and resulting in samples from a specific region of the latent space for each condition (i.e. speaker). A formal preference test demonstrates significant preference of the proposed approach over standard Conditional VAE. We also provide visualisations of the latent space where well-separated condition-specific clusters appear, as well as ablation studies to better understand the behaviour of the system.
【18】 Deterministic Gibbs Sampling via Ordinary Differential Equations 标题:基于常微分方程的确定性Gibbs抽样
作者:Kirill Neklyudov,Roberto Bondesan,Max Welling 机构:University of Amsterdam, Qualcomm AI Research 链接:https://arxiv.org/abs/2106.10188 摘要:确定性动力学是许多MCMC算法的重要组成部分,例如混合蒙特卡罗(Hybrid Monte Carlo)或使用归一化流的采样器。本文利用自治常微分方程(ODE)和微分几何工具,给出了保测度确定性动力学的一般构造。我们展示了混合蒙特卡罗及其他确定性采样器如何作为该理论的特例出现。随后,我们以ODE流的形式构造了Gibbs采样的连续非顺序版本,并将其推广到离散状态空间,以展示方法的实用性。我们发现,即使随机采样器能产生独立样本,我们的确定性采样器在样本效率上仍优于随机采样器。 摘要:Deterministic dynamics is an essential part of many MCMC algorithms, e.g. Hybrid Monte Carlo or samplers utilizing normalizing flows. This paper presents a general construction of deterministic measure-preserving dynamics using autonomous ODEs and tools from differential geometry. We show how Hybrid Monte Carlo and other deterministic samplers follow as special cases of our theory. We then demonstrate the utility of our approach by constructing a continuous non-sequential version of Gibbs sampling in terms of an ODE flow and extending it to discrete state spaces. We find that our deterministic samplers are more sample efficient than stochastic counterparts, even if the latter generate independent samples.
【19】 Problem Dependent View on Structured Thresholding Bandit Problems 标题:结构化阈值Bandit问题的问题依赖观点
作者:James Cheshire,Pierre Ménard,Alexandra Carpentier 机构:Otto von Guericke University Magdeburg 备注:25 pages. arXiv admin note: text overlap with arXiv:2006.10006 链接:https://arxiv.org/abs/2106.10166 摘要:我们研究了随机阈值老虎机问题(TBP)在几种形状约束下的问题依赖区域。在TBP中,学习者的目标是在序贯博弈结束时输出均值高于给定阈值的臂的集合。无结构的一般情形已在文献中得到充分研究。设$K$为臂数,我们考虑(i)臂的均值序列$(\mu_k)_{k=1}^K$单调递增的情形(MTBP),以及(ii)$(\mu_k)_{k=1}^K$为凹的情形(CTBP)。我们在问题依赖区域中考虑这两种情形,并研究误差概率,即至少错分一个臂的概率。在固定预算设定下,我们给出了凹与单调两种情形下误差概率的上界和下界,以及相应的算法。在两种设置下,上下界在问题依赖区域中相互匹配,仅相差指数中的普适常数。 摘要:We investigate the problem dependent regime in the stochastic Thresholding Bandit problem (TBP) under several shape constraints. In the TBP, the objective of the learner is to output, at the end of a sequential game, the set of arms whose means are above a given threshold. The vanilla, unstructured, case is already well studied in the literature. Taking $K$ as the number of arms, we consider the case where (i) the sequence of arm's means $(\mu_k)_{k=1}^K$ is monotonically increasing (MTBP) and (ii) the case where $(\mu_k)_{k=1}^K$ is concave (CTBP). We consider both cases in the problem dependent regime and study the probability of error - i.e. the probability to mis-classify at least one arm. In the fixed budget setting, we provide upper and lower bounds for the probability of error in both the concave and monotone settings, as well as associated algorithms. In both settings the bounds match in the problem dependent regime up to universal constants in the exponential.
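下面给出固定预算阈值老虎机协议的极简示意(NumPy):将预算均匀分配到各臂,输出经验均值超过阈值的臂。论文中的结构化算法利用均值的单调/凹形状信息;均匀分配仅作为基线示意,臂数、阈值与预算均为示例假设。
```python
# A minimal sketch of the fixed-budget thresholding bandit protocol with
# uniform allocation; the paper's algorithms exploit the shape constraints.
import numpy as np

rng = np.random.default_rng(0)
means = np.sort(rng.uniform(0, 1, size=8))     # monotone means, as in the MTBP
threshold, budget = 0.5, 8 * 200

pulls = budget // len(means)                   # uniform allocation over arms
samples = rng.binomial(1, means, size=(pulls, len(means)))  # Bernoulli rewards
estimates = samples.mean(axis=0)

output = np.where(estimates >= threshold)[0]   # arms declared above threshold
truth = np.where(means >= threshold)[0]
print("selected arms:", output, "true arms:", truth)
```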
【20】 Low Resource German ASR with Untranscribed Data Spoken by Non-native Children -- INTERSPEECH 2021 Shared Task SPAPL System 标题:具有非母语儿童非转录数据的低资源德语ASR--InterSpeech 2021共享任务SPAPL系统
作者:Jinhan Wang,Yunzheng Zhu,Ruchao Fan,Wei Chu,Abeer Alwan 机构:Department of Electrical and Computer Engineering, University of California Los Angeles, USA, PAII Inc., USA 备注:Accepted to INTERSPEECH 2021 链接:https://arxiv.org/abs/2106.09963 摘要:本文介绍了SPAPL系统参加INTERSPEECH 2021挑战赛"非母语儿童德语语音自动识别"共享任务的情况。该任务提供约5小时的转录数据和约60小时的未转录数据,用于开发面向儿童的德语ASR系统。对于转录数据的训练,我们提出了非语音状态判别损失(NSDL),以减轻语音话语中长时非语音片段的影响。为了探索未转录数据的使用,我们实现并组合了多种方法来逐步提升系统性能。首先,使用双向自回归预测编码(Bi-APC),利用所提供的未转录数据学习声学模型的初始参数。其次,进一步使用增量式半监督学习迭代生成伪转录数据。第三,在不同训练阶段采用不同的数据增强方案,以增加训练数据的多样性和规模。最后,使用循环神经网络语言模型(RNNLM)进行重打分。我们的系统在评估数据上取得了39.68%的词错误率(WER),相比官方基线(45.21%)相对提升约12%。 摘要:This paper describes the SPAPL system for the INTERSPEECH 2021 Challenge: Shared Task on Automatic Speech Recognition for Non-Native Children's Speech in German. ~ 5 hours of transcribed data and ~ 60 hours of untranscribed data are provided to develop a German ASR system for children. For the training of the transcribed data, we propose a non-speech state discriminative loss (NSDL) to mitigate the influence of long-duration non-speech segments within speech utterances. In order to explore the use of the untranscribed data, various approaches are implemented and combined together to incrementally improve the system performance. First, bidirectional autoregressive predictive coding (Bi-APC) is used to learn initial parameters for acoustic modelling using the provided untranscribed data. Second, incremental semi-supervised learning is further used to iteratively generate pseudo-transcribed data. Third, different data augmentation schemes are used at different training stages to increase the variability and size of the training data. Finally, a recurrent neural network language model (RNNLM) is used for rescoring. Our system achieves a word error rate (WER) of 39.68% on the evaluation data, an approximately 12% relative improvement over the official baseline (45.21%).
【21】 AI-Enabled Ultra-Low-Dose CT Reconstruction 标题:基于人工智能的超低剂量CT重建
作者:Weiwen Wu,Chuang Niu,Shadi Ebrahimian,Hengyong Yu,Mannu Kalra,Ge Wang 机构:Biomedical Imaging Center, Center for Biotechnology and Interdisciplinary Studies, Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, NY, USA, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston 备注:19 pages, 10 figures, 1 table, 44 references 链接:https://arxiv.org/abs/2106.09834 摘要:根据ALARA(在合理可行范围内尽可能低)原则,超低剂量CT重建是最大限度降低癌症风险和遗传损伤(尤其是对儿童)的"圣杯"。随着医学CT技术的发展,迭代算法被广泛用于从低剂量扫描中重建质量尚可的CT图像。近年来,人工智能(AI)技术在进一步降低CT辐射剂量方面显示出巨大潜力。在本文中,我们证明了AI驱动的CT重建能够在与X光平片相当的超低剂量水平下提供诊断级的图像质量。具体而言,我们开发了一个分裂展开的网格状替代重建(SUGAR)网络,其中集成了深度学习、物理建模和图像先验。临床数据集上的重建结果表明,SUGAR仅用36个投影即可重建出优质图像。这一方法有可能改变未来的医疗保健。 摘要:By the ALARA (As Low As Reasonably Achievable) principle, ultra-low-dose CT reconstruction is a holy grail to minimize cancer risks and genetic damages, especially for children. With the development of medical CT technologies, the iterative algorithms are widely used to reconstruct decent CT images from a low-dose scan. Recently, artificial intelligence (AI) techniques have shown a great promise in further reducing CT radiation dose to the next level. In this paper, we demonstrate that AI-powered CT reconstruction offers diagnostic image quality at an ultra-low-dose level comparable to that of radiography. Specifically, here we develop a Split Unrolled Grid-like Alternative Reconstruction (SUGAR) network, in which deep learning, physical modeling and image prior are integrated. The reconstruction results from clinical datasets show that excellent images can be reconstructed using SUGAR from 36 projections. This approach has a potential to change future healthcare.
【22】 Causal Bias Quantification for Continuous Treatment 标题:连续治疗的因果偏差量化方法
作者:Gianluca Detommaso,Michael Brückner,Philip Schulz,Victor Chernozhukov 机构:Massachusetts Institute of Technology & Amazon 链接:https://arxiv.org/abs/2106.09762 摘要:在这项工作中,我们针对连续处理设定,对边际因果效应和因果偏差给出了新的刻画。我们证明了它们可以表示为关于条件概率分布的期望,从而可以用标准的统计与概率方法进行估计。期望中的所有项都可以通过自动微分来计算,对高度非线性的模型也是如此。我们进一步提出了一个新的、通过协变量调整判断因果效应可识别性的完备准则,并证明当该准则满足时偏差为零。我们在三种不同情景下检验了该框架的有效性:混杂、过度控制和内生选择偏差下的线性模型;由于数据缺失而无法完全识别的非线性模型;以及他汀类药物与动脉粥样硬化性心血管疾病的模拟医学研究。 摘要:In this work we develop a novel characterization of marginal causal effect and causal bias in the continuous treatment setting. We show they can be expressed as an expectation with respect to a conditional probability distribution, which can be estimated via standard statistical and probabilistic methods. All terms in the expectations can be computed via automatic differentiation, also for highly non-linear models. We further develop a new complete criterion for identifiability of causal effects via covariate adjustment, showing the bias equals zero if the criterion is met. We study the effectiveness of our framework in three different scenarios: linear models under confounding, overcontrol and endogenous selection bias; a non-linear model where full identifiability cannot be achieved because of missing data; a simulated medical study of statins and atherosclerotic cardiovascular disease.