cs.LG: 137 papers today
Graph-related (graph learning | graph neural networks | graph optimization, etc.) (13 papers)
【1】 Self-attention Presents Low-dimensional Knowledge Graph Embeddings for Link Prediction
Link: https://arxiv.org/abs/2112.10644
Authors: Peyman Baghershahi, Reshad Hosseini, Hadi Moradi
Affiliation: University of Tehran, Tehran, Iran
Note: 12 pages, 1 figure, 3 tables
Abstract: Recently, the link prediction problem, also known as knowledge graph completion, has attracted a great deal of research. Although a few recent models have tried to attain relatively good performance by embedding knowledge graphs in low dimensions, the best results of the current state-of-the-art models are earned at the cost of considerably increasing the dimensionality of embeddings. However, this causes overfitting and, more importantly, scalability issues in the case of huge knowledge bases. Inspired by recent advances in deep learning offered by variants of the Transformer model and its self-attention mechanism, in this paper we propose a Transformer-based model to address the aforementioned limitation. In our model, self-attention is the key to applying query-dependent projections to entities and relations, and to capturing the mutual information between them to gain highly expressive representations from low-dimensional embeddings. Empirical results on two standard link prediction datasets, FB15k-237 and WN18RR, demonstrate that our model achieves performance comparable to or better than three of the best recent state-of-the-art competitors, with a significant average reduction of 76.3% in the dimensionality of embeddings.
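To make the idea concrete, here is a minimal PyTorch sketch of how self-attention can produce query-dependent entity representations for tail scoring. This is an illustrative toy, not the authors' architecture; the module sizes, the length-2 token sequence, and the dot-product scorer are all assumptions.

```python
import torch
import torch.nn as nn

class AttnKGScorer(nn.Module):
    """Toy self-attention scorer for (head, relation, ?) link-prediction queries."""
    def __init__(self, n_entities, n_relations, dim=32, n_heads=4):
        super().__init__()
        self.ent = nn.Embedding(n_entities, dim)
        self.rel = nn.Embedding(n_relations, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, head, rel):
        # Treat the (entity, relation) pair as a length-2 token sequence so
        # self-attention can mix information between them, yielding a
        # query-dependent projection of the entity.
        seq = torch.stack([self.ent(head), self.rel(rel)], dim=1)  # (B, 2, d)
        q = self.encoder(seq).mean(dim=1)                          # (B, d)
        # Score every entity as a candidate tail via dot product.
        return q @ self.ent.weight.t()                             # (B, |E|)

model = AttnKGScorer(n_entities=14541, n_relations=237)  # FB15k-237 sizes
scores = model(torch.tensor([0, 1]), torch.tensor([5, 7]))
print(scores.shape)  # torch.Size([2, 14541])
```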
【2】 Lifelong Learning in Evolving Graphs with Limited Labeled Data and Unseen Class Detection
Link: https://arxiv.org/abs/2112.10558
Authors: Lukas Galke, Iacopo Vagliano, Benedikt Franke, Tobias Zielke, Ansgar Scherp
Affiliations: L. Galke (ORCID 0000-0001-6124-1092) was with Kiel University and ZBW – Leibniz Information Centre for Economics; I. Vagliano (ORCID 0000-0002-3066-9464) is with Amsterdam University Medical Centre
Note: 14 pages + 2 pages appendix, 9 figures; unifies and extends arXiv:1905.06018 and arXiv:2006.14422
Abstract: Large-scale graph data in the real world are often dynamic rather than static. The data change over time as new nodes, edges, and even classes appear, such as in citation networks and research-and-development collaboration networks. Graph neural networks (GNNs) have emerged as the standard method for numerous tasks on graph-structured data. In this work, we employ a two-step procedure to explore how GNNs can be incrementally adapted to new unseen graph data. First, we analyze the boundary between transductive and inductive learning on standard benchmark datasets. After inductive pretraining, we add unlabeled data to the graph and show that the models are stable. Then, we explore the case of continually adding more and more labeled data, while considering cases where not all past instances are annotated with class labels. Furthermore, we introduce new classes while the graph evolves and explore methods that automatically detect instances from previously unseen classes. In order to deal with evolving graphs in a principled way, we propose a lifelong learning framework for graph data along with an evaluation protocol. In this framework, we evaluate representative GNN architectures. We observe that implicit knowledge within model parameters becomes more important when explicit knowledge, i.e., data from past tasks, is limited. We find that in open-world node classification, the data from surprisingly few past tasks are sufficient to reach the performance attained by remembering data from all past tasks. In the challenging task of unseen class detection, we find that using a weighted cross-entropy loss is important for stability.
【3】 A Comprehensive Analytical Survey on Unsupervised and Semi-Supervised Graph Representation Learning Methods
Link: https://arxiv.org/abs/2112.10372
Authors: Md. Khaledur Rahman, Ariful Azad
Affiliations: Computer Science, Indiana University, Bloomington, IN, USA; Intelligent Systems Engineering, Indiana University, Bloomington, IN, USA
Note: Analytical survey paper on graph embedding, 26 pages
Abstract: Graph representation learning is a fast-growing field where one of the main objectives is to generate meaningful representations of graphs in lower-dimensional spaces. The learned embeddings have been successfully applied to perform various prediction tasks, such as link prediction, node classification, clustering, and visualization. The collective effort of the graph learning community has delivered hundreds of methods, but no single method excels under all evaluation metrics such as prediction accuracy, running time, scalability, etc. This survey aims to evaluate all major classes of graph embedding methods by considering algorithmic variations, parameter selections, scalability, hardware and software platforms, downstream ML tasks, and diverse datasets. We organized graph embedding techniques using a taxonomy that includes methods from manual feature engineering, matrix factorization, shallow neural networks, and deep graph convolutional networks. We evaluated these classes of algorithms for node classification, link prediction, clustering, and visualization tasks using widely used benchmark graphs. We designed our experiments on top of the PyTorch Geometric and DGL libraries and ran the experiments on different multicore CPU and GPU platforms. We rigorously scrutinize the performance of embedding methods under various performance metrics and summarize the results. Thus, this paper may serve as a comparative guide to help users select the methods that are most suitable for their tasks.
【4】 FedNI: Federated Graph Learning with Network Inpainting for Population-Based Disease Prediction
Link: https://arxiv.org/abs/2112.10166
Authors: Liang Peng, Nan Wang, Nicha Dvornek, Xiaofeng Zhu, Xiaoxiao Li
Affiliations: N. Wang is with the School of Computer Science and Technology, East China Normal University, and the University of British Columbia; N. Dvornek is with the Department of Radiology and Biomedical Imaging
Abstract: Graph Convolutional Neural Networks (GCNs) are widely used for graph analysis. Specifically, in medical applications, GCNs can be used for disease prediction on a population graph, where graph nodes represent individuals and edges represent individual similarities. However, GCNs rely on a vast amount of data, which is challenging to collect for a single medical institution. In addition, a critical challenge that most medical institutions continue to face is addressing disease prediction in isolation with incomplete data information. To address these issues, Federated Learning (FL) allows isolated local institutions to collaboratively train a global model without data sharing. In this work, we propose a framework, FedNI, to leverage network inpainting and inter-institutional data via FL. Specifically, we first federatively train a missing node and edge predictor using a graph generative adversarial network (GAN) to complete the missing information of local networks. Then we train a global GCN node classifier across institutions using a federated graph learning platform. This novel design enables us to build more accurate machine learning models by leveraging both federated learning and graph learning approaches. We demonstrate that our federated model outperforms local and baseline FL methods by significant margins on two public neuroimaging datasets.
【5】 CORE: A Knowledge Graph Entity Type Prediction Method via Complex Space Regression and Embedding
Link: https://arxiv.org/abs/2112.10067
Authors: Xiou Ge, Yun-Cheng Wang, Bin Wang, C.-C. Jay Kuo
Affiliations: University of Southern California, Los Angeles, USA; National University of Singapore
Abstract: Entity type prediction is an important problem in knowledge graph (KG) research. A new KG entity type prediction method, named CORE (COmplex space Regression and Embedding), is proposed in this work. The proposed CORE method leverages the expressive power of two complex-space embedding models, namely the RotatE and ComplEx models. It embeds entities and types in two different complex spaces using either RotatE or ComplEx. Then, we derive a complex regression model to link these two spaces. Finally, a mechanism to jointly optimize embedding and regression parameters is introduced. Experiments show that CORE outperforms benchmark methods on representative KG entity type inference datasets. Strengths and weaknesses of various entity type prediction methods are analyzed.
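For reference, the scoring functions of the two complex-space models CORE builds on are standard in the KG-embedding literature; with $h, r, t \in \mathbb{C}^d$ denoting the head, relation, and tail embeddings:

```latex
f_{\mathrm{RotatE}}(h,r,t) = -\,\lVert h \circ r - t \rVert, \quad |r_i| = 1
  \quad \text{(relations act as rotations in complex space)}
f_{\mathrm{ComplEx}}(h,r,t) = \operatorname{Re}\Big(\textstyle\sum_{i=1}^{d} h_i \, r_i \, \bar{t}_i\Big)
  \quad \text{(trilinear product with the conjugated tail)}
```

CORE's contribution is the regression model connecting the entity space and the type space; its exact form is given in the paper.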
【6】 Deep Graph-level Anomaly Detection by Glocal Knowledge Distillation
Link: https://arxiv.org/abs/2112.10063
Authors: Rongrong Ma, Guansong Pang, Ling Chen, Anton van den Hengel
Affiliations: Australian Artificial Intelligence Institute, University of Technology Sydney, Sydney, Australia; School of Computing and Information Systems, Singapore Management University; Australian Institute for Machine Learning, The University of Adelaide, Adelaide, Australia
Note: Accepted to WSDM 2022
Abstract: Graph-level anomaly detection (GAD) describes the problem of detecting graphs that are abnormal in their structure and/or the features of their nodes, as compared to other graphs. One of the challenges in GAD is to devise graph representations that enable the detection of both locally- and globally-anomalous graphs, i.e., graphs that are abnormal in their fine-grained (node-level) or holistic (graph-level) properties, respectively. To tackle this challenge we introduce a novel deep anomaly detection approach for GAD that learns rich global and local normal pattern information by joint random distillation of graph and node representations. The random distillation is achieved by training one GNN to predict another GNN with randomly initialized network weights. Extensive experiments on 16 real-world graph datasets from diverse domains show that our model significantly outperforms seven state-of-the-art models. Code and datasets are available at https://git.io/GLocalKD.
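The random-distillation idea lends itself to a short sketch: a student GNN is trained to reproduce the node- and graph-level outputs of a frozen, randomly initialized teacher, and graphs the student imitates poorly at test time are scored as anomalous. A minimal sketch with a toy dense-adjacency GCN follows; the layer sizes and mean-pooling readout are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class TinyGCN(nn.Module):
    """Toy two-layer GCN over a dense (normalized) adjacency matrix."""
    def __init__(self, in_dim=8, hid=16, out=16):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hid)
        self.lin2 = nn.Linear(hid, out)

    def forward(self, adj, x):
        h = torch.relu(adj @ self.lin1(x))    # node-level representations
        g = (adj @ self.lin2(h)).mean(dim=0)  # graph-level representation
        return h, g

student, teacher = TinyGCN(), TinyGCN()
for p in teacher.parameters():   # the teacher stays at its random init
    p.requires_grad_(False)

opt = torch.optim.Adam(student.parameters(), lr=1e-3)

def distill_step(adj, x):
    (hs, gs), (ht, gt) = student(adj, x), teacher(adj, x)
    # Joint node-level + graph-level distillation loss.
    loss = ((hs - ht) ** 2).mean() + ((gs - gt) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

adj, x = torch.eye(5), torch.randn(5, 8)   # toy graph
print(distill_step(adj, x))
# At test time, the prediction error itself serves as the anomaly score.
```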
【7】 Low-resource Learning with Knowledge Graphs: A Comprehensive Survey
Link: https://arxiv.org/abs/2112.10006
Authors: Jiaoyan Chen, Yuxia Geng, Zhuo Chen, Jeff Z. Pan, Yuan He, Wen Zhang, Ian Horrocks, Huajun Chen
Affiliations: Department of Computer Science, University of Oxford; Zhejiang University; School of Informatics, The University of Edinburgh
Note: This is a preliminary version of a survey on low-resource learning with knowledge graphs. It collects over 90 papers on this topic published by November 2021, with over 200 citations in total.
Abstract: Machine learning methods, especially deep neural networks, have achieved great success, but many of them often rely on a large number of labeled samples for training. In real-world applications, we often need to address sample shortage due to, e.g., dynamic contexts with emerging prediction targets and costly sample annotation. Therefore, low-resource learning, which aims to learn robust prediction models without enough resources (especially training samples), is now being widely investigated. Among all the low-resource learning studies, many prefer to utilize auxiliary information in the form of a Knowledge Graph (KG), which is becoming more and more popular for knowledge representation, to reduce the reliance on labeled samples. In this survey, we comprehensively review over 90 papers about KG-aware research for two major low-resource learning settings -- zero-shot learning (ZSL), where the classes for prediction have never appeared in training, and few-shot learning (FSL), where the classes for prediction have only a small number of labeled samples available. We first introduce the KGs used in ZSL and FSL studies as well as existing and potential KG construction solutions, and then systematically categorize and summarize KG-aware ZSL and FSL methods, dividing them into different paradigms such as mapping-based, data-augmentation, propagation-based, and optimization-based. We next present different applications, including KG-augmented prediction tasks in Computer Vision and Natural Language Processing as well as tasks for KG completion, together with some typical evaluation resources for each task. We finally discuss some challenges and future directions, such as new learning and reasoning paradigms and the construction of high-quality KGs.
【8】 FlowPool: Pooling Graph Representations with Wasserstein Gradient Flows
Link: https://arxiv.org/abs/2112.09990
Author: Effrosyni Simou
Note: The content of this article corresponds to a chapter of the PhD thesis submitted on September 10, 2021 and successfully defended on October 15, 2021. The thesis manuscript will be published online by EPFL the day after its presentation at the PhD graduation ceremony.
Abstract: In several machine learning tasks for graph-structured data, the graphs under consideration may be composed of a varying number of nodes. Therefore, it is necessary to design pooling methods that aggregate graph representations of varying size to representations of fixed size, which can be used in downstream tasks such as graph classification. Existing graph pooling methods offer no guarantee with regard to the similarity of a graph representation and its pooled version. In this work we address this limitation by proposing FlowPool, a pooling method that optimally preserves the statistics of a graph representation in its pooled counterpart by minimizing their Wasserstein distance. This is achieved by performing a Wasserstein gradient flow with respect to the pooled graph representation. We propose a versatile implementation of our method which can take into account the geometry of the representation space through any ground cost. This implementation relies on the computation of the gradient of the Wasserstein distance with recently proposed implicit differentiation schemes. Our pooling method is amenable to automatic differentiation and can be integrated in end-to-end deep learning architectures. Further, FlowPool is invariant to permutations and can therefore be combined with permutation-equivariant feature extraction layers in GNNs in order to obtain predictions that are independent of the ordering of the nodes. Experimental results demonstrate that our method leads to an increase in performance compared to existing pooling methods when evaluated on graph classification tasks.
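In symbols (our paraphrase of the stated objective, with an explicit Euler discretization assumed for the flow): viewing a graph representation $X \in \mathbb{R}^{n \times d}$ as an empirical measure $\mu_X$, FlowPool seeks a fixed-size $Y \in \mathbb{R}^{k \times d}$ by descending the Wasserstein distance itself,

```latex
Y^{\star} = \arg\min_{Y \in \mathbb{R}^{k \times d}} W_{c}\!\left(\mu_X, \mu_Y\right),
\qquad
Y_{t+1} = Y_t - \eta \, \nabla_{Y} W_{c}\!\left(\mu_X, \mu_{Y_t}\right),
```

where $c$ is the chosen ground cost and $\nabla_Y W_c$ is obtained via the implicit-differentiation schemes mentioned above.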
【9】 GOPHER: Categorical probabilistic forecasting with graph structure via local continuous-time dynamics
Link: https://arxiv.org/abs/2112.09964
Authors: Ke Alexander Wang, Danielle Maddix, Yuyang Wang
Affiliations: Stanford University; Amazon Research
Note: NeurIPS 2021 Workshop ICBINB Spotlight
Abstract: We consider the problem of probabilistic forecasting over categories with graph structure, where the dynamics at a vertex depend on its local connectivity structure. We present GOPHER, a method that combines the inductive bias of graph neural networks with neural ODEs to capture the intrinsic local continuous-time dynamics of our probabilistic forecasts. We study the benefits of these two inductive biases by comparing against baseline models that help disentangle the benefits of each. We find that capturing the graph structure is crucial for accurate in-domain probabilistic predictions and more sample-efficient models. Surprisingly, our experiments demonstrate that the continuous-time evolution inductive bias brings little to no benefit despite reflecting the true probability dynamics.
【10】 DegreEmbed: Incorporating Entity Embedding into Logic Rule Learning for Knowledge Graph Reasoning
Link: https://arxiv.org/abs/2112.09933
Authors: Yuliang Wei, Haotian Li, Yao Wang, Guodong Xin, Hongri Liu
Affiliation: School of Computer Science and Technology, Harbin Institute of Technology at Weihai, China
Note: Submitted to the Semantic Web Journal for possible publication. arXiv admin note: text overlap with arXiv:2112.06189
Abstract: Knowledge graphs (KGs), as structured representations of real-world facts, are intelligent databases incorporating human knowledge that can help machines imitate the way humans solve problems. However, due to the nature of rapid iteration as well as the incompleteness of data, KGs are usually huge, and facts are inevitably missing from them. Link prediction for knowledge graphs is the task of completing missing facts by reasoning over the existing knowledge. Two main streams of research are widely studied: one learns low-dimensional embeddings for entities and relations that can capture latent patterns, and the other gains good interpretability by mining logical rules. Unfortunately, previous studies rarely pay attention to heterogeneous KGs. In this paper, we propose DegreEmbed, a model that combines embedding-based learning and logic rule mining for inference on KGs. Specifically, we study the problem of predicting missing links in heterogeneous KGs that involve entities and relations of various types from the perspective of node degrees. Experimentally, we demonstrate that our DegreEmbed model outperforms state-of-the-art methods on real-world datasets. Meanwhile, the rules mined by our model are of high quality and interpretability.
【11】 Improving Subgraph Recognition with Variational Graph Information Bottleneck
Link: https://arxiv.org/abs/2112.09899
Authors: Junchi Yu, Jie Cao, Ran He
Affiliation: Institute of Automation, CAS, Beijing, China
Abstract: Subgraph recognition aims at discovering a compressed substructure of a graph that is most informative about the graph property. It can be formulated by optimizing the Graph Information Bottleneck (GIB) with a mutual information estimator. However, GIB suffers from training instability, since the mutual information of graph data is intrinsically difficult to estimate. This paper introduces a noise injection method to compress the information in the subgraphs, which leads to a novel Variational Graph Information Bottleneck (VGIB) framework. VGIB admits a tractable variational approximation to its objective under mild assumptions. Therefore, VGIB enjoys a more stable and efficient training process -- we find that VGIB converges 10 times faster than GIB, with improved performance in practice. Extensive experiments on graph interpretation, explainability of Graph Neural Networks, and graph classification show that VGIB finds better subgraphs than existing methods.
【12】 Towards the Explanation of Graph Neural Networks in Digital Pathology with Information Flows
Link: https://arxiv.org/abs/2112.09895
Authors: Junchi Yu, Tingyang Xu, Ran He
Affiliations: Institute of Automation, CAS, Beijing, China; Tencent AI Lab, Shenzhen, China
Abstract: As Graph Neural Networks (GNNs) are widely adopted in digital pathology, there is increasing attention to developing explanation models (explainers) of GNNs for improved transparency in clinical decisions. Existing explainers discover an explanatory subgraph relevant to the prediction. However, such a subgraph is insufficient to reveal all the critical biological substructures for the prediction, because the prediction will remain unchanged after removing that subgraph. Hence, an explanatory subgraph should be not only necessary for the prediction, but also sufficient to uncover the most predictive regions for the explanation. Such an explanation requires a measurement of the information transferred from different input subgraphs to the predictive output, which we define as information flow. In this work, we address these key challenges and propose IFEXPLAINER, which generates a necessary and sufficient explanation for GNNs. To evaluate the information flow within the GNN's prediction, we first propose a novel notion of predictiveness, named $f$-information, which is directional and incorporates the realistic capacity of the GNN model. Based on it, IFEXPLAINER generates the explanatory subgraph with maximal information flow to the prediction. Meanwhile, it minimizes the information flow from the input to the predictive result after removing the explanation. Thus, the produced explanation is necessarily important to the prediction and sufficient to reveal the most crucial substructures. We evaluate IFEXPLAINER on interpreting GNN predictions for breast cancer subtyping. Experimental results on the BRACS dataset show the superior performance of the proposed method.
【13】 Meta Propagation Networks for Graph Few-shot Semi-supervised Learning
Link: https://arxiv.org/abs/2112.09810
Authors: Kaize Ding, Jianling Wang, James Caverlee, Huan Liu
Affiliations: Arizona State University; Texas A&M University
Note: Accepted by AAAI 2022
Abstract: Inspired by the extensive success of deep learning, graph neural networks (GNNs) have been proposed to learn expressive node representations and have demonstrated promising performance in various graph learning tasks. However, existing endeavors predominantly focus on the conventional semi-supervised setting where relatively abundant gold-labeled nodes are provided. This is often impractical, as data labeling is unbearably laborious and requires intensive domain knowledge, especially when considering the heterogeneity of graph-structured data. Under the few-shot semi-supervised setting, the performance of most existing GNNs is inevitably undermined by overfitting and oversmoothing issues, largely owing to the shortage of labeled data. In this paper, we propose a decoupled network architecture equipped with a novel meta-learning algorithm to solve this problem. In essence, our framework Meta-PN infers high-quality pseudo-labels on unlabeled nodes via a meta-learned label propagation strategy, which effectively augments the scarce labeled data while enabling large receptive fields during training. Extensive experiments demonstrate that our approach offers easy and substantial performance gains compared to existing techniques on various benchmark datasets.
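Meta-PN meta-learns its propagation strategy, but the underlying step is classic label propagation over a normalized adjacency matrix. A minimal sketch of that building block follows; the meta-learning loop that adapts the strategy is omitted, and alpha and the iteration count are illustrative.

```python
import torch

def propagate_labels(adj_norm, y_onehot, alpha=0.9, n_iters=10):
    """Classic propagation: Z <- alpha * A_hat @ Z + (1 - alpha) * Y.
    adj_norm: symmetrically normalized adjacency (n, n);
    y_onehot: one-hot labels, zero rows for unlabeled nodes (n, c)."""
    z = y_onehot.clone()
    for _ in range(n_iters):
        z = alpha * adj_norm @ z + (1.0 - alpha) * y_onehot
    return z  # soft pseudo-labels for every node

# The resulting pseudo-labels on unlabeled nodes augment the scarce
# labeled set used to train the (decoupled) feature network.
```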
Transformer (3 papers)
【1】 Transformers Can Do Bayesian Inference
Link: https://arxiv.org/abs/2112.10510
Authors: Samuel Müller, Noah Hollmann, Sebastian Pineda Arango, Josif Grabocka, Frank Hutter
Affiliations: University of Freiburg; Charité Berlin; Bosch Center for Artificial Intelligence
Abstract: Currently, it is hard to reap the benefits of deep learning for Bayesian methods, which allow the explicit specification of prior knowledge and accurately capture model uncertainty. We present Prior-Data Fitted Networks (PFNs). PFNs leverage large-scale machine learning techniques to approximate a large set of posteriors. The only requirement for PFNs to work is the ability to sample from a prior distribution over supervised learning tasks (or functions). Our method restates the objective of posterior approximation as a supervised classification problem with a set-valued input: it repeatedly draws a task (or function) from the prior, draws a set of data points and their labels from it, masks one of the labels, and learns to make probabilistic predictions for it based on the set-valued input of the rest of the data points. Presented with a set of samples from a new supervised learning task as input, PFNs make probabilistic predictions for arbitrary other data points in a single forward propagation, having learned to approximate Bayesian inference. We demonstrate that PFNs can near-perfectly mimic Gaussian processes and also enable efficient Bayesian inference for intractable problems, with over 200-fold speedups in multiple setups compared to current methods. We obtain strong results in very diverse areas such as Gaussian process regression, Bayesian neural networks, classification for small tabular data sets, and few-shot image classification, demonstrating the generality of PFNs. Code and trained PFNs are released at https://github.com/automl/TransformersCanDoBayesianInference.
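The training recipe lends itself to a compact sketch. Below, a toy prior of noisy linear functions stands in for the GP/BNN priors used in the paper, a small Transformer encoder learns to predict a masked label from the remaining (x, y) pairs, and a squared-error loss stands in for the paper's probabilistic output head; all sizes and the token encoding are assumptions.

```python
import torch
import torch.nn as nn

class ToyPFN(nn.Module):
    def __init__(self, d=32):
        super().__init__()
        self.embed = nn.Linear(3, d)  # token = (x, y, is_masked)
        enc = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=2)
        self.head = nn.Linear(d, 1)   # predicts the masked y

    def forward(self, tokens):        # tokens: (B, n, 3)
        return self.head(self.encoder(self.embed(tokens)))

def sample_task(batch=64, n=8):
    """Toy prior over tasks: y = w * x + noise, with w ~ N(0, 1)."""
    w = torch.randn(batch, 1, 1)
    x = torch.randn(batch, n, 1)
    y = w * x + 0.1 * torch.randn(batch, n, 1)
    mask = torch.zeros(batch, n, 1); mask[:, -1] = 1.0  # hold out last point
    tokens = torch.cat([x, y * (1 - mask), mask], dim=-1)
    return tokens, y

model = ToyPFN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(200):               # repeat over fresh draws from the prior
    tokens, y = sample_task()
    pred = model(tokens)
    loss = ((pred[:, -1] - y[:, -1]) ** 2).mean()  # loss only on masked label
    opt.zero_grad(); loss.backward(); opt.step()
# After training, a single forward pass approximates the posterior
# predictive for the held-out point of any new task drawn from the prior.
```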
【2】 Leveraging Transformers for Hate Speech Detection in Conversational Code-Mixed Tweets
Link: https://arxiv.org/abs/2112.09986
Authors: Zaki Mustafa Farooqi, Sreyan Ghosh, Rajiv Ratn Shah
Affiliation: Multimodal Digital Media Analysis Lab, Indraprastha Institute of Information Technology Delhi, India
Note: Accepted at FIRE 2021 -- Hate Speech and Offensive Content Detection (HASOC) Track
Abstract: In the current era of the internet, where social media platforms are easily accessible to everyone, people often have to deal with threats, identity attacks, hate, and bullying due to their association with a caste, creed, gender, or religion, or even their acceptance or rejection of a notion. Existing work in hate speech detection primarily focuses on individual comment classification as a sequence labeling task and often fails to consider the context of the conversation. The context of a conversation often plays a substantial role in determining the author's intent and the sentiment behind the tweet. This paper describes the system proposed by team MIDAS-IIITD for HASOC 2021 subtask 2, one of the first shared tasks focusing on detecting hate speech from Hindi-English code-mixed conversations on Twitter. We approach this problem using neural networks, leveraging the transformer's cross-lingual embeddings and further finetuning them for low-resource hate-speech classification in transliterated Hindi text. Our best-performing system, a hard-voting ensemble of Indic-BERT, XLM-RoBERTa, and multilingual BERT, achieved a macro F1 score of 0.7253, placing us first on the overall leaderboard standings.
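Hard voting itself is a one-liner; a minimal sketch over per-example label predictions from the three fine-tuned models (the example inputs are made up):

```python
from collections import Counter

def hard_vote(*model_preds):
    """Majority vote over per-example label predictions from several models."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*model_preds)]

# e.g. predictions from Indic-BERT, XLM-RoBERTa, and multilingual BERT
print(hard_vote([1, 0, 1], [1, 1, 0], [0, 1, 1]))  # -> [1, 1, 1]
```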
【3】 Lerna: Transformer Architectures for Configuring Error Correction Tools for Short- and Long-Read Genome Sequencing
Link: https://arxiv.org/abs/2112.10068
Authors: Atul Sharma, Pranjal Jain, Ashraf Mahgoub, Zihan Zhou, Kanak Mahadik, Somali Chaterji
Affiliation: Purdue University, West Lafayette, US
Note: 26 pages, 5 figures, 10 tables. Accepted to BMC Bioinformatics
Abstract: Sequencing technologies are prone to errors, making error correction (EC) necessary for downstream applications. EC tools need to be manually configured for optimal performance. We find that the optimal parameters (e.g., k-mer size) are both tool- and dataset-dependent. Moreover, evaluating the performance (i.e., alignment rate or gain) of a given tool usually relies on a reference genome, but quality reference genomes are not always available. We introduce Lerna for the automated configuration of k-mer-based EC tools. Lerna first creates a language model (LM) of the uncorrected genomic reads; then, it calculates the perplexity metric to evaluate the corrected reads for different parameter choices. Next, it finds the choice that produces the highest alignment rate, without using a reference genome. The fundamental intuition of our approach is that the perplexity metric is inversely correlated with the quality of the assembly after error correction. Results: First, we show that the best k-mer value can vary for different datasets, even for the same EC tool. Second, we show the gains of our LM using its component attention-based transformers. We show the model's estimation of the perplexity metric before and after error correction. The lower the perplexity after correction, the better the k-mer size. We also show that the alignment rate and assembly quality computed for the corrected reads are strongly negatively correlated with the perplexity, enabling the automated selection of k-mer values for better error correction and, hence, improved assembly quality. Additionally, we show that our attention-based models yield a significant runtime improvement for the entire pipeline -- 18x faster than previous works, due to parallelizing the attention mechanism and the use of JIT compilation for GPU inferencing.
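The selection rule reduces to "pick the k with the lowest post-correction perplexity." A minimal sketch, where `run_ec_and_score(k)` is a hypothetical hook that runs the EC tool with k-mer size k and returns the LM's per-token log-probabilities (base e) of the corrected reads:

```python
import math

def perplexity(logprobs):
    """Perplexity of a read set from per-token log-probabilities."""
    return math.exp(-sum(logprobs) / len(logprobs))

def select_k(candidate_ks, run_ec_and_score):
    """Pick the k-mer size whose corrected reads have the lowest perplexity,
    which the paper finds to correlate with the highest alignment rate --
    no reference genome required."""
    return min(candidate_ks, key=lambda k: perplexity(run_ec_and_score(k)))
```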
GAN | Adversarial | Attack | Generation (7 papers)
【1】 GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
Link: https://arxiv.org/abs/2112.10741
Authors: Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, Mark Chen
Note: 20 pages, 18 figures
Abstract: Diffusion models have recently been shown to generate high-quality synthetic images, especially when paired with a guidance technique to trade off diversity for fidelity. We explore diffusion models for the problem of text-conditional image synthesis and compare two different guidance strategies: CLIP guidance and classifier-free guidance. We find that the latter is preferred by human evaluators for both photorealism and caption similarity, and often produces photorealistic samples. Samples from a 3.5 billion parameter text-conditional diffusion model using classifier-free guidance are favored by human evaluators over those from DALL-E, even when the latter uses expensive CLIP reranking. Additionally, we find that our models can be fine-tuned to perform image inpainting, enabling powerful text-driven image editing. We train a smaller model on a filtered dataset and release the code and weights at https://github.com/openai/glide-text2im.
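Classifier-free guidance, one of the two strategies compared, is standard: with a conditional noise prediction $\epsilon_\theta(x_t \mid c)$ and an unconditional one $\epsilon_\theta(x_t \mid \varnothing)$, sampling uses

```latex
\hat{\epsilon}_\theta(x_t \mid c) \;=\; \epsilon_\theta(x_t \mid \varnothing)
\;+\; s \cdot \big(\epsilon_\theta(x_t \mid c) - \epsilon_\theta(x_t \mid \varnothing)\big),
```

where the guidance scale $s > 1$ trades diversity for fidelity.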
【2】 CGAN-EB: A Non-parametric Empirical Bayes Method for Crash Hotspot Identification Using Conditional Generative Adversarial Networks: A Real-world Crash Data Study
Link: https://arxiv.org/abs/2112.10588
Authors: Mohammad Zarei, Bruce Hellinga, Pedram Izadpanah
Affiliation: Department of Civil and Environmental Engineering, University of Waterloo, Waterloo, ON
Note: arXiv admin note: text overlap with arXiv:2112.06925
Abstract: The empirical Bayes (EB) method based on parametric statistical models, such as the negative binomial (NB), has been widely used for ranking sites in the road network safety screening process. This paper is a continuation of the authors' previous research, in which a novel non-parametric EB method for modelling crash frequency data based on Conditional Generative Adversarial Networks (CGAN) was proposed and evaluated over several simulated crash data sets. Unlike parametric approaches, there is no need for a pre-specified underlying relationship between dependent and independent variables in the proposed CGAN-EB, and it is able to model any type of distribution. The proposed methodology is now applied to a real-world data set collected for road segments from 2012 to 2017 in Washington State. The performance of CGAN-EB in terms of model fit, predictive performance, and network screening outcomes is compared with the conventional approach (NB-EB) as a benchmark. The results indicate that the proposed CGAN-EB approach outperforms NB-EB in terms of predictive power and hotspot identification tests.
【3】 NFTGAN: Non-Fungible Token Art Generation Using Generative Adversarial Networks
Link: https://arxiv.org/abs/2112.10577
Authors: Sakib Shahriar, Kadhim Hayawi
Affiliation: College of Technological Innovation, Zayed University, Abu Dhabi, United Arab Emirates
Abstract: Digital arts have gained an unprecedented level of popularity with the emergence of non-fungible tokens (NFTs). NFTs are cryptographic assets that are stored on blockchain networks and represent a digital certificate of ownership that cannot be forged. NFTs can be incorporated into a smart contract, which allows the owner to benefit from a percentage of future sales. While digital art producers can benefit immensely from NFTs, their production is time consuming. Therefore, this paper explores the possibility of using generative adversarial networks (GANs) for the automatic generation of digital arts. GANs are deep learning architectures that are widely and effectively used for the synthesis of audio, image, and video content. However, their application to NFT art has been limited. In this paper, a GAN-based architecture is implemented and evaluated for digital art generation. Results from the qualitative case study indicate that the generated artworks are comparable to real samples.
【4】 Certified Federated Adversarial Training
Link: https://arxiv.org/abs/2112.10525
Authors: Giulio Zizzo, Ambrish Rawat, Mathieu Sinn, Sergio Maffeis, Chris Hankin
Affiliations: IBM Research Europe; Imperial College London
Note: First presented at the 1st NeurIPS Workshop on New Frontiers in Federated Learning (NFFL 2021)
Abstract: In federated learning (FL), robust aggregation schemes have been developed to protect against malicious clients. Many robust aggregation schemes rely on certain numbers of benign clients being present in a quorum of workers. This can be hard to guarantee when clients can join at will, or join based on factors such as idle system status and being connected to power and WiFi. We tackle the scenario of securing FL systems conducting adversarial training when a quorum of workers could be completely malicious. We model an attacker who poisons the model to insert a weakness into the adversarial training, such that the model displays apparent adversarial robustness, while the attacker can exploit the inserted weakness to bypass the adversarial training and force the model to misclassify adversarial examples. We use abstract interpretation techniques to detect such stealthy attacks and block the corrupted model updates. We show that this defence can preserve adversarial robustness even against an adaptive attacker.
【5】 Safe multi-agent deep reinforcement learning for joint bidding and maintenance scheduling of generation units
Link: https://arxiv.org/abs/2112.10459
Authors: Pegah Rokhforoz, Olga Fink
Abstract: This paper proposes a safe reinforcement learning algorithm for generation bidding decisions and unit maintenance scheduling in a competitive electricity market environment. In this problem, each unit aims to find a bidding strategy that maximizes its revenue while concurrently retaining its reliability by scheduling preventive maintenance. The maintenance scheduling provides safety constraints which should be satisfied at all times. Satisfying the critical safety and reliability constraints while the generation units have incomplete information about each other's bidding strategies is a challenging problem. Bi-level optimization and reinforcement learning are state-of-the-art approaches for solving this type of problem. However, neither bi-level optimization nor reinforcement learning can handle the challenges of incomplete information and critical safety constraints. To tackle these challenges, we propose a safe deep deterministic policy gradient reinforcement learning algorithm, which is based on a combination of reinforcement learning and a predicted safety filter. The case study demonstrates that the proposed approach can achieve a higher profit compared to other state-of-the-art methods while concurrently satisfying the system safety constraints.
【6】 Investigation of Densely Connected Convolutional Networks with Domain Adversarial Learning for Noise Robust Speech Recognition
Link: https://arxiv.org/abs/2112.10108
Authors: Chia Yu Li, Ngoc Thang Vu
Affiliation: Institute for Natural Language Processing, University of Stuttgart, Germany
Note: 7 pages, 5 figures, The 30th Conference on Electronic Speech Signal Processing (ESSV 2019)
Abstract: We investigate densely connected convolutional networks (DenseNets) and their extension with domain adversarial training for noise-robust speech recognition. DenseNets are very deep, compact convolutional neural networks which have demonstrated incredible improvements over state-of-the-art results in computer vision. Our experimental results reveal that DenseNets are more robust against noise than other neural-network-based models, such as deep feed-forward neural networks and convolutional neural networks. Moreover, domain adversarial learning can further improve the robustness of DenseNets against both known and unknown noise conditions.
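Domain adversarial training of this kind typically hinges on a gradient reversal layer (DANN-style); a minimal PyTorch sketch of that component follows. The DenseNet feature extractor and the noise-condition (domain) classifier are placeholders, not this paper's exact setup.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; multiplies the gradient by -lambda on
    the backward pass, so the features are trained to *fool* the domain
    (noise-condition) classifier."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Hypothetical usage in the training graph:
#   features = densenet(acoustic_frames)
#   senone_logits = asr_head(features)                 # main ASR objective
#   domain_logits = domain_head(grad_reverse(features))  # adversarial branch
```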
【7】 Managing dataset shift by adversarial validation for credit scoring
Link: https://arxiv.org/abs/2112.10078
Authors: Hongyi Qian, Baohui Wang, Ping Ma, Lei Peng, Songfeng Gao, You Song
Affiliations: School of Computer Science and Engineering, Beihang University, Beijing, PR China; School of Software, Beihang University, Beijing, PR China; HuaRong RongTong (Beijing) Technology Co., Ltd, Beijing, PR China
Abstract: Dataset shift is common in credit scoring scenarios, and the inconsistency between the distribution of the training data and the data that actually needs to be predicted is likely to cause poor model performance. However, most current studies do not take this into account; they directly mix data from different time periods when training the models. This brings about two problems. Firstly, there is a risk of data leakage, i.e., using future data to predict the past. This can result in inflated results in offline validation, but unsatisfactory results in practical applications. Secondly, the macroeconomic environment and risk control strategies are likely to differ across time periods, and the behavior patterns of borrowers may also change. A model trained on past data may not be applicable to the recent period. Therefore, we propose a method based on adversarial validation to alleviate the dataset shift problem in credit scoring scenarios. In this method, the training set samples whose distribution is closest to the predicted data are selected by adversarial validation for cross-validation, to ensure the generalization performance of the trained model on the predicted samples. In addition, through a simple splicing method, samples in the training data that are inconsistent with the test data distribution also participate in the cross-validation training process, which makes full use of all the data and further improves model performance. To verify the effectiveness of the proposed method, comparative experiments with several other data-splitting methods are conducted on the data provided by Lending Club. The experimental results demonstrate the importance of dataset shift in the field of credit scoring and the superiority of the proposed method.
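A minimal sketch of the adversarial-validation selection step (the classifier choice and the kept fraction are assumptions, and the paper's splicing step for the remaining samples is omitted):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def adversarial_validation_select(X_train, X_test, keep_frac=0.7):
    """Select the training rows that look most like the test distribution."""
    X = np.vstack([X_train, X_test])
    y = np.r_[np.zeros(len(X_train)), np.ones(len(X_test))]  # 1 = "test-like"
    clf = GradientBoostingClassifier().fit(X, y)
    p_test_like = clf.predict_proba(X_train)[:, 1]
    # Keep the rows with the highest probability of belonging to the test set.
    return np.argsort(-p_test_like)[: int(keep_frac * len(X_train))]

# idx = adversarial_validation_select(X_train, X_test)
# model.fit(X_train[idx], y_train[idx])   # cross-validate on the kept rows
```

Note that if the adversarial classifier cannot beat chance, the two distributions are already indistinguishable and no selection is needed.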
Semi-/weakly-/un-/fully-supervised | Uncertainty | Active learning (7 papers)
【1】 RvS: What is Essential for Offline RL via Supervised Learning?
Link: https://arxiv.org/abs/2112.10751
Authors: Scott Emmons, Benjamin Eysenbach, Ilya Kostrikov, Sergey Levine
Affiliations: UC Berkeley; Carnegie Mellon University
Abstract: Recent work has shown that supervised learning alone, without temporal difference (TD) learning, can be remarkably effective for offline RL. When does this hold true, and which algorithmic components are necessary? Through extensive experiments, we boil supervised learning for offline RL down to its essential elements. In every environment suite we consider, simply maximizing likelihood with a two-layer feedforward MLP is competitive with state-of-the-art results of substantially more complex methods based on TD learning or sequence modeling with Transformers. Carefully choosing model capacity (e.g., via regularization or architecture) and choosing which information to condition on (e.g., goals or rewards) are critical for performance. These insights serve as a field guide for practitioners doing Reinforcement Learning via Supervised Learning (which we coin "RvS learning"). They also probe the limits of existing RvS methods, which are comparatively weak on random data, and suggest a number of open problems.
【2】 Denoised Labels for Financial Time-Series Data via Self-Supervised Learning
Link: https://arxiv.org/abs/2112.10139
Authors: Yanqing Ma, Carmine Ventre, Maria Polukarov
Affiliation: King's College London, London, United Kingdom
Abstract: The introduction of electronic trading platforms effectively changed the organisation of traditional systemic trading from quote-driven markets into order-driven markets. Its convenience has led to an exponentially increasing amount of financial data, which is, however, hard to use for predicting future prices, due to the low signal-to-noise ratio and the non-stationarity of financial time series. Simpler classification tasks -- where the goal is to predict the direction of future price movement via supervised learning algorithms -- need sufficiently reliable labels to generalise well. Labelling financial data is, however, less well defined than in other domains: did the price go up because of noise or because of signal? The existing labelling methods have limited countermeasures against noise and limited effects in improving learning algorithms. This work takes inspiration from image classification in trading and from the success of self-supervised learning. We investigate the idea of applying computer vision techniques to financial time series to reduce noise exposure and hence generate correct labels. We look at label generation as the pretext task of a self-supervised learning approach and compare the naive (and noisy) labels, commonly used in the literature, with the labels generated by a denoising autoencoder for the same downstream classification task. Our results show that our denoised labels improve the performance of the downstream learning algorithm, for both small and large datasets. We further show that the signals we obtain can be used to trade effectively with binary strategies. We suggest that, with the proposed techniques, self-supervised learning constitutes a powerful framework for generating "better" financial labels that are useful for studying the underlying patterns of the market.
【3】 Gradient-based Novelty Detection Boosted by Self-supervised Binary Classification
Link: https://arxiv.org/abs/2112.09815
Authors: Jingbo Sun, Li Yang, Jiaxin Zhang, Frank Liu, Mahantesh Halappanavar, Deliang Fan, Yu Cao
Affiliations: Arizona State University; Oak Ridge National Laboratory; Pacific Northwest National Laboratory
Abstract: Novelty detection aims to automatically identify out-of-distribution (OOD) data without any prior knowledge of it. It is a critical step in data monitoring, behavior analysis, and other applications, helping enable continual learning in the field. Conventional methods of OOD detection perform multivariate analysis on an ensemble of data or features, and usually resort to supervision with OOD data to improve the accuracy. In reality, such supervision is impractical, as one cannot anticipate the anomalous data. In this paper, we propose a novel, self-supervised approach that does not rely on any predefined OOD data: (1) the new method evaluates the Mahalanobis distance of the gradients between the in-distribution and OOD data; (2) it is assisted by a self-supervised binary classifier to guide the label selection used to generate the gradients, and to maximize the Mahalanobis distance. In evaluations on multiple datasets, such as CIFAR-10, CIFAR-100, SVHN, and TinyImageNet, the proposed approach consistently outperforms state-of-the-art supervised and unsupervised methods in the area under the receiver operating characteristic (AUROC) and area under the precision-recall curve (AUPR) metrics. We further demonstrate that this detector is able to accurately learn one OOD class in continual learning.
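The first ingredient, a Mahalanobis score over gradient vectors, can be sketched directly. Gradient extraction and the self-supervised label-selection classifier are assumed to be provided elsewhere; the regularizer on the covariance is an implementation assumption.

```python
import numpy as np

def fit_mahalanobis(grads_in):
    """Fit mean/covariance of in-distribution gradient vectors, shape (n, d)."""
    mu = grads_in.mean(axis=0)
    cov = np.cov(grads_in, rowvar=False) + 1e-6 * np.eye(grads_in.shape[1])
    return mu, np.linalg.inv(cov)

def mahalanobis_score(grad, mu, cov_inv):
    d = grad - mu
    return float(np.sqrt(d @ cov_inv @ d))  # large distance => likely OOD

# grads_in: gradients of the loss w.r.t. (a subset of) model parameters,
# computed with labels chosen by the auxiliary self-supervised classifier.
```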
【4】 A data-centric weak supervised learning for highway traffic incident detection
Link: https://arxiv.org/abs/2112.09792
Authors: Yixuan Sun, Tanwi Mallick, Prasanna Balaprakash, Jane Macfarlane
Affiliations: School of Mechanical Engineering, Purdue University, West Lafayette, IN; Mathematics and Computer Science Division & Argonne Leadership Computing Facility, Argonne National Laboratory, Lemont, IL; Sustainable Energy Systems Group
Abstract: Using data from loop detector sensors for near-real-time detection of traffic incidents on highways is crucial to averting major traffic congestion. While recent supervised machine learning methods offer solutions to incident detection by leveraging human-labeled incident data, the false alarm rate is often too high for use in practice. Specifically, inconsistency in the human labeling of incidents significantly affects the performance of supervised learning models. To that end, we focus on a data-centric approach to improve the accuracy and reduce the false alarm rate of traffic incident detection on highways. We develop a weak supervised learning workflow to generate high-quality training labels for the incident data without ground truth labels, and we use those generated labels in a supervised learning setup for final detection. This approach comprises three stages. First, we introduce a data preprocessing and curation pipeline that processes traffic sensor data to generate high-quality training data by leveraging labeling functions, which can be domain-knowledge-related or simple heuristic rules. Second, we evaluate the training data generated by weak supervision using three supervised learning models -- random forest, k-nearest neighbors, and a support vector machine ensemble -- and long short-term memory classifiers. The results show that the accuracy of all of the models improves significantly after using the training data generated by weak supervision. Third, we develop an online real-time incident detection approach that leverages the model ensemble and uncertainty quantification while detecting incidents. Overall, we show that our proposed weak supervised learning workflow achieves a high incident detection rate (0.90) and a low false alarm rate (0.08).
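Stage one leans on labeling functions in the usual weak-supervision style. A minimal sketch with made-up feature names and thresholds follows; the paper's actual rules and its label aggregation may differ, and a simple majority vote stands in here.

```python
ABSTAIN, NORMAL, INCIDENT = -1, 0, 1

def lf_speed_drop(row):        # heuristic rule on loop-detector features
    return INCIDENT if row["speed_drop_mph"] > 20 else ABSTAIN

def lf_high_occupancy(row):
    return INCIDENT if row["occupancy"] > 0.35 else ABSTAIN

def lf_free_flow(row):
    return NORMAL if row["speed_mph"] > 60 else ABSTAIN

def weak_label(row, lfs=(lf_speed_drop, lf_high_occupancy, lf_free_flow)):
    """Majority vote over the non-abstaining labeling functions."""
    votes = [lf(row) for lf in lfs if lf(row) != ABSTAIN]
    return max(set(votes), key=votes.count) if votes else ABSTAIN

print(weak_label({"speed_drop_mph": 25, "occupancy": 0.4, "speed_mph": 30}))  # 1
```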
【5】 Can uncertainty boost the reliability of AI-based diagnostic methods in digital pathology?
Link: https://arxiv.org/abs/2112.09693
Authors: Milda Pocevičiūtė, Gabriel Eilertsen, Sofia Jarkman, Claes Lundström
Affiliations: Department of Science and Technology, Linköping University, Sweden; Center for Medical Image Science and Visualization, Linköping University, Sweden; Department of Clinical Pathology, and Department of Biomedical and Clinical Sciences
Abstract: Deep learning (DL) has shown great potential in digital pathology applications. The robustness of a diagnostic DL-based solution is essential for safe clinical deployment. In this work we evaluate whether adding uncertainty estimates for DL predictions in digital pathology could result in increased value for clinical applications, by boosting the general predictive performance or by detecting mispredictions. We compare the effectiveness of model-integrated methods (MC dropout and deep ensembles) with a model-agnostic approach (test time augmentation, TTA). Moreover, four uncertainty metrics are compared. Our experiments focus on two domain shift scenarios: a shift to a different medical center and to an underrepresented subtype of cancer. Our results show that uncertainty estimates can add some reliability and reduce sensitivity to classification threshold selection. While advanced metrics and deep ensembles perform best in our comparison, the added value over simpler metrics and TTA is small. Importantly, the benefit of all evaluated uncertainty estimation methods is diminished by domain shift.
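Of the compared methods, MC dropout is the simplest to sketch: keep dropout active at inference and aggregate several stochastic passes. The entropy of the mean is one of several possible uncertainty metrics; the sample count is an assumption.

```python
import torch

def enable_dropout(model):
    """Switch only Dropout layers to train mode so they stay stochastic,
    while batch norm and the rest of the model remain in eval mode."""
    for m in model.modules():
        if m.__class__.__name__.startswith("Dropout"):
            m.train()

@torch.no_grad()
def mc_dropout_predict(model, x, n_samples=20):
    model.eval()
    enable_dropout(model)
    probs = torch.stack([model(x).softmax(dim=-1) for _ in range(n_samples)])
    mean = probs.mean(dim=0)                                  # predictive mean
    entropy = -(mean * mean.clamp_min(1e-12).log()).sum(dim=-1)
    return mean, entropy                 # entropy as the uncertainty metric
```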
【6】 QU-BraTS: MICCAI BraTS 2020 Challenge on Quantifying Uncertainty in Brain Tumor Segmentation -- Analysis of Ranking Metrics and Benchmarking Results 标题:QU-BraTS:MICCAI BraTS 2020脑肿瘤分割量化不确定性挑战--排名指标和基准结果分析 链接:https://arxiv.org/abs/2112.10074
作者:Raghav Mehta,Angelos Filos,Ujjwal Baid,Chiharu Sako,Richard McKinley,Michael Rebsamen,Katrin Dätwyler,Raphael Meier,Piotr Radojewski,Gowtham Krishnan Murugesan,Sahil Nalawade,Chandan Ganesh,Ben Wagner,Fang F. Yu,Baowei Fei,Ananth J. Madhuranthakam,Joseph A. Maldjian,Laura Daza,Catalina Gómez,Pablo Arbeláez,Chengliang Dai,Shuo Wang,Hadrien Raynaud,Yuanhan Mo,Elsa Angelini,Yike Guo,Wenjia Bai,Subhashis Banerjee,Linmin Pei,Murat AK,Sarahi Rosas-González,Illyess Zemmoura,Clovis Tauber,Minh H. Vu,Tufve Nyholm,Tommy Löfstedt,Laura Mora Ballestar,Veronica Vilaplana,Hugh McHugh,Gonzalo Maso Talou,Alan Wang,Jay Patel,Ken Chang,Katharina Hoebel,Mishka Gidwani,Nishanth Arun,Sharut Gupta,Mehak Aggarwal,Praveer Singh,Elizabeth R. Gerstner,Jayashree Kalpathy-Cramer,Nicolas Boutry,Alexis Huard,Lasitha Vidyaratne,Md Monibor Rahman,Khan M. Iftekharuddin,Joseph Chazalon,Elodie Puybareau,Guillaume Tochon,Jun Ma,Mariano Cabezas,Xavier Llado,Arnau Oliver,Liliana Valencia,Sergi Valverde,Mehdi Amian,Mohammadreza Soltaninejad,Andriy Myronenko,Ali Hatamizadeh,Xue Feng,Quan Dou,Nicholas Tustison,Craig Meyer,Nisarg A. Shah,Sanjay Talbar,Marc-Andr Weber,Abhishek Mahajan,Andras Jakab,Roland Wiest,Hassan M. Fathallah-Shaykh,Arash Nazeri,Mikhail Milchenko,Daniel Marcus,Aikaterini Kotrotsou,Rivka Colen,John Freymann,Justin Kirby,Christos Davatzikos,Bjoern Menze,Spyridon Bakas,Yarin Gal,Tal Arbel 机构:Centre for Intelligent Machines (CIM), McGill University, Montreal, QC, Canada,Oxford Applied and Theo-, retical Machine Learning (OATML) Group, University of Oxford, Oxford, England,Center for Biomedical Image 备注:Under submission at MELBA journal 摘要:深度学习(DL)模型在各种医学成像基准挑战中提供了最先进的性能,包括脑肿瘤分割(BraTS)挑战。然而,病灶病理学多室分割(如肿瘤和病变亚区)的任务尤其具有挑战性,潜在的错误阻碍了DL模型转化为临床工作流程。以不确定性的形式量化DL模型预测的可靠性,可以对最不确定的区域进行临床审查,从而建立信任并为临床翻译铺平道路。最近,许多不确定性估计方法被引入到DL医学图像分割任务中。制定评估和比较不确定性度量性能的指标将有助于最终用户做出更明智的决策。在本研究中,我们探索和评估了BraTS 2019-2020年不确定性量化任务(QU-BraTS)期间制定的一项指标,旨在评估和排序脑肿瘤多室分割的不确定性估计。该指标(1)奖励在正确断言中产生高置信度的不确定性估计,以及在错误断言中分配低置信度的不确定性估计,以及(2)惩罚导致低置信正确断言百分比较高的不确定性度量。我们进一步对QU BraTS 2020的14个独立参与团队产生的细分不确定性进行了基准测试,所有团队也参与了主要的BraTS细分任务。总的来说,我们的研究结果证实了不确定性估计对分割算法的重要性和互补价值,因此强调了医学图像分析中不确定性量化的必要性。我们的评估代码公开于https://github.com/RagMeh11/QU-BraTS. 摘要:Deep learning (DL) models have provided the state-of-the-art performance in a wide variety of medical imaging benchmarking challenges, including the Brain Tumor Segmentation (BraTS) challenges. However, the task of focal pathology multi-compartment segmentation (e.g., tumor and lesion sub-regions) is particularly challenging, and potential errors hinder the translation of DL models into clinical workflows. Quantifying the reliability of DL model predictions in the form of uncertainties, could enable clinical review of the most uncertain regions, thereby building trust and paving the way towards clinical translation. Recently, a number of uncertainty estimation methods have been introduced for DL medical image segmentation tasks. Developing metrics to evaluate and compare the performance of uncertainty measures will assist the end-user in making more informed decisions. In this study, we explore and evaluate a metric developed during the BraTS 2019-2020 task on uncertainty quantification (QU-BraTS), and designed to assess and rank uncertainty estimates for brain tumor multi-compartment segmentation. 
This metric (1) rewards uncertainty estimates that produce high confidence in correct assertions, and those that assign low confidence levels at incorrect assertions, and (2) penalizes uncertainty measures that lead to a higher percentage of under-confident correct assertions. We further benchmark the segmentation uncertainties generated by 14 independent participating teams of QU-BraTS 2020, all of which also participated in the main BraTS segmentation task. Overall, our findings confirm the importance and complementary value that uncertainty estimates provide to segmentation algorithms, and hence highlight the need for uncertainty quantification in medical image analyses. Our evaluation code is made publicly available at https://github.com/RagMeh11/QU-BraTS.
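为直观说明"奖励对正确断言的高置信、惩罚对正确断言的低置信"这一评估思想,下面给出一段假设性的过滤式Dice草图(非官方实现,官方评估代码见上文GitHub链接):按不确定性阈值过滤体素后再计算Dice,良好的不确定性估计应使留下的体素更可靠:

```python
import numpy as np

def filtered_dice(pred, truth, uncertainty, tau):
    """只保留不确定性不超过 tau 的体素,再计算二值分割的 Dice 系数。"""
    keep = uncertainty <= tau
    p, t = pred[keep], truth[keep]
    inter = np.logical_and(p, t).sum()
    denom = p.sum() + t.sum()
    return 2.0 * inter / denom if denom else 1.0

rng = np.random.default_rng(0)
truth = rng.random(1000) > 0.7
pred = truth ^ (rng.random(1000) > 0.9)       # 人为引入约10%的错误体素
# 假设的不确定性:正确体素的不确定性较低,错误体素的不确定性随机
unc = np.where(pred == truth, 0.5 * rng.random(1000), rng.random(1000))
for tau in (0.25, 0.5, 1.0):
    print(tau, round(filtered_dice(pred, truth, unc, tau), 3))
```

阈值收紧时Dice应当上升——即不确定性估计成功"指认"了错误预测。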
【7】 Supervised Multivariate Learning with Simultaneous Feature Auto-grouping and Dimension Reduction 标题:同时进行特征自动分组和降维的有监督多元学习 链接:https://arxiv.org/abs/2112.09746
作者:Yiyuan She,Jiahui Shen,Chao Zhang 机构:Department of Statistics, Florida State University, Tallahassee, USA., Center for Information Science, Peking University, Beijing, China. 摘要:现代高维方法通常采用“赌稀疏”原则,而在有监督的多元学习中,统计学家可能面临大量非零系数的“密集”问题。本文提出了一种新的聚类降秩学习(CRL)框架,该框架采用两种联合矩阵正则化来自动分组特征,以构造预测因子。CRL比低秩模型更具解释性,并且在变量选择中放松了严格的稀疏性假设。本文提出了新的信息理论极限,揭示了寻求聚类的内在代价,以及多元学习中维度带来的好处。此外,还提出了一种高效的优化算法,该算法在保证收敛的前提下进行子空间学习和聚类。所得到的不动点估计虽然不一定是全局最优的,但在某些正则条件下,其统计精度超过了标准似然设置。此外,提出了一种新的信息准则及其无标度形式,用于聚类和秩选择,在不假设无限样本量的情况下,具有严格的理论支持。大量的仿真和实际数据实验证明了该方法的统计精度和可解释性。 摘要:Modern high-dimensional methods often adopt the ``bet on sparsity'' principle, while in supervised multivariate learning statisticians may face ``dense'' problems with a large number of nonzero coefficients. This paper proposes a novel clustered reduced-rank learning (CRL) framework that imposes two joint matrix regularizations to automatically group the features in constructing predictive factors. CRL is more interpretable than low-rank modeling and relaxes the stringent sparsity assumption in variable selection. In this paper, new information-theoretical limits are presented to reveal the intrinsic cost of seeking for clusters, as well as the blessing from dimensionality in multivariate learning. Moreover, an efficient optimization algorithm is developed, which performs subspace learning and clustering with guaranteed convergence. The obtained fixed-point estimators, though not necessarily globally optimal, enjoy the desired statistical accuracy beyond the standard likelihood setup under some regularity conditions. Moreover, a new kind of information criterion, as well as its scale-free form, is proposed for cluster and rank selection, and has a rigorous theoretical support without assuming an infinite sample size. Extensive simulations and real-data experiments demonstrate the statistical accuracy and interpretability of the proposed method.
迁移|Zero/Few/One-Shot|自适应(5篇)
【1】 Towards a Principled Learning Rate Adaptation for Natural Evolution Strategies 标题:面向自然进化策略的原则性学习率适应 链接:https://arxiv.org/abs/2112.10680
作者:Masahiro Nomura,Isao Ono 机构:Tokyo Institute of Technology 摘要:自然进化策略(NES)是解决黑箱连续优化问题的一个很有前途的框架。NES根据估计的自然梯度优化概率分布的参数,影响性能的关键参数之一是学习率。我们认为,从自然梯度法的观点来看,学习率应该根据自然梯度的估计精度来确定。为此,我们提出了一种新的学习速率自适应机制。提出的机制可以为相对容易优化的问题设置较高的学习率,从而加快搜索速度。另一方面,在难以优化的问题(例如,多峰函数)中,当自然梯度的估计精度似乎较低时,所提出的机制可以设置保守的学习速率,从而导致鲁棒和稳定的搜索。对单峰函数和多峰函数的实验评估表明,该机制可以根据搜索情况正常工作,并且比现有方法(即使用固定学习率)更有效。 摘要:Natural Evolution Strategies (NES) is a promising framework for black-box continuous optimization problems. NES optimizes the parameters of a probability distribution based on the estimated natural gradient, and one of the key parameters affecting the performance is the learning rate. We argue that from the viewpoint of the natural gradient method, the learning rate should be determined according to the estimation accuracy of the natural gradient. To do so, we propose a new learning rate adaptation mechanism for NES. The proposed mechanism makes it possible to set a high learning rate for problems that are relatively easy to optimize, which results in speeding up the search. On the other hand, in problems that are difficult to optimize (e.g., multimodal functions), the proposed mechanism makes it possible to set a conservative learning rate when the estimation accuracy of the natural gradient seems to be low, which results in the robust and stable search. The experimental evaluations on unimodal and multimodal functions demonstrate that the proposed mechanism works properly depending on a search situation and is effective over the existing method, i.e., using the fixed learning rate.
【2】 Rethinking Importance Weighting for Transfer Learning 标题:对迁移学习重要性权重的再思考 链接:https://arxiv.org/abs/2112.10157
作者:Nan Lu,Tianyi Zhang,Tongtong Fang,Takeshi Teshima,Masashi Sugiyama 机构:The University of Tokyo 摘要:监督学习的一个关键假设是训练和测试数据遵循相同的概率分布。然而,这一基本假设在实践中并不总是得到满足,例如,由于环境变化、样本选择偏差、隐私问题或高标签成本。迁移学习(Transfer learning,TL)放松了这一假设,允许我们在分布转移的情况下学习。经典的TL方法通常依赖于重要性加权——基于按重要性(即测试与训练的密度比)加权的训练损失来训练预测器。然而,随着现实世界中的机器学习任务变得越来越复杂、高维和动态,最近人们探索了新的方法来应对这些挑战。在介绍基于重要性加权的TL的基础上,我们回顾了基于重要性与预测器的联合及动态估计的最新进展。此外,我们还介绍了一种因果机制迁移方法,该方法将因果结构融入TL中。最后,我们讨论了TL研究的未来前景。 摘要:A key assumption in supervised learning is that training and test data follow the same probability distribution. However, this fundamental assumption is not always satisfied in practice, e.g., due to changing environments, sample selection bias, privacy concerns, or high labeling costs. Transfer learning (TL) relaxes this assumption and allows us to learn under distribution shift. Classical TL methods typically rely on importance-weighting -- a predictor is trained based on the training losses weighted according to the importance (i.e., the test-over-training density ratio). However, as real-world machine learning tasks are becoming increasingly complex, high-dimensional, and dynamical, novel approaches are explored to cope with such challenges recently. In this article, after introducing the foundation of TL based on importance-weighting, we review recent advances based on joint and dynamic importance-predictor estimation. Furthermore, we introduce a method of causal mechanism transfer that incorporates causal structure in TL. Finally, we discuss future perspectives of TL research.
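作为对"重要性加权"这一经典思路的直观补充,下面给出一段可运行的Python草图(假设性示例,并非综述原文代码):先用区分训练/测试样本的域分类器按概率比技巧估计密度比,再将其作为样本权重训练预测器:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(500, 2))
X_test = rng.normal(0.5, 1.0, size=(500, 2))            # 人为构造的协变量偏移
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)

# 第一步:训练域分类器,用概率比近似密度比 w(x) ≈ p_test(x)/p_train(x)
domain = LogisticRegression().fit(
    np.vstack([X_train, X_test]),
    np.r_[np.zeros(len(X_train)), np.ones(len(X_test))],
)
p_te = domain.predict_proba(X_train)[:, 1]
w = p_te / (1.0 - p_te)            # 测试与训练的密度比(两域样本数相等,无需再校正)

# 第二步:按重要性加权训练损失来训练预测器
clf = LogisticRegression().fit(X_train, y_train, sample_weight=w)
print(clf.coef_)
```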
【3】 Continual Learning with Knowledge Transfer for Sentiment Classification 标题:用于情感分类的带知识转移的持续学习 链接:https://arxiv.org/abs/2112.10021
作者:Zixuan Ke,Bing Liu,Hao Wang,Lei Shu 机构: University of Illinois at Chicago, USA, Southwest Jiaotong University, Chengdu, China 备注:None 摘要:本文研究情绪分类(SC)中的连续学习(CL)。在此设置中,CL系统在神经网络中增量学习SC任务序列,其中每个任务构建分类器,以对特定产品类别或领域的评论情绪进行分类。两个自然的问题是:系统能否将过去从以前的任务中学到的知识转移到新任务中,以帮助它为新任务学习更好的模型?而且,以前任务的旧模型也能在过程中得到改进吗?本文提出了一种称为KAN的新技术来实现这些目标。KAN可以通过正向和反向知识转移显著提高新任务和旧任务的SC准确性。通过大量实验证明了KAN的有效性。 摘要:This paper studies continual learning (CL) for sentiment classification (SC). In this setting, the CL system learns a sequence of SC tasks incrementally in a neural network, where each task builds a classifier to classify the sentiment of reviews of a particular product category or domain. Two natural questions are: Can the system transfer the knowledge learned in the past from the previous tasks to the new task to help it learn a better model for the new task? And, can old models for previous tasks be improved in the process as well? This paper proposes a novel technique called KAN to achieve these objectives. KAN can markedly improve the SC accuracy of both the new task and the old tasks via forward and backward knowledge transfer. The effectiveness of KAN is demonstrated through extensive experiments.
【4】 AutoTransfer: Subject Transfer Learning with Censored Representations on Biosignals Data 标题:AutoTransfer:生物信号数据上基于删失表征的被试迁移学习 链接:https://arxiv.org/abs/2112.09796
作者:Niklas Smedemark-Margulies,Ye Wang,Toshiaki Koike-Akino,Deniz Erdogmus 摘要:我们为被试迁移学习提供了一个正则化框架,其中我们训练编码器和分类器以最小化分类损失,同时附加一个衡量潜在表示与被试标签之间独立程度的惩罚项。我们介绍了三种独立性概念及相应的惩罚项,使用互信息或散度作为独立性的代理。对于每个惩罚项,我们提供了若干具体的估计算法,既有解析方法,也有基于神经critic函数的方法。我们提供了一种无需人工干预的策略,用于将这一多样化的正则化算法家族应用于新数据集,我们称之为"AutoTransfer"。我们在EEG、EMG和ECoG数据集上评估了这些单独的正则化策略和我们的AutoTransfer方法的性能,表明这些方法可以改善具有挑战性的真实数据集上的被试迁移学习。 摘要:We provide a regularization framework for subject transfer learning in which we seek to train an encoder and classifier to minimize classification loss, subject to a penalty measuring independence between the latent representation and the subject label. We introduce three notions of independence and corresponding penalty terms using mutual information or divergence as a proxy for independence. For each penalty term, we provide several concrete estimation algorithms, using analytic methods as well as neural critic functions. We provide a hands-off strategy for applying this diverse family of regularization algorithms to a new dataset, which we call "AutoTransfer". We evaluate the performance of these individual regularization strategies and our AutoTransfer method on EEG, EMG, and ECoG datasets, showing that these approaches can improve subject transfer learning for challenging real-world datasets.
【5】 Coded Consensus Monte Carlo: Robust One-Shot Distributed Bayesian Learning with Stragglers 标题:编码共识蒙特卡罗:带掉队的鲁棒单次分布式贝叶斯学习 链接:https://arxiv.org/abs/2112.09794
作者:Hari Hara Suthan Chittoor,Osvaldo Simeone 机构:KCLIP lab, Department of Engineering, King's College London 备注:submitted 摘要:这封信研究了在包含中央服务器和多个工作节点的环境中进行分布式贝叶斯学习的问题,重点是减轻掉队节点的影响。通过提出两种基于分组和编码的抗掉队者(straggler-resilient)解决方案,推广了标准的一次性(即易并行,embarrassingly parallel)贝叶斯学习协议——共识蒙特卡罗(CMC)。所提出的方法称为基于组的CMC(G-CMC)和编码CMC(C-CMC),利用工作节点上的冗余计算,使服务器能够基于工作节点的部分输出估计全局后验样本。模拟结果表明,当工作节点数量较少时C-CMC可能优于G-CMC,而工作节点较多时通常G-CMC更可取。 摘要:This letter studies distributed Bayesian learning in a setting encompassing a central server and multiple workers by focusing on the problem of mitigating the impact of stragglers. The standard one-shot, or embarrassingly parallel, Bayesian learning protocol known as consensus Monte Carlo (CMC) is generalized by proposing two straggler-resilient solutions based on grouping and coding. The proposed methods, referred to as Group-based CMC (G-CMC) and Coded CMC (C-CMC), leverage redundant computing at the workers in order to enable the estimation of global posterior samples at the server based on partial outputs from the workers. Simulation results show that C-CMC may outperform G-CMC for a small number of workers, while G-CMC is generally preferable for a larger number of workers.
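下面用一段假设性的Python草图说明经典共识蒙特卡罗在服务器端聚合子后验样本的常见做法(按各工作节点样本协方差的逆加权;仅为示意,并非这封信中G-CMC/C-CMC的实现):

```python
import numpy as np

def cmc_aggregate(worker_samples):
    """worker_samples: 长度为K的列表,每个元素是形状 (S, d) 的子后验样本。"""
    # 每个工作节点的权重取其样本协方差的逆(高斯近似下近似最优)
    weights = [np.linalg.inv(np.cov(s, rowvar=False)) for s in worker_samples]
    norm = np.linalg.inv(sum(weights))
    S = worker_samples[0].shape[0]
    combined = [
        norm @ sum(W @ s[i] for W, s in zip(weights, worker_samples))
        for i in range(S)
    ]
    return np.asarray(combined)  # 形状 (S, d) 的全局后验样本

rng = np.random.default_rng(1)
subposteriors = [rng.normal(0.1 * k, 1.0, size=(200, 2)) for k in range(3)]
print(cmc_aggregate(subposteriors).mean(axis=0))
```

摘要中的两种方案都在此类聚合的基础上引入冗余计算,使服务器在只收到部分工作节点输出时仍能完成估计。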
强化学习(6篇)
【1】 Optimization for Master-UAV-powered Auxiliary-Aerial-IRS-assisted IoT Networks: An Option-based Multi-agent Hierarchical Deep Reinforcement Learning Approach 标题:主-无人机辅助-空中-IRS辅助物联网优化:一种基于选项的多智能体分层深度强化学习方法 链接:https://arxiv.org/abs/2112.10630
作者:Jingren Xu,Xin Kang,Ronghaixiang Zhang,Ying-Chang Liang,Sumei Sun 摘要:本文研究了主无人机(MUAV)驱动的物联网(IoT)网络,其中我们建议使用配备智能反射面(IRS)的可充电辅助无人机(AUAV)来增强MUAV的通信信号,并利用MUAV作为充电电源。在该模型下,我们研究了这些能量有限的无人机的最优协作策略,以最大化物联网的累积吞吐量。根据两个无人机之间是否存在充电,提出了两个优化问题。为了解决这一问题,提出了两种多智能体深度强化学习(DRL)方法,即集中训练多智能体深度确定性策略梯度(CT-MADDPG)和多智能体深度确定性策略选项批评家(MADDPOC)。结果表明,CT-MADDPG可以大大降低对无人机硬件计算能力的要求;所提出的MADDPOC能够支持连续动作域中的低层多智能体协作学习,相比于仅支持单智能体学习和离散动作的现有基于选项的分层DRL具有很大优势。 摘要:This paper investigates a master unmanned aerial vehicle (MUAV)-powered Internet of Things (IoT) network, in which we propose using a rechargeable auxiliary UAV (AUAV) equipped with an intelligent reflecting surface (IRS) to enhance the communication signals from the MUAV and also leverage the MUAV as a recharging power source. Under the proposed model, we investigate the optimal collaboration strategy of these energy-limited UAVs to maximize the accumulated throughput of the IoT network. Depending on whether there is charging between the two UAVs, two optimization problems are formulated. To solve them, two multi-agent deep reinforcement learning (DRL) approaches are proposed, which are centralized training multi-agent deep deterministic policy gradient (CT-MADDPG) and multi-agent deep deterministic policy option critic (MADDPOC). It is shown that the CT-MADDPG can greatly reduce the requirement on the computing capability of the UAV hardware, and the proposed MADDPOC is able to support low-level multi-agent cooperative learning in the continuous action domains, which has great advantages over the existing option-based hierarchical DRL that only support single-agent learning and discrete actions.
【2】 Benchmarking Safe Deep Reinforcement Learning in Aquatic Navigation 标题:水上导航中安全深度强化学习的基准测试 链接:https://arxiv.org/abs/2112.10593
作者:Enrico Marchesini,Davide Corsi,Alessandro Farinelli 机构:Department of Computer Science, University of Verona 备注:6 pages, 5 figures, 1 table. Accepted at IROS 2021 摘要:我们提出了一种新的专注于水上导航的安全强化学习基准环境。由于非平稳环境和机器人平台的不确定性,水上导航是一项极具挑战性的任务,因此,通过分析训练网络的行为以避免危险情况(例如碰撞),考虑问题的安全方面是至关重要的。为此,我们考虑了基于价值和基于策略梯度的深度强化学习(DRL),并提出了一种基于交叉的策略,它结合了基于梯度和无梯度的DRL来提高样本效率。此外,我们提出了一种基于区间分析的验证策略,用于检查训练模型在一组期望属性上的行为。我们的结果表明,基于交叉的训练优于先前的DRL方法,而我们的验证允许我们量化违反属性所描述行为的配置数量。至关重要的是,这将作为该应用领域未来研究的基准。 摘要:We propose a novel benchmark environment for Safe Reinforcement Learning focusing on aquatic navigation. Aquatic navigation is an extremely challenging task due to the non-stationary environment and the uncertainties of the robotic platform, hence it is crucial to consider the safety aspect of the problem, by analyzing the behavior of the trained network to avoid dangerous situations (e.g., collisions). To this end, we consider a value-based and policy-gradient Deep Reinforcement Learning (DRL) and we propose a crossover-based strategy that combines gradient-based and gradient-free DRL to improve sample-efficiency. Moreover, we propose a verification strategy based on interval analysis that checks the behavior of the trained models over a set of desired properties. Our results show that the crossover-based training outperforms prior DRL approaches, while our verification allows us to quantify the number of configurations that violate the behaviors that are described by the properties. Crucially, this will serve as a benchmark for future research in this domain of applications.
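关于摘要中"基于区间分析的验证策略",下面给出一段假设性的Python草图,演示如何把输入区间逐层传播过一个小型ReLU网络以界定输出范围(权重为随机示例,并非论文的验证工具):

```python
import numpy as np

def interval_affine(lo, hi, W, b):
    """仿射层的区间算术:按权重的正负部分分别组合上下界。"""
    W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
    return W_pos @ lo + W_neg @ hi + b, W_pos @ hi + W_neg @ lo + b

def interval_forward(lo, hi, layers):
    for i, (W, b) in enumerate(layers):
        lo, hi = interval_affine(lo, hi, W, b)
        if i < len(layers) - 1:
            # ReLU 逐元素单调,可直接作用于区间端点
            lo, hi = np.maximum(lo, 0), np.maximum(hi, 0)
    return lo, hi

rng = np.random.default_rng(0)
net = [(rng.normal(size=(8, 4)), np.zeros(8)), (rng.normal(size=(2, 8)), np.zeros(2))]
lo, hi = interval_forward(np.full(4, -0.1), np.full(4, 0.1), net)
print("输出下界:", lo)
print("输出上界:", hi)
```

若某安全属性要求"动作0的输出恒大于动作1",即可用这些上下界在整个输入区间上一次性判定,或统计违反属性的输入配置数量。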
【3】 Sample-Efficient Reinforcement Learning via Conservative Model-Based Actor-Critic 标题:基于保守模型的行动者-批评者样本有效强化学习 链接:https://arxiv.org/abs/2112.10504
作者:Zhihai Wang,Jie Wang,Qi Zhou,Bin Li,Houqiang Li 机构:CAS Key Laboratory of Technology in GIPAS, University of Science and Technology of China, Institute of Artificial Intelligence, Hefei Comprehensive National Science Center 备注:Accepted to AAAI22 摘要:基于模型的强化学习算法旨在学习环境模型以做出决策,与无模型的算法相比,该算法具有更高的样本效率。基于模型的方法的样本效率取决于模型能否很好地逼近环境。然而,学习一个精确的模型是具有挑战性的,特别是在复杂和嘈杂的环境中。为了解决这个问题,我们提出了保守的基于模型的演员评论家(CMBAC),这是一种新的方法,在不依赖精确学习模型的情况下实现了高样本效率。具体而言,CMBAC从一组不准确的模型中学习Q值函数的多个估计值,并使用底部k估计值的平均值(保守估计值)来优化策略。CMBAC的一个吸引人的特点是,保守的估计有效地鼓励代理避免不可靠的“有希望的行动”——其价值仅在一小部分模型中较高。实验表明,CMBAC在多个具有挑战性的任务上的采样效率明显优于现有的方法,并且该方法在噪声环境下比以前的方法更具鲁棒性。 摘要:Model-based reinforcement learning algorithms, which aim to learn a model of the environment to make decisions, are more sample efficient than their model-free counterparts. The sample efficiency of model-based approaches relies on whether the model can well approximate the environment. However, learning an accurate model is challenging, especially in complex and noisy environments. To tackle this problem, we propose the conservative model-based actor-critic (CMBAC), a novel approach that achieves high sample efficiency without the strong reliance on accurate learned models. Specifically, CMBAC learns multiple estimates of the Q-value function from a set of inaccurate models and uses the average of the bottom-k estimates -- a conservative estimate -- to optimize the policy. An appealing feature of CMBAC is that the conservative estimates effectively encourage the agent to avoid unreliable "promising actions" -- whose values are high in only a small fraction of the models. Experiments demonstrate that CMBAC significantly outperforms state-of-the-art approaches in terms of sample efficiency on several challenging tasks, and the proposed method is more robust than previous methods in noisy environments.
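摘要中"取底部k个估计的平均"这一保守估计可以用几行代码说明(假设性草图:Q值所来自的模型集合与展开细节从略,并非论文实现):

```python
import numpy as np

def conservative_q(q_estimates, k):
    """q_estimates: 来自 M 个(可能不准确的)学习模型对同一 (s, a) 的Q值估计。"""
    bottom_k = np.sort(np.asarray(q_estimates))[:k]
    return bottom_k.mean()  # 保守估计:仅在少数模型中值高的动作会被压低

# "有希望的动作":只有一个模型给出高Q值,保守估计有效抑制了这种高估
print(conservative_q([1.0, 1.2, 0.9, 5.0], k=2))  # -> 0.95
```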
【4】 Exploration-exploitation trade-off for continuous-time episodic reinforcement learning with linear-convex models 标题:基于线性-凸模型的连续时间分段强化学习的探索开发权衡 链接:https://arxiv.org/abs/2112.10264
作者:Lukasz Szpruch,Tanut Treetanthiploet,Yufei Zhang 机构:Alan Turing Institute; Department of Statistics 摘要:我们开发了一个概率框架,用于分析回合制设置下基于模型的强化学习。然后,我们将其应用于研究具有线性动力学但系数未知、目标函数为凸但可能不规则的有限时域随机控制问题。利用概率表示,我们研究了相关成本函数的正则性,并对应用由估计模型参数与真实模型参数导出的最优反馈控制之间的性能差距建立了精确估计。我们确定了该性能差距为二次的条件,改进了最近工作中的线性性能差距[X.Guo,A.Hu和Y.Zhang,arXiv预印本,arXiv:2104.09311,(2021)],这与随机线性二次问题的已知结果相匹配。接下来,我们提出了一种分阶段的学习算法,并展示了如何优化探索-利用权衡,从而在高概率意义和期望意义下实现次线性遗憾。当二次性能差距所需的假设成立时,该算法在$N$个回合上,在一般情况下实现$\mathcal{O}(\sqrt{N}\ln N)$阶的高概率遗憾,在自我探索情况下实现$\mathcal{O}((\ln N)^2)$阶的期望遗憾,与文献中的最佳可能结果相匹配。该分析需要我们为相关的连续时间观测推导新的集中不等式。 摘要:We develop a probabilistic framework for analysing model-based reinforcement learning in the episodic setting. We then apply it to study finite-time horizon stochastic control problems with linear dynamics but unknown coefficients and convex, but possibly irregular, objective function. Using probabilistic representations, we study regularity of the associated cost functions and establish precise estimates for the performance gap between applying optimal feedback control derived from estimated and true model parameters. We identify conditions under which this performance gap is quadratic, improving the linear performance gap in recent work [X. Guo, A. Hu, and Y. Zhang, arXiv preprint, arXiv:2104.09311, (2021)], which matches the results obtained for stochastic linear-quadratic problems. Next, we propose a phase-based learning algorithm for which we show how to optimise exploration-exploitation trade-off and achieve sublinear regrets in high probability and expectation. When assumptions needed for the quadratic performance gap hold, the algorithm achieves an order $\mathcal{O}(\sqrt{N} \ln N)$ high probability regret, in the general case, and an order $\mathcal{O}((\ln N)^2)$ expected regret, in self-exploration case, over $N$ episodes, matching the best possible results from the literature. The analysis requires novel concentration inequalities for correlated continuous-time observations, which we derive.
【5】 Creativity of AI: Automatic Symbolic Option Discovery for Facilitating Deep Reinforcement Learning 标题:人工智能的创造性:促进深度强化学习的自动符号选项发现 链接:https://arxiv.org/abs/2112.09836
作者:Mu Jin,Zhihao Ma,Kebing Jin,Hankz Hankui Zhuo,Chen Chen,Chao Yu 机构: School of Computer Science and Engineering, Sun Yat-Sen University, Huawei Noah’s Ark Lab 摘要:尽管在现实生活中取得了巨大的成功,深度强化学习(DRL)仍然面临着三个关键问题,即数据效率、缺乏可解释性和可转移性。最近的研究表明,将符号知识嵌入DRL在解决这些挑战方面是有希望的。受此启发,我们引入了一个具有符号选项的新型深度强化学习框架。该框架具有一个循环训练过程,该过程能够通过使用从交互轨迹自动学习的行动模型和符号选项进行规划来指导策略的改进。学习的符号选项减轻了专家领域知识的密集需求,并提供了策略的内在解释性。此外,通过使用行动模型进行规划,可以进一步提高可转移性和数据效率。为了验证该框架的有效性,我们分别在Montezuma的复仇和Office World两个领域进行了实验。结果表明,性能相当,提高了数据效率、可解释性和可转移性。 摘要:Despite of achieving great success in real life, Deep Reinforcement Learning (DRL) is still suffering from three critical issues, which are data efficiency, lack of the interpretability and transferability. Recent research shows that embedding symbolic knowledge into DRL is promising in addressing those challenges. Inspired by this, we introduce a novel deep reinforcement learning framework with symbolic options. This framework features a loop training procedure, which enables guiding the improvement of policy by planning with action models and symbolic options learned from interactive trajectories automatically. The learned symbolic options alleviate the dense requirement of expert domain knowledge and provide inherent interpretability of policies. Moreover, the transferability and data efficiency can be further improved by planning with the action models. To validate the effectiveness of this framework, we conduct experiments on two domains, Montezuma's Revenge and Office World, respectively. The results demonstrate the comparable performance, improved data efficiency, interpretability and transferability.
【6】 Quantum Algorithms for Reinforcement Learning with a Generative Model 标题:基于产生式模型的强化学习量子算法 链接:https://arxiv.org/abs/2112.08451
作者:Daochen Wang,Aarthi Sundaram,Robin Kothari,Ashish Kapoor,Martin Roetteler 机构:University of Maryland 备注:None 摘要:强化学习研究代理如何与环境交互以最大化其累积回报。抽象地研究这个问题的一个标准方法是询问一个代理需要从环境中获取多少样本来学习$\gamma$折扣马尔可夫决策过程(MDP)的最优策略。对于这样的MDP,我们设计了近似于最优策略($\pi^*$)、最优值函数($v^*$)和最优$Q$-函数($q^*$)的量子算法,假设这些算法能够以量子叠加的方式访问环境中的样本。只要存在环境模拟器,这种假设是合理的;例如,如果环境是视频游戏或其他程序。我们的量子算法受到值迭代的启发,在近似精度($\epsilon$)和MDP的两个主要参数(有效时间范围($\frac{1}{1-\gamma}$)和动作空间大小($A$))方面,在最佳可能的经典样本复杂度上实现了二次加速。此外,通过证明匹配的量子下界,我们证明了计算$q^*$的量子算法是最优的。 摘要:Reinforcement learning studies how an agent should interact with an environment to maximize its cumulative reward. A standard way to study this question abstractly is to ask how many samples an agent needs from the environment to learn an optimal policy for a $\gamma$-discounted Markov decision process (MDP). For such an MDP, we design quantum algorithms that approximate an optimal policy ($\pi^*$), the optimal value function ($v^*$), and the optimal $Q$-function ($q^*$), assuming the algorithms can access samples from the environment in quantum superposition. This assumption is justified whenever there exists a simulator for the environment; for example, if the environment is a video game or some other program. Our quantum algorithms, inspired by value iteration, achieve quadratic speedups over the best-possible classical sample complexities in the approximation accuracy ($\epsilon$) and two main parameters of the MDP: the effective time horizon ($\frac{1}{1-\gamma}$) and the size of the action space ($A$). Moreover, we show that our quantum algorithm for computing $q^*$ is optimal by proving a matching quantum lower bound.
医学相关(2篇)
【1】 Equilibrated Zeroth-Order Unrolled Deep Networks for Accelerated MRI 标题:用于加速MRI的平衡零阶展开深度网络 链接:https://arxiv.org/abs/2112.09891
作者:Zhuo-Xu Cui,Jing Cheng,Qinyong Zhu,Yuanyuan Liu,Sen Jia,Kankan Zhao,Ziwen Ke,Wenqi Huang,Haifeng Wang,Yanjie Zhu,Dong Liang 机构:SIAT, National Innovation Center for Advanced Medical Devices, Shanghai Jiaotong University, Technical University of Munich, Corresponding author: 备注:11 figures 摘要:最近,模型驱动的深度学习通过将正则化器的一阶信息(即(次)梯度或近端算子)替换为网络模块,将正则化模型的某个迭代算法展开为级联网络,与普通的数据驱动网络相比,该网络模块更易于解释和预测。相反,在理论上,不一定存在这样的函数正则化器,其一阶信息与替换的网络模块匹配,这意味着网络输出可能不被原始正则化模型覆盖。此外,到目前为止,还没有理论能够保证在现实假设下展开网络的全局收敛性和鲁棒性(正则性)。为了弥补这一差距,本文提出了一种安全的网络展开方法。具体来说,针对加速MRI,我们展开了一个零阶算法,其中网络模块表示正则化器本身,因此网络输出仍然可以被正则化模型覆盖。此外,受深度均衡模型的启发,在反向传播之前,我们通过展开迭代网络收敛到一个不动点来保证收敛性。在测量数据含有噪声的情况下,我们证明了该网络对噪声干扰具有鲁棒性。最后,数值实验表明,该网络的性能始终优于最新的MRI重建方法,包括传统的正则化方法和其他深度学习方法。 摘要:Recently, model-driven deep learning unrolls a certain iterative algorithm of a regularization model into a cascade network by replacing the first-order information (i.e., (sub)gradient or proximal operator) of the regularizer with a network module, which appears more explainable and predictable compared to common data-driven networks. Conversely, in theory, there is not necessarily such a functional regularizer whose first-order information matches the replaced network module, which means the network output may not be covered by the original regularization model. Moreover, up to now, there is also no theory to guarantee the global convergence and robustness (regularity) of unrolled networks under realistic assumptions. To bridge this gap, this paper propose to present a safeguarded methodology on network unrolling. Specifically, focusing on accelerated MRI, we unroll a zeroth-order algorithm, of which the network module represents the regularizer itself, so that the network output can be still covered by the regularization model. Furthermore, inspired by the ideal of deep equilibrium models, before backpropagating, we carry out the unrolled iterative network to converge to a fixed point to ensure the convergence. In case the measurement data contains noise, we prove that the proposed network is robust against noisy interference. Finally, numerical experiments show that the proposed network consistently outperforms the state-of-the-art MRI reconstruction methods including traditional regularization methods and other deep learning methods.
【2】 Cross-Domain Federated Learning in Medical Imaging 标题:医学影像学中的跨域联合学习 链接:https://arxiv.org/abs/2112.10001
作者:Vishwa S Parekh,Shuhao Lai,Vladimir Braverman,Jeff Leal,Steven Rowe,Jay J Pillai,Michael A Jacobs 机构:Department of Computer Science, The Johns Hopkins University, Baltimore, MD; The Russell H. Morgan Department of Radiology and Radiological Sciences, The Johns Hopkins University School of Medicine, Baltimore, MD, USA 备注:Under Review for MIDL 2022 摘要:在医学成像领域,联邦学习正日益得到探索,以在分布于不同数据中心的大规模数据集上训练深度学习模型,同时避免传输敏感患者信息,从而保护隐私。在这篇手稿中,我们探讨了多领域、多任务环境中的联邦学习,其中不同的参与节点可能包含来自不同领域的数据集,并经过训练以解决不同的任务。我们评估了跨域联邦学习在两种不同实验环境下的目标检测和分割任务:多模态和多器官。我们在跨域联邦学习框架上的实验结果非常令人鼓舞,器官定位的重叠相似度为0.79,病变分割的重叠相似度为0.65。我们的结果证明了联邦学习在开发多领域、多任务深度学习模型方面的潜力,而无需共享来自不同领域的数据。 摘要:Federated learning is increasingly being explored in the field of medical imaging to train deep learning models on large scale datasets distributed across different data centers while preserving privacy by avoiding the need to transfer sensitive patient information. In this manuscript, we explore federated learning in a multi-domain, multi-task setting wherein different participating nodes may contain datasets sourced from different domains and are trained to solve different tasks. We evaluated cross-domain federated learning for the tasks of object detection and segmentation across two different experimental settings: multi-modal and multi-organ. The result from our experiments on cross-domain federated learning framework were very encouraging with an overlap similarity of 0.79 for organ localization and 0.65 for lesion segmentation. Our results demonstrate the potential of federated learning in developing multi-domain, multi-task deep learning models without sharing data from different domains.
推荐(1篇)
【1】 Context-Based Music Recommendation Algorithm Evaluation 标题:基于上下文的音乐推荐算法评测 链接:https://arxiv.org/abs/2112.10612
作者:Marissa Baxter,Lisa Ha,Kirill Perfiliev,Natalie Sayre 机构: Professor at the University of Technology in Australia 摘要:人工智能(AI)已经非常成功地根据在线用户的数据创建和预测音乐播放列表;这些数据来自用户使用应用程序的体验,例如搜索他们喜欢的歌曲。由于Spotify、Pandora等音乐平台所有者之间的竞争,AI目前有很多技术进步。在这篇论文中,我们在Weka、SKLearn和Orange这3个不同的平台上探索了6种机器学习算法,以及它们各自预测用户是否喜欢歌曲的准确性。探索的算法包括逻辑回归、朴素贝叶斯、序列最小优化(SMO)、多层感知器(神经网络)、最近邻和随机森林。通过分析Spotify API提供的每首歌曲的具体特征[1],Random Forest是预测用户是否喜欢歌曲的最成功的算法,准确率为84%。这比Mungekar使用随机森林技术和略有不同的歌曲特征所得到的82.72%的准确率更高[2]。Mungekar的Random Forest算法中的特征更多地关注艺术家和流行度,而不是歌曲的声学特征。去除流行度因素、将注意力完全放在声学品质上,可以提高推荐的准确性。最后,本文展示了无需任何资金投入也能完成歌曲预测,并由此启发人们思考:在充分的资金支持下还能取得何等惊人的成果。 摘要:Artificial Intelligence (AI ) has been very successful in creating and predicting music playlists for online users based on their data; data received from users experience using the app such as searching the songs they like. There are lots of current technological advancements in AI due to the competition between music platform owners such as Spotify, Pandora, and more. In this paper, 6 machine learning algorithms and their individual accuracy for predicting whether a user will like a song are explored across 3 different platforms including Weka, SKLearn, and Orange. The algorithms explored include Logistic Regression, Naive Bayes, Sequential Minimal Optimization (SMO), Multilayer Perceptron (Neural Network), Nearest Neighbor, and Random Forest. With the analysis of the specific characteristics of each song provided by the Spotify API [1], Random Forest is the most successful algorithm for predicting whether a user will like a song with an accuracy of 84%. This is higher than the accuracy of 82.72% found by Mungekar using the Random Forest technique and slightly different characteristics of a song [2]. The characteristics in Mungekar's Random Forest algorithm focus more on the artist and popularity rather than the sonic features of the songs. Removing the popularity aspect and focusing purely on the sonic qualities improve the accuracy of recommendations. Finally, this paper shows how song prediction can be accomplished without any monetary investments, and thus, inspires an idea of what amazing results can be accomplished with full financial research.
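下面给出一段假设性的Python草图,展示在Spotify API风格的声学特征上训练随机森林预测"用户是否喜欢某首歌"的基本流程(特征名与标签均为随机生成的占位数据,并非论文的数据或代码):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
features = ["danceability", "energy", "valence", "tempo", "acousticness"]
X = rng.random((1000, len(features)))
y = (X[:, 0] + X[:, 2] > 1.0).astype(int)  # 假设的"喜欢"规则,仅用于演示

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("测试集准确率:", model.score(X_te, y_te))
```

与摘要的结论一致,这里只使用声学品质特征、不含流行度或艺术家信息。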
聚类(3篇)
【1】 TECM: Transfer Evidential C-means Clustering 标题:TECM:转移证据C-均值聚类 链接:https://arxiv.org/abs/2112.10152
作者:Lianmeng Jiao,Feng Wang,Zhun-ga Liu,Quan Pan 机构:School of Automation, Northwestern Polytechnical University 摘要:聚类广泛应用于文本分析、自然语言处理、图像分割等数据挖掘领域。作为一种很有前途的聚类算法,证据c-均值(ECM)允许一个对象属于若干类的子集,从而扩展了硬聚类、模糊聚类和可能性聚类,能够对数据提供更深入的洞察。然而,由于它需要估计比其他经典的基于分区的算法多得多的参数,因此它只有在可用数据充足且质量良好的情况下才能很好地工作。为了克服这些缺点,本文通过引入迁移学习策略,提出了一种迁移证据c-均值(TECM)算法。在ECM目标函数的基础上,通过在源域引入重心,得到TECM的目标函数,并采用迭代优化策略求解目标函数。此外,TECM可以适应源域和目标域中簇数量不同的情况。该算法已在合成数据集和真实数据集上得到验证。实验结果表明,与原ECM以及其他具有代表性的多任务或迁移聚类算法相比,TECM是有效的。 摘要:Clustering is widely used in text analysis, natural language processing, image segmentation, and other data mining fields. As a promising clustering algorithm, the evidential c-means (ECM) can provide a deeper insight on the data by allowing an object to belong to several subsets of classes, which extends those of hard, fuzzy, and possibilistic clustering. However, as it needs to estimate much more parameters than the other classical partition-based algorithms, it only works well when the available data is sufficient and of good quality. In order to overcome these shortcomings, this paper proposes a transfer evidential c-means (TECM) algorithm, by introducing the strategy of transfer learning. The objective function of TECM is obtained by introducing barycenters in the source domain on the basis of the objective function of ECM, and the iterative optimization strategy is used to solve the objective function. In addition, the TECM can adapt to situation where the number of clusters in the source domain and the target domain is different. The proposed algorithm has been validated on synthetic and real-world datasets. Experimental results demonstrate the effectiveness of TECM in comparison with the original ECM as well as other representative multitask or transfer clustering algorithms.
【2】 An iterative clustering algorithm for the Contextual Stochastic Block Model with optimality guarantees 标题:具有最优性保证的上下文随机块模型迭代聚类算法 链接:https://arxiv.org/abs/2112.10467
作者:Guillaume Braun,Hemant Tyagi,Christophe Biernacki 摘要:现实世界中的网络通常附带辅助信息,这些信息有助于提高网络分析任务(如群集)的性能。尽管在过去十年中,对网络聚类方法进行了大量的实证和理论研究,但对边信息的附加值以及用于将其最佳地纳入聚类算法的方法的了解相对较少。我们提出了一种新的迭代算法来对带有节点边信息(以协变量的形式)的网络进行聚类,并证明了我们的算法在上下文对称随机块模型下是最优的。与先前提出的方法相比,我们的算法可以应用于一般上下文随机块模型,并且避免了超参数调整。我们在合成数据实验中证实了我们的理论结果,其中我们的算法明显优于其他方法,并且表明它也可以应用于有符号图。最后,我们在实际数据上展示了我们的方法的实际意义。 摘要:Real-world networks often come with side information that can help to improve the performance of network analysis tasks such as clustering. Despite a large number of empirical and theoretical studies conducted on network clustering methods during the past decade, the added value of side information and the methods used to incorporate it optimally in clustering algorithms are relatively less understood. We propose a new iterative algorithm to cluster networks with side information for nodes (in the form of covariates) and show that our algorithm is optimal under the Contextual Symmetric Stochastic Block Model. Our algorithm can be applied to general Contextual Stochastic Block Models and avoids hyperparameter tuning in contrast to previously proposed methods. We confirm our theoretical results on synthetic data experiments where our algorithm significantly outperforms other methods, and show that it can also be applied to signed graphs. Finally we demonstrate the practical interest of our method on real data.
【3】 Model-based Clustering with Missing Not At Random Data 标题:基于模型的缺失非随机数据聚类 链接:https://arxiv.org/abs/2112.10425
作者:Aude Sportisse,Christophe Biernacki,Claire Boyer,Julie Josse,Matthieu Marbac Lourdelle,Gilles Celeux,Fabien Laporte 机构:Inria Sophia Antipolis, Université Côte d’Azur,IA Côte d’Azur, Sophia Antipolis, Inria Lille, Université Lille , CNRS, Lille, Sorbonne Université, Paris, Inria Sophia Antipolis, Montpellier, IDESP, Université Rennes, Ensai, CNRS, CREST -UMR , Rennes 摘要:近几十年来,技术进步使得收集大型数据集成为可能。在这种情况下,基于模型的聚类是一种非常流行、灵活和可解释的方法,用于在定义良好的统计框架中进行数据探索。大型数据集增加的一个讽刺之处是,缺少值的情况更加频繁。然而,传统的方法(如丢弃缺失值的观察值或插补方法)并不用于聚类目的。此外,尽管在实践中经常出现缺失非随机(MNAR)值的情况,但它们很少适用于一般情况,即缺失程度取决于未观测到的数据值,也可能取决于观测到的数据值。本文的目标是提出一种将MNAR数据直接嵌入到基于模型的聚类算法中的新方法。我们引入了一个数据和缺失数据指标联合分布的选择模型。它对应于数据分布的混合模型和缺失数据机制的一般MNAR模型,这可能取决于基础类(未知)和/或缺失变量本身的值。导出了大量有意义的MNAR子模型,并研究了每个子模型的参数可辨识性,这通常是任何MNAR方案的关键问题。估计中考虑了EM和随机EM算法。最后,我们在合成数据上对所提出的子模型进行了实证评估,并在医学登记簿TraumaBase(R)数据集上说明了我们的方法的相关性。 摘要:In recent decades, technological advances have made it possible to collect large data sets. In this context, the model-based clustering is a very popular, flexible and interpretable methodology for data exploration in a well-defined statistical framework. One of the ironies of the increase of large datasets is that missing values are more frequent. However, traditional ways (as discarding observations with missing values or imputation methods) are not designed for the clustering purpose. In addition, they rarely apply to the general case, though frequent in practice, of Missing Not At Random (MNAR) values, i.e. when the missingness depends on the unobserved data values and possibly on the observed data values. The goal of this paper is to propose a novel approach by embedding MNAR data directly within model-based clustering algorithms. We introduce a selection model for the joint distribution of data and missing-data indicator. It corresponds to a mixture model for the data distribution and a general MNAR model for the missing-data mechanism, which may depend on the underlying classes (unknown) and/or the values of the missing variables themselves. A large set of meaningful MNAR sub-models is derived and the identifiability of the parameters is studied for each of the sub-models, which is usually a key issue for any MNAR proposals. The EM and Stochastic EM algorithms are considered for estimation. Finally, we perform empirical evaluations for the proposed submodels on synthetic data and we illustrate the relevance of our method on a medical register, the TraumaBase (R) dataset.
超分辨率|去噪|去模糊|去雾(2篇)
【1】 Heavy-tailed denoising score matching 标题:重尾去噪得分匹配 链接:https://arxiv.org/abs/2112.09788
作者:Jacob Deasy,Nikola Simidjievski,Pietro Liò 机构:Department of Computer Science and Technology, University of Cambridge 摘要:在过去几年中,基于分数的模型研究已经通过高斯去噪分数匹配(DSM)产生了最先进的生成模型。然而,高斯噪声假设有若干高维局限,这促使我们为未来更高维的概率密度估计寻找更切实的路径。我们先概述这一局限性,然后将该理论推广到更广泛的噪声分布族——即广义正态分布。为了从理论上证明这一点,我们放松了(去噪)分数匹配理论中的一个关键假设,证明了几乎处处可微的分布允许与高斯情形相同的目标函数简化。对于噪声向量长度分布,我们证明了在深度学习中普遍存在的高维空间中有利的测度集中现象。在此过程中,我们发现了一个偏斜的噪声向量长度分布,并开发了一个迭代噪声缩放算法,以一致地初始化退火Langevin动力学中的多个噪声级别。在实践方面,我们使用重尾DSM改进了分数估计、可控的采样收敛性,以及对不平衡数据集更平衡的无条件生成性能。 摘要:Score-based model research in the last few years has produced state of the art generative models by employing Gaussian denoising score-matching (DSM). However, the Gaussian noise assumption has several high-dimensional limitations, motivating a more concrete route toward even higher dimension PDF estimation in future. We outline this limitation, before extending the theory to a broader family of noising distributions -- namely, the generalised normal distribution. To theoretically ground this, we relax a key assumption in (denoising) score matching theory, demonstrating that distributions which are differentiable almost everywhere permit the same objective simplification as Gaussians. For noise vector length distributions, we demonstrate favourable concentration of measure in the high-dimensional spaces prevalent in deep learning. In the process, we uncover a skewed noise vector length distribution and develop an iterative noise scaling algorithm to consistently initialise the multiple levels of noise in annealed Langevin dynamics. On the practical side, our use of heavy-tailed DSM leads to improved score estimation, controllable sampling convergence, and more balanced unconditional generative performance for imbalanced datasets.
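下面给出一段假设性的numpy草图,说明把DSM中的高斯噪声换成广义正态噪声时目标如何构造:噪声密度 q(z) ∝ exp(-(|z|/α)^β)(β=2 退化为高斯),其对数密度梯度为 -(β/α^β)·|z|^{β-1}·sign(z);score_fn 为假设的分数网络接口,并非论文代码:

```python
import numpy as np

def gn_noise(size, alpha, beta, rng):
    """广义正态采样:若 V~Gamma(1/beta, 1),则 ±alpha*V**(1/beta) 服从该分布。"""
    v = rng.gamma(1.0 / beta, 1.0, size=size)
    return rng.choice([-1.0, 1.0], size=size) * alpha * v ** (1.0 / beta)

def heavy_tailed_dsm_loss(score_fn, x, alpha, beta, rng):
    z = gn_noise(x.shape, alpha, beta, rng)
    x_noisy = x + z
    # 匹配目标:加噪条件分布的对数密度梯度
    target = -(beta / alpha ** beta) * np.abs(z) ** (beta - 1) * np.sign(z)
    return np.mean((score_fn(x_noisy) - target) ** 2)

# 最小自检:对 x=0 而言,"完美"的分数函数应得到零损失
rng = np.random.default_rng(0)
x = np.zeros(10000)
perfect = lambda y: -1.5 * np.abs(y) ** 0.5 * np.sign(y)  # alpha=1, beta=1.5 时的真实分数
print(heavy_tailed_dsm_loss(perfect, x, alpha=1.0, beta=1.5, rng=rng))  # ≈ 0
```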
【2】 A-ESRGAN: Training Real-World Blind Super-Resolution with Attention U-Net Discriminators 标题:A-ESRGAN:训练带注意U网鉴别器的真实世界盲超分辨率 链接:https://arxiv.org/abs/2112.10046
作者:Zihao Wei,Yidong Huang,Yuang Chen,Chenhao Zheng,Jinnan Gao 机构: Department of Computer Science Engineering, University of Michigan, Ann Arbor, USA, UM-SJTU Joint Institute, Shanghai Jiao Tong University, Shanghai, China 备注:6 pages, 9 figures 摘要:盲图像超分辨率(SR)是CV中的一项长期任务,旨在恢复遭受未知和复杂失真的低分辨率图像。最近的工作主要集中在采用更复杂的退化模型来模拟真实世界的退化。由此产生的模型在感知损失方面取得了突破,并产生了令人信服的感知结果。然而,当前生成性对抗性网络结构带来的限制仍然很明显:平等对待像素会导致忽略图像的结构特征,并导致性能缺陷,如扭曲线和背景过度锐化或模糊。在本文中,我们提出了一种用于盲SR任务的GAN模型A-ESRGAN,该模型具有一个基于注意U网络的多尺度鉴别器,可以与其他生成器无缝集成。据我们所知,这是首次引入注意U网络结构作为GAN的鉴别器来解决盲SR问题。本文还解释了多尺度注意U-Net背后的机制,它给模型带来了性能上的突破。通过与先前工作的对比实验,我们的模型在非参考自然图像质量评估指标上呈现出最先进的性能。我们的消融研究表明,使用我们的鉴别器,基于RRDB的发生器可以在多个尺度上利用图像的结构特征,因此与以前的工作相比,可以产生更逼真的高分辨率图像。 摘要:Blind image super-resolution(SR) is a long-standing task in CV that aims to restore low-resolution images suffering from unknown and complex distortions. Recent work has largely focused on adopting more complicated degradation models to emulate real-world degradations. The resulting models have made breakthroughs in perceptual loss and yield perceptually convincing results. However, the limitation brought by current generative adversarial network structures is still significant: treating pixels equally leads to the ignorance of the image's structural features, and results in performance drawbacks such as twisted lines and background over-sharpening or blurring. In this paper, we present A-ESRGAN, a GAN model for blind SR tasks featuring an attention U-Net based, multi-scale discriminator that can be seamlessly integrated with other generators. To our knowledge, this is the first work to introduce attention U-Net structure as the discriminator of GAN to solve blind SR problems. And the paper also gives an interpretation for the mechanism behind multi-scale attention U-Net that brings performance breakthrough to the model. Through comparison experiments with prior works, our model presents state-of-the-art level performance on the non-reference natural image quality evaluator metric. And our ablation studies have shown that with our discriminator, the RRDB based generator can leverage the structural features of an image in multiple scales, and consequently yields more perceptually realistic high-resolution images compared to prior works.
自动驾驶|车辆|车道检测等(1篇)
【1】 Expression is enough: Improving traffic signal control with advanced traffic state representation 标题:表达就够了:用先进的交通状态表示改善交通信号控制 链接:https://arxiv.org/abs/2112.10107
作者:Liang Zhang,Qiang Wu,Jun Shen,Linyuan Lü,Jianqing Wu,Bo Du 机构:School of Life Sciences, Lanzhou University, Lanzhou, China, Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 备注:9 pages, 6 figures 摘要:近年来,寻找交通状态表示的基本性质比设计复杂的交通信号控制(TSC)算法更为关键。在本文中,我们(1)提出了一种新颖、灵活、直观的方法——先进最大压力法(Advanced-MP),该方法同时考虑运行车辆和排队车辆,以决定是否切换当前相位;(2) 基于Advanced-MP中的有效压力和有效运行车辆,新颖地设计了交通运动表示,即先进交通状态(ATS);(3) 通过将ATS与现有RL方法相结合,开发了基于RL的算法模板Advanced-XLight,并由此生成两个RL算法"Advanced-MPLight"和"Advanced-CoLight"。在多个真实数据集上的综合实验表明:(1)Advanced-MP优于基线方法,且部署高效、可靠;(2) Advanced-MPLight和Advanced-CoLight能够达到新的最先进水平。我们的代码已在Github上发布。 摘要:Recently, finding fundamental properties for traffic state representation is more critical than complex algorithms for traffic signal control (TSC). In this paper, we (1) present a novel, flexible and straightforward method advanced max pressure (Advanced-MP), taking both running and queueing vehicles into consideration to decide whether to change current phase; (2) novelly design the traffic movement representation with the efficient pressure and effective running vehicles from Advanced-MP, namely advanced traffic state (ATS); (3) develop an RL-based algorithm template Advanced-XLight, by combining ATS with current RL approaches and generate two RL algorithms, "Advanced-MPLight" and "Advanced-CoLight". Comprehensive experiments on multiple real-world datasets show that: (1) the Advanced-MP outperforms baseline methods, which is efficient and reliable for deployment; (2) Advanced-MPLight and Advanced-CoLight could achieve new state-of-the-art. Our code is released on Github.
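下面用一段假设性的Python草图说明"同时考虑排队车辆与运行车辆计算相位压力、再决定是否换相"这一思路(数据结构、压力公式与阈值均为本文为演示而虚构,并非论文的精确定义):

```python
def phase_pressure(movements):
    """movements: 该相位放行的各交通运动,元素为 (进口排队数, 进口运行数, 出口排队数)。"""
    return sum(q_in + r_in - q_out for q_in, r_in, q_out in movements)

def choose_phase(current_phase, phases, switch_margin=2.0):
    pressures = {p: phase_pressure(m) for p, m in phases.items()}
    best = max(pressures, key=pressures.get)
    # 仅当最优相位压力超出当前相位一定裕量时才切换,避免频繁换相
    if pressures[best] - pressures[current_phase] > switch_margin:
        return best
    return current_phase

phases = {"NS": [(8, 3, 2), (5, 2, 1)], "EW": [(2, 1, 0), (3, 0, 1)]}
print(choose_phase("EW", phases))  # -> "NS"
```

基于RL的变体则把此类"压力+运行车辆"的状态表示(即摘要中的ATS)作为观测输入策略网络。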
联邦学习|隐私保护|加密(2篇)
【1】 Semi-Decentralized Federated Edge Learning with Data and Device Heterogeneity 标题:数据和设备异构的半分散联合边缘学习 链接:https://arxiv.org/abs/2112.10313
作者:Yuchang Sun,Jiawei Shao,Yuyi Mao,Jessie Hui Wang,Jun Zhang 机构:Department of Electronic and Information Engineering, The Hong Kong Polytechnic University 摘要:联邦边缘学习(FEEL)作为一种隐私保护范式,在网络边缘有效地整合分布式数据,用于训练深度学习模型,已经引起了广泛关注。然而,单个边缘服务器的有限覆盖导致参与的客户端节点数量不足,这可能会影响学习性能。在本文中,我们研究了一种新的FEEL框架,即半去中心化联邦边缘学习(SD-FEEL),其中使用多个边缘服务器来共同协调大量客户端节点。通过利用边缘服务器之间的低延迟通信实现高效的模型共享,SD-FEEL可以包含更多的训练数据,同时与传统的联邦学习相比,享受更低的延迟。我们详细介绍了SD-FEEL的训练算法,包括局部模型更新、簇内和簇间模型聚合三个主要步骤。在非独立同分布(非IID)数据上证明了该算法的收敛性,这也有助于揭示关键参数对训练效率的影响,并提供了实用的设计指南。同时,边缘设备的异质性可能导致SD-FEEL出现掉队者效应(straggler effect)并降低收敛速度。为了解决这一问题,我们提出了一种带有陈旧度感知聚合方案的SD-FEEL异步训练算法,并对其收敛性能进行了分析。仿真结果证明了所提出的SD-FEEL算法的有效性和效率,并证实了我们的分析。 摘要:Federated edge learning (FEEL) has attracted much attention as a privacy-preserving paradigm to effectively incorporate the distributed data at the network edge for training deep learning models. Nevertheless, the limited coverage of a single edge server results in an insufficient number of participated client nodes, which may impair the learning performance. In this paper, we investigate a novel framework of FEEL, namely semi-decentralized federated edge learning (SD-FEEL), where multiple edge servers are employed to collectively coordinate a large number of client nodes. By exploiting the low-latency communication among edge servers for efficient model sharing, SD-FEEL can incorporate more training data, while enjoying much lower latency compared with conventional federated learning. We detail the training algorithm for SD-FEEL with three main steps, including local model update, intra-cluster, and inter-cluster model aggregations. The convergence of this algorithm is proved on non-independent and identically distributed (non-IID) data, which also helps to reveal the effects of key parameters on the training efficiency and provides practical design guidelines. Meanwhile, the heterogeneity of edge devices may cause the straggler effect and deteriorate the convergence speed of SD-FEEL. To resolve this issue, we propose an asynchronous training algorithm with a staleness-aware aggregation scheme for SD-FEEL, of which, the convergence performance is also analyzed. The simulation results demonstrate the effectiveness and efficiency of the proposed algorithms for SD-FEEL and corroborate our analysis.
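摘要中的"局部更新、簇内聚合、簇间聚合"三步流程可以用一段假设性的Python草图勾勒(仅演示两级加权平均的骨架:本地更新用随机扰动占位,样本数与簇划分均为虚构):

```python
import numpy as np

def fedavg(models, weights):
    """按样本数加权平均一组模型参数向量。"""
    w = np.asarray(weights, dtype=float) / np.sum(weights)
    return sum(wi * m for wi, m in zip(w, models))

rng = np.random.default_rng(0)
global_model = np.zeros(4)
clusters = {"edge_A": [30, 50], "edge_B": [20, 40, 10]}  # 每客户端的样本数(假设)

edge_models, edge_sizes = [], []
for clients in clusters.values():
    # 第一步:各客户端本地模型更新(此处用随机扰动代替真实训练)
    local = [global_model + 0.01 * rng.normal(size=4) for _ in clients]
    # 第二步:簇内聚合,由各边缘服务器完成
    edge_models.append(fedavg(local, clients))
    edge_sizes.append(sum(clients))

# 第三步:簇间聚合,利用边缘服务器间的低延迟通信共享模型
global_model = fedavg(edge_models, edge_sizes)
print(global_model)
```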
【2】 Federated Dynamic Sparse Training: Computing Less, Communicating Less, Yet Learning Better 标题:联合动态稀疏训练:计算更少,交流更少,学习更好 链接:https://arxiv.org/abs/2112.09824
作者:Sameer Bibikar,Haris Vikalo,Zhangyang Wang,Xiaohan Chen 机构:Department of Electrical and Computer Engineering, The University of Texas at Austin 摘要:联邦学习(FL)支持将机器学习工作负载从云分发到资源有限的边缘设备。不幸的是,当前的深度网络不仅计算量太大,无法在边缘设备上进行推理和训练,而且对于在带宽受限的网络上进行更新通信来说也太大。在本文中,我们开发、实现并实验验证了一种称为联邦动态稀疏训练(FedDST)的新型FL框架,通过该框架可以部署和训练复杂的神经网络,大大提高了设备计算和网络通信的效率。FedDST的核心是从目标完整网络中提取和训练稀疏子网络的动态过程。在这个方案中,"一举两得":每个客户端对自己的稀疏网络进行有效的训练,而不是完整的模型,并且只有稀疏网络在设备和云之间传输。此外,我们的结果表明,与固定的共享稀疏掩码相比,FL训练期间的动态稀疏性更灵活地适应FL代理中的局部异质性。此外,动态稀疏自然地将"及时自集成"(in-time self-ensembling)效应引入训练动态中,甚至使FL性能优于密集训练。在现实且具有挑战性的非i.i.d.FL设置中,FedDST在我们的实验中始终优于竞争算法:例如,在非IID CIFAR-10上,在相同的上传数据上限下,它比FedAvgM获得了高达10%的准确率优势;即使给FedAvgM 2倍的上传数据上限,精度差距仍保持在3%,进一步证明了FedDST的有效性。代码可从以下网址获取:https://github.com/bibikar/feddst. 摘要:Federated learning (FL) enables distribution of machine learning workloads from the cloud to resource-limited edge devices. Unfortunately, current deep networks remain not only too compute-heavy for inference and training on edge devices, but also too large for communicating updates over bandwidth-constrained networks. In this paper, we develop, implement, and experimentally validate a novel FL framework termed Federated Dynamic Sparse Training (FedDST) by which complex neural networks can be deployed and trained with substantially improved efficiency in both on-device computation and in-network communication. At the core of FedDST is a dynamic process that extracts and trains sparse sub-networks from the target full network. With this scheme, "two birds are killed with one stone:" instead of full models, each client performs efficient training of its own sparse networks, and only sparse networks are transmitted between devices and the cloud. Furthermore, our results reveal that the dynamic sparsity during FL training more flexibly accommodates local heterogeneity in FL agents than the fixed, shared sparse masks. Moreover, dynamic sparsity naturally introduces an "in-time self-ensembling effect" into the training dynamics and improves the FL performance even over dense training. In a realistic and challenging non i.i.d. FL setting, FedDST consistently outperforms competing algorithms in our experiments: for instance, at any fixed upload data cap on non-iid CIFAR-10, it gains an impressive accuracy advantage of 10% over FedAvgM when given the same upload data cap; the accuracy gap remains 3% even when FedAvgM is given 2x the upload data cap, further demonstrating efficacy of FedDST. Code is available at: https://github.com/bibikar/feddst.
推理|分析|理解|解释(11篇)
【1】 Exact Shapley Values for Local and Model-True Explanations of Decision Tree Ensembles 标题:用于决策树集成局部且忠于模型解释的精确Shapley值 链接:https://arxiv.org/abs/2112.10592
作者:Thomas W. Campbell,Heinrich Roder,Robert W. Georgantas III,Joanna Roder 备注:23 pages, 7 figures 摘要:使用Shapley值的加性特征解释已变得流行,因为它可以透明地给出每个特征对机器学习模型单个预测的相对重要性。虽然Shapley值在合作博弈论中给出唯一的加性特征归因,但即使是单个机器学习模型所能生成的Shapley值也远非唯一,理论与实现上的决策都会影响最终的归因。在这里,我们考虑应用Shapley值来解释决策树集成,并提出了一种新的基于Shapley值的特征归因方法,可应用于随机森林和提升决策树。这种新方法给出的归因能够准确反映模型对单个实例进行预测的算法细节,同时在计算上与当前使用最广泛的方法之一具有竞争力。我们解释了标准方法和新方法之间的理论差异,并使用合成数据和真实数据比较了它们的性能。 摘要:Additive feature explanations using Shapley values have become popular for providing transparency into the relative importance of each feature to an individual prediction of a machine learning model. While Shapley values provide a unique additive feature attribution in cooperative game theory, the Shapley values that can be generated for even a single machine learning model are far from unique, with theoretical and implementational decisions affecting the resulting attributions. Here, we consider the application of Shapley values for explaining decision tree ensembles and present a novel approach to Shapley value-based feature attribution that can be applied to random forests and boosted decision trees. This new method provides attributions that accurately reflect details of the model prediction algorithm for individual instances, while being computationally competitive with one of the most widely used current methods. We explain the theoretical differences between the standard and novel approaches and compare their performance using synthetic and real data.
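作为背景补充,下面用一段假设性的Python草图按定义精确计算小规模特征集上的Shapley值(对所有特征排列取边际贡献的平均);其中 value_fn 表示"给定特征子集时的模型预测"这一接口,这里用一个虚构的加性函数代替真实树集成的条件期望,并非论文的算法:

```python
from itertools import permutations

def exact_shapley(features, value_fn):
    """对每个排列累加各特征加入时的边际贡献,再取平均。"""
    phi = {f: 0.0 for f in features}
    perms = list(permutations(features))
    for order in perms:
        included = set()
        for f in order:
            before = value_fn(frozenset(included))
            included.add(f)
            phi[f] += value_fn(frozenset(included)) - before
    return {f: v / len(perms) for f, v in phi.items()}

# 假设的价值函数:v(S) = 子集中特征权重之和(加性情形,Shapley值应恰为各权重)
w = {"x1": 0.5, "x2": 0.3, "x3": 0.2}
print(exact_shapley(list(w), lambda S: sum(w[f] for f in S)))
```

正如摘要所指出的,对真实模型而言,不同的 value_fn 选择(即如何定义"缺失特征下的预测")正是各种Shapley实现产生分歧的地方。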
【2】 Multimodal Adversarially Learned Inference with Factorized Discriminators 标题:基于因子分解鉴别器的多模态对抗性学习推理 链接:https://arxiv.org/abs/2112.10384
作者:Wenxue Chen,Jianke Zhu 机构: Zhejiang University, Alibaba-Zhejiang University Joint Institute of Frontier Technologies 备注:9 pages, 6 figures 摘要:从多模态数据中学习是机器学习的一个重要研究课题,它有可能获得更好的表示。在这项工作中,我们提出了一种基于生成对抗网络的多模态数据生成建模的新方法。为了学习一个一致的(coherent)多模态生成模型,我们证明有必要将不同的编码器分布与联合解码器分布同时对齐。为此,我们构造了一种特定形式的鉴别器,使我们的模型能够有效地利用数据,并且该鉴别器可以进行对比训练。通过对鉴别器进行因子分解来利用对比学习,我们在单模态数据上训练了我们的模型。我们在基准数据集上进行了实验,实验结果表明,我们提出的方法在各种指标上都优于最先进的方法。源代码将公开提供。 摘要:Learning from multimodal data is an important research topic in machine learning, which has the potential to obtain better representations. In this work, we propose a novel approach to generative modeling of multimodal data based on generative adversarial networks. To learn a coherent multimodal generative model, we show that it is necessary to align different encoder distributions with the joint decoder distribution simultaneously. To this end, we construct a specific form of the discriminator to enable our model to utilize data efficiently, which can be trained contrastively. By taking advantage of contrastive learning through factorizing the discriminator, we train our model on unimodal data. We have conducted experiments on the benchmark datasets, whose promising results show that our proposed approach outperforms the state-of-the-art methods on a variety of metrics. The source code will be made publicly available.
【3】 On Causal Inference for Data-free Structured Pruning 标题:关于无数据结构化剪枝的因果推理 链接:https://arxiv.org/abs/2112.10229
作者:Martin Ferianc,Anush Sankaran,Olivier Mastropietro,Ehsan Saboori,Quentin Cappart 机构:Department of Electronic and Electrical Engineering, University College London, London, UK WC,E ,JE, Department of Computer Engineering and Software Engineering, Polytechnique Montr´eal, Montreal, QC, Canada H,T ,J 备注:Accepted to ITCI'22: The AAAI-22 Workshop on Information-Theoretic Methods for Causal Inference and Discovery 摘要:神经网络(NNs)正在对研究和工业产生巨大影响。然而,随着NNs精度的提高,其规模、所需计算操作数和能耗也随之增加。资源消耗的增加导致NNs采用率下降,并使实际部署变得不切实际。因此,需要对NNs进行压缩,使其可供更广泛的受众使用,同时降低其运行时成本。在这项工作中,我们从因果推理的角度来处理这一挑战,并提出了一种评分机制来促进NNs的结构化修剪。该方法基于在最大熵扰动(该扰动在神经网络中逐层顺序传播)下测量互信息。我们在两个数据集和不同规模的神经网络上展示了该方法的性能,并且我们证明了我们的方法在具有挑战性的条件下取得了有竞争力的性能。 摘要:Neural networks (NNs) are making a large impact both on research and industry. Nevertheless, as NNs' accuracy increases, it is followed by an expansion in their size, required number of compute operations and energy consumption. Increase in resource consumption results in NNs' reduced adoption rate and real-world deployment impracticality. Therefore, NNs need to be compressed to make them available to a wider audience and at the same time decrease their runtime costs. In this work, we approach this challenge from a causal inference perspective, and we propose a scoring mechanism to facilitate structured pruning of NNs. The approach is based on measuring mutual information under a maximum entropy perturbation, sequentially propagated through the NN. We demonstrate the method's performance on two datasets and various NNs' sizes, and we show that our approach achieves competitive performance under challenging conditions.
【4】 Data-Driven Reachability analysis and Support set Estimation with Christoffel Functions 标题:基于Christoffel函数的数据驱动可达性分析和支持集估计 链接:https://arxiv.org/abs/2112.09995
作者:Alex Devonport,Forest Yang,Laurent El Ghaoui,Murat Arcak 备注:20 pages, 3 figures. Submitted to the SIAM Journal on Control and Optimization. arXiv admin note: text overlap with arXiv:2104.13902 摘要:我们提出了仅使用有限个独立同分布样本集合估计动力系统前向可达集的算法。产生的估计是一个称为经验逆Christoffel函数的函数的子水平集:已知经验逆Christoffel函数能够很好地近似概率分布的支撑集。除了可达性分析之外,同样的方法也可以应用于估计随机变量支撑集的一般问题,这在数据科学中可用于检测数据集中的新颖点和离群点。在涉及安全的应用中,在有限数据集上成立的准确性保证至关重要。在本文中,我们在可能近似正确(PAC)框架下证明了我们算法的此类界。除了应用经典的Vapnik-Chervonenkis(VC)维界论证外,我们还通过利用核化经验逆Christoffel函数与高斯过程回归模型之间的形式联系来应用PAC-Bayes定理。基于PAC-Bayes的界适用于比VC维论证更一般的Christoffel函数类,并在实验中获得了更高的样本效率。 摘要:We present algorithms for estimating the forward reachable set of a dynamical system using only a finite collection of independent and identically distributed samples. The produced estimate is the sublevel set of a function called an empirical inverse Christoffel function: empirical inverse Christoffel functions are known to provide good approximations to the support of probability distributions. In addition to reachability analysis, the same approach can be applied to general problems of estimating the support of a random variable, which has applications in data science towards detection of novelties and outliers in data sets. In applications where safety is a concern, having a guarantee of accuracy that holds on finite data sets is critical. In this paper, we prove such bounds for our algorithms under the Probably Approximately Correct (PAC) framework. In addition to applying classical Vapnik-Chervonenkis (VC) dimension bound arguments, we apply the PAC-Bayes theorem by leveraging a formal connection between kernelized empirical inverse Christoffel functions and Gaussian process regression models. The bound based on PAC-Bayes applies to a more general class of Christoffel functions than the VC dimension argument, and achieves greater sample efficiency in experiments.
【5】 Does Explainable Machine Learning Uncover the Black Box in Vision Applications? 标题:可解释机器学习能揭开视觉应用中的黑匣子吗? 链接:https://arxiv.org/abs/2112.09898
作者:Manish Narwaria 机构:Department of Electrical Engineering, Indian Institute of Technology Jodhpur 备注:None 摘要:机器学习(ML)特别是深度学习(DL)已经成为一些视觉应用(如目标检测、超分辨率、分割、目标跟踪等)中非常流行的工具。几乎同时,视觉中ML的可解释性问题(即解释/阐述经过训练的ML模型做出决策的方式的能力)也受到了各个方面相当大的关注。然而,我们认为,可解释ML背后的当前哲学存在某些局限性,由此产生的解释可能无法有意义地揭示黑盒ML模型。为了阐述我们的主张,我们首先提出了一些在相应文献中没有充分讨论的基本问题。我们还提供了ML中的可解释性如何通过依赖相关领域中更严格的原则而受益的观点。 摘要:Machine learning (ML) in general and deep learning (DL) in particular has become an extremely popular tool in several vision applications (like object detection, super resolution, segmentation, object tracking etc.). Almost in parallel, the issue of explainability in ML (i.e. the ability to explain/elaborate the way a trained ML model arrived at its decision) in vision has also received fairly significant attention from various quarters. However, we argue that the current philosophy behind explainable ML suffers from certain limitations, and the resulting explanations may not meaningfully uncover black box ML models. To elaborate our assertion, we first raise a few fundamental questions which have not been adequately discussed in the corresponding literature. We also provide perspectives on how explainablity in ML can benefit by relying on more rigorous principles in the related areas.
【6】 Interpretable Data-Based Explanations for Fairness Debugging 标题:公平性调试的可解释的基于数据的解释 链接:https://arxiv.org/abs/2112.09745
作者:Romila Pradhan,Jiongli Zhu,Boris Glavic,Babak Salimi 机构:Purdue University, West Lafayette, IN, USA, University of California, San Diego, La Jolla, CA, USA, Illinois Institute of Technology, Chicago, IL, USA 备注:Proceedings of the 2022 International Conference on Management of Data. ACM, 2022 摘要:文献中提出了各种各样的公平性度量和可解释人工智能(XAI)方法来识别在关键现实环境中使用的机器学习模型中的偏差。然而,仅仅报告模型的偏差,或者使用现有的XAI技术生成解释,不足以定位并最终缓解偏差的来源。在这项工作中,我们介绍了Gopher,这是一个系统,它通过识别作为偏差或意外模型行为根本原因的训练数据的一致子集,为偏差或意外模型行为提供紧凑、可解释和因果解释。具体而言,我们引入了因果责任的概念,该概念量化了通过删除或更新训练数据子集来干预训练数据能够解决偏差的程度。基于这一概念,我们开发了一种生成top-k模式的有效方法,用于解释模型偏差,该方法利用ML社区的技术来近似因果责任,并使用剪枝规则来管理模式的大搜索空间。我们的实验评估证明了Gopher在识别和调试偏差源时产生可解释解释的有效性。 摘要:A wide variety of fairness metrics and eXplainable Artificial Intelligence (XAI) approaches have been proposed in the literature to identify bias in machine learning models that are used in critical real-life contexts. However, merely reporting on a model's bias, or generating explanations using existing XAI techniques is insufficient to locate and eventually mitigate sources of bias. In this work, we introduce Gopher, a system that produces compact, interpretable, and causal explanations for bias or unexpected model behavior by identifying coherent subsets of the training data that are root-causes for this behavior. Specifically, we introduce the concept of causal responsibility that quantifies the extent to which intervening on training data by removing or updating subsets of it can resolve the bias. Building on this concept, we develop an efficient approach for generating the top-k patterns that explain model bias that utilizes techniques from the ML community to approximate causal responsibility and uses pruning rules to manage the large search space for patterns. Our experimental evaluation demonstrates the effectiveness of Gopher in generating interpretable explanations for identifying and debugging sources of bias.
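下面是"因果责任"这一概念的一个高度简化的示意(Python/scikit-learn;偏差度量选用人口统计均等差异,数据与重训练流程均为示例假设,并非Gopher的实际实现):删除训练子集并重训练后偏差的下降量,即作为该子集对偏差的因果责任。

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def bias(model, X, group):
    """人口统计均等差异:两个群体正类预测率之差(一种示意性偏差度量)。"""
    pred = model.predict(X)
    return abs(pred[group == 0].mean() - pred[group == 1].mean())

def causal_responsibility(X, y, subset_idx, X_te, g_te):
    """删除训练子集并重训练后,测试偏差的下降量(简化的因果责任定义)。"""
    base = LogisticRegression(max_iter=1000).fit(X, y)
    keep = np.setdiff1d(np.arange(len(X)), subset_idx)
    ablated = LogisticRegression(max_iter=1000).fit(X[keep], y[keep])
    return bias(base, X_te, g_te) - bias(ablated, X_te, g_te)

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 3)); g = (X[:, 0] > 0).astype(int)
y = (X[:, 1] + 0.8 * g + 0.1 * rng.normal(size=600) > 0.4).astype(int)
# 候选子集:训练集中群体g=1的前50个样本;结果为正说明删除该子集可降低偏差
resp = causal_responsibility(X[:400], y[:400],
                             np.where(g[:400] == 1)[0][:50], X[400:], g[400:])
print(round(float(resp), 3))
```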
【7】 Improving Ethical Outcomes with Machine-in-the-Loop: Broadening Human Understanding of Data Annotations 标题:通过机器在环中改善伦理结果:拓宽人类对数据注释的理解 链接:https://arxiv.org/abs/2112.09738
作者:Ashis Kumer Biswas,Geeta Verma,Justin Otto Barber 机构:Computer Science & Engineering, University of Colorado Denver, Denver, CO; School of Education and Human Development, Denver, CO; Radiology Partners, El Segundo, CA 备注:Accepted and presented at the Human Centered AI workshop at the 35th Conference on Neural Information Processing Systems (NeurIPS), Dec 13th 2021 摘要:我们介绍了一种机器在环管道,旨在解决教育领域基于自然语言的有监督机器学习任务中不必要的偏见的根本原因。从学生的经历中学习,是教育研究者和学术管理者工作的基础。在新的知识经济中,从经验中学到的21世纪技能正在成为大学和职业准备以及招聘过程的核心部分。少数族裔学生在日常生活中展示了这些技能,但记录、评估和验证这些技能对于教育机构来说是一个巨大的问题。作为一个注重公平的在线平台,LivedX将少数族裔学生的生活经历转化为21世纪技能,颁发微证书,并创建个人21世纪技能档案。为了从学生提交的文章文本中自动挖掘微证书,我们采用了一个词袋模型来构造一个多输出分类器(见下方示意)。与我们的目标相反,我们的模型最初反而加剧了对少数族裔学生的差别性影响(disparate impact)。我们使用一个机器在环模型开发管道来解决这个问题,并改进前述模型,以确保其预测的公平性。 摘要:We introduce a machine-in-the-loop pipeline that aims to address root causes of unwanted bias in natural language based supervised machine learning tasks in the education domain. Learning from the experiences of students is foundational for education researchers, and academic administrators. 21st-century skills learned from experience are becoming a core part of college and career readiness as well as the hiring process in the new knowledge economy. Minoritized students demonstrate these skills in their daily lives, but documenting, assessing, and validating these skills is a huge problem for educational institutions. As an equity focused online platform, LivedX translates minoritized students' lived experiences into the 21st century skills, issues micro-credentials, and creates personal 21st century skills portfolio. To automate the micro credential mining from the natural language texts received from the students' submitted essays, we employed a bag-of-word model to construct a multi-output classifier. Despite our goal, our model initially exacerbated disparate impact on minoritized students. We used a machine-in-the-loop model development pipeline to address the problem and refine the aforementioned model to ensure fairness in its prediction.
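摘要中的"词袋模型+多输出分类器"流水线可以用scikit-learn极简地示意如下(文本、标签与技能类别均为虚构示例):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

essays = ["I organized a fundraiser for my community",
          "I taught my siblings math every evening",
          "I repaired bicycles for neighbors"]
# 每篇文章可对应多个21世纪技能标签(标签与数据均为虚构示例)
labels = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1]])  # 列:领导力/教学/动手能力

vec = CountVectorizer()
X = vec.fit_transform(essays)                          # 词袋特征
clf = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X, labels)
print(clf.predict(vec.transform(["I coached a robotics team"])))
```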
【8】 Analysis of the HiSCORE Simulated Events in TAIGA Experiment Using Convolutional Neural Networks 标题:用卷积神经网络分析TAIGA实验中的HiSCORE模拟事件 链接:https://arxiv.org/abs/2112.10170
作者:Anna Vlaskina,Alexander Kryukov 机构:M.V. Lomonosov Moscow State University, Moscow, Russian Federation; Skobeltsyn Institute of Nuclear Physics, Moscow, Russian Federation 备注:In Proceedings of 5th International Workshop on Deep Learning in Computational Physics (DLCP2021), 28-29 June, 2021, Moscow, Russia. 8 pages, 5 figures, 1 table 摘要:TAIGA是一个混合观测站,用于10 TeV到数EeV能段的高能伽马射线天文学。它由诸如TAIGA-IACT、TAIGA-HiSCORE等仪器组成。其中TAIGA-HiSCORE是一组广角定时切伦科夫光站阵列。TAIGA-HiSCORE数据可用于重建大气簇射特征,如簇射能量、到达方向和轴坐标。在本报告中,我们建议考虑卷积神经网络在大气簇射特性确定任务中的应用。我们使用卷积神经网络(CNN)来分析HiSCORE事件,将其视为图像进行处理。为此,使用了HiSCORE各站记录的事件时间和振幅。本文讨论了一种简单的卷积神经网络及其训练方法。此外,我们给出了确定大气簇射参数的一些初步结果,如簇射轴的方向和位置以及初级粒子的能量,并与传统方法得到的结果进行了比较。 摘要:TAIGA is a hybrid observatory for gamma-ray astronomy at high energies in range from 10 TeV to several EeV. It consists of instruments such as TAIGA-IACT, TAIGA-HiSCORE, and others. TAIGA-HiSCORE, in particular, is an array of wide-angle timing Cherenkov light stations. TAIGA-HiSCORE data enable to reconstruct air shower characteristics, such as air shower energy, arrival direction, and axis coordinates. In this report, we propose to consider the use of convolution neural networks in task of air shower characteristics determination. We use Convolutional Neural Networks (CNN) to analyze HiSCORE events, treating them like images. For this, the times and amplitudes of events recorded at HiSCORE stations are used. The work discusses a simple convolutional neural network and its training. In addition, we present some preliminary results on the determination of the parameters of air showers such as the direction and position of the shower axis and the energy of the primary particle and compare them with the results obtained by the traditional method.
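下面用PyTorch给出"把台站时间与振幅当作两通道图像交给CNN"这一思路的极简示意(网格尺寸、网络层数与输出参数个数均为此处的假设,并非原文配置):

```python
import torch
import torch.nn as nn

# 将每个事件表示为台站网格上的两通道"图像":通道0为到达时间,通道1为信号振幅
class ShowerCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.head = nn.Linear(32 * 4 * 4, 4)   # 输出:能量 + 方向(2维) + 轴位置(示意)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

events = torch.randn(8, 2, 16, 16)             # 8个模拟事件,16x16台站网格
print(ShowerCNN()(events).shape)                # torch.Size([8, 4])
```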
【9】 The Preliminary Results on Analysis of TAIGA-IACT Images Using Convolutional Neural Networks 标题:用卷积神经网络对TAIGA-IACT图像进行分析的初步结果 链接:https://arxiv.org/abs/2112.10168
作者:Elizaveta Gres,and Alexander Kryukov 机构:Research Institute of Applied Physics, Irkutsk State University, Irkutsk, Russian Federation; Skobeltsyn Institute of Nuclear Physics, M.V. Lomonosov Moscow State University, Moscow, Russian Federation 备注:In Proceedings of 5th International Workshop on Deep Learning in Computational Physics (DLCP2021), 28-29 June, 2021, Moscow, Russia. 9 pages, 3 figures, 2 tables 摘要:位于布里亚特共和国通卡河谷的成像切伦科夫望远镜TAIGA-IACT在短时间内积累了大量数据,必须对这些数据进行高效、快速的分析。这种分析的方法之一是机器学习,它近年来已在许多技术和科学领域证明了自身的有效性。这项工作的目的是研究将机器学习应用于TAIGA-IACT既定任务的可能性:识别宇宙线初级粒子并重建其物理参数。工作中应用卷积神经网络(CNN)方法处理和分析用CORSIKA模拟的蒙特卡罗事件,并考察了用于处理的多种CNN架构。结果表明,该方法在确定广延大气簇射(EAS)初级粒子类型和重建伽马射线能量方面具有良好的效果。在立体观测的情况下,结果显著改善。 摘要:The imaging Cherenkov telescopes TAIGA-IACT, located in the Tunka valley of the republic Buryatia, accumulate a lot of data in a short period of time which must be efficiently and quickly analyzed. One of the methods of such analysis is the machine learning, which has proven its effectiveness in many technological and scientific fields in recent years. The aim of the work is to study the possibility of the machine learning application to solve the tasks set for TAIGA-IACT: the identification of the primary particle of cosmic rays and reconstruction their physical parameters. In the work the method of Convolutional Neural Networks (CNN) was applied to process and analyze Monte-Carlo events simulated with CORSIKA. Also various CNN architectures for the processing were considered. It has been demonstrated that this method gives good results in the determining the type of primary particles of Extensive Air Shower (EAS) and the reconstruction of gamma-rays energy. The results are significantly improved in the case of stereoscopic observations.
【10】 3D Structural Analysis of the Optic Nerve Head to Robustly Discriminate Between Papilledema and Optic Disc Drusen 标题:视神经头三维结构分析对视乳头水肿和视盘玻璃疣的有力鉴别 链接:https://arxiv.org/abs/2112.09970
作者:Michaël J. A. Girard,Satish K. Panda,Tin Aung Tun,Elisabeth A. Wibroe,Raymond P. Najjar,Aung Tin,Alexandre H. Thiéry,Steffen Hamann,Clare Fraser,Dan Milea 机构:Ophthalmic Engineering & Innovation Laboratory, Singapore Eye Research Institute, Singapore; Duke-NUS Graduate Medical School, Singapore; Institute for Molecular and Clinical Ophthalmology, Basel, Switzerland 摘要:目的:(1)开发一种在三维光学相干断层扫描(OCT)中识别视神经头(ONH)主要组织结构的深度学习算法;(2)利用这些信息稳健地区分健康、视盘玻璃疣(ODD)和视乳头水肿的ONH。这是一项横断面对比研究,包括确诊的ODD(105只眼)、高颅内压引起的乳头水肿(51只眼)和健康对照组(100只眼)。使用OCT获取ONH的3D扫描,然后进行处理以提高深部组织的可见度。首先,使用984张B扫描(来自130只眼)开发了一种深度学习算法,以识别主要神经/结缔组织以及ODD区域。我们算法的性能使用Dice系数(DC)进行评估(计算方式见下方示意)。在第二步中,设计了一种分类算法(随机森林),使用150个OCT体积,仅根据玻璃疣(drusen)和前板层(prelamina)肿胀评分(源自分割结果)进行三类分类(1:ODD,2:乳头水肿,3:健康)。为了评估性能,我们报告了每个类别的受试者工作特征曲线下面积(AUC)。我们的分割算法能够分离神经组织、结缔组织以及(存在时的)ODD区域。测试集上0.93$\pm$0.03的平均DC证实了这一点,对应于良好的性能。分类达到了很高的AUC,即ODD检测为0.99$\pm$0.01,乳头水肿检测为0.99$\pm$0.01,健康ONH检测为0.98$\pm$0.02。我们的人工智能方法仅用单次OCT扫描即可准确区分ODD和乳头水肿。我们的分类性能非常出色,但需要注意的是,仍有必要在更大的人群中进行验证。我们的方法有可能使OCT成为神经眼科诊断成像的主要手段。 摘要:Purpose: (1) To develop a deep learning algorithm to identify major tissue structures of the optic nerve head (ONH) in 3D optical coherence tomography (OCT) scans; (2) to exploit such information to robustly differentiate among healthy, optic disc drusen (ODD), and papilledema ONHs. It was a cross-sectional comparative study with confirmed ODD (105 eyes), papilledema due to high intracranial pressure (51 eyes), and healthy controls (100 eyes). 3D scans of the ONHs were acquired using OCT, then processed to improve deep-tissue visibility. At first, a deep learning algorithm was developed using 984 B-scans (from 130 eyes) in order to identify: major neural/connective tissues, and ODD regions. The performance of our algorithm was assessed using the Dice coefficient (DC). In a 2nd step, a classification algorithm (random forest) was designed using 150 OCT volumes to perform 3-class classifications (1: ODD, 2: papilledema, 3: healthy) strictly from their drusen and prelamina swelling scores (derived from the segmentations). To assess performance, we reported the area under the receiver operating characteristic curves (AUCs) for each class. Our segmentation algorithm was able to isolate neural and connective tissues, and ODD regions whenever present. This was confirmed by an average DC of 0.93$\pm$0.03 on the test set, corresponding to good performance. Classification was achieved with high AUCs, i.e. 0.99$\pm$0.01 for the detection of ODD, 0.99 $\pm$ 0.01 for the detection of papilledema, and 0.98$\pm$0.02 for the detection of healthy ONHs. Our AI approach accurately discriminated ODD from papilledema, using a single OCT scan. Our classification performance was excellent, with the caveat that validation in a much larger population is warranted. Our approach may have the potential to establish OCT as the mainstay of diagnostic imaging in neuro-ophthalmology.
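文中用于评估分割质量的Dice系数(DC)可按如下方式计算(极简示意):

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """DC = 2|A∩B| / (|A|+|B|),衡量预测分割掩码与标注的重叠程度。"""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

pred = np.zeros((64, 64)); pred[10:40, 10:40] = 1
gt = np.zeros((64, 64));   gt[15:45, 15:45] = 1
print(round(dice_coefficient(pred, gt), 3))   # 约0.694:两个方块部分重叠
```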
【11】 Deep Learning for Stability Analysis of a Freely Vibrating Sphere at Moderate Reynolds Number 标题:中等雷诺数自由振动球体稳定性分析的深度学习 链接:https://arxiv.org/abs/2112.09858
作者:A. Chizfahm,R. Jaiman 机构:Department of Mechanical Engineering, The University of British Columbia, Canada 备注:33 pages, 14 figures 摘要:本文提出了一种基于深度学习的降阶模型(DL-ROM),用于非定常三维流固耦合系统的稳定性预测。所提出的DL-ROM具有非线性状态空间模型的形式,并采用具有长短时记忆(LSTM)的递归神经网络。我们以状态空间形式考虑一个典型的流固耦合系统:弹性安装的球体与不可压缩流体流动的耦合。我们开发了一种非线性数据驱动耦合,用于预测横向自由振动球体的非定常力和涡激振动(VIV)锁定。我们将输入-输出关系设计为力与位移数据集的时间序列,用于流固耦合系统的低维近似。基于VIV锁定过程的先验知识,输入函数包含一系列频率和振幅,从而实现高效的DL-ROM,而无需为低维建模准备大规模训练数据集。经过训练后,该网络提供输入-输出动力学的非线性映射,可以通过反馈过程预测更长时间范围内的耦合流固动力学。通过将LSTM网络与特征系统实现算法(ERA)相结合,我们构建了用于降阶稳定性分析的数据驱动状态空间模型。我们通过特征值选择过程研究VIV的潜在机制和稳定性特征。为了理解频率锁定机制,我们研究了一系列约化振荡频率和质量比下的特征值轨迹。与全阶模拟一致,频率锁定分支通过组合LSTM-ERA程序被准确捕获。所提出的DL-ROM与涉及流固耦合的工程系统基于物理的数字孪生的发展方向相一致。 摘要:In this paper, we present a deep learning-based reduced-order model (DL-ROM) for the stability prediction of unsteady 3D fluid-structure interaction systems. The proposed DL-ROM has the format of a nonlinear state-space model and employs a recurrent neural network with long short-term memory (LSTM). We consider a canonical fluid-structure system of an elastically-mounted sphere coupled with incompressible fluid flow in a state-space format. We develop a nonlinear data-driven coupling for predicting unsteady forces and vortex-induced vibration (VIV) lock-in of the freely vibrating sphere in a transverse direction. We design an input-output relationship as a temporal sequence of force and displacement datasets for a low-dimensional approximation of the fluid-structure system. Based on the prior knowledge of the VIV lock-in process, the input function contains a range of frequencies and amplitudes, which enables an efficient DL-ROM without the need for a massive training dataset for the low-dimensional modeling. Once trained, the network provides a nonlinear mapping of input-output dynamics that can predict the coupled fluid-structure dynamics for a longer horizon via the feedback process. By integrating the LSTM network with the eigensystem realization algorithm (ERA), we construct a data-driven state-space model for the reduced-order stability analysis. We investigate the underlying mechanism and stability characteristics of VIV via an eigenvalue selection process. To understand the frequency lock-in mechanism, we study the eigenvalue trajectories for a range of the reduced oscillation frequencies and the mass ratios. Consistent with the full-order simulations, the frequency lock-in branches are accurately captured by the combined LSTM-ERA procedure. The proposed DL-ROM aligns with the development of physics-based digital twin of engineering systems involving fluid-structure interactions.
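下面给出特征系统实现算法(ERA)的一个极简单输入单输出示意(Python/NumPy):由脉冲响应(Markov参数)构造Hankel矩阵,经SVD得到平衡实现,其特征值即可用于文中那类稳定性分析。示例系统与截断阶数均为此处的假设。

```python
import numpy as np

def era(markov_params, order):
    """ERA最小示意:由标量脉冲响应序列恢复离散状态空间 (A, B, C)。"""
    h = np.asarray(markov_params)
    m = (len(h) - 1) // 2
    H0 = np.array([[h[i + j + 1] for j in range(m)] for i in range(m)])  # Hankel矩阵
    H1 = np.array([[h[i + j + 2] for j in range(m)] for i in range(m)])  # 移位Hankel
    U, s, Vt = np.linalg.svd(H0)
    U, s, Vt = U[:, :order], s[:order], Vt[:order]
    S = np.diag(1 / np.sqrt(s))
    A = S @ U.T @ H1 @ Vt.T @ S
    B = (np.diag(np.sqrt(s)) @ Vt)[:, :1]
    C = (U @ np.diag(np.sqrt(s)))[:1, :]
    return A, B, C

# 用一个已知二阶系统的脉冲响应验证:ERA应能恢复系统极点(特征值)
A_true = np.array([[0.9, 0.2], [-0.2, 0.9]])
h = [0.0] + [float(np.array([1.0, 0.0]) @ np.linalg.matrix_power(A_true, k)
             @ np.array([0.0, 1.0])) for k in range(40)]
A, B, C = era(h, order=2)
print(np.sort(np.linalg.eigvals(A)), np.sort(np.linalg.eigvals(A_true)))
```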
检测相关(5篇)
【1】 An ensemble deep learning technique for detecting suicidal ideation from posts in social media platforms 标题:从社交媒体平台的帖子中检测自杀意念的集成深度学习技术 链接:https://arxiv.org/abs/2112.10609
作者:Shini Renjith,Annie Abraham,Surya B. Jyothi,Lekshmi Chandran,Jincy Thomson 机构:Department of Computer Science and Engineering, Mar Baselios College of Engineering and Technology, Thiruvananthapuram, Kerala, India 备注:None 摘要:社交媒体中的自杀意念检测是一项不断发展且极具挑战性的研究。许多有自杀倾向的人通过社交媒体平台分享他们的想法和观点。多项研究观察到,社交媒体上公开发布的帖子包含可用于有效检测有自杀想法个体的有价值线索。预防自杀最困难的部分是发现和理解可能导致自杀的复杂危险因素和警告信号。这可以通过自动识别用户行为的突然变化来实现。自然语言处理技术可用于从社交媒体互动中收集行为和文本特征,这些特征可被传递到专门设计的框架中,以检测人类互动中作为自杀意图指标的异常。我们可以使用基于深度学习和/或机器学习的分类方法快速检测自杀意念。为此,我们可以结合LSTM和CNN模型,从用户的帖子中检测此类情绪。为了提高准确度,可以采取一些方法,如使用更多的数据进行训练,使用注意力模型提高现有模型的效率等。本文提出了一个LSTM-Attention-CNN组合模型来分析社交媒体帖子,以检测潜在的自杀意图。在评估中,所提出的模型的准确率为90.3%,F1分数为92.6%,高于基线模型。 摘要:Suicidal ideation detection from social media is an evolving research with great challenges. Many of the people who have the tendency to suicide share their thoughts and opinions through social media platforms. As part of many researches it is observed that the publicly available posts from social media contain valuable criteria to effectively detect individuals with suicidal thoughts. The most difficult part to prevent suicide is to detect and understand the complex risk factors and warning signs that may lead to suicide. This can be achieved by identifying the sudden changes in a user behavior automatically. Natural language processing techniques can be used to collect behavioral and textual features from social media interactions and these features can be passed to a specially designed framework to detect anomalies in human interactions that are indicators of suicidal intentions. We can achieve fast detection of suicidal ideation using deep learning and/or machine learning based classification approaches. For such a purpose, we can employ the combination of LSTM and CNN models to detect such emotions from posts of the users. In order to improve the accuracy, some approaches like using more data for training, using attention model to improve the efficiency of existing models etc. could be done. This paper proposes a LSTM-Attention-CNN combined model to analyze social media submissions to detect any underlying suicidal intentions. During evaluations, the proposed model demonstrated an accuracy of 90.3 percent and an F1-score of 92.6 percent, which is greater than the baseline models.
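LSTM-Attention-CNN组合结构的一种可能组织方式如下(PyTorch示意;分支结构、维度与融合方式均为此处的假设,并非原文实现):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LSTMAttnCNN(nn.Module):
    """LSTM-Attention-CNN 组合结构的一种示意实现(超参数均为示例假设)。"""
    def __init__(self, vocab=20000, emb=128, hid=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hid, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hid, 1)          # 加性注意力打分
        self.conv = nn.Conv1d(2 * hid, 64, kernel_size=3, padding=1)
        self.fc = nn.Linear(64 + 2 * hid, 2)       # 二分类:有/无自杀意念

    def forward(self, x):                          # x: (batch, seq_len) 词id
        h, _ = self.lstm(self.emb(x))              # (B, L, 2*hid)
        w = F.softmax(self.attn(h), dim=1)         # 序列维上的注意力权重
        ctx = (w * h).sum(dim=1)                   # 注意力加权的句向量
        c = F.relu(self.conv(h.transpose(1, 2))).max(dim=2).values  # CNN分支
        return self.fc(torch.cat([ctx, c], dim=1))

logits = LSTMAttnCNN()(torch.randint(0, 20000, (4, 50)))
print(logits.shape)                                # torch.Size([4, 2])
```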
【2】 Early Detection of Security-Relevant Bug Reports using Machine Learning: How Far Are We? 标题:使用机器学习及早检测与安全相关的错误报告:我们还有多远? 链接:https://arxiv.org/abs/2112.10123
作者:Arthur D. Sawadogo,Quentin Guimard,Tegawendé F. Bissyandé,Abdoul Kader Kaboré,Jacques Klein,Naouel Moha 机构:Université du Québec à Montréal; University of Luxembourg 备注:10 pages 摘要:Bug报告是软件开发中常见的人工制品。它们是用户向开发人员传达其在使用已发布版本的软件程序时所遇问题的主要渠道。然而,在问题的描述中,用户可能有意或无意地暴露漏洞。在典型的维护场景中,开发团队在准备纠正补丁时会优先处理此类安全相关的bug报告。然而,当安全相关性没有被立即表达(例如,通过标签)或没有被分类团队迅速识别时,处于开放状态的安全相关bug报告可能会成为敏感信息的严重泄漏源,攻击者可以利用这些敏感信息执行零日攻击。为了支持从业人员对bug报告进行分类,研究社区提出了许多方法来检测与安全相关的bug报告。近年来,有报道称这方面基于机器学习的方法取得了可观的性能。我们的工作聚焦于这些方法,并重新审视它们的组成部分,以提供对当前成果的全面看法。为此,我们建立了一个大型实验数据集,并对不同的特征集和学习算法组合进行了广泛的实验。最后,我们的研究指出了能产生最佳性能分类器的不同方法配置。 摘要:Bug reports are common artefacts in software development. They serve as the main channel for users to communicate to developers information about the issues that they encounter when using released versions of software programs. In the descriptions of issues, however, a user may, intentionally or not, expose a vulnerability. In a typical maintenance scenario, such security-relevant bug reports are prioritised by the development team when preparing corrective patches. Nevertheless, when security relevance is not immediately expressed (e.g., via a tag) or rapidly identified by triaging teams, the open security-relevant bug report can become a critical leak of sensitive information that attackers can leverage to perform zero-day attacks. To support practitioners in triaging bug reports, the research community has proposed a number of approaches for the detection of security-relevant bug reports. In recent years, approaches in this respect based on machine learning have been reported with promising performance. Our work focuses on such approaches, and revisits their building blocks to provide a comprehensive view on the current achievements. To that end, we built a large experimental dataset and performed extensive experiments with variations in feature sets and learning algorithms. Eventually, our study highlights different approach configurations that yield best performing classifiers.
【3】 Rapid Face Mask Detection and Person Identification Model based on Deep Neural Networks 标题:基于深度神经网络的快速口罩检测与人员识别模型 链接:https://arxiv.org/abs/2112.09951
作者:Abdullah Ahmad Khan,Mohd. Belal,Ghufran Ullah 机构:Department of Computer Science, Aligarh Muslim University, Uttar Pradesh, India 备注:12 pages, 15 figures, International Conference 摘要:随着COVID-19不断变异,每三四个月就会出现一个新的变种,带来更致命的问题。让我们免受新冠病毒感染的手段是接种疫苗和佩戴口罩。在本文中,我们实现了一个新的口罩检测与人员识别模型Insight Face,该模型基于SoftMax损失分类算法ArcFace loss,并将其命名为RFMPI-DNN(基于深度神经网络的快速口罩检测与人员识别模型),以便与其他现有模型相比更快速地检测口罩和人员身份。为了对比我们的新模型,我们使用了先前的MobileNet_V2模型和人脸识别模块,以便在时间维度上进行有效比较。在系统中实现的模型在各个方面都优于本文所对比的模型。 摘要:As Covid-19 has been constantly getting mutated and in three or four months a new variant gets introduced to us and it comes with more deadly problems. The things that prevent us from getting Covid is getting vaccinated and wearing a face mask. In this paper, we have implemented a new Face Mask Detection and Person Recognition model named Insight face which is based on SoftMax loss classification algorithm Arc Face loss, and names it as RFMPI-DNN (Rapid Face Mask Detection and Person Identification Model based on Deep Neural Networks) to detect face mask and person identity rapidly as compared to other models available. To compare our new model, we have used previous MobileNet_V2 model and face recognition module for effective comparison on the basis of time. The proposed model implemented in the system has outperformed the model compared in this paper in every aspect.
【4】 Exploiting Expert-guided Symmetry Detection in Markov Decision Processes 标题:在马尔可夫决策过程中利用专家引导的对称性检测 链接:https://arxiv.org/abs/2112.09943
作者:Giorgio Angelotti,Nicolas Drougard,Caroline P. C. Chanel 机构:ISAE-Supaero, University of Toulouse, France, ANITI, University of Toulouse, France 备注:Preprint - Under review 摘要:马尔可夫决策过程(MDP)动态模型的离线估计并非易事,它在很大程度上取决于学习阶段可用的数据。有时,模型的动力学对于当前状态和动作的某些变换是不变的。最近的工作表明,依赖密度估计方法(如基于深度神经网络的归一化流)的专家引导管道,可以在确定性环境中有效地检测这种结构,无论环境是类别型还是连续值的。可以利用获得的知识来扩充原始数据集,最终减少真实模型和学习模型之间的分布偏移。在这项工作中,我们扩展了该范式以处理非确定性MDP。特别地:1)针对类别型环境,我们提出了基于统计距离的检测阈值;2)针对连续环境,我们引入了基于Wilcoxon符号秩统计检验的分布偏移基准;3)我们表明,上述结果在求解学得的MDP并将最优策略应用于真实环境时能带来性能提升。 摘要:Offline estimation of the dynamical model of a Markov Decision Process (MDP) is a non-trivial task that greatly depends on the data available to the learning phase. Sometimes the dynamics of the model is invariant with respect to some transformations of the current state and action. Recent works showed that an expert-guided pipeline relying on Density Estimation methods as Deep Neural Network based Normalizing Flows effectively detects this structure in deterministic environments, both categorical and continuous-valued. The acquired knowledge can be exploited to augment the original data set, leading eventually to a reduction in the distributional shift between the true and the learnt model. In this work we extend the paradigm to also tackle non deterministic MDPs, in particular 1) we propose a detection threshold in categorical environments based on statistical distances, 2) we introduce a benchmark of the distributional shift in continuous environments based on the Wilcoxon signed-rank statistical test and 3) we show that the former results lead to a performance improvement when solving the learnt MDP and then applying the optimal policy in the real environment.
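其中第2)点提到的Wilcoxon符号秩检验可直接用SciPy完成,下面是一个配对样本的极简示意(数据为虚构):

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
s_next = rng.normal(size=200)                    # 真实环境的后继状态样本(示意数据)
s_model = s_next + rng.normal(0.0, 0.05, 200)    # 学得模型预测的后继状态

# 原假设:成对差值的分布关于0对称(即两者之间不存在系统性的分布偏移)
stat, p = wilcoxon(s_next, s_model)
print(f"W={stat:.1f}, p={p:.3f}")                # p较大时无法拒绝"无分布偏移"
```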
【5】 Morpheme Boundary Detection & Grammatical Feature Prediction for Gujarati : Dataset & Model 标题:古吉拉特文语素边界检测与语法特征预测:数据集与模型 链接:https://arxiv.org/abs/2112.09860
作者:Jatayu Baxi,Dr. Brijesh Bhatt 机构:Dharmsinh Desai University Nadiad 摘要:为低资源语言开发自然语言处理资源是一项具有挑战性但必不可少的任务。在本文中,我们提出了一个古吉拉特语形态分析器。我们使用了一种基于双向LSTM的方法来执行语素边界检测和语法特征标记。我们创建了一个带有词元(lemma)和语法特征标注的古吉拉特语词汇数据集。本文讨论的基于Bi-LSTM的形态分析器模型在不依赖任何手工后缀规则的情况下,有效地处理了该语言的形态。据我们所知,这是古吉拉特语的第一个同时执行语法特征标记和语素边界检测任务的数据集与形态分析器模型。 摘要:Developing Natural Language Processing resources for a low resource language is a challenging but essential task. In this paper, we present a Morphological Analyzer for Gujarati. We have used a Bi-Directional LSTM based approach to perform morpheme boundary detection and grammatical feature tagging. We have created a data set of Gujarati words with lemma and grammatical features. The Bi-LSTM based model of Morph Analyzer discussed in the paper handles the language morphology effectively without the knowledge of any hand-crafted suffix rules. To the best of our knowledge, this is the first dataset and morph analyzer model for the Gujarati language which performs both grammatical feature tagging and morpheme boundary detection tasks.
分类|识别(10篇)
【1】 Consistency and Rate of Convergence of Switched Least Squares System Identification for Autonomous Switched Linear Systems 标题:自治切换线性系统切换最小二乘系统辨识的一致性和收敛速度 链接:https://arxiv.org/abs/2112.10753
作者:Borna Sayedana,Mohammad Afshari,Peter E. Caines,Aditya Mahajan 机构:Electrical and Computer Engineering, McGill University, Canada, Computer Science, University of Alberta, Canada 摘要:本文研究具有完全状态观测的自治切换线性系统的系统辨识问题。我们提出了用于切换线性系统辨识的切换最小二乘法,证明了该方法的强一致性,并推导了与数据相关和与数据无关的收敛速度。特别地,我们的数据相关收敛速度表明,几乎必然地,系统辨识误差为$\mathcal{O}\big(\sqrt{\log(T)/T}\big)$,其中$T$为时间范围。这些结果表明,对于切换线性系统,我们的方法与非切换线性系统的最小二乘法具有相同的收敛速度。我们将我们的结果与文献中的结果进行比较,并给出数值例子来说明所提出的系统辨识方法的性能。 摘要:In this paper, we investigate the problem of system identification for autonomous switched linear systems with complete state observations. We propose switched least squares method for the identification for switched linear systems, show that this method is strongly consistent, and derive data-dependent and data-independent rates of convergence. In particular, our data-dependent rate of convergence shows that, almost surely, the system identification error is $\mathcal{O}\big(\sqrt{\log(T)/T} \big)$ where $T$ is the time horizon. These results show that our method for switched linear systems has the same rate of convergence as least squares method for non-switched linear systems. We compare our results with those in the literature. We present numerical examples to illustrate the performance of the proposed system identification method.
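在模式序列已知的设定下,切换最小二乘可以按模式分组后逐一做普通最小二乘,极简示意如下(NumPy;系统参数与切换规律均为虚构示例):

```python
import numpy as np

def switched_least_squares(states, modes, n_modes):
    """已知模式序列时的切换最小二乘:对每个模式i,用该模式下的转移对
    (x_t, x_{t+1}) 以最小二乘估计 A_i(x_{t+1} = A_i x_t + w_t)。"""
    A_hat = []
    for i in range(n_modes):
        idx = np.where(modes[:-1] == i)[0]
        X, Y = states[idx], states[idx + 1]
        A_hat.append(np.linalg.lstsq(X, Y, rcond=None)[0].T)
    return A_hat

# 模拟一个双模式切换系统并验证估计误差
rng = np.random.default_rng(1)
A = [np.array([[0.8, 0.1], [0.0, 0.7]]), np.array([[0.5, -0.3], [0.2, 0.6]])]
x, xs, ms = rng.normal(size=2), [], []
for t in range(5000):
    m = t // 50 % 2                                # 周期性切换(仅为示意)
    xs.append(x); ms.append(m)
    x = A[m] @ x + 0.01 * rng.normal(size=2)
A_hat = switched_least_squares(np.array(xs), np.array(ms), 2)
print(np.max(np.abs(A_hat[0] - A[0])))             # 误差应随时间范围T增大而减小
```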
【2】 A singular Riemannian geometry approach to Deep Neural Networks II. Reconstruction of 1-D equivalence classes 标题:深层神经网络的奇异黎曼几何方法Ⅱ.一维等价类的重构 链接:https://arxiv.org/abs/2112.10583
作者:Alessandro Benfenati,Alessio Marta 摘要:在以前的工作中,我们提出了一个研究深层神经网络的几何框架,将其视为流形之间的映射序列,并采用奇异黎曼几何。在本文中,我们给出了该框架的一个应用,提出了一种构建输入点等价类的方法:该类定义为输入流形上被神经网络映射到相同输出的点集。换句话说,我们在输入空间中构建输出流形中一点的原像。特别地,为简单起见,我们关注神经网络从n维实空间映射到(n-1)维实空间的情形,并提出了一种算法来构建位于同一等价类上的点集。这种方法带来两个主要应用:生成新的合成数据,以及为理解分类器如何被输入数据的小扰动所迷惑(例如,一张企鹅图像被分类为含有吉娃娃的图像)提供一些见解。此外,对于从二维到一维实空间的神经网络,我们还讨论了如何找到实直线上闭区间的原像。我们还介绍了若干数值实验,涉及多个为非线性回归任务训练的神经网络,包括二元分类器的情形。 摘要:In a previous work, we proposed a geometric framework to study a deep neural network, seen as sequence of maps between manifolds, employing singular Riemannian geometry. In this paper, we present an application of this framework, proposing a way to build the class of equivalence of an input point: such class is defined as the set of the points on the input manifold mapped to the same output by the neural network. In other words, we build the preimage of a point in the output manifold in the input space. In particular, we focus for simplicity on the case of neural networks maps from n-dimensional real spaces to (n - 1)-dimensional real spaces, we propose an algorithm allowing to build the set of points lying on the same class of equivalence. This approach leads to two main applications: the generation of new synthetic data and it may provide some insights on how a classifier can be confused by small perturbation on the input data (e.g. a penguin image classified as an image containing a chihuahua). In addition, for neural networks from 2D to 1D real spaces, we also discuss how to find the preimages of closed intervals of the real line. We also present some numerical experiments with several neural networks trained to perform non-linear regression tasks, including the case of a binary classifier.
【3】 Classifier Calibration: How to assess and improve predicted class probabilities: a survey 标题:分类器校准:如何评估和改进预测类概率:一项调查 链接:https://arxiv.org/abs/2112.10327
作者:Telmo Silva Filho,Hao Song,Miquel Perello-Nieto,Raul Santos-Rodriguez,Meelis Kull,Peter Flach 机构:Department of Statistics, Federal University of Paraíba, João Pessoa, Paraíba, Brazil; Intelligent Systems Laboratory, University of Bristol, Bristol, United Kingdom 摘要:本文介绍并详细概述了分类器校准的原理和实践。经过良好校准的分类器可正确量化与其逐实例预测相关的不确定性或置信度水平。这对于关键应用、最优决策、代价敏感分类以及某些类型的情境变化至关重要。校准研究有着丰富的历史,比机器学习作为一个学术领域的诞生早了几十年。然而,近来对校准兴趣的增加催生了新的方法,并推动其从二分类扩展到多分类设置。可选方案与需要考虑的问题的空间很大,要在其中游刃有余,需要正确的概念和工具。我们提供主要概念和方法的介绍性材料和最新技术细节,包括恰当评分规则(proper scoring rules)和其他评估指标、可视化方法、对二分类与多分类事后校准方法的全面介绍,以及若干高级主题。 摘要:This paper provides both an introduction to and a detailed overview of the principles and practice of classifier calibration. A well-calibrated classifier correctly quantifies the level of uncertainty or confidence associated with its instance-wise predictions. This is essential for critical applications, optimal decision making, cost-sensitive classification, and for some types of context change. Calibration research has a rich history which predates the birth of machine learning as an academic field by decades. However, a recent increase in the interest on calibration has led to new methods and the extension from binary to the multiclass setting. The space of options and issues to consider is large, and navigating it requires the right set of concepts and tools. We provide both introductory material and up-to-date technical details of the main concepts and methods, including proper scoring rules and other evaluation metrics, visualisation approaches, a comprehensive account of post-hoc calibration methods for binary and multiclass classification, and several advanced topics.
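评估校准质量时常用的期望校准误差(ECE)可如下计算(极简示意;分箱数等为示例假设):

```python
import numpy as np

def expected_calibration_error(conf, correct, n_bins=10):
    """ECE:按置信度分箱,对每箱 |准确率 - 平均置信度| 按箱内样本比例加权求和。"""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece

rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, 10000)               # 模型声称的置信度
correct = rng.uniform(size=10000) < conf ** 2     # 刻意构造一个过度自信的模型
print(f"ECE = {expected_calibration_error(conf, correct):.3f}")
```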
【4】 DXML: Distributed Extreme Multilabel Classification 标题:DXML:分布式极端多标签分类 链接:https://arxiv.org/abs/2112.10297
作者:Pawan Kumar 机构:International Institute of Information Technology, Hyderabad, India 备注:8 pages, 0 figures, 2 tables 摘要:作为一种大数据应用,极端多标签分类已成为一个重要的研究课题,并在产品和项目的排名和推荐中得到应用。提出了一种用于大规模排序和推荐的极端分类的可扩展混合分布式共享内存实现方案。具体而言,该实现是使用MPI跨节点传递消息和使用OpenMP在节点上使用多线程的混合。推导了通信延迟和通信量的表达式。针对共享内存体系结构,推导了使用工作跨度模型的并行性。这就揭示了类似极端分类方法的预期可伸缩性。实验表明,该实现在一些大型数据集上的训练和测试速度相对较快。在某些情况下,模型尺寸相对较小。 摘要:As a big data application, extreme multilabel classification has emerged as an important research topic with applications in ranking and recommendation of products and items. A scalable hybrid distributed and shared memory implementation of extreme classification for large scale ranking and recommendation is proposed. In particular, the implementation is a mix of message passing using MPI across nodes and using multithreading on the nodes using OpenMP. The expression for communication latency and communication volume is derived. Parallelism using work-span model is derived for shared memory architecture. This throws light on the expected scalability of similar extreme classification methods. Experiments show that the implementation is relatively faster to train and test on some large datasets. In some cases, model size is relatively small.
【5】 Active Weighted Aging Ensemble for Drifted Data Stream Classification 标题:用于漂移数据流分类的主动加权老化集成 链接:https://arxiv.org/abs/2112.10150
作者:Michał Woźniak,Paweł Zyblewski,Paweł Ksieniewicz 机构:Department of Systems and Computer Networks, Wrocław University of Science and Technology 备注:29 pages, 3 figures 摘要:流式数据分类的一个重要问题是概念漂移的发生,即分类任务概率特征的变化。这种现象破坏了分类模型的性能,并严重降低了其质量。为了使分类器适应不断变化的概率特征,需要采取适当的策略来消除这种现象。实现这种解决方案的一个重要问题是访问数据标签。这通常是昂贵的,因此为了最小化与此过程相关的费用,提出了基于半监督学习的学习策略,例如,采用主动学习方法,指示哪些传入对象对于提高分类器的性能是有价值的。本文提出了一种基于分类器集成学习的非平稳数据流分块分类方法和一种考虑有限预算的主动学习策略,可成功应用于任何数据流分类算法。所提出的方法已经通过使用真实数据流和生成的数据流的计算机实验进行了评估。结果表明,与现有的方法相比,该算法具有较高的质量。 摘要:One of the significant problems of streaming data classification is the occurrence of concept drift, consisting of the change of probabilistic characteristics of the classification task. This phenomenon destabilizes the performance of the classification model and seriously degrades its quality. An appropriate strategy counteracting this phenomenon is required to adapt the classifier to the changing probabilistic characteristics. One of the significant problems in implementing such a solution is the access to data labels. It is usually costly, so to minimize the expenses related to this process, learning strategies based on semi-supervised learning are proposed, e.g., employing active learning methods indicating which of the incoming objects are valuable to be labeled for improving the classifier's performance. This paper proposes a novel chunk-based method for non-stationary data streams based on classifier ensemble learning and an active learning strategy considering a limited budget that can be successfully applied to any data stream classification algorithm. The proposed method has been evaluated through computer experiments using both real and generated data streams. The results confirm the high quality of the proposed algorithm over state-of-the-art methods.
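摘要中"有限预算的主动学习+分块处理"的思路可以极简示意如下(scikit-learn;不确定性度量、预算比例与漂移方式均为示例假设,并非原文算法):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

def process_chunk(model, X, y_oracle, budget=0.1, first=False):
    """对每个数据块:按预算只标注模型最不确定的样本,并据此增量更新模型。"""
    if first:                                          # 假设首个数据块全部有标签
        model.partial_fit(X, y_oracle, classes=np.array([0, 1]))
        return model
    margin = np.abs(model.decision_function(X))        # 距决策边界越近越不确定
    k = max(1, int(budget * len(X)))
    query = np.argsort(margin)[:k]                     # 只为前k个样本请求标签
    model.partial_fit(X[query], y_oracle[query])
    return model

rng = np.random.default_rng(0)
model = SGDClassifier()
for chunk in range(20):                                # 模拟带概念漂移的数据流
    drift = chunk * 0.1
    X = rng.normal(size=(200, 2)) + drift
    y = (X[:, 0] + X[:, 1] > 2 * drift).astype(int)
    model = process_chunk(model, X, y, first=(chunk == 0))
```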
【6】 Evaluating System Identification Methods for Predicting Thermal Dissipation of Heterogeneous SoCs 标题:异质SoC散热预测的系统辨识方法评价 链接:https://arxiv.org/abs/2112.10121
作者:Joel Öhrling,Sébastien Lafond,Dragos Truscan 机构:Åbo Akademi University, Turku, Finland 备注:arXiv admin note: substantial text overlap with arXiv:2104.10387 摘要:在本文中,我们评估了如何使用系统识别方法来构建异构SoC平台的热预测模型,该模型可用于快速预测不同配置的温度,而无需硬件。具体来说,我们关注的建模方法可以根据时钟频率和每个核心的利用率预测温度。我们研究了三种方法的预测精度:使用多项式回归器的线性状态空间识别方法、NARX神经网络方法和配置在FIR模型结构中的递归神经网络方法。我们在具有Exynos 5422 SoC的Odroid-XU4板上评估了这些方法。结果表明,当使用1小时和6小时的数据进行训练时,基于多项式回归的模型显著优于其他两个模型。 摘要:In this paper we evaluate the use of system identification methods to build a thermal prediction model of heterogeneous SoC platforms that can be used to quickly predict the temperature of different configurations without the need of hardware. Specifically, we focus on modeling approaches that can predict the temperature based on the clock frequency and the utilization percentage of each core. We investigate three methods with respect to their prediction accuracy: a linear state-space identification approach using polynomial regressors, a NARX neural network approach and a recurrent neural network approach configured in an FIR model structure. We evaluate the methods on an Odroid-XU4 board featuring an Exynos 5422 SoC. The results show that the model based on polynomial regressors significantly outperformed the other two models when trained with 1 hour and 6 hours of data.
【7】 Set Twister for Single-hop Node Classification 标题:用于单跳节点分类的Set Twister 链接:https://arxiv.org/abs/2112.09752
作者:Yangze Zhou,Vinayak Rao,Bruno Ribeiro 机构:Purdue University, West Lafayette, IN 备注:Accepted for presentation at the 2nd GCLR workshop in conjunction with AAAI 2022 摘要:节点分类是关系学习中的一项中心任务,当前的最新技术基于两个关键原则:(i)预测对节点邻居的排序是置换不变的,以及(ii)预测是节点的$r$-hop邻域拓扑和属性的函数,$r\geq 2$。图神经网络和集体推理方法(如信念传播)都依赖于最多$r$跳以内的信息。在这项工作中,我们研究使用更强大的置换不变函数是否有时可以使分类器无需收集超过$1$跳的信息。为此,我们引入了一种新的体系结构Set Twister,它推广了DeepSets(Zaheer等人,2017)这种简单且被广泛使用的置换不变表示。理论上,Set Twister增强了DeepSets的表达能力,使其能够捕获高阶依赖,同时保持其简单性和较低的计算成本。经验上,我们在多个任务中观察到Set Twister相对于DeepSets以及各类图神经网络和集体推理方案的精度提升,同时展示了其实现上的简单性和计算效率。 摘要:Node classification is a central task in relational learning, with the current state-of-the-art hinging on two key principles: (i) predictions are permutation-invariant to the ordering of a node's neighbors, and (ii) predictions are a function of the node's $r$-hop neighborhood topology and attributes, $r \geq 2$. Both graph neural networks and collective inference methods (e.g., belief propagation) rely on information from up to $r$-hops away. In this work, we study if the use of more powerful permutation-invariant functions can sometimes avoid the need for classifiers to collect information beyond $1$-hop. Towards this, we introduce a new architecture, the Set Twister, which generalizes DeepSets (Zaheer et al., 2017), a simple and widely-used permutation-invariant representation. Set Twister theoretically increases expressiveness of DeepSets, allowing it to capture higher-order dependencies, while keeping its simplicity and low computational cost. Empirically, we see accuracy improvements of Set Twister over DeepSets as well as a variety of graph neural networks and collective inference schemes in several tasks, while showcasing its implementation simplicity and computational efficiency.
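作为背景,被Set Twister所推广的DeepSets置换不变表示可以极简实现如下(PyTorch):

```python
import torch
import torch.nn as nn

class DeepSets(nn.Module):
    """DeepSets置换不变表示:f(X) = ρ(Σ_i φ(x_i))。逐元素编码后求和
    (求和对元素顺序不敏感),再经ρ映射,保证输出与邻居排列无关。"""
    def __init__(self, d_in=8, d_hid=32, d_out=4):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(d_in, d_hid), nn.ReLU(),
                                 nn.Linear(d_hid, d_hid))
        self.rho = nn.Sequential(nn.Linear(d_hid, d_hid), nn.ReLU(),
                                 nn.Linear(d_hid, d_out))

    def forward(self, x):                      # x: (batch, set_size, d_in)
        return self.rho(self.phi(x).sum(dim=1))

net = DeepSets()
x = torch.randn(2, 5, 8)
perm = x[:, torch.randperm(5), :]              # 打乱集合内元素顺序
print(torch.allclose(net(x), net(perm), atol=1e-5))   # True:置换不变
```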
【8】 Rank4Class: A Ranking Formulation for Multiclass Classification 标题:Rank4Class:一种多类分类的排序公式 链接:https://arxiv.org/abs/2112.09727
作者:Nan Wang,Zhen Qin,Le Yan,Honglei Zhuang,Xuanhui Wang,Michael Bendersky,Marc Najork 机构:Department of Computer Science, University of Virginia 摘要:多类分类(MCC)是一个基本的机器学习问题,其目的是将每个实例分类为一组预定义的类中的一个。给定一个实例,分类模型为每个类计算一个分数,然后使用所有分数对类进行排序。分类模型的性能通常通过Top-K精度/误差(例如,K=1或5)来衡量。在本文中,我们的目的并不是像最近的工作那样提出新的神经表征学习模型,而是从排序的角度表明,通过一种新的表述方式很容易提高MCC的性能。特别是,通过将MCC视为为实例对类进行排序,我们首先论证排序指标(如归一化折损累积增益,NDCG)可以比现有的Top-K指标提供更多信息。我们进一步证明,占主导地位的神经MCC体系结构可以表述为一个具有特定设计选择集的神经排序框架。基于这样的概括,我们表明,利用丰富的信息检索文献中的技术来提高MCC的开箱即用性能是简单而直观的。在具有不同数据集和主干模型的文本和图像分类任务上(例如,用于文本和图像分类的BERT和ResNet)的大量实证结果表明了我们提出的框架的价值。 摘要:Multiclass classification (MCC) is a fundamental machine learning problem which aims to classify each instance into one of a predefined set of classes. Given an instance, a classification model computes a score for each class, all of which are then used to sort the classes. The performance of a classification model is usually measured by Top-K Accuracy/Error (e.g., K=1 or 5). In this paper, we do not aim to propose new neural representation learning models as most recent works do, but to show that it is easy to boost MCC performance with a novel formulation through the lens of ranking. In particular, by viewing MCC as to rank classes for an instance, we first argue that ranking metrics, such as Normalized Discounted Cumulative Gain (NDCG), can be more informative than existing Top-K metrics. We further demonstrate that the dominant neural MCC architecture can be formulated as a neural ranking framework with a specific set of design choices. Based on such generalization, we show that it is straightforward and intuitive to leverage techniques from the rich information retrieval literature to improve the MCC performance out of the box. Extensive empirical results on both text and image classification tasks with diverse datasets and backbone models (e.g., BERT and ResNet for text and image classification) show the value of our proposed framework.
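"把MCC视为对类排序并用NDCG评估"的思路可如下示意(仅有一个相关项时NDCG有闭式;代码为示例):

```python
import numpy as np

def ndcg_at_k(scores, true_class, k=5):
    """将真实类视为唯一相关"文档":相关度为0/1时理想DCG为1,
    故 NDCG = 1/log2(rank+1)(若真实类进入前k,否则为0)。"""
    rank = int(np.argsort(-scores).tolist().index(true_class)) + 1
    return 1.0 / np.log2(rank + 1) if rank <= k else 0.0

scores = np.array([0.1, 0.3, 2.0, 0.9])      # 某实例上4个类的打分
print(ndcg_at_k(scores, true_class=3))       # 真实类排第2 → 1/log2(3) ≈ 0.63
```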
【9】 Skin lesion segmentation and classification using deep learning and handcrafted features 标题:基于深度学习和手工特征的皮肤病变分割与分类 链接:https://arxiv.org/abs/2112.10307
作者:Redha Ali,Hussin K. Ragb 机构:Department of Electrical and Computer Engineering, University of Dayton, Dayton, Ohio; Christian Brothers University, Memphis, Tennessee 备注:7 pages, 3 figures 摘要:皮肤损伤的准确诊断是皮肤镜图像分类的关键任务。在本研究中,我们构造了一种新的图像特征,称为混合特征,它比单一方法特征具有更强的识别能力。本研究提出一种新技术:在训练过程中,我们将手工特征或迁移特征注入卷积神经网络(CNN)模型的全连接层。根据我们的文献回顾,到目前为止,还没有研究检查或调查在训练过程中将手工特征注入CNN模型对分类性能的影响。此外,我们还研究了分割掩码及其对整体分类性能的影响。我们的模型实现了92.3%的平衡多类准确率,比典型的深度学习单方法分类器结构高6.8%。 摘要:Accurate diagnostics of a skin lesion is a critical task in classification dermoscopic images. In this research, we form a new type of image features, called hybrid features, which has stronger discrimination ability than single method features. This study involves a new technique where we inject the handcrafted features or feature transfer into the fully connected layer of Convolutional Neural Network (CNN) model during the training process. Based on our literature review until now, no study has examined or investigated the impact on classification performance by injecting the handcrafted features into the CNN model during the training process. In addition, we also investigated the impact of segmentation mask and its effect on the overall classification performance. Our model achieves an 92.3% balanced multiclass accuracy, which is 6.8% better than the typical single method classifier architecture for deep learning.
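"在全连接层注入手工特征"的做法可用PyTorch极简示意如下(主干网络与特征维度均为此处的假设):

```python
import torch
import torch.nn as nn

class HybridNet(nn.Module):
    """把手工特征拼接进CNN全连接层的一种示意做法(主干与维度均为假设)。"""
    def __init__(self, n_handcrafted=32, n_classes=7):
        super().__init__()
        self.backbone = nn.Sequential(                    # 极简CNN主干
            nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.fc = nn.Linear(16 + n_handcrafted, n_classes)

    def forward(self, image, handcrafted):
        deep = self.backbone(image)                       # 深度特征
        return self.fc(torch.cat([deep, handcrafted], 1)) # 混合特征 → 分类

out = HybridNet()(torch.randn(4, 3, 64, 64), torch.randn(4, 32))
print(out.shape)                                          # torch.Size([4, 7])
```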
【10】 Interpretable and Interactive Deep Multiple Instance Learning for Dental Caries Classification in Bitewing X-rays 标题:可解释、可交互的深度多示例学习用于咬翼X光片中的龋齿分类 链接:https://arxiv.org/abs/2112.09694
作者:Benjamin Bergner,Csaba Rohrer,Aiham Taleb,Martha Duchrau,Guilherme De Leon,Jonas Almeida Rodrigues,Falk Schwendicke,Joachim Krois,Christoph Lippert 机构:Digital Health & Machine Learning, Hasso Plattner Institute, University of Potsdam, Germany; Department of Oral Diagnostics, Digital Health and Health Services Research, Charité - Universitätsmedizin Berlin, Germany 备注:19 pages, 10 figures, submitted to MIDL 2022 摘要:我们提出了一种简单有效的基于深度多示例学习的图像分类体系结构,并将其应用于具有挑战性的牙齿X光片龋齿检测任务中。从技术上讲,我们的方法有两方面的贡献:第一,尽管仅使用弱图像级标签训练,它仍能输出局部图块(patch)分类概率的热图;第二,它可以利用分割标签来指导训练。与现有方法相比,人类用户可以忠实地解释预测,并与模型交互,以决定关注哪些区域。实验在一个约38k张咬翼片(约316k颗牙齿)的大型临床数据集上进行,与各种基线相比,我们取得了有竞争力的性能。在外部龋齿分割模型的指导下,分类和定位性能显著提高。 摘要:We propose a simple and efficient image classification architecture based on deep multiple instance learning, and apply it to the challenging task of caries detection in dental radiographs. Technically, our approach contributes in two ways: First, it outputs a heatmap of local patch classification probabilities despite being trained with weak image-level labels. Second, it is amenable to learning from segmentation labels to guide training. In contrast to existing methods, the human user can faithfully interpret predictions and interact with the model to decide which regions to attend to. Experiments are conducted on a large clinical dataset of $\sim$38k bitewings ($\sim$316k teeth), where we achieve competitive performance compared to various baselines. When guided by an external caries segmentation model, a significant improvement in classification and localization performance is observed.
表征(3篇)
【1】 Learning Semi-Structured Representations of Radiology Reports 标题:放射学报告的半结构化表示学习 链接:https://arxiv.org/abs/2112.10746
作者:Tamara Katic,Martin Pavlovski,Danijela Sekulic,Slobodan Vucetic 机构:Temple University, Philadelphia, PA, USA, University Clinical Center of Serbia, Belgrade, Serbia 摘要:除了主要的诊断目的外,放射学报告一直是医学研究中宝贵的信息来源。给定一个放射学报告语料库,研究人员通常对确定描述特定医学发现的报告子集感兴趣。由于放射学报告中的医学发现空间巨大且可能无限,最近的研究建议将放射学报告中的自由文本语句映射为从有限词汇表中提取的半结构化术语字符串。本文旨在提出一种自动生成放射学报告半结构化表示的方法。该方法包括将放射学报告中的句子匹配到手工创建的半结构化表示,然后学习将匹配句子映射到其半结构化表示的序列到序列神经模型。我们在人工注释的胸部X射线放射学报告的OpenI语料库上评估了所提出的方法。结果表明,无论是在(1)定量指标,如BLEU、ROUGE和METEOR方面,还是在(2)放射科医生的定性判断方面,所提出的方法都优于若干基线。研究结果还表明,训练后的模型在来自不同医疗机构的胸部X射线放射学报告样本外语料库上产生了合理的半结构化表示。 摘要:Beyond their primary diagnostic purpose, radiology reports have been an invaluable source of information in medical research. Given a corpus of radiology reports, researchers are often interested in identifying a subset of reports describing a particular medical finding. Because the space of medical findings in radiology reports is vast and potentially unlimited, recent studies proposed mapping free-text statements in radiology reports to semi-structured strings of terms taken from a limited vocabulary. This paper aims to present an approach for the automatic generation of semi-structured representations of radiology reports. The approach consists of matching sentences from radiology reports to manually created semi-structured representations, followed by learning a sequence-to-sequence neural model that maps matched sentences to their semi-structured representations. We evaluated the proposed approach on the OpenI corpus of manually annotated chest x-ray radiology reports. The results indicate that the proposed approach is superior to several baselines, both in terms of (1) quantitative measures such as BLEU, ROUGE, and METEOR and (2) qualitative judgment of a radiologist. The results also demonstrate that the trained model produces reasonable semi-structured representations on an out-of-sample corpus of chest x-ray radiology reports from a different medical provider.
【2】 Representation Learning for Dynamic Hyperedges 标题:动态超边的表示学习 链接:https://arxiv.org/abs/2112.10154
作者:Tony Gracious,Ambedkar Dukkipati 机构:Department of Computer Science and Automation, Indian Institute of Science Bangalore 摘要:最近,人们对从交互数据中提取信息产生了极大的兴趣。传统上,这是通过在动态网络中的特定时间将其建模为成对交互来实现的。然而,现实世界的互动很少是成对的;它们可能涉及两个以上的节点。在文献中,这些类型的群体互动是通过超边/超链接建模的。现有的超边缘建模工作只关注静态网络,无法在节点与其他节点交互时对节点的时间演化进行建模。此外,他们无法回答诸如下一步将发生哪种类型的交互以及交互何时发生之类的临时查询。为了解决这些限制,在本文中,我们开发了一个用于超链接预测的时间点过程模型。我们提出的模型使用节点的动态表示技术来模拟进化,并在神经点过程框架中使用这种表示进行推理。我们在五个真实的交互数据上评估了我们的模型,并表明我们的动态模型比静态模型有显著的性能增益。此外,我们还展示了我们的技术相对于成对交互建模技术的优势。 摘要:Recently there has been a massive interest in extracting information from interaction data. Traditionally this is done by modeling it as pair-wise interaction at a particular time in a dynamic network. However, real-world interactions are seldom pair-wise; they can involve more than two nodes. In literature, these types of group interactions are modeled by hyperedges/hyperlinks. The existing works for hyperedge modeling focused only on static networks, and they cannot model the temporal evolution of nodes as they interact with other nodes. Also, they cannot answer temporal queries like which type of interaction will occur next and when the interaction will occur. To address these limitations, in this paper, we develop a temporal point process model for hyperlink prediction. Our proposed model uses dynamic representation techniques for nodes to model the evolution and uses this representation in a neural point process framework to make inferences. We evaluate our models on five real-world interaction data and show that our dynamic model has significant performance gain over the static model. Further, we also demonstrate the advantages of our technique over the pair-wise interaction modeling technique.
【3】 RELAX: Representation Learning Explainability 标题:RELAX:表示学习的可解释性 链接:https://arxiv.org/abs/2112.10161
作者:Kristoffer K. Wickstrøm,Daniel J. Trosten,Sigurd Løkse,Karl Øyvind Mikalsen,Michael C. Kampffmeyer,Robert Jenssen 机构:Department of Physics and Technology, UiT The Arctic University of Norway 摘要:尽管自监督表示学习在从无标记数据学习方面取得了显著进步,但尚无方法能够解释是什么影响了所学的表示。我们通过提出的RELAX方法来满足这一需求,它是第一种基于归因(attribution)的表示解释方法。我们的方法还可以对其解释中的不确定性进行建模,这对于生成可信的解释至关重要。RELAX通过测量输入与其被掩码(masked)版本在表示空间中的相似性来解释表示,提供直观的解释,并显著优于基于梯度的基线。我们提供RELAX的理论解释,并对使用有监督和无监督学习训练的特征提取器进行新的分析,提供对不同学习策略的见解。最后,我们说明了RELAX在多视图聚类中的可用性,并强调纳入不确定性对于提供低复杂度解释可能是必不可少的,这是朝着解释表示迈出的关键一步。 摘要:Despite the significant improvements that representation learning via self-supervision has led to when learning from unlabeled data, no methods exist that explain what influences the learned representation. We address this need through our proposed approach, RELAX, which is the first approach for attribution-based explanations of representations. Our approach can also model the uncertainty in its explanations, which is essential to produce trustworthy explanations. RELAX explains representations by measuring similarities in the representation space between an input and masked out versions of itself, providing intuitive explanations and significantly outperforming the gradient-based baseline. We provide theoretical interpretations of RELAX and conduct a novel analysis of feature extractors trained using supervised and unsupervised learning, providing insights into different learning strategies. Finally, we illustrate the usability of RELAX in multi-view clustering and highlight that incorporating uncertainty can be essential for providing low-complexity explanations, taking a crucial step towards explaining representations.
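这种"掩码+表示空间相似度"式归因的核心计算可以极简示意如下(PyTorch;掩码分布、相似度度量与编码器均为此处的简化假设,且未包含原文的不确定性建模):

```python
import torch

def masked_attribution(encoder, x, n_masks=100, p_keep=0.5):
    """随机掩码输入,用原表示与掩码后表示的余弦相似度对掩码加权平均,
    得到逐像素重要性图(RELAX思想的简化示意)。"""
    h = encoder(x.unsqueeze(0))                       # 原始表示
    importance = torch.zeros_like(x[0])
    for _ in range(n_masks):
        m = (torch.rand_like(x[0]) < p_keep).float()  # 随机二值掩码
        h_m = encoder((x * m).unsqueeze(0))
        sim = torch.cosine_similarity(h, h_m).item()  # 表示空间中的相似度
        importance += sim * m                         # 相似度高 → 被保留区域重要
    return importance / n_masks

encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 64))
x = torch.randn(3, 32, 32)
print(masked_attribution(encoder, x).shape)           # torch.Size([32, 32])
```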
3D|3D重建等相关(1篇)
【1】 3D Instance Segmentation of MVS Buildings 标题:MVS建筑的三维实例分割 链接:https://arxiv.org/abs/2112.09902
作者:Yanghui Xu,Jiazhou Chen,Shufang Lu,Ronghua Liang,Liangliang Nan 机构:School of Computer Science and Technology, Zhejiang University of Technology; Delft University of Technology 摘要:我们提出了一个新的框架,用于从多视图立体(MVS)城市场景中分割三维建筑物。与关注城市场景语义分割的现有工作不同,这项工作的重点在于检测和分割三维建筑实例,即使它们附着并嵌入在大型且不精确的三维曲面模型中。首先通过添加高度图将多视图RGB图像增强为RGBH图像,然后使用微调的2D实例分割神经网络对其进行分割以获得所有屋顶实例。接着将来自不同多视图图像的屋顶实例掩码聚类为全局掩码。我们的掩码聚类考虑了空间遮挡和重叠,可以消除多视图图像之间的分割模糊性。基于这些全局掩码,通过掩码反投影分割出三维屋顶实例,并通过马尔可夫随机场(MRF)优化将其扩展到完整的建筑实例。定量评估和消融研究表明了该方法所有主要步骤的有效性。我们还提供了一个用于评估三维建筑模型实例分割的数据集。据我们所知,它是第一个实例分割级别的三维城市建筑数据集。 摘要:We present a novel framework for instance segmentation of 3D buildings from Multi-view Stereo (MVS) urban scenes. Unlike existing works focusing on semantic segmentation of an urban scene, the emphasis of this work lies in detecting and segmenting 3D building instances even if they are attached and embedded in a large and imprecise 3D surface model. Multi-view RGB images are first enhanced to RGBH images by adding a heightmap and are segmented to obtain all roof instances using a fine-tuned 2D instance segmentation neural network. Roof instance masks from different multi-view images are then clustered into global masks. Our mask clustering accounts for spatial occlusion and overlapping, which can eliminate segmentation ambiguities among multi-view images. Based on these global masks, 3D roof instances are segmented out by mask back-projections and extended to the entire building instances through a Markov random field (MRF) optimization. Quantitative evaluations and ablation studies have shown the effectiveness of all major steps of the method. A dataset for the evaluation of instance segmentation of 3D building models is provided as well. To the best of our knowledge, it is the first dataset for 3D urban buildings on the instance segmentation level.
优化|敛散性(6篇)
【1】 Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization 标题:基于状态守恒策略优化的过渡动态鲁棒抗扰策略学习 链接:https://arxiv.org/abs/2112.10513
作者:Yufei Kuang,Miao Lu,Jie Wang,Qi Zhou,Bin Li,Houqiang Li 机构:CAS Key Laboratory of Technology in GIPAS, University of Science and Technology of China, Institute of Artificial Intelligence, Hefei Comprehensive National Science Center 备注:Accepted to AAAI 2022 摘要:由于源环境和目标环境之间的差异,深度强化学习算法在实际任务中可能表现不佳。这种差异通常被视为转移动力学中的扰动。现有的许多算法通过对扰动进行建模并在训练期间将其应用于源环境来学习鲁棒策略,这通常需要关于扰动的先验知识以及对模拟器的控制能力。然而,当目标环境的扰动未知或难以在模拟器中建模时,这些算法可能会失败。为了解决这个问题,我们提出了一种新的无模型参与者-批评家算法——状态保守策略优化(SCPO)——在不预先建模扰动的情况下学习鲁棒策略。具体地说,SCPO将转移动力学中的扰动归约为状态空间中的扰动,然后通过一个简单的基于梯度的正则化器对其进行逼近。SCPO的吸引人的特点包括:它易于实现,不需要额外的扰动知识或专门设计的模拟器。在多个机器人控制任务中的实验表明,SCPO能够针对转移动力学中的扰动学习鲁棒策略。 摘要:Deep reinforcement learning algorithms can perform poorly in real-world tasks due to the discrepancy between source and target environments. This discrepancy is commonly viewed as the disturbance in transition dynamics. Many existing algorithms learn robust policies by modeling the disturbance and applying it to source environments during training, which usually requires prior knowledge about the disturbance and control of simulators. However, these algorithms can fail in scenarios where the disturbance from target environments is unknown or is intractable to model in simulators. To tackle this problem, we propose a novel model-free actor-critic algorithm -- namely, state-conservative policy optimization (SCPO) -- to learn robust policies without modeling the disturbance in advance. Specifically, SCPO reduces the disturbance in transition dynamics to that in state space and then approximates it by a simple gradient-based regularizer. The appealing features of SCPO include that it is simple to implement and does not require additional knowledge about the disturbance or specially designed simulators. Experiments in several robot control tasks demonstrate that SCPO learns robust policies against the disturbance in transition dynamics.
【2】 Learning for Robust Combinatorial Optimization: Algorithm and Application 标题:鲁棒组合优化学习算法及其应用 链接:https://arxiv.org/abs/2112.10377
作者:Zhihui Shao,Jianyi Yang,Cong Shen,Shaolei Ren 机构:UC Riverside, University of Virginia 备注:Accepted by the IEEE International Conference on Computer Communications (INFOCOM) 2022 摘要:通过利用神经网络强大的预测能力和比传统求解器更低的运行时复杂度,学习优化(L2O)最近成为解决优化问题的一种很有前途的方法。虽然L2O已被应用于各种问题,但一类关键但具有挑战性的问题——以极大极小优化形式出现的稳健组合优化——在很大程度上仍有待探索。除了指数大的决策空间外,鲁棒组合优化的一个关键挑战在于内部优化问题,该问题通常是非凸的,并且与外部优化纠缠在一起。在本文中,我们研究了鲁棒组合优化,并提出了一种新的基于学习的优化器,称为LRCO(learning for robust combination optimization),该优化器能够在不确定环境下快速输出鲁棒解。LRCO利用了一对基于学习的优化器——一个用于最小化,另一个用于最大化——它们使用各自的目标函数作为损失,并且可以在不需要标签的情况下对问题实例进行训练。为了评估LRCO的性能,我们对车辆边缘计算中的任务卸载问题进行了仿真。我们的结果强调,LRCO可以极大地降低最坏情况下的成本并提高健壮性,同时具有非常低的运行时复杂性。 摘要:Learning to optimize (L2O) has recently emerged as a promising approach to solving optimization problems by exploiting the strong prediction power of neural networks and offering lower runtime complexity than conventional solvers. While L2O has been applied to various problems, a crucial yet challenging class of problems -- robust combinatorial optimization in the form of minimax optimization -- have largely remained under-explored. In addition to the exponentially large decision space, a key challenge for robust combinatorial optimization lies in the inner optimization problem, which is typically non-convex and entangled with outer optimization. In this paper, we study robust combinatorial optimization and propose a novel learning-based optimizer, called LRCO (Learning for Robust Combinatorial Optimization), which quickly outputs a robust solution in the presence of uncertain context. LRCO leverages a pair of learning-based optimizers -- one for the minimizer and the other for the maximizer -- that use their respective objective functions as losses and can be trained without the need of labels for training problem instances. To evaluate the performance of LRCO, we perform simulations for the task offloading problem in vehicular edge computing. Our results highlight that LRCO can greatly reduce the worst-case cost and improve robustness, while having a very low runtime complexity.
【3】 Distributed and Stochastic Optimization Methods with Gradient Compression and Local Steps 标题:具有梯度压缩和局部步长的分布式随机优化方法 链接:https://arxiv.org/abs/2112.10645
作者:Eduard Gorbunov 机构:Moscow Institute of Physics and Technology(博士学位论文;导师:Alexander Gasnikov、Peter Richtárik) 备注:PhD thesis, 416 pages, 27 figures 摘要:在这篇论文中,我们提出了新的理论框架,用于分析具有误差补偿和局部更新的随机和分布式方法。利用这些框架,我们开发了20多种新的优化方法,包括第一种线性收敛的误差补偿SGD和第一种针对任意异质局部函数的线性收敛局部SGD。此外,本文还包含了几种新的分布式非凸优化问题的无偏压缩方法。对于所考虑的问题,这些方法得到的复杂度结果优于以前最著名的结果。最后,我们提出了一种新的可扩展的分散容错分布式方法,并在合理的假设下,推导了该方法与集中式局部SGD方法相匹配的迭代复杂度界。 摘要:In this thesis, we propose new theoretical frameworks for the analysis of stochastic and distributed methods with error compensation and local updates. Using these frameworks, we develop more than 20 new optimization methods, including the first linearly converging Error-Compensated SGD and the first linearly converging Local-SGD for arbitrarily heterogeneous local functions. Moreover, the thesis contains several new distributed methods with unbiased compression for distributed non-convex optimization problems. The derived complexity results for these methods outperform the previous best-known results for the considered problems. Finally, we propose a new scalable decentralized fault-tolerant distributed method, and under reasonable assumptions, we derive the iteration complexity bounds for this method that match the ones of centralized Local-SGD.
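论文主题之一的"误差补偿(误差反馈)+梯度压缩"可用top-k稀疏化极简示意如下(NumPy;压缩算子与步长均为示例假设):

```python
import numpy as np

def topk_compress(g, k):
    """Top-k稀疏化:只保留绝对值最大的k个坐标。"""
    out = np.zeros_like(g)
    idx = np.argpartition(np.abs(g), -k)[-k:]
    out[idx] = g[idx]
    return out

def ec_sgd_step(w, grad, error, lr=0.1, k=2):
    """带误差补偿的压缩SGD单步:把上一步被压缩丢弃的残差累加回来。"""
    corrected = grad + error                 # 误差补偿
    compressed = topk_compress(corrected, k) # 实际被"发送/应用"的稀疏梯度
    new_error = corrected - compressed       # 记录被丢掉的残差,下一步补偿
    return w - lr * compressed, new_error

w, err = np.zeros(5), np.zeros(5)
grad = np.array([0.5, -0.1, 0.02, 0.9, -0.3])
w, err = ec_sgd_step(w, grad, err)
print(w, err)                                # 仅两个坐标被更新,其余进入误差项
```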
【4】 Quasi-uniform designs with optimal and near-optimal uniformity constant 标题:具有最优和接近最优均匀性常数的拟均匀设计 链接:https://arxiv.org/abs/2112.10401
作者:Luc Pronzato,Anatoly Zhigljavsky 摘要:设计是给定集合$X$中不同点的集合,假定该集合是$R^d$的一个紧凑子集,设计的网格比率是其填充距离与其分离半径的比率。嵌套设计序列的均匀性常数是设计网格比的最小上限。我们推导了这个一致性常数的一个下界,并证明了一个简单的贪婪构造可以达到这个下界。然后,我们扩展此方案,以便在设计和施工中具有更大的灵活性。 摘要:A design is a collection of distinct points in a given set $X$, which is assumed to be a compact subset of $R^d$, and the mesh-ratio of a design is the ratio of its fill distance to its separation radius. The uniformity constant of a sequence of nested designs is the smallest upper bound for the mesh-ratios of the designs. We derive a lower bound on this uniformity constant and show that a simple greedy construction achieves this lower bound. We then extend this scheme to allow more flexibility in the design construction.
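摘要所述的"简单贪心构造"属于最大化最小距离的贪心选点一类,下面给出这类构造的一个极简示意(NumPy;初始点选取与候选集均为示例假设,并非原文的精确方案):

```python
import numpy as np

def greedy_design(candidates, n_points):
    """贪心最大最小距离构造(farthest-point型):每次加入与当前已选
    点集距离最大的候选点,得到嵌套的拟均匀设计序列。"""
    chosen = [0]                                       # 任选初始点(示意)
    for _ in range(n_points - 1):
        d = np.min(np.linalg.norm(
            candidates[:, None, :] - candidates[chosen][None, :, :],
            axis=2), axis=1)                           # 各候选点到已选集的距离
        chosen.append(int(np.argmax(d)))               # 加入填充距离最大的点
    return candidates[chosen]

rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 2))                        # 候选点:单位正方形
design = greedy_design(X, 16)
print(design.shape)                                    # (16, 2):16点拟均匀设计
```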
【5】 Quantum Approximate Optimization Algorithm applied to the binary perceptron 标题:量子近似优化算法在二元感知器中的应用 链接:https://arxiv.org/abs/2112.10219
作者:Pietro Torta,Glen B. Mbeng,Carlo Baldassi,Riccardo Zecchina,Giuseppe E. Santoro 机构:SISSA, Trieste, Italy; Universität Innsbruck, Innsbruck, Austria; Department of Computing Sciences, Bocconi University, Milan, Italy; International Centre for Theoretical Physics (ICTP), Trieste, Italy 备注:14 pages, 9 figures 摘要:我们将数字化量子退火(QA)和量子近似优化算法(QAOA)应用于人工神经网络中监督学习的一个范例任务:二元感知器突触权重的优化。与QAOA在MaxCut或量子自旋链基态制备中的常见应用不同,这里的经典哈密顿量以高度非局域的多自旋相互作用为特征。然而,我们为QAOA参数的最优光滑解的存在提供了证据,这些解在同一问题的典型实例之间是可迁移的,并且我们在数值上证明了QAOA比传统的QA有更好的性能。我们还研究了QAOA优化景观几何在该问题中的作用,表明QA中遇到的能隙闭合转变的不利影响同样会损害我们的QAOA实现的性能。 摘要:We apply digitized Quantum Annealing (QA) and Quantum Approximate Optimization Algorithm (QAOA) to a paradigmatic task of supervised learning in artificial neural networks: the optimization of synaptic weights for the binary perceptron. At variance with the usual QAOA applications to MaxCut, or to quantum spin-chains ground state preparation, the classical Hamiltonian is characterized by highly non-local multi-spin interactions. Yet, we provide evidence for the existence of optimal smooth solutions for the QAOA parameters, which are transferable among typical instances of the same problem, and we prove numerically an enhanced performance of QAOA over traditional QA. We also investigate on the role of the QAOA optimization landscape geometry in this problem, showing that the detrimental effect of a gap-closing transition encountered in QA is also negatively affecting the performance of our implementation of QAOA.
【6】 Probabilistic Inverse Optimal Transport 标题:概率逆最优运输 链接:https://arxiv.org/abs/2112.09754
作者:Wei-Ting Chiu,Pei Wang,Patrick Shafto 机构:Department of Mathematics and Computer Science, Rutgers University Newark, NJ , School of Mathematics, Institute for Advanced Study (IAS), Princeton NJ 备注:18 pages, 9 figures 摘要:最优运输(OT)形式化了在给定成本矩阵的概率测度之间寻找最优耦合的问题。推断给定耦合成本的逆问题是逆最优传输(IOT)。物联网比OT更难理解。我们使用熵正则化OT研究中的工具对物联网的特性进行形式化和系统化分析。理论贡献包括交叉比等效成本流形的表征、模型先验的含义以及MCMC采样器的推导。经验贡献包括在基本示例上可视化交叉比等效效应和验证理论结果的模拟。 摘要:Optimal transport (OT) formalizes the problem of finding an optimal coupling between probability measures given a cost matrix. The inverse problem of inferring the cost given a coupling is Inverse Optimal Transport (IOT). IOT is less well understood than OT. We formalize and systematically analyze the properties of IOT using tools from the study of entropy-regularized OT. Theoretical contributions include characterization of the manifold of cross-ratio equivalent costs, the implications of model priors, and derivation of an MCMC sampler. Empirical contributions include visualizations of cross-ratio equivalent effect on basic examples and simulations validating theoretical results.
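作为IOT所依赖的正向问题,熵正则化OT可用经典的Sinkhorn迭代求解,极简示意如下(NumPy;成本矩阵与正则强度为示例假设):

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.1, n_iter=200):
    """熵正则化OT的Sinkhorn迭代:给定成本矩阵与两个边缘分布,求最优耦合。
    (这是正向OT问题;IOT则是由观测到的耦合反推cost。)"""
    K = np.exp(-cost / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]            # 耦合矩阵 P

n = 4
cost = np.abs(np.subtract.outer(np.arange(n), np.arange(n))).astype(float)
a = b = np.full(n, 1.0 / n)
P = sinkhorn(cost, a, b)
print(P.round(3), P.sum(axis=1))                  # 行和≈a:P是合法耦合
```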
预测|估计(3篇)
【1】 Efficient Wind Speed Nowcasting with GPU-Accelerated Nearest Neighbors Algorithm 标题:基于GPU加速近邻算法的高效风速预报 链接:https://arxiv.org/abs/2112.10408
作者:Arnaud Pannatier,Ricardo Picatoste,François Fleuret 备注:9 pages, 5 figures, accepted at Siam Data Mining 2022 (SDM 2022) 摘要:本文提出了一种简单而有效的高空风力临近预报管道。它高效处理飞机在整个空域记录的大量实时数据,并以良好的精度重建风场。它为数据集中的每个点创建一个唯一的上下文,然后从中进行外推。由于创建这样的上下文计算量很大,本文提出了一种新算法,通过在元素沿平滑轨迹组织、可用分段线性结构近似的数据集中高效地获取最近邻,来降低时间和内存开销。我们介绍了一种通过代数张量运算实现的高效而精确的策略,该策略非常适合现代基于GPU的计算基础设施。该方法采用可伸缩的欧几里德度量,并允许沿某一维度屏蔽数据点。应用时,该方法比普通的欧几里德k-NN以及KDTrees等著名的数据选择方法更高效,可提供数倍的加速。我们提供了PyTorch实现和一个新的数据集,以便复现实验结果。 摘要:This paper proposes a simple yet efficient high-altitude wind nowcasting pipeline. It processes efficiently a vast amount of live data recorded by airplanes over the whole airspace and reconstructs the wind field with good accuracy. It creates a unique context for each point in the dataset and then extrapolates from it. As creating such context is computationally intensive, this paper proposes a novel algorithm that reduces the time and memory cost by efficiently fetching nearest neighbors in a data set whose elements are organized along smooth trajectories that can be approximated with piece-wise linear structures. We introduce an efficient and exact strategy implemented through algebraic tensorial operations, which is well-suited to modern GPU-based computing infrastructure. This method employs a scalable Euclidean metric and allows masking data points along one dimension. When applied, this method is more efficient than plain Euclidean k-NN and other well-known data selection methods such as KDTrees and provides a several-fold speedup. We provide an implementation in PyTorch and a novel data set to allow the replication of empirical results.
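A minimal PyTorch sketch of the tensorial neighbour search described here: batched pairwise distances plus top-k, with masking along one chosen dimension (e.g. keeping only measurements up to the current time). This is an editor's illustration; the paper's exploitation of piecewise-linear trajectory structure is not reproduced, and all names below are illustrative.

```python
import torch

def masked_knn(queries, points, k, mask_dim=None, mask_max=None):
    """Brute-force k-NN via algebraic tensor ops; runs on GPU if inputs are CUDA tensors.
    Optionally discard points whose coordinate along `mask_dim` exceeds `mask_max`."""
    d = torch.cdist(queries, points)                    # (Q, N) pairwise distances
    if mask_dim is not None:
        invalid = points[:, mask_dim] > mask_max        # (N,) boolean mask
        d = d.masked_fill(invalid[None, :], float("inf"))
    return torch.topk(d, k, dim=1, largest=False)       # (values, indices)

pts = torch.rand(10_000, 4)                             # e.g. (x, y, altitude, time)
q = torch.rand(16, 4)
dist, idx = masked_knn(q, pts, k=8, mask_dim=3, mask_max=0.5)
print(idx.shape)                                        # torch.Size([16, 8])
```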
【2】 SSDNet: State Space Decomposition Neural Network for Time Series Forecasting 标题:SSDNet:用于时间序列预测的状态空间分解神经网络 链接:https://arxiv.org/abs/2112.10251
作者:Yang Lin,Irena Koprinska,Mashud Rana 机构:School of Computer Science, The University of Sydney, Sydney, Australia, Data, CSIRO 备注:ICDM 2021 Regular paper 摘要:在本文中,我们提出了SSDNet,一种新的时间序列预测深度学习方法。SSDNet将Transformer结构与状态空间模型相结合,以提供概率和可解释的预测,包括趋势和季节性成分以及对预测非常重要的先前时间步骤。Transformer结构用于学习时间模式并直接有效地估计状态空间模型的参数,无需卡尔曼滤波器。我们在五个数据集上综合评估了SSDNet的性能,表明SSDNet在准确性和速度方面是一种有效的方法,优于最先进的深度学习和统计方法,并且能够提供有意义的趋势和季节性成分。 摘要:In this paper, we present SSDNet, a novel deep learning approach for time series forecasting. SSDNet combines the Transformer architecture with state space models to provide probabilistic and interpretable forecasts, including trend and seasonality components and previous time steps important for the prediction. The Transformer architecture is used to learn the temporal patterns and estimate the parameters of the state space model directly and efficiently, without the need for Kalman filters. We comprehensively evaluate the performance of SSDNet on five data sets, showing that SSDNet is an effective method in terms of accuracy and speed, outperforming state-of-the-art deep learning and statistical methods, and able to provide meaningful trend and seasonality components.
【3】 Stable Conformal Prediction Sets 标题:稳定的共形预测集 链接:https://arxiv.org/abs/2112.10224
作者:Eugene Ndiaye 机构:Georgia Institute of Technology (ISyE) 摘要:当观察到变量序列$(x_1, y_1), \ldots, (x_n, y_n)$时,共形预测是一种仅需假设数据分布可交换,即可估计给定$x_{n+1}$时$y_{n+1}$的置信集的方法。虽然有吸引力,但这种集合的计算在一般情况下是不可行的,例如当未知变量$y_{n+1}$连续时。在本文中,我们将共形预测技术与算法稳定性界相结合,推导出只需单次模型拟合即可计算的预测集。我们进行了一些数值实验,说明当样本量足够大时我们估计的紧性。 摘要:When one observes a sequence of variables $(x_1, y_1), ..., (x_n, y_n)$, conformal prediction is a methodology that allows to estimate a confidence set for $y_{n+1}$ given $x_{n+1}$ by merely assuming that the distribution of the data is exchangeable. While appealing, the computation of such set turns out to be infeasible in general, e.g. when the unknown variable $y_{n+1}$ is continuous. In this paper, we combine conformal prediction techniques with algorithmic stability bounds to derive a prediction set computable with a single model fit. We perform some numerical experiments that illustrate the tightness of our estimation when the sample size is sufficiently large.
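For context, here is the standard split-conformal construction that the paper's single-fit stable variant improves upon: calibrate absolute residuals on held-out data, then report a quantile-width interval. An editor's sketch with an arbitrary ridge model:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=500)

# Split: fit on one half, calibrate residuals on the other.
model = Ridge().fit(X[:250], y[:250])
scores = np.abs(y[250:] - model.predict(X[250:]))        # conformity scores
alpha = 0.1
n_cal = len(scores)
q = np.quantile(scores, np.ceil((n_cal + 1) * (1 - alpha)) / n_cal)

x_new = rng.normal(size=(1, 3))
pred = model.predict(x_new)[0]
print(f"90% prediction interval: [{pred - q:.2f}, {pred + q:.2f}]")
```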
其他神经网络|深度学习|模型|建模(25篇)
【1】 Learning Spatio-Temporal Specifications for Dynamical Systems 标题:学习动态系统的时空规范 链接:https://arxiv.org/abs/2112.10714
作者:Suhail Alsalehi,Erfan Aasi,Ron Weiss,Calin Belta 机构:Division of Systems Engineering, Boston University, Boston, MA, USA; Mechanical Engineering Department, Boston University, Boston, MA, USA; Biological Engineering Department, Massachusetts Institute of Technology, Cambridge, MA, USA 备注:12 pages, submitted to L4DC 2021 摘要:从数据中学习动力系统的属性提供了重要的见解,帮助我们理解此类系统并缓解不期望的结果。在这项工作中,我们提出了一个从数据中学习时空(ST)属性作为形式逻辑规范的框架。我们介绍了SVM-STL,它是信号时序逻辑(STL)的一个扩展,能够描述各种具有时变空间模式的动态系统的时空特性。我们的框架利用机器学习技术从空间模式序列给出的系统执行中学习SVM-STL规范。我们提出了处理标记和未标记数据的方法。此外,给定SVM-STL规范形式的系统需求,我们提供了一种参数合成方法,以找到最大程度满足此类规范的参数。我们的学习框架和参数综合方法在一个反应扩散系统的例子中得到了展示。 摘要:Learning dynamical systems properties from data provides important insights that help us understand such systems and mitigate undesired outcomes. In this work, we propose a framework for learning spatio-temporal (ST) properties as formal logic specifications from data. We introduce SVM-STL, an extension of Signal Temporal Logic (STL), capable of specifying spatial and temporal properties of a wide range of dynamical systems that exhibit time-varying spatial patterns. Our framework utilizes machine learning techniques to learn SVM-STL specifications from system executions given by sequences of spatial patterns. We present methods to deal with both labeled and unlabeled data. In addition, given system requirements in the form of SVM-STL specifications, we provide an approach for parameter synthesis to find parameters that maximize the satisfaction of such specifications. Our learning framework and parameter synthesis approach are showcased in an example of a reaction-diffusion system.
【2】 Efficient Large Scale Language Modeling with Mixtures of Experts 标题:高效的混合专家大规模语言建模 链接:https://arxiv.org/abs/2112.10684
作者:Mikel Artetxe,Shruti Bhosale,Naman Goyal,Todor Mihaylov,Myle Ott,Sam Shleifer,Xi Victoria Lin,Jingfei Du,Srinivasan Iyer,Ramakanth Pasunuru,Giri Anantharaman,Xian Li,Shuohui Chen,Halil Akin,Mandeep Baines,Louis Martin,Xing Zhou,Punit Singh Koura,Brian O'Horo,Jeff Wang,Luke Zettlemoyer,Mona Diab,Zornitsa Kozareva,Ves Stoyanov 机构:Meta AI 摘要:混合专家层(MoE)通过条件计算实现语言模型的有效扩展。本文对自回归MoE语言模型与密集模型在各种设置下的扩展进行了详细的实证研究:域内和域外语言建模、零样本和少样本启动以及完全微调。除微调外,我们发现MoE的计算效率要高得多。在较为适中的训练预算下,MoE可以用约四分之一的计算量匹配密集模型的性能。这一差距随规模扩大而缩小,但我们最大的MoE模型(1.1T参数)始终优于计算量相当的密集模型(6.7B参数)。总体而言,这一性能差距在不同任务和领域之间差异很大,表明MoE和密集模型以值得进一步研究的方式进行不同的泛化。我们公开了代码和模型供研究使用。 摘要:Mixture of Experts layers (MoEs) enable efficient scaling of language models through conditional computation. This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings: in- and out-of-domain language modeling, zero- and few-shot priming, and full fine-tuning. With the exception of fine-tuning, we find MoEs to be substantially more compute efficient. At more modest training budgets, MoEs can match the performance of dense models using $\sim$4 times less compute. This gap narrows at scale, but our largest MoE model (1.1T parameters) consistently outperforms a compute-equivalent dense model (6.7B parameters). Overall, this performance gap varies greatly across tasks and domains, suggesting that MoE and dense models generalize differently in ways that are worthy of future study. We make our code and models publicly available for research use.
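The conditional computation referred to in this abstract can be made concrete with a minimal top-2 gated mixture-of-experts feed-forward layer. An editor's sketch only: real MoE language models add capacity constraints, load-balancing losses, and expert parallelism, all omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model, d_ff, n_experts, k=2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)])
        self.gate = nn.Linear(d_model, n_experts)
        self.k = k

    def forward(self, x):                        # x: (tokens, d_model)
        logits = self.gate(x)
        weights, idx = torch.topk(logits, self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for slot in range(self.k):
                sel = idx[:, slot] == e          # tokens routed to expert e in this slot
                if sel.any():                    # each token only pays for k experts
                    out[sel] += weights[sel, slot].unsqueeze(1) * expert(x[sel])
        return out

layer = MoELayer(d_model=64, d_ff=256, n_experts=8)
print(layer(torch.randn(32, 64)).shape)          # torch.Size([32, 64])
```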
【3】 Bayesian neural network priors for edge-preserving inversion 标题:保边反演的贝叶斯神经网络先验 链接:https://arxiv.org/abs/2112.10663
作者:Chen Li,Matthew Dunlop,Georg Stadler 机构:Courant Institute of Mathematical Sciences, New York University, Mercer Street, New York, NY , USA 摘要:我们考虑贝叶斯逆问题,其中未知状态被先验地假定为具有不连续结构的函数。受关于此类网络无限宽度极限已有结果的启发,我们引入了一类基于重尾权重神经网络输出的先验分布。我们从理论上证明,即使网络宽度有限,来自这些先验的样本也具有理想的类不连续性质,这使得它们适合于保边反演。在数值方面,我们考虑定义在一维和二维空间域上的反卷积问题来说明这些先验的有效性;利用MAP估计、维数鲁棒的MCMC采样和基于集成的近似来探测后验分布。结果表明,点估计的精度超过了由非重尾先验获得的精度,而不确定性估计提供了更有用的定性信息。 摘要:We consider Bayesian inverse problems wherein the unknown state is assumed to be a function with discontinuous structure a priori. A class of prior distributions based on the output of neural networks with heavy-tailed weights is introduced, motivated by existing results concerning the infinite-width limit of such networks. We show theoretically that samples from such priors have desirable discontinuous-like properties even when the network width is finite, making them appropriate for edge-preserving inversion. Numerically we consider deconvolution problems defined on one- and two-dimensional spatial domains to illustrate the effectiveness of these priors; MAP estimation, dimension-robust MCMC sampling and ensemble-based approximations are utilized to probe the posterior distribution. The accuracy of point estimates is shown to exceed those obtained from non-heavy tailed priors, and uncertainty estimates are shown to provide more useful qualitative information.
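The qualitative effect motivating these priors is easy to reproduce: a wide one-hidden-layer ReLU network with heavy-tailed (Cauchy) weights yields near-jump sample paths, while Gaussian weights give smooth, GP-like ones. An editor's sketch using stable-law scalings, not the paper's exact construction:

```python
import numpy as np

def sample_network(x, width=2000, heavy_tailed=True, seed=0):
    """One sample path of a random single-hidden-layer ReLU network prior."""
    rng = np.random.default_rng(seed)
    draw = rng.standard_cauchy if heavy_tailed else rng.standard_normal
    w1, b1, w2 = draw((1, width)), draw((1, width)), draw((width,))
    scale = width if heavy_tailed else np.sqrt(width)   # stable-law scaling per tail
    h = np.maximum(x[:, None] * w1 + b1, 0.0)           # ReLU hidden features
    return (h @ w2) / scale

x = np.linspace(-3, 3, 500)
f_heavy = sample_network(x, heavy_tailed=True)    # a few huge weights -> near-jumps
f_gauss = sample_network(x, heavy_tailed=False)   # tends to a smooth GP as width grows
print(np.abs(np.diff(f_heavy)).max(), np.abs(np.diff(f_gauss)).max())
```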
【4】 Latte: Cross-framework Python Package for Evaluation of Latent-Based Generative Models 标题:Latte:用于评估基于潜在的产生式模型的跨框架Python包 链接:https://arxiv.org/abs/2112.10638
作者:Karn N. Watcharasupat,Junyoung Lee,Alexander Lerch 机构:Center for Music Technology, Georgia Institute of Technology, Atlanta, GA, USA, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 备注:Under review for Software Impacts Vol 11 摘要:Latte(用于潜在张量评估)是一个Python库,用于评估解纠缠学习和可控生成领域中基于潜在的生成模型。Latte与PyTorch和TensorFlow/Keras兼容,并提供功能性和模块化API,可轻松扩展以支持其他深度学习框架。使用基于NumPy和框架无关的实现,Latte确保了可重复、一致和确定性的度量计算,无论选择何种深度学习框架。 摘要:Latte (for LATent Tensor Evaluation) is a Python library for evaluation of latent-based generative models in the fields of disentanglement learning and controllable generation. Latte is compatible with both PyTorch and TensorFlow/Keras, and provides both functional and modular APIs that can be easily extended to support other deep learning frameworks. Using NumPy-based and framework-agnostic implementation, Latte ensures reproducible, consistent, and deterministic metric calculations regardless of the deep learning framework of choice.
【5】 Turbo-Sim: a generalised generative model with a physical latent space 标题:Turbo-Sim:一种具有物理潜在空间的广义生成模型 链接:https://arxiv.org/abs/2112.10629
作者:Guillaume Quétant,Mariia Drozdova,Vitaliy Kinakh,Tobias Golling,Slava Voloshynovkiy 机构:Department of Computer Science, University of Geneva, Carouge, Department of Particle Physics, University of Geneva, Genève 备注:8 pages, 2 figures, 1 table 摘要:我们介绍了Turbo-Sim,这是一个从信息论原理导出的通用自动编码器框架,可以用作生成模型。通过最大化编码器和解码器各自输入和输出之间的互信息,我们能够重新发现通常出现在对抗自动编码器和生成对抗网络以及各种更复杂的相关模型中的损失项。我们的通用框架使这些模型在数学上可解释,并通过分别设置每个损失项的权重,允许派生多种新模型。该框架还独立于编码器和解码器的内在架构,因此为整个网络的构建模块留下了广泛的选择。我们将Turbo-Sim应用于一个对撞机物理生成问题:将若干粒子的属性从碰撞之后的理论空间变换到实验探测之后的观测空间。 摘要:We present Turbo-Sim, a generalised autoencoder framework derived from principles of information theory that can be used as a generative model. By maximising the mutual information between the input and the output of both the encoder and the decoder, we are able to rediscover the loss terms usually found in adversarial autoencoders and generative adversarial networks, as well as various more sophisticated related models. Our generalised framework makes these models mathematically interpretable and allows for a diversity of new ones by setting the weight of each loss term separately. The framework is also independent of the intrinsic architecture of the encoder and the decoder thus leaving a wide choice for the building blocks of the whole network. We apply Turbo-Sim to a collider physics generation problem: the transformation of the properties of several particles from a theory space, right after the collision, to an observation space, right after the detection in an experiment.
【6】 Hybrid Bayesian network discovery with latent variables by scoring multiple interventions 标题:基于多干预评分的潜在变量混合贝叶斯网络发现 链接:https://arxiv.org/abs/2112.10574
作者:Kiattikun Chobtham,Anthony C. Constantinou,Neville K. Kitson 机构:Bayesian Artificial Intelligence research lab, Risk and Information Management (RIM) research group, School of Electronic Engineering and Computer Science, Queen Mary University of London (QMUL), London, UK 摘要:在贝叶斯网络(BN)中,边的方向对于因果推理和推断至关重要。然而,出于马尔可夫等价类的考虑,并不总是能够确定边的方向,这就是为什么许多BN结构学习算法不能从纯观测数据中确定所有边的方向。此外,潜在的混杂因素可能导致假阳性边。解决这些问题的已有方法相对较少。在这项工作中,我们提出了混合mFGS-BS(多数规则和带贝叶斯评分的快速贪婪等价搜索)算法,用于从包含一个观测数据集和一个或多个干预数据集的离散数据中进行结构学习。该算法在存在潜在变量的情况下假设因果不充分性,并生成部分祖先图(PAG)。结构学习依赖于一种混合方法和一种新的贝叶斯评分范式,该范式计算每条有向边被添加到所学图中的后验概率。基于多达109个变量和10k样本量的著名网络的实验结果表明,mFGS-BS相对于最先进方法提高了结构学习精度,并且计算效率高。 摘要:In Bayesian Networks (BNs), the direction of edges is crucial for causal reasoning and inference. However, Markov equivalence class considerations mean it is not always possible to establish edge orientations, which is why many BN structure learning algorithms cannot orientate all edges from purely observational data. Moreover, latent confounders can lead to false positive edges. Relatively few methods have been proposed to address these issues. In this work, we present the hybrid mFGS-BS (majority rule and Fast Greedy equivalence Search with Bayesian Scoring) algorithm for structure learning from discrete data that involves an observational data set and one or more interventional data sets. The algorithm assumes causal insufficiency in the presence of latent variables and produces a Partial Ancestral Graph (PAG). Structure learning relies on a hybrid approach and a novel Bayesian scoring paradigm that calculates the posterior probability of each directed edge being added to the learnt graph. Experimental results based on well-known networks of up to 109 variables and 10k sample size show that mFGS-BS improves structure learning accuracy relative to the state-of-the-art and it is computationally efficient.
【7】 General Greedy De-bias Learning 标题:一般的贪婪去偏向学习 链接:https://arxiv.org/abs/2112.10572
作者:Xinzhe Han,Shuhui Wang,Chi Su,Qingming Huang,Qi Tian 备注:under-review 摘要:神经网络通常依靠数据集中的虚假相关性而不是感兴趣任务的内在属性进行预测,因而在分布外(OOD)测试数据上面临急剧退化。现有的去偏学习框架试图通过偏差注释捕获特定的数据集偏差,但它们无法处理复杂的OOD场景。另一些方法通过对低容量有偏模型或损失的特殊设计隐式地识别数据集偏差,但当训练和测试数据来自同一分布时,其性能会下降。在本文中,我们提出了一个通用贪婪去偏学习框架(GGD),它像函数空间中的梯度下降一样,贪婪地训练有偏模型和基础模型。它鼓励基础模型将重点放在有偏模型难以解决的示例上,从而在测试阶段保持对虚假相关性的鲁棒性。GGD在很大程度上提高了模型在各种任务上的OOD泛化能力,但有时会高估偏差水平,并在分布内测试中退化。我们进一步重新分析了GGD的集成过程,并受课程学习启发将课程正则化引入GGD,在分布内和分布外性能之间实现了良好的平衡。大量的图像分类、对抗性问答和视觉问答实验证明了该方法的有效性。GGD既可以在具有先验知识的任务特定有偏模型的设置下,也可以在不具有先验知识的自集成有偏模型的设置下,学习到更鲁棒的基础模型。 摘要:Neural networks often make predictions relying on the spurious correlations from the datasets rather than the intrinsic properties of the task of interest, facing sharp degradation on out-of-distribution (OOD) test data. Existing de-bias learning frameworks try to capture specific dataset bias by bias annotations, they fail to handle complicated OOD scenarios. Others implicitly identify the dataset bias by the special design on the low capability biased model or the loss, but they degrade when the training and testing data are from the same distribution. In this paper, we propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model like gradient descent in functional space. It encourages the base model to focus on examples that are hard to solve with biased models, thus remaining robust against spurious correlations in the test stage. GGD largely improves models' OOD generalization ability on various tasks, but sometimes over-estimates the bias level and degrades on the in-distribution test. We further re-analyze the ensemble process of GGD and introduce the Curriculum Regularization into GGD inspired by curriculum learning, which achieves a good trade-off between in-distribution and out-of-distribution performance. Extensive experiments on image classification, adversarial question answering, and visual question answering demonstrate the effectiveness of our method. GGD can learn a more robust base model under the settings of both task-specific biased models with prior knowledge and self-ensemble biased model without prior knowledge.
【8】 Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP 标题:在词与字之间:自然语言处理中的开放词表建模与标记化简史 链接:https://arxiv.org/abs/2112.10508
作者:Sabrina J. Mielke,Zaid Alyafeai,Elizabeth Salesky,Colin Raffel,Manan Dey,Matthias Gallé,Arun Raja,Chenglei Si,Wilson Y. Lee,Benoît Sagot,Samson Tan 机构:BigScience Workshop Tokenization Working Group, Johns Hopkins University, HuggingFace, King Fahd University of Petroleum and Minerals, SAP, Naver Labs Europe, Institute for Infocomm Research, ASTAR Singapore, University of Maryland, Inria Paris 备注:15 page preprint 摘要:我们想要建模的文本单位是什么?从字节到多词表达,文本可以在多种粒度上进行分析和生成。直到最近,大多数自然语言处理(NLP)模型还在对单词进行操作,将其视为离散的原子标记;但从字节对编码(BPE)开始,基于子词的方法在许多领域占据主导地位,在支持小词表的同时仍然允许快速推理。这条路的终点是字符级模型还是字节级处理?在这项综述中,我们通过展示词与字符的混合方法以及基于学习切分的子词方法是如何被提出和评估的,将前神经和神经时代的几条工作线联系起来。我们的结论是:不存在、也很可能永远不会有适用于所有应用的万灵丹式单一解决方案;认真对待标记化对许多应用仍然很重要。 摘要:What are the units of text that we want to model? From bytes to multi-word expressions, text can be analyzed and generated at many granularities. Until recently, most natural language processing (NLP) models operated over words, treating those as discrete and atomic tokens, but starting with byte-pair encoding (BPE), subword-based approaches have become dominant in many areas, enabling small vocabularies while still allowing for fast inference. Is the end of the road character-level model or byte-level processing? In this survey, we connect several lines of work from the pre-neural and neural era, by showing how hybrid approaches of words and characters as well as subword-based approaches based on learned segmentation have been proposed and evaluated. We conclude that there is and likely will never be a silver bullet singular solution for all applications and that thinking seriously about tokenization remains important for many applications.
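Byte-pair encoding, the pivot of this history, fits in a few lines: repeatedly merge the corpus's most frequent adjacent symbol pair. An editor's sketch on the classic toy vocabulary:

```python
from collections import Counter

def learn_bpe(words, n_merges):
    """words: dict word -> frequency. Returns the ordered list of learned merges."""
    vocab = {tuple(w) + ("</w>",): f for w, f in words.items()}
    merges = []
    for _ in range(n_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)         # most frequent adjacent pair
        merges.append(best)
        new_vocab = {}
        for symbols, freq in vocab.items():      # apply the merge everywhere
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1]); i += 2
                else:
                    out.append(symbols[i]); i += 1
            new_vocab[tuple(out)] = freq
        vocab = new_vocab
    return merges

print(learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 5))
```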
【9】 Towards Trustworthy Cross-patient Model Development 标题:走向可信赖的交叉病人模式发展 链接:https://arxiv.org/abs/2112.10441
作者:Ali El-Merhi,Helena Odenstedt Hergés,Linda Block,Mikael Elam,Richard Vithal,Jaquette Liljencrantz,Miroslaw Staron 机构:Sahlgrenska Academy, Institute of Clinical Sciences, University of Gothenburg, Gothenburg, Sweden, Department of anesthesia and intensive care, Sahlgrenska university hospital, Gothenburg, Sweden 备注:None 摘要:机器学习在医学中用于支持医生进行检查、诊断和预测结果。最具活力的领域之一是使用重症监护病房患者生成的健康数据。本文的目标是通过结合患者的人口统计数据和生理数据来演示如何推进跨患者ML模型的开发。我们使用了一组接受颈动脉端部切除术(CEA)的患者,研究了所有患者和一次一名患者在模型性能和可解释性方面的差异。结果表明,患者的人口统计学特征对绩效、可解释性和可信度有很大影响。我们的结论是,通过根据患者的人口统计学和手术程序仔细选择模型和患者,我们可以在跨患者的情况下增加对ML模型的信任。 摘要:Machine learning is used in medicine to support physicians in examination, diagnosis, and predicting outcomes. One of the most dynamic area is the usage of patient generated health data from intensive care units. The goal of this paper is to demonstrate how we advance cross-patient ML model development by combining the patient's demographics data with their physiological data. We used a population of patients undergoing Carotid Enderarterectomy (CEA), where we studied differences in model performance and explainability when trained for all patients and one patient at a time. The results show that patients' demographics has a large impact on the performance and explainability and thus trustworthiness. We conclude that we can increase trust in ML models in a cross-patient context, by careful selection of models and patients based on their demographics and the surgical procedure.
【10】 Decentralized Stochastic Proximal Gradient Descent with Variance Reduction over Time-varying Networks 标题:时变网络上带方差缩减的分散式随机近端梯度下降 链接:https://arxiv.org/abs/2112.10389
作者:Xuanjie Li,Yuedong Xu,Jessie Hui Wang,Xin Wang,John C. S. Lui 机构:Xin Wang is with the Key Laboratory of EMW Information (MoE) 备注:16 pages, 14 figures 摘要:在分散学习中,一个节点网络合作最小化一个总体目标函数,该目标函数通常是其局部目标的有限和,并加入一个非光滑正则项以获得更好的泛化能力。分散随机近端梯度(DSPG)方法是训练这类学习模型的常用方法,但随机梯度的方差会拖慢收敛速度。在本文中,我们提出了一种新的算法,即DPSVRG,通过利用方差缩减技术来加速分散训练。其基本思想是在每个节点中引入一个估计器,该估计器周期性地跟踪局部全梯度,以便在每次迭代中校正随机梯度。通过将我们的分散算法转化为集中式的带方差缩减的不精确近端梯度算法,并控制误差序列的界,我们证明了对于一般凸目标加非光滑项,DPSVRG以$O(1/T)$的速率收敛(其中$T$为迭代次数),而DSPG的收敛速度为$O(\frac{1}{\sqrt{T}})$。我们在不同应用、网络拓扑和学习模型上的实验表明,DPSVRG的收敛速度比DSPG快得多,并且DPSVRG的损失函数随训练轮次平滑下降。 摘要:In decentralized learning, a network of nodes cooperate to minimize an overall objective function that is usually the finite-sum of their local objectives, and incorporates a non-smooth regularization term for the better generalization ability. Decentralized stochastic proximal gradient (DSPG) method is commonly used to train this type of learning models, while the convergence rate is retarded by the variance of stochastic gradients. In this paper, we propose a novel algorithm, namely DPSVRG, to accelerate the decentralized training by leveraging the variance reduction technique. The basic idea is to introduce an estimator in each node, which tracks the local full gradient periodically, to correct the stochastic gradient at each iteration. By transforming our decentralized algorithm into a centralized inexact proximal gradient algorithm with variance reduction, and controlling the bounds of error sequences, we prove that DPSVRG converges at the rate of $O(1/T)$ for general convex objectives plus a non-smooth term with $T$ as the number of iterations, while DSPG converges at the rate $O(\frac{1}{\sqrt{T}})$. Our experiments on different applications, network topologies and learning models demonstrate that DPSVRG converges much faster than DSPG, and the loss function of DPSVRG decreases smoothly along with the training epochs.
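The centralized building block that the analysis reduces DPSVRG to, a proximal gradient step with an SVRG-style variance-reduced gradient and (for an l1 regularizer) a soft-thresholding prox, can be sketched as follows. Editor's sketch; the decentralized mixing over the time-varying network is omitted.

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_svrg(grad_i, n, x0, lr, lam, epochs=20, inner=100, seed=0):
    """Minimize (1/n) sum_i f_i(x) + lam*||x||_1 with SVRG-corrected proximal steps.
    grad_i(i, x) returns the gradient of component f_i at x."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(epochs):
        snapshot = x.copy()   # periodically refreshed full-gradient anchor
        full_grad = np.mean([grad_i(i, snapshot) for i in range(n)], axis=0)
        for _ in range(inner):
            i = rng.integers(n)
            g = grad_i(i, x) - grad_i(i, snapshot) + full_grad   # variance-reduced gradient
            x = soft_threshold(x - lr * g, lr * lam)             # proximal step
    return x

# Toy lasso problem: f_i(x) = 0.5 * (a_i @ x - b_i)^2
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 10))
b = A @ (np.arange(10) > 6).astype(float) + 0.01 * rng.normal(size=200)
x_hat = prox_svrg(lambda i, x: (A[i] @ x - b[i]) * A[i], 200, np.zeros(10), 1e-2, 0.1)
print(np.round(x_hat, 2))   # roughly recovers the sparse support (shrunk by the l1 penalty)
```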
【11】 Feature Selection for Efficient Local-to-Global Bayesian Network Structure Learning 标题:高效局部到全局贝叶斯网络结构学习的特征选择 链接:https://arxiv.org/abs/2112.10369
作者:Kui Yu,Zhaolong Ling,Lin Liu,Hao Wang,Jiuyong Li 机构: Hefei University of Technology, University of South Australia 摘要:局部到全局学习方法在贝叶斯网络结构学习中起着重要作用。现有的局部到全局学习算法首先通过学习数据集中每个变量的MB(马尔可夫覆盖)或PC(父和子)来构造DAG(有向无环图)的骨架,然后在骨架中确定边的方向。然而,现有的MB或PC学习方法通常计算成本很高,尤其是对于大型BN,这导致了低效的局部到全局学习算法。为了解决这个问题,本文提出了一种基于特征选择的局部到全局学习方法。具体来说,我们首先分析了众所周知的最小冗余和最大相关性(MRMR)特征选择方法的基本原理,该方法用于学习变量的PC集。在此基础上,我们提出了一种有效的基于特征选择的结构学习(F2SL)方法,用于从局部到全局的BN结构学习。F2SL方法首先使用MRMR方法学习DAG骨架,然后在骨架中确定边的方向。利用独立性测试或分数函数来定向边,我们将F2SL方法实例化为两个新算法,F2SL-c(使用独立性测试)和F2SL-s(使用分数函数)。与现有的局部到全局BN学习算法相比,实验验证了本文提出的算法具有更高的效率和更具竞争力的结构学习质量。 摘要:Local-to-global learning approach plays an essential role in Bayesian network (BN) structure learning. Existing local-to-global learning algorithms first construct the skeleton of a DAG (directed acyclic graph) by learning the MB (Markov blanket) or PC (parents and children) of each variable in a data set, then orient edges in the skeleton. However, existing MB or PC learning methods are often computationally expensive especially with a large-sized BN, resulting in inefficient local-to-global learning algorithms. To tackle the problem, in this paper, we develop an efficient local-to-global learning approach using feature selection. Specifically, we first analyze the rationale of the well-known Minimum-Redundancy and Maximum-Relevance (MRMR) feature selection approach for learning a PC set of a variable. Based on the analysis, we propose an efficient F2SL (feature selection-based structure learning) approach to local-to-global BN structure learning. The F2SL approach first employs the MRMR approach to learn a DAG skeleton, then orients edges in the skeleton. Employing independence tests or score functions for orienting edges, we instantiate the F2SL approach into two new algorithms, F2SL-c (using independence tests) and F2SL-s (using score functions). Compared to the state-of-the-art local-to-global BN learning algorithms, the experiments validated that the proposed algorithms in this paper are more efficient and provide competitive structure learning quality than the compared algorithms.
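The MRMR criterion that this paper builds on greedily adds the feature with maximal relevance to the target minus average redundancy with the already-selected features, measuring both with mutual information. An editor's sketch using scikit-learn's MI estimators (illustrative, not the paper's implementation):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr(X, y, k):
    """Greedy Minimum-Redundancy Maximum-Relevance feature selection."""
    n_features = X.shape[1]
    relevance = mutual_info_classif(X, y, random_state=0)        # I(f; y) per feature
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best, best_score = None, -np.inf
        for f in range(n_features):
            if f in selected:
                continue
            # average redundancy I(f; s) with the already-selected features
            redundancy = np.mean([
                mutual_info_regression(X[:, [s]], X[:, f], random_state=0)[0]
                for s in selected])
            score = relevance[f] - redundancy
            if score > best_score:
                best, best_score = f, score
        selected.append(best)
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=300)   # feature 1 is a redundant copy of 0
y = (X[:, 0] + X[:, 2] > 0).astype(int)
print(mrmr(X, y, 3))   # tends to skip the redundant feature 1
```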
【12】 Inverse deep learning methods and benchmarks for artificial electromagnetic material design 标题:人工电磁材料设计的逆深度学习方法和基准 链接:https://arxiv.org/abs/2112.10254
作者:Simiao Ren,Ashwin Mahendra,Omar Khatib,Yang Deng,Willie J. Padilla,Jordan M. Malof 机构:Department of Electrical and Computer Engineering, Duke University, Durham, NC 摘要:深度学习(DL)反演技术提高了人工电磁材料(AEM)设计的速度,并改善了最终设备的质量。许多DL反演技术已经在许多AEM设计任务中取得了成功,但要比较、对比和评估各种技术,澄清反演问题的潜在不适定性至关重要。在这里,我们回顾了最先进的方法,并对AEM设计中的深度学习逆方法、可逆和条件可逆神经网络进行了全面综述。我们制作了易于访问且可快速实现的AEM设计基准,它提供了一种有效确定最适合解决不同设计挑战的DL技术的方法。我们的方法以重复模拟的约束和一个易于集成的度量为指导,我们提出的度量表示任何AEM设计问题的相对不适定性。我们发现,当问题变得越来越不适定时,无论模拟约束如何,带边界损失的神经伴随(NA)都能更快地生成更好的解。在简单的AEM设计任务中,当模拟受到限制时,直接神经网络(NN)表现更好,而通过混合密度网络(MDN)和条件变分自动编码器(VAE)预测的几何结构可以通过持续采样和重新模拟得到改善。 摘要:Deep learning (DL) inverse techniques have increased the speed of artificial electromagnetic material (AEM) design and improved the quality of resulting devices. Many DL inverse techniques have succeeded on a number of AEM design tasks, but to compare, contrast, and evaluate assorted techniques it is critical to clarify the underlying ill-posedness of inverse problems. Here we review state-of-the-art approaches and present a comprehensive survey of deep learning inverse methods and invertible and conditional invertible neural networks to AEM design. We produce easily accessible and rapidly implementable AEM design benchmarks, which offers a methodology to efficiently determine the DL technique best suited to solving different design challenges. Our methodology is guided by constraints on repeated simulation and an easily integrated metric, which we propose expresses the relative ill-posedness of any AEM design problem. We show that as the problem becomes increasingly ill-posed, the neural adjoint with boundary loss (NA) generates better solutions faster, regardless of simulation constraints. On simpler AEM design tasks, direct neural networks (NN) fare better when simulations are limited, while geometries predicted by mixture density networks (MDN) and conditional variational auto-encoders (VAE) can improve with continued sampling and re-simulation.
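The neural-adjoint (NA) method highlighted in this benchmark inverts a trained forward surrogate by gradient descent on the design variables, with a boundary loss keeping candidates inside the training domain. An editor's PyTorch sketch with a stand-in surrogate; all names and the box domain are illustrative assumptions.

```python
import torch
import torch.nn as nn

def neural_adjoint(forward_model, target, x_dim, lo, hi, steps=500, lr=0.05):
    """Inverse design: optimize the inputs of a frozen surrogate to hit `target`."""
    x = torch.rand(x_dim, requires_grad=True)            # random initial design in [0, 1)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        fit = ((forward_model(x) - target) ** 2).mean()
        # boundary loss: penalize leaving the training-domain box [lo, hi]
        boundary = (torch.relu(lo - x) + torch.relu(x - hi)).mean()
        (fit + boundary).backward()
        opt.step()
    return x.detach()

# Stand-in surrogate (in practice: a network trained on simulator data)
surrogate = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 16))
for p in surrogate.parameters():
    p.requires_grad_(False)                              # only the design is optimized

target = torch.randn(16)                                 # e.g. a desired spectrum
x_star = neural_adjoint(surrogate, target, x_dim=4, lo=0.0, hi=1.0)
print(x_star)
```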
【13】 Modelling of Received Signals in Molecular Communication Systems based machine learning: Comparison of azure machine learning and Python tools 标题:基于机器学习的分子通信系统接收信号建模:Azure机器学习与Python工具的比较 链接:https://arxiv.org/abs/2112.10214
作者:Soha Mohamed,Mahmoud S. Fayed 机构:School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China; College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia 摘要:在纳米网络上实现的分子通信(MC)在能量效率、可靠性和鲁棒性方面具有极具吸引力的特性。尽管如此,极其缓慢的分子扩散和高度可变的环境的影响仍然未知。通信系统的分析和设计通常依赖于建立描述通信信道的数学模型。然而,在某些系统中,例如使用化学信号传递信息的MC系统,底层信道模型是未知的。在这些情况下,需要一种新的分析和设计方法。在本文中,我们专注于MC系统的一个关键方面,即对截至时刻t的MC接收信号建模,并证明使用ML工具有望训练出在没有任何信道模型信息的情况下仍能表现良好的检测器。机器学习(ML)是在该领域已显示出良好效果的智能方法之一。本文将Azure机器学习(Azure ML)应用于这一回归问题及其求解。对于预测,使用四个参数作为输入:接收器半径、发射器半径、接收器与发射器之间的距离以及扩散系数,而输出是接收信号的mAP(平均精度)。Azure ML使算法能够从数据和经验中学习并完成任务,而无需编写代码。在Azure ML中,选择了提升决策树回归、贝叶斯线性回归、神经网络和决策森林回归等回归算法,并以最佳性能作为最优性标准。最后,演示了一个比较,展示了Azure ML工具相对于开发人员在本地PC上使用的基于编程的工具(Python)的潜在优势。 摘要:Molecular communication (MC) implemented on Nano networks has extremely attractive characteristics in terms of energy efficiency, dependability, and robustness. Even so, the impact of incredibly slow molecule diffusion and high variability environments remains unknown. Analysis and designs of communication systems usually rely on developing mathematical models that describe the communication channel. However, the underlying channel models are unknown in some systems, such as MC systems, where chemical signals are used to transfer information. In these cases, a new method to analyze and design is needed. In this paper, we concentrate on one critical aspect of the MC system, modelling the MC received signal until time t, and demonstrate that using tools from ML makes it promising to train detectors that can perform well without any information about the channel model. Machine learning (ML) is one of the intelligent methodologies that has shown promising results in the domain. This paper applies Azure Machine Learning (Azure ML) to this regression problem and its solution. For prediction, four parameters are used as inputs: the receiver radius, transmitter radius, distance between receiver and transmitter, and diffusion coefficient, while the output is the mAP (mean average precision) of the received signal. Azure ML enables algorithms that can learn from data and experiences and accomplish tasks without having to be coded. In Azure ML, regression algorithms such as boosted decision tree regression, Bayesian linear regression, neural network, and decision forest regression are selected, and the best performance is chosen as the optimality criterion. Finally, a comparison that shows the potential benefits of the Azure ML tool over a programming-based tool (Python), used by developers on local PCs, is demonstrated.
【14】 RoboAssembly: Learning Generalizable Furniture Assembly Policy in a Novel Multi-robot Contact-rich Simulation Environment 标题:RoboAssembly:在新型多机器人富接触仿真环境中学习泛化家具装配策略 链接:https://arxiv.org/abs/2112.10143
作者:Mingxin Yu,Lin Shao,Zhehuan Chen,Tianhao Wu,Qingnan Fan,Kaichun Mo,Hao Dong 机构: Peking University, Stanford University 备注:Submitted to IEEE International Conference on Robotics and Automation (ICRA) 2022 摘要:零件组装是机器人技术中一项典型但具有挑战性的任务,机器人将一组单独的零件组装成一个完整的形状。在本文中,我们开发了一个用于家具装配的机器人装配仿真环境。我们将零件装配任务描述为一个具体的强化学习问题,并提出了一个机器人学习装配各种椅子的管道。实验表明,当使用看不见的椅子进行测试时,我们的方法在以对象为中心设置下的成功率为74.5%,在完全设置下的成功率为50.0%。我们采用RRT连接算法作为基线,在计算时间显著延长后,成功率仅为18.8%。补充材料和视频可在我们的项目网页上找到。 摘要:Part assembly is a typical but challenging task in robotics, where robots assemble a set of individual parts into a complete shape. In this paper, we develop a robotic assembly simulation environment for furniture assembly. We formulate the part assembly task as a concrete reinforcement learning problem and propose a pipeline for robots to learn to assemble a diverse set of chairs. Experiments show that when testing with unseen chairs, our approach achieves a success rate of 74.5% under the object-centric setting and 50.0% under the full setting. We adopt an RRT-Connect algorithm as the baseline, which only achieves a success rate of 18.8% after a significantly longer computation time. Supplemental materials and videos are available on our project webpage.
【15】 Wasserstein Generative Learning of Conditional Distribution 标题:条件分布的Wasserstein生成学习 链接:https://arxiv.org/abs/2112.10039
作者:Shiao Liu,Xingyu Zhou,Yuling Jiao,Jian Huang 备注:34 pages, 8 figures 摘要:条件分布是描述响应和预测之间关系的基本量。我们提出了一种学习条件分布的Wasserstein生成方法。该方法使用条件生成器将已知分布转换为目标条件分布。通过匹配包含条件生成器的联合分布和目标联合分布,使用Wasserstein距离作为这些联合分布的差异度量,来估计条件生成器。我们建立了由该方法生成的条件抽样分布的非渐近误差界,并证明了在假设数据分布在低维集合上的情况下,该方法能够减轻维数灾难。我们进行了数值实验来验证所提出的方法,并说明了它在条件样本生成、非参数条件密度估计、预测不确定性量化、二元响应数据、图像重建和图像生成中的应用。 摘要:Conditional distribution is a fundamental quantity for describing the relationship between a response and a predictor. We propose a Wasserstein generative approach to learning a conditional distribution. The proposed approach uses a conditional generator to transform a known distribution to the target conditional distribution. The conditional generator is estimated by matching a joint distribution involving the conditional generator and the target joint distribution, using the Wasserstein distance as the discrepancy measure for these joint distributions. We establish non-asymptotic error bound of the conditional sampling distribution generated by the proposed method and show that it is able to mitigate the curse of dimensionality, assuming that the data distribution is supported on a lower-dimensional set. We conduct numerical experiments to validate proposed method and illustrate its applications to conditional sample generation, nonparametric conditional density estimation, prediction uncertainty quantification, bivariate response data, image reconstruction and image generation.
【16】 Continual Learning of a Mixed Sequence of Similar and Dissimilar Tasks 标题:相似和不相似任务混合序列的连续学习 链接:https://arxiv.org/abs/2112.10017
作者:Zixuan Ke,Bing Liu,Xingchang Huang 机构: Department of Computer Science, University of Illinois at Chicago, ETH Zurich 备注:None 摘要:关于任务序列的持续学习的现有研究集中于处理灾难性遗忘,其中任务被假定为不同的,并且几乎没有共享的知识。当任务相似且共享知识时,还进行了一些工作,将以前学到的知识转移到新任务中。就我们所知,还没有人提出过一种技术来学习一系列类似和不同的混合任务,这些任务既可以处理遗忘,也可以前后传递知识。本文提出了一种在同一网络中学习这两类任务的方法。对于不同的任务,该算法侧重于处理遗忘,对于相似的任务,该算法侧重于有选择地转移从以前相似任务中学习到的知识,以改进新的任务学习。此外,该算法自动检测新任务是否与以前的任何任务相似。使用混合任务序列的实证评估证明了所提出模型的有效性。 摘要:Existing research on continual learning of a sequence of tasks focused on dealing with catastrophic forgetting, where the tasks are assumed to be dissimilar and have little shared knowledge. Some work has also been done to transfer previously learned knowledge to the new task when the tasks are similar and have shared knowledge. To the best of our knowledge, no technique has been proposed to learn a sequence of mixed similar and dissimilar tasks that can deal with forgetting and also transfer knowledge forward and backward. This paper proposes such a technique to learn both types of tasks in the same network. For dissimilar tasks, the algorithm focuses on dealing with forgetting, and for similar tasks, the algorithm focuses on selectively transferring the knowledge learned from some similar previous tasks to improve the new task learning. Additionally, the algorithm automatically detects whether a new task is similar to any previous tasks. Empirical evaluation using sequences of mixed tasks demonstrates the effectiveness of the proposed model.
【17】 Learning-based methods to model small body gravity fields for proximity operations: Safety and Robustness 标题:用于近距离操作的基于学习的小天体重力场建模方法:安全性与鲁棒性 链接:https://arxiv.org/abs/2112.09998
作者:Daniel Neamati,Yashwanth Kumar Nakka,Soon-Jo Chung 机构:California Institute of Technology, Pasadena, CA 备注:Accepted Scitech, AI for Space 摘要:精确的重力场模型对于小天体周围的安全接近操作至关重要。最先进的技术使用球谐函数或高保真多面体形状模型。不幸的是,这些技术可能在小天体表面附近变得不准确,或具有较高的计算成本,特别是对于二进制或异构小天体。新的基于学习的技术不编码预定义的结构,更通用。作为多功能性的交换,基于学习的技术在训练数据域之外可能不那么健壮。在部署过程中,航天器轨道是动力学数据的主要来源。因此,训练数据域应包括航天器轨迹,以准确评估学习模型的安全性和鲁棒性。我们开发了一种基于学习的重力模型的新方法,该方法直接使用航天器过去的轨迹。我们进一步介绍了一种通过比较训练域内外的准确性来评估基于学习的技术的安全性和鲁棒性的方法。我们在两个基于学习的框架:高斯过程和神经网络中演示了这种安全性和鲁棒性方法。根据提供的详细分析,我们根据经验确定了当用于接近操作时,需要对学习的重力模型进行鲁棒性验证。 摘要:Accurate gravity field models are essential for safe proximity operations around small bodies. State-of-the-art techniques use spherical harmonics or high-fidelity polyhedron shape models. Unfortunately, these techniques can become inaccurate near the surface of the small body or have high computational costs, especially for binary or heterogeneous small bodies. New learning-based techniques do not encode a predefined structure and are more versatile. In exchange for versatility, learning-based techniques can be less robust outside the training data domain. In deployment, the spacecraft trajectory is the primary source of dynamics data. Therefore, the training data domain should include spacecraft trajectories to accurately evaluate the learned model's safety and robustness. We have developed a novel method for learning-based gravity models that directly uses the spacecraft's past trajectories. We further introduce a method to evaluate the safety and robustness of learning-based techniques via comparing accuracy within and outside of the training domain. We demonstrate this safety and robustness method for two learning-based frameworks: Gaussian processes and neural networks. Along with the detailed analysis provided, we empirically establish the need for robustness verification of learned gravity models when used for proximity operations.
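The in-domain versus out-of-domain robustness check proposed here is straightforward to set up for the GP case. An editor's 1-D toy stand-in, where the "trajectory" samples cover only part of the radial range and the model is stressed outside it:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
f = lambda r: 1.0 / r**2                        # toy inverse-square field component
r_train = rng.uniform(1.0, 2.0, size=(40, 1))   # "trajectory" samples: 1 <= r <= 2
gp = GaussianProcessRegressor(kernel=RBF(), normalize_y=True)
gp.fit(r_train, f(r_train).ravel())

r_in = np.linspace(1.0, 2.0, 50)[:, None]       # inside the training domain
r_out = np.linspace(2.5, 4.0, 50)[:, None]      # outside: robustness stress test
for name, r in [("in-domain", r_in), ("out-of-domain", r_out)]:
    err = np.abs(gp.predict(r) - f(r).ravel()).max()
    print(f"{name}: max abs error = {err:.4f}")
```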
【18】 Weisfeiler and Leman go Machine Learning: The Story so far 标题:Weisfeiler与Leman走向机器学习:迄今为止的故事 链接:https://arxiv.org/abs/2112.09992
作者:Christopher Morris,Yaron Lipman,Haggai Maron,Bastian Rieck,Nils M. Kriege,Martin Grohe,Matthias Fey,Karsten Borgwardt 机构:McGill University and Mila – Quebec AI Institute, Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, NVIDIA Research, AIDOS Lab, Institute of AI for Health, Helmholtz Zentrum München, University of Vienna, Vienna 摘要:近年来,基于Weisfeiler-Leman算法的算法和神经结构(Weisfeiler-Leman算法是解决图同构问题的一种著名的启发式算法)成为利用图和关系数据进行机器学习的有力工具。在这里,我们对该算法在机器学习环境中的应用进行了全面概述,重点介绍了监督机制。我们讨论了理论背景,展示了如何将其用于有监督的图和节点表示学习,讨论了最近的扩展,并概述了该算法与(置换)等变神经结构的联系。此外,我们概述了当前的应用和未来的方向,以促进进一步的研究。 摘要:In recent years, algorithms and neural architectures based on the Weisfeiler-Leman algorithm, a well-known heuristic for the graph isomorphism problem, emerged as a powerful tool for machine learning with graphs and relational data. Here, we give a comprehensive overview of the algorithm's use in a machine learning setting, focusing on the supervised regime. We discuss the theoretical background, show how to use it for supervised graph- and node representation learning, discuss recent extensions, and outline the algorithm's connection to (permutation-)equivariant neural architectures. Moreover, we give an overview of current applications and future directions to stimulate further research.
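The 1-WL heuristic at the heart of this survey can be stated in a few lines: iteratively re-color each node from its color together with the multiset of its neighbours' colors; differing stable color histograms certify non-isomorphism (the converse does not hold). Editor's sketch:

```python
def wl_refinement(adj, rounds=3):
    """1-dimensional Weisfeiler-Leman color refinement.
    adj: dict node -> list of neighbours. Returns the final color histogram."""
    colors = {v: 0 for v in adj}                      # uniform initial coloring
    for _ in range(rounds):
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj}
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colors = {v: palette[signatures[v]] for v in adj}
    hist = {}
    for c in colors.values():
        hist[c] = hist.get(c, 0) + 1
    return hist

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
print(wl_refinement(triangle), wl_refinement(path))   # different histograms
```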
【19】 Being Friends Instead of Adversaries: Deep Networks Learn from Data Simplified by Other Networks 标题:做朋友而不是对手:深度网络从其他网络简化的数据中学习 链接:https://arxiv.org/abs/2112.09968
作者:Simone Marullo,Matteo Tiezzi,Marco Gori,Stefano Melacci 机构: Dept. of Information Engineering, University of Florence (Italy), Dept. of Information Engineering and Mathematics, University of Siena (Italy), MAASAI, Universite Cˆote d’Azur, Nice (France) 摘要:在旨在使神经网络的学习过程更加有效的各种方法中,科学界制定了一些策略,根据示例的估计复杂性对示例进行排序,从更大的网络中提取知识,或利用对抗性机器学习背后的原理。最近有人提出了一种不同的想法,称为友好训练(Friendly Training),它通过添加自动估计的扰动来改变输入数据,目的是促进神经分类器的学习过程。随着训练的进行,这种转变逐渐消失,直到完全消失。在这项工作中,我们重新审视并扩展了这一想法,引入了一种完全不同且新颖的方法,其灵感来自对抗性机器学习环境中神经发生器的有效性。我们提出了一个辅助的多层网络,负责改变输入数据,使其在当前的训练过程中更容易被分类器处理。辅助网络与神经分类器联合训练,因此本质上增加了分类器的“深度”,并有望发现数据更改过程中的一般规律。辅助网络的影响逐渐减小,直到训练结束,此时辅助网络被完全丢弃,分类器被部署用于应用。我们称这种方法为神经友好训练。一个涉及多个数据集和不同神经结构的扩展实验过程表明,神经友好训练克服了最初提出的友好训练技术,提高了分类器的泛化能力,特别是在有噪声数据的情况下。 摘要:Amongst a variety of approaches aimed at making the learning procedure of neural networks more effective, the scientific community developed strategies to order the examples according to their estimated complexity, to distil knowledge from larger networks, or to exploit the principles behind adversarial machine learning. A different idea has been recently proposed, named Friendly Training, which consists in altering the input data by adding an automatically estimated perturbation, with the goal of facilitating the learning process of a neural classifier. The transformation progressively fades-out as long as training proceeds, until it completely vanishes. In this work we revisit and extend this idea, introducing a radically different and novel approach inspired by the effectiveness of neural generators in the context of Adversarial Machine Learning. We propose an auxiliary multi-layer network that is responsible of altering the input data to make them easier to be handled by the classifier at the current stage of the training procedure. The auxiliary network is trained jointly with the neural classifier, thus intrinsically increasing the 'depth' of the classifier, and it is expected to spot general regularities in the data alteration process. The effect of the auxiliary network is progressively reduced up to the end of training, when it is fully dropped and the classifier is deployed for applications. We refer to this approach as Neural Friendly Training. An extended experimental procedure involving several datasets and different neural architectures shows that Neural Friendly Training overcomes the originally proposed Friendly Training technique, improving the generalization of the classifier, especially in the case of noisy data.
【20】 Revisiting Memory Efficient Kernel Approximation: An Indefinite Learning Perspective 标题:重温内存效率核近似:一种不确定学习视角 链接:https://arxiv.org/abs/2112.09893
作者:Simon Heilig,Maximilian Münch,Frank-Michael Schleif 机构:Maximilian M¨unch, University of Bamberg, UAS W¨urzburg-Schweinfurt, University of Groningen 摘要:矩阵逼近是大规模代数机器学习方法中的一个关键元素。最近提出的方法MEKA(Si et al.,2014)有效地利用了Hilbert空间中的两个常见假设:从平移不变核函数获得的内积矩阵的低秩属性和通过固有块簇结构获得的数据紧性假设。在这项工作中,我们将MEKA扩展为不仅适用于移位不变核,而且适用于非平稳核,如多项式核和极端学习核。我们还详细讨论了如何在MEKA中处理由近似本身或有意使用一般核函数引起的非正半定核函数。我们提出了一个基于Lanczos的谱移估计,以发展一个稳定的半正定MEKA近似,也可用于经典的凸优化框架。此外,我们通过理论考虑和对合成和真实数据的各种实验来支持我们的发现。 摘要:Matrix approximations are a key element in large-scale algebraic machine learning approaches. The recently proposed method MEKA (Si et al., 2014) effectively employs two common assumptions in Hilbert spaces: the low-rank property of an inner product matrix obtained from a shift-invariant kernel function and a data compactness hypothesis by means of an inherent block-cluster structure. In this work, we extend MEKA to be applicable not only for shift-invariant kernels but also for non-stationary kernels like polynomial kernels and an extreme learning kernel. We also address in detail how to handle non-positive semi-definite kernel functions within MEKA, either caused by the approximation itself or by the intentional use of general kernel functions. We present a Lanczos-based estimation of a spectrum shift to develop a stable positive semi-definite MEKA approximation, also usable in classical convex optimization frameworks. Furthermore, we support our findings with theoretical considerations and a variety of experiments on synthetic and real-world data.
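The spectrum-shift repair mentioned in this abstract amounts to estimating the most negative eigenvalue of the indefinite kernel matrix and adding its magnitude to the diagonal. The editor's sketch below uses SciPy's Lanczos-based eigsh for the estimate; the paper integrates such a shift inside the MEKA block structure, which is not reproduced here.

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def psd_shift(K, tol=1e-10):
    """Shift an indefinite symmetric kernel matrix to be positive semi-definite."""
    # Lanczos-based estimate of the smallest (algebraic) eigenvalue
    lam_min = eigsh(K, k=1, which="SA", return_eigenvectors=False)[0]
    if lam_min < -tol:
        K = K + (-lam_min) * np.eye(K.shape[0])
    return K

# An indefinite "kernel": the tanh/sigmoid kernel is not PSD in general
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
K = np.tanh(X @ X.T)
print(np.linalg.eigvalsh(K).min(), np.linalg.eigvalsh(psd_shift(K)).min())
```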
【21】 GPEX, A Framework For Interpreting Artificial Neural Networks 标题:GPEX,一个解释人工神经网络的框架 链接:https://arxiv.org/abs/2112.09820
作者:Amir Akbarnejad,Gilbert Bigras,Nilanjan Ray 机构:University of Alberta 摘要:机器学习研究人员长期以来一直注意到可解释性和预测性能之间的权衡。一方面,传统模型通常是人类可以解释的,但它们不能达到很高的预测性能。相反,深度模型可以在许多任务中实现最先进的性能。然而,深度模型的预测是人类无法理解的。在本文中,我们提出了一个框架,缩小了上述两组方法之间的差距。给定一个人工神经网络(ANN),我们的方法找到一个高斯过程(GP),其预测几乎和ANN的预测相匹配。由于GP具有高度的可解释性,我们使用经过训练的GP来解释ANN的决策。我们使用我们的方法来解释ANN在多个数据集上的决策。这些解释提供了有关ANN决策的有趣见解。据我们所知,我们的GP推理公式是第一个使ANN和行为相似的高斯过程自然出现的公式。此外,我们还研究了一些已知的理论条件,在这些条件下,人工神经网络可以被GP解释。其中一些理论条件对现代架构来说过于严格。然而,我们假设这些理论条件中只有一部分是充分的。最后,我们将我们的框架实现为一个名为GPEX的公开可用工具。给定任何PyTorch前馈模块,GPEX允许用户轻松地解释模块的任何ANN子组件,而无需参与推理算法。GPEX可在线公开获取:www.github.com/Nilanjan-Ray/gpex 摘要:Machine learning researchers have long noted a trade-off between interpretability and prediction performance. On the one hand, traditional models are often interpretable to humans but they cannot achieve high prediction performances. At the opposite end of the spectrum, deep models can achieve state-of-the-art performances in many tasks. However, deep models' predictions are known to be uninterpretable to humans. In this paper we present a framework that shortens the gap between the two aforementioned groups of methods. Given an artificial neural network (ANN), our method finds a Gaussian process (GP) whose predictions almost match those of the ANN. As GPs are highly interpretable, we use the trained GP to explain the ANN's decisions. We use our method to explain ANNs' decisions on many datasets. The explanations provide intriguing insights about the ANNs' decisions. To the best of our knowledge, our inference formulation for GPs is the first one in which an ANN and a similarly behaving Gaussian process naturally appear. Furthermore, we examine some of the known theoretical conditions under which an ANN is interpretable by GPs. Some of those theoretical conditions are too restrictive for modern architectures. However, we hypothesize that only a subset of those theoretical conditions are sufficient. Finally, we implement our framework as a publicly available tool called GPEX. Given any PyTorch feed-forward module, GPEX allows users to interpret any ANN subcomponent of the module effortlessly and without having to be involved in the inference algorithm. GPEX is publicly available online: www.github.com/Nilanjan-Ray/gpex
【22】 Neurashed: A Phenomenological Model for Imitating Deep Learning Training 标题:Neurashed:一种模仿深度学习训练的现象学模型 链接:https://arxiv.org/abs/2112.09741
作者:Weijie J. Su 备注:8 pages 摘要:为了在未来十年推进深度学习方法,需要一个用于推理现代神经网络的理论框架。尽管人们越来越多地试图揭开深度学习为何如此有效的神秘面纱,但仍然缺乏一幅全面的图景,这表明更好的理论是可能的。我们认为,未来的深度学习理论应继承三个特征:层次化结构的网络架构、使用基于随机梯度的方法迭代优化的参数,以及从数据中压缩式演化的信息。作为一个实例,我们将这些特征整合到一个名为neurashed的图形模型中。该模型有效地解释了深度学习中一些常见的经验模式。特别是,neurashed使我们能够洞察隐式正则化、信息瓶颈和局部弹性。最后,我们讨论了neurashed如何指导深度学习理论的发展。 摘要:To advance deep learning methodologies in the next decade, a theoretical framework for reasoning about modern neural networks is needed. While efforts are increasing toward demystifying why deep learning is so effective, a comprehensive picture remains lacking, suggesting that a better theory is possible. We argue that a future deep learning theory should inherit three characteristics: a \textit{hierarchically} structured network architecture, parameters \textit{iteratively} optimized using stochastic gradient-based methods, and information from the data that evolves \textit{compressively}. As an instantiation, we integrate these characteristics into a graphical model called \textit{neurashed}. This model effectively explains some common empirical patterns in deep learning. In particular, neurashed enables insights into implicit regularization, information bottleneck, and local elasticity. Finally, we discuss how neurashed can guide the development of deep learning theories.
【23】 NetKet 3: Machine Learning Toolbox for Many-Body Quantum Systems 标题:NetKet 3:多体量子系统的机器学习工具箱 链接:https://arxiv.org/abs/2112.10526
作者:Filippo Vicentini,Damian Hofmann,Attila Szabó,Dian Wu,Christopher Roth,Clemens Giuliani,Gabriel Pescia,Jannes Nys,Vladimir Vargas-Calderon,Nikita Astrakhantsev,Giuseppe Carleo 机构: Institute of Physics, École Polytechnique Fédérale de Lausanne (EPFL), CH-, Lausanne, Switzerland, Max Planck Institute for the Structure and Dynamics of Matter, Luruper Chaussee , Hamburg, Germany, Rudolf Peierls Centre for Theoretical Physics 备注:55 pages, 5 figures. Accompanying code at this https URL 摘要:我们介绍NetKet的第3版,这是用于多体量子物理的机器学习工具箱。NetKet围绕神经网络量子态构建,并为其评估和优化提供高效算法。这个新版本构建在JAX之上,JAX是Python编程语言的可微编程和加速线性代数框架。最重要的新特性是可以在纯Python代码中使用机器学习框架的简明符号定义任意神经网络ansatz,这既允许即时编译,也借助自动微分隐式生成梯度。NetKet 3还支持GPU和TPU加速器,对离散对称群提供高级支持,通过分块扩展到数千个自由度,提供量子动力学应用的驱动程序,并改进了模块化,允许用户仅使用工具箱的一部分作为自己代码的基础。 摘要:We introduce version 3 of NetKet, the machine learning toolbox for many-body quantum physics. NetKet is built around neural-network quantum states and provides efficient algorithms for their evaluation and optimization. This new version is built on top of JAX, a differentiable programming and accelerated linear algebra framework for the Python programming language. The most significant new feature is the possibility to define arbitrary neural network ansätze in pure Python code using the concise notation of machine-learning frameworks, which allows for just-in-time compilation as well as the implicit generation of gradients thanks to automatic differentiation. NetKet 3 also comes with support for GPU and TPU accelerators, advanced support for discrete symmetry groups, chunking to scale up to thousands of degrees of freedom, drivers for quantum dynamics applications, and improved modularity, allowing users to use only parts of the toolbox as a foundation for their own code.
【24】 Learning to Model the Relationship Between Brain Structural and Functional Connectomes 标题:学习模拟大脑结构和功能连接之间的关系 链接:https://arxiv.org/abs/2112.09906
作者:Yang Li,Gonzalo Mateos,Zhengwu Zhang 备注:Submitted to the IEEE Transactions on Signal and Information Processing over Networks 摘要:神经成像技术的最新进展以及从网络数据中进行统计学习的算法创新,为整合大脑结构和功能提供了独特的途径,从而有助于在系统层面揭示大脑的一些组织原理。在这个方向上,我们开发了一个有监督的图形表示学习框架,通过图形编码-解码系统来模拟大脑结构连接(SC)和功能连接(FC)之间的关系,其中SC被用作预测经验FC的输入。一个可训练的图形卷积编码器捕获大脑感兴趣区域之间的直接和间接交互作用,模拟实际的神经通信,并集成来自结构网络拓扑和节点(即特定区域)属性的信息。编码器学习节点级SC嵌入,这些嵌入被结合起来生成(全脑)图形级表示,用于重建经验FC网络。所提出的端到端模型利用多目标损失函数联合重建FC网络,并学习SC到FC映射的区别性图形表示,用于下游主题(即图形级)分类。综合实验表明,所学的上述关系表征从受试者大脑网络的内在属性中获取了有价值的信息,并提高了从人类连接组项目中对大量酗酒者和非酗酒者进行分类的准确性。我们的工作为大脑网络之间的关系提供了新的见解,支持使用图形表示学习来发现更多关于人类大脑活动和功能的前景。 摘要:Recent advances in neuroimaging along with algorithmic innovations in statistical learning from network data offer a unique pathway to integrate brain structure and function, and thus facilitate revealing some of the brain's organizing principles at the system level. In this direction, we develop a supervised graph representation learning framework to model the relationship between brain structural connectivity (SC) and functional connectivity (FC) via a graph encoder-decoder system, where the SC is used as input to predict empirical FC. A trainable graph convolutional encoder captures direct and indirect interactions between brain regions-of-interest that mimic actual neural communications, as well as to integrate information from both the structural network topology and nodal (i.e., region-specific) attributes. The encoder learns node-level SC embeddings which are combined to generate (whole brain) graph-level representations for reconstructing empirical FC networks. The proposed end-to-end model utilizes a multi-objective loss function to jointly reconstruct FC networks and learn discriminative graph representations of the SC-to-FC mapping for downstream subject (i.e., graph-level) classification. Comprehensive experiments demonstrate that the learnt representations of said relationship capture valuable information from the intrinsic properties of the subject's brain networks and lead to improved accuracy in classifying a large population of heavy drinkers and non-drinkers from the Human Connectome Project. Our work offers new insights on the relationship between brain networks that support the promising prospect of using graph representation learning to discover more about human brain activity and function.
【25】 Multimeasurement Generative Models 标题:多测量生成模型 链接:https://arxiv.org/abs/2112.09822
作者:Saeed Saremi,Rupesh Kumar Srivastava 机构:NNAISENSE Inc., Redwood Center, UC Berkeley 摘要:我们正式地将从$\mathbb{R}^d$中密度为$p_X$的未知分布采样的问题,映射为学习并采样$\mathbb{R}^{Md}$中$p_\mathbf{Y}$的问题,后者由$p_X$与固定的阶乘核卷积得到:$p_\mathbf{Y}$称为M-密度,该阶乘核称为多重测量噪声模型(MNM)。M-密度比$p_X$更平滑,更容易学习和采样,但对于较大的$M$,这两个问题在数学上是等价的,因为给定$\mathbf{Y}=\mathbf{y}$时,可以使用贝叶斯估计量$\widehat{x}(\mathbf{y})=\mathbb{E}[X\vert\mathbf{Y}=\mathbf{y}]$精确估计$X$。为此,我们推导了泊松和高斯MNM下的$\widehat{x}(\mathbf{y})$,以未归一化的$p_\mathbf{Y}$给出闭式表达。这导出了学习参数化能量和分数函数的简单最小二乘目标。我们提出了多种感兴趣的参数化方案,其中直接研究高斯M-密度会导出多重去噪自动编码器,这是文献中首次在去噪自动编码器与经验贝叶斯之间建立理论联系。来自$p_X$的样本通过步行-跳跃采样(Saremi & Hyvarinen, 2019)获得:经由欠阻尼Langevin MCMC从$p_\mathbf{Y}$采样(walk),再通过多测量贝叶斯估计$X$(jump)。我们在MNIST、CIFAR-10和FFHQ-256数据集上研究了置换不变的高斯M-密度,并证明了该框架在高维中实现快速混合稳定马尔可夫链的有效性。 摘要:We formally map the problem of sampling from an unknown distribution with density $p_X$ in $\mathbb{R}^d$ to the problem of learning and sampling $p_\mathbf{Y}$ in $\mathbb{R}^{Md}$ obtained by convolving $p_X$ with a fixed factorial kernel: $p_\mathbf{Y}$ is referred to as M-density and the factorial kernel as multimeasurement noise model (MNM). The M-density is smoother than $p_X$, easier to learn and sample from, yet for large $M$ the two problems are mathematically equivalent since $X$ can be estimated exactly given $\mathbf{Y}=\mathbf{y}$ using the Bayes estimator $\widehat{x}(\mathbf{y})=\mathbb{E}[X\vert\mathbf{Y}=\mathbf{y}]$. To formulate the problem, we derive $\widehat{x}(\mathbf{y})$ for Poisson and Gaussian MNMs expressed in closed form in terms of unnormalized $p_\mathbf{Y}$. This leads to a simple least-squares objective for learning parametric energy and score functions. We present various parametrization schemes of interest, including one in which studying Gaussian M-densities directly leads to multidenoising autoencoders--this is the first theoretical connection made between denoising autoencoders and empirical Bayes in the literature. Samples from $p_X$ are obtained by walk-jump sampling (Saremi & Hyvarinen, 2019) via underdamped Langevin MCMC (walk) to sample from $p_\mathbf{Y}$ and the multimeasurement Bayes estimation of $X$ (jump). We study permutation invariant Gaussian M-densities on MNIST, CIFAR-10, and FFHQ-256 datasets, and demonstrate the effectiveness of this framework for realizing fast-mixing stable Markov chains in high dimensions.
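For the Gaussian MNM, the Bayes estimator in this abstract has a closed form via Tweedie's identity, $\widehat{x}(\bar y) = \bar y + (\sigma^2/M)\,\nabla_{\bar y}\log p(\bar y)$, applied to the averaged measurement $\bar y$ whose noise variance is $\sigma^2/M$. An editor's 1-D numerical check against the exact posterior mean under a known two-point prior (the paper instead learns the unnormalized density, which is not reproduced here):

```python
import numpy as np

sigma, M = 1.0, 4
rng = np.random.default_rng(0)
x = rng.choice([-1.0, 1.0], size=5)                   # prior: +-1 with probability 1/2
Y = x[:, None] + sigma * rng.normal(size=(5, M))      # M noisy measurements per x
ybar = Y.mean(axis=1)                                 # sufficient statistic, var sigma^2/M

def log_p_ybar(y):
    """Density of the averaged measurement under the two-point prior."""
    s2 = sigma**2 / M
    return np.log(0.5 * np.exp(-(y + 1)**2 / (2 * s2))
                  + 0.5 * np.exp(-(y - 1)**2 / (2 * s2)))

# Tweedie / empirical-Bayes estimate: ybar + (sigma^2/M) * score at ybar
h = 1e-5
score = (log_p_ybar(ybar + h) - log_p_ybar(ybar - h)) / (2 * h)
x_hat = ybar + (sigma**2 / M) * score

print(np.round(x_hat, 3))                             # matches the exact posterior mean:
print(np.round(np.tanh(M * ybar / sigma**2), 3))      # E[X | ybar] = tanh(M*ybar/sigma^2)
```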
其他(21篇)
【1】 Mask2Former for Video Instance Segmentation 标题:用于视频实例分割的Mask2Former 链接:https://arxiv.org/abs/2112.10764
作者:Bowen Cheng,Anwesa Choudhuri,Ishan Misra,Alexander Kirillov,Rohit Girdhar,Alexander G. Schwing 机构:†Equal advising. 备注:Code and models: this https URL 摘要:我们发现Mask2Former在视频实例分割方面也实现了最先进的性能,而无需修改体系结构、损失函数甚至训练管道。在本报告中,我们展示了通用图像分割体系结构通过直接预测三维分割体,可以直接推广到视频分割。具体而言,Mask2Former在YouTubeVIS-2019和YouTubeVIS-2021上分别创造了60.4 AP和52.6 AP的最新水平。鉴于其在图像分割方面的多功能性,我们相信Mask2Former还能够处理视频语义分割和全景分割。我们希望这将使最先进的视频分割研究更容易开展,并使通用图像和视频分割体系结构的设计受到更多关注。 摘要:We find Mask2Former also achieves state-of-the-art performance on video instance segmentation without modifying the architecture, the loss or even the training pipeline. In this report, we show universal image segmentation architectures trivially generalize to video segmentation by directly predicting 3D segmentation volumes. Specifically, Mask2Former sets a new state-of-the-art of 60.4 AP on YouTubeVIS-2019 and 52.6 AP on YouTubeVIS-2021. We believe Mask2Former is also capable of handling video semantic and panoptic segmentation, given its versatility in image segmentation. We hope this will make state-of-the-art video segmentation research more accessible and bring more attention to designing universal image and video segmentation architectures.
【2】 Mega-NeRF: Scalable Construction of Large-Scale NeRFs for Virtual Fly-Throughs 标题:MEGA-NERF:面向虚拟飞翔的大规模NERF的可伸缩构造 链接:https://arxiv.org/abs/2112.10703
作者:Haithem Turki,Deva Ramanan,Mahadev Satyanarayanan 机构:Carnegie Mellon University, Argo AI 备注:Project page: this https URL GitHub: this https URL 摘要:我们探索如何利用神经辐射场(NeRF),从主要由无人机数据采集的、跨越建筑物甚至多个城市街区的大规模视觉采集中构建交互式三维环境。与传统上评估NeRF所用的单对象场景相比,此设置带来了多重挑战,包括(1)需要合并数千张具有不同照明条件的图像,而每张图像仅捕获场景的一小部分;(2)模型容量和光线采样要求高得令人望而却步,超出了单个GPU上可以简单训练的范围;(3)可能的视点任意多,使得无法像实时NeRF渲染器通常做的那样预先计算所有相关信息。为了应对这些挑战,我们首先分析大规模场景的可见性统计数据,从而引出一种稀疏网络结构,其中参数专用于场景的不同区域。我们介绍了一种简单的几何聚类算法,该算法将训练图像(或者说像素)划分为不同的NeRF子模块,这些子模块可以并行训练。我们在取自Quad 6k和UrbanScene3D数据集的场景以及我们自己的无人机镜头上评估了该方法,结果显示训练速度提高了3倍,同时平均PSNR提高了11%以上。随后,我们在Mega-NeRF之上对最近的NeRF快速渲染器进行了实证评估,并介绍了一种利用时间一致性的新方法。我们的技术在PSNR质量保持在0.5 dB以内的情况下,比传统的NeRF渲染速度提高了40倍,超过了现有快速渲染器的保真度。 摘要:We explore how to leverage neural radiance fields (NeRFs) to build interactive 3D environments from large-scale visual captures spanning buildings or even multiple city blocks collected primarily from drone data. In contrast to the single object scenes against which NeRFs have been traditionally evaluated, this setting poses multiple challenges including (1) the need to incorporate thousands of images with varying lighting conditions, all of which capture only a small subset of the scene, (2) prohibitively high model capacity and ray sampling requirements beyond what can be naively trained on a single GPU, and (3) an arbitrarily large number of possible viewpoints that make it unfeasible to precompute all relevant information beforehand (as real-time NeRF renderers typically do). To address these challenges, we begin by analyzing visibility statistics for large-scale scenes, motivating a sparse network structure where parameters are specialized to different regions of the scene. We introduce a simple geometric clustering algorithm that partitions training images (or rather pixels) into different NeRF submodules that can be trained in parallel. We evaluate our approach across scenes taken from the Quad 6k and UrbanScene3D datasets as well as against our own drone footage and show a 3x training speedup while improving PSNR by over 11% on average. We subsequently perform an empirical evaluation of recent NeRF fast renderers on top of Mega-NeRF and introduce a novel method that exploits temporal coherence. Our technique achieves a 40x speedup over conventional NeRF rendering while remaining within 0.5 dB in PSNR quality, exceeding the fidelity of existing fast renderers.
【3】 Mind-proofing Your Phone: Navigating the Digital Minefield with GreaseTerminator 标题:保护您的手机:使用GreaseTerminator穿越数字雷区 链接:https://arxiv.org/abs/2112.10699
作者:Siddhartha Datta,Konrad Kollnig,Nigel Shadbolt 机构:Department of Computer Science, University of Oxford 备注:Accepted in ACM IUI 2022 摘要:数字危害在移动生态系统中广泛存在。随着这些设备在我们日常生活中越来越重要,针对个人的恶意攻击的可能性也在增加。针对一系列数字危害(包括数字分心、仇恨言论导致的政治两极分化以及儿童接触有害材料)的最后一道防线是用户界面。这项工作引入了GreaseTerminator,使研究人员能够与最终用户一起开发、部署和测试针对这些危害的干预措施。我们在五个深入的案例研究中展示了干预开发和部署的简易性,以及GreaseTerminator可能涵盖的广泛危害。 摘要:Digital harms are widespread in the mobile ecosystem. As these devices gain ever more prominence in our daily lives, so too increases the potential for malicious attacks against individuals. The last line of defense against a range of digital harms - including digital distraction, political polarisation through hate speech, and children being exposed to damaging material - is the user interface. This work introduces GreaseTerminator to enable researchers to develop, deploy, and test interventions against these harms with end-users. We demonstrate the ease of intervention development and deployment, as well as the broad range of harms potentially covered with GreaseTerminator in five in-depth case studies.
【4】 Adversarially Robust Stability Certificates can be Sample-Efficient 标题:对抗鲁棒的稳定性证书可以是样本高效的 链接:https://arxiv.org/abs/2112.10690
作者:Thomas T. C. K. Zhang,Stephen Tu,Nicholas M. Boffi,Jean-Jacques E. Slotine,Nikolai Matni 机构:Department of Electrical and Systems Engineering, University of Pennsylvania, Google Brain Robotics, Courant Institute of Mathematical Sciences, New York University, Nonlinear Systems Laboratory, Massachusetts Institute of Technology 摘要:受在安全关键系统背景下弥合仿真到现实差距这一问题的启发,我们考虑为未知非线性动力系统学习对抗鲁棒的稳定性证书。与鲁棒控制中的做法一致,我们考虑对系统动力学施加扰动的加性且Lipschitz有界的对手。我们证明,在底层系统满足适当的增量稳定性假设时,学习对抗稳定性证书的统计成本与学习名义稳定性证书的统计成本在常数因子意义下相当。我们的结果依赖于对由此产生的对抗损失类的Rademacher复杂度的新界,这一结果可能具有独立的意义。据我们所知,这是对在动力系统生成的数据上进行对抗学习时样本复杂度界限的首次刻画。我们进一步给出了一个近似对抗训练过程的实用算法,并在一个阻尼摆的例子上验证了我们的发现。 摘要:Motivated by bridging the simulation to reality gap in the context of safety-critical systems, we consider learning adversarially robust stability certificates for unknown nonlinear dynamical systems. In line with approaches from robust control, we consider additive and Lipschitz bounded adversaries that perturb the system dynamics. We show that under suitable assumptions of incremental stability on the underlying system, the statistical cost of learning an adversarial stability certificate is equivalent, up to constant factors, to that of learning a nominal stability certificate. Our results hinge on novel bounds for the Rademacher complexity of the resulting adversarial loss class, which may be of independent interest. To the best of our knowledge, this is the first characterization of sample-complexity bounds when performing adversarial learning over data generated by a dynamical system. We further provide a practical algorithm for approximating the adversarial training algorithm, and validate our findings on a damped pendulum example.
【5】 Differentially Private Regret Minimization in Episodic Markov Decision Processes 标题:情节式马尔可夫决策过程中的差分隐私遗憾最小化 链接:https://arxiv.org/abs/2112.10599
作者:Sayak Ray Chowdhury,Xingyu Zhou 机构:Indian Institute of Science, Wayne State University 备注:Accepted by AAAI 2022 摘要:我们研究有限时域表格型马尔可夫决策过程(MDP)在差分隐私(DP)约束下的遗憾最小化问题。其动机在于强化学习(RL)在现实世界顺序决策问题中的广泛应用,而在这些场景中,保护用户的敏感和私有信息变得至关重要。我们考虑DP的两种变体:联合DP(JDP),由一个集中式代理负责保护用户的敏感数据;以及本地DP(LDP),需要在用户侧直接保护信息。我们首先提出两个通用框架(一个用于策略优化,另一个用于值迭代),用于设计私有的乐观RL算法。然后,我们用合适的隐私机制实例化这些框架,使其满足JDP和LDP要求,同时获得次线性遗憾保证。遗憾界表明,在JDP下,隐私成本只是一个低阶加性项;而在LDP下,为获得更强的隐私保护,付出的代价是乘性的。最后,上述遗憾界通过一个统一的分析得到,我们相信该分析可以推广到表格型MDP之外。 摘要:We study regret minimization in finite horizon tabular Markov decision processes (MDPs) under the constraints of differential privacy (DP). This is motivated by the widespread applications of reinforcement learning (RL) in real-world sequential decision making problems, where protecting users' sensitive and private information is becoming paramount. We consider two variants of DP -- joint DP (JDP), where a centralized agent is responsible for protecting users' sensitive data and local DP (LDP), where information needs to be protected directly on the user side. We first propose two general frameworks -- one for policy optimization and another for value iteration -- for designing private, optimistic RL algorithms. We then instantiate these frameworks with suitable privacy mechanisms to satisfy JDP and LDP requirements, and simultaneously obtain sublinear regret guarantees. The regret bounds show that under JDP, the cost of privacy is only a lower order additive term, while for a stronger privacy protection under LDP, the cost suffered is multiplicative. Finally, the regret bounds are obtained by a unified analysis, which, we believe, can be extended beyond tabular MDPs.
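作为背景补充,下面的极简草图演示私有乐观RL算法中的一个常见构件:先对访问计数加噪,再由噪声计数构造经验估计与乐观奖励项。其中Laplace机制与奖励项形式均为通用假设,并非该论文中JDP/LDP机制的精确构造。

```python
import numpy as np

# 示意性草图:Laplace 加噪的 (s, a) 访问计数与由其导出的乐观奖励项
# (机制与奖励形式均为通用假设,非论文的精确构造)。
def private_counts(true_counts, epsilon, rng):
    noise = rng.laplace(scale=1.0 / epsilon, size=true_counts.shape)
    return np.maximum(true_counts + noise, 1.0)   # 截断,保证计数可用

rng = np.random.default_rng(1)
counts = rng.integers(0, 200, size=(10, 4)).astype(float)   # 10 个状态、4 个动作
noisy = private_counts(counts, epsilon=1.0, rng=rng)
bonus = np.sqrt(np.log(10 * 4) / noisy)    # 数据越多,乐观奖励越小
print(bonus.round(3))
```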
【6】 Scope and Sense of Explainability for AI-Systems 标题:人工智能系统的可解释性范围和意义 链接:https://arxiv.org/abs/2112.10551
作者:A. -M. Leventi-Peetz,T. Östreich,W. Lennartz,K. Weber 机构:Federal Office for Information Security, BSI, Bonn, Germany, inducto GmbH, Dorfen, Germany 备注:None 摘要:本文将批判性地讨论人工智能系统可解释性的若干方面,尤其关注"使每个人工智能系统都可解释"这一任务的可行性。重点放在与高度复杂且高效的人工智能系统的可解释性相关的困难上:这类系统给出的决策,其解释超出了经典的因果逻辑模式。人工智能系统已被证明能给出令人费解、事后看来却堪称绝妙的解法(例如AlphaGo第二局中的第37手)。本文将详细阐述支持如下观点的论证:如果人工智能解决方案仅仅因为无法被彻底理解就被预先抛弃,智能系统的大量潜力将被白白浪费。 摘要:Certain aspects of the explainability of AI systems will be critically discussed. This especially with focus on the feasibility of the task of making every AI system explainable. Emphasis will be given to difficulties related to the explainability of highly complex and efficient AI systems which deliver decisions whose explanation defies classical logical schemes of cause and effect. AI systems have provably delivered unintelligible solutions which in retrospect were characterized as ingenious (for example move 37 of the game 2 of AlphaGo). It will be elaborated on arguments supporting the notion that if AI-solutions were to be discarded in advance because of their not being thoroughly comprehensible, a great deal of the potentiality of intelligent systems would be wasted.
【7】 Deep Surrogate for Direct Time Fluid Dynamics 标题:直接时间流体动力学的深度代理 链接:https://arxiv.org/abs/2112.10296
作者:Lucas Meyer,Louen Pottier,Alejandro Ribes,Bruno Raffin 机构:EDF Lab Paris-Saclay, Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LIG, EDF Lab Paris-Saclay, ENS Paris-Saclay, Palaiseau, France, Grenoble, France 备注:None 摘要:流体在物理世界中无处不在,这解释了许多科学和工程应用需要精确模拟其动力学的原因。传统上,此类模拟由成熟但资源密集的CFD求解器提供。近年来,大量深度学习代理模型开始替代这些求解器,以简化模拟过程。一些构建数据驱动代理的方法模仿求解器的迭代过程:它们根据流体的前一状态推断其下一状态。另一些方法则直接从时间输入推断状态。不同方法在空间信息的处理上也有所不同。图神经网络(GNN)能够应对CFD模拟中常用的不规则网格的特殊性。在本文中,我们介绍了我们正在进行的工作:为不规则网格设计一种新的直接时间GNN架构。它由一系列通过样条卷积连接的、规模递增的图组成。我们在von Kármán涡街基准上测试了我们的架构,它实现了较小的泛化误差,同时缓解了误差沿轨迹的累积。 摘要:The ubiquity of fluids in the physical world explains the need to accurately simulate their dynamics for many scientific and engineering applications. Traditionally, well established but resource intensive CFD solvers provide such simulations. The recent years have seen a surge of deep learning surrogate models substituting these solvers to alleviate the simulation process. Some approaches to build data-driven surrogates mimic the solver iterative process. They infer the next state of the fluid given its previous one. Others directly infer the state from time input. Approaches also differ in their management of the spatial information. Graph Neural Networks (GNN) can address the specificity of the irregular meshes commonly used in CFD simulations. In this article, we present our ongoing work to design a novel direct time GNN architecture for irregular meshes. It consists of a succession of graphs of increasing size connected by spline convolutions. We test our architecture on the von Kármán's vortex street benchmark. It achieves small generalization errors while mitigating error accumulation along the trajectory.
【8】 Distributionally Robust Group Backwards Compatibility 标题:分布健壮的组向后兼容性 链接:https://arxiv.org/abs/2112.10290
作者:Martin Bertran,Natalia Martinez,Alex Oesterling,Guillermo Sapiro 机构:Duke University 摘要:机器学习模型会随着新数据的获取或新体系结构的开发而更新。这些更新通常会提高模型性能,但可能会引入向后兼容性错误,在这种情况下,单个用户或用户组会看到他们在更新模型上的性能受到不利影响。当训练数据集不能准确反映总体人口统计数据时,也会出现这一问题,因为一些群体在数据收集过程中的参与度总体较低,造成了严重的公平问题。我们分析了分布稳健性和极大极小公平性的思想如何在这种情况下帮助向后兼容,并提出了两种方法来直接解决这个问题。我们的理论分析得到了CIFAR-10、CelebA和Waterbirds三个标准图像分类数据集的实验结果的支持。代码可在 github.com/natalialmg/GroupBC 获得。 摘要:Machine learning models are updated as new data is acquired or new architectures are developed. These updates usually increase model performance, but may introduce backward compatibility errors, where individual users or groups of users see their performance on the updated model adversely affected. This problem can also be present when training datasets do not accurately reflect overall population demographics, with some groups having overall lower participation in the data collection process, posing a significant fairness concern. We analyze how ideas from distributional robustness and minimax fairness can aid backward compatibility in this scenario, and propose two methods to directly address this issue. Our theoretical analysis is backed by experimental results on CIFAR-10, CelebA, and Waterbirds, three standard image classification datasets. Code available at github.com/natalialmg/GroupBC
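为直观起见,下面给出按组计算"向后兼容性误差"(旧模型判对而新模型判错的样本比例)的极简草图。其中标签、预测与分组均为合成数据,仅作示意;论文提出的两种方法并未在此实现。

```python
import numpy as np

# 示意性草图:按组统计模型更新造成的"回退"比例(合成数据)。
def group_bc_error(y, old_pred, new_pred, groups):
    errs = {}
    for g in np.unique(groups):
        m = groups == g
        regressed = (old_pred[m] == y[m]) & (new_pred[m] != y[m])
        errs[g] = regressed.mean()
    return errs

rng = np.random.default_rng(2)
y = rng.integers(0, 2, 1000)
old_pred = np.where(rng.random(1000) < 0.85, y, 1 - y)   # 85% 准确的旧模型
new_pred = np.where(rng.random(1000) < 0.90, y, 1 - y)   # 90% 准确的新模型
groups = rng.integers(0, 3, 1000)
print(group_bc_error(y, old_pred, new_pred, groups))
# 极大极小公平式的更新会力求让最差组的向后兼容性误差尽可能小。
```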
【9】 Estimating Causal Effects of Multi-Aspect Online Reviews with Multi-Modal Proxies 标题:用多模态代理变量估计多方面在线评论的因果效应 链接:https://arxiv.org/abs/2112.10274
作者:Lu Cheng,Ruocheng Guo,Huan Liu 机构: School of Computing and Augmented Intelligence, Arizona State University, USA, School of Data Science, City University of Hong Kong, China 备注:10 pages, 6 figures, accepted to WSDM22 摘要:在线评论使消费者能够与商家互动并提供重要反馈。由于高维文本的复杂性,这些评论通常被简化为单一的数字分数,例如评分或情感得分。这项工作在细粒度层面上对用户生成的在线评论的因果效应进行了实证考察:我们考虑多个方面,例如餐厅的"食物"和"服务"。了解消费者对不同方面的意见有助于详细评估商家绩效并有效制定经营策略。具体来说,我们的目标是回答诸如"如果餐厅在'服务'这一方面的质量提高10%,其受欢迎程度将会如何?"之类的干预性问题。使用观察数据进行因果推断的决定性挑战是"混杂因素"的存在:混杂因素可能无法被观察或测量(例如消费者对食物类型的偏好),从而使估计的效应有偏且方差较大。为了应对这一挑战,我们借助多模态代理变量,例如消费者档案信息以及消费者与商家之间的互动。我们展示了如何有效利用这些丰富的信息来识别和估计在线评论中蕴含的多个方面的因果效应。在合成数据和真实数据上的实证评估证实了所提方法的有效性,并揭示了其可付诸行动的洞察。 摘要:Online reviews enable consumers to engage with companies and provide important feedback. Due to the complexity of the high-dimensional text, these reviews are often simplified as a single numerical score, e.g., ratings or sentiment scores. This work empirically examines the causal effects of user-generated online reviews on a granular level: we consider multiple aspects, e.g., the Food and Service of a restaurant. Understanding consumers' opinions toward different aspects can help evaluate business performance in detail and strategize business operations effectively. Specifically, we aim to answer interventional questions such as What will the restaurant popularity be if the quality w.r.t. its aspect Service is increased by 10%? The defining challenge of causal inference with observational data is the presence of "confounder", which might not be observed or measured, e.g., consumers' preference to food type, rendering the estimated effects biased and high-variance. To address this challenge, we have recourse to the multi-modal proxies such as the consumer profile information and interactions between consumers and businesses. We show how to effectively leverage the rich information to identify and estimate causal effects of multiple aspects embedded in online reviews. Empirical evaluations on synthetic and real-world data corroborate the efficacy and shed light on the actionable insight of the proposed approach.
【10】 ArcFace Knows the Gender, Too! 标题:ArcFace也知道性别! 链接:https://arxiv.org/abs/2112.10101
作者:Majid Farzaneh 备注:9 pages, 4 images, 2 tables 摘要:本文的主要思想是:如果一个模型能够识别一个人,那么它当然也必须能够知道这个人的性别。因此,本文没有为性别分类定义新的模型,而是基于面部特征,使用ArcFace特征来确定性别。将人脸图像输入ArcFace,得到该人脸的512维特征。然后,借助传统的机器学习模型确定性别。支持向量机(SVM)、线性判别和逻辑回归等判别方法很好地表明,从ArcFace提取的特征在性别类别之间产生了显著的区分。在性别分类数据集上的实验表明,基于高斯核的SVM能够利用ArcFace特征对性别进行分类,准确率达96.4%。 摘要:The main idea of this paper is that if a model can recognize a person, of course, it must be able to know the gender of that person, too. Therefore, instead of defining a new model for gender classification, this paper uses ArcFace features to determine gender, based on the facial features. A face image is given to ArcFace and 512 features are obtained for the face. Then, with the help of traditional machine learning models, gender is determined. Discriminative methods such as Support Vector Machine (SVM), Linear Discriminant, and Logistic Regression well demonstrate that the features extracted from the ArcFace create a remarkable distinction between the gender classes. Experiments on the Gender Classification Dataset show that SVM with Gaussian kernel is able to classify gender with an accuracy of 96.4% using ArcFace features.
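论文的流水线可以概括为如下草图(假设每张人脸图像已经提取好512维ArcFace特征;下面的随机特征只是占位,真实实验中应替换为实际嵌入):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.normal(size=(2000, 512))     # 占位:应为真实的 ArcFace 特征
y = rng.integers(0, 2, 2000)         # 合成性别标签

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = SVC(kernel="rbf")              # 高斯核 SVM,与论文一致
clf.fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))
# 随机特征上约为 0.5;论文在真实嵌入上报告了 96.4% 的准确率。
```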
【11】 Efficient Strong Scaling Through Burst Parallel Training 标题:通过突发并行训练实现高效的强扩展 链接:https://arxiv.org/abs/2112.10065
作者:Seo Jin Park,Joshua Fried,Sunghyun Kim,Mohammad Alizadeh,Adam Belay 摘要:随着新兴的深度神经网络(DNN)模型规模不断扩大,使用大型GPU集群来训练DNN已成为实现可接受训练时间的基本要求。在本文中,我们考虑这样一种情况:未来集群规模的增长将使可用于训练模型的全局批量大小达到一个基本极限;超过某一点后,更大的全局批量会降低样本效率,从而增加达到目标精度所需的总时间。因此,为了进一步提升训练性能,我们必须转而考虑保持全局批量大小不变、并为每个GPU分配更小批量的"强扩展"策略。不幸的是,这使得高效使用集群资源变得困难得多。我们介绍了DeepPool,一个通过两个关键思想应对这一效率挑战的系统。首先,突发并行以突发方式将大量GPU分配给前台作业,以利用各层之间并行度的不均匀性。其次,GPU多路复用优先保证前台训练作业的吞吐量,同时打包后台训练作业以回收未充分利用的GPU资源,从而提高整个集群的利用率。这两个思想结合起来,使DeepPool在集群规模较大时,在单任务场景下将总集群吞吐量相比标准数据并行提高2.2-2.4倍。 摘要:As emerging deep neural network (DNN) models continue to grow in size, using large GPU clusters to train DNNs is becoming an essential requirement to achieving acceptable training times. In this paper, we consider the case where future increases in cluster size will cause the global batch size that can be used to train models to reach a fundamental limit: beyond a certain point, larger global batch sizes cause sample efficiency to degrade, increasing overall time to accuracy. As a result, to achieve further improvements in training performance, we must instead consider "strong scaling" strategies that hold the global batch size constant and allocate smaller batches to each GPU. Unfortunately, this makes it significantly more difficult to use cluster resources efficiently. We present DeepPool, a system that addresses this efficiency challenge through two key ideas. First, burst parallelism allocates large numbers of GPUs to foreground jobs in bursts to exploit the unevenness in parallelism across layers. Second, GPU multiplexing prioritizes throughput for foreground training jobs, while packing in background training jobs to reclaim underutilized GPU resources, thereby improving cluster-wide utilization. Together, these two ideas enable DeepPool to deliver a 2.2 - 2.4x improvement in total cluster throughput over standard data parallelism with a single task when the cluster scale is large.
【12】 The Web Is Your Oyster -- Knowledge-Intensive NLP against a Very Large Web Corpus 标题:网络就是你的牡蛎--知识密集型自然语言处理与庞大的网络语料库 链接:https://arxiv.org/abs/2112.09924
作者:Aleksandra Piktus,Fabio Petroni,Vladimir Karpukhin,Dmytro Okhonko,Samuel Broscheit,Gautier Izacard,Patrick Lewis,Barlas Oğuz,Edouard Grave,Wen-tau Yih,Sebastian Riedel 机构:Facebook AI Research ,University College London, University of Mannheim , ENS, PSL University , Inria 摘要:为了应对现实世界中日益增长的应用需求,知识密集型NLP(KI-NLP)的研究应当通过刻画真正开放域环境的挑战来推进:网络规模的知识、缺乏结构、质量参差不齐以及噪声。为此,我们提出了一种评估现有KI-NLP任务的新设置,将背景语料库推广为一个通用的网络快照。我们重新利用最初为维基百科开发的标准KI-NLP基准KILT,要求系统使用CCNet的一个子集(Sphere语料库)作为知识源。与维基百科相比,Sphere要大几个数量级,能更好地反映互联网上知识的全部多样性。我们发现,尽管存在潜在的覆盖缺口、规模挑战、缺乏结构和质量较低等问题,从Sphere检索仍使最先进的"检索-阅读"系统能够在若干KILT任务上与基于维基百科的模型持平甚至更优,即使我们激进地过滤掉看起来像维基百科的内容也是如此。我们还观察到,虽然在维基百科上单一的密集段落索引可以优于稀疏的BM25版本,但在Sphere上这一点尚无法实现。为了促进这一领域的进一步研究,并最大限度地减少社区对专有黑盒搜索引擎的依赖,我们将分享我们的索引、评估指标和基础设施。 摘要:In order to address the increasing demands of real-world applications, the research for knowledge-intensive NLP (KI-NLP) should advance by capturing the challenges of a truly open-domain environment: web scale knowledge, lack of structure, inconsistent quality, and noise. To this end, we propose a new setup for evaluating existing KI-NLP tasks in which we generalize the background corpus to a universal web snapshot. We repurpose KILT, a standard KI-NLP benchmark initially developed for Wikipedia, and ask systems to use a subset of CCNet - the Sphere corpus - as a knowledge source. In contrast to Wikipedia, Sphere is orders of magnitude larger and better reflects the full diversity of knowledge on the Internet. We find that despite potential gaps of coverage, challenges of scale, lack of structure and lower quality, retrieval from Sphere enables a state-of-the-art retrieve-and-read system to match and even outperform Wikipedia-based models on several KILT tasks - even if we aggressively filter content that looks like Wikipedia. We also observe that while a single dense passage index over Wikipedia can outperform a sparse BM25 version, on Sphere this is not yet possible. To facilitate further research into this area, and minimise the community's reliance on proprietary black box search engines, we will share our indices, evaluation metrics and infrastructure.
【13】 Cascading Adaptors to Leverage English Data to Improve Performance of Question Answering for Low-Resource Languages 标题:利用英语数据的级联适配器来提高低资源语言的问答性能 链接:https://arxiv.org/abs/2112.09866
作者:Hariom A. Pandya,Bhavik Ardeshna,Dr. Brijesh S. Bhatt 机构:Dharmsinh Desai University, Nadiad-Gujarat(India) 摘要:基于Transformer的体系结构在包括问答在内的许多下游任务上取得了显著成果。但另一方面,数据的匮乏使低资源语言难以获得理想的性能。在本文中,我们研究了利用预训练多语言模型提高低资源语言问答性能的可行性。我们在与MLQA数据集类似的七种语言上,基于多语言Transformer架构测试了语言适配器和任务适配器的四种组合。此外,我们还提出了使用语言和任务适配器进行低资源问答的零样本迁移学习。我们观察到,对于低资源语言,堆叠语言适配器和任务适配器可以显著提高多语言Transformer模型的性能。 摘要:Transformer based architectures have shown notable results on many down streaming tasks including question answering. The availability of data, on the other hand, impedes obtaining legitimate performance for low-resource languages. In this paper, we investigate the applicability of pre-trained multilingual models to improve the performance of question answering in low-resource languages. We tested four combinations of language and task adapters using multilingual transformer architectures on seven languages similar to MLQA dataset. Additionally, we have also proposed zero-shot transfer learning of low-resource question answering using language and task adapters. We observed that stacking the language and the task adapters improves the multilingual transformer models' performance significantly for low-resource languages.
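"堆叠"语言适配器与任务适配器的思想可以用如下示意性草图说明:在(冻结的)Transformer层输出之后,依次应用两个带残差连接的瓶颈模块。其中维度、瓶颈大小与插入位置均为假设;论文实际使用的是插入预训练多语言Transformer内部的适配器模块。

```python
import torch
import torch.nn as nn

# 示意性草图:残差瓶颈适配器,先过语言适配器、再过任务适配器
# (维度与插入位置为假设,非论文的具体实现)。
class Adapter(nn.Module):
    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.ReLU()

    def forward(self, h):
        return h + self.up(self.act(self.down(h)))   # 残差瓶颈

lang_adapter = Adapter()   # 设想在目标语言的无标注文本上训练
task_adapter = Adapter()   # 设想在(英语)问答任务上训练

h = torch.randn(2, 16, 768)          # 冻结 Transformer 层输出的隐状态
h = task_adapter(lang_adapter(h))    # 堆叠:先语言、后任务
print(h.shape)
```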
【14】 Improved No-Regret Algorithms for Stochastic Shortest Path with Linear MDP 标题:线性MDP随机最短路径的改进无遗憾算法 链接:https://arxiv.org/abs/2112.09859
作者:Liyu Chen,Rahul Jain,Haipeng Luo 机构:University of Southern California 摘要:我们介绍了两种新的无遗憾算法,用于求解具有线性MDP的随机最短路径(SSP)问题,显著改进了(Vial等人,2021年)仅有的现有结果。我们的第一个算法计算上高效,实现了遗憾界$\widetilde{O}\left(\sqrt{d^3B_{\star}^2T_{\star}K}\right)$,其中$d$是特征空间的维数,$B_{\star}$和$T_{\star}$分别是最优策略的期望代价和命中时间的上界,$K$是回合数。同一算法稍加修改后,还实现了$O\left(\frac{d^3B_{\star}^4}{c_{\min}^2\text{gap}_{\min}}\ln^5\frac{dB_{\star}K}{c_{\min}}\right)$阶的对数遗憾,其中$\text{gap}_{\min}$是最小次优间隔,$c_{\min}$是所有状态-动作对上的最小代价。我们的结果是通过对(Cohen et al., 2021)的有限视界近似给出更简单且改进的分析得到的,该分析具有更小的近似误差,可能具有独立的意义。另一方面,通过在一个全局优化问题中使用方差感知的置信集,我们的第二个算法虽然计算上低效,但实现了第一个"无视界(horizon-free)"遗憾界$\widetilde{O}(d^{3.5}B_{\star}\sqrt{K})$,对$T_{\star}$或$1/c_{\min}$没有多项式依赖,几乎与(Min等人,2021年)的$\Omega(dB_{\star}\sqrt{K})$下界相匹配。 摘要:We introduce two new no-regret algorithms for the stochastic shortest path (SSP) problem with a linear MDP that significantly improve over the only existing results of (Vial et al., 2021). Our first algorithm is computationally efficient and achieves a regret bound $\widetilde{O}\left(\sqrt{d^3B_{\star}^2T_{\star} K}\right)$, where $d$ is the dimension of the feature space, $B_{\star}$ and $T_{\star}$ are upper bounds of the expected costs and hitting time of the optimal policy respectively, and $K$ is the number of episodes. The same algorithm with a slight modification also achieves logarithmic regret of order $O\left(\frac{d^3B_{\star}^4}{c_{\min}^2\text{gap}_{\min}}\ln^5\frac{dB_{\star} K}{c_{\min}} \right)$, where $\text{gap}_{\min}$ is the minimum sub-optimality gap and $c_{\min}$ is the minimum cost over all state-action pairs. Our result is obtained by developing a simpler and improved analysis for the finite-horizon approximation of (Cohen et al., 2021) with a smaller approximation error, which might be of independent interest. On the other hand, using variance-aware confidence sets in a global optimization problem, our second algorithm is computationally inefficient but achieves the first "horizon-free" regret bound $\widetilde{O}(d^{3.5}B_{\star}\sqrt{K})$ with no polynomial dependency on $T_{\star}$ or $1/c_{\min}$, almost matching the $\Omega(dB_{\star}\sqrt{K})$ lower bound from (Min et al., 2021).
【15】 Manifold embedding data-driven mechanics 标题:流形嵌入数据驱动力学 链接:https://arxiv.org/abs/2112.09842
作者:Bahador Bahmani,WaiChing Sun 摘要:本文介绍了一种新的数据驱动方法,利用可逆神经网络生成的流形嵌入,在有限数据下提高无本构关系模拟的鲁棒性、效率和准确性。我们通过训练深度神经网络,将数据从本构流形全局映射到低维欧氏向量空间来实现这一点。由此,我们建立了映射后欧氏向量空间的范数与流形度量之间的关系,为材料数据引入了更符合物理直觉的距离概念。作为回报,这种处理方式使我们能够绕过昂贵的组合优化,从而在数据丰富且维度较高时显著加快无模型模拟的速度。同时,当数据在参数空间中稀疏或分布不均时,嵌入学习也提高了算法的鲁棒性。我们通过数值实验演示并度量了流形嵌入技术在不同情形下的性能,并将所提方法的结果与经典能量范数所得结果进行了比较。 摘要:This article introduces a new data-driven approach that leverages a manifold embedding generated by the invertible neural network to improve the robustness, efficiency, and accuracy of the constitutive-law-free simulations with limited data. We achieve this by training a deep neural network to globally map data from the constitutive manifold onto a lower-dimensional Euclidean vector space. As such, we establish the relation between the norm of the mapped Euclidean vector space and the metric of the manifold and lead to a more physically consistent notion of distance for the material data. This treatment in return allows us to bypass the expensive combinatorial optimization, which may significantly speed up the model-free simulations when data are abundant and of high dimensions. Meanwhile, the learning of embedding also improves the robustness of the algorithm when the data is sparse or distributed unevenly in the parametric space. Numerical experiments are provided to demonstrate and measure the performance of the manifold embedding technique under different circumstances. Results obtained from the proposed method and those obtained via the classical energy norms are compared.
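其中的距离思想可用如下草图说明:一旦可逆网络 f 将本构数据映射到欧氏空间,材料数据之间的"距离"就退化为该空间中的普通范数,昂贵的组合搜索随之简化为最近邻查找。下面的线性可逆映射只是训练好的可逆网络(INN)的占位假设。

```python
import numpy as np

# 示意性草图:在嵌入空间中用范数度量材料数据之间的距离
# (线性可逆映射 f 为占位,实际应为训练好的可逆神经网络)。
rng = np.random.default_rng(4)
A = rng.normal(size=(6, 6)) + 6 * np.eye(6)      # 良态、可逆
f = lambda x: x @ A.T                            # INN 的占位

data = rng.normal(size=(5000, 6))                # 材料数据库(例如应变-应力对)
query = rng.normal(size=(6,))

z_data, z_query = f(data), f(query)
dist = np.linalg.norm(z_data - z_query, axis=1)  # 嵌入空间中的范数
nearest = dist.argmin()
print("closest state:", nearest, "distance:", round(float(dist[nearest]), 3))
# 可逆性保证:f 的逆映射可将选中的点映回本构流形。
```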
【16】 Improving the performance of bagging ensembles for data streams through mini-batching 标题:通过小批量处理提高数据流Bagging集成的性能 链接:https://arxiv.org/abs/2112.09834
作者:Guilherme Cassales,Heitor Gomes,Albert Bifet,Bernhard Pfahringer,Hermes Senger 机构:Federal University of S˜ao Carlos, Brazil, University of Waikato, New Zealand 备注:None 摘要:机器学习应用通常必须应对动态环境:数据以潜在无限长、且具有瞬态行为的连续数据流形式收集。与传统(批式)数据挖掘相比,流处理算法在计算资源和对数据演化的适应性方面有额外要求。它们必须以增量方式处理实例,因为连续的数据流使其无法将数据存储下来进行多趟处理。在这种场景下,集成学习取得了显著的预测性能。集成由一组(多个)单独的分类器实现,天然适合任务并行化。然而,用于捕获概念漂移的增量学习和动态数据结构会增加缓存未命中,削弱并行化的收益。本文提出了一种小批量策略,可以改善多核环境中多种流挖掘集成算法的内存访问局部性和性能。借助一个形式化框架,我们证明了小批量处理可以显著减少重用距离(以及缓存未命中的数量)。在六种不同的最先进集成算法上、使用四个特性各异的基准数据集进行的实验表明,在8核处理器上加速比可达5倍。这些收益的代价是预测性能的小幅下降。 摘要:Often, machine learning applications have to cope with dynamic environments where data are collected in the form of continuous data streams with potentially infinite length and transient behavior. Compared to traditional (batch) data mining, stream processing algorithms have additional requirements regarding computational resources and adaptability to data evolution. They must process instances incrementally because the data's continuous flow prohibits storing data for multiple passes. Ensemble learning achieved remarkable predictive performance in this scenario. Implemented as a set of (several) individual classifiers, ensembles are naturally amenable for task parallelism. However, the incremental learning and dynamic data structures used to capture the concept drift increase the cache misses and hinder the benefit of parallelism. This paper proposes a mini-batching strategy that can improve memory access locality and performance of several ensemble algorithms for stream mining in multi-core environments. With the aid of a formal framework, we demonstrate that mini-batching can significantly decrease the reuse distance (and the number of cache misses). Experiments on six different state-of-the-art ensemble algorithms applying four benchmark datasets with varied characteristics show speedups of up to 5X on 8-core processors. These benefits come at the expense of a small reduction in predictive performance.
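小批量策略的要点是调换循环次序:先缓冲 B 个实例,再让每个集成成员依次处理整个批,而不是让每个实例依次流过所有成员,从而缩短重用距离、改善缓存局部性。下面是一个极简示意(Learner 为本文虚构的占位类,并非论文实现):

```python
from collections import deque

class Learner:
    """占位的流式学习器:真实实现中这里是一个增量分类器。"""
    def __init__(self):
        self.seen = 0
    def partial_fit_batch(self, batch):
        # 同一成员连续处理整个批,其内部状态与该批数据保持在缓存中
        self.seen += len(batch)

def run_stream(stream, ensemble, batch_size=50):
    buf = deque()
    for x in stream:
        buf.append(x)
        if len(buf) == batch_size:
            batch = list(buf)
            buf.clear()
            for member in ensemble:            # 成员循环在外、实例循环在内
                member.partial_fit_batch(batch)
    if buf:                                    # 处理最后一个不满的批
        batch = list(buf)
        for member in ensemble:
            member.partial_fit_batch(batch)

ensemble = [Learner() for _ in range(10)]
run_stream(range(10_000), ensemble)
print([m.seen for m in ensemble])
```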
【17】 Improving Multi-Domain Generalization through Domain Re-labeling 标题:通过域重标注改进多域泛化 链接:https://arxiv.org/abs/2112.09802
作者:Kowshik Thopalli,Sameeksha Katoch,Andreas Spanias,Pavan Turaga,Jayaraman J. Thiagarajan 机构:Arizona State University 摘要:领域泛化(DG)方法旨在开发能够泛化到测试分布不同于训练数据的场景的模型。在本文中,我们重点研究具有挑战性的多源零样本DG问题:来自多个源域的标记训练数据可用,但无法访问来自目标域的数据。尽管这个问题已成为一个重要的研究课题,但令人惊讶的是,将所有源数据汇集在一起并训练单个分类器的简单解决方案在标准基准上具有很强的竞争力。更重要的是,即使是针对跨域不变性进行显式优化的复杂方法,也不一定能带来相对于ERM的显著收益。在本文中,我们首次研究了预先指定的域标签与泛化性能之间的重要联系。通过一个动机性案例研究和分布鲁棒优化算法的一个新变体GroupDRO++,我们首先展示了推断定制的域分组如何能够相对于数据集自带的原始域标签带来一致的改进。随后,我们介绍了一种通用的多域泛化方法MulDEns,该方法使用基于ERM的深度集成(ensembling)主干,并通过元优化算法执行隐式的域重标注。通过对多个标准基准的实证研究,我们表明MulDEns不需要针对特定数据集定制增强策略或训练过程,其表现始终显著优于ERM,并取得最先进的泛化性能,即使与利用域标签的现有方法相比也是如此。 摘要:Domain generalization (DG) methods aim to develop models that generalize to settings where the test distribution is different from the training data. In this paper, we focus on the challenging problem of multi-source zero-shot DG, where labeled training data from multiple source domains is available but with no access to data from the target domain. Though this problem has become an important topic of research, surprisingly, the simple solution of pooling all source data together and training a single classifier is highly competitive on standard benchmarks. More importantly, even sophisticated approaches that explicitly optimize for invariance across different domains do not necessarily provide non-trivial gains over ERM. In this paper, for the first time, we study the important link between pre-specified domain labels and the generalization performance. Using a motivating case-study and a new variant of a distributional robust optimization algorithm, GroupDRO++, we first demonstrate how inferring custom domain groups can lead to consistent improvements over the original domain labels that come with the dataset. Subsequently, we introduce a general approach for multi-domain generalization, MulDEns, that uses an ERM-based deep ensembling backbone and performs implicit domain re-labeling through a meta-optimization algorithm. Using empirical studies on multiple standard benchmarks, we show that MulDEns does not require tailoring the augmentation strategy or the training process specific to a dataset, consistently outperforms ERM by significant margins, and produces state-of-the-art generalization performance, even when compared to existing methods that exploit the domain labels.
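作为参考,下面给出GroupDRO++所基于的标准GroupDRO更新的草图(论文提出的"++"改动,例如推断域分组,未在此展示):为每个组维护一个权重,按组损失对权重做指数上调并归一化,再最小化加权损失。

```python
import torch

# 示意性草图:标准 GroupDRO 的组权重指数更新(非 GroupDRO++ 本身)。
num_groups, eta = 4, 0.1
q = torch.ones(num_groups) / num_groups          # 组权重

def groupdro_step(group_losses):
    global q
    q = q * torch.exp(eta * group_losses.detach())
    q = q / q.sum()                              # 重新归一化
    return (q * group_losses).sum()              # 待反传的鲁棒目标

losses = torch.tensor([0.3, 0.9, 0.5, 0.2], requires_grad=True)
robust_loss = groupdro_step(losses)
robust_loss.backward()
print(q, losses.grad)                            # 损失最大的组获得最大权重
```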
【18】 On the Evolution of the MCTS Upper Confidence Bounds for Trees by Means of Evolutionary Algorithms in the Game of Carcassonne 标题:在Carcassonne博弈中用进化算法演化MCTS的树置信上界 链接:https://arxiv.org/abs/2112.09697
作者:Edgar Galván,Gavin Simpson 机构:Naturally Inspired Computation Research Group, Department of Computer Science, Maynooth University 备注:9 pages, 1 figure, 11 tables 摘要:蒙特卡罗树搜索(MCTS)是一种用于搜索最优决策的基于采样的最佳优先方法。MCTS之所以受欢迎,是因为它在具有挑战性的双人游戏围棋中取得了非凡的成绩;围棋被认为比国际象棋难得多,直到最近还被认为是人工智能方法无法企及的游戏。MCTS的成功在很大程度上取决于树的构建方式,而选择过程在其中起着基础性作用。一种被证明可靠的特定选择机制基于树的置信上界,通常称为UCT。UCT试图通过考虑MCTS统计树中存储的值来较好地平衡探索与利用。但是,要使其正常工作,需要对MCTS的UCT进行一定的调参。在这项工作中,我们使用进化算法(EAs)来演化数学表达式,目标是替代UCT的数学表达式。我们将我们提出的方法,称为MCTS中的进化策略(ES-MCTS),与MCTS UCT的五种变体、star-minimax算法家族的三种变体以及Carcassonne游戏中的随机控制器进行了比较。我们还使用了所提基于EA的控制器的一个变体,称为部分集成于MCTS中的ES。我们展示了ES-MCTS控制器能够超越所有这10个智能控制器,包括鲁棒的MCTS UCT控制器。 摘要:Monte Carlo Tree Search (MCTS) is a sampling best-first method to search for optimal decisions. The MCTS's popularity is based on its extraordinary results in the challenging two-player based game Go, a game considered much harder than Chess and that until very recently was considered infeasible for Artificial Intelligence methods. The success of MCTS depends heavily on how the tree is built and the selection process plays a fundamental role in this. One particular selection mechanism that has proved to be reliable is based on the Upper Confidence Bounds for Trees, commonly referred as UCT. The UCT attempts to nicely balance exploration and exploitation by considering the values stored in the statistical tree of the MCTS. However, some tuning of the MCTS UCT is necessary for this to work well. In this work, we use Evolutionary Algorithms (EAs) to evolve mathematical expressions with the goal to substitute the UCT mathematical expression. We compare our proposed approach, called Evolution Strategy in MCTS (ES-MCTS) against five variants of the MCTS UCT, three variants of the star-minimax family of algorithms as well as a random controller in the Game of Carcassonne. We also use a variant of our proposed EA-based controller, dubbed ES partially integrated in MCTS. We show how the ES-MCTS controller, is able to outperform all these 10 intelligent controllers, including robust MCTS UCT controllers.
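论文所演化替换的正是标准UCT规则:选取"平均价值 + 探索奖励"最大的子节点。下面的草图直接实现该规则(children 用(总价值, 访问次数)二元组示意;探索常数 c 取常用的 sqrt(2)):

```python
import math

# 标准 UCT 选择规则的直接实现(children 为示意数据)。
def uct_select(children, parent_visits, c=math.sqrt(2)):
    def uct(total_value, visits):
        if visits == 0:
            return float("inf")                 # 未访问的子节点优先
        exploit = total_value / visits
        explore = c * math.sqrt(math.log(parent_visits) / visits)
        return exploit + explore
    return max(range(len(children)), key=lambda i: uct(*children[i]))

children = [(10.0, 20), (4.0, 5), (0.0, 0)]
print(uct_select(children, parent_visits=25))   # -> 2(先选未访问节点)
# ES-MCTS 演化的正是上述 exploit + explore 表达式的替代形式。
```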
【19】 Discovering State Variables Hidden in Experimental Data 标题:发现实验数据中隐藏的状态变量 链接:https://arxiv.org/abs/2112.10755
作者:Boyuan Chen,Kuang Huang,Sunand Raghupathi,Ishaan Chandratreya,Qiang Du,Hod Lipson 机构:Columbia University 备注:Project website with code, data, and overview video is at: this https URL 摘要:所有物理定律都被描述为状态变量之间的关系,这些状态变量给出了相关系统动力学的完整且无冗余的描述。然而,尽管计算能力和人工智能已相当普及,识别隐藏状态变量本身的过程始终未能实现自动化。大多数数据驱动的物理现象建模方法仍然假设观测到的数据流已经对应于相关的状态变量。一个关键的挑战是,在仅有高维观测数据的情况下,从零开始确定可能的状态变量集合。在这里,我们提出了一个新原则,直接从视频流确定一个被观测系统可能拥有多少状态变量,以及这些变量可能是什么。我们使用从弹性双摆到火焰等各种物理动力系统的视频记录来证明这种方法的有效性。在没有任何基础物理先验知识的情况下,我们的算法发现了被观测动力学的内在维度,并确定了候选的状态变量集合。我们认为,这种方法有助于推动对日益复杂系统的理解、预测和控制。项目网站:https://www.cs.columbia.edu/~bchen/neural-state-variables 摘要:All physical laws are described as relationships between state variables that give a complete and non-redundant description of the relevant system dynamics. However, despite the prevalence of computing power and AI, the process of identifying the hidden state variables themselves has resisted automation. Most data-driven methods for modeling physical phenomena still assume that observed data streams already correspond to relevant state variables. A key challenge is to identify the possible sets of state variables from scratch, given only high-dimensional observational data. Here we propose a new principle for determining how many state variables an observed system is likely to have, and what these variables might be, directly from video streams. We demonstrate the effectiveness of this approach using video recordings of a variety of physical dynamical systems, ranging from elastic double pendulums to fire flames. Without any prior knowledge of the underlying physics, our algorithm discovers the intrinsic dimension of the observed dynamics and identifies candidate sets of state variables. We suggest that this approach could help catalyze the understanding, prediction and control of increasingly complex systems. Project website is at: https://www.cs.columbia.edu/~bchen/neural-state-variables
【20】 Information Field Theory as Artificial Intelligence 标题:作为人工智能的信息场理论 链接:https://arxiv.org/abs/2112.10133
作者:Torsten Enßlin 机构: Max Planck Institute for Astrophysics, Karl-Schwarzschild-Str., Garching, Germany, Ludwig-Maximilians-Universität München, Geschwister-Scholl-Platz, Munich, Germany 备注:8 pages, no figures, invited talk at MaxEnt2020/2021 摘要:信息场理论(IFT)是关于场的信息理论,是用于信号重建和非参数逆问题的数学框架。这里,"场"指随空间(和时间)连续变化的物理量,而"信息论"指配备相应熵信息度量的贝叶斯概率逻辑。用IFT重建信号是一个类似于训练生成神经网络(GNN)的计算问题。本文从GNN训练的角度重新表述了IFT中的推理,并讨论了IFT与机器学习中所用数值变分推理方法之间的相互借鉴。这一讨论表明,IFT推理可以被视为人工智能的一种特定形式。与经典神经网络相比,基于IFT的GNN由于将专家知识整合进其结构中,因此无需预训练即可运行。 摘要:Information field theory (IFT), the information theory for fields, is a mathematical framework for signal reconstruction and non-parametric inverse problems. Here, fields denote physical quantities that change continuously as a function of space (and time) and information theory refers to Bayesian probabilistic logic equipped with the associated entropic information measures. Reconstructing a signal with IFT is a computational problem similar to training a generative neural network (GNN). In this paper, the inference in IFT is reformulated in terms of GNN training and the cross-fertilization of numerical variational inference methods used in IFT and machine learning are discussed. The discussion suggests that IFT inference can be regarded as a specific form of artificial intelligence. In contrast to classical neural networks, IFT based GNNs can operate without pre-training thanks to incorporating expert knowledge into their architecture.
【21】 Off-Policy Evaluation Using Information Borrowing and Context-Based Switching 标题:利用信息借用和基于上下文切换的离策略评估 链接:https://arxiv.org/abs/2112.09865
作者:Sutanoy Dasgupta,Yabo Niu,Kishan Panaganti,Dileep Kalathil,Debdeep Pati,Bani Mallick 机构:Department of Statistics, Texas A&M University, Department of Electrical and Computer Engineering 备注:23 pages, 6 figures, manuscript under review 摘要:我们考虑上下文老虎机(contextual bandits)中的离策略评估(OPE)问题,其目标是使用由日志策略收集的数据来估计目标策略的价值。最流行的OPE方法是双重鲁棒(DR)估计器的各种变体,该估计器通过将直接法(DM)估计器与一个涉及逆倾向得分(IPS)的校正项相结合而得到。现有算法主要关注减少由较大IPS引起的DR估计器方差的策略。我们提出了一种称为"带信息借用与基于上下文切换的双重鲁棒(DR-IC)估计器"的新方法,同时致力于减少偏差和方差。DR-IC估计器将标准DM估计器替换为一个参数化奖励模型,该模型通过一个依赖于IPS的相关结构从"更接近"的上下文中借用信息。DR-IC估计器还基于特定于上下文的切换规则,在这个修改后的DM估计器和一个修改后的DR估计器之间自适应插值。我们对DR-IC估计器的性能给出了可证明的保证,并在多个基准问题上展示了DR-IC估计器相对于最先进OPE算法的优越性能。 摘要:We consider the off-policy evaluation (OPE) problem in contextual bandits, where the goal is to estimate the value of a target policy using the data collected by a logging policy. Most popular approaches to the OPE are variants of the doubly robust (DR) estimator obtained by combining a direct method (DM) estimator and a correction term involving the inverse propensity score (IPS). Existing algorithms primarily focus on strategies to reduce the variance of the DR estimator arising from large IPS. We propose a new approach called the Doubly Robust with Information borrowing and Context-based switching (DR-IC) estimator that focuses on reducing both bias and variance. The DR-IC estimator replaces the standard DM estimator with a parametric reward model that borrows information from the 'closer' contexts through a correlation structure that depends on the IPS. The DR-IC estimator also adaptively interpolates between this modified DM estimator and a modified DR estimator based on a context-specific switching rule. We give provable guarantees on the performance of the DR-IC estimator. We also demonstrate the superior performance of the DR-IC estimator compared to the state-of-the-art OPE algorithms on a number of benchmark problems.
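DR-IC所改造的"双重鲁棒(DR)"估计器本身可以用几行代码写出:DM奖励模型在目标策略下的预测,加上按IPS加权的日志数据校正项。下面草图中的奖励模型 q_hat 与两个策略均为合成假设;DR-IC的信息借用模型与基于上下文的切换规则并未在此实现。

```python
import numpy as np

# 示意性草图:上下文老虎机的标准双重鲁棒(DR)离策略估计
# (q_hat、日志策略与目标策略均为合成假设)。
rng = np.random.default_rng(5)
n, n_actions = 10_000, 4
a = rng.integers(0, n_actions, n)                # 日志中的动作
r = rng.random(n)                                # 日志中的奖励
mu = np.full(n, 1.0 / n_actions)                 # 日志策略的倾向得分(均匀)
pi = rng.dirichlet(np.ones(n_actions), size=n)   # 目标策略的动作概率

q_hat = rng.random((n, n_actions))               # 占位的 DM 奖励模型

dm_term = (pi * q_hat).sum(axis=1)               # E_{a~pi} q_hat(x, a)
w = pi[np.arange(n), a] / mu                     # 重要性权重(IPS)
correction = w * (r - q_hat[np.arange(n), a])    # IPS 加权的残差校正
v_dr = (dm_term + correction).mean()
print("DR estimate:", round(float(v_dr), 4))
```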
机器翻译,仅供参考