
Machine Learning Academic Digest [9.9]

Author: 公众号-arXiv每日学术速递
Published: 2021-09-16 16:51:41

Update! The H5 page now supports collapsible abstracts for a better reading experience. Click "Read the original" to visit arxivdaily.com, which covers CS | Physics | Mathematics | Economics | Statistics | Finance | Biology | Electrical Engineering, and offers search, bookmarking, and other features.

cs.LG: 77 papers in total today.

Graph-related (graph learning | graph neural networks | graph optimization, etc.) (4 papers)

【1】 AppQ: Warm-starting App Recommendation Based on View Graphs
Link: https://arxiv.org/abs/2109.03798

Authors: Dan Su, Jiqiang Liu, Sencun Zhu, Xiaoyang Wang, Wei Wang, Xiangliang Zhang
Affiliations: Department of Computer Science and Engineering, The Pennsylvania State University (S. Zhu)
Note: 13 pages, 11 figures
Abstract: Current app ranking and recommendation systems are mainly based on user-generated information, e.g., number of downloads and ratings. However, new apps often have few (or even no) user feedback, suffering from the classic cold-start problem. How to quickly identify and then recommend new apps of high quality is a challenging issue. Here, a fundamental requirement is the capability to accurately measure an app's quality based on its inborn features, rather than user-generated features. Since users obtain first-hand experience of an app by interacting with its views, we speculate that the inborn features are largely related to the visual quality of individual views in an app and the ways the views switch to one another. In this work, we propose AppQ, a novel app quality grading and recommendation system that extracts inborn features of apps based on app source code. In particular, AppQ works in parallel to perform code analysis to extract app-level features as well as dynamic analysis to capture view-level layout hierarchy and the switching among views. Each app is then expressed as an attributed view graph, which is converted into a vector and fed to classifiers for recognizing its quality classes. Our evaluation with an app dataset from Google Play reports that AppQ achieves the best performance with accuracy of 85.0%. This shows a lot of promise to warm-start app grading and recommendation systems with AppQ.

【2】 On Event-Driven Knowledge Graph Completion in Digital Factories
Link: https://arxiv.org/abs/2109.03655

Authors: Martin Ringsquandl, Evgeny Kharlamov, Daria Stepanova, Steffen Lamparter, Raffaello Lepratti, Ian Horrocks, Peer Kröger
Affiliations: Ludwig-Maximilians-Universität, Munich, Germany; Oxford University, Oxford, United Kingdom; Max-Planck-Institut für Informatik, Saarbrücken, Germany; Siemens AG CT; Siemens PLM Software, Genoa, Italy
Abstract: Smart factories are equipped with machines that can sense their manufacturing environments, interact with each other, and control production processes. Smooth operation of such factories requires that the machines and engineering personnel that conduct their monitoring and diagnostics share a detailed common industrial knowledge about the factory, e.g., in the form of knowledge graphs. Creation and maintenance of such knowledge is expensive and requires automation. In this work we show how machine learning that is specifically tailored towards industrial applications can help in knowledge graph completion. In particular, we show how knowledge completion can benefit from event logs that are common in smart factories. We evaluate this on the knowledge graph from a real world-inspired smart factory with encouraging results.

【3】 Power to the Relational Inductive Bias: Graph Neural Networks in Electrical Power Grids
Link: https://arxiv.org/abs/2109.03604

Authors: Martin Ringsquandl, Houssem Sellami, Marcel Hildebrandt, Dagmar Beyer, Sylwia Henselmeyer, Sebastian Weber, Mitchell Joblin
Affiliations: Siemens, Germany
Abstract: The application of graph neural networks (GNNs) to the domain of electrical power grids has high potential impact on smart grid monitoring. Even though there is a natural correspondence of power flow to message-passing in GNNs, their performance on power grids is not well-understood. We argue that there is a gap between GNN research driven by benchmarks which contain graphs that differ from power grids in several important aspects. Additionally, inductive learning of GNNs across multiple power grid topologies has not been explored with real-world data. We address this gap by means of (i) defining power grid graph datasets in inductive settings, (ii) an exploratory analysis of graph properties, and (iii) an empirical study of the concrete learning task of state estimation on real-world power grids. Our results show that GNNs are more robust to noise with up to 400% lower error compared to baselines. Furthermore, due to the unique properties of electrical grids, we do not observe the well known over-smoothing phenomenon of GNNs and find the best performing models to be exceptionally deep with up to 13 layers. This is in stark contrast to existing benchmark datasets where the consensus is that 2 to 3 layer GNNs perform best. Our results demonstrate that a key challenge in this domain is to effectively handle long-range dependence.

【4】 Graph-MVP: Multi-View Prototypical Contrastive Learning for Multiplex Graphs
Link: https://arxiv.org/abs/2109.03560

Authors: Baoyu Jing, Yuejia Xiang, Xi Chen, Yu Chen, Hanghang Tong
Affiliations: University of Illinois at Urbana-Champaign; Platform and Content Group, Tencent
Note: Preprint. Work in progress
Abstract: Contrastive Learning (CL) is one of the most popular self-supervised learning frameworks for graph representation learning, which trains a Graph Neural Network (GNN) by discriminating positive and negative node pairs. However, there are two challenges for CL on graphs. On the one hand, traditional CL methods will unavoidably introduce semantic errors since they will treat some semantically similar nodes as negative pairs. On the other hand, most of the existing CL methods ignore the multiplexity nature of the real-world graphs, where nodes are connected by various relations and each relation represents a view of the graph. To address these challenges, we propose a novel Graph Multi-View Prototypical (Graph-MVP) framework to extract node embeddings on multiplex graphs. Firstly, we introduce a Graph Prototypical Contrastive Learning (Graph-PCL) framework to capture both node-level and semantic-level information for each view of multiplex graphs. Graph-PCL captures the node-level information by a simple yet effective data transformation technique. It captures the semantic-level information by an Expectation-Maximization (EM) algorithm, which alternatively performs clustering over node embeddings and parameter updating for GNN. Next, we introduce Graph-MVP based on Graph-PCL to jointly model different views of the multiplex graphs. Our key insight behind Graph-MVP is that different view-specific embeddings of the same node should have similar underlying semantic, based on which we propose two versions of Graph-MVP: Graph-MVP_hard and Graph-MVP_soft to align embeddings across views. Finally, we evaluate the proposed Graph-PCL and Graph-MVP on a variety of real-world datasets and downstream tasks. The experimental results demonstrate the effectiveness of the proposed Graph-PCL and Graph-MVP frameworks.
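To make the semantic-level (prototypical) objective concrete, here is a minimal Python sketch of one EM-style step: cluster the current node embeddings, then pull each node towards its assigned prototype and away from the others. The encoder, cluster count, and temperature are illustrative assumptions, not the exact Graph-PCL formulation.

```python
# Minimal sketch of a prototypical contrastive step (E-step: cluster embeddings,
# M-step: pull each node towards its prototype and away from the others).
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

def prototypical_contrastive_loss(node_emb: torch.Tensor, n_clusters: int = 10,
                                  temperature: float = 0.5) -> torch.Tensor:
    """node_emb: (num_nodes, dim) embeddings produced by any GNN encoder."""
    z = F.normalize(node_emb, dim=1)
    # E-step: cluster the current embeddings to obtain semantic prototypes.
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(z.detach().cpu().numpy())
    protos = F.normalize(torch.tensor(km.cluster_centers_, dtype=z.dtype), dim=1)
    assign = torch.tensor(km.labels_, dtype=torch.long)
    # M-step: treat the assigned prototype as the positive, all others as negatives.
    logits = z @ protos.t() / temperature            # (num_nodes, n_clusters)
    return F.cross_entropy(logits, assign)

# Example usage with random embeddings standing in for GNN output.
loss = prototypical_contrastive_loss(torch.randn(500, 64, requires_grad=True))
loss.backward()
```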

GAN | Adversarial | Attacks | Generation (5 papers)

【1】 Diagnostics-Guided Explanation Generation
Link: https://arxiv.org/abs/2109.03756

Authors: Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein
Affiliations: Department of Computer Science, University of Copenhagen, Denmark
Abstract: Explanations shed light on a machine learning model's rationales and can aid in identifying deficiencies in its reasoning process. Explanation generation models are typically trained in a supervised way given human explanations. When such annotations are not available, explanations are often selected as those portions of the input that maximise a downstream task's performance, which corresponds to optimising an explanation's Faithfulness to a given model. Faithfulness is one of several so-called diagnostic properties, which prior work has identified as useful for gauging the quality of an explanation without requiring annotations. Other diagnostic properties are Data Consistency, which measures how similar explanations are for similar input instances, and Confidence Indication, which shows whether the explanation reflects the confidence of the model. In this work, we show how to directly optimise for these diagnostic properties when training a model to generate sentence-level explanations, which markedly improves explanation quality, agreement with human rationales, and downstream task performance on three complex reasoning tasks.

【2】 Shuffled Patch-Wise Supervision for Presentation Attack Detection
Link: https://arxiv.org/abs/2109.03484

Authors: Alperen Kantarcı, Hasan Dertli, Hazım Kemal Ekenel
Affiliations: Department of Computer Engineering, Istanbul Technical University
Note: Accepted to 20th International Conference of the Biometrics Special Interest Group (BIOSIG 2021) as Oral paper
Abstract: Face anti-spoofing is essential to prevent false facial verification by using a photo, video, mask, or a different substitute for an authorized person's face. Most of the state-of-the-art presentation attack detection (PAD) systems suffer from overfitting, where they achieve near-perfect scores on a single dataset but fail on a different dataset with more realistic data. This problem drives researchers to develop models that perform well under real-world conditions. This is an especially challenging problem for frame-based presentation attack detection systems that use convolutional neural networks (CNN). To this end, we propose a new PAD approach, which combines pixel-wise binary supervision with patch-based CNN. We believe that training a CNN with face patches allows the model to distinguish spoofs without learning background or dataset-specific traces. We tested the proposed method both on the standard benchmark datasets -- Replay-Mobile, OULU-NPU -- and on a real-world dataset. The proposed approach shows its superiority on challenging experimental setups. Namely, it achieves higher performance on OULU-NPU protocol 3, 4 and on inter-dataset real-world experiments.

【3】 Real-World Adversarial Examples involving Makeup Application
Link: https://arxiv.org/abs/2109.03329

Authors: Chang-Sheng Lin, Chia-Yi Hsu, Pin-Yu Chen, Chia-Mu Yu
Affiliations: National Chung Hsing University; National Yang Ming Chiao Tung University; IBM Thomas J. Watson Research Center
Abstract: Deep neural networks have developed rapidly and have achieved outstanding performance in several tasks, such as image classification and natural language processing. However, recent studies have indicated that both digital and physical adversarial examples can fool neural networks. Face-recognition systems are used in various applications that involve security threats from physical adversarial examples. Herein, we propose a physical adversarial attack with the use of full-face makeup. The presence of makeup on the human face is a reasonable possibility, which possibly increases the imperceptibility of attacks. In our attack framework, we combine the cycle-adversarial generative network (cycle-GAN) and a victimized classifier. The Cycle-GAN is used to generate adversarial makeup, and the architecture of the victimized classifier is VGG 16. Our experimental results show that our attack can effectively overcome manual errors in makeup application, such as color and position-related errors. We also demonstrate that the approaches used to train the models can influence physical attacks; the adversarial perturbations crafted from the pre-trained model are affected by the corresponding training data.

【4】 Simple Video Generation using Neural ODEs
Link: https://arxiv.org/abs/2109.03292

Authors: David Kanaa, Vikram Voleti, Samira Ebrahimi Kahou, Christopher Pal
Affiliations: Université de Montréal, Mila; École de technologie supérieure, Mila, CIFAR; Polytechnique Montréal, Mila, CIFAR
Abstract: Despite having been studied to a great extent, the task of conditional generation of sequences of frames, or videos, remains extremely challenging. It is a common belief that a key step towards solving this task resides in modelling accurately both spatial and temporal information in video signals. A promising direction to do so has been to learn latent variable models that predict the future in latent space and project back to pixels, as suggested in recent literature. Following this line of work and building on top of a family of models introduced in prior work, Neural ODE, we investigate an approach that models time-continuous dynamics over a continuous latent space with a differential equation with respect to time. The intuition behind this approach is that these trajectories in latent space could then be extrapolated to generate video frames beyond the time steps for which the model is trained. We show that our approach yields promising results in the task of future frame prediction on the Moving MNIST dataset with 1 and 2 digits.
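A minimal sketch of the latent-dynamics idea follows, assuming a simple fully connected encoder/decoder and a fixed-step Euler integrator in place of an adaptive ODE solver; extrapolation simply means integrating for more steps than the model was trained on.

```python
# Minimal sketch of latent-ODE frame prediction: encode a context frame to z0,
# integrate latent dynamics dz/dt = f(z) forward in time, decode each state to a
# frame. Shapes and layer sizes are illustrative, not the paper's architecture.
import torch, torch.nn as nn

class LatentODEVideo(nn.Module):
    def __init__(self, latent_dim=32, frame_pixels=64 * 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(frame_pixels, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))
        self.dynamics = nn.Sequential(nn.Linear(latent_dim, 128), nn.Tanh(),
                                      nn.Linear(128, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, frame_pixels))

    def forward(self, first_frame, n_steps, dt=0.1):
        z = self.encoder(first_frame)            # (batch, latent_dim)
        frames = []
        for _ in range(n_steps):                 # fixed-step Euler integration
            z = z + dt * self.dynamics(z)        # dz/dt = f(z)
            frames.append(self.decoder(z))
        return torch.stack(frames, dim=1)        # (batch, n_steps, pixels)

model = LatentODEVideo()
context = torch.rand(4, 64 * 64)
future = model(context, n_steps=20)              # can exceed the training horizon
```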

【5】 AWGAN: Empowering High-Dimensional Discriminator Output for Generative Adversarial Networks
Link: https://arxiv.org/abs/2109.03378

Authors: Mengyu Dai, Haibin Hang, Anuj Srivastava
Affiliations: Microsoft; University of Delaware; Florida State University, Department of Statistics
Abstract: Empirically multidimensional discriminator (critic) output can be advantageous, while a solid explanation for it has not been discussed. In this paper, (i) we rigorously prove that high-dimensional critic output has advantage on distinguishing real and fake distributions; (ii) we also introduce an square-root velocity transformation (SRVT) block which further magnifies this advantage. The proof is based on our proposed maximal p-centrality discrepancy which is bounded above by p-Wasserstein distance and perfectly fits the Wasserstein GAN framework with high-dimensional critic output n. We have also showed when n = 1, the proposed discrepancy is equivalent to 1-Wasserstein distance. The SRVT block is applied to break the symmetric structure of high-dimensional critic output and improve the generalization capability of the discriminator network. In terms of implementation, the proposed framework does not require additional hyper-parameter tuning, which largely facilitates its usage. Experiments on image generation tasks show performance improvement on benchmark datasets.
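For reference, here is a minimal sketch of the square-root velocity transform itself, applied to a discretised multi-dimensional critic output treated as a curve; how the SRVT block is wired into the discriminator is a design detail of the paper that this sketch does not attempt to reproduce.

```python
# Minimal sketch of the square-root velocity transform (SRVT) of a discretised
# n-dimensional output treated as a curve f(t): q(t) = f'(t) / sqrt(||f'(t)||).
import numpy as np

def srvt(f: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """f: (T, d) samples of a curve; returns its SRV representation of shape (T-1, d)."""
    velocity = np.diff(f, axis=0)                       # finite-difference f'(t)
    speed = np.linalg.norm(velocity, axis=1, keepdims=True)
    return velocity / np.sqrt(speed + eps)

critic_output = np.random.randn(16, 8)                  # 16 samples of an 8-dim output
q = srvt(critic_output)
print(q.shape)                                          # (15, 8)
```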

Semi-/Weakly-/Un-/Supervised | Uncertainty | Active Learning (4 papers)

【1】 Active Learning by Acquiring Contrastive Examples
Link: https://arxiv.org/abs/2109.03764

Authors: Katerina Margatina, Giorgos Vernikos, Loïc Barrault, Nikolaos Aletras
Affiliations: University of Sheffield; EPFL; HEIG-VD
Note: Accepted at EMNLP 2021
Abstract: Common acquisition functions for active learning use either uncertainty or diversity sampling, aiming to select difficult and diverse data points from the pool of unlabeled data, respectively. In this work, leveraging the best of both worlds, we propose an acquisition function that opts for selecting contrastive examples, i.e. data points that are similar in the model feature space and yet the model outputs maximally different predictive likelihoods. We compare our approach, CAL (Contrastive Active Learning), with a diverse set of acquisition functions in four natural language understanding tasks and seven datasets. Our experiments show that CAL performs consistently better or equal than the best performing baseline across all tasks, on both in-domain and out-of-domain data. We also conduct an extensive ablation study of our method and we further analyze all actively acquired datasets showing that CAL achieves a better trade-off between uncertainty and diversity compared to other strategies.
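A rough sketch of the acquisition rule: score each candidate by how strongly its predicted distribution diverges from those of its nearest neighbours in the model's feature space, then label the top-scoring batch. For brevity the sketch scores every point against its own neighbours, whereas the paper contrasts unlabeled candidates with labelled neighbours; function names and the choice of k are assumptions.

```python
# Minimal sketch of a CAL-style acquisition step using k-nearest neighbours in
# feature space and KL divergence between predicted class distributions.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def contrastive_scores(features, probs, k=10, eps=1e-12):
    """features: (N, d) penultimate-layer embeddings; probs: (N, C) predicted probabilities."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(features)
    _, idx = nn.kneighbors(features)                    # idx[:, 0] is the point itself
    neigh_probs = probs[idx[:, 1:]]                     # (N, k, C)
    p = np.clip(probs[:, None, :], eps, 1.0)
    q = np.clip(neigh_probs, eps, 1.0)
    kl = np.sum(q * (np.log(q) - np.log(p)), axis=-1)   # KL(neighbour || candidate)
    return kl.mean(axis=1)                              # (N,) contrastive score

features = np.random.randn(1000, 32)
probs = np.random.dirichlet(np.ones(5), size=1000)
query_idx = np.argsort(-contrastive_scores(features, probs))[:16]  # batch to label
```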

【2】 Self-supervised Contrastive Cross-Modality Representation Learning for Spoken Question Answering
Link: https://arxiv.org/abs/2109.03381

Authors: Chenyu You, Nuo Chen, Yuexian Zou
Affiliations: Department of Electrical Engineering, Yale University, New Haven, CT, USA; ADSPLAB, School of ECE, Peking University, Shenzhen, China; Peng Cheng Laboratory, Shenzhen, China
Abstract: Spoken question answering (SQA) requires fine-grained understanding of both spoken documents and questions for the optimal answer prediction. In this paper, we propose novel training schemes for spoken question answering with a self-supervised training stage and a contrastive representation learning stage. In the self-supervised stage, we propose three auxiliary self-supervised tasks, including utterance restoration, utterance insertion, and question discrimination, and jointly train the model to capture consistency and coherence among speech documents without any additional data or annotations. We then propose to learn noise-invariant utterance representations in a contrastive objective by adopting multiple augmentation strategies, including span deletion and span substitution. Besides, we design a Temporal-Alignment attention to semantically align the speech-text clues in the learned common space and benefit the SQA tasks. By this means, the training schemes can more effectively guide the generation model to predict more proper answers. Experimental results show that our model achieves state-of-the-art results on three SQA benchmarks.

【3】 Uncertainty Quantification and Experimental Design for large-scale linear Inverse Problems under Gaussian Process Priors
Link: https://arxiv.org/abs/2109.03457

Authors: Cédric Travelletti, David Ginsbourger, Niklas Linde
Affiliations: Institute of Mathematical Statistics and Actuarial Science, University of Bern
Note: under review
Abstract: We consider the use of Gaussian process (GP) priors for solving inverse problems in a Bayesian framework. As is well known, the computational complexity of GPs scales cubically in the number of datapoints. We here show that in the context of inverse problems involving integral operators, one faces additional difficulties that hinder inversion on large grids. Furthermore, in that context, covariance matrices can become too large to be stored. By leveraging results about sequential disintegrations of Gaussian measures, we are able to introduce an implicit representation of posterior covariance matrices that reduces the memory footprint by only storing low rank intermediate matrices, while allowing individual elements to be accessed on-the-fly without needing to build full posterior covariance matrices. Moreover, it allows for fast sequential inclusion of new observations. These features are crucial when considering sequential experimental design tasks. We demonstrate our approach by computing sequential data collection plans for excursion set recovery for a gravimetric inverse problem, where the goal is to provide fine resolution estimates of high density regions inside the Stromboli volcano, Italy. Sequential data collection plans are computed by extending the weighted integrated variance reduction (wIVR) criterion to inverse problems. Our results show that this criterion is able to significantly reduce the uncertainty on the excursion volume, reaching close to minimal levels of residual uncertainty. Overall, our techniques allow the advantages of probabilistic models to be brought to bear on large-scale inverse problems arising in the natural sciences.

【4】 Weakly supervised semantic segmentation of tomographic images in the diagnosis of stroke
Link: https://arxiv.org/abs/2109.01887

Authors: Anna Dobshik, Andrey Tulupov, Vladimir Berikov
Affiliations: Novosibirsk State University, Russia; Sobolev Institute of Mathematics SB RAS
Note: 6 pages, 3 figures, 1 table
Abstract: This paper presents an automatic algorithm for the segmentation of areas affected by an acute stroke on the non-contrast computed tomography brain images. The proposed algorithm is designed for learning in a weakly supervised scenario when some images are labeled accurately, and some images are labeled inaccurately. Wrong labels appear as a result of inaccuracy made by a radiologist in the process of manual annotation of computed tomography images. We propose methods for solving the segmentation problem in the case of inaccurately labeled training data. We use the U-Net neural network architecture with several modifications. Experiments on real computed tomography scans show that the proposed methods increase the segmentation accuracy.

Transfer | Zero/Few/One-Shot | Adaptation (2 papers)

【1】 FedZKT: Zero-Shot Knowledge Transfer towards Heterogeneous On-Device Models in Federated Learning
Link: https://arxiv.org/abs/2109.03775

Authors: Lan Zhang, Xiaoyong Yuan
Affiliations: Michigan Technological University
Note: 13 pages
Abstract: Federated learning enables distributed devices to collaboratively learn a shared prediction model without centralizing on-device training data. Most of the current algorithms require comparable individual efforts to train on-device models with the same structure and size, impeding participation from resource-constrained devices. Given the widespread yet heterogeneous devices nowadays, this paper proposes a new framework supporting federated learning across heterogeneous on-device models via Zero-shot Knowledge Transfer, named by FedZKT. Specifically, FedZKT allows participating devices to independently determine their on-device models. To transfer knowledge across on-device models, FedZKT develops a zero-shot distillation approach contrary to certain prior research based on a public dataset or a pre-trained data generator. To utmostly reduce on-device workload, the resource-intensive distillation task is assigned to the server, which constructs a generator to adversarially train with the ensemble of the received heterogeneous on-device models. The distilled central knowledge will then be sent back in the form of the corresponding on-device model parameters, which can be easily absorbed at the device side. Experimental studies demonstrate the effectiveness and the robustness of FedZKT towards heterogeneous on-device models and challenging federated learning scenarios, such as non-iid data distribution and straggler effects.

【2】 CRNNTL: convolutional recurrent neural network and transfer learning for QSAR modelling
Link: https://arxiv.org/abs/2109.03309

Authors: Yaqin Li, Yongjin Xu, Yi Yu
Affiliations: West China Hospital, Sichuan University, Chengdu, PR China; Department of Chemistry and Molecular Biology, University of Gothenburg, Kemivägen, Gothenburg, Sweden
Abstract: In this study, we propose the convolutional recurrent neural network and transfer learning (CRNNTL) for QSAR modelling. The method was inspired by the applications of polyphonic sound detection and electrocardiogram classification. Our strategy takes advantages of both convolutional and recurrent neural networks for feature extraction, as well as the data augmentation method. Herein, CRNNTL is evaluated on 20 benchmark datasets in comparison with baseline methods. In addition, one isomers based dataset is used to elucidate its ability for both local and global feature extraction. Then, knowledge transfer performance of CRNNTL is tested, especially for small biological activity datasets. Finally, different latent representations from other type of AEs were used for versatility study of our model. The results show the effectiveness of CRNNTL using different latent representation. Moreover, efficient knowledge transfer is achieved to overcome data scarcity considering binding site similarity between different targets.

Reinforcement Learning (2 papers)

【1】 A Deep Reinforcement Learning Approach for Constrained Online Logistics Route Assignment
Link: https://arxiv.org/abs/2109.03467

Authors: Hao Zeng, Yangdong Liu, Dandan Zhang, Kunpeng Han, Haoyuan Hu
Affiliations: Cainiao Network, Wenyi West Road, Xixi Block B, Hangzhou, China
Note: 8 pages, 7 figures
Abstract: As online shopping prevails and e-commerce platforms emerge, there is a tremendous number of parcels being transported every day. Thus, it is crucial for the logistics industry on how to assign a candidate logistics route for each shipping parcel properly as it leaves a significant impact on the total logistics cost optimization and business constraints satisfaction such as transit hub capacity and delivery proportion of delivery providers. This online route-assignment problem can be viewed as a constrained online decision-making problem. Notably, the large amount (beyond ${10^5}$) of daily parcels, the variability and non-Markovian characteristics of parcel information impose difficulties on attaining (near-) optimal solution without violating constraints excessively. In this paper, we develop a model-free DRL approach named PPO-RA, in which Proximal Policy Optimization (PPO) is improved with dedicated techniques to address the challenges for route assignment (RA). The actor and critic networks use attention mechanism and parameter sharing to accommodate each incoming parcel with varying numbers and identities of candidate routes, without modeling non-Markovian parcel arriving dynamics since we make assumption of i.i.d. parcel arrival. We use recorded delivery parcel data to evaluate the performance of PPO-RA by comparing it with widely-used baselines via simulation. The results show the capability of the proposed approach to achieve considerable cost savings while satisfying most constraints.

【2】 Convergence of Batch Asynchronous Stochastic Approximation With Applications to Reinforcement Learning
Link: https://arxiv.org/abs/2109.03445

Authors: Rajeeva L. Karandikar, M. Vidyasagar
Note: 11 pages
Abstract: The stochastic approximation (SA) algorithm is a widely used probabilistic method for finding a solution to an equation of the form $\mathbf{f}(\boldsymbol{\theta}) = \mathbf{0}$ where $\mathbf{f} : \mathbb{R}^d \rightarrow \mathbb{R}^d$, when only noisy measurements of $\mathbf{f}(\cdot)$ are available. In the literature to date, one can make a distinction between "synchronous" updating, whereby the entire vector of the current guess $\boldsymbol{\theta}_t$ is updated at each time, and "asynchronous" updating, whereby only one component of $\boldsymbol{\theta}_t$ is updated. In convex and nonconvex optimization, there is also the notion of "batch" updating, whereby some but not all components of $\boldsymbol{\theta}_t$ are updated at each time $t$. In addition, there is also a distinction between using a "local" clock versus a "global" clock. In the literature to date, convergence proofs when a local clock is used make the assumption that the measurement noise is an i.i.d. sequence, an assumption that does not hold in Reinforcement Learning (RL). In this note, we provide a general theory of convergence for batch asynchronous stochastic approximation (BASA), that works whether the updates use a local clock or a global clock, for the case where the measurement noises form a martingale difference sequence. This is the most general result to date and encompasses all others.
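A toy sketch of batch asynchronous updating with a local clock follows: at each step only a random subset of coordinates is updated from a noisy measurement of f, and each coordinate's step size is driven by its own update count. The linear f and the step-size schedule are illustrative assumptions, not part of the paper.

```python
# Toy batch asynchronous stochastic approximation (BASA) with a local clock,
# solving f(theta) = b - A @ theta = 0 from noisy measurements of f.
import numpy as np

rng = np.random.default_rng(0)
d = 10
A = np.eye(d) + 0.1 * rng.standard_normal((d, d))
b = rng.standard_normal(d)
theta = np.zeros(d)
local_clock = np.zeros(d)                                # per-coordinate update counts

for t in range(20000):
    batch = rng.choice(d, size=3, replace=False)         # coordinates updated this step
    noisy_f = b - A @ theta + 0.1 * rng.standard_normal(d)   # noisy measurement of f(theta)
    local_clock[batch] += 1
    step = 1.0 / local_clock[batch]                      # local-clock step sizes
    theta[batch] += step * noisy_f[batch]

print(np.linalg.norm(A @ theta - b))                     # residual should be small
```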

Symbols | Symbolic Learning (1 paper)

【1】 Signal-domain representation of symbolic music for learning embedding spaces
Link: https://arxiv.org/abs/2109.03454

Authors: Mathieu Prang, Philippe Esling
Affiliations: IRCAM
Abstract: A key aspect of machine learning models lies in their ability to learn efficient intermediate features. However, the input representation plays a crucial role in this process, and polyphonic musical scores remain a particularly complex type of information. In this paper, we introduce a novel representation of symbolic music data, which transforms a polyphonic score into a continuous signal. We evaluate the ability to learn meaningful features from this representation from a musical point of view. Hence, we introduce an evaluation method relying on principled generation of synthetic data. Finally, to test our proposed representation we conduct an extensive benchmark against recent polyphonic symbolic representations. We show that our signal-like representation leads to better reconstruction and disentangled features. This improvement is reflected in the metric properties and in the generation ability of the space learned from our signal-like representation according to music theory properties.

Medical (3 papers)

【1】 Disentangling Alzheimer's disease neurodegeneration from typical brain aging using machine learning
Link: https://arxiv.org/abs/2109.03723

Authors: Gyujoon Hwang, Ahmed Abdulkadir, Guray Erus, Mohamad Habes, Raymond Pomponio, Haochang Shou, Jimit Doshi, Elizabeth Mamourian, Tanweer Rashid, Murat Bilgel, Yong Fan, Aristeidis Sotiras, Dhivya Srinivasan, John C. Morris, Daniel Marcus, Marilyn S. Albert, Nick R. Bryan, Susan M. Resnick, Ilya M. Nasrallah, Christos Davatzikos, David A. Wolk
Note: 4 figures, 3 tables
Abstract: Neuroimaging biomarkers that distinguish between typical brain aging and Alzheimer's disease (AD) are valuable for determining how much each contributes to cognitive decline. Machine learning models can derive multi-variate brain change patterns related to the two processes, including the SPARE-AD (Spatial Patterns of Atrophy for Recognition of Alzheimer's Disease) and SPARE-BA (of Brain Aging) investigated herein. However, substantial overlap between brain regions affected in the two processes confounds measuring them independently. We present a methodology toward disentangling the two. T1-weighted MRI images of 4,054 participants (48-95 years) with AD, mild cognitive impairment (MCI), or cognitively normal (CN) diagnoses from the iSTAGING (Imaging-based coordinate SysTem for AGIng and NeurodeGenerative diseases) consortium were analyzed. First, a subset of AD patients and CN adults were selected based purely on clinical diagnoses to train SPARE-BA1 (regression of age using CN individuals) and SPARE-AD1 (classification of CN versus AD). Second, analogous groups were selected based on clinical and molecular markers to train SPARE-BA2 and SPARE-AD2: amyloid-positive (A+) AD continuum group (consisting of A+AD, A+MCI, and A+ and tau-positive CN individuals) and amyloid-negative (A-) CN group. Finally, the combined group of the AD continuum and A-/CN individuals was used to train SPARE-BA3, with the intention to estimate brain age regardless of AD-related brain changes. Disentangled SPARE models derived brain patterns that were more specific to the two types of the brain changes. Correlation between the SPARE-BA and SPARE-AD was significantly reduced. Correlation of disentangled SPARE-AD was non-inferior to the molecular measurements and to the number of APOE4 alleles, but was less to AD-related psychometric test scores, suggesting contribution of advanced brain aging to these scores.

【2】 fastMRI+: Clinical Pathology Annotations for Knee and Brain Fully Sampled Multi-Coil MRI Data
Link: https://arxiv.org/abs/2109.03812

Authors: Ruiyang Zhao, Burhaneddin Yaman, Yuxin Zhang, Russell Stewart, Austin Dixon, Florian Knoll, Zhengnan Huang, Yvonne W. Lui, Michael S. Hansen, Matthew P. Lungren
Affiliations: University of Wisconsin-Madison, Department of Radiology; University of Wisconsin-Madison, Department of Medical Physics; University of Minnesota, Department of Electrical and Computer Engineering; Stanford University, School of Medicine
Abstract: Improving speed and image quality of Magnetic Resonance Imaging (MRI) via novel reconstruction approaches remains one of the highest impact applications for deep learning in medical imaging. The fastMRI dataset, unique in that it contains large volumes of raw MRI data, has enabled significant advances in accelerating MRI using deep learning-based reconstruction methods. While the impact of the fastMRI dataset on the field of medical imaging is unquestioned, the dataset currently lacks clinical expert pathology annotations, critical to addressing clinically relevant reconstruction frameworks and exploring important questions regarding rendering of specific pathology using such novel approaches. This work introduces fastMRI+, which consists of 16154 subspecialist expert bounding box annotations and 13 study-level labels for 22 different pathology categories on the fastMRI knee dataset, and 7570 subspecialist expert bounding box annotations and 643 study-level labels for 30 different pathology categories for the fastMRI brain dataset. The fastMRI+ dataset is open access and aims to support further research and advancement of medical imaging in MRI reconstruction and beyond.

【3】 A New Non-Negative Matrix Co-Factorisation Approach for Noisy Neonatal Chest Sound Separation
Link: https://arxiv.org/abs/2109.03275

Authors: Ethan Grooby, Jinyuan He, Davood Fattahi, Lindsay Zhou, Arrabella King, Ashwin Ramanathan, Atul Malhotra, Guy A. Dumont, Faezeh Marzbanrad
Affiliations: Monash Children's Hospital and Department of Paediatrics, Monash University
Note: 6 pages, 2 figures. To appear as conference paper at 43rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 1st-5th November 2021
Abstract: Obtaining high-quality heart and lung sounds enables clinicians to accurately assess a newborn's cardio-respiratory health and provide timely care. However, noisy chest sound recordings are common, hindering timely and accurate assessment. A new Non-negative Matrix Co-Factorisation-based approach is proposed to separate noisy chest sound recordings into heart, lung, and noise components to address this problem. This method is achieved through training with 20 high-quality heart and lung sounds, in parallel with separating the sounds of the noisy recording. The method was tested on 68 10-second noisy recordings containing both heart and lung sounds and compared to the current state of the art Non-negative Matrix Factorisation methods. Results show significant improvements in heart and lung sound quality scores respectively, and improved accuracy of 3.6bpm and 1.2bpm in heart and breathing rate estimation respectively, when compared to existing methods.
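As a loose stand-in for the co-factorisation formulation (which this sketch does not reproduce exactly), the following Python shows the general pattern: learn heart and lung spectral bases from clean reference recordings, factorise the noisy magnitude spectrogram with those bases held fixed plus a few free noise components, and recover each source by masking. All names and update rules are illustrative assumptions.

```python
# Sketch of NMF-style source separation with partially fixed bases.
import numpy as np
from sklearn.decomposition import NMF

def learn_bases(clean_spec, n_components):
    """clean_spec: (freq, time) magnitude spectrogram of a clean reference."""
    model = NMF(n_components=n_components, init="random", max_iter=400, random_state=0)
    model.fit(clean_spec.T)                  # sklearn factorises X ~ W H with X = (time, freq)
    return model.components_.T               # (freq, n_components) spectral basis

def separate(noisy_spec, W_heart, W_lung, n_noise=4, n_iter=200, eps=1e-9):
    F_, T_ = noisy_spec.shape
    W = np.hstack([W_heart, W_lung, np.abs(np.random.rand(F_, n_noise))])
    H = np.abs(np.random.rand(W.shape[1], T_))
    fixed = W_heart.shape[1] + W_lung.shape[1]
    for _ in range(n_iter):                   # multiplicative updates (Euclidean NMF)
        H *= (W.T @ noisy_spec) / (W.T @ W @ H + eps)
        W_new = W * (noisy_spec @ H.T) / (W @ H @ H.T + eps)
        W[:, fixed:] = W_new[:, fixed:]       # only the noise basis is free to adapt
    V_hat = W @ H + eps
    heart = noisy_spec * (W[:, :W_heart.shape[1]] @ H[:W_heart.shape[1]]) / V_hat
    lung_cols = slice(W_heart.shape[1], fixed)
    lung = noisy_spec * (W[:, lung_cols] @ H[lung_cols]) / V_hat
    return heart, lung                        # masked heart / lung spectrograms
```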

Distillation | Knowledge Extraction (1 paper)

【1】 Dual Correction Strategy for Ranking Distillation in Top-N Recommender System
Link: https://arxiv.org/abs/2109.03459

Authors: Youngjune Lee, Kee-Eung Kim
Affiliations: School of Computing, KAIST, Daejeon, Republic of Korea; Graduate School of AI, KAIST, Daejeon, Republic of Korea
Abstract: Knowledge Distillation (KD), which transfers the knowledge of a well-trained large model (teacher) to a small model (student), has become an important area of research for practical deployment of recommender systems. Recently, Relaxed Ranking Distillation (RRD) has shown that distilling the ranking information in the recommendation list significantly improves the performance. However, the method still has limitations in that 1) it does not fully utilize the prediction errors of the student model, which makes the training not fully efficient, and 2) it only distills the user-side ranking information, which provides an insufficient view under the sparse implicit feedback. This paper presents Dual Correction strategy for Distillation (DCD), which transfers the ranking information from the teacher model to the student model in a more efficient manner. Most importantly, DCD uses the discrepancy between the teacher model and the student model predictions to decide which knowledge to be distilled. By doing so, DCD essentially provides the learning guidance tailored to "correcting" what the student model has failed to accurately predict. This process is applied for transferring the ranking information from the user-side as well as the item-side to address sparse implicit user feedback. Our experiments show that the proposed method outperforms the state-of-the-art baselines, and ablation studies validate the effectiveness of each component.
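A minimal sketch of the "correction" idea for one user: rank items with both models, pick the items where the two rankings disagree most, and distill the teacher's pairwise order only on those. The pairwise hinge loss here is a placeholder for the paper's actual ranking-distillation loss, and all names are illustrative.

```python
# Sketch of discrepancy-driven ranking distillation for a single user.
import torch

def dcd_style_loss(teacher_scores, student_scores, n_correct=10, margin=0.1):
    """teacher_scores, student_scores: (n_items,) predicted relevance for one user."""
    t_rank = torch.argsort(torch.argsort(-teacher_scores))    # rank position per item
    s_rank = torch.argsort(torch.argsort(-student_scores))
    discrepancy = (t_rank - s_rank).abs().float()
    picked = torch.topk(discrepancy, n_correct).indices       # items to "correct"
    # Keep the teacher's pairwise order among the picked items.
    t_sel, s_sel = teacher_scores[picked], student_scores[picked]
    diff_t = t_sel[:, None] - t_sel[None, :]
    diff_s = s_sel[:, None] - s_sel[None, :]
    should_be_higher = (diff_t > 0).float()
    return (should_be_higher * torch.clamp(margin - diff_s, min=0)).mean()

teacher = torch.randn(100)
student = torch.randn(100, requires_grad=True)
dcd_style_loss(teacher, student).backward()
```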

Recommendation (1 paper)

【1】 DeepAltTrip: Top-k Alternative Itineraries for Trip Recommendation
Link: https://arxiv.org/abs/2109.03535

Authors: Syed Md. Mukit Rashid, Mohammed Eunus Ali, Muhammad Aamir Cheema
Abstract: Trip itinerary recommendation finds an ordered sequence of Points-of-Interest (POIs) from a large number of candidate POIs in a city. In this paper, we propose a deep learning-based framework, called DeepAltTrip, that learns to recommend top-k alternative itineraries for given source and destination POIs. These alternative itineraries would be not only popular given the historical routes adopted by past users but also dissimilar (or diverse) to each other. The DeepAltTrip consists of two major components: (i) Itinerary Net (ITRNet) which estimates the likelihood of POIs on an itinerary by using graph autoencoders and two (forward and backward) LSTMs; and (ii) a route generation procedure to generate k diverse itineraries passing through relevant POIs obtained using ITRNet. For the route generation step, we propose a novel sampling algorithm that can seamlessly handle a wide variety of user-defined constraints. To the best of our knowledge, this is the first work that learns from historical trips to provide a set of alternative itineraries to the users. Extensive experiments conducted on eight popular real-world datasets show the effectiveness and efficacy of our approach over state-of-the-art methods.

Clustering (2 papers)

【1】 A Clustering-aided Ensemble Method for Predicting Ridesourcing Demand in Chicago
Link: https://arxiv.org/abs/2109.03433

Authors: Xiaojian Zhang, Xilei Zhao
Affiliations: Department of Civil and Coastal Engineering, University of Florida
Note: 31 pages, 8 tables, 3 figures
Abstract: Accurately forecasting ridesourcing demand is important for effective transportation planning and policy-making. With the rise of Artificial Intelligence (AI), researchers have started to utilize machine learning models to forecast travel demand, which, in many cases, can produce higher prediction accuracy than statistical models. However, most existing machine-learning studies used a global model to predict the demand and ignored the influence of spatial heterogeneity (i.e., the spatial variations in the impacts of explanatory variables). Spatial heterogeneity can drive the parameter estimations varying over space; failing to consider the spatial variations may limit the model's prediction performance. To account for spatial heterogeneity, this study proposes a Clustering-aided Ensemble Method (CEM) to forecast the zone-to-zone (census-tract-to-census-tract) travel demand for ridesourcing services. Specifically, we develop a clustering framework to split the origin-destination pairs into different clusters and ensemble the cluster-specific machine learning models for prediction. We implement and test the proposed methodology by using the ridesourcing-trip data in Chicago. The results show that, with a more transparent and flexible model structure, the CEM significantly improves the prediction accuracy than the benchmark models (i.e., global machine-learning and statistical models directly trained on all observations). This study offers transportation researchers and practitioners a new methodology of travel demand forecasting, especially for new travel modes like ridesourcing and micromobility.
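A minimal sketch of the clustering-aided ensemble follows, assuming generic origin-destination features and scikit-learn components (the paper's actual feature set and base learners may differ): cluster the OD pairs, fit one regressor per cluster, and route each prediction to its cluster's model.

```python
# Sketch of a clustering-aided ensemble regressor for zone-to-zone demand.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingRegressor

class ClusteringAidedEnsemble:
    def __init__(self, n_clusters=5):
        self.kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
        self.models = {}

    def fit(self, X, y):
        labels = self.kmeans.fit_predict(X)
        for c in np.unique(labels):                  # one model per cluster
            m = GradientBoostingRegressor(random_state=0)
            self.models[c] = m.fit(X[labels == c], y[labels == c])
        return self

    def predict(self, X):
        labels = self.kmeans.predict(X)
        out = np.empty(len(X))
        for c, model in self.models.items():
            mask = labels == c
            if mask.any():
                out[mask] = model.predict(X[mask])
        return out

# X: one row per origin-destination pair (e.g. socio-demographics, land use, distance).
X, y = np.random.rand(2000, 12), np.random.rand(2000)
print(ClusteringAidedEnsemble().fit(X, y).predict(X[:5]))
```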

【2】 Federated Learning Beyond the Star: Local D2D Model Consensus with Global Cluster Sampling
Link: https://arxiv.org/abs/2109.03350

Authors: Frank Po-Chen Lin, Seyyedali Hosseinalipour, Sheikh Shams Azam, Christopher G. Brinton, Nicolò Michelusi
Affiliations: School of Electrical and Computer Engineering, Purdue University, IN, USA; School of Electrical, Computer and Energy Engineering, Arizona State University, AZ, USA
Note: This paper has been published in IEEE Global Communications Conference 2021 (Globecom). arXiv admin note: substantial text overlap with arXiv:2103.10481
Abstract: Federated learning has emerged as a popular technique for distributing model training across the network edge. Its learning architecture is conventionally a star topology between the devices and a central server. In this paper, we propose two timescale hybrid federated learning (TT-HF), which migrates to a more distributed topology via device-to-device (D2D) communications. In TT-HF, local model training occurs at devices via successive gradient iterations, and the synchronization process occurs at two timescales: (i) macro-scale, where global aggregations are carried out via device-server interactions, and (ii) micro-scale, where local aggregations are carried out via D2D cooperative consensus formation in different device clusters. Our theoretical analysis reveals how device, cluster, and network-level parameters affect the convergence of TT-HF, and leads to a set of conditions under which a convergence rate of O(1/t) is guaranteed. Experimental results demonstrate the improvements in convergence and utilization that can be obtained by TT-HF over state-of-the-art federated learning baselines.
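A toy sketch of the two-timescale synchronisation pattern with placeholder local objectives: devices take local gradient steps, clusters periodically run D2D consensus averaging (micro-scale), and the server occasionally aggregates across clusters (macro-scale). The cluster structure, step counts, and quadratic losses are assumptions for illustration only.

```python
# Toy two-timescale hybrid federated learning loop (local SGD + D2D consensus
# inside clusters + infrequent global aggregation at the server).
import numpy as np

rng = np.random.default_rng(0)
n_devices, dim = 12, 5
clusters = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7], 2: [8, 9, 10, 11]}
params = rng.standard_normal((n_devices, dim))
targets = rng.standard_normal((n_devices, dim))           # stand-in local optima

def local_gradient(i):
    return params[i] - targets[i]                          # gradient of 0.5*||w - t_i||^2

LOCAL_STEPS_PER_D2D, D2D_ROUNDS_PER_GLOBAL, lr = 5, 4, 0.1
for global_round in range(10):
    for d2d_round in range(D2D_ROUNDS_PER_GLOBAL):
        for _ in range(LOCAL_STEPS_PER_D2D):               # micro: local SGD
            for i in range(n_devices):
                params[i] -= lr * local_gradient(i)
        for members in clusters.values():                  # micro: D2D consensus
            params[members] = params[members].mean(axis=0)
    sampled = [members[0] for members in clusters.values()]
    global_model = params[sampled].mean(axis=0)            # macro: global aggregation
    params[:] = global_model                               # broadcast back to devices
```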

Point Clouds | SLAM | Radar | LiDAR | Depth/RGB-D (1 paper)

【1】 Panoptic nuScenes: A Large-Scale Benchmark for LiDAR Panoptic Segmentation and Tracking
Link: https://arxiv.org/abs/2109.03805

Authors: Whye Kit Fong, Rohit Mohan, Juana Valeria Hurtado, Lubing Zhou, Holger Caesar, Oscar Beijbom, Abhinav Valada
Affiliations: Department of Computer Science, University of Freiburg
Note: The benchmark is available at this https URL and this https URL
Abstract: Panoptic scene understanding and tracking of dynamic agents are essential for robots and automated vehicles to navigate in urban environments. As LiDARs provide accurate illumination-independent geometric depictions of the scene, performing these tasks using LiDAR point clouds provides reliable predictions. However, existing datasets lack diversity in the type of urban scenes and have a limited number of dynamic object instances which hinders both learning of these tasks as well as credible benchmarking of the developed methods. In this paper, we introduce the large-scale Panoptic nuScenes benchmark dataset that extends our popular nuScenes dataset with point-wise groundtruth annotations for semantic segmentation, panoptic segmentation, and panoptic tracking tasks. To facilitate comparison, we provide several strong baselines for each of these tasks on our proposed dataset. Moreover, we analyze the drawbacks of the existing metrics for the panoptic tracking problem and propose a novel instance-centric metric that addresses the concerns. We present extensive experiments that demonstrate the utility of Panoptic nuScenes compared to existing datasets and make the online evaluation server available at nuScenes.org. We believe that this extension will accelerate the research of novel methods for scene understanding of dynamic urban environments.

Reasoning | Analysis | Understanding | Explanation (2 papers)

【1】 Sample and Communication-Efficient Decentralized Actor-Critic Algorithms with Finite-Time Analysis
Link: https://arxiv.org/abs/2109.03699

Authors: Ziyi Chen, Yi Zhou, Rongrong Chen, Shaofeng Zou
Affiliations: Department of Electrical and Computer Engineering, University of Utah, Salt Lake City, UT; Department of Electrical Engineering, University at Buffalo, Buffalo, NY
Note: 40 pages, 2 figures
Abstract: Actor-critic (AC) algorithms have been widely adopted in decentralized multi-agent systems to learn the optimal joint control policy. However, existing decentralized AC algorithms either do not preserve the privacy of agents or are not sample and communication-efficient. In this work, we develop two decentralized AC and natural AC (NAC) algorithms that are private, and sample and communication-efficient. In both algorithms, agents share noisy information to preserve privacy and adopt mini-batch updates to improve sample and communication efficiency. Particularly for decentralized NAC, we develop a decentralized Markovian SGD algorithm with an adaptive mini-batch size to efficiently compute the natural policy gradient. Under Markovian sampling and linear function approximation, we prove the proposed decentralized AC and NAC algorithms achieve the state-of-the-art sample complexities $\mathcal{O}\big(\epsilon^{-2}\ln(\epsilon^{-1})\big)$ and $\mathcal{O}\big(\epsilon^{-3}\ln(\epsilon^{-1})\big)$, respectively, and the same small communication complexity $\mathcal{O}\big(\epsilon^{-1}\ln(\epsilon^{-1})\big)$. Numerical experiments demonstrate that the proposed algorithms achieve lower sample and communication complexities than the existing decentralized AC algorithm.

【2】 Understanding and Preparing Data of Industrial Processes for Machine Learning Applications
Link: https://arxiv.org/abs/2109.03469

Authors: Philipp Fleck, Manfred Kügel, Michael Kommenda
Affiliations: Heuristic and Evolutionary Algorithms Laboratory, University of Applied Sciences Upper Austria, Softwarepark, Hagenberg i.M., Austria; Primetals Technologies Austria GmbH, Turmstraße, Linz, Austria
Abstract: Industrial applications of machine learning face unique challenges due to the nature of raw industry data. Preprocessing and preparing raw industrial data for machine learning applications is a demanding task that often takes more time and work than the actual modeling process itself and poses additional challenges. This paper addresses one of those challenges, specifically, the challenge of missing values due to sensor unavailability at different production units of nonlinear production lines. In cases where only a small proportion of the data is missing, those missing values can often be imputed. In cases of large proportions of missing data, imputing is often not feasible, and removing observations containing missing values is often the only option. This paper presents a technique, that allows to utilize all of the available data without the need of removing large amounts of observations where data is only partially available. We do not only discuss the principal idea of the presented method, but also show different possible implementations that can be applied depending on the data at hand. Finally, we demonstrate the application of the presented method with data from a steel production plant.

Detection (3 papers)

【1】 RoadAtlas: Intelligent Platform for Automated Road Defect Detection and Asset Management
Link: https://arxiv.org/abs/2109.03385

Authors: Zhuoxiao Chen, Yiyun Zhang, Yadan Luo, Zijian Wang, Jinjiang Zhong, Anthony Southon
Affiliations: The University of Queensland; Logan City Council, Australia
Abstract: With the rapid development of intelligent detection algorithms based on deep learning, much progress has been made in automatic road defect recognition and road marking parsing. This can effectively address the issue of an expensive and time-consuming process for professional inspectors to review the street manually. Towards this goal, we present RoadAtlas, a novel end-to-end integrated system that can support 1) road defect detection, 2) road marking parsing, 3) a web-based dashboard for presenting and inputting data by users, and 4) a backend containing a well-structured database and developed APIs.

【2】 DexRay: A Simple, yet Effective Deep Learning Approach to Android Malware Detection based on Image Representation of Bytecode 标题:DexRay:一种简单有效的基于字节码图像表示的Android恶意软件检测深度学习方法 链接:https://arxiv.org/abs/2109.03326

作者:Nadia Daoudi,Jordan Samhi,Abdoul Kader Kabore,Kevin Allix,Tegawendé F. Bissyandé,Jacques Klein 机构:SnT, University of Luxembourg, Avenue J.F Kennedy, L-, Luxembourg, Luxembourg 备注:This manuscript has been accepted at MLHat 2021, and it will be archived in Springer Communications in Computer and Information Science (CCIS) 摘要:近年来,计算机视觉取得了一些进展,深度表征学习研究提供了前所未有的性能。因此,图像格式对恶意软件检测等其他领域具有吸引力,在这些领域中,对图像的深入学习可以减少对全面手工制作的功能的需求,从而推广到不同的恶意软件变体。我们假设这一研究方向可能成为Android恶意软件检测的下一个前沿,因此需要一个明确的路线图,以确保新方法确实带来新的贡献。通过开发和评估基于图像的恶意软件检测的基线管道,我们提供了第一个构建块。我们提出了DexRay,它将app-DEX文件的字节码转换为灰度“向量”图像,并将其输入到一维卷积神经网络模型。由于设计选择的极端基本性质,我们将DexRay视为基础,从而可以推断恶意软件检测中基于图像的学习可以获得的最低性能。DexRay在158k多个应用程序上的性能评估表明,虽然简单,但我们的方法有效,检测率高(F1分数=0.96)。最后,我们研究了时间衰减和图像大小调整对DexRay性能的影响,并评估了其抗混淆能力。本正在进行的工作论文通过提供一种合理、简单但有效的方法(使用可用的人工制品)为基于深度学习的恶意软件检测领域做出了贡献,该方法可以作为界定许多需要调查的深刻问题的基础,以充分开发该领域。 摘要:Computer vision has witnessed several advances in recent years, with unprecedented performance provided by deep representation learning research. Image formats thus appear attractive to other fields such as malware detection, where deep learning on images alleviates the need for comprehensively hand-crafted features generalising to different malware variants. We postulate that this research direction could become the next frontier in Android malware detection, and therefore requires a clear roadmap to ensure that new approaches indeed bring novel contributions. We contribute with a first building block by developing and assessing a baseline pipeline for image-based malware detection with straightforward steps. We propose DexRay, which converts the bytecode of the app DEX files into grey-scale "vector" images and feeds them to a 1-dimensional Convolutional Neural Network model. We view DexRay as foundational due to the exceedingly basic nature of the design choices, allowing to infer what could be a minimal performance that can be obtained with image-based learning in malware detection. The performance of DexRay evaluated on over 158k apps demonstrates that, while simple, our approach is effective with a high detection rate(F1-score= 0.96). Finally, we investigate the impact of time decay and image-resizing on the performance of DexRay and assess its resilience to obfuscation. This work-in-progress paper contributes to the domain of Deep Learning based Malware detection by providing a sound, simple, yet effective approach (with available artefacts) that can be the basis to scope the many profound questions that will need to be investigated to fully develop this domain.
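按摘要描述的流水线(DEX字节码 -> 灰度一维"向量"图像 -> 一维卷积神经网络),下面给出一个最简的PyTorch示意;向量长度、网络层数与卷积核大小均为演示假设,并非论文中的具体配置。

```python
import numpy as np
import torch
import torch.nn as nn

IMG_LEN = 128 * 128  # 假设的一维"图像"长度,论文中的具体尺寸请以原文为准

def dex_to_vector(dex_bytes: bytes, length: int = IMG_LEN) -> torch.Tensor:
    """把 DEX 字节流映射为 [0,1] 的灰度一维向量(截断或补零到固定长度)。"""
    arr = np.frombuffer(dex_bytes, dtype=np.uint8).astype(np.float32) / 255.0
    arr = arr[:length] if arr.size >= length else np.pad(arr, (0, length - arr.size))
    return torch.from_numpy(arr).unsqueeze(0)  # 形状 (1, length):单通道

class Simple1DCNN(nn.Module):
    """一个极简的一维卷积分类器(恶意/良性二分类)。"""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=12, stride=4), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=12, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(32, 2)

    def forward(self, x):          # x: (batch, 1, IMG_LEN)
        h = self.features(x).squeeze(-1)
        return self.classifier(h)

# 用随机字节演示前向传播
x = dex_to_vector(bytes(np.random.randint(0, 256, 50000, dtype=np.uint8)))
logits = Simple1DCNN()(x.unsqueeze(0))
print(logits.shape)  # torch.Size([1, 2])
```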

【3】 Amazon SageMaker Clarify: Machine Learning Bias Detection and Explainability in the Cloud 标题:Amazon SageMaker Clarify:云中的机器学习偏差检测和可解释性 链接:https://arxiv.org/abs/2109.03285

作者:Michaela Hardt,Xiaoguang Chen,Xiaoyi Cheng,Michele Donini,Jason Gelman,Satish Gollaprolu,John He,Pedro Larroy,Xinyu Liu,Nick McCarthy,Ashish Rathi,Scott Rees,Ankit Siva,ErhYuan Tsai,Keerthan Vasist,Pinar Yilmaz,Muhammad Bilal Zafar,Sanjiv Das,Kevin Haas,Tyler Hill,Krishnaram Kenthapadi 机构:Amazon Web Services, Appeared in the proceedings of the ,th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ,) 备注:None 摘要:理解机器学习(ML)模型的预测及其潜在偏差仍然是一项具有挑战性的劳动密集型任务,这取决于应用程序、数据集和特定模型。我们介绍了Amazon SageMaker Clarify,这是Amazon SageMaker于2020年12月推出的一项可解释性功能,通过识别偏差和解释预测,提供对数据和ML模型的见解。它深入集成到Amazon SageMaker中,这是一种完全托管的服务,使数据科学家和开发人员能够以任何规模构建、训练和部署ML模型。Clarify支持在数据准备、模型评估和部署后监控期间跨ML生命周期进行偏差检测和特征重要性计算。我们概述了源自客户需求的设计目标、模块化体系结构,以及偏差与解释计算的方法。此外,我们还描述了遇到的技术挑战以及我们必须进行的权衡。为了进行说明,我们讨论了两个客户用例。我们展示了我们的部署结果,包括定性客户反馈和定量评估。最后,我们总结经验教训,并讨论在实践中成功采用公平和解释工具的最佳实践。 摘要:Understanding the predictions made by machine learning (ML) models and their potential biases remains a challenging and labor-intensive task that depends on the application, the dataset, and the specific model. We present Amazon SageMaker Clarify, an explainability feature for Amazon SageMaker that launched in December 2020, providing insights into data and ML models by identifying biases and explaining predictions. It is deeply integrated into Amazon SageMaker, a fully managed service that enables data scientists and developers to build, train, and deploy ML models at any scale. Clarify supports bias detection and feature importance computation across the ML lifecycle, during data preparation, model evaluation, and post-deployment monitoring. We outline the desiderata derived from customer input, the modular architecture, and the methodology for bias and explanation computations. Further, we describe the technical challenges encountered and the tradeoffs we had to make. For illustration, we discuss two customer use cases. We present our deployment results including qualitative customer feedback and a quantitative evaluation. Finally, we summarize lessons learned, and discuss best practices for the successful adoption of fairness and explanation tools in practice.

分类|识别(4篇)

【1】 Highly Scalable and Provably Accurate Classification in Poincare Balls 标题:Poincare球的高度可伸缩性和可证明的准确分类 链接:https://arxiv.org/abs/2109.03781

作者:Eli Chien,Chao Pan,Puoya Tabaghi,Olgica Milenkovic 机构:ECE, University of Illinois, Urbana Champaign, N Goodwin Ave, Urbana, Illinois , USA 备注:A short version of this paper appears in ICDM 2021 摘要:许多具有实际意义的高维大容量数据集具有由树、图或时间序列诱导的层次结构。这样的数据集很难在欧几里德空间中处理,人们通常会在其他空间形式中寻找低维嵌入来执行所需的学习任务。对于分层数据,选择的空间是双曲空间,因为它保证了树状结构的低失真嵌入。不幸的是,双曲空间的几何性质在欧几里德空间中没有遇到,这在试图严格分析算法解时带来了挑战。在这里,我们首次建立了一个统一的框架,用于学习可伸缩的简单双曲线性分类器,并提供可证明的性能保证。我们的方法的要点是专注于庞加莱球模型,并使用切线空间形式来制定分类问题。我们的结果包括一个新的双曲线和二阶感知器算法,以及一个高效和高精度的双曲线支持向量机分类器凸优化设置。所有算法都可以证明是收敛的,并且具有高度的可扩展性,因为它们的复杂性与欧几里德算法相当。它们在由数百万个点组成的合成数据集以及复杂的真实数据集(如单细胞RNA序列表达测量、CIFAR10、Fashion MNIST和mini ImageNet)上的性能精确度。 摘要:Many high-dimensional and large-volume data sets of practical relevance have hierarchical structures induced by trees, graphs or time series. Such data sets are hard to process in Euclidean spaces and one often seeks low-dimensional embeddings in other space forms to perform required learning tasks. For hierarchical data, the space of choice is a hyperbolic space since it guarantees low-distortion embeddings for tree-like structures. Unfortunately, the geometry of hyperbolic spaces has properties not encountered in Euclidean spaces that pose challenges when trying to rigorously analyze algorithmic solutions. Here, for the first time, we establish a unified framework for learning scalable and simple hyperbolic linear classifiers with provable performance guarantees. The gist of our approach is to focus on Poincar\'e ball models and formulate the classification problems using tangent space formalisms. Our results include a new hyperbolic and second-order perceptron algorithm as well as an efficient and highly accurate convex optimization setup for hyperbolic support vector machine classifiers. All algorithms provably converge and are highly scalable as they have complexities comparable to those of their Euclidean counterparts. Their performance accuracies on synthetic data sets comprising millions of points, as well as on complex real-world data sets such as single-cell RNA-seq expression measurements, CIFAR10, Fashion-MNIST and mini-ImageNet.
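摘要提到"聚焦Poincaré球模型并用切空间形式化来表述分类问题"。下面的示意只演示这一思路中最基础的一步:用原点处的对数映射把球内的点映到切空间,再在切空间中训练欧氏线性分类器;这并不是论文提出的双曲感知机或双曲SVM算法本身,数据也是演示用的玩具数据。

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def log_map_origin(y: np.ndarray, eps: float = 1e-9) -> np.ndarray:
    """Poincaré 球(曲率 -1)中以原点为基点的对数映射:log_0(y) = arctanh(||y||) * y / ||y||。"""
    norm = np.linalg.norm(y, axis=-1, keepdims=True)
    norm = np.clip(norm, eps, 1 - eps)          # 点必须位于单位开球内
    return np.arctanh(norm) * y / norm

# 生成两类位于 Poincaré 球内的玩具数据(仅为演示,并非论文实验设置)
rng = np.random.default_rng(0)
x0 = rng.normal(loc=-0.3, scale=0.1, size=(200, 2))
x1 = rng.normal(loc=+0.3, scale=0.1, size=(200, 2))
X = np.vstack([x0, x1])
X = X / np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True) + 1e-3)  # 压回单位球内
y = np.array([0] * 200 + [1] * 200)

# "切空间形式化"的最简体现:先做对数映射,再在切空间中训练线性分类器
clf = LogisticRegression().fit(log_map_origin(X), y)
print("训练集准确率:", clf.score(log_map_origin(X), y))
```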

【2】 Forget me not: A Gentle Reminder to Mind the Simple Multi-Layer Perceptron Baseline for Text Classification 标题:别忘了我:温文尔雅地提醒你记住文本分类的简单多层感知器基线(Simple-Multi-Layer Perceptron Baseline) 链接:https://arxiv.org/abs/2109.03777

作者:Lukas Galke,Ansgar Scherp 机构:University of Kiel ZBW, Germany, University of Ulm 备注:5 pages 摘要:图神经网络引发了基于图的文本分类的复兴。我们证明了一个简单的MLP基线已经在基准数据集上实现了相当的性能,质疑了合成图结构的重要性。当考虑归纳情景时,即:。e、 ,当向语料库添加新文档时,一个简单的MLP甚至比大多数基于图的模型更出色。我们进一步微调DistilBERT进行比较,发现它优于所有最先进的模型。我们建议未来的研究至少使用MLP基线来分析结果。我们为此类基线的设计和训练提供建议。 摘要:Graph neural networks have triggered a resurgence of graph-based text classification. We show that already a simple MLP baseline achieves comparable performance on benchmark datasets, questioning the importance of synthetic graph structures. When considering an inductive scenario, i. e., when adding new documents to a corpus, a simple MLP even outperforms most graph-based models. We further fine-tune DistilBERT for comparison and find that it outperforms all state-of-the-art models. We suggest that future studies use at least an MLP baseline to contextualize the results. We provide recommendations for the design and training of such a baseline.
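作为参照,下面给出摘要所说的"简单MLP基线"的一个最小可运行示意(TF-IDF词袋特征 + 多层感知器,不构造任何图结构);语料、隐藏层大小等均为演示假设,与论文的实验设置未必一致。

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier

# 一个"词袋 + MLP"的最简文本分类基线
texts = ["graph neural networks for text", "simple baselines matter",
         "molecular property prediction", "protein structure and folding"]
labels = [0, 0, 1, 1]

baseline = make_pipeline(
    TfidfVectorizer(),                                   # 不构造任何图结构
    MLPClassifier(hidden_layer_sizes=(256,), max_iter=500, random_state=0),
)
baseline.fit(texts, labels)
print(baseline.predict(["baselines for graph text classification"]))
```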

【3】 Open Aspect Target Sentiment Classification with Natural Language Prompts 标题:基于自然语言提示的开放式方面目标情感分类 链接:https://arxiv.org/abs/2109.03685

作者:Ronald Seoh,Ian Birle,Mrinal Tak,Haw-Shiuan Chang,Brian Pinette,Alfred Hough 机构: University of Massachusetts Amherst, Lexalytics, Inc. 摘要:对于许多商业应用程序,我们经常试图分析与商业产品任意方面相关的情绪,尽管标签数量非常有限,甚至根本没有任何标签。然而,现有的方面目标情绪分类(ATSC)模型在没有注释数据集的情况下是不可训练的。即使使用标记数据,它们也无法达到令人满意的性能。为了解决这个问题,我们提出了一些简单的方法,通过自然语言提示更好地解决ATSC问题,使任务能够在Zero-Shot情况下完成,并增强监督设置,特别是在少样本(few-shot)情况下。在SemEval 2014 Task 4笔记本电脑域的少样本设置下,我们将ATSC重新表述为NLI任务的方法优于有监督的SOTA方法,准确率最多高出24.13个点,宏F1最多高出33.14个点。此外,我们还证明,我们的提示也可以处理隐含的方面:我们的模型在检测方面类别(如食物)的情绪时达到了77%左右的准确率,而这些方面类别不一定在文本中明确出现,即使我们仅使用16条评论中明确提到的方面术语(如fajitas)来训练模型——而无提示基线的准确率仅为65%左右。 摘要:For many business applications, we often seek to analyze sentiments associated with any arbitrary aspects of commercial products, despite having a very limited amount of labels or even without any labels at all. However, existing aspect target sentiment classification (ATSC) models are not trainable if annotated datasets are not available. Even with labeled data, they fall short of reaching satisfactory performance. To address this, we propose simple approaches that better solve ATSC with natural language prompts, enabling the task under zero-shot cases and enhancing supervised settings, especially for few-shot cases. Under the few-shot setting for SemEval 2014 Task 4 laptop domain, our method of reformulating ATSC as an NLI task outperforms supervised SOTA approaches by up to 24.13 accuracy points and 33.14 macro F1 points. Moreover, we demonstrate that our prompts could handle implicitly stated aspects as well: our models reach about 77% accuracy on detecting sentiments for aspect categories (e.g., food), which do not necessarily appear within the text, even though we trained the models only with explicitly mentioned aspect terms (e.g., fajitas) from just 16 reviews - while the accuracy of the no-prompt baseline is only around 65%.
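下面的示意展示"把ATSC改写为NLI/Zero-Shot任务"这一思路:用现成的NLI模型,把方面与情感写进假设模板再做蕴含判断。其中的模型名(facebook/bart-large-mnli)和提示模板均为演示假设,并非论文实际使用的提示。

```python
from transformers import pipeline

# 用现成的 NLI 模型把"方面-情感分类"改写成蕴含判断
nli = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

review = "The fajitas were amazing but the service was painfully slow."
for aspect in ["food", "service"]:
    result = nli(
        review,
        candidate_labels=["positive", "negative", "neutral"],
        hypothesis_template=f"The sentiment of the {aspect} is {{}}.",
    )
    print(aspect, "->", result["labels"][0])
```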

【4】 Cross-lingual Offensive Language Identification for Low Resource Languages: The Case of Marathi 标题:低资源语言的跨语言攻击性语言识别--以马拉提语为例 链接:https://arxiv.org/abs/2109.03552

作者:Saurabh Gaikwad,Tharindu Ranasinghe,Marcos Zampieri,Christopher M. Homan 机构:Rochester Institute of Technology, USA, University of Wolverhampton, UK 备注:Accepted to RANLP 2021 摘要:攻击性语言在社交媒体上的广泛存在推动了能够自动识别此类内容的系统的发展。除了几个显著的例外,大多数关于自动攻击性语言识别的研究都涉及英语。为了解决这个缺点,我们引入了MOLD,即马拉地攻击性语言数据集。MOLD是第一个为马拉地语编译的同类数据集,从而为低资源印度-雅利安语的研究开辟了一个新领域。我们展示了在该数据集上进行的若干机器学习实验的结果,包括zero-shot实验,以及其他利用孟加拉语、英语和印地语现有数据、在最先进的跨语言Transformer上进行的迁移学习实验。 摘要:The widespread presence of offensive language on social media motivated the development of systems capable of recognizing such content automatically. Apart from a few notable exceptions, most research on automatic offensive language identification has dealt with English. To address this shortcoming, we introduce MOLD, the Marathi Offensive Language Dataset. MOLD is the first dataset of its kind compiled for Marathi, thus opening a new domain for research in low-resource Indo-Aryan languages. We present results from several machine learning experiments on this dataset, including zero-shot and other transfer learning experiments on state-of-the-art cross-lingual transformers from existing data in Bengali, English, and Hindi.

表征(3篇)

【1】 Computing on Functions Using Randomized Vector Representations 标题:用随机向量表示法计算函数 链接:https://arxiv.org/abs/2109.03429

作者:E. Paxon Frady,Denis Kleyko,Christopher J. Kymn,Bruno A. Olshausen,Friedrich T. Sommer 机构:. Intel, Neuromorphic Computing Lab, . UC Berkeley, Redwood Center for Theoretical Neuroscience, . Research Institutes of Sweden, Intelligent Systems Lab 备注:33 pages, 18 Figures 摘要:认知科学和连接主义团体以向量符号体系结构(VSA)和超维(HD)计算的名义提出了用于符号处理的向量空间模型,该模型通过随机向量对符号进行编码。本文通过将连续值数据映射到向量空间,使得任意两个数据点表示之间的内积表示一个相似核,从而将VSAs推广到函数空间。通过类比VSA,我们将这种新的函数编码和计算框架称为向量函数体系结构(VFA)。在VFAs中,向量可以表示单个数据点以及函数空间(再生核希尔伯特空间)的元素。代数向量运算继承自VSA,对应于函数空间中定义良好的运算。此外,我们研究了先前提出的连续数据编码方法分数幂编码(FPE),该方法使用随机基向量的幂运算生成数据点的随机表示,并满足诱导VFA所需的核属性。我们证明了基向量元素的采样分布决定了FPE核的形状,这反过来又诱导出用于带限函数计算的VFA。特别是,VFAs提供了一个代数框架,用于实现具有随机特征的大规模核机器,扩展了Rahimi和Recht,2007。最后,我们展示了VFA模型在图像识别、密度估计和非线性回归问题中的一些应用。我们的分析和结果表明,VFA构成了一个强大的新框架,用于表示和操作分布式神经系统中的函数,在人工智能中有着广泛的应用。 摘要:Vector space models for symbolic processing that encode symbols by random vectors have been proposed in cognitive science and connectionist communities under the names Vector Symbolic Architecture (VSA), and, synonymously, Hyperdimensional (HD) computing. In this paper, we generalize VSAs to function spaces by mapping continuous-valued data into a vector space such that the inner product between the representations of any two data points represents a similarity kernel. By analogy to VSA, we call this new function encoding and computing framework Vector Function Architecture (VFA). In VFAs, vectors can represent individual data points as well as elements of a function space (a reproducing kernel Hilbert space). The algebraic vector operations, inherited from VSA, correspond to well-defined operations in function space. Furthermore, we study a previously proposed method for encoding continuous data, fractional power encoding (FPE), which uses exponentiation of a random base vector to produce randomized representations of data points and fulfills the kernel properties for inducing a VFA. We show that the distribution from which elements of the base vector are sampled determines the shape of the FPE kernel, which in turn induces a VFA for computing with band-limited functions. In particular, VFAs provide an algebraic framework for implementing large-scale kernel machines with random features, extending Rahimi and Recht, 2007. Finally, we demonstrate several applications of VFA models to problems in image recognition, density estimation and nonlinear regression. Our analyses and results suggest that VFAs constitute a powerful new framework for representing and manipulating functions in distributed neural systems, with myriad applications in artificial intelligence.
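摘要指出"基向量元素的采样分布决定FPE核的形状"。下面的小实验独立地演示这一点:对标准正态相位做分数幂编码后,两个编码的归一化内积近似高斯核。维度与相位分布均为演示假设。

```python
import numpy as np

D = 4096                       # 超维向量维度
rng = np.random.default_rng(0)
phi = rng.normal(0.0, 1.0, D)  # 相位分布决定核的形状(高斯相位 -> 近似高斯核)

def fpe(x: float) -> np.ndarray:
    """分数幂编码:对随机基向量 exp(i*phi) 做 x 次幂,得到 exp(i*phi*x)。"""
    return np.exp(1j * phi * x)

def similarity(x: float, y: float) -> float:
    """两个编码的(归一化)内积,近似核 k(x-y)。"""
    return float(np.real(np.vdot(fpe(x), fpe(y))) / D)

for delta in [0.0, 0.5, 1.0, 2.0]:
    print(f"k({delta:.1f}) ≈ {similarity(0.0, delta):.3f}",
          "(理论高斯核:", round(np.exp(-delta**2 / 2), 3), ")")
```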

【2】 On the Fundamental Trade-offs in Learning Invariant Representations 标题:论学习不变表示的基本取舍 链接:https://arxiv.org/abs/2109.03386

作者:Bashir Sadeghi,Vishnu Boddeti 机构:Computer Science and Engineering, Michigan State University 摘要:表示学习的许多应用,如隐私保护、算法公平性和域自适应,都要求对丢弃的语义信息进行显式控制。这个目标通常被表述为满足两个潜在的竞争目标:最大化预测目标属性的效用,同时独立于或不变于已知语义属性。在本文中,我们识别并确定了由数据与其对应的目标和语义属性之间的统计依赖性引起的效用和语义依赖性之间的两个基本权衡。在温和的假设下,我们推导了基本优化问题的全局最优解的闭式解,从而得到了精确权衡的闭式公式。我们还得出了权衡的经验估计,并显示了它们与相应的人口对应项的收敛性。最后,我们对代表性问题的权衡进行了数值量化,并与基线代表性学习算法的解进行了比较。 摘要:Many applications of representation learning, such as privacy-preservation, algorithmic fairness and domain adaptation, desire explicit control over semantic information being discarded. This goal is often formulated as satisfying two potentially competing objectives: maximizing utility for predicting a target attribute while simultaneously being independent or invariant with respect to a known semantic attribute. In this paper, we \emph{identify and determine} two fundamental trade-offs between utility and semantic dependence induced by the statistical dependencies between the data and its corresponding target and semantic attributes. We derive closed-form solutions for the global optima of the underlying optimization problems under mild assumptions, which in turn yields closed formulae for the exact trade-offs. We also derive empirical estimates of the trade-offs and show their convergence to the corresponding population counterparts. Finally, we numerically quantify the trade-offs on representative problems and compare to the solutions achieved by baseline representation learning algorithms.

【3】 Desiderata for Representation Learning: A Causal Perspective 标题:表征学习的愿望:因果视角 链接:https://arxiv.org/abs/2109.03795

作者:Yixin Wang,Michael I. Jordan 机构:UC Berkeley, EECS and Statistics 备注:67 pages 摘要:表征学习构造低维表征来概括高维数据的基本特征。这种学习问题通常通过描述与学习表征相关的各种需求来解决;e、 例如,它们是非虚假的、有效的或不受牵连的。然而,将这些直观的需求转化为正式的标准是一个挑战,可以根据观察到的数据进行测量和增强。在本文中,我们从因果的角度来研究表征学习,使用因果断言的反事实量和可观察结果来形式化非虚假性和效率(在有监督的表征学习中)以及解纠缠(在无监督的表征学习中)。这产生了可计算的指标,可用于评估表示满足感兴趣需求的程度,并从单个观测数据集中学习非虚假和分离表示。 摘要:Representation learning constructs low-dimensional representations to summarize essential features of high-dimensional data. This learning problem is often approached by describing various desiderata associated with learned representations; e.g., that they be non-spurious, efficient, or disentangled. It can be challenging, however, to turn these intuitive desiderata into formal criteria that can be measured and enhanced based on observed data. In this paper, we take a causal perspective on representation learning, formalizing non-spuriousness and efficiency (in supervised representation learning) and disentanglement (in unsupervised representation learning) using counterfactual quantities and observable consequences of causal assertions. This yields computable metrics that can be used to assess the degree to which representations satisfy the desiderata of interest and learn non-spurious and disentangled representations from single observational datasets.

3D|3D重建等相关(1篇)

【1】 Forward and Inverse models in HCI:Physical simulation and deep learning for inferring 3D finger pose 标题:人机交互中的正反向模型:推断三维手指姿态的物理仿真和深度学习 链接:https://arxiv.org/abs/2109.03366

作者:Roderick Murray-Smith,John H. Williamson,Andrew Ramsay,Francesco Tonolini,Simon Rogers,Antoine Loriette 机构: School of Computing Science, University of GlasgowJohn H, University of GlasgowAndrew Ramsay, University of GlasgowFrancesco Tonolini, University of GlasgowSimon Rogers, University of GlasgowAntoine Loriette 摘要:我们概述了正向和逆向建模方法在人机交互系统设计中的作用。因果正向模型往往更容易指定和模拟,但HCI需要反问题的解决方案。我们使用电容传感器推断手指在移动设备上的三维位置$(x,y,z)$和姿势(俯仰和偏航),电容传感器可以感应到手指在屏幕上方5cm处。我们使用机器学习开发数据驱动模型,根据以下训练数据推断位置、姿势和传感器读数:1。机器人生成的数据,2。数据来自静电模拟器3。人工生成的数据。机器学习仿真用于将静电模拟性能提高数百万倍。我们将条件变分自动编码器与实验收集的数据相结合。我们比较了直接推断手指姿势的正向和反向模型方法。该组合提供了最准确的报告结果,可以通过移动设备上的电容传感器推断3D位置和姿势。 摘要:We outline the role of forward and inverse modelling approaches in the design of human--computer interaction systems. Causal, forward models tend to be easier to specify and simulate, but HCI requires solutions of the inverse problem. We infer finger 3D position $(x,y,z)$ and pose (pitch and yaw) on a mobile device using capacitive sensors which can sense the finger up to 5cm above the screen. We use machine learning to develop data-driven models to infer position, pose and sensor readings, based on training data from: 1. data generated by robots, 2. data from electrostatic simulators 3. human-generated data. Machine learned emulation is used to accelerate the electrostatic simulation performance by a factor of millions. We combine a Conditional Variational Autoencoder with domain expertise/models experimentally collected data. We compare forward and inverse model approaches to direct inference of finger pose. The combination gives the most accurate reported results on inferring 3D position and pose with a capacitive sensor on a mobile device.

优化|敛散性(2篇)

【1】 Class-conditioned Domain Generalization via Wasserstein Distributional Robust Optimization 标题:基于Wasserstein分布鲁棒优化的类条件域泛化 链接:https://arxiv.org/abs/2109.03676

作者:Jingge Wang,Yang Li,Liyan Xie,Yao Xie 备注:presented as a RobustML workshop paper at ICLR 2021 摘要:给定多个源域,域泛化旨在学习一个通用模型,该模型在任何看不见但相关的目标域上都表现良好。在这项工作中,我们关注的是域泛化场景,其中域在不同域的类条件分布之间发生转移。当给定同一类的条件分布变化较大时,现有方法的鲁棒性不够。在这项工作中,我们扩展了分布鲁棒优化的概念来解决类条件域泛化问题。我们的方法优化了以源条件分布的重心为中心的Wasserstein球中的类条件分布上的分类器的最坏情况性能。我们还提出了一种迭代算法,用于自动学习Wasserstein球的最佳半径。实验表明,该框架在不可见目标域上的性能优于没有域泛化的方法。 摘要:Given multiple source domains, domain generalization aims at learning a universal model that performs well on any unseen but related target domain. In this work, we focus on the domain generalization scenario where domain shifts occur among class-conditional distributions of different domains. Existing approaches are not sufficiently robust when the variation of conditional distributions given the same class is large. In this work, we extend the concept of distributional robust optimization to solve the class-conditional domain generalization problem. Our approach optimizes the worst-case performance of a classifier over class-conditional distributions within a Wasserstein ball centered around the barycenter of the source conditional distributions. We also propose an iterative algorithm for learning the optimal radius of the Wasserstein balls automatically. Experiments show that the proposed framework has better performance on unseen target domain than approaches without domain generalization.
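根据摘要的描述,可以把优化目标粗略地形式化为如下形式(符号与半径记法均为本速递为便于阅读而假设的,并非论文原文):$$\min_{\theta}\;\sum_{c}\;\sup_{Q_c:\,W(Q_c,\bar{P}_c)\le\rho_c}\;\mathbb{E}_{x\sim Q_c}\big[\ell(f_\theta(x),c)\big],\qquad \bar{P}_c=\arg\min_{P}\sum_{k=1}^{K}W^2\big(P,P^{(k)}_c\big),$$ 其中$P^{(k)}_c$表示第$k$个源域中类别$c$的条件分布,$\bar{P}_c$是这些条件分布的Wasserstein重心,$\rho_c$是按摘要所述由迭代算法自动学习的Wasserstein球半径。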

【2】 YAHPO Gym -- Design Criteria and a new Multifidelity Benchmark for Hyperparameter Optimization 标题:YAHPO Gym--超参数优化的设计准则和新的多保真度基准 链接:https://arxiv.org/abs/2109.03670

作者:Florian Pfisterer,Lennart Schneider,Julia Moosbauer,Martin Binder,Bernd Bischl 机构:Ludwig Maximilian University of Munich 备注:Preprint. Under review. 17 pages, 4 tables, 5 figures 摘要:在开发和分析新的超参数优化(HPO)方法时,在精心策划的基准套件上对其进行经验评估和比较是至关重要的。在这项工作中,我们列出了这些基准的理想特性和要求,并据此提出了一组新的、具有挑战性且与实际相关的多保真度HPO基准问题。为此,我们重新审视了基于代理的基准的概念,并将其与更广泛使用的表格基准进行了实证比较,表明后者可能导致HPO方法的性能估计和排名出现偏差。我们为多保真度HPO方法提出了一个新的基于代理的基准测试套件,该套件由9个基准测试集合组成,总共构成700多个多保真度HPO问题。我们的所有基准测试还允许查询多个优化目标,从而实现多目标HPO的基准测试。我们根据定义的需求检查并比较了我们的基准测试套件,并表明我们的基准测试为现有套件提供了可行的补充。 摘要:When developing and analyzing new hyperparameter optimization (HPO) methods, it is vital to empirically evaluate and compare them on well-curated benchmark suites. In this work, we list desirable properties and requirements for such benchmarks and propose a new set of challenging and relevant multifidelity HPO benchmark problems motivated by these requirements. For this, we revisit the concept of surrogate-based benchmarks and empirically compare them to more widely-used tabular benchmarks, showing that the latter ones may induce bias in performance estimation and ranking of HPO methods. We present a new surrogate-based benchmark suite for multifidelity HPO methods consisting of 9 benchmark collections that constitute over 700 multifidelity HPO problems in total. All our benchmarks also allow for querying of multiple optimization targets, enabling the benchmarking of multi-objective HPO. We examine and compare our benchmark suite with respect to the defined requirements and show that our benchmarks provide viable additions to existing suites.

预测|估计(3篇)

【1】 How do I update my model? On the resilience of Predictive Process Monitoring models to change 标题:如何更新我的模型?预测过程监控模型应对变化的韧性研究 链接:https://arxiv.org/abs/2109.03501

作者:Williams Rizzi1,Chiara Di Francescomarino,Chiara Ghidini,Fabrizio Maria Maggi 机构: Fondazione Bruno Kessler (FBK), Trento, Italy, Free University of Bozen-Bolzano, Bolzano, Italy 摘要:现有的经过充分研究的预测性流程监控技术通常基于过去的流程执行情况构建预测模型,然后使用该模型预测新的正在进行的案例的未来,而不可能在新案例完成执行后使用新案例更新该模型。这可能会使预测过程监控过于僵化,无法处理在实际环境中工作的过程的可变性,这些环境随着时间的推移不断演变和/或表现出新的变化行为。作为这个问题的解决方案,我们评估了三种不同策略的使用,它们允许预测模型的周期性重新发现或增量构建,以便利用新的可用数据。评估侧重于新学习的预测模型相对于原始模型在准确性和时间方面的性能,并使用了大量真实和合成数据集,有无明确的概念漂移。结果证明了增量学习算法在实际环境中预测过程监控的潜力。 摘要:Existing well investigated Predictive Process Monitoring techniques typically construct a predictive model based on past process executions, and then use it to predict the future of new ongoing cases, without the possibility of updating it with new cases when they complete their execution. This can make Predictive Process Monitoring too rigid to deal with the variability of processes working in real environments that continuously evolve and/or exhibit new variant behaviours over time. As a solution to this problem, we evaluate the use of three different strategies that allow the periodic rediscovery or incremental construction of the predictive model so as to exploit new available data. The evaluation focuses on the performance of the new learned predictive models, in terms of accuracy and time, against the original one, and uses a number of real and synthetic datasets with and without explicit Concept Drift. The results provide an evidence of the potential of incremental learning algorithms for predicting process monitoring in real environments.
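摘要比较的三种策略中,"增量构建预测模型"这一思路可以用下面的极简示意来说明:新完成的案例到达后用partial_fit继续更新模型,而不是从头重训。特征、数据与模型选择均为演示假设,并非论文所用的预测过程监控模型。

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
model = SGDClassifier(random_state=0)

# 用历史流程执行数据做初始训练(首个批次需要声明全部类别)
X_old, y_old = rng.normal(size=(500, 8)), rng.integers(0, 2, 500)
model.partial_fit(X_old, y_old, classes=np.array([0, 1]))

# 模拟不断有新案例执行完毕:周期性地用新数据增量更新预测模型
for _ in range(10):
    X_new, y_new = rng.normal(size=(50, 8)), rng.integers(0, 2, 50)
    model.partial_fit(X_new, y_new)

print("当前模型对最新批次的准确率:", model.score(X_new, y_new))
```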

【2】 Preprocessing and Modeling of Radial Fan Data for Health State Prediction 标题:面向健康状态预测的径向风机数据预处理与建模 链接:https://arxiv.org/abs/2109.03468

作者:Florian Holzinger,Michael Kommenda 机构:Heuristic and Evolutionary Algorithms Laboratory, University of Applied Sciences Upper Austria, Hagenberg 备注:None 摘要:监测系统的关键部件是实现故障安全的关键步骤。可提供价格合理的传感器,行业正在引入和扩展监控解决方案,以提高产品质量。通常,不存在特定任务(如监控)需要多少数据的专业知识。特别是在重要机械中,可能会注意到传感器在质量和数量上都有夸大的趋势。这通常会导致生成过多的数据,尽管如此,这些数据仍应进行传输、处理和存储。在之前的一个案例研究中,几个传感器安装在一个健康的径向风扇上,该风扇后来被人为损坏。收集的数据用于建模(并因此监测)健康状态。模型在使用故障叶轮创建的数据集上进行评估。本文的重点是通过下采样和分块来减少这些数据。用线性回归和随机森林回归建立了不同的模型,并讨论了由此产生的质量差异。 摘要:Monitoring critical components of systems is a crucial step towards failure safety. Affordable sensors are available and the industry is in the process of introducing and extending monitoring solutions to improve product quality. Often, no expertise of how much data is required for a certain task (e.g. monitoring) exists. Especially in vital machinery, a trend to exaggerated sensors may be noticed, both in quality and in quantity. This often results in an excessive generation of data, which should be transferred, processed and stored nonetheless. In a previous case study, several sensors have been mounted on a healthy radial fan, which was later artificially damaged. The gathered data was used for modeling (and therefore monitoring) a healthy state. The models were evaluated on a dataset created by using a faulty impeller. This paper focuses on the reduction of this data through downsampling and binning. Different models are created with linear regression and random forest regression and the resulting difference in quality is discussed.
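下面的示意演示"下采样 + 分箱"如何压缩高频传感器数据,并在压缩后的特征上分别拟合线性回归与随机森林回归(对应摘要中比较的两类模型)。采样率、窗口长度与信号本身均为演示假设,并非论文的径向风机数据。

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
t = pd.date_range("2021-01-01", periods=60_000, freq="10ms")        # 100 Hz,约 10 分钟
raw = pd.DataFrame({"vibration": np.sin(np.arange(60_000) / 50) + rng.normal(0, 0.1, 60_000),
                    "rpm": 1500 + rng.normal(0, 5, 60_000)}, index=t)

# 下采样:把 100 Hz 数据聚合到 1 s 窗口,同时保留窗口内的统计量(即一种分箱)
binned = raw.resample("1s").agg(["mean", "std", "max"])
binned.columns = ["_".join(c) for c in binned.columns]
print("压缩比: %d -> %d 行" % (len(raw), len(binned)))

# 在压缩后的特征上分别拟合线性回归与随机森林,比较建模质量
X, y = binned.drop(columns="vibration_mean"), binned["vibration_mean"]
for model in (LinearRegression(), RandomForestRegressor(n_estimators=50, random_state=0)):
    print(type(model).__name__, "R^2 =", round(model.fit(X, y).score(X, y), 3))
```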

【3】 Predicting Process Name from Network Data 标题:从网络数据预测进程名称 链接:https://arxiv.org/abs/2109.03328

作者:Justin Allen,David Knapp,Kristine Monteith 机构:Lawrence Livermore National Laboratory 备注:Presented at 1st International Workshop on Adaptive Cyber Defense, 2021 (arXiv:2108.08476) 摘要:基于应用程序生成的网络数据识别应用程序的能力可能是网络防御的宝贵工具。我们报告了一种机器学习技术,该技术能够使用类似netflow的特性来预测产生流量的应用程序。在我们的实验中,我们使用了从部署在大型企业环境中的基于主机的传感器获得的地面真相标签;我们将随机森林和多层感知器应用于浏览器与非浏览器识别、浏览器指纹识别和进程名称预测等任务。对于这些任务中的每一项,我们将演示机器学习模型如何仅使用类似netflow的特征作为分类基础来实现高分类精度。 摘要:The ability to identify applications based on the network data they generate could be a valuable tool for cyber defense. We report on a machine learning technique capable of using netflow-like features to predict the application that generated the traffic. In our experiments, we used ground-truth labels obtained from host-based sensors deployed in a large enterprise environment; we applied random forests and multilayer perceptrons to the tasks of browser vs. non-browser identification, browser fingerprinting, and process name prediction. For each of these tasks, we demonstrate how machine learning models can achieve high classification accuracy using only netflow-like features as the basis for classification.

其他神经网络|深度学习|模型|建模(20篇)

【1】 A Survey on Machine Learning Techniques for Auto Labeling of Video, Audio, and Text Data 标题:视频、音频和文本数据自动标注的机器学习技术综述 链接:https://arxiv.org/abs/2109.03784

作者:Shikun Zhang,Omid Jafari,Parth Nagarkar 机构:Department of Computer Science, New Mexico State University, Las Cruces, NM 摘要:机器学习已被用于执行许多不同领域的任务,如分类、目标检测、图像分割和自然语言分析。数据标注一直是机器学习中最重要的任务之一。然而,标记大量数据会增加机器学习的金钱成本。因此,研究人员开始专注于降低数据注释和标记成本。迁移学习作为一种有效的方法被设计和广泛使用,它可以合理地减少有限数据的负面影响,从而降低数据准备成本。即使从源域转移以前的知识也会减少目标域中所需的数据量。然而,为了建立稳健的模型和提高模型的预测精度,仍然需要大量的注释数据。因此,研究人员开始关注自动标注和标签。在这篇综述文章中,我们回顾了以前的技术,这些技术侧重于视频、音频和文本数据的优化数据注释和标记。 摘要:Machine learning has been utilized to perform tasks in many different domains such as classification, object detection, image segmentation and natural language analysis. Data labeling has always been one of the most important tasks in machine learning. However, labeling large amounts of data increases the monetary cost in machine learning. As a result, researchers started to focus on reducing data annotation and labeling costs. Transfer learning was designed and widely used as an efficient approach that can reasonably reduce the negative impact of limited data, which in turn, reduces the data preparation cost. Even transferring previous knowledge from a source domain reduces the amount of data needed in a target domain. However, large amounts of annotated data are still demanded to build robust models and improve the prediction accuracy of the model. Therefore, researchers started to pay more attention on auto annotation and labeling. In this survey paper, we provide a review of previous techniques that focuses on optimized data annotation and labeling for video, audio, and text data.

【2】 A robust approach for deep neural networks in presence of label noise: relabelling and filtering instances during training 标题:存在标签噪声的深度神经网络的一种鲁棒方法:训练过程中的重新标记和过滤实例 链接:https://arxiv.org/abs/2109.03748

作者:Anabel Gómez-Ríos,Julián Luengo,Francisco Herrera 机构:Department of Computer Science and Artificial Intelligence, Andalusian Research Institute, in Data Science and Computational Intelligence (DaSCI), University of Granada, Granada, Spain 备注:29 pages, 2 figures 摘要:深度学习在各种任务中的表现都优于其他机器学习算法,因此,它变得越来越流行和使用。然而,与其他机器学习算法一样,当数据集存在标签噪声时,深度学习和卷积神经网络(CNN)的性能更差。因此,开发有助于深层网络训练及其推广到无噪声测试集的算法非常重要。在本文中,我们提出了一种针对标签噪声的鲁棒训练策略,称为RAFNI,可用于任何CNN。该算法根据主干神经网络在训练过程中做出的预测及其概率,对训练集的实例进行过滤和重新标记。这样,该算法就提高了CNN自身的泛化能力。RAFNI由三种机制组成:两种过滤实例的机制和一种重新标记实例的机制。此外,它不假设噪声率已知,也不需要对其进行估计。我们使用不同大小和特征的数据集评估了我们的算法。我们还将其与使用CIFAR10和CIFAR100基准测试的最先进模型在不同类型和速率的标签噪声下进行了比较,发现RAFNI在大多数情况下取得了更好的结果。 摘要:Deep learning has outperformed other machine learning algorithms in a variety of tasks, and as a result, it has become more and more popular and used. However, as other machine learning algorithms, deep learning, and convolutional neural networks (CNNs) in particular, perform worse when the data sets present label noise. Therefore, it is important to develop algorithms that help the training of deep networks and their generalization to noise-free test sets. In this paper, we propose a robust training strategy against label noise, called RAFNI, that can be used with any CNN. This algorithm filters and relabels instances of the training set based on the predictions and their probabilities made by the backbone neural network during the training process. That way, this algorithm improves the generalization ability of the CNN on its own. RAFNI consists of three mechanisms: two mechanisms that filter instances and one mechanism that relabels instances. In addition, it does not suppose that the noise rate is known nor does it need to be estimated. We evaluated our algorithm using different data sets of several sizes and characteristics. We also compared it with state-of-the-art models using the CIFAR10 and CIFAR100 benchmarks under different types and rates of label noise and found that RAFNI achieves better results in most cases.
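根据摘要,RAFNI在训练过程中依据主干网络的预测概率对实例进行过滤与重标。下面是按此思路写的一个极简示意函数;具体阈值与判定条件均为本速递的演示假设,与论文中三种机制的实际定义会有出入。

```python
import torch

def rafni_style_update(probs, labels, relabel_thr=0.95, filter_thr=0.5):
    """根据当前网络的预测概率对一个 mini-batch 做重标与过滤(阈值为演示假设)。

    probs: (N, C) 预测概率;labels: (N,) 当前标签。
    返回:保留样本的掩码,以及(可能被改写的)标签。
    """
    conf, pred = probs.max(dim=1)
    labels = labels.clone()

    # 重标:网络对另一个类别非常自信时,用预测覆盖可能带噪的标签
    relabel = (conf >= relabel_thr) & (pred != labels)
    labels[relabel] = pred[relabel]

    # 过滤:对当前标签的置信度过低、又不满足重标条件的样本暂时剔除
    keep = (probs.gather(1, labels.unsqueeze(1)).squeeze(1) >= filter_thr) | relabel
    return keep, labels

# 玩具示例:第一个样本被重标,第二个被过滤,第三个保留
probs = torch.tensor([[0.98, 0.02], [0.40, 0.60], [0.55, 0.45]])
labels = torch.tensor([1, 0, 0])
print(rafni_style_update(probs, labels))
```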

【3】 Multiscale Laplacian Learning 标题:多尺度拉普拉斯学习 链接:https://arxiv.org/abs/2109.03718

作者:Ekaterina Merkurjev,Duc DUy Nguyen,Guo-Wei Wei 机构:Received: date Accepted: date 摘要:机器学习方法极大地改变了科学、工程、金融、商业和其他领域。尽管机器学习和深度学习方法取得了巨大成就,但仍然存在许多挑战。特别是,在数据多样的情况下,机器学习方法的性能往往受到严重影响,通常与较小的数据集或与研究领域相关的数据有关,其中数据集的大小受到复杂性和/或高实验成本的限制。此外,标记样本有限的数据对大多数学习方法来说都是一个挑战。在本文中,通过集成基于图的框架、多尺度结构、修改和调整的优化过程以及半监督技术来解决上述挑战。这导致了两种创新的多尺度拉普拉斯学习(MLL)方法,用于机器学习任务,如数据分类,以及处理不同的数据、样本有限的数据和较小的数据集。第一种方法称为多核流形学习(MML),将流形学习与多核信息相结合,并使用多尺度图拉普拉斯算子解决由损失函数和扭曲核正则化器组成的正则化问题。第二种方法称为多尺度MBO(MMBO)方法,它将多尺度拉普拉斯算子引入到著名的经典Merriman Bence-Osher(MBO)格式的修改中,并利用快速求解器寻找图拉普拉斯算子极值特征向量的近似值。我们在各种数据集(如生物、文本和图像数据)上通过实验证明了我们的方法的性能,并与现有方法进行了比较。 摘要:Machine learning methods have greatly changed science, engineering, finance, business, and other fields. Despite the tremendous accomplishments of machine learning and deep learning methods, many challenges still remain. In particular, the performance of machine learning methods is often severely affected in case of diverse data, usually associated with smaller data sets or data related to areas of study where the size of the data sets is constrained by the complexity and/or high cost of experiments. Moreover, data with limited labeled samples is a challenge to most learning approaches. In this paper, the aforementioned challenges are addressed by integrating graph-based frameworks, multiscale structure, modified and adapted optimization procedures and semi-supervised techniques. This results in two innovative multiscale Laplacian learning (MLL) approaches for machine learning tasks, such as data classification, and for tackling diverse data, data with limited samples and smaller data sets. The first approach, called multikernel manifold learning (MML), integrates manifold learning with multikernel information and solves a regularization problem consisting of a loss function and a warped kernel regularizer using multiscale graph Laplacians. The second approach, called the multiscale MBO (MMBO) method, introduces multiscale Laplacians to a modification of the famous classical Merriman-Bence-Osher (MBO) scheme, and makes use of fast solvers for finding the approximations to the extremal eigenvectors of the graph Laplacian. We demonstrate the performance of our methods experimentally on a variety of data sets, such as biological, text and image data, and compare them favorably to existing approaches.

【4】 Self-explaining variational posterior distributions for Gaussian Process models 标题:高斯过程模型的自解释变分后验分布 链接:https://arxiv.org/abs/2109.03708

作者:Sarem Seitz 机构:Department of Information Systems and Applied Computer Science, University of Bamberg, Bamberg, Germany 摘要:贝叶斯方法已成为将先验知识和不确定性概念融入机器学习模型的一种流行方法。同时,现代机器学习的复杂性使得理解模型的推理过程具有挑战性,更不用说以严格的方式表达特定的先验假设了。虽然主要对前一个问题感兴趣,但最近的发展也可以扩大我们可以提供给复杂贝叶斯模型的先验信息的范围。受自解释模型思想的启发,我们引入了相应的变分高斯过程概念。一方面,我们的贡献提高了这些类型模型的透明度。更重要的是,我们提出的自解释变分后验分布允许将关于目标函数整体的一般先验知识和关于单个特征贡献的先验知识结合起来。 摘要:Bayesian methods have become a popular way to incorporate prior knowledge and a notion of uncertainty into machine learning models. At the same time, the complexity of modern machine learning makes it challenging to comprehend a model's reasoning process, let alone express specific prior assumptions in a rigorous manner. While primarily interested in the former issue, recent developments intransparent machine learning could also broaden the range of prior information that we can provide to complex Bayesian models. Inspired by the idea of self-explaining models, we introduce a corresponding concept for variational GaussianProcesses. On the one hand, our contribution improves transparency for these types of models. More importantly though, our proposed self-explaining variational posterior distribution allows to incorporate both general prior knowledge about a target function as a whole and prior knowledge about the contribution of individual features.

【5】 EMA: Auditing Data Removal from Trained Models 标题:EMA:审核从训练模型中删除的数据 链接:https://arxiv.org/abs/2109.03675

作者:Yangsibo Huang,Xiaoxiao Li,Kai Li 机构: Princeton University, NJ, USA, The University of British Columbia, BC, Canada 备注:MICCAI 2021 摘要:数据审计是验证是否已从经过训练的模型中删除某些数据的过程。最近提出的一种方法(Liu等人20)使用Kolmogorov-Smirnov(KS)距离进行此类数据审计。然而,它在某些实际条件下是失败的。在本文中,我们提出了一种新的方法,称为集成成员审计(EMA),用于审计数据删除,以克服这些限制。我们在基准数据集(MNIST和SVHN)和胸部X射线数据集上,使用多层感知器(MLP)和卷积神经网络(CNN)对这两种方法进行了比较。我们的实验表明,EMA在各种条件下都是鲁棒的,包括先前提出的方法的失效情况。我们的代码可从以下网址获得:https://github.com/Hazelsuko07/EMA. 摘要:Data auditing is a process to verify whether certain data have been removed from a trained model. A recently proposed method (Liu et al. 20) uses Kolmogorov-Smirnov (KS) distance for such data auditing. However, it fails under certain practical conditions. In this paper, we propose a new method called Ensembled Membership Auditing (EMA) for auditing data removal to overcome these limitations. We compare both methods using benchmark datasets (MNIST and SVHN) and Chest X-ray datasets with multi-layer perceptrons (MLP) and convolutional neural networks (CNN). Our experiments show that EMA is robust under various conditions, including the failure cases of the previously proposed method. Our code is available at: https://github.com/Hazelsuko07/EMA.
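摘要提到先前方法使用Kolmogorov-Smirnov(KS)距离做数据删除审计,EMA在其基础上做了集成式改进。下面仅演示KS检验这一基本构件:比较模型在"声称已删除的数据"与从未用于训练的校准数据上的损失分布。数值为人工构造的演示数据,并非EMA的完整流程。

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
# 若模型确实"忘记"了这些数据,两个损失分布应当接近;否则被查询数据上的损失会系统性偏小
loss_on_calibration = rng.exponential(scale=1.0, size=500)   # 从未用于训练的校准数据
loss_on_queried     = rng.exponential(scale=0.4, size=500)   # 声称已删除、但模型可能仍记得的数据

stat, p_value = ks_2samp(loss_on_queried, loss_on_calibration)
print(f"KS 统计量 = {stat:.3f}, p 值 = {p_value:.3g}")
if p_value < 0.05:
    print("两个分布差异显著:数据可能并未真正从模型中删除")
```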

【6】 AgreementLearning: An End-to-End Framework for Learning with Multiple Annotators without Groundtruth 标题:AgreementLearning:一种端到端的无Ground Truth多标注者学习框架 链接:https://arxiv.org/abs/2109.03596

作者:Chongyang Wang,Yuan Gao,Chenyou Fan,Junjie Hu,Tin Lun Lam,Nicholas D. Lane,Nadia Bianchi-Berthouze 机构:University College London,Shenzhen Institute of Artificial Intelligence and Robotics for Society,University of Cambridge 备注:Submitted to AAAI'22 摘要:领域专家的标注对于一些客观真值难以明确定义的医学应用非常重要,例如一些慢性病的康复,以及在不做进一步医学检查的情况下对某些肌肉骨骼异常进行预筛选。然而,标注的不当使用可能会妨碍开发可靠的模型。一方面,强制使用由多个标注生成的单一真值对建模来说信息量较小。另一方面,在存在分歧的情况下,不加适当正则化地把所有标注都提供给模型则会引入噪声。针对这些问题,我们提出了一个新的一致性学习(agreement learning)框架,以应对在没有客观真值的情况下从多个标注者那里学习的挑战。该框架有两个流,一个流拟合多个标注者的标注,另一个流学习标注者之间的一致性信息。特别地,一致性学习流向分类器流提供正则化信息,调整其决策以更好地符合标注者之间的一致性。所提出的方法可以很容易地插入到基于多数投票真值或多重标注开发的现有主干网络中。在两个医学数据集上的实验表明,该方法与标注者的一致性水平有所提高。 摘要:The annotation of domain experts is important for some medical applications where the objective groundtruth is ambiguous to define, e.g., the rehabilitation for some chronic diseases, and the prescreening of some musculoskeletal abnormalities without further medical examinations. However, improper uses of the annotations may hinder developing reliable models. On one hand, forcing the use of a single groundtruth generated from multiple annotations is less informative for the modeling. On the other hand, feeding the model with all the annotations without proper regularization is noisy given existing disagreements. For such issues, we propose a novel agreement learning framework to tackle the challenge of learning from multiple annotators without objective groundtruth. The framework has two streams, with one stream fitting with the multiple annotators and the other stream learning agreement information between the annotators. In particular, the agreement learning stream produces regularization information to the classifier stream, tuning its decision to be better in line with the agreement between the annotators. The proposed method can be easily plugged to existing backbones developed with majority-voted groundtruth or multiple annotations. Thereon, experiments on two medical datasets demonstrate improved agreement levels with annotators.

【7】 Deriving Explanation of Deep Visual Saliency Models 标题:深度视觉显著性模型的推导解释 链接:https://arxiv.org/abs/2109.03575

作者:Sai Phani Kumar Malladi,Jayanta Mukhopadhyay,Chaker Larabi,Santanu Chaudhury 机构:∗Visual Information Processing Laboratory, Dept. of Computer Science & Engg., IIT Kharagpur, India, †XLIM UMR CNRS , University of Poitiers, France, ‡Dept. of Computer Science & Engg., IIT Jodhpur, India 摘要:深度神经网络在视觉显著性预测中对实现人类水平的性能有着深远的影响。然而,目前还不清楚它们是如何学习这项任务的,以及这对理解人类视觉系统意味着什么。在这项工作中,我们开发了一种技术,通过应用人类感知理论和传统的显著性概念,从相应的基于深层神经结构的显著性模型中推导出可解释的显著性模型。这项技术通过激活图帮助我们了解深度网络中间层的学习模式。首先,我们选取两个最先进的深度显著性模型,即UNISAL和MSI-Net,作为我们解释的对象。我们使用一组生物学上合理的log-gabor滤波器,利用我们的可解释显著性模型来识别和重建它们的激活图。使用这些重构的激活图生成最终的显著性图。我们还建立了自己的深度显著性模型,称为基于交叉级联多尺度残差块的网络(CMRNet),用于显著性预测。然后,我们在三个基准数据集上评估了从UNISAL、MSI-Net和CMRNet导出的可解释模型的性能,并与其他最先进的方法进行了比较。因此,我们提出这种可解释性方法可以应用于任何深度视觉显著性模型的解释,这使得它具有通用性。 摘要:Deep neural networks have shown their profound impact on achieving human level performance in visual saliency prediction. However, it is still unclear how they learn the task and what it means in terms of understanding human visual system. In this work, we develop a technique to derive explainable saliency models from their corresponding deep neural architecture based saliency models by applying human perception theories and the conventional concepts of saliency. This technique helps us understand the learning pattern of the deep network at its intermediate layers through their activation maps. Initially, we consider two state-of-the-art deep saliency models, namely UNISAL and MSI-Net for our interpretation. We use a set of biologically plausible log-gabor filters for identifying and reconstructing the activation maps of them using our explainable saliency model. The final saliency map is generated using these reconstructed activation maps. We also build our own deep saliency model named cross-concatenated multi-scale residual block based network (CMRNet) for saliency prediction. Then, we evaluate and compare the performance of the explainable models derived from UNISAL, MSI-Net and CMRNet on three benchmark datasets with other state-of-the-art methods. Hence, we propose that this approach of explainability can be applied to any deep visual saliency model for interpretation which makes it a generic one.

【8】 A Review of Sound Source Localization with Deep Learning Methods 标题:基于深度学习方法的声源定位研究综述 链接:https://arxiv.org/abs/2109.03465

作者:Pierre-Amaury Grumiaux,Srđan Kitić,Laurent Girin,Alexandre Guérin 备注:Submitted to IEEE Transactions on Audio, Speech, and Language Processing 摘要:本文综述了单声源定位和多声源定位的深度学习方法。我们特别感兴趣的是室内/家庭环境中的声源定位,其中存在混响和扩散噪声。在此背景下,我们提供了基于神经的定位文献的详尽图景,按照以下几个方面进行组织:神经网络结构、输入特征类型、输出策略(分类或回归)、用于模型训练和评估的数据类型以及模型训练策略。通过这种方式,感兴趣的读者可以轻松理解基于深度学习的声源定位方法的广阔前景。综述末尾提供了总结文献综述的表格,用于快速搜索具有给定目标特征集的方法。 摘要:This article is a review on deep learning methods for single and multiple sound source localization. We are particularly interested in sound source localization in indoor/domestic environment, where reverberation and diffuse noise are present. We provide an exhaustive topography of the neural-based localization literature in this context, organized according to several aspects: the neural network architecture, the type of input features, the output strategy (classification or regression), the types of data used for model training and evaluation, and the model training strategy. This way, an interested reader can easily comprehend the vast panorama of the deep learning-based sound source localization methods. Tables summarizing the literature review are provided at the end of the review for a quick search of methods with a given set of target characteristics.

【9】 Learning Zero-sum Stochastic Games with Posterior Sampling 标题:后验抽样学习零和随机对策 链接:https://arxiv.org/abs/2109.03396

作者:Mehdi Jafarnia-Jahromi,Rahul Jain,Ashutosh Nayyar 机构:University of Southern California 摘要:在本文中,我们提出了零和随机对策的后验抽样强化学习(PSRL-ZSG),这是第一个在线学习算法,它在具有平均报酬准则的无限水平零和随机对策中实现了$O(HS\sqrt{AT})$的贝叶斯遗憾界。这里,$H$是偏差函数跨度的上限,$S$是状态数,$A$是联合行动数,$T$是地平线。我们认为在线设置,对手不能被控制,可以采取任意时间自适应历史依赖策略。这改进了Wei等人2017年在相同假设下得出的$O(\sqrt[3]{DS^2AT^2})$的现有最佳后悔界限,并与$A$和$T$的理论下限相匹配。 摘要:In this paper, we propose Posterior Sampling Reinforcement Learning for Zero-sum Stochastic Games (PSRL-ZSG), the first online learning algorithm that achieves Bayesian regret bound of $O(HS\sqrt{AT})$ in the infinite-horizon zero-sum stochastic games with average-reward criterion. Here $H$ is an upper bound on the span of the bias function, $S$ is the number of states, $A$ is the number of joint actions and $T$ is the horizon. We consider the online setting where the opponent can not be controlled and can take any arbitrary time-adaptive history-dependent strategy. This improves the best existing regret bound of $O(\sqrt[3]{DS^2AT^2})$ by Wei et. al., 2017 under the same assumption and matches the theoretical lower bound in $A$ and $T$.

【10】 On the space of coefficients of a Feed Forward Neural Network 标题:关于前馈神经网络的系数空间 链接:https://arxiv.org/abs/2109.03362

作者:Dinesh Valluri,Rory Campbell 机构:Department of Computer Science, The Univerity of Western Ontario 备注:13 pages, 5 figures 摘要:我们定义并建立了“等效神经网络”的条件,即具有不同权重、偏差和阈值函数的神经网络,它们产生相同的关联函数。我们证明了,给定一个分段线性激活的神经网络$\mathcal{N}$,描述所有等价神经网络的系数空间由一个半代数集给出。这个结果是通过使用Tarski-Seidenberg定理研究给定分段线性函数的不同表示得到的。 摘要:We define and establish the conditions for `equivalent neural networks' - neural networks with different weights, biases, and threshold functions that result in the same associated function. We prove that given a neural network $\mathcal{N}$ with piece-wise linear activation, the space of coefficients describing all equivalent neural networks is given by a semialgebraic set. This result is obtained by studying different representations of a given piece-wise linear function using the Tarski-Seidenberg theorem.

【11】 CyGIL: A Cyber Gym for Training Autonomous Agents over Emulated Network Systems 标题:CyGIL:一种在仿真网络系统上训练自治Agent的网络健身房 链接:https://arxiv.org/abs/2109.03331

作者:Li Li,Raed Fayad,Adrian Taylor 机构:Defence Research and Development Canada, Dept. of Electrical and Computer Engineering, Queens University, Canada 备注:Presented at 1st International Workshop on Adaptive Cyber Defense, 2021 (arXiv:2108.08476) 摘要:鉴于强化学习(RL)在各个领域的成功,有希望探索其方法在智能自主网络代理开发中的应用。实现这一发展需要有代表性的RL训练环境。为此,这项工作介绍了CyGIL:一个用于网络网络操作的模拟RL训练环境的实验测试台。CyGIL使用无状态环境体系结构,并结合MITRE ATT&CK框架来建立高保真训练环境,同时提供充分抽象的接口以支持RL训练。其全面的行动空间和灵活的游戏设计允许代理训练专注于特定的高级持久性威胁(APT)配置文件,并纳入广泛的潜在威胁和漏洞。通过在保真度和简单性之间取得平衡,它旨在利用最先进的RL算法应用于现实世界的网络防御。 摘要:Given the success of reinforcement learning (RL) in various domains, it is promising to explore the application of its methods to the development of intelligent and autonomous cyber agents. Enabling this development requires a representative RL training environment. To that end, this work presents CyGIL: an experimental testbed of an emulated RL training environment for network cyber operations. CyGIL uses a stateless environment architecture and incorporates the MITRE ATT&CK framework to establish a high fidelity training environment, while presenting a sufficiently abstracted interface to enable RL training. Its comprehensive action space and flexible game design allow the agent training to focus on particular advanced persistent threat (APT) profiles, and to incorporate a broad range of potential threats and vulnerabilities. By striking a balance between fidelity and simplicity, it aims to leverage state of the art RL algorithms for application to real-world cyber defence.

【12】 Effective and interpretable dispatching rules for dynamic job shops via guided empirical learning 标题:基于引导式经验学习的动态作业车间有效可解释调度规则 链接:https://arxiv.org/abs/2109.03323

作者:Cristiane Ferreira,Gonçalo Figueira,Pedro Amorim 机构:INESC TEC, Faculdade de Engenharia, Universidade do Porto, Rua Dr. Roberto Frias, sn,-, Porto, Portugal 摘要:工业4.0的出现使生产系统更加灵活,也更加动态。在这些环境中,通常需要通过分派规则(dispatching rules)对调度进行实时调整。尽管到90年代为止已经取得了实质性进展,但这些规则的性能仍然相当有限。机器学习文献正在开发各种方法来改进它们,但是产生的规则很难解释,并且不能很好地推广到广泛的环境中。本文是将机器学习与领域问题推理相结合用于调度的首次重大尝试。这一想法包括利用后者获得的见解来指导前者的实证搜索。我们的假设是,这种有指导的经验学习过程应该产生有效且可解释的调度规则,并且可以很好地推广到不同的实例类。我们在以最小化拖期为目标的经典动态作业车间调度问题上测试了我们的方法,该问题是研究最为深入的调度问题之一。尽管如此,结果表明,我们的方法能够找到新的最先进的规则,在绝大多数情况下——从宽松到紧张的交货期,从低利用率条件到拥挤的车间——这些规则的表现都明显优于现有文献。总体而言,平均改善率为19%。此外,这些规则紧凑、可解释,并能很好地推广到极端的、未见过的场景。 摘要:The emergence of Industry 4.0 is making production systems more flexible and also more dynamic. In these settings, schedules often need to be adapted in real-time by dispatching rules. Although substantial progress was made until the '90s, the performance of these rules is still rather limited. The machine learning literature is developing a variety of methods to improve them, but the resulting rules are difficult to interpret and do not generalise well for a wide range of settings. This paper is the first major attempt at combining machine learning with domain problem reasoning for scheduling. The idea consists of using the insights obtained with the latter to guide the empirical search of the former. Our hypothesis is that this guided empirical learning process should result in dispatching rules that are effective and interpretable and which generalise well to different instance classes. We test our approach in the classical dynamic job shop scheduling problem minimising tardiness, which is one of the most well-studied scheduling problems. Nonetheless, results suggest that our approach was able to find new state-of-the-art rules, which significantly outperform the existing literature in the vast majority of settings, from loose to tight due dates and from low utilisation conditions to congested shops. Overall, the average improvement is 19%. Moreover, the rules are compact, interpretable, and generalise well to extreme, unseen scenarios.

【13】 Text-Free Prosody-Aware Generative Spoken Language Modeling 标题:无文本韵律感知的生成性口语建模 链接:https://arxiv.org/abs/2109.03264

作者:Eugene Kharitonov,Ann Lee,Adam Polyak,Yossi Adi,Jade Copet,Kushal Lakhotia,Tu-Anh Nguyen,Morgane Rivière,Abdelrahman Mohamed,Emmanuel Dupoux,Wei-Ning Hsu 机构:Facebook AI Research 摘要:语音预训练主要证明了其在分类任务上的有效性,而其生成新语音的能力,类似于GPT-2生成连贯段落的能力,几乎没有被探索过。生成性口语建模(GSLM)(Lakhotia et al.,2021)是之前唯一一项解决语音预训练生成方面的工作,它用发现的类似手机的语言建模单元替换文本,并显示出生成有意义的新句子的能力。不幸的是,尽管消除了对文本的需求,但GSLM中使用的单元丢弃了大部分韵律信息。因此,GSLM无法利用韵律来更好地理解,也无法生成富有表现力的语音。在这项工作中,我们提出了一个韵律感知的生成性口语模型(pGSLM)。它由语音的多流变换器语言模型(MS-TLM)组成,表示为发现的单元和韵律特征流,以及将MS-TLM输出转换为波形的自适应HiFi GAN模型。我们为韵律建模和生成设计了一系列度量,并将GSLM中的度量用于内容建模。实验结果表明,pGSLM可以利用韵律改进韵律和内容建模,并在语音提示下生成自然、有意义和连贯的语音。音频样本可在以下网址找到:https://speechbot.github.io/pgslm. 摘要:Speech pre-training has primarily demonstrated efficacy on classification tasks, while its capability of generating novel speech, similar to how GPT-2 can generate coherent paragraphs, has barely been explored. Generative Spoken Language Modeling (GSLM) (Lakhotia et al., 2021) is the only prior work addressing the generative aspects of speech pre-training, which replaces text with discovered phone-like units for language modeling and shows the ability to generate meaningful novel sentences. Unfortunately, despite eliminating the need of text, the units used in GSLM discard most of the prosodic information. Hence, GSLM fails to leverage prosody for better comprehension, and does not generate expressive speech. In this work, we present a prosody-aware generative spoken language model (pGSLM). It is composed of a multi-stream transformer language model (MS-TLM) of speech, represented as discovered unit and prosodic feature streams, and an adapted HiFi-GAN model converting MS-TLM outputs to waveforms. We devise a series of metrics for prosody modeling and generation, and re-use metrics from GSLM for content modeling. Experimental results show that the pGSLM can utilize prosody to improve both prosody and content modeling, and also generate natural, meaningful, and coherent speech given a spoken prompt. Audio samples can be found at https://speechbot.github.io/pgslm.

【14】 Training Algorithm Matters for the Performance of Neural Network Potential 标题:训练算法对神经网络势性能的影响 链接:https://arxiv.org/abs/2109.03769

作者:Yunqi Shao,Florian M. Dietrich,Carl Nettelblad,Chao Zhang 机构:†Department of Chemistry - Ångström Laboratory, Uppsala University, Lägerhyddsvägen , ‡Division of Scientific Computing, Department of Information Technology, Uppsala University, Lägerhyddsvägen , BOX , Uppsala, Sweden 摘要:开发神经网络势(NNP)的一个隐藏但重要的问题是训练算法的选择。在这里,我们使用Behler-Parrinello神经网络(BPNN)和两个可公开访问的液态水数据集,比较了两种流行的训练算法——自适应矩估计算法(Adam)和扩展卡尔曼滤波算法(EKF)——的性能。研究发现,与Adam相比,使用EKF训练的NNP具有更好的可迁移性,且对学习率的取值不太敏感。在这两种情况下,测试集的误差度量并不总是NNP实际性能的良好指标。相反,我们证明了它们的性能与基于Fisher信息的相似性度量有很好的相关性。 摘要:One hidden yet important issue for developing neural network potentials (NNPs) is the choice of training algorithm. Here we compare the performance of two popular training algorithms, the adaptive moment estimation algorithm (Adam) and the extended Kalman filter algorithm (EKF), using the Behler-Parrinello neural network (BPNN) and two publicly accessible datasets of liquid water. It is found that NNPs trained with EKF are more transferable and less sensitive to the value of the learning rate, as compared to Adam. In both cases, error metrics of the test set do not always serve as a good indicator for the actual performance of NNPs. Instead, we show that their performance correlates well with a Fisher information based similarity measure.

【15】 U-FNO -- an enhanced Fourier neural operator based-deep learning model for multiphase flow 标题:U-FNO--一种基于改进傅立叶神经算子的多相流深度学习模型 链接:https://arxiv.org/abs/2109.03697

作者:Gege Wen,Zongyi Li,Kamyar Azizzadenesheli,Anima Anandkumar,Sally M. Benson 机构:Energy Resources Engineering, Stanford University, Panama, St, Stanford, CA, USA, Computing and Mathematical Sciences, California Institute of Technology, E., California Blvd., MC ,-, Pasadena, CA, USA 摘要:多孔介质中多相流的数值模拟是许多地球科学应用的基础。然而,由于多物理、非线性和多尺度问题的性质,这些模拟在理想的网格分辨率下非常昂贵,并且计算成本常常阻碍严格的工程决策。机器学习方法通过训练具有数值模拟数据映射的神经网络模型,为传统仿真器提供了更快的替代方案。传统的基于卷积神经网络(CNN)的模型虽然精确,但数据密集,并且容易过度拟合。在这里,我们提出了一种新的结构,U-FNO,一种用于解决多相流问题的增强型傅立叶神经算子。U-FNO是基于傅里叶神经算子(FNO)设计的,该算子在傅里叶空间中学习积分核。通过对CNN基准和三种类型的FNO变体在CO2地质封存背景下的CO2-水多相问题上的系统比较,我们表明U-FNO体系结构具有传统CNN和原始FNO的优点,提供比以前的体系结构更精确、更高效的性能。经过训练的U-FNO在保持类似精度的同时,提供了比传统数值模拟器快10000倍的气体饱和度和压力恢复预测。 摘要:Numerical simulation of multiphase flow in porous media is essential for many geoscience applications. However, due to the multi-physics, non-linear, and multi-scale problem nature, these simulations are very expensive at desirable grid resolutions, and the computational cost often impedes rigorous engineering decision-making. Machine learning methods provide faster alternatives to traditional simulators by training neural network models with numerical simulation data mappings. Traditional convolutional neural network (CNN)-based models are accurate yet data-intensive and are prone to overfitting. Here we present a new architecture, U-FNO, an enhanced Fourier neural operator for solving the multiphase flow problem. The U-FNO is designed based on the Fourier neural operator (FNO) that learns an integral kernel in the Fourier space. Through a systematic comparison among a CNN benchmark and three types of FNO variations on a CO2-water multiphase problem in the context of CO2 geological storage, we show that the U-FNO architecture has the advantages of both traditional CNN and original FNO, providing significantly more accurate and efficient performance than previous architectures. The trained U-FNO provides gas saturation and pressure buildup predictions with a 10,000 times speedup compared to traditional numerical simulators while maintaining similar accuracy.
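U-FNO建立在傅里叶神经算子(FNO)之上,其核心是"在傅里叶空间对低频模式做可学习的线性变换"的谱卷积层。下面给出一个标准的一维谱卷积层示意(并非U-FNO的完整结构,U-FNO还并联了U-Net路径,细节见论文);通道数、模式数与网格尺寸均为演示假设。

```python
import torch
import torch.nn as nn

class SpectralConv1d(nn.Module):
    """FNO 的核心算子:对前 `modes` 个低频傅里叶模式做可学习的线性变换。"""
    def __init__(self, in_ch: int, out_ch: int, modes: int):
        super().__init__()
        self.modes = modes
        scale = 1.0 / (in_ch * out_ch)
        self.weight = nn.Parameter(
            scale * torch.randn(in_ch, out_ch, modes, dtype=torch.cfloat))

    def forward(self, x):                      # x: (batch, in_ch, n)
        x_ft = torch.fft.rfft(x)               # 变换到傅里叶空间,(batch, in_ch, n//2+1)
        out_ft = torch.zeros(x.shape[0], self.weight.shape[1], x_ft.shape[-1],
                             dtype=torch.cfloat, device=x.device)
        out_ft[..., :self.modes] = torch.einsum(
            "bim,iom->bom", x_ft[..., :self.modes], self.weight)  # 仅保留低频模式
        return torch.fft.irfft(out_ft, n=x.shape[-1])             # 回到物理空间

x = torch.randn(8, 3, 64)                        # batch=8, 3 个输入通道, 64 个网格点
print(SpectralConv1d(3, 16, modes=12)(x).shape)  # torch.Size([8, 16, 64])
```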

【16】 Single Plane-Wave Imaging using Physics-Based Deep Learning 标题:基于物理深度学习的单平面波成像 链接:https://arxiv.org/abs/2109.03661

作者:Georgios Pilikos,Chris L. de Korte,Tristan van Leeuwen,Felix Lucka 机构:∗Computational Imaging, Centrum Wiskunde & Informatica, Amsterdam, NL, † Department of Radiology and Nuclear Medicine, Radboud University Medical Center, Nijmegen, NL 摘要:在平面波成像中,多个未聚焦的超声波从不同角度传输到感兴趣的介质中,并从记录的反射形成图像。使用的平面波的数量导致帧速率和图像质量之间的权衡,其中单平面波(SPW)成像是最快的成像方式,但图像质量最差。近年来,人们提出了深度学习方法来改善超声成像。一种方法是使用在已形成图像上工作的图像到图像网络,另一种方法是直接学习从数据到图像的映射。这两种方法都使用纯数据驱动的模型,需要深层、表达性强的网络体系结构,并结合大量的训练样本来获得良好的结果。在这里,我们提出了一种数据到图像的体系结构,它在深度卷积神经网络之间结合了基于波动物理的图像形成算法。为了实现这一点,我们将傅里叶(FK)迁移方法实现为网络层,并对整个网络进行端到端的训练。在模拟医学超声应用的数据实验中,我们将我们提出的数据到图像网络与图像到图像网络进行了比较。实验表明,可以获得高质量的SPW图像,几乎类似于在$\pm$16$^\circ$的角度范围内使用75个平面波形成的图像。这说明了将深度神经网络与基于物理的图像形成算法相结合用于SPW成像的巨大潜力。 摘要:In plane-wave imaging, multiple unfocused ultrasound waves are transmitted into a medium of interest from different angles and an image is formed from the recorded reflections. The number of plane waves used leads to a trade-off between frame-rate and image quality, with single-plane-wave (SPW) imaging being the fastest possible modality with the worst image quality. Recently, deep learning methods have been proposed to improve ultrasound imaging. One approach is to use image-to-image networks that work on the formed image and another is to directly learn a mapping from data to an image. Both approaches utilize purely data-driven models and require deep, expressive network architectures, combined with large numbers of training samples to obtain good results. Here, we propose a data-to-image architecture that incorporates a wave-physics-based image formation algorithm in-between deep convolutional neural networks. To achieve this, we implement the Fourier (FK) migration method as network layers and train the whole network end-to-end. We compare our proposed data-to-image network with an image-to-image network in simulated data experiments, mimicking a medical ultrasound application. Experiments show that it is possible to obtain high-quality SPW images, almost similar to an image formed using 75 plane waves over an angular range of $\pm$16$^\circ$. This illustrates the great potential of combining deep neural networks with physics-based image formation algorithms for SPW imaging.

【17】 Deep Learning for Multi-View Ultrasonic Image Fusion 标题:深度学习在多视角超声图像融合中的应用 链接:https://arxiv.org/abs/2109.03616

作者:Georgios Pilikos,Lars Horchens,Tristan van Leeuwen,Felix Lucka 机构:∗Computational Imaging, Centrum Wiskunde & Informatica, Amsterdam, NL, †Applus+ E&I Technology Centre, Rotterdam, NL 摘要:超声成像通过向介质中发射波并使用超声换能器阵列记录它们之间的相互作用来获取有关介质声学特性的信息。延迟求和(DAS)算法使用反射信号传回换能器的主路径形成图像。在一些应用中,可以考虑不同的声照射(insonification)路径,例如将换能器放置在不同的位置,或在事先已知介质中存在强反射体时加以利用。这些不同的模式产生了多个反映散射体不同几何信息的DAS图像,挑战在于要么将它们融合到一幅图像中,要么直接提取有关介质材料的更高层次信息,例如分割图。传统的图像融合技术通常使用预定义图像变换、池化操作和阈值的特殊组合。在这项工作中,我们提出了一种深度神经网络(DNN)架构,它直接将所有可用数据映射到一个分割图,同时显式地将不同声照射路径的DAS图像形成过程实现为网络层。这使得端到端训练的数据预处理DNN与图像后处理DNN之间的信息流动成为可能。我们使用模拟数据实验将我们提出的方法与传统的图像融合技术进行比较,模拟具有四种图像模式的无损检测应用,即两个换能器位置和两个内反射边界。使用我们的方法,可以获得更精确的缺陷分割。 摘要:Ultrasonic imaging is being used to obtain information about the acoustic properties of a medium by emitting waves into it and recording their interaction using ultrasonic transducer arrays. The Delay-And-Sum (DAS) algorithm forms images using the main path on which reflected signals travel back to the transducers. In some applications, different insonification paths can be considered, for instance by placing the transducers at different locations or if strong reflectors inside the medium are known a-priori. These different modes give rise to multiple DAS images reflecting different geometric information about the scatterers and the challenge is to either fuse them into one image or to directly extract higher-level information regarding the materials of the medium, e.g., a segmentation map. Traditional image fusion techniques typically use ad-hoc combinations of pre-defined image transforms, pooling operations and thresholding. In this work, we propose a deep neural network (DNN) architecture that directly maps all available data to a segmentation map while explicitly incorporating the DAS image formation for the different insonification paths as network layers. This enables information flow between data pre-processing and image post-processing DNNs, trained end-to-end. We compare our proposed method to a traditional image fusion technique using simulated data experiments, mimicking a non-destructive testing application with four image modes, i.e., two transducer locations and two internal reflection boundaries. Using our approach, it is possible to obtain much more accurate segmentation of defects.

【18】 Can Noise on Qubits Be Learned in Quantum Neural Network? A Case Study on QuantumFlow 标题:量子神经网络可以学习量子比特上的噪声吗?QuantumFlow的案例研究 链接:https://arxiv.org/abs/2109.03430

作者:Zhiding Liang,Zhepeng Wang,Junhuan Yang,Lei Yang,Jinjun Xiong,Yiyu Shi,Weiwen Jiang 机构:†University of Notre Dame, IN, USA. ‡George Mason University, VA, USA., ¶University of New Mexico, NM, USA. §University at Buffalo, NY, USA. 摘要:在噪声中尺度量子(NISQ)时代,如何处理物理量子比特中存在的高噪声是一个关键问题。量子纠错很有希望,但需要大量(例如,超过1000)的物理量子位才能产生一个“完美”量子位,超过现有量子计算机的容量。本文旨在从另一个角度解决噪声问题:我们不是为一般的量子算法创建完美的量子比特,而是研究为专用算法缓解噪声问题的潜力。具体而言,本文以量子神经网络(QNN)为目标,提出在训练阶段学习误差,使识别出的QNN模型具有抗噪声能力。因此,QNN的实现不需要或只需要少量的附加物理量子位,这对于近期的量子计算机来说更为现实。为了实现这一目标,一个特定于应用程序的编译器是必不可少的:一方面,如果从逻辑量子位到物理量子位的映射存在随机性,则无法学习错误;另一方面,编译器需要高效,以便在合理的时间内完成冗长的训练过程。在本文中,我们使用最新的QNN框架QuantumFlow作为案例研究。实验结果表明,该方法可以针对不同的量子位误差优化QNN模型,与误差不可知训练得到的模型相比,精度提高了28%。 摘要:In the noisy intermediate-scale quantum (NISQ) era, one of the key questions is how to deal with the high noise level existing in physical quantum bits (qubits). Quantum error correction is promising but requires an extensive number (e.g., over 1,000) of physical qubits to create one "perfect" qubit, exceeding the capacity of the existing quantum computers. This paper aims to tackle the noise issue from another angle: instead of creating perfect qubits for general quantum algorithms, we investigate the potential to mitigate the noise issue for dedicate algorithms. Specifically, this paper targets quantum neural network (QNN), and proposes to learn the errors in the training phase, so that the identified QNN model can be resilient to noise. As a result, the implementation of QNN needs no or a small number of additional physical qubits, which is more realistic for the near-term quantum computers. To achieve this goal, an application-specific compiler is essential: on the one hand, the error cannot be learned if the mapping from logical qubits to physical qubits exists randomness; on the other hand, the compiler needs to be efficient so that the lengthy training procedure can be completed in a reasonable time. In this paper, we utilize the recent QNN framework, QuantumFlow, as a case study. Experimental results show that the proposed approach can optimize QNN models for different errors in qubits, achieving up to 28% accuracy improvement compared with the model obtained by the error-agnostic training.

【19】 Entangled Datasets for Quantum Machine Learning 标题:量子机器学习中的纠缠数据集 链接:https://arxiv.org/abs/2109.03400

作者:Louis Schatzki,Andrew Arrasmith,Patrick J. Coles,M. Cerezo 机构:Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM , USA, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL , USA 备注:12 + 8 pages, 10 + 3 figures, 1 table 摘要:高质量、大规模的数据集对经典机器学习的发展和成功起着至关重要的作用。量子机器学习(QML)是一个新领域,旨在利用量子计算机进行数据分析,以期获得某种量子优势。虽然大多数提出的QML体系结构都是使用经典数据集进行基准测试的,但仍有疑问,经典数据集上的QML是否能实现这样的优势。在这项工作中,我们认为应该使用由量子态组成的量子数据集。为此,我们引入了由具有不同数量和类型的多体纠缠的量子态组成的纠缠数据集。我们首先展示如何训练量子神经网络来生成纠缠数据集中的状态。然后,我们使用纠缠数据集对QML模型进行基准测试,以完成监督学习分类任务。我们还考虑了另一种基于纠缠的数据集,它是可伸缩的,并且由由不同深度的量子电路所制备的状态组成。作为我们结果的副产品,我们介绍了一种产生多体纠缠态的新方法,为量子纠缠理论提供了一个量子神经网络的应用案例。 摘要:High-quality, large-scale datasets have played a crucial role in the development and success of classical machine learning. Quantum Machine Learning (QML) is a new field that aims to use quantum computers for data analysis, with the hope of obtaining a quantum advantage of some sort. While most proposed QML architectures are benchmarked using classical datasets, there is still doubt whether QML on classical datasets will achieve such an advantage. In this work, we argue that one should instead employ quantum datasets composed of quantum states. For this purpose, we introduce the NTangled dataset composed of quantum states with different amounts and types of multipartite entanglement. We first show how a quantum neural network can be trained to generate the states in the NTangled dataset. Then, we use the NTangled dataset to benchmark QML models for supervised learning classification tasks. We also consider an alternative entanglement-based dataset, which is scalable and is composed of states prepared by quantum circuits with different depths. As a byproduct of our results, we introduce a novel method for generating multipartite entangled states, providing a use-case of quantum neural networks for quantum entanglement theory.

【20】 Reconstructing High-resolution Turbulent Flows Using Physics-Guided Neural Networks 标题:用物理引导神经网络重建高分辨率湍流流动 链接:https://arxiv.org/abs/2109.03327

作者:Shengyu Chen,Shervin Sammak,Peyman Givi,Joseph P. Yurko1,Xiaowei Jia 机构:Department of Computer Science, University of Pittsburgh, Department of Mechanical Engineering and Materials Science, University of Pittsburgh 摘要:湍流的直接数值模拟(DNS)在计算上非常昂贵,无法应用于雷诺数较大的流动。大涡模拟(LES)是一种计算要求较低的替代方法,但无法准确捕捉湍流输送的所有尺度。我们在这项工作中的目标是建立一种新的基于超分辨率技术的数据驱动方法,以从LES预测中重建DNS数据。我们利用潜在的物理关系来规范不同物理变量之间的关系。我们还引入了分层生成过程和反向降级过程,以充分探索DNS和LES数据之间的对应关系。我们通过一个单快照实验和一个跨时间实验证明了该方法的有效性。结果表明,在像素级重建误差和结构相似性方面,我们的方法能够更好地在空间和时间上重建高分辨率DNS数据。视觉比较表明,我们的方法在捕捉精细水平流动力学方面表现得更好。 摘要:Direct numerical simulation (DNS) of turbulent flows is computationally expensive and cannot be applied to flows with large Reynolds numbers. Large eddy simulation (LES) is an alternative that is computationally less demanding, but is unable to capture all of the scales of turbulent transport accurately. Our goal in this work is to build a new data-driven methodology based on super-resolution techniques to reconstruct DNS data from LES predictions. We leverage the underlying physical relationships to regularize the relationships amongst different physical variables. We also introduce a hierarchical generative process and a reverse degradation process to fully explore the correspondence between DNS and LES data. We demonstrate the effectiveness of our method through a single-snapshot experiment and a cross-time experiment. The results confirm that our method can better reconstruct high-resolution DNS data over space and over time in terms of pixel-wise reconstruction error and structural similarity. Visual comparisons show that our method performs much better in capturing fine-level flow dynamics.
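下面给出一个示意性草图(非论文原始实现),说明"物理引导"损失的一般构造方式:在像素级重建损失之外再加一项物理约束。此处以不可压缩流的散度惩罚作为示例,论文实际使用的物理关系与网络结构请以原文为准,所有规模与系数均为假设。

```python
# 示意性草图:超分辨率网络 + 像素损失 + 物理约束正则项
import torch
import torch.nn as nn
import torch.nn.functional as F

class SRNet(nn.Module):
    def __init__(self, scale=4):
        super().__init__()
        self.scale = scale
        self.body = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1),   # 输出 (u, v) 两个速度分量
        )

    def forward(self, les):
        up = F.interpolate(les, scale_factor=self.scale, mode="bilinear", align_corners=False)
        return self.body(up)

def divergence_penalty(vel):
    # 有限差分近似 ∂u/∂x + ∂v/∂y,惩罚其偏离 0(不可压缩假设,仅作示例)
    u, v = vel[:, 0:1], vel[:, 1:2]
    du_dx = u[:, :, :, 1:] - u[:, :, :, :-1]
    dv_dy = v[:, :, 1:, :] - v[:, :, :-1, :]
    return (du_dx[:, :, 1:, :] + dv_dy[:, :, :, 1:]).pow(2).mean()

net = SRNet()
les = torch.randn(4, 2, 16, 16)               # 低分辨率 LES 场(演示数据)
dns = torch.randn(4, 2, 64, 64)               # 对应的高分辨率 DNS 场(演示数据)
pred = net(les)
loss = F.mse_loss(pred, dns) + 0.1 * divergence_penalty(pred)
loss.backward()
print(float(loss))
```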

其他(13篇)

【1】 Conservative Policy Construction Using Variational Autoencoders for Logged Data with Missing Values 标题:使用变分自动编码器构建具有缺失值的记录数据的保守策略 链接:https://arxiv.org/abs/2109.03747

作者:Mahed Abroshan,Kai Hou Yip,Cem Tekin,Mihaela van der Schaar 机构:TekiniswithBilkentUniversity(cemtekin 摘要:在医疗保健等数据驱动决策的高风险应用中,学习一种在存在不确定性时避免潜在危险行为的同时实现回报最大化的政策至关重要。通常与此问题相关的主要挑战有两个。首先,由于此类应用的关键性,通过在线探索学习是不可能的。因此,我们需要求助于没有反事实的观测数据集。其次,这样的数据集通常是不完善的,而且特征属性中缺少值。在本文中,我们考虑的问题,构建个性化的政策使用日志数据时,在训练和测试数据中的特征属性缺失值。目标是在观察到$\Xt$是$\Xb$的降级版本且缺少值时,建议采取措施(治疗)。我们考虑三个策略来处理错误。特别是,我们引入了\text{保守策略},其中策略设计用于安全地处理由于缺失而产生的不确定性。为了实现这一策略,我们需要估计后验分布$p(\Xb |\Xt)$,我们使用变分自动编码器来实现这一点。特别是,我们的方法基于部分变分自动编码器(PVAE),其设计用于捕获具有缺失值的特征的底层结构。 摘要:In high-stakes applications of data-driven decision making like healthcare, it is of paramount importance to learn a policy that maximizes the reward while avoiding potentially dangerous actions when there is uncertainty. There are two main challenges usually associated with this problem. Firstly, learning through online exploration is not possible due to the critical nature of such applications. Therefore, we need to resort to observational datasets with no counterfactuals. Secondly, such datasets are usually imperfect, additionally cursed with missing values in the attributes of features. In this paper, we consider the problem of constructing personalized policies using logged data when there are missing values in the attributes of features in both training and test data. The goal is to recommend an action (treatment) when $\Xt$, a degraded version of $\Xb$ with missing values, is observed. We consider three strategies for dealing with missingness. In particular, we introduce the \textit{conservative strategy} where the policy is designed to safely handle the uncertainty due to missingness. In order to implement this strategy we need to estimate posterior distribution $p(\Xb|\Xt)$, we use variational autoencoder to achieve this. In particular, our method is based on partial variational autoencoders (PVAE) which are designed to capture the underlying structure of features with missing values.

【2】 Priming PCA with EigenGame 标题:基于特征博弈的PCA初始化 链接:https://arxiv.org/abs/2109.03709

作者:Bálint Máté,François Fleuret 机构: University of Geneva 摘要:我们介绍了预启动PCA(primed-PCA,pPCA),这是最近提出的EigenGame(特征博弈)算法的一个扩展,用于在大规模环境中计算主成分。我们的算法首先运行EigenGame来获得主成分的近似值,然后在它们所张成的子空间中应用精确的PCA。由于在EigenGame的任何实际应用中,该子空间的维数都很小,因此第二步的计算成本非常低。尽管如此,在给定计算预算下,它在各个数据集上都显著提高了精度。在此设置中,EigenGame的作用是缩小搜索空间,并为第二步(精确计算)准备数据。我们正式证明了pPCA在非常温和的条件下改进了EigenGame,并且我们在合成和真实大规模数据集上提供了实验验证,表明这一改进能够系统地转化为性能提升。在我们的实验中,我们在原始EigenGame论文的数据集上实现了5-25倍的收敛速度提升。 摘要:We introduce primed-PCA (pPCA), an extension of the recently proposed EigenGame algorithm for computing principal components in a large-scale setup. Our algorithm first runs EigenGame to get an approximation of the principal components, and then applies an exact PCA in the subspace they span. Since this subspace is of small dimension in any practical use of EigenGame, this second step is extremely cheap computationally. Nonetheless, it improves accuracy significantly for a given computational budget across datasets. In this setup, the purpose of EigenGame is to narrow down the search space, and prepare the data for the second step, an exact calculation. We show formally that pPCA improves upon EigenGame under very mild conditions, and we provide experimental validation on both synthetic and real large-scale datasets showing that it systematically translates to improved performance. In our experiments we achieve improvements in convergence speed by factors of 5-25 on the datasets of the original EigenGame paper.
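pPCA 的两步流程可以用下面的 NumPy 草图说明(非论文原始实现):第一步本应由 EigenGame 给出近似主成分,这里为保持自包含改用少量正交迭代代替;第二步在近似子空间内做精确 PCA,由于子空间维数很小,这一步的代价几乎可以忽略。

```python
# 示意性草图:先近似求子空间,再在子空间内做精确 PCA
import numpy as np

def approximate_subspace(X, k, n_iter=3, seed=0):
    """粗略近似前 k 个主成分张成的子空间(代替 EigenGame 的输出)。"""
    rng = np.random.default_rng(seed)
    C = X.T @ X / len(X)
    Q = rng.standard_normal((X.shape[1], k))
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(C @ Q)
    return Q                                  # (d, k),列正交

def primed_pca(X, k):
    X = X - X.mean(axis=0)
    Q = approximate_subspace(X, k)
    Z = X @ Q                                 # 投影到 k 维子空间,k 很小故精确 PCA 极便宜
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Q @ Vt.T                           # (d, k),修正后的主成分

X = np.random.default_rng(1).standard_normal((5000, 100))
W = primed_pca(X, k=5)
print(W.shape, np.allclose(W.T @ W, np.eye(5), atol=1e-6))
```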

【3】 Tactile Image-to-Image Disentanglement of Contact Geometry from Motion-Induced Shear 标题:运动诱导剪切中接触几何的触觉像像解缠 链接:https://arxiv.org/abs/2109.03615

作者:Anupam K. Gupta,Laurence Aitchison,Nathan F. Lepora 机构:Department of Engineering Maths and Bristol Robotics Laboratory, University of Bristol United Kingdom, Department of Computer Science 备注:15 pages, 6 figure, under review CORL 2021 摘要:机器人触觉(特别是使用软光学触觉传感器时)会因与运动相关的剪切而产生失真。传感器接触刺激物的方式与关于刺激物几何形状的触觉信息纠缠在一起。在这项工作中,我们提出了一个有监督的卷积深度神经网络模型,该模型学习在潜在空间中分离由接触几何引起的传感器变形分量和由滑动诱发剪切引起的传感器变形分量。该方法通过从含剪切图像重建未剪切触觉图像,并显示它们与无滑动运动采集的未剪切触觉图像相匹配来验证。此外,未剪切触觉图像提供了从含剪切数据无法实现的接触几何体的忠实重建,以及可用于伺服控制围绕各种2D形状滑动的接触姿态的稳健估计。最后,将接触几何重建与伺服控制滑动相结合,实现了各种二维形状的忠实全对象重建。该方法对具有剪切敏感触觉的机器人深度学习模型具有广泛的适用性。 摘要:Robotic touch, particularly when using soft optical tactile sensors, suffers from distortion caused by motion-dependent shear. The manner in which the sensor contacts a stimulus is entangled with the tactile information about the geometry of the stimulus. In this work, we propose a supervised convolutional deep neural network model that learns to disentangle, in the latent space, the components of sensor deformations caused by contact geometry from those due to sliding-induced shear. The approach is validated by reconstructing unsheared tactile images from sheared images and showing they match unsheared tactile images collected with no sliding motion. In addition, the unsheared tactile images give a faithful reconstruction of the contact geometry that is not possible from the sheared data, and robust estimation of the contact pose that can be used for servo control sliding around various 2D shapes. Finally, the contact geometry reconstruction in conjunction with servo control sliding were used for faithful full object reconstruction of various 2D shapes. The methods have broad applicability to deep learning models for robots with a shear-sensitive sense of touch.

【4】 RepNAS: Searching for Efficient Re-parameterizing Blocks 标题:RepNAS:搜索有效的重新参数化块 链接:https://arxiv.org/abs/2109.03508

作者:Mingyang Zhang,Xinyi Yu,Jingtao Rong,Linlin Ou,Feng Gao 机构:College of Information Engineering, Zhejiang University of Technology, Hang Zhou, People’s Republic of China, Zhejiang Lab, Hangzhou, People’s Republic of China 摘要:在过去的几年中,神经结构搜索(NAS)领域取得了显著的进步。然而,由于搜索到的约束条件与实际推理时间之间的差距,搜索有效的网络仍然是一个挑战。为了寻找推理时间短的高性能网络,以前的一些工作为搜索算法设置了计算复杂度约束。然而,许多因素会影响推理速度(例如,触发器、MAC)。单个指标与延迟之间的相关性不强。目前,一些重新参数化(Rep)技术被提出用于将多分支结构转换为推理友好的单路径结构。然而,多分支体系结构仍然是人为定义的,效率低下。在这项工作中,我们提出了一个新的搜索空间,适用于结构再参数化技术。RepNAS是一种单阶段NAS方法,用于在分支数约束下有效地搜索每一层的最优分支块(ODBB)。我们的实验结果表明,搜索到的ODBB可以通过有效的训练轻松超越手动多样分支块(DBB)。代码和模型将很快提供。 摘要:In the past years, significant improvements in the field of neural architecture search(NAS) have been made. However, it is still challenging to search for efficient networks due to the gap between the searched constraint and real inference time exists. To search for a high-performance network with low inference time, several previous works set a computational complexity constraint for the search algorithm. However, many factors affect the speed of inference(e.g., FLOPs, MACs). The correlation between a single indicator and the latency is not strong. Currently, some re-parameterization(Rep) techniques are proposed to convert multi-branch to single-path architecture which is inference-friendly. Nevertheless, multi-branch architectures are still human-defined and inefficient. In this work, we propose a new search space that is suitable for structural re-parameterization techniques. RepNAS, a one-stage NAS approach, is present to efficiently search the optimal diverse branch block(ODBB) for each layer under the branch number constraint. Our experimental results show the searched ODBB can easily surpass the manual diverse branch block(DBB) with efficient training. Code and models will be available sooner.
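结构重参数化(Rep)本身的基本操作可用下面的小例子说明(这不是论文的 NAS 搜索实现):把训练时并联的 3x3 与 1x1 卷积分支在推理时合并为单个 3x3 卷积,两者输出完全等价;RepNAS 搜索的正是每层应保留哪些这类可合并分支。

```python
# 示意性草图:把并联的 3x3 卷积与 1x1 卷积合并为等价的单个 3x3 卷积
import torch
import torch.nn.functional as F

C = 8
x = torch.randn(1, C, 16, 16)
w3 = torch.randn(C, C, 3, 3)
w1 = torch.randn(C, C, 1, 1)

# 训练时的多分支前向
y_multi = F.conv2d(x, w3, padding=1) + F.conv2d(x, w1)

# 推理时:把 1x1 核零填充到 3x3 后与 3x3 核相加
w_merged = w3 + F.pad(w1, [1, 1, 1, 1])
y_single = F.conv2d(x, w_merged, padding=1)

print(torch.allclose(y_multi, y_single, atol=1e-4))   # True:两种结构输出一致
```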

【5】 R2-D2: A Modular Baseline for Open-Domain Question Answering 标题:R2-D2:开放领域问答的模块化基线 链接:https://arxiv.org/abs/2109.03502

作者:Martin Fajcik,Martin Docekal,Karel Ondrej,Pavel Smrz 机构:Brno University of Technology 备注:Accepted to Findings of EMNLP'21. arXiv admin note: substantial text overlap with arXiv:2102.10697 摘要:这项工作提出了一种新的四阶段开放域QA管道R2-D2(排名两次,读取两次)。该管道由检索器、通道重排序器、抽取式读卡器、生成式读卡器和一个从所有系统组件聚合最终预测的机制组成。我们通过三个开放域QA数据集展示了它的实力:NaturalQuestions、TriviaQA和EfficientQA,在前两个方面超过了最先进的水平。我们的分析表明:(i)将抽取式和生成式读取器相结合可产生高达5个精确匹配的绝对改进,其效果至少是具有不同参数的相同模型的后验平均集成的两倍,(ii)参数较少的抽取式读卡器可以与生成式读卡器在抽取式QA数据集上的性能相匹配。 摘要:This work presents a novel four-stage open-domain QA pipeline R2-D2 (Rank twice, reaD twice). The pipeline is composed of a retriever, passage reranker, extractive reader, generative reader and a mechanism that aggregates the final prediction from all system's components. We demonstrate its strength across three open-domain QA datasets: NaturalQuestions, TriviaQA and EfficientQA, surpassing state-of-the-art on the first two. Our analysis demonstrates that: (i) combining extractive and generative reader yields absolute improvements up to 5 exact match and it is at least twice as effective as the posterior averaging ensemble of the same models with different parameters, (ii) the extractive reader with fewer parameters can match the performance of the generative reader on extractive QA datasets.

【6】 Estimating Expected Calibration Errors 标题:估计预期校准误差 链接:https://arxiv.org/abs/2109.03480

作者:Nicolas Posocco,Antoine Bonnefoy 机构:EURA NOVA, Marseille, France 备注:12 pages, 2 figures, ICANN2021 摘要:当模型被用于辅助人类决策、嵌入更大的概率化流程,或需要做出敏感的自动决策时,概率分类器预测中的不确定性是一个关键问题。研究表明,大多数模型本质上没有经过很好的校准,这意味着它们的决策分数与后验概率不一致。因此,能够校准这些模型,或在学习它们的同时强制校准,在最近的文献中重新获得了关注。在这种情况下,正确评估校准对于量化新的校准方面的贡献至关重要。然而,常用指标仍有改进空间,校准评估可以从更深入的分析中受益。因此,本文着重于分类背景下校准指标的实证评估。更具体地说,它评估了预期校准误差($ECE$)的不同估计量,其中既包括已有的传统估计量,也包括本文提出的一些新估计量。我们建立了一个经验程序来量化这些$ECE$估计量的质量,并使用它来决定在不同的环境中应该使用哪个估计量。 摘要:Uncertainty in probabilistic classifiers predictions is a key concern when models are used to support human decision making, in broader probabilistic pipelines or when sensitive automatic decisions have to be taken. Studies have shown that most models are not intrinsically well calibrated, meaning that their decision scores are not consistent with posterior probabilities. Hence being able to calibrate these models, or enforce calibration while learning them, has regained interest in recent literature. In this context, properly assessing calibration is paramount to quantify new contributions tackling calibration. However, there is room for improvement for commonly used metrics and evaluation of calibration could benefit from deeper analyses. Thus this paper focuses on the empirical evaluation of calibration metrics in the context of classification. More specifically it evaluates different estimators of the Expected Calibration Error ($ECE$), amongst which legacy estimators and some novel ones, proposed in this paper. We build an empirical procedure to quantify the quality of these $ECE$ estimators, and use it to decide which estimator should be used in practice for different settings.
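作为参照,下面给出论文所评估的"传统估计量"之一,即等宽分箱 ECE 的最简实现(论文提出的新估计量此处不做复现);分箱数等参数为常用默认值。

```python
# 示意性草图:等宽分箱的 ECE 估计
import numpy as np

def ece_equal_width(confidences, correct, n_bins=15):
    """confidences: 预测的最大类概率; correct: 预测是否正确(0/1)。"""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap          # 按箱内样本占比加权
    return ece

rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=10000)
corr = rng.uniform(size=10000) < conf         # 此模拟下模型近似完美校准,ECE 应接近 0
print(round(ece_equal_width(conf, corr), 4))
```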

【7】 ADER:Adapting between Exploration and Robustness for Actor-Critic Methods 标题:ADER:演员批评方法在探索性和稳健性之间的适应 链接:https://arxiv.org/abs/2109.03443

作者:Bo Zhou,Kejiao Li,Hongsheng Zeng,Fan Wang,Hao Tian 摘要:研究发现,将非策略强化学习方法与神经网络等函数逼近器相结合会导致对值函数的高估和次优解。为了解决这个问题,已经提出了TD3等改进方案。然而,我们惊奇地发现,在一些原始环境中,它的性能落后于普通的actor-critic方法(如DDPG)。在本文中,我们表明,一些案例的失败可归因于勘探不足。我们揭示了TD3中探索不足的罪魁祸首,并针对这个问题提出了一种适应于探索性和鲁棒性的新算法,即ADER。为了提高探索能力,同时消除高估偏差,我们在根据估计不确定性计算的价值估计中引入了动态惩罚项,该项考虑了不同学习阶段不确定性的不同组成。在多个具有挑战性的环境中的实验证明了该方法在连续控制任务中的优越性。 摘要:Combining off-policy reinforcement learning methods with function approximators such as neural networks has been found to lead to overestimation of the value function and sub-optimal solutions. Improvement such as TD3 has been proposed to address this issue. However, we surprisingly find that its performance lags behind the vanilla actor-critic methods (such as DDPG) in some primitive environments. In this paper, we show that the failure of some cases can be attributed to insufficient exploration. We reveal the culprit of insufficient exploration in TD3, and propose a novel algorithm toward this problem that ADapts between Exploration and Robustness, namely ADER. To enhance the exploration ability while eliminating the overestimation bias, we introduce a dynamic penalty term in value estimation calculated from estimated uncertainty, which takes into account different compositions of the uncertainty in different learning stages. Experiments in several challenging environments demonstrate the supremacy of the proposed method in continuous control tasks.
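论文的核心是在价值目标中加入随训练阶段动态变化的不确定性惩罚;其具体构成与调度方式请以原文为准,下面只是对"估计均值减去动态系数乘以不确定性"这一一般形式的示意草图,函数名、调度方式与数值均为笔者假设。

```python
# 示意性草图:带动态不确定性惩罚的 TD 目标(非论文原始实现)
import numpy as np

def ader_style_target(q_estimates, reward, gamma, step, total_steps, beta_max=1.0):
    """q_estimates: (n_critics,) 对下一状态动作价值的多个估计。"""
    q_mean = q_estimates.mean()
    q_std = q_estimates.std()                  # 认知不确定性的粗略代理
    beta = beta_max * step / total_steps       # 假设:早期惩罚小以鼓励探索,后期惩罚大以抑制过估计
    return reward + gamma * (q_mean - beta * q_std)

q_next = np.array([10.2, 9.7, 10.9])
print(ader_style_target(q_next, reward=1.0, gamma=0.99, step=1_000, total_steps=1_000_000))
print(ader_style_target(q_next, reward=1.0, gamma=0.99, step=900_000, total_steps=1_000_000))
```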

【8】 Fixed Support Tree-Sliced Wasserstein Barycenter 标题:固定支撑树切片Wasserstein重心 链接:https://arxiv.org/abs/2109.03431

作者:Yuki Takezawa,Ryoma Sato,Zornitsa Kozareva,Sujith Ravi,Makoto Yamada 机构:Kyoto University, RIKEN AIP, Facebook AI Research, SliceX AI 摘要:Wasserstein重心在自然语言处理和计算机视觉等领域得到了广泛的研究。然而,解决Wasserstein重心问题需要较高的计算成本,因为计算Wasserstein距离需要与支撑数量相关的二次时间。相比之下,树上的Wasserstein距离(称为树Wasserstein距离)可以在线性时间内计算,并允许快速比较大量分布。在这项研究中,我们提出了树瓦瑟斯坦距离下的重心,称为固定支撑树瓦瑟斯坦重心(FS-TWB)及其扩展,称为固定支撑树切片瓦瑟斯坦重心(FS-TSWB)。更具体地说,我们首先证明了FS-TWB和FS-TSWB问题是凸优化问题,可以使用投影次梯度下降法求解。此外,我们还利用树的Wasserstein重心问题的性质,提出了一种更有效的计算次梯度和目标函数值的算法。通过真实世界的实验,我们表明,使用所提出的算法,FS-TWB和FS-TSWB的求解速度比原始Wasserstein重心快两个数量级。 摘要:The Wasserstein barycenter has been widely studied in various fields, including natural language processing, and computer vision. However, it requires a high computational cost to solve the Wasserstein barycenter problem because the computation of the Wasserstein distance requires a quadratic time with respect to the number of supports. By contrast, the Wasserstein distance on a tree, called the tree-Wasserstein distance, can be computed in linear time and allows for the fast comparison of a large number of distributions. In this study, we propose a barycenter under the tree-Wasserstein distance, called the fixed support tree-Wasserstein barycenter (FS-TWB) and its extension, called the fixed support tree-sliced Wasserstein barycenter (FS-TSWB). More specifically, we first show that the FS-TWB and FS-TSWB problems are convex optimization problems and can be solved by using the projected subgradient descent. Moreover, we propose a more efficient algorithm to compute the subgradient and objective function value by using the properties of tree-Wasserstein barycenter problems. Through real-world experiments, we show that, by using the proposed algorithm, the FS-TWB and FS-TSWB can be solved two orders of magnitude faster than the original Wasserstein barycenter.
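树上 Wasserstein 距离之所以能线性时间计算,是因为它可写成对每条边的子树质量差加权求和:TW(μ,ν) = Σ_e w_e·|μ(子树_e) − ν(子树_e)|。下面是一个自包含的示意实现(非论文代码),这也是 FS-TWB/FS-TSWB 能高效求解的基础。

```python
# 示意性草图:线性时间计算树上 Wasserstein 距离
import numpy as np

def tree_wasserstein(parent, edge_w, mu, nu):
    """parent[i] 为节点 i 的父节点(根为 0,且 parent[i] < i);edge_w[i] 为边 (parent[i], i) 的权重。"""
    n = len(parent)
    diff = np.asarray(mu, dtype=float) - np.asarray(nu, dtype=float)
    dist = 0.0
    for i in range(n - 1, 0, -1):          # 自叶向根累加子树质量差
        dist += edge_w[i] * abs(diff[i])
        diff[parent[i]] += diff[i]
    return dist

#        0
#       / \
#      1   2
#     / \
#    3   4
parent = [-1, 0, 0, 1, 1]
edge_w = [0.0, 1.0, 2.0, 1.0, 1.0]
mu = [0.0, 0.0, 0.5, 0.5, 0.0]
nu = [0.0, 0.0, 0.0, 0.0, 1.0]
print(tree_wasserstein(parent, edge_w, mu, nu))   # 3.0:0.5 质量走 3->1->4,0.5 质量走 2->0->1->4
```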

【9】 Axial multi-layer perceptron architecture for automatic segmentation of choroid plexus in multiple sclerosis 标题:轴向多层感知器结构在多发性硬化脉络丛自动分割中的应用 链接:https://arxiv.org/abs/2109.03778

作者:Marius Schmidt-Mengin,Vito A. G. Ricigliano,Benedetta Bodini,Emanuele Morena,Annalisa Colombi,Mariem Hamzaoui,Arya Yazdan Panah,Bruno Stankoff,Olivier Colliot 机构:Colliota,b, Sorbonne Universit´e, Paris Brain Institute, Inserm, CNRS, AP-HP, Paris, France, Inria, Aramis project-team, Paris, France, AP-HP, Hˆopital Saint-Antoine, Department of Neurology, DMU Neurosciences, Paris, France 摘要:脉络丛(CP)是产生大部分脑脊液(CSF)的脑室结构。一些尸检和体内研究已经指出它们在多发性硬化症(MS)炎症过程中的作用。因此,从MRI中自动分割CP对于研究大样本患者的CP特征具有很高的价值。据我们所知,唯一免费提供的CP分割工具是FreeSurfer,但它对于这种特定结构的准确性很差。在本文中,我们提出了从非对比增强T1加权MRI中自动分割CP。为此,我们介绍了一种基于轴向多层感知器(MLP)组件的新模型“轴向MLP”。这是受最近的作品启发的,这些作品表明,Transformer的自我关注层可以用MLP代替。该方法与标准3D U-Net、nnU-Net、Freesurfer和FastSurfer进行了系统比较。在我们的实验中,我们使用了141名受试者(44名对照组和97名MS患者)的数据集。我们表明,所有经过测试的深度学习(DL)方法都优于FreeSurfer(DL的骰子约为0.7,FreeSurfer的骰子约为0.33)。轴向MLP与U形网相比具有竞争力,尽管其精确度稍低。我们的研究结论有两个方面:1)所研究的深度学习方法可能是研究大规模MS患者CP的有用工具;2) 对于此类任务,轴向MLP是卷积神经网络的潜在可行替代方案,尽管它可以从进一步的改进中获益。 摘要:Choroid plexuses (CP) are structures of the ventricles of the brain which produce most of the cerebrospinal fluid (CSF). Several postmortem and in vivo studies have pointed towards their role in the inflammatory process in multiple sclerosis (MS). Automatic segmentation of CP from MRI thus has high value for studying their characteristics in large cohorts of patients. To the best of our knowledge, the only freely available tool for CP segmentation is FreeSurfer but its accuracy for this specific structure is poor. In this paper, we propose to automatically segment CP from non-contrast enhanced T1-weighted MRI. To that end, we introduce a new model called "Axial-MLP" based on an assembly of Axial multi-layer perceptrons (MLPs). This is inspired by recent works which showed that the self-attention layers of Transformers can be replaced with MLPs. This approach is systematically compared with a standard 3D U-Net, nnU-Net, Freesurfer and FastSurfer. For our experiments, we make use of a dataset of 141 subjects (44 controls and 97 patients with MS). We show that all the tested deep learning (DL) methods outperform FreeSurfer (Dice around 0.7 for DL vs 0.33 for FreeSurfer). Axial-MLP is competitive with U-Nets even though it is slightly less accurate. The conclusions of our paper are two-fold: 1) the studied deep learning methods could be useful tools to study CP in large cohorts of MS patients; 2)~Axial-MLP is a potentially viable alternative to convolutional neural networks for such tasks, although it could benefit from further improvements.
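下面用一个二维简化的 PyTorch 草图说明"轴向 MLP"模块的基本做法(非论文原始实现,原文针对三维体数据):把全连接层分别沿各空间轴和通道轴应用,以代替自注意力层;层数、激活函数等均为演示用假设。

```python
# 示意性草图:二维简化的 Axial-MLP 模块
import torch
import torch.nn as nn

class AxialMLPBlock2D(nn.Module):
    def __init__(self, h, w, c):
        super().__init__()
        self.mlp_h = nn.Linear(h, h)   # 沿高度轴混合
        self.mlp_w = nn.Linear(w, w)   # 沿宽度轴混合
        self.mlp_c = nn.Linear(c, c)   # 沿通道轴混合
        self.act = nn.GELU()

    def forward(self, x):              # x: (B, C, H, W)
        x = self.act(self.mlp_h(x.transpose(-1, -2))).transpose(-1, -2)      # 作用在 H 维
        x = self.act(self.mlp_w(x))                                          # 作用在 W 维
        x = self.act(self.mlp_c(x.permute(0, 2, 3, 1))).permute(0, 3, 1, 2)  # 作用在 C 维
        return x

block = AxialMLPBlock2D(h=64, w=64, c=16)
x = torch.randn(2, 16, 64, 64)
print(block(x).shape)                  # torch.Size([2, 16, 64, 64])
```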

【10】 FaBiAN: A Fetal Brain magnetic resonance Acquisition Numerical phantom 标题:Fabian:一种胎脑磁共振采集数字体模 链接:https://arxiv.org/abs/2109.03624

作者:Hélène Lajous,Christopher W. Roy,Tom Hilbert,Priscille de Dumast,Sébastien Tourbier,Yasser Alemán-Gómez,Jérôme Yerly,Thomas Yu,Hamza Kebiri,Kelly Payette,Jean-Baptiste Ledoux,Reto Meuli,Patric Hagmann,Andras Jakab,Vincent Dunet,Mériam Koob,Tobias Kober,Matthias Stuber,Meritxell Bach Cuadra 机构:Department of Radiology, Lausanne University Hospital (CHUV) and University of Lausanne (UNIL), Lausanne, CIBM Center for Biomedical Imaging, Switzerland, Advanced Clinical Imaging Technology (ACIT), Siemens Healthcare, Lausanne, Switzerland 备注:23 pages, 9 figures (including Supplementary Material), 4 tables, 1 supplement. Submitted to Scientific Reports (2021) 摘要:准确描述宫内人脑成熟是至关重要的,因为它涉及复杂和相互关联的结构和功能过程,可能会影响以后的健康。磁共振成像是研究胎儿发育过程中模棱两可的神经模式的有力工具。然而,在这组敏感受试者中,获得令人满意质量的采集数量仍然很少,因此阻碍了先进图像处理技术的验证。数字幻影可以通过提供具有已知地面真相的受控环境来缓解这些限制。在这项工作中,我们介绍了FaBiAN,一种开源的胎脑磁共振采集数字模型,模拟胎儿脑的临床T2加权快速自旋回波序列。这一独特的工具基于一个通用、灵活和真实的设置,包括随机的胎动,从而提供整个成熟期的胎儿大脑图像,可与临床采集相媲美。与合成的高分辨率参考体积相比,我们证明了它在评估模拟运动破坏的2D低分辨率序列的超分辨率胎儿脑磁共振成像算法的鲁棒性和优化精度方面的价值。我们还表明,生成的图像可以补充临床数据集,以支持胎儿脑组织分割的数据密集型深度学习方法。 摘要:Accurate characterization of in utero human brain maturation is critical as it involves complex and interconnected structural and functional processes that may influence health later in life. Magnetic resonance imaging is a powerful tool to investigate equivocal neurological patterns during fetal development. However, the number of acquisitions of satisfactory quality available in this cohort of sensitive subjects remains scarce, thus hindering the validation of advanced image processing techniques. Numerical phantoms can mitigate these limitations by providing a controlled environment with a known ground truth. In this work, we present FaBiAN, an open-source Fetal Brain magnetic resonance Acquisition Numerical phantom that simulates clinical T2-weighted fast spin echo sequences of the fetal brain. This unique tool is based on a general, flexible and realistic setup that includes stochastic fetal movements, thus providing images of the fetal brain throughout maturation comparable to clinical acquisitions. We demonstrate its value to evaluate the robustness and optimize the accuracy of an algorithm for super-resolution fetal brain magnetic resonance imaging from simulated motion-corrupted 2D low-resolution series as compared to a synthetic high-resolution reference volume. We also show that the images generated can complement clinical datasets to support data-intensive deep learning methods for fetal brain tissue segmentation.

【11】 Higher Order Kernel Mean Embeddings to Capture Filtrations of Stochastic Processes 标题:捕捉随机过程渗流的高阶核平均嵌入方法 链接:https://arxiv.org/abs/2109.03582

作者:Cristopher Salvi,Maud Lemercier,Chong Liu,Blanka Hovarth,Theodoros Damoulas,Terry Lyons 机构:Blanka Horvath 摘要:随机过程是在某些路径空间中具有值的随机变量。然而,将随机过程简化为路径值随机变量会忽略其过滤,即过程随时间传递的信息流。通过调节过滤过程,我们引入了一系列高阶核均值嵌入(KME),它概括了KME的概念,并捕获了与过滤相关的附加信息。我们推导了相关的高阶最大平均偏差(MMD)的经验估计,并证明了一致性。然后,我们构建了一个过滤敏感的内核双样本测试,能够提取标准MMD测试遗漏的信息。此外,利用我们的高阶MMD,我们在随机过程上构建了一系列通用核,允许通过经典的基于核的回归方法解决定量金融(如美式期权定价)中的实际校准和最优停止问题。最后,将现有的条件独立性测试应用于随机过程的情况,我们设计了一种因果发现算法,仅从多维轨迹的观测中恢复相互作用物体之间结构依赖的因果图。 摘要:Stochastic processes are random variables with values in some space of paths. However, reducing a stochastic process to a path-valued random variable ignores its filtration, i.e. the flow of information carried by the process through time. By conditioning the process on its filtration, we introduce a family of higher order kernel mean embeddings (KMEs) that generalizes the notion of KME and captures additional information related to the filtration. We derive empirical estimators for the associated higher order maximum mean discrepancies (MMDs) and prove consistency. We then construct a filtration-sensitive kernel two-sample test able to pick up information that gets missed by the standard MMD test. In addition, leveraging our higher order MMDs we construct a family of universal kernels on stochastic processes that allows to solve real-world calibration and optimal stopping problems in quantitative finance (such as the pricing of American options) via classical kernel-based regression methods. Finally, adapting existing tests for conditional independence to the case of stochastic processes, we design a causal-discovery algorithm to recover the causal graph of structural dependencies among interacting bodies solely from observations of their multidimensional trajectories.
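作为背景,下面给出标准(一阶)MMD 的无偏估计实现,即论文中高阶 MMD 所推广的基础量;高阶版本需在条件于过滤(filtration)的嵌入上重复该构造,此处不展开,核函数与带宽均为演示用选择。

```python
# 示意性草图:RBF 核下 MMD^2 的无偏估计
import numpy as np

def rbf_gram(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2_unbiased(X, Y, sigma=1.0):
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = rbf_gram(X, X, sigma), rbf_gram(Y, Y, sigma), rbf_gram(X, Y, sigma)
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2 * Kxy.mean()

rng = np.random.default_rng(0)
X  = rng.normal(0.0, 1.0, size=(200, 3))
X2 = rng.normal(0.0, 1.0, size=(200, 3))
Y  = rng.normal(0.5, 1.0, size=(200, 3))
print(round(mmd2_unbiased(X, X2), 4), round(mmd2_unbiased(X, Y), 4))   # 同分布时接近 0,不同分布时明显更大
```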

【12】 A Bottom-up method Towards the Automatic and Objective Monitoring of Smoking Behavior In-the-wild using Wrist-mounted Inertial Sensors 标题:一种利用腕式惯性传感器实现野外吸烟行为自动客观监测的自下而上方法 链接:https://arxiv.org/abs/2109.03475

作者:Athanasios Kirmizis,Konstantinos Kyritsis,Anastasios Delopoulos 机构: Aristotle University of Thessaloniki 备注:Manuscript accepted to be published in the 43rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) 2021. Acceptance date: 16-July-2021 摘要:烟草消费已达到全球流行的程度,被认为是死亡和疾病的主要原因。在不同的烟草消费方式(如无烟、雪茄)中,吸烟最为普遍。在本文中,我们提出了一种两步自底向上算法,利用商用smartwatch的3D加速度和定向速度测量值,实现对白天吸烟行为的自动和客观监测。在第一步中,我们的算法使用具有卷积层和递归层的人工神经网络来检测个人吸烟姿势(即烟团)。在第二步中,我们利用检测到的烟团密度来实现全天吸烟的时间定位。在实验部分,我们分别使用在半控制和自由生活条件下记录的公开、现实吸烟事件检测(SED)和自由生活吸烟事件检测(SED-FL)数据集,对所提出算法的每个步骤进行扩展评估。特别是,漏掉一名受试者(LOSO)实验显示,对于检测烟团,F1得分为0.863,对于白天吸烟的时间定位,F1得分/Jaccard指数等于0.878/0.604。最后,为了进一步了解,我们还将我们算法中的烟团检测部分与最近文献中发现的类似方法进行了比较。 摘要:The consumption of tobacco has reached global epidemic proportions and is characterized as the leading cause of death and illness. Among the different ways of consuming tobacco (e.g., smokeless, cigars), smoking cigarettes is the most widespread. In this paper, we present a two-step, bottom-up algorithm towards the automatic and objective monitoring of cigarette-based, smoking behavior during the day, using the 3D acceleration and orientation velocity measurements from a commercial smartwatch. In the first step, our algorithm performs the detection of individual smoking gestures (i.e., puffs) using an artificial neural network with both convolutional and recurrent layers. In the second step, we make use of the detected puff density to achieve the temporal localization of smoking sessions that occur throughout the day. In the experimental section we provide extended evaluation regarding each step of the proposed algorithm, using our publicly available, realistic Smoking Event Detection (SED) and Free-living Smoking Event Detection (SED-FL) datasets recorded under semi-controlled and free-living conditions, respectively. In particular, leave-one-subject-out (LOSO) experiments reveal an F1-score of 0.863 for the detection of puffs and an F1-score/Jaccard index equal to 0.878/0.604 towards the temporal localization of smoking sessions during the day. Finally, to gain further insight, we also compare the puff detection part of our algorithm with a similar approach found in the recent literature.
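算法第二步"基于 puff 密度的时域定位"可以用下面的草图说明(非论文原始实现):对逐秒的 puff 检测序列做滑窗密度统计,密度超过阈值的连续区间视为一次吸烟会话;窗长与阈值均为演示用假设值。

```python
# 示意性草图:由 puff 检测序列定位吸烟会话
import numpy as np

def localize_sessions(puffs, win=60, min_density=3, fs=1):
    """puffs: 逐秒 0/1 检测序列;win: 窗长(秒);min_density: 窗内最少 puff 数。"""
    density = np.convolve(puffs, np.ones(win), mode="same")   # 每个时刻附近约 1 分钟内的 puff 数
    active = density >= min_density
    sessions, start = [], None
    for t, a in enumerate(active):
        if a and start is None:
            start = t
        elif not a and start is not None:
            sessions.append((start / fs, t / fs))
            start = None
    if start is not None:
        sessions.append((start / fs, len(active) / fs))
    return sessions

puffs = np.zeros(600)
puffs[100:220:12] = 1      # 在第 100~220 秒间每 12 秒一个 puff,模拟一次吸烟会话
print(localize_sessions(puffs))   # 输出约覆盖该时间段的单个区间
```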

【13】 C-MinHash: Rigorously Reducing K Permutations to Two 标题:C-MinHash:严格地将K个排列减少到两个 链接:https://arxiv.org/abs/2109.03337

作者:Xiaoyun Li,Ping Li 机构:Cognitive Computing Lab, Baidu Research, NE ,th St. Bellevue, WA , USA 摘要:最小哈希(Minwise hashing,MinHash)是一种重要而实用的算法,用于生成随机哈希以逼近海量二值(0/1)数据中的Jaccard相似度。MinHash的基本理论要求对数据集中的每个数据向量应用数百甚至数千个独立的随机置换,才能获得可靠的结果,例如用于在海量数据中构建大规模学习模型或进行近似近邻搜索。在本文中,我们提出了循环MinHash(Circulant MinHash,C-MinHash),并给出了令人惊讶的理论结果:我们只需要两个独立的随机置换。对于C-MinHash,我们首先对数据向量进行初始置换,然后使用第二个置换来生成哈希值。基本上,第二个置换通过循环移位被重复使用$K$次,以产生$K$个哈希值。与经典的MinHash不同,这$K$个哈希值显然是相关的,但我们能够给出严格的证明:我们仍然能够获得Jaccard相似度的无偏估计,并且理论方差一致小于使用$K$个独立置换的经典MinHash。C-MinHash的理论证明需要一些非平凡的工作。通过数值实验验证了该理论的正确性,并证明了C-MinHash算法的有效性。 摘要:Minwise hashing (MinHash) is an important and practical algorithm for generating random hashes to approximate the Jaccard (resemblance) similarity in massive binary (0/1) data. The basic theory of MinHash requires applying hundreds or even thousands of independent random permutations to each data vector in the dataset, in order to obtain reliable results for (e.g.,) building large-scale learning models or approximate near neighbor search in massive data. In this paper, we propose {\bf Circulant MinHash (C-MinHash)} and provide the surprising theoretical results that we just need \textbf{two} independent random permutations. For C-MinHash, we first conduct an initial permutation on the data vector, then we use a second permutation to generate hash values. Basically, the second permutation is re-used $K$ times via circulant shifting to produce $K$ hashes. Unlike classical MinHash, these $K$ hashes are obviously correlated, but we are able to provide rigorous proofs that we still obtain an unbiased estimate of the Jaccard similarity and the theoretical variance is uniformly smaller than that of the classical MinHash with $K$ independent permutations. The theoretical proofs of C-MinHash require some non-trivial efforts. Numerical experiments are conducted to justify the theory and demonstrate the effectiveness of C-MinHash.
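C-MinHash 的核心思想可用下面的 NumPy 草图说明(非论文原始实现,细节可能与原文有出入):只用两个随机置换 σ 与 π,先用 σ 打乱坐标,再把 π 循环移位 K 次得到 K 个哈希;两个向量哈希碰撞的比例即为 Jaccard 相似度的估计。

```python
# 示意性草图:用两个置换与循环移位生成 K 个 MinHash 值
import numpy as np

def c_minhash(nonzero_idx, sigma, pi, K):
    """nonzero_idx: 二值向量中取 1 的坐标集合;返回 K 个哈希值。"""
    D = len(pi)
    s = sigma[np.asarray(sorted(nonzero_idx))]                    # 初始置换 σ
    return np.array([pi[(s + k) % D].min() for k in range(K)])    # 循环移位复用 π

rng = np.random.default_rng(0)
D, K = 1000, 256
sigma, pi = rng.permutation(D), rng.permutation(D)

a = set(rng.choice(D, 80, replace=False))
b = set(list(a)[:40]) | set(rng.choice(D, 40, replace=False))     # 与 a 有部分重叠的集合
true_jaccard = len(a & b) / len(a | b)

ha, hb = c_minhash(a, sigma, pi, K), c_minhash(b, sigma, pi, K)
print(round(true_jaccard, 3), round(float((ha == hb).mean()), 3)) # 碰撞率应接近真实 Jaccard 相似度
```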

机器翻译,仅供参考

